OpenAI just launched 'Agent Mode' for its Atlas browser, giving ChatGPT the ability to browse the web and complete tasks on your behalf. This is a practical first step toward AI agents that can actually do things instead of just finding information.

But before we get too excited, a new benchmark found that even top agents fail about 97% of real-world freelance jobs. The gap between a simple command and a complex project is still massive, which shows exactly where human oversight remains necessary.

Topics of the day:

  • OpenAI’s new agent that can browse the web

  • Why AI agents still fail 97% of real-world tasks

  • How companies are finally getting positive AI ROI

  • New AI apps speed up writing using your voice

  • The Shortlist: Perplexity’s new research and flight tracking tools, Nvidia’s $1B bet on AI coding startup Poolside, Invisible’s $100M raise for human-in-the-loop training, and Amazon’s clarification on its 14,000 layoffs.

OpenAI gives ChatGPT a browser and a to-do list

What’s happening: OpenAI just launched 'Agent Mode' for its Atlas browser, giving ChatGPT the ability to browse the web and complete tasks on your behalf.

In practice:

  • Automate tedious research by asking your agent to compile competitor pricing, summarize customer reviews, or find potential sales leads from public sources.

  • Offload administrative tasks like booking travel, scheduling appointments, or ordering office supplies, freeing up your team’s time for more strategic work.

  • Quickly test new market ideas by having the agent research niche audiences or analyze industry trends without manual effort.

Bottom line: This is a practical first step toward AI agents that do things instead of just finding things. For professionals, it's a glimpse into a future where you can delegate entire digital workflows to an assistant.

AI agents fall short on real-world tasks

What’s happening: A new benchmark from Scale AI found that even the most advanced AI agents can only successfully complete 2–3% of real freelance jobs, pouring some cold water on the idea of full automation.

In practice:

  • AI excels at discrete generative tasks like creating a logo from a prompt, but struggles with complex projects that require editing, feedback, and multiple steps.

  • This highlights the continued need for human oversight, as agents can’t yet handle the ambiguity or judgment calls required in most professional client work.

  • Adopting agents isn’t a magic bullet for costs, as they introduce new operational overhead like managing rate limits, rework, and security reviews.

Bottom line: The gap between AI hype and reality is still wide when it comes to replacing entire roles. The immediate opportunity is in using AI to assist skilled professionals, not to automate them away.

But a new study shows businesses are seeing real ROI from AI

What's happening: A new Wharton report shows AI in business has moved past the hype phase, with nearly three-quarters of companies that measure AI ROI now reporting positive returns.

In practice:

  • The conversation has shifted from adoption to accountability, so focus your AI projects on measurable productivity gains or incremental profit.

  • Start with low-hanging fruit where AI excels, like summarizing meetings, analyzing data, and creating first-draft marketing content.

  • Your biggest bottleneck isn't the tech; it's training your team. Invest in upskilling to turn AI usage into an advantage.

Bottom line: The era of casual AI experimentation is winding down. Your competitors are now tracking returns and building durable advantages with this tech.

New AI tools let you dictate anywhere

What’s happening: A group of new apps, including Aqua Voice (I was a customer), Monologue (I am a customer), Wispr Flow, and others, let you use your voice to type into any text field on your Mac or PC, which makes them perfect for speeding up daily writing tasks.

In practice:

  • Use it to bypass the keyboard and get first drafts done in a fraction of the time it normally takes to type.

  • Speed up routine work like clearing your inbox, drafting documents, or leaving detailed feedback in Slack or Asana.

  • They use AI to automatically remove filler words and rambling, polishing your thoughts into clean text as you speak.

Bottom line: This is a simple, practical AI use case that closes the gap between thinking and writing. If you think faster than you type, this is an easy way to automate a slow part of your day.

The Shortlist

Perplexity introduced a new patent research agent and a commercial flight tracker, adding specialized tools on top of its core AI search engine for professional use cases.

Nvidia plans to invest up to $1B in Poolside, a startup building AI tools for software development, signaling big bets on AI-native coding assistants.

Invisible Technologies raised $100M to expand its data labeling and human-in-the-loop training services that power major models, highlighting the critical need for quality data to make AI work.

Amazon clarified that its recent 14,000 layoffs were due to organizational structure, not AI, pushing back on the narrative of widespread AI-driven job displacement.

Why this newsletter?

This newsletter is where I (Kwadwo) share products, articles, and links that I find useful and interesting, mostly around AI. I focus on tools and solutions that bring real value to people in everyday jobs, not just tech insiders.

Please share any feedback you have, either by replying or through the poll below 🙏🏽

Another great newsletter

Learn Prompting's Newsletter

Get the latest AI news, prompts, and tools... in 3 minutes or less!