AI agents are not quite there yet (but they are close)

I used the new best AI web agent to apply for a job on LinkedIn, and here is the result (see the video below). Spoiler: it couldn't complete the task.

Recently, a new SOTA (state-of-the-art) open-source web agent called Browser Use was released. Browser Use achieved an 89% success rate on the WebVoyager benchmark, which was designed to evaluate how well AI agents can navigate and interact with real-world websites.

I tried it on a simple task, applying for a job, and the results were a bit disappointing:

It struggled when a scrollable element appeared on the screen and got stuck in an infinite loop
Too slow to be useful (the video is sped up 4x): it spent around 5 minutes trying to complete the task and still failed
Too expensive: around 30 cents per task 🤯 (although this may vary depending on the task)

The fact that the agent scored 89% on WebVoyager but still struggled with a simple task highlights the need for decentralized benchmarks where anyone can upload and test their own tasks (see Agent2Bench). It is especially ironic because the Browser Use repo shows an example of the agent being used on LinkedIn.

HOWEVER, this is just the beginning. Better models are being released every week, and I'm confident that computer agents will have improved drastically by the end of the year, mainly because more and more AI labs are focusing on this task. I really think this is the BIG next step in AI: solving autonomous computer use would have an immense impact on society.

Follow me for more news and real-life applications from the AI world