Anthropic Computer use : Prepare for change

Published on October 23, 2024|Artificial Intelligence

The world of artificial intelligence is in turmoil. Again. This week’s announcements on agentics technologies go on and on. After OpenAI’s Swarm and Microsoft’s Copilot autonomous agents, Anthropic has just announced Computer Use.

We’re witnessing something fundamental: The emergence of AI agents capable of interacting directly with our computers.

Listen to the podcast :

Computer Use allows AI to interact directly with an operating system as a human would, by observing the screen via screenshots, moving the cursor and typing on the keyboard;

This will automate many tasks.

The great disruption: from conversational assistant to active agent

The old world: chatbots chatty but powerless

Let’s face it: until now, our interactions with AIs, impressive as they are, have been a bit like a conversation with a genie locked in his lamp.

An admittedly brilliant genius, capable of advising us, explaining us, inspiring us… but incapable of taking concrete action on our digital world.

The new paradigm: welcome to the era of “Computer Use”

With the advent of technologies such as “Computer Use”, AIs will cross a new frontier

Practical short-term changes:

Autonomous web navigation
Direct file manipulation
Real-time data search and analysis
Using applications and tools

How Computer Use works

One of Claude’s major innovations is his ability to “see” and interpret the computer screenr throughscreenshots.

This technology is therefore based on a mechanism of image analysis and pixel counting, enabling Claude to precisely locate visual elements on a screen and execute actions such as moving the cursor or entering text.

Here’s an overview of the process:

Screen access: When Claude is assigned a task, he begins by capturing an image of the target software’s user interface.
Analyzing screenshots: From these captures, Claude is able to identify specific elements such as buttons, input fields or drop-down menus.
Pixel counting: Claude uses pixel counting to determine where to move the cursor on the screen, acting with impressive precision.
Executing actions: Once the target has been identified, Claude can click, enter information, or interact with menus autonomously. This feedback loop continues until the task is completed.

The process strikingly mimics human interaction with computers: we look at a screen, identify an item, then use a mouse or keyboard to interact.

Current limitations

Although this is an impressive advance, it’s still in the experimental phase.

Claude demonstrates solid performance in interpreting screens and executing actions, but encounters challenges in certain aspects:

Complex actions

Movements such as scrolling, dragging or zooming, which are simple for a human user, still pose problems for Claude.
This limits some of his abilities to interact with more dynamic interfaces or continuous video streams.

A fragmented vision

Claude doesn’t perceive the screen continuously, but through successive screenshots. This can lead to misfires in short actions or notifications that quickly disappear, making it more difficult for AI in fast-paced environments.

Occasional errors

Like any developing technology, Claude can still make mistakes. For example, he might click on an unwanted button or misinterpret a visual element.

These limitations do not, however, diminish the importance of innovation.

Anthropic is actively working to improve performance and reduce these shortcomings, not least by gathering user feedback via the public beta.

Implications for digital professionals

1. For marketing professionals

Digital marketing is entering a new era.

Imagine an AI agent capable of:

Analyze your analytics in real time
Tune your advertising campaigns
Generate and publish optimized content
Automatically test different approaches

Spoiler alert: your job is not threatened, it is evolving towards more strategy and less execution.

2. For creative people

Creatives will be able to delegate repetitive tasks to focus on the vision:

Automation of basic retouching (with technos like Google RF-inversion)
Intelligent asset organization
Enhanced visual search

3. For project managers

Project management becomes more fluid with agents able to:

Follow progress in real time
Alerter about potential delays
Generate automatic reports
Orchestrate complex workflows

The issues and challenges for Anthropic

Security and control

Let’s face it: giving an AI access to our systems raises legitimate questions:

One of the flaws already identified by Anthropic is the risk of injecting malicious instructions (“prompt injection”).

Anthropic has taken steps to mitigate these risks by developing classifiers capable of spotting such abuse.

In addition, certain sensitive actions are explicitly blocked, preventing Claude from acting on critical or potentially dangerous functions.

How to prepare for this revolution

Skills to develop

To ride this wave rather than endure it, focus on:

Systems thinking
Advanced prompt engineering
Supervision of AI agents
Ethics applied to AI

Tools to master

An ecosystem is being set up, with:

Agent management platforms
Control frameworks
Monitoring tools
Dedicated programming interfaces

Changes to come

Anthropic’s ambition for Claude goes beyond basic interaction with existing software.

Their long-term vision is to enable AI to interact with any software, as fluidly and intuitively as a human user.

This means that Claude could eventually be able to automate complex tasks involving multiple applications, or even entire environments.

The potential applications for this capability are vast:

Automation of repetitive processes: Claude could automate complex administrative tasks, such as database management, report creation or e-mail management, significantly lightening employee workloads.
Software development: With the right level of training, Claude could also be involved in creative processes such as software development and testing, interacting directly with programming tools.
Open and creative tasks: Claude could be used for open research, exploring large datasets and generating reports or analyses based on the results.

The near future (2024-2025)

Democratization of the first consumer agents
Standardization of security protocols
Emerging concrete B2B use cases

The medium term (2025-2027)

Multi-modal agents
Inter-agent collaboration
Complex workflow automation

Open-ended questions

What place for the human in this new deal?
How will our relationship with technology evolve?
What will be the killer apps of this revolution?

Computer Use isn’t just a new AI feature. It’s a paradigm shift that will redefine our relationship with technology. Professionals have a unique opportunity to shape this revolution rather than suffer it.

Like any major transformation, it brings its share of promises and challenges. The key will be to find the right balance between innovation and caution, automation and human control, efficiency and ethics.

One thing’s for sure: we’re living in an exciting moment in tech history

And how do you see the future of Computer Use in your field? Share your thoughts in the comments!

FAQ

1. What is Claude’s computer usage functionality?

Claude is now able to interact directly with software via screenshots, mimicking human interaction with a computer.

2. How does Claude interact with a computer?

Claude captures images from the screen, analyzes them, and uses a pixel-counting mechanism to move the cursor and click on elements.

3. What are the current limitations?

Claude still has difficulty with actions like scrolling or zooming, and his screenshot-based vision can cause him to miss quick actions.

4. How does Anthropic address the security risks associated with this technology?

Anthropic uses classifiers to spot abuse and has implemented restrictions on sensitive actions.

5. What are the potential applications of this technology?

Claude could automate administrative tasks, participate in software development or carry out complex research involving several software tools.

6. How does Claude compare with other AIs on OSWorld?

Claude scored significantly higher than other AI models in evaluations based on screenshots.

7. Is Claude’s ability to use a computer already available?

Yes, this feature is in public beta, and developers can test it to provide feedback.

8. What are the next steps for this feature?

Anthropic plans to improve speed, reliability and support for more complex actions in future updates.

9. What is Anthropic’s long-term goal with Claude?

Anthropic aims to enable Claude to interact with any type of software, as fluidly as a human user.

10. Can Claude already automate complex processes?

At present, Claude is limited to relatively simple tasks, but its potential to automate complex processes is constantly evolving.

AI NEWSLETTER

Stay on top of AI with our Newsletter

Every month, AI news and our latest articles, delivered straight to your inbox.

Leave a Comment Cancel Reply

Chatgpt prompt guide

CHATGPT prompt guide (EDITION 2024)

Download our free PDF guide to crafting effective prompts with ChatGPT.

Designed for beginners, it provides you with the knowledge needed to structure your prompts and boost your productivity

With this ebook, you will:

✔ Master Best Practices

Understand how to structure your queries to get clear and precise answers.

✔ Create Effective Prompts

The rules for formulating your questions to receive the best possible responses.

✔ Boost Your Productivity

Simplify your daily tasks by leveraging ChatGPT’s features.

Similar posts

Swarm openai's open source framework for multi agent ai

Swarm : OpenAI’s open-source framework for multi-agent AI

October 19, 2024

The vision of artificial intelligence collaborating fluidly within complex systems is becoming a reality. Swarm, the latest from OpenAI, is an open-source framework that catalyzes this innovation. Designed to orchestrate …

Microsoft announces Copilot autonomous agents: automation on a grand scale ?

Microsoft announces Copilot autonomous agents: automation on a grand scale ?

October 22, 2024

Microsoft has just made a shattering announcement about the integration of autonomous agents into its Copilot ecosystem. These autonomous agents, which are supposed to transform the way we work with …

Microsoft AutoGen : Multi-Agent AI explained

Microsoft AutoGen : Multi-Agent AI explained

October 12, 2023

What is AutoGen? AutoGen isn’t just another tool in Microsoft’s technological arsenal. It’s a revolution in large-scale language modeling. At first glance, AutoGen may seem like just another framework for …