The world of artificial intelligence is in turmoil. Again. This week’s announcements on agentics technologies go on and on. After OpenAI’s Swarm and Microsoft’s Copilot autonomous agents, Anthropic has just announced Computer Use.
We’re witnessing something fundamental: The emergence of AI agents capable of interacting directly with our computers.
Listen to the podcast :
Computer Use allows AI to interact directly with an operating system as a human would, by observing the screen via screenshots, moving the cursor and typing on the keyboard;
This will automate many tasks.
The great disruption: from conversational assistant to active agent
The old world: chatbots chatty but powerless
Let’s face it: until now, our interactions with AIs, impressive as they are, have been a bit like a conversation with a genie locked in his lamp.
An admittedly brilliant genius, capable of advising us, explaining us, inspiring us… but incapable of taking concrete action on our digital world.
The new paradigm: welcome to the era of “Computer Use”
With the advent of technologies such as “Computer Use”, AIs will cross a new frontier
Practical short-term changes:
- Autonomous web navigation
- Direct file manipulation
- Real-time data search and analysis
- Using applications and tools
How Computer Use works
One of Claude’s major innovations is his ability to “see” and interpret the computer screenr throughscreenshots.
This technology is therefore based on a mechanism of image analysis and pixel counting, enabling Claude to precisely locate visual elements on a screen and execute actions such as moving the cursor or entering text.
Here’s an overview of the process:
- Screen access: When Claude is assigned a task, he begins by capturing an image of the target software’s user interface.
- Analyzing screenshots: From these captures, Claude is able to identify specific elements such as buttons, input fields or drop-down menus.
- Pixel counting: Claude uses pixel counting to determine where to move the cursor on the screen, acting with impressive precision.
- Executing actions: Once the target has been identified, Claude can click, enter information, or interact with menus autonomously. This feedback loop continues until the task is completed.
The process strikingly mimics human interaction with computers: we look at a screen, identify an item, then use a mouse or keyboard to interact.
Current limitations
Although this is an impressive advance, it’s still in the experimental phase.
Claude demonstrates solid performance in interpreting screens and executing actions, but encounters challenges in certain aspects:
Complex actions
Movements such as scrolling, dragging or zooming, which are simple for a human user, still pose problems for Claude.
This limits some of his abilities to interact with more dynamic interfaces or continuous video streams.
A fragmented vision
Claude doesn’t perceive the screen continuously, but through successive screenshots. This can lead to misfires in short actions or notifications that quickly disappear, making it more difficult for AI in fast-paced environments.
Occasional errors
Like any developing technology, Claude can still make mistakes. For example, he might click on an unwanted button or misinterpret a visual element.
These limitations do not, however, diminish the importance of innovation.
Anthropic is actively working to improve performance and reduce these shortcomings, not least by gathering user feedback via the public beta.
Implications for digital professionals
1. For marketing professionals
Digital marketing is entering a new era.
Imagine an AI agent capable of:
- Analyze your analytics in real time
- Tune your advertising campaigns
- Generate and publish optimized content
- Automatically test different approaches
Spoiler alert: your job is not threatened, it is evolving towards more strategy and less execution.
2. For creative people
Creatives will be able to delegate repetitive tasks to focus on the vision:
- Automation of basic retouching (with technos like Google RF-inversion)
- Intelligent asset organization
- Enhanced visual search
3. For project managers
Project management becomes more fluid with agents able to:
- Follow progress in real time
- Alerter about potential delays
- Generate automatic reports
- Orchestrate complex workflows
The issues and challenges for Anthropic
Security and control
Let’s face it: giving an AI access to our systems raises legitimate questions:
One of the flaws already identified by Anthropic is the risk of injecting malicious instructions (“prompt injection”).
Anthropic has taken steps to mitigate these risks by developing classifiers capable of spotting such abuse.
In addition, certain sensitive actions are explicitly blocked, preventing Claude from acting on critical or potentially dangerous functions.
How to prepare for this revolution
Skills to develop
To ride this wave rather than endure it, focus on:
- Systems thinking
- Advanced prompt engineering
- Supervision of AI agents
- Ethics applied to AI
Tools to master
An ecosystem is being set up, with:
- Agent management platforms
- Control frameworks
- Monitoring tools
- Dedicated programming interfaces
Changes to come
Anthropic’s ambition for Claude goes beyond basic interaction with existing software.
Their long-term vision is to enable AI to interact with any software, as fluidly and intuitively as a human user.
This means that Claude could eventually be able to automate complex tasks involving multiple applications, or even entire environments.
The potential applications for this capability are vast:
- Automation of repetitive processes: Claude could automate complex administrative tasks, such as database management, report creation or e-mail management, significantly lightening employee workloads.
- Software development: With the right level of training, Claude could also be involved in creative processes such as software development and testing, interacting directly with programming tools.
- Open and creative tasks: Claude could be used for open research, exploring large datasets and generating reports or analyses based on the results.
The near future (2024-2025)
- Democratization of the first consumer agents
- Standardization of security protocols
- Emerging concrete B2B use cases
The medium term (2025-2027)
- Multi-modal agents
- Inter-agent collaboration
- Complex workflow automation
Open-ended questions
- What place for the human in this new deal?
- How will our relationship with technology evolve?
- What will be the killer apps of this revolution?
Computer Use isn’t just a new AI feature. It’s a paradigm shift that will redefine our relationship with technology. Professionals have a unique opportunity to shape this revolution rather than suffer it.
Like any major transformation, it brings its share of promises and challenges. The key will be to find the right balance between innovation and caution, automation and human control, efficiency and ethics.
One thing’s for sure: we’re living in an exciting moment in tech history
And how do you see the future of Computer Use in your field? Share your thoughts in the comments!
FAQ
1. What is Claude’s computer usage functionality?
Claude is now able to interact directly with software via screenshots, mimicking human interaction with a computer.
2. How does Claude interact with a computer?
Claude captures images from the screen, analyzes them, and uses a pixel-counting mechanism to move the cursor and click on elements.
3. What are the current limitations?
Claude still has difficulty with actions like scrolling or zooming, and his screenshot-based vision can cause him to miss quick actions.
4. How does Anthropic address the security risks associated with this technology?
Anthropic uses classifiers to spot abuse and has implemented restrictions on sensitive actions.
5. What are the potential applications of this technology?
Claude could automate administrative tasks, participate in software development or carry out complex research involving several software tools.
6. How does Claude compare with other AIs on OSWorld?
Claude scored significantly higher than other AI models in evaluations based on screenshots.
7. Is Claude’s ability to use a computer already available?
Yes, this feature is in public beta, and developers can test it to provide feedback.
8. What are the next steps for this feature?
Anthropic plans to improve speed, reliability and support for more complex actions in future updates.
9. What is Anthropic’s long-term goal with Claude?
Anthropic aims to enable Claude to interact with any type of software, as fluidly as a human user.
10. Can Claude already automate complex processes?
At present, Claude is limited to relatively simple tasks, but its potential to automate complex processes is constantly evolving.
AI NEWSLETTER
Stay on top of AI with our Newsletter
Every month, AI news and our latest articles, delivered straight to your inbox.
CHATGPT prompt guide (EDITION 2024)
Download our free PDF guide to crafting effective prompts with ChatGPT.
Designed for beginners, it provides you with the knowledge needed to structure your prompts and boost your productivity
With this ebook, you will:
✔ Master Best Practices
Understand how to structure your queries to get clear and precise answers.
✔ Create Effective Prompts
The rules for formulating your questions to receive the best possible responses.
✔ Boost Your Productivity
Simplify your daily tasks by leveraging ChatGPT’s features.
Similar posts
Swarm : OpenAI’s open-source framework for multi-agent AI
The vision of artificial intelligence collaborating fluidly within complex systems is becoming a reality. Swarm, the latest from OpenAI, is an open-source framework that catalyzes this innovation. Designed to orchestrate …
Microsoft announces Copilot autonomous agents: automation on a grand scale ?
Microsoft has just made a shattering announcement about the integration of autonomous agents into its Copilot ecosystem. These autonomous agents, which are supposed to transform the way we work with …
Microsoft AutoGen : Multi-Agent AI explained
What is AutoGen? AutoGen isn’t just another tool in Microsoft’s technological arsenal. It’s a revolution in large-scale language modeling. At first glance, AutoGen may seem like just another framework for …