Builders & Makers Sunday, 22 February 2026

An AI agent that actually operates a browser

Xiaona isn't a chatbot with API access. It's an autonomous AI agent that created real accounts on GitHub, Twitter, and Dev.to by operating an actual browser - solving CAPTCHAs, handling form validation, navigating signup flows, and evading anti-bot detection.

This isn't theoretical. The write-up details real browser fingerprints, multi-tool orchestration, and what true autonomy means when an agent interacts with the web the way a human would.

The technical challenge

Most "AI agents" are wrappers around APIs. They call endpoints, pass parameters, handle responses. Useful, but not autonomous in any meaningful sense.

True browser operation means dealing with the web as it actually exists. JavaScript-heavy pages. Form validation that fires on blur events. CAPTCHAs designed to detect automation. Rate limiting. Session management. The messy reality of sites built to serve humans, not bots.

Xiaona operates a real browser using Playwright - the same automation framework developers use for testing. But instead of following a pre-scripted path, it uses vision models to interpret what's on screen and LLMs to decide what to do next.
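
The look-then-decide loop described here can be sketched in a few lines. This is a minimal illustration, not Xiaona's code: Browser, Action, run_agent, and the fake_* stand-ins are all hypothetical names, and the stand-ins replace Playwright, the vision model, and the LLM so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class Action:
    kind: str          # "click", "type", or "done" (hypothetical action set)
    target: str = ""   # element label reported by the vision step
    text: str = ""     # text to type, if any


class Browser(Protocol):
    def screenshot(self) -> bytes: ...
    def apply(self, action: Action) -> None: ...


def run_agent(browser: Browser,
              describe: Callable[[bytes], str],
              decide: Callable[[str, list[Action]], Action],
              max_steps: int = 20) -> list[Action]:
    """Loop: look at the page, pick one action, apply it, repeat."""
    history: list[Action] = []
    for _ in range(max_steps):
        observation = describe(browser.screenshot())  # vision model stand-in
        action = decide(observation, history)         # LLM stand-in
        history.append(action)
        if action.kind == "done":
            break
        browser.apply(action)
    return history


# Stand-ins so the sketch runs without a real browser or model.
class FakeBrowser:
    def __init__(self) -> None:
        self.page = "signup form"

    def screenshot(self) -> bytes:
        return self.page.encode()

    def apply(self, action: Action) -> None:
        if action.kind == "click" and action.target == "submit":
            self.page = "welcome"


def fake_describe(image: bytes) -> str:
    return image.decode()


def fake_decide(observation: str, history: list[Action]) -> Action:
    if observation == "welcome":
        return Action("done")
    if not any(a.kind == "type" for a in history):
        return Action("type", target="username", text="example-user")
    return Action("click", target="submit")


steps = run_agent(FakeBrowser(), fake_describe, fake_decide)
print([s.kind for s in steps])
```

The point of the shape: nothing in run_agent knows the signup path in advance. Each step is chosen from what is on screen at that moment, which is the difference from a pre-scripted test.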

How it navigates signup flows

Take GitHub account creation. The agent has to navigate to the signup page, fill in username and email fields, solve a CAPTCHA, verify the email, and complete profile setup. Each step involves decision-making - which field comes next, what values are valid, when to wait for async validation.
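
Knowing "when to wait for async validation" usually comes down to polling with a timeout. A minimal sketch follows, assuming a hypothetical wait_until helper and a FakeField stand-in. A real agent built on Playwright would more likely use that framework's own waiting facilities; this only shows the idea.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout: float = 5.0,
               interval: float = 0.1) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()  # one last check at the deadline


# Stand-in for a field whose validation result arrives a little later.
class FakeField:
    def __init__(self, ready_after: float) -> None:
        self._ready_at = time.monotonic() + ready_after

    def validated(self) -> bool:
        return time.monotonic() >= self._ready_at


field = FakeField(ready_after=0.3)
settled = wait_until(field.validated, timeout=2.0, interval=0.05)
never = wait_until(lambda: False, timeout=0.2, interval=0.05)
print(settled, never)
```

The timeout matters as much as the wait. An agent that blocks forever on a validation message that never arrives has not handled the unexpected.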

The agent uses multi-tool orchestration. Vision models identify UI elements. Text models generate appropriate input. Navigation logic handles page transitions. Error handling recovers from validation failures.
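
One way to picture the orchestration and recovery is a list of steps run in order, each retried when the page rejects its input. The sketch below rests on assumptions: run_flow, ValidationFailed, and the two step functions are hypothetical, and the "taken username" check stands in for whatever validation the real site performs.

```python
from typing import Callable


class ValidationFailed(Exception):
    """Raised by a step when the page rejects the input."""


def run_flow(steps: list[tuple[str, Callable[[int], None]]],
             max_attempts: int = 3) -> list[str]:
    """Run steps in order, retrying any step that fails validation."""
    log: list[str] = []
    for name, step in steps:
        for attempt in range(1, max_attempts + 1):
            try:
                step(attempt)
                log.append(f"{name}: ok (attempt {attempt})")
                break
            except ValidationFailed as err:
                log.append(f"{name}: retry ({err})")
        else:
            # The inner loop never hit `break`: every attempt failed.
            raise RuntimeError(f"{name} failed after {max_attempts} attempts")
    return log


# Stand-in for server-side state: one username is already registered.
taken = {"example-user"}


def fill_username(attempt: int) -> None:
    # On a retry, generate a different candidate (a text model would do this).
    candidate = "example-user" if attempt == 1 else f"example-user-{attempt}"
    if candidate in taken:
        raise ValidationFailed(f"{candidate} is taken")


def fill_email(attempt: int) -> None:
    pass  # always accepted in this toy flow


flow_log = run_flow([("username", fill_username), ("email", fill_email)])
for line in flow_log:
    print(line)
```

Passing the attempt number into each step lets the step change its input on retry rather than resubmitting the same rejected value.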

CAPTCHAs are the obvious obstacle. Xiaona integrates CAPTCHA-solving services - commercial tools that use human labour or advanced ML to solve challenges. Not elegant, but pragmatic. The agent recognises when a CAPTCHA appears, routes the challenge to a solver, and applies the solution.
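
The recognise, route, apply sequence is a plain hand-off pattern. The sketch below shows only that control flow. Page, Solver, handle_challenge, and stub_solver are hypothetical names, and the stub returns a placeholder string rather than calling any real service.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Page:
    """Toy page model; `challenge` is set while a CAPTCHA is showing."""
    challenge: Optional[str] = None
    submitted: list[str] = field(default_factory=list)


# A solver takes a challenge payload and returns an answer token.
Solver = Callable[[str], str]


def handle_challenge(page: Page, solver: Solver) -> bool:
    """Return True if a challenge was present and an answer was applied."""
    if page.challenge is None:        # 1. recognise: nothing to do
        return False
    answer = solver(page.challenge)   # 2. route to the external solver
    page.submitted.append(answer)     # 3. apply the returned answer
    page.challenge = None
    return True


def stub_solver(challenge: str) -> str:
    # Placeholder only; a real solver is an external service.
    return f"answer-for-{challenge}"


page = Page(challenge="c1")
first = handle_challenge(page, stub_solver)
second = handle_challenge(page, stub_solver)
print(first, second, page.submitted)
```

Keeping the solver behind a single callable means the rest of the flow does not care whether the answer came from a person or a model.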

Browser fingerprinting and anti-bot evasion

Modern sites detect automation through browser fingerprinting. Canvas rendering, WebGL capabilities, font lists, timezone, screen resolution - dozens of signals that distinguish real browsers from headless automation.
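
To see why these signals are so telling, it helps to look from the detecting site's side. A rough sketch, assuming a hypothetical fingerprint_id function and made-up signal values: the signals are serialised in a fixed order and hashed, so any single difference yields a different identifier.

```python
import hashlib
import json


def fingerprint_id(signals: dict[str, object]) -> str:
    """Hash a dict of browser signals into a short, stable identifier."""
    # Sorting keys makes the result independent of insertion order.
    canonical = json.dumps(signals, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Illustrative values only; real fingerprints combine many more signals.
desktop = {
    "timezone": "Europe/London",
    "screen": [1920, 1080],
    "fonts": ["Arial", "Helvetica", "Times New Roman"],
    "webgl_renderer": "example-gpu",
}

# The same machine in a headless setup: no fonts, software rendering.
headless = dict(desktop, fonts=[], webgl_renderer="software-renderer")

print(fingerprint_id(desktop) == fingerprint_id(headless))
```

Real detection systems score signals individually rather than relying on one hash, but the principle holds: an empty font list or a software renderer stands out.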

Xiaona generates realistic browser fingerprints. It randomises headers, mimics human timing patterns, and uses residential proxies to avoid IP-based blocking. The goal isn't to deceive maliciously - it's to operate in environments designed to reject automation entirely.

What true autonomy looks like

The key insight: autonomy isn't about completing a single task perfectly. It's about handling the unexpected. A form field that wasn't there yesterday. A CAPTCHA that appears mid-flow. A validation error with unclear messaging.

The write-up is refreshingly honest about limitations. The agent doesn't succeed every time. Some sites are too aggressive in their bot detection. Some flows are too complex. But the success rate is high enough to be useful - and improving as the models improve.

Why this matters

Browser-operating agents unlock automation for the long tail of web services that don't offer APIs. Personal finance tools. E-commerce sites. Legacy enterprise systems. Anywhere people spend their time clicking through UIs is a candidate for this kind of automation.

The ethical questions are obvious. Automating account creation can enable spam, fraud, or abuse. The builder acknowledges this - and argues the technology itself is neutral. How it's deployed matters.

What stands out is the engineering honesty. This isn't a polished demo. It's a working system with real constraints, documented thoroughly enough that someone could reproduce it.

For builders watching this space: browser-operating agents are no longer research projects. They're buildable, deployable, and increasingly capable. The infrastructure exists. The models are good enough. What happens next depends on what people choose to build.

Today's Sources

DEV.to AI
How I Built an Autonomous AI Agent That Browses the Web
DEV.to AI
🚨 AI Will Not Replace Developers - Lazy Developers Will Be Replaced
DEV.to AI
Seedance 2.0 @Tags: How to Direct AI Videos with Multimodal References
Hacker News Best
How I use Claude Code: Separation of planning and execution
Hacker News Best
Show HN: Llama 3.1 70B on a single RTX 3090 via NVMe-to-GPU bypassing the CPU
Towards Data Science
Architecting GPUaaS for Enterprise AI On-Prem
DEV.to AI
Your First Week With an AI Coding Agent: A Realistic Timeline
DEV.to AI
The Real Cost of Running AI Coding Agents
DEV.to AI
Qwen 3.5 Review: 397B Open-Weight AI Model vs GPT-5.2, Claude, Gemini
DEV.to AI
The Hustle Loop
DEV.to AI
The Distribution Wall
DEV.to AI
Killing Your Darlings
ML Mastery
Agentify Your App with GitHub Copilot's Agentic Coding SDK
Replit Blog
Ship Enterprise Data Apps Faster with Replit and Databricks
DEV.to AI
LLMs Are Not Deterministic: Why Reliable AI Is Expensive
DEV.to AI
I Love Vibe Coding. I Don't Trust It.
DEV.to AI
I Built a Tiny MCP That Understands Your Code and Saves 70% Tokens
PyImageSearch
TF-IDF vs. Embeddings: From Keywords to Semantic Search
The Robot Report
Visual drone detection moves into critical infrastructure playbooks
The Robot Report
Amazon Robotics shuts down Blue Jay sortation project
Hackaday Robotics
Love Complex Automata? Don't Miss The Archer
Robohub
Robot Talk Episode 145 - Robotics and automation in manufacturing, with Agata Suwala
ROS Discourse
Agent ROS Bridge - Universal LLM-to-ROS bridge with auto-generated types
ROS Discourse
Canonical Observability Stack Tryout | Cloud Robotics WG Meeting 2026-02-25
The Robot Report
Toyota Motor Manufacturing Canada to deploy Agility Robotics' Digit humanoids
Robohub
Reversible, detachable robotic hand redefines dexterity
The Robot Report
Integrated motion control enables sophisticated robot motion
Robohub
Robot, make me a chair
Nvidia Robotics Blog
NVIDIA and Global Industrial Software Leaders Partner With India's Largest Manufacturers
Hackaday Robotics
Reverse Engineering a Dash Robot with Ghidra
Hackaday Robotics
R2D2 Gets New Brains
Hackaday Robotics
3D Printing Pneumatic Channels With Dual Materials for Soft Robots
Azeem Azhar
Exponential View #562: Agents & the tedium frontier; AI in the statistics; robot insurance; Claude at war, hacking pigeons, AI dignosis++
Azeem Azhar
Entering the trillion-agent economy
Benedict Evans
How will OpenAI compete?
Latent Space
Bitter Lessons in Venture vs Growth: a16z on AI Capital Flywheel
Latent Space
Anthropic's Agent Autonomy study
Ethan Mollick
A Guide to Which AI to Use in the Agentic Era
Gary Marcus
Rumors of AGI's arrival have been greatly exaggerated
Jack Clark Import AI
Import AI 445: Timing superintelligence; AIs solve frontier math proofs; a new ML research benchmark
Ben Thompson Stratechery
Shopify Earnings, Shopify's AI Advantages
Azeem Azhar
Data to start your week
Andrej Karpathy
Microgpt: Understanding LLMs in 200 Lines
Digital Native
5,127 Layers: The Gap Between Capability and Adoption
Lex Fridman Podcast
OpenClaw: The Viral AI Agent Framework That Broke the Internet
Addy Osmani
14 More Lessons from 14 Years at Google

About the Curator

Richard Bland
Founder, Marbl Codes

27+ years in software development, curating the tech news that matters.

© 2026 MEM Digital Ltd t/a Marbl Codes