Agent Development Kit (ADK)

A vision-first framework for building AI agents that automate real desktop work—reliable, modular, and ready for production.

We built an ADK so you can focus on perfecting the Agent

Our ADK is built on the following principles:

Atomicity

Atomicity is a key tenet of our ADK: any workflow can be decomposed into small, indivisible steps. A "step" in this paradigm is one of:

Action: Click, keystroke, scroll, drag, etc.
Decision: Processing the information on the current screen or state and picking an outcome
Human Review: Seeking approval or input from a human
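
The three step types above can be sketched as plain data. This is an illustration only, not the LaunchAI ADK API: the class names `Action`, `Decision`, and `HumanReview`, their fields, and the sample invoice workflow are all hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Action:
    """One UI interaction: click, keystroke, scroll, drag, etc."""
    kind: str        # e.g. "click" or "type"
    target: str      # plain-language description of the UI element
    value: str = ""  # text to type, if any


@dataclass(frozen=True)
class Decision:
    """Inspect the current screen/state and pick one outcome."""
    question: str
    outcomes: tuple[str, ...]


@dataclass(frozen=True)
class HumanReview:
    """Pause and ask a person for approval or input."""
    prompt: str


Step = Union[Action, Decision, HumanReview]

# Hypothetical invoice-entry workflow, decomposed into atomic steps.
workflow: list[Step] = [
    Action("click", "Invoices tab"),
    Action("type", "Amount field", "1250.00"),
    Decision("Is the total above the approval limit?", ("yes", "no")),
    HumanReview("Approve invoice over limit?"),
]


def count_by_type(steps: list[Step]) -> dict[str, int]:
    """Count how many steps of each type a workflow contains."""
    counts: dict[str, int] = {}
    for step in steps:
        name = type(step).__name__
        counts[name] = counts.get(name, 0) + 1
    return counts


print(count_by_type(workflow))
```

Because every step is a small, typed record, a workflow can be inspected, counted, or diffed before anything runs.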

LaunchAI ADK comes with an integrated workflow builder. The tool lets you convert any process currently done by a human into atomic steps, which an AI Agent can traverse, execute precisely, and repeat without ambiguity.

Atomic step graph showing action, decision and review nodes connected in a workflow

Universality

There are numerous agent-building tools, yet most repetitive, time-consuming, and frankly boring office work still remains on a human employee's plate.

ADK hub connecting to API, MCP, VLI and Human integration paths

To truly replace a workflow from start to finish (without human assistance), an AI Agent must have full human-like capabilities. The founding idea of LaunchAI ADK is to give AI Agents those capabilities, so they can operate on any system and mimic any human task. Since every use case is different, the ADK gives you the freedom to choose. The features are at your disposal; apply them wisely.

Use an API when the endpoint exists.
Use MCP or tool integrations when available.
Use VLI when the workflow only exists on a screen.
Use deterministic logic when correctness is non-negotiable.
Use human review when approval, validation, or exception resolution is required.
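
The guidance above can be read as a routing rule. The sketch below is one possible encoding, not ADK behavior: `StepContext`, its field names, the returned path labels, and especially the priority order among the conditions are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepContext:
    """Facts about one workflow step (hypothetical field names)."""
    has_api: bool = False
    has_mcp_tool: bool = False
    needs_exact_result: bool = False
    needs_approval: bool = False


def choose_path(ctx: StepContext) -> str:
    """Pick an execution path for a step.

    The ordering here is an assumption: approval and correctness
    requirements are checked first, then integrations, then VLI.
    """
    if ctx.needs_approval:
        return "human_review"
    if ctx.needs_exact_result:
        return "deterministic"
    if ctx.has_api:
        return "api"
    if ctx.has_mcp_tool:
        return "mcp"
    return "vli"  # the workflow only exists on a screen


print(choose_path(StepContext(has_api=True)))
print(choose_path(StepContext()))
```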

Vision-First, but Not Vision-Only

Much office work lives in software that exposes nothing but a screen. Most agent tools try to avoid that reality: they look for APIs, connectors, functions, browser DOMs, or predefined tools. Those approaches are useful, but they break down when the workflow depends on software that was never designed for agents.

Our ADK bets on the one universally available integration: the screen. Its Vision Layer Interface (VLI) gives agents the ability to observe the current screen, understand visible UI state, locate elements, resolve coordinates, execute mouse and keyboard actions, extract visible data, verify outcomes, and capture evidence. VLI is not the only option, but it is a desirable one. The result is an agent that can operate in any scenario without vendor-dependent integrations.
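
The observe, locate, act, and verify cycle described above can be sketched against a stand-in screen. Nothing here is the real VLI: `FakeScreen` is a toy in-memory model, and `click_and_verify` and the returned dictionary keys are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeScreen:
    """Toy stand-in for a display: element label -> (x, y) center."""
    elements: dict[str, tuple[int, int]]
    clicks: list[tuple[int, int]] = field(default_factory=list)

    def observe(self) -> set[str]:
        """Return the labels currently visible."""
        return set(self.elements)

    def locate(self, label: str) -> Optional[tuple[int, int]]:
        """Resolve a label to coordinates, or None if not visible."""
        return self.elements.get(label)

    def click(self, xy: tuple[int, int]) -> None:
        self.clicks.append(xy)
        # Simulate the UI reacting: clicking "Submit" shows "Saved".
        if xy == self.elements.get("Submit"):
            self.elements["Saved"] = (400, 50)


def click_and_verify(screen: FakeScreen, label: str, expect: str) -> dict:
    """Observe, locate, click, then check that `expect` became visible."""
    before = sorted(screen.observe())
    xy = screen.locate(label)
    if xy is None:
        return {"ok": False, "reason": f"{label!r} not found", "before": before}
    screen.click(xy)
    after = sorted(screen.observe())
    return {"ok": expect in after, "coordinates": xy,
            "before": before, "after": after}


screen = FakeScreen({"Submit": (200, 300)})
result = click_and_verify(screen, "Submit", "Saved")
print(result["ok"], result["coordinates"])
```

The before and after observations stand in for the evidence a real run would capture as screenshots.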

VLI agent interface showing a browser window with element targeting

Repeatability

One-off agents make for an easy demo. But building an agent by prompting an LLM is not a production-ready, battle-tested way to actually offload human work.

In real-world situations, AI Agents must repeat the same behavior without hallucinations, human prompting, or unexpected exceptions.

LaunchAI ADK makes repeatability a core design principle. Every workflow is decomposed into atomic steps. A step can be an action, a decision, or a human-review event. Each step has a narrow purpose, defined inputs, expected outputs, validation logic, and an execution record.
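
As a rough sketch of what "defined inputs, expected outputs, validation logic, and an execution record" could look like in code, consider the following. The `AtomicStep` class, the `execute` function, and the record fields are hypothetical and are not taken from the ADK.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class AtomicStep:
    name: str
    run: Callable[[dict[str, Any]], Any]  # does the work on defined inputs
    validate: Callable[[Any], bool]       # checks the output is acceptable


def execute(step: AtomicStep, inputs: dict[str, Any]) -> dict[str, Any]:
    """Run one step and always return an execution record."""
    started = time.time()
    try:
        output = step.run(inputs)
        status = "ok" if step.validate(output) else "validation_failed"
    except Exception as exc:  # record the failure instead of crashing the run
        output, status = repr(exc), "error"
    return {"step": step.name, "inputs": inputs, "output": output,
            "status": status, "started_at": started}


# Hypothetical step: parse an amount scraped from the screen.
parse_amount = AtomicStep(
    name="parse_amount",
    run=lambda inp: float(inp["raw"].replace(",", "")),
    validate=lambda out: out > 0,
)

record = execute(parse_amount, {"raw": "1,250.00"})
print(record["status"], record["output"])  # ok 1250.0
```

Running the same step on the same inputs yields the same status and output, and a failure is isolated to one named step rather than buried in a long prompt.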

Workflow steps list showing numbered atomic steps in sequence

This is fundamentally different from relying on one large prompt or an opaque agent loop.

Instead of saying, “Go complete this workflow,” LaunchAI ADK lets builders define the actual process explicitly, so that AI Agents keep working on future runs. The agent can still use model reasoning where reasoning helps. But the workflow itself is explicit, structured, testable, and repeatable.

Retraceability

The only way to gain the confidence and trust of a human is to give them unprecedented visibility. Teams using AI Agents often need to reconstruct what happened:

What did the agent see?
What did it click?
What did it type?
What data was extracted?
Which rule was fired?
Why did it branch?
Did the screen change?
What exception occurred?
Who approved the review step?

Step-by-step execution log showing Action, Human Review, Decision and Extract records with timestamps

LaunchAI ADK comes with pre-built logging, audit and retracing widgets to precisely answer these questions.

Every workflow run can produce a step-by-step execution record. Results and exceptions appear in easy-to-navigate tables, and visual actions include before-and-after screenshots, full video recording, coordinates, target regions, annotations, extracted values, timestamps, and verification results.

Decision steps can record inputs, rule outputs, model reasoning summaries, branch selections, and downstream actions. Human-review steps can record approval, rejection, correction, comments, timestamps, and retry outcomes.
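
To make the idea of a queryable execution record concrete, here is a small sketch. The record layout, the `make_record` and `find` helpers, and all sample values (file names, reviewer ID, amounts) are invented for illustration and do not reflect the ADK's actual log format.

```python
import json
from datetime import datetime, timezone
from typing import Any


def make_record(step_type: str, detail: str, **evidence: Any) -> dict[str, Any]:
    """Build one log entry with a UTC timestamp and free-form evidence."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "type": step_type,
        "detail": detail,
        "evidence": evidence,
    }


# Made-up sample run: one action, one decision, one human review.
run_log = [
    make_record("action", "click Submit", coordinates=[200, 300],
                before="shot_001.png", after="shot_002.png"),
    make_record("decision", "total > limit",
                inputs={"total": 1250.0, "limit": 1000.0},
                branch="needs_review"),
    make_record("human_review", "approve invoice",
                reviewer="j.doe", outcome="approved"),
]


def find(log: list[dict[str, Any]], step_type: str) -> list[dict[str, Any]]:
    """Return every record of the given step type, in run order."""
    return [r for r in log if r["type"] == step_type]


# "Who approved the review step?"
print(json.dumps(find(run_log, "human_review")[0]["evidence"]))
# "Why did it branch?"
print(find(run_log, "decision")[0]["evidence"]["branch"])
```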

Manage

Production agents must be deployed, monitored, reviewed, corrected, retried, updated, governed, and improved. Teams need to handle exceptions, support Agents that require human reviews, and preserve operational control. Most tools currently only help with one or a few pieces of this lifecycle.

LaunchAI ADK is one place to manage the entire lifecycle of the Agent. This gives teams full control after deployment.

Change one rule
Edit one branch
Update one screen action
Add one approval step
Modify a mapping table
Swap a model
Add an API call
Retrain app knowledge
Retry failed exceptions

All of this is cumbersome, if not impossible, without an ADK built with post-production needs in mind. One-off scripts, bots, and agents cannot be managed by a business without developers, and they certainly don't give executives the confidence to offload business-critical processes to AI Agents.

LaunchAI ADK hub showing deploy, monitor, exception, correct, retry and improve connections

Commonly Confused Terms

Term | Offering | LaunchAI ADK

Coding agents

Can help write scripts, tools, and one-off agents.

Provides the runtime, workflow builder, VLI, review, exception handling, and audit layer needed to operate agents.

Agent frameworks

Provide primitives for tools, memory, handoffs, graphs, and multi-agent logic.

Adds the full execution environment for real workflows across screens, apps, files, rules, and humans.

Workflow automation tools

Work well when APIs, connectors, triggers, and structured data exist.

Works even when no API exists, using VLI to operate through the screen like a human.

Browser agents

Automate websites and browser sessions.

Automates browser workflows plus desktop apps, local files, PDFs, spreadsheets, and OS-level workflows.

RPA platforms

Automate structured UI tasks, often with platform-specific bots and selectors.

Uses vision-first execution, app knowledge, deterministic rules, and inspectable atomic steps to handle more variable workflows.

Computer-use tools

Give models the ability to click, type, scroll, and inspect screenshots.

Turns computer use into managed production execution with workflow maps, verification, logs, exceptions, approvals, and retraceability.

MCP servers

Expose tools, data sources, and systems to agents through a standardized protocol. Useful for connecting agents to APIs, databases, files, services, and internal tools.

Can integrate with MCP-style tools where they exist, but does not depend on them. If no MCP server exists, VLI can still operate the software visually through the screen.

Tool/function calling layers

Let agents invoke predefined functions, APIs, scripts, and services. Strong when the action can be cleanly represented as a callable tool.

Supports tool calls as one execution path, but also supports visual actions, deterministic rules, human approvals, document workflows, and desktop-native execution in the same process.

Agent harnesses

Provide scaffolding for running agents: prompts, tool loops, retry logic, task execution, testing hooks, and runtime wrappers. Helpful for experimentation and agent prototyping.

Goes beyond a harness by providing workflow design, visual execution, orchestration, state verification, exception management, review portals, logs, and lifecycle controls.

Eval frameworks

Measure agent performance on tasks, benchmarks, regression tests, model outputs, or tool-use accuracy. Useful for testing quality before deployment.

Includes retraceable execution evidence from real workflow runs: screenshots, decisions, extracted values, verification results, exceptions, and human-review outcomes.

Observability platforms

Track prompts, traces, spans, tool calls, latency, costs, errors, and model behavior. Strong for debugging LLM applications.

Observes the full business workflow, not just the model layer: what the agent saw, clicked, typed, scraped, verified, escalated, and changed on screen.

Guardrail systems

Add policy enforcement, validation, safety checks, structured outputs, restricted actions, and compliance controls around model behavior.

Combines guardrails with deterministic rule execution, workflow branching, human approval gates, visual verification, and exception routing at the process level.

RAG / vector database stacks

Help agents retrieve knowledge from documents, embeddings, internal data, and semantic search indexes. Strong for knowledge access and context injection.

Can use retrieved knowledge as context, but also acts on that knowledge across real applications, screens, files, portals, and approval workflows.

Memory systems

Store user preferences, task history, prior outputs, entity data, or long-term agent context. Useful for continuity and personalization.

Treats memory as one part of the workflow, while also managing step execution, screen state, business rules, exceptions, review, and audit evidence.

Sandbox runtimes

Give agents isolated environments to run code, browse, test, or manipulate files safely. Useful for controlled execution and experimentation.

Runs agents in the actual desktop environment where work happens, inheriting the user's OS, browser, apps, files, sessions, permissions, and network context.

Prompt engineering tools

Help teams design, version, test, and optimize prompts. Useful for improving model instructions and structured outputs.

Reduces dependence on giant prompts by decomposing workflows into atomic steps with explicit actions, decisions, rules, verification, and review.

Multi-agent systems

Coordinate multiple specialized agents with different roles, tools, memory, or responsibilities. Useful for complex reasoning and task decomposition.

Can orchestrate model reasoning where useful, but anchors execution in real workflow steps that interact with software, files, humans, and business rules.

API integration platforms

Connect systems through endpoints, schemas, credentials, webhooks, and structured data exchanges. Strong when APIs are reliable and complete.

Uses APIs when available, but continues when APIs are missing, incomplete, changing, unavailable, or insufficient for the real workflow.

Document AI tools

Extract, classify, and process data from documents, PDFs, forms, invoices, and scanned files. Strong for document understanding.

Embeds document understanding inside a full workflow: open the file, extract data, compare against business rules, update systems, route exceptions, and log evidence.

Data pipeline tools

Move, transform, validate, and sync structured data across systems. Strong for backend data workflows.

Handles workflows where the "data pipeline" includes human interfaces, screens, portals, spreadsheets, scanned documents, and manual review steps.

Agent operating platforms

Attempt to provide broader infrastructure for deploying, monitoring, and coordinating agents. Capability varies widely and often centers on cloud tools, APIs, or chat-based execution.

Provides a complete ADK for real operational work: workflow builder, VLI, orchestration, API/MCP integration, deterministic rules, human review, exception management, logs, and retraceability.

Chatbot builders

Build conversational assistants for support, internal Q&A, lead capture, knowledge retrieval, or guided workflows. Strong when the only interface is a chat window.

Builds agents that do the work, not just talk: they navigate software, manipulate files, update records, extract data, verify outcomes, and escalate exceptions.

No-code agent builders

Let non-technical users configure agents through forms, prompts, templates, or simple app connections. Good for basic automations and fast setup.

Gives technical teams deeper control over real workflows with atomic steps, visual execution, deterministic logic, modular integrations, runtime evidence, and process governance.

Desktop scripting tools

Automate local actions through scripts, coordinates, hotkeys, macros, or OS automation libraries. Useful for narrow, stable workflows.

Adds model-level visual understanding, workflow orchestration, verification, screenshots, exception handling, human review, and lifecycle management on top of desktop execution.

Commonly Compared

Product | The good & bad | LaunchAI ADK differentiation

Claude Code

Excellent for writing on-demand code. It can understand codebases, edit files, run terminal commands, generate scripts, and help developers build one-off agents or automation logic. Its limitation is that it is fundamentally code-centric and produces one-time output. It is not an operating environment for agents that must run messy workflows across unpredictable inputs (without constant human prompting to improve). It also doesn't have a VLI.

LaunchAI ADK is a fully equipped toolbelt to build, deploy, and operate Agents. It gives teams powerful widgets: workflow builder, VLI, orchestration, deterministic and autonomous rule engine, pre-built integrations, human review, exception management, and visual audit trails. Claude Code actually works very well to help you write reliable and production-grade agents using LaunchAI ADK.

OpenAI Agents SDK

Light, yet helpful developer framework for building agents with tools, handoffs, guardrails, and agent loops. The limitation is that it is still a framework layer. Teams must build the surrounding workflow builder, individual integrations, visual execution layer (no pre-built VLI), exception portal, human-review system, desktop runtime, and operational controls themselves.

LaunchAI ADK provides the missing production system around the agent. It combines orchestration with workflow design, visual desktop execution, retraceability, human review, deterministic rules, and exception management. Instead of starting with primitives and building the operating layer from scratch, LaunchAI ADK gives engineers the complete agent-building stack.

OpenAI Computer Use / CUA

Powerful capability for allowing models to inspect screenshots and perform computer actions such as clicking, typing, and scrolling. The limitation is that computer use alone is only an action interface. It does not automatically provide workflow maps, business-rule control, step-level verification, exception queues, human approvals, visual audit trails, or lifecycle management. CUA is just one element of making a successful AI Agent.

LaunchAI ADK takes screen-based automation to production-level reliability. VLI is connected to atomic process maps, runtime verification, before-and-after screenshots, annotations, deterministic logic, exception handling, and human review. The ADK doesn't just process screenshots; it provides the entire tech stack that needs to go with them.

LangGraph

Orchestration framework for stateful, durable, long-running agent workflows. Its limitation is that it operates primarily at the agent orchestration layer. The real-world desktop execution layer, visual UI operation, workflow builder, exception management portal, and business-user review system still need to be built around it.

LaunchAI ADK includes orchestration, but extends beyond orchestration. It gives teams a way to map human workflows, execute them visually across software surfaces, integrate APIs when available, enforce deterministic rules, route exceptions, and retain evidence. LangGraph helps structure the agent brain; LaunchAI ADK gives the agent a body, workspace, controls, and operating system.

CrewAI

Useful for creating multi-agent systems with specialized roles, crews, tasks, flows, memory, and collaboration patterns. The limitation is that multi-agent collaboration does not solve the harder operational problem: integrating with legacy apps (or apps without APIs), executing messy real-world workflows, and interfacing with screens, documents, files, portals, approvals, and changing UI states.

LaunchAI ADK focuses on building full capabilities within a specialized Agent that does the task—just like a human. It combines model reasoning with VLI, workflow maps, deterministic rule execution, API/tool integrations, human review, and exception handling. Instead of only defining what agents should think or discuss, LaunchAI ADK defines how agents actually operate across the systems humans use every day.

Microsoft AutoGen

Used for event-driven, distributed, and multi-agent systems. It is valuable for research, conversational agent collaboration, dynamic workflows, and scalable agent architectures. Its limitation is similar to other agent frameworks: it provides powerful building blocks, but not a complete vision-first production environment for desktop-native human workflows.

LaunchAI ADK provides a more complete execution layer for business process automation. It connects agent orchestration to screen interaction, workflow building, local machine execution, deterministic rules, human intervention, exception review, and retraceable logs. AutoGen helps coordinate agents; LaunchAI ADK helps agents complete real-world work end to end.

n8n

Offers a visual automation platform for connecting APIs, apps, credentials, triggers, and structured workflows. It works well when systems expose reliable integrations. Its limitation is that real business workflows often require actions outside APIs: logging into portals, reading scanned PDFs, using desktop apps, interacting with spreadsheets, handling popups, manipulating files, or making decisions from visible screen state.

LaunchAI ADK does not stop where integrations stop. It can use APIs and tools when they exist, but it can also operate visually through VLI when the workflow is trapped behind a screen. This makes LaunchAI ADK better suited for processes that combine integrations, local files, browser portals, documents, desktop apps, and human approvals in one workflow.

Zapier

Strong for no-code automations in a well-connected app ecosystem. It is useful for simple, trigger-based tasks across supported apps and common cloud tools. The limitation is that app coverage is not the same as workflow universality. If the process depends on a legacy desktop app, scanned document, local file, browser-only portal, custom internal tool, or unsupported workflow state, connector-based automation breaks down.

LaunchAI ADK is built for workflows beyond SaaS connectors. It gives agents access to the same surfaces humans use: screens, files, folders, PDFs, spreadsheets, browsers, and desktop apps. LaunchAI ADK can handle workflows across systems that were never designed to be connected.

Browser Use

Converts natural-language prompts into AI browser automation. It can help agents navigate websites, but it can take a long time to run, often fails to repeat the same flow, and is generally limited in scope. Most office workflows are not browser-only. They involve desktop applications, file systems, OS dialogs, spreadsheets, PDFs, scanned documents, downloads, uploads, local folders, human approvals, and exceptions.

LaunchAI ADK is broader than browser automation. It supports browser workflows, but also extends to desktop apps, local machine context, documents, files, spreadsheets, and human review. By virtue of running in your own browser session and from a fixed IP, the ADK avoids bot detection entirely. Its powerful built-in Playwright MCP servers outperform Browser Use.

FAQ

Atomic steps make VLI workflows more reliable, inspectable, and production-ready.

A traditional agent often hides logic inside procedural code. When it fails, teams have to inspect logs, stack traces, selectors, or brittle assumptions about the application state. VLI takes a different approach. Every action is declared in plain language, executed against the live screen, and captured with before-and-after screenshots.

This gives builders three important advantages.

Dependable: Each step has a clear purpose and a narrow scope. If the agent fails to click a button, scrape a table, or locate a field, the issue can be isolated to that exact step.

Auditable: Because VLI captures screenshots around each action, teams can see what the agent saw before acting and what changed afterward. The post-action screenshot should include a red annotation box marking the relevant click target, typed region, scroll area, or extracted screen region.

Adaptable: Decision blocks can evaluate the screen after specific actions and branch the workflow depending on what is visible. This allows the agent to handle real software conditions: pre-selected checkboxes, modals, session warnings, missing rows, disabled buttons, loading states, or unexpected screens.
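
A decision block of the kind described under "Adaptable" can be pictured as a small function from visible screen state to the next branch. This is a hedged sketch: the `next_step` function, the screen labels, and the branch names are hypothetical, and a real decision block would work from the live screen rather than a set of strings.

```python
def next_step(visible: set[str]) -> str:
    """Pick the next branch from what is visible after an action."""
    if "Session expiring" in visible:
        return "extend_session"
    if "Loading" in visible:
        return "wait_and_recheck"
    if "Confirm dialog" in visible:
        return "dismiss_modal"
    if "Submit (disabled)" in visible:
        return "escalate_to_human"
    return "continue"


print(next_step({"Invoice table", "Loading"}))         # wait_and_recheck
print(next_step({"Invoice table", "Submit"}))          # continue
print(next_step({"Confirm dialog", "Invoice table"}))  # dismiss_modal
```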

Technically, no. But we would be remiss if we didn't mention that the ADK was designed for fairly technical builders. If you have an engineering background or have at least dabbled in programming, everything from our tooling to our documentation will make a lot of sense. So while users can accomplish a lot by simply using the UI and prompting in plain English, they will be unable to exploit the full functionality without a basic understanding of Python.

Please refer to the training guide linked here.

Yes. Vision will always be slower, since it requires more computation and waits on model responses in real time. When APIs are available and comprehensive, we advise using them.

Please bear in mind that APIs can change, or turn out to be incomplete when the scope of a process changes. To avoid rebuilds and delays, it's better to choose VLI over an API when you're unsure.