Show HN: Agenteract – Drive mobile apps with LLMs using UI trees (no vision)

Hi HN,

AI agents excel at web tasks because they can read the DOM. Point that same agent at a mobile app and it hits a wall: there is no DOM to read. Some have tried screenshot-based computer vision instead, but it is slow, expensive, and flaky.

I built Agenteract to give mobile apps a "DOM" that agents can actually read.

Instead of taking screenshots, Agenteract instruments your app to "self-report" its view hierarchy through a secure local WebSocket. It serializes the UI into a token-efficient JSON tree.

On top of this, Agenteract provides a clean CLI that enables agents to interact with the app with minimal context overhead.

A typical flow with the CLI starts by fetching the app's view hierarchy:

  npx @agenteract/agents hierarchy your-app

Simplified response example:

  {
    "status": "success",
    "hierarchy": {
      "name": "main(RootComponent)",
      "children": [
        {
          "name": "ViewStuff",
          "children": [
            { "name": "Text", "text": "Home" },
            {
              "name": "Pressable",
              "testID": "test-button",
              "children": [
                { "name": "Text", "text": "Simulate Target" }
              ]
            }
          ]
        }
      ]
    },
    "id": "296d657e-f5e8-49d1-a7ab-62815a"
  }
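
An agent (or a test script) can walk this response to list what is tappable. Here is a minimal Python sketch, assuming the response has exactly the shape shown above. `find_targets` and `node_text` are hypothetical helpers for illustration, not part of Agenteract:

```python
import json

# Sample copied from the simplified response above.
RESPONSE = """
{
  "status": "success",
  "hierarchy": {
    "name": "main(RootComponent)",
    "children": [
      {
        "name": "ViewStuff",
        "children": [
          { "name": "Text", "text": "Home" },
          {
            "name": "Pressable",
            "testID": "test-button",
            "children": [
              { "name": "Text", "text": "Simulate Target" }
            ]
          }
        ]
      }
    ]
  },
  "id": "296d657e-f5e8-49d1-a7ab-62815a"
}
"""


def node_text(node):
    """Join the text of a node and all of its descendants."""
    parts = []
    if "text" in node:
        parts.append(node["text"])
    for child in node.get("children", []):
        child_text = node_text(child)
        if child_text:
            parts.append(child_text)
    return " ".join(parts)


def find_targets(node):
    """Yield (testID, component name, visible text) for every node with a testID."""
    if "testID" in node:
        yield node["testID"], node["name"], node_text(node)
    for child in node.get("children", []):
        yield from find_targets(child)


targets = list(find_targets(json.loads(RESPONSE)["hierarchy"]))
print(targets)  # [('test-button', 'Pressable', 'Simulate Target')]
```

The testID is what gets passed to the tap command, so the agent never has to guess coordinates.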

Then tap the button by its testID:

  npx @agenteract/agents tap your-app test-button

If the interaction is successful, the agent receives:

  {"status":"success","message":"Tapped test-button"}

Alternatively, the agent could retrieve the view hierarchy again to see the updated state.
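
The tap-then-recheck loop can be sketched from a Python agent harness. This uses only the two commands shown above and assumes the CLI prints its JSON response to stdout, which this post does not state explicitly. `run_cli` and `tap_and_refresh` are hypothetical names:

```python
import json
import subprocess

CLI = ["npx", "@agenteract/agents"]


def run_cli(args):
    """Run the CLI and parse stdout as JSON.

    Assumption: the JSON response is written to stdout.
    """
    result = subprocess.run(CLI + args, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def tap_and_refresh(app, test_id, run=run_cli):
    """Tap a target, then re-read the hierarchy so the agent sees the new state."""
    tap = run(["tap", app, test_id])
    if tap.get("status") != "success":
        raise RuntimeError(f"tap failed: {tap}")
    return run(["hierarchy", app])


# Dry run with a fake runner, so the sketch works without a device or dev server.
calls = []


def fake_run(args):
    calls.append(args)
    if args[0] == "tap":
        return {"status": "success", "message": f"Tapped {args[2]}"}
    return {"status": "success", "hierarchy": {"name": "main(RootComponent)", "children": []}}


tree = tap_and_refresh("your-app", "test-button", run=fake_run)
print(calls)  # [['tap', 'your-app', 'test-button'], ['hierarchy', 'your-app']]
```

Passing the runner in as a parameter keeps the agent logic testable offline; swapping `fake_run` for the default `run_cli` would shell out to the real CLI.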

We support the following actions:

- tap
- input
- scroll
- longPress
- swipe
- wait
- log
- cmd (send keystrokes to dev servers, e.g. reload)

Why not just use screenshots?

* Speed: JSON is instant; processing screenshots takes seconds.
* Cost: Text tokens are orders of magnitude cheaper than image tokens.
* Reliability: Deterministic targeting means no hallucinated coordinates.

Supported Platforms:

* React Native / Expo (via Fiber tree introspection)
* Flutter (via Widget tree traversal)
* Native iOS (Swift/UIKit/SwiftUI)
* Native Android (Kotlin)

Repo: https://github.com/agenteract/agenteract

Docs: https://agenteract.io/docs
