Product Audit — Simular SAI

Audit Product

Simular SAI, an Autonomous Agent

Problem Statement

The black-box nature of AI workflows leads to abrupt interruptions and dead-end failures. With no transparency and no recovery paths, users lose control, and the agent comes across as an unreliable executor.

Before execution

● No plan preview before the agent starts
● Homepage shows a VM panel before the user has typed anything, adding cognitive overhead before the task has started

During execution

● Running, stuck, and failed all look identical
● Clicking into the VM transfers control by accident
● Google auth popup appears on the local desktop outside the VM. Denying it triggers a silent retry loop with no recovery path
● Permission UI shows raw JavaScript written for engineers, not the person approving it

After completion

● Step-limit stop leaves no partial output, no progress summary, no path forward
● Final output (Google Sheet link) buried in a wall of text with no structured handoff

Who I Tested For

I chose a non-technical SMB founder because this is the user who can't recover when the agent fails. Technical users have workarounds. She doesn't.

Julia Marsh

Proto-Persona · SMB Founder · Singapore

Background

Non-technical. First-time agent user. Cautious with permissions.

Tools she knows

Google Workspace, Canva, ChatGPT (casual use)

What she needs from SAI

Delegate a complex task and get a usable result. No babysitting, no technical overhead.

The Scenario

Julia has a product curation call on Thursday. She opens SAI and types:

"I am launching a home decor curation brand in Singapore, focusing on Nordic and Japanese home accessories (e.g., vases, small lighting, stools, stationery — no large furniture). Please research brands popular in Singapore and trending in Europe/Japan, curate 20 items with market analysis and pricing, then compile everything and images into a Google Sheet report I can use for import decisions."

Why I Chose It

This task requires the agent to cross multiple capability boundaries in a single run. That's exactly where human-in-the-loop (HITL) design breaks down if it isn't built for it.

Research → Judgement → Auth handoff → Delivery

How the Task Unfolded

The task ran across multiple sessions. These are the key moments — progress, friction, and failure.

Initial State

High learning load before the first task

The homepage tries to guide users with example tasks — the right instinct. But presenting too many at once creates cognitive overload rather than clarity. On top of that, the layout immediately exposes a virtual desktop panel on the right. For a first-time user, there's no explanation of what the VM is, why it's there, or what they're supposed to do with it. Two unfamiliar concepts compete for attention before anything has even started.

SAI home screen with suggested tasks and VM panel

Start

Task submitted — execution starts immediately

Accidental takeover

User clicks into VM to check progress — and interrupts the agent by accident

Clicking into the VM to see what the agent is doing transfers control to the user. The agent pauses. There's no observer mode — any attempt to watch could accidentally take over.

Loss of visibility

"Are you still working?" The user can no longer tell whether the task is progressing

The execution stream updates constantly, but its contents have no hierarchy. Process logs, agent reasoning, and required actions all look identical — a single animated line of text with no way to distinguish what the agent is doing from what it needs. There's no high-level progress summary, no stage indicator, no sense of how far along the task is. The user is left staring at a stream they can't read, repeatedly wondering whether anything is actually happening.

Execution stream with no visual hierarchy
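One way to give the stream a hierarchy is to tag each event by kind and collapse the tagged events into a single user-facing status. The sketch below is illustrative only: the event kinds, the `StreamEvent` type, and the 60-second stall threshold are assumptions made for this example, not SAI's actual data model.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    LOG = "log"                  # low-level process output
    REASONING = "reasoning"      # the agent explaining what it is doing
    ACTION_REQUIRED = "action"   # the agent needs something from the user
    ERROR = "error"              # the agent cannot continue


class Status(Enum):
    RUNNING = "running"
    WAITING_ON_USER = "waiting on you"
    STUCK = "stuck"
    FAILED = "failed"


@dataclass
class StreamEvent:
    kind: Kind
    text: str
    timestamp: float  # seconds since the task started


STALL_SECONDS = 60.0  # assumed threshold; would need tuning against real runs


def derive_status(events: list[StreamEvent], now: float) -> Status:
    """Collapse a raw event stream into one status the user can read."""
    if not events:
        return Status.RUNNING
    last = events[-1]
    if last.kind is Kind.ERROR:
        return Status.FAILED
    if last.kind is Kind.ACTION_REQUIRED:
        return Status.WAITING_ON_USER
    if now - last.timestamp > STALL_SECONDS:
        return Status.STUCK
    return Status.RUNNING
```

With a status like this, running, stuck, and failed no longer have to look identical, and a required action can be surfaced separately from the log lines around it.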

Unexpected auth environment

Google auth appears on the desktop — the user expected it to stay in the VM

The task was running entirely inside the agent's virtual machine. When Google Sheets access was needed, the auth prompt appeared as a native dialog on the user's local desktop — completely outside the VM. The user assumed the agent would handle everything remotely, so this was unexpected. They denied it.

Google account picker appeared on local desktop

No graceful fallback

User denies. The same prompt keeps coming back.

After the denial, the agent didn't pause or ask how to proceed. The auth prompt kept reappearing. The user had to explicitly type into the chat — at least twice — asking to sign in through the VM instead. Only then did the agent accept the instruction and open the auth flow inside the VM.

Sheets access dialog repeating after denial
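A graceful fallback could treat a denial as a decision point rather than a reason to retry. The following is a minimal sketch under that assumption: the `AuthFlow` class and its option labels are hypothetical. Only the "sign in inside the VM" route comes from what was observed in the test; the other two options are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class AuthFlow:
    prompts_shown: int = 0
    paused: bool = False
    options: list[str] = field(default_factory=list)

    def request_access(self, user_allows: bool) -> str:
        """Show the prompt once; after a denial, pause and offer choices."""
        if self.paused:
            # Never re-show a prompt the user has already denied.
            return "paused"
        self.prompts_shown += 1
        if user_allows:
            return "granted"
        self.paused = True
        self.options = [
            "Sign in inside the VM instead",           # observed workaround
            "Continue without Google Sheets access",   # hypothetical option
            "Cancel the task",                         # hypothetical option
        ]
        return "paused"
```

The design choice here is that the agent stops prompting after the first denial and waits for the user to choose, instead of requiring her to type the workaround into the chat.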

Opaque permission request

The agent asks to run code — but the user can't tell if it's safe

A "Browser JS Execution — Safety Check" dialog appears, showing raw JavaScript code. For a non-technical user, there's no way to evaluate what this code does, why the agent needs it, or whether allowing it is safe. The permission UI is written for developers, not for the person being asked to approve it.

Browser JS execution safety check with raw code

Hard stop

"I've reached the maximum number of steps"

The agent stops mid-task with a single line: "Please let me know if you'd like me to continue or try a different approach." No summary of what was completed, no partial output handed over, no indication of how close it was to finishing. The user is left with nothing actionable — confused about what just happened and with no clear path forward except starting over from scratch.
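A step-limit stop could carry a short progress summary instead of a single line. The sketch below assumes the agent tracks named stages with a done flag and an optional note. The `Stage` type, the stage names (borrowed from the Research, Judgement, Auth handoff, Delivery flow of this test task), and the sample notes are all illustrative, not output from the actual run.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    done: bool
    output: str = ""  # short note or link for anything already produced


def step_limit_summary(stages: list[Stage]) -> str:
    """Build the message shown when the agent hits its step limit."""
    done = [s for s in stages if s.done]
    remaining = [s for s in stages if not s.done]
    lines = [
        f"Stopped at the step limit: {len(done)} of {len(stages)} stages complete."
    ]
    for s in done:
        suffix = f" ({s.output})" if s.output else ""
        lines.append(f"  Done: {s.name}{suffix}")
    for s in remaining:
        lines.append(f"  Not done: {s.name}")
    if remaining:
        lines.append(f"Next: continue from '{remaining[0].name}' or change approach.")
    return "\n".join(lines)


# Illustrative values only.
stages = [
    Stage("Research", True, "brand shortlist saved"),
    Stage("Judgement", True, "items selected"),
    Stage("Auth handoff", False),
    Stage("Delivery", False),
]
print(step_limit_summary(stages))
```

Even this much tells the user what was completed, what partial output exists, and where a continuation would pick up, so starting over is no longer the only option.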

Incomplete output

The sheet is delivered — but without the images the task required

The original prompt explicitly asked for images of the selected items. The agent produced the spreadsheet without them and gave no explanation for the omission. The user had to ask again — restating a requirement that was already there from the start.

Delivery

The Google Sheet link is buried in a wall of text

The agent's final message is a long block of prose. The Google Sheet URL is somewhere inside it — no card, no clear action, no structured handoff. The user has to scan through the text to find it, then figure out what to do next on their own.

Final output

The agent pulled off something genuinely hard. 🎉

A multi-step research, curation, and remote file creation task — completed. The output is passable: items are listed, pricing is included. The product selection feels a little generic and the analysis isn't quite at the depth you'd act on without review. The capability is real, but it's not yet at the level where you'd trust it to run unsupervised. Which is exactly why the interface needs to be designed for a human who's still very much in the loop.