Toolhouse.ai Human-Dependent

Name: Agent Native Offer Audit — Toolhouse.ai
Item: Toolhouse.ai
Rating: 44
Author: Agent Native Offers

AUDIE Score: 44/100 · Audited 2026-04-08 · Website: https://toolhouse.ai · Machine-readable: JSON

Pillar Scores

P1 Signal Architecture — 10/25

P2 Clarity Stack — 12/25

P3 Trust Envelope — 8/20

P4 Velocity Triggers — 7/10

P5 Gravity Design — 7/20

Audit History

Scores are never overwritten — every re-audit is appended as a new dated version. Machine-readable: history JSON.

Version	Date	Score	Tier	Rubric
v1	2026-04-08	44/100	Human-Dependent	v1-2026-04

Executive Summary

Toolhouse.ai scores 44/100, placing it in the lower half of the Human-Dependent tier. Its strongest asset is a genuinely frictionless activation path — agents and developers can go from zero to deployed in minutes, with no code and no sales call. However, the platform suffers from a trust infrastructure deficit: no public status page, no schema.org structured data, inconsistent pricing units, and no published OpenAPI spec. For a platform with NVIDIA, Cloudflare, and Groq as customers, the absence of a public status page is particularly damaging — it signals that the platform hasn't yet built the transparency layer that enterprise and agent buyers require. The highest-ROI immediate action is deploying a public status page (one day of work) followed by adding JSON-LD structured data (one hour of work) — together these would add approximately 8 points and push Toolhouse into Emerging tier.

Strongest Signals

Friction-free activation (P4-A: 4/5): Toolhouse has the lowest barrier to first-agent deployment in its class — start from a prompt, no code, free tier, no sales call. This is the platform's strongest agent-native trait: an AI agent or agent-building team can be operational in minutes.
Memory capability mentioned in core platform (P5-B: 2/5): The built-in MCP server includes memory as a first-class capability alongside RAG, code execution, and browser use. The architecture exists; it just needs to be formally documented and exposed as a structured API for full agent-native credit.
Notable enterprise customers (P3-A: 1/5): Cloudflare, NVIDIA, Groq, and Snowflake as customers signals real production credibility. $1M+ in TSV flowed through the platform. These are strong trust anchors that should be formalized into verifiable, machine-readable signals.
LLM-full documentation export (P1-C: 2/5): Publishing `llms-full.txt` via the docs platform shows awareness of agent-native readability needs. This just needs to be promoted to `toolhouse.ai/llms.txt` to be discoverable at the standard entry point.

Critical Gaps

No status page or verifiable uptime data (P3-A: 1/5): For a platform claiming enterprise customers like NVIDIA and Cloudflare, the absence of a public status page is the single most trust-eroding gap. An AI agent evaluating infrastructure providers will look for uptime data as a primary selection criterion.
No schema.org structured data (P1-A: 1/5): The website has no machine-readable identity layer. An LLM crawling for "no-code agent builders" or "MCP-connected AI workers" cannot parse Toolhouse's pricing, identity, or capabilities in structured form.
Pricing unit inconsistency (P2-A: 3/5 limited): "Credits" and "runs" are used inconsistently across the pricing page and docs, making programmatic interpretation unreliable. An AI agent comparing Toolhouse's pricing to competitors cannot resolve whether 15 credits = 15 runs or something else.
No OpenAPI spec published (P1-D: 3/5): The agents themselves deploy as APIs, but Toolhouse's own platform API (for creating, managing, and calling agents programmatically) lacks a published OpenAPI specification. This makes it harder for agent orchestration systems to autodiscover and use Toolhouse's management capabilities.

Priority Actions

Launch a public status page — +4 pts · P3 · Effort: Low
Add schema.org JSON-LD structured data to homepage and pricing page — +4 pts · P1 · Effort: Low
Publish OpenAPI spec for the Toolhouse platform management API — +3 pts · P1 · Effort: Medium
Document memory API and persistence model formally — +3 pts · P5 · Effort: Medium

All 20 Criteria

P1-A Structured Data — 1/5
No schema.org/JSON-LD markup discoverable on the homepage or pricing pages. The site appears to be a standard React app with no structured data layer. robots.txt is permissive (`Allow: /`) with sitemap linked, but no Product, Offer, or Organization schema published. An AI retrieval system would have to parse prose to understand what Toolhouse sells.

P1-B Machine-Readable Pricing — 2/5
Pricing is displayed in clean HTML tiers (Basic $0, Pro $10/mo, Business $500/mo) with credit counts (15, 100, 15,000). Readable in HTML tables but not structured as schema.org/Offer. "No hidden fees, no surprises" is noted in prose. Missing: machine-tagged pricing with explicit per-unit rates, overage costs, or API-accessible pricing data.

P1-C llms.txt / Agent Identity Layer — 2/5
No `/llms.txt` found (404). However, Toolhouse publishes a `llms-full.txt` via its docs platform at `docs.toolhouse.ai/toolhouse/llms-full.txt` — a full-text export of all documentation intended for LLM consumption. This is partial credit: it exists for developers using the docs, but not at the root domain where an agent would check first.

P1-D API / MCP Availability — 3/5
Agents deploy as streaming REST APIs at `https://agents.toolhouse.ai/$AGENT_ID`. Agents connect to Toolhouse's built-in MCP server. Remote MCP server connections via URL are supported (Streamable HTTP/SSE). No OpenAPI spec URL found. No published agent card or well-known endpoint. Functional MCP support exists but lacks the formal API specification that would make it fully agent-native.

P1-E Discoverability (GEO) — 2/5
Product Hunt listings exist (multiple launches). Active blog covering MCP Discovery and agent infrastructure. However, no AI-retrieval optimization layer, no semantic FAQ markup, and no evidence of AI-specific content strategy. Minimal social proof signals available for LLM training retrieval.

P2-A Offer Completeness — 3/5
Pricing page states tier names, monthly costs, and credit counts in one place. Homepage explains the platform clearly ("Build AI workers from a simple prompt"). What is unclear: "credits" vs "runs" used interchangeably; Business tier described as "15,000 credits/month" but Pro is "100 runs/month" — the unit inconsistency requires interpretation. Not fully machine-parseable from a single source.

P2-B Scope & Limits Definition — 2/5
Basic limits are stated (15 credits free, 100 runs/mo Pro, 15,000 credits/mo Business). No API rate limits documented. No per-tool execution limits stated. No timeout or concurrency limits found. The documentation mentions rate limiting exists but does not specify values in a structured, machine-readable format.

P2-C Substitution & Fallback Rules — 1/5
No documentation found for fallback behavior when MCP tools fail, when API limits are exceeded, or when third-party integrations (Zapier, n8n, scrapers) are unavailable. No substitution rules stated.

P2-D Conditional Logic Transparency — 3/5
Pricing conditions are mostly disclosed on the pricing page: private agents require Pro plan, Agent Studio requires Business, human-in-the-loop requires Business. Enterprise has custom pricing. Some conditions discoverable without sales call. However, conditions for API key scoping behavior, bundle restrictions, and model selection are documented only in technical docs, not the pricing page.

P2-E Semantic Precision — 3/5
The platform uses clear technical language in docs ("streaming API," "cron schedules," "MCP server," "Bearer token auth") but the marketing site leans on consumer-friendly but vague language: "Make work disappear," "AI workers," "no-code." This split between marketing-speak and technical precision creates ambiguity for an agent evaluating the offer.

P3-A Verifiable Performance Data — 1/5
No public status page found. No uptime data, SLA commitments, or third-party reliability verification (G2, Trustpilot) found for Toolhouse.ai. Product Hunt reviews exist but are qualitative. The platform is trusted by Cloudflare, NVIDIA, Groq, and Snowflake (mentioned on site) — strong customer signals — but no quantitative performance data is published.

P3-B Scoped Permission Model — 3/5
Toolhouse implements permission scoping: public vs. private agent visibility, API key scoping (same user_id behaves differently across API keys for team separation), and bundle restrictions controlling MCP server access. Adequate for human workflows; missing explicit time-bounded or action-bounded agent permissions typical of agent-native design.

P3-C Audit Trail / Transaction Log — 2/5
Docs mention "Execution and observability logging" in the documentation structure, but no machine-accessible audit log API or agent-readable transaction log endpoint was found. Logs appear to be human-dashboard-only. No audit log tier differentiation noted on pricing page.

P3-D Behavioral Consistency Signals — 2/5
Active blog and changelog activity implied. No formal versioning scheme, deprecation policy, or backward-compatibility commitment found in ToS or docs. The platform is actively developed (MCP Discovery launch noted in 2025) but no formal behavioral stability guarantees exist.

P4-A Friction-Free Activation — 4/5
"Start Building" free sign-up; API key accessible from dashboard after signup; no sales call required for Basic or Pro tiers. Business tier likely requires human contact. First agent deployable from a prompt with no code. Strong no-friction activation story. Deducting 1 for the absence of instant programmatic API key issuance documented in a self-serve API.

P4-B Agent Decision Signals — 3/5
Free tier exists with explicit limits (15 credits) providing a trial signal. Pro tier is explicitly priced ($10/mo, 100 runs). Pricing page says "Start free, scale as you grow." However, no programmatic usage-check API found that an agent could query to determine when to upgrade. No machine-readable signal for "you've used X of Y credits this period."

P5-A Integration Depth / Switching Cost — 3/5
Toolhouse agents deploy as APIs at unique endpoints, creating some switching cost (endpoint changes would break callers). MCP server integration, bundle configurations, and Zapier/n8n connections create workflow dependencies. However, since Toolhouse is a no-code builder on top of open MCP standards, a sufficiently motivated user could replicate agents elsewhere. Moderate switching cost.

P5-B Agent Memory / Personalization Layer — 2/5
Toolhouse's built-in MCP server provides "RAG, memory, code execution, browser use" — memory is mentioned as a capability! However, no agent-accessible memory API is documented with structured read/write endpoints. Memory appears to be accessible via MCP tool calls within agents, but the exact mechanism, persistence model, and inter-session behavior are not publicly documented in a structured way.

P5-C Programmatic Renewal Signals — 1/5
No evidence of auto-renewal API, machine-readable billing status, or programmatic subscription management. Standard billing UI is implied but not documented as agent-accessible.

P5-D Compounding Value Signal — 1/5
No agent-readable signal that communicates growing platform value (new tools added, improved models, new MCP servers available). MCP Discovery (auto-wiring of new MCP servers) is described in a blog post but not as a structured, agent-consumable feature announcement feed.

Rubric: v1-2026-04. Scores reflect the company's state on the audit date and may have improved since.