We started building Arti like many teams do — with the idea that we needed an orchestration layer for intelligent software.
But our background isn’t in AI demos or LLM wrappers. It’s in industrial software automation — manufacturing, automotive, and systems where software controls real-world processes, robots, and operations. In those environments, failure is not a UX issue. It’s a breakdown with real consequences. And in that world, orchestrating behavior isn’t enough. You need to manage state. Surface decisions. Control complexity. You need production-grade agent systems.
So we approached the agentic problem from first principles. At first, orchestration seemed like the answer. But it quickly became clear:
Orchestration alone doesn’t scale. It creates a bottleneck at the control layer.
What we needed wasn’t another framework. We needed a system architecture: one that could support a Mixture of Experts approach, not in the model sense, but in the system-design sense:
- Modular agents with clear responsibilities
- Context-aware routing
- Typed, testable state
- Prompt behaviors that could be versioned, composed, and evolved
- Collaboration between humans and AI, not top-down command loops
We tested the available frameworks:
- LangGraph: declarative graphs with brittle state passing and painful rigidity.
- AutoGen, CrewAI, Agents SDK: abstractions over abstractions. Easy to start, impossible to trust at scale.
These weren’t frameworks. They were demo wrappers. None of them survived contact with real-world complexity.
And then we found two teams thinking like we were:
- Anthropic, emphasizing pattern clarity and composable simplicity
- PydanticAI, advocating minimal, type-safe, system-aware agent design
Neither of them builds “frameworks” either.
They’re building philosophies that map to real-world complexity without abstraction debt.
In this post, we’ll walk through what we respect in both — and how Arti expands those principles into something deployable, introspectable, and durable.
Because agentic systems aren’t just workflows.
They’re software.
And we build software like it matters.
This post is about what Anthropic and PydanticAI get right, and what’s needed to go further.
Anthropic: Patterns Over Frameworks
Anthropic’s guide to building effective agents is one of the most practical and insightful resources in the agent framework space.
Their core advice?
- Start simple.
- Don’t build agents unless you need them.
- Compose small, testable workflows using prompt chaining, routing, and tool calls.
- Use the LLM as a decision engine, not a black box.
And they’re right. For many use cases, that’s enough.
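To make that concrete, here’s a minimal sketch of routing plus prompt chaining in plain Python. `call_llm` is our stand-in for any model client; the pattern is the point, not the plumbing.

```python
# Minimal sketch of Anthropic-style routing plus prompt chaining.
# call_llm is a placeholder for whatever model client you use.

def call_llm(prompt: str) -> str:
    """Stand-in for a real model call (any provider's client works here)."""
    raise NotImplementedError

def route(user_request: str) -> str:
    """Use the LLM as a decision engine: it picks a handler, nothing more."""
    decision = call_llm(
        "Classify this request as 'billing', 'technical', or 'other'. "
        f"Reply with one word.\n\nRequest: {user_request}"
    )
    return decision.strip().lower()

def handle_technical(user_request: str) -> str:
    """Prompt chaining: each step is small, testable, and inspectable."""
    diagnosis = call_llm(f"Diagnose the likely cause of: {user_request}")
    return call_llm(f"Suggest a fix for this diagnosis: {diagnosis}")

def handle(user_request: str) -> str:
    handlers = {"technical": handle_technical}
    handler = handlers.get(route(user_request))
    return handler(user_request) if handler else call_llm(user_request)
```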
But here’s where things break down:
⚠️ The Limits of Pattern-Only Thinking:
- There’s no structured approach to long-term state.
- Tool calls are prompt-engineered, not typed or contracted.
- There’s no model for semantic memory or reusable prompt modules.
- The execution environment lacks runtime introspection, rollback, or monitoring.
- Their approach assumes the LLM is the agent, when in reality it should be just one part of a modular system.
In short: great software starts with patterns, but it scales with architecture.
PydanticAI: Typed, Explicit, Composable Agents
PydanticAI takes a unique approach that we deeply respect.
Where Anthropic is pattern-first, PydanticAI is software-first:
- Typed tools and agent inputs
- Explicit delegation and agent handoffs
- Avoidance of DAG fetishism
- Code that’s readable, testable, inspectable
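Here’s what that looks like in practice: a short sketch in PydanticAI’s style. Exact names like `output_type` and `tool_plain` vary across versions, so treat this as illustrative rather than canonical.

```python
# Sketch in the PydanticAI style: typed inputs, typed tools, typed output.
# Exact API names (e.g. output_type vs result_type) vary by version;
# treat this as illustrative rather than copy-paste ready.
from pydantic import BaseModel
from pydantic_ai import Agent

class OrderStatus(BaseModel):
    order_id: str
    status: str      # e.g. "shipped", "delayed"
    eta_days: int

agent = Agent(
    "openai:gpt-4o",
    output_type=OrderStatus,  # the result is validated, not free text
    system_prompt="Answer order questions using the lookup tool.",
)

@agent.tool_plain
def lookup_order(order_id: str) -> str:
    """A typed tool: the schema comes from the signature, not the prompt."""
    return f"Order {order_id}: shipped, arriving in 2 days"  # stub data

result = agent.run_sync("Where is order 1138?")
print(result.output)  # an OrderStatus instance, or a validation error
```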
Their best metaphor?
“Don’t use a nail gun unless you need one.”
That could be Arti’s motto.
But there are gaps here too:
❗ The Missing Layers:
- There’s no concept of semantic context or ontologies: state is structured, but not meaningfully enriched.
- There’s no built-in notion of collaborative interaction; everything is still “agent does X.”
- There’s minimal treatment of execution observability or runtime introspection.
- Prompt handling is typed, but not versioned, overloaded, or runtime-dispatched.
PydanticAI is the best foundation we’ve seen for agent logic as code — but not yet for agentic systems at scale.
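To make that last gap concrete, here’s a hedged sketch of what versioned, runtime-dispatched prompts could look like. The `PromptRegistry` name and dispatch rule are our own illustration, not Arti’s actual API.

```python
# Hypothetical sketch of versioned, runtime-dispatched prompts.
# Not Arti's actual API; names and dispatch policy are illustrative.
from dataclasses import dataclass, field

@dataclass
class PromptRegistry:
    """Stores prompt templates by (name, version), dispatched at runtime."""
    _prompts: dict[tuple[str, int], str] = field(default_factory=dict)

    def register(self, name: str, version: int, template: str) -> None:
        self._prompts[(name, version)] = template

    def get(self, name: str, version: int | None = None) -> str:
        if version is None:  # default to the latest registered version
            version = max(v for (n, v) in self._prompts if n == name)
        return self._prompts[(name, version)]

registry = PromptRegistry()
registry.register("diagnose", 1, "Diagnose the fault in: {input}")
registry.register("diagnose", 2, "List likely root causes, ranked, for: {input}")

# Callers pin a version for reproducibility, or take the latest by default.
prompt = registry.get("diagnose", version=2).format(input="conveyor stall")
```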
Where We Come From — and Why This Matters
We didn’t start in chatbots or demos. We built automation software for manufacturing, automotive, and industrial systems.
In our world:
- State isn’t ephemeral; it drives machines.
- Errors aren’t recoverable with retries; they cost real money.
- You don’t ship a product that sometimes works.
So when we look at agents, we don’t see magic. We see a control system that needs:
- Versioned, modular behavior
- Semantic context and state inspection
- Real-time observability and rollback
- Collaborative control models between human and AI
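Here’s a minimal sketch of what typed state with rollback means in practice. The names (`LineState`, `apply`) are hypothetical; this illustrates the requirement, not Arti’s implementation.

```python
# Minimal sketch of typed, inspectable state with snapshot/rollback.
# Hypothetical names; illustrates the requirement, not Arti's internals.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class LineState:
    """Typed execution state: every field is inspectable and testable."""
    station: str
    parts_done: int
    fault: str | None = None

history: list[LineState] = []

def apply(state: LineState, **changes) -> LineState:
    """Every transition is recorded, so any step can be replayed or undone."""
    history.append(state)
    return replace(state, **changes)

state = LineState(station="weld-3", parts_done=0)
state = apply(state, parts_done=1)
state = apply(state, fault="torch misalignment")

# Rollback ("time travel"): restore the last known-good snapshot.
state = history[-1]
assert state.fault is None  # back to the pre-fault snapshot
```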
We respect Anthropic’s clarity. We align with PydanticAI’s posture. But we build like systems engineers, not AI whisperers.
What Arti Adds — Without Breaking the Philosophy
Arti builds on the strengths of Anthropic and PydanticAI, extending their principles into a production-grade solution that meets the demands of real-world applications.
| Principle | Anthropic | PydanticAI | Arti |
|---|---|---|---|
| Simplicity first | ✅ | ✅ | ✅ but modularized |
| Typed tools | ❌ | ✅ | ✅ (enforced + testable) |
| Prompt modularity | ⚠️ (manual) | ✅ | ✅ (versioned + runtime dispatched) |
| Execution state | ⚠️ (LLM memory) | ✅ | ✅ (typed + semantic) |
| Human-AI loops | ⚠️ | ❌ | ✅ (interrupts, approvals, collaboration) |
| Observability | ⚠️ | ⚠️ | ✅ (first-class, not afterthought) |
| Rollback / Eval | ❌ | ❌ | ✅ (built-in evaluation + time travel) |
We’re not building a framework. We’re building an execution layer for intelligent, stateful, observable, modular software — where LLMs are a component, not a controller.
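As one illustration of the human-AI loop row above, here’s a hedged sketch of an approval gate. `require_approval` is our own hypothetical name, not a real framework call.

```python
# Hypothetical approval gate: the agent proposes, a human disposes.
# require_approval is an illustrative name, not a real framework API.

def require_approval(action: str, payload: dict) -> bool:
    """Block until a human approves or rejects the proposed action."""
    answer = input(f"Approve {action} {payload}? [y/N] ")
    return answer.strip().lower() == "y"

def execute_plan(steps: list[dict]) -> None:
    for step in steps:
        if step.get("risk") == "high" and not require_approval(step["action"], step):
            print(f"Skipped {step['action']}: human rejected")
            continue  # the run keeps going; the human stays in the loop
        print(f"Executing {step['action']}")

execute_plan([
    {"action": "adjust_speed", "risk": "low"},
    {"action": "halt_line", "risk": "high"},
])
```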
Closing: Patterns Are a Start. Systems Must Follow.
We agree with Anthropic:
“Add complexity only when it demonstrably improves outcomes.”
And we agree with PydanticAI:
“Don’t use more power than you can control.”
But eventually, even the cleanest pattern or best-typed agent will hit a wall — because agents aren’t demos anymore. They’re infrastructure.
Arti is what happens when you treat agents like real systems. Join us as we redefine the future of agentic architecture and build the foundation for the next generation of intelligent systems.
Next up:
Netflix built a foundation model for recommendation systems. We’re building one for cognition. What they’ve done shows us where agentic architecture is headed — and why DAGs, SDKs, and chat loops won’t get us there.