Diving into Agentic AI — One Month In
2026-04-12
Overview: A Shift That Feels Bigger Than It Looks
Since the beginning of the year, agentic AI hasn’t just improved; it has changed direction.
Not in raw capability, but in how it’s being applied.
- Jensen Huang talking about AI systems as active participants, not passive tools
- The rise of frameworks like OpenClaw and examples like FelixcraftAI
- Andrej Karpathy treating memory as a wiki of knowledge for agents
- Claude moving toward structured workflows and tool-driven execution
Individually, these look incremental. Together, they point to a shift:
We are moving from “AI that responds” → to “AI that acts.”
Timeline: What Actually Changed
Looking back over the past few months, these aren’t just incremental updates; they signal a shift in how AI systems are being built.
1. OpenClaw & Local Agent Runtimes
- A move toward local, controllable agent systems
- Prompting becoming less central → orchestration becoming core
- Early foundations of an “agent OS”
2. FelixcraftAI & Tool-Oriented Agents
- Tight coupling of reasoning, tools, and execution
- Chat interfaces fading → workflows taking over
- Agents that perform work becoming a business model in their own right
3. Claude’s Agent Direction
- Tool use becoming the default interaction model
- Structured outputs enabling more control and predictability
4. Karpathy’s Memory Direction
- Memory shifting from transient chat → persistent knowledge systems
- The emergence of a real “second brain” layer for agents
My Experience: Where It Breaks (and Where It Clicks)
After working hands-on with this space for a month, one thing is clear:
The limitation is no longer intelligence. It’s structure.
What works surprisingly well:
- Breaking problems into steps
- Delegating simple workflows to agents
- Rapid iteration with tool use
What breaks constantly:
- State management
- Memory consistency
- Multi-step reliability
- Knowing when to stop
Most failures aren’t because the model is “wrong”.
They happen because:
- the system doesn’t know its state
- the context isn’t clear
- or the loop isn’t controlled
Version 1: Over-Controlled and Stuck
My first version was heavily controlled and over-engineered.
I tried to build a fully automated system around OpenClaw using:
- Markdown files as the core state and control layer
- Custom Python functions to manage workflows
- Telegram commands as the interface
The system was structured into three layers (sketched below):
- Layer 1 — Automated command handling
- Layer 2 — Local LLM (Ollama) for basic tasks
- Layer 3 — GPT-4o for higher-quality outputs
On paper, it made sense:
- optimise cost
- control behaviour
- create a clean system
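In a minimal sketch, the routing amounted to something like this. Every name below (is_simple, local_generate, hosted_generate) is an illustrative stand-in, not the actual code, which wired these decisions through Markdown state files and Telegram handlers:

```python
# Illustrative sketch of Version 1's three-layer routing; all names
# are stand-ins, not the real implementation.

COMMAND_HANDLERS = {
    "/status": lambda: "all tasks idle",
    "/tasks": lambda: "no open tasks",
}

def is_simple(task: str) -> bool:
    # Crude heuristic: short requests stay on the local model.
    return len(task.split()) < 20

def local_generate(task: str) -> str:
    # Layer 2: in the real system, this called a local model via Ollama.
    return f"[local model] {task}"

def hosted_generate(task: str) -> str:
    # Layer 3: reserved for tasks judged to need higher quality.
    return f"[hosted model] {task}"

def route(message: str) -> str:
    # Layer 1: known commands are handled without any model call.
    if message in COMMAND_HANDLERS:
        return COMMAND_HANDLERS[message]()
    if is_simple(message):
        return local_generate(message)
    return hosted_generate(message)

print(route("/status"))
print(route("summarise today's notes and draft a plan"))
```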
In practice, it failed.
- The system got stuck in loops
- Tasks never progressed meaningfully
- The stronger models were rarely used effectively
- Everything was “working” — but nothing useful was happening
I had built a system that was technically correct, but practically useless.
An important caveat: I never actually used the system for real work.
It was built on assumptions.
Because it wasn’t tested in real workflows:
- it never exposed where it was breaking
- it never “learned” it wasn’t working
- and I had no clear signal on what to fix
The system wasn’t just flawed; it was unvalidated.
I had built a system that controlled too much, too early.
What I Actually Learned From It
Even though it didn’t produce a usable system, the attempt was valuable.
It forced me to go deeper than surface-level “agent demos”:
- Refreshing Linux fundamentals (permissions, processes, environment setup)
- Working with Python in a system context, not just scripts
- Managing SSH, networking, and remote execution
- Understanding how OpenClaw actually runs under the hood
It made one thing very clear:
OpenClaw isn’t magic: it’s a system coordinating tools, models, and execution.
A lot of what’s being shown online skips this layer entirely.
What Was Actually Breaking
It wasn’t the models.
It was the system:
- State wasn’t visible — everything was implied through files
- Control was too rigid — no flexibility once flows started
- Loops weren’t bounded — tasks continued without clear outcomes
So even when parts worked, the system didn’t produce anything meaningful.
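The inverse of those failures is easy to sketch: make state explicit, keep the loop bounded, and end every task with a clear outcome. A toy version, with every name hypothetical:

```python
# Toy sketch: explicit, inspectable state plus a bounded loop,
# the opposite of state implied through files and unbounded flows.

from dataclasses import dataclass, field

MAX_STEPS = 5  # hard bound: every task ends with a clear outcome

@dataclass
class TaskState:
    goal: str
    steps_done: list = field(default_factory=list)
    done: bool = False

def run_step(state: TaskState) -> TaskState:
    # Stand-in for one agent step; a real system would call a model here.
    state.steps_done.append(f"step {len(state.steps_done) + 1}")
    state.done = len(state.steps_done) >= 3
    return state

def run(goal: str) -> TaskState:
    state = TaskState(goal=goal)
    for _ in range(MAX_STEPS):  # bounded, unlike Version 1's loops
        state = run_step(state)
        if state.done:
            break
    return state

print(run("draft a summary"))  # the full state is visible at every point
```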
Version 2: Letting the System Work
This time, instead of forcing structure upfront, I let the agent lead more of the system development.
That came with trade-offs.
At the time, OpenAI allowed Codex access via OAuth — I used that heavily early on and hit limits quickly.
Switching to GPT-5.4 burned through my restricted monthly allowance just as fast.
Ironically, that constraint was useful.
A good system doesn’t just enable output; it also controls cost and usage.
Hitting those limits forced a pause.
And that pause forced clarity.
Instead of trying to design the full system, I focused on building something real:
- A workflow from Ideation → Analysis → Decision Making
- Agents handling specific roles within that flow
- Structure emerging from use, not assumptions
I started experimenting with sub-agents:
- Software Architect
- Designer
- Researcher
Not as isolated “agents”, but as parts of a system working toward an outcome.
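A hypothetical sketch of that flow: each role is one defined step in a fixed sequence, consuming the previous step’s output. The prompts and call_model below are placeholders, not what I actually ran:

```python
# Hypothetical Ideation -> Analysis -> Decision Making pipeline,
# with sub-agent roles as defined steps. All prompts are illustrative.

ROLE_PROMPTS = [
    ("Researcher", "Gather the key facts and constraints for: {input}"),
    ("Software Architect", "Propose a technical approach for: {input}"),
    ("Designer", "Refine the user-facing shape of: {input}"),
]

def call_model(role: str, prompt: str) -> str:
    # Stand-in for a model call; the real system used hosted models here.
    return f"<{role} output for: {prompt[:40]}...>"

def run_pipeline(idea: str) -> str:
    result = idea
    for role, template in ROLE_PROMPTS:
        # Each role sees only the previous output, keeping handoffs visible.
        result = call_model(role, template.format(input=result))
    return result

print(run_pipeline("an agent that drafts weekly status reports"))
```

Because every handoff is explicit, a failure points at a specific role and a specific input.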
This approach felt very different.
- The system evolved organically
- Work was actually moving forward
- Failures were visible and actionable
Not perfect — but finally usable.
What Actually Worked
- Breaking work into simple, sequential steps
- Letting agents handle defined roles within a workflow
- Iterating based on real outputs instead of assumptions
What Still Breaks
- State drifting across steps
- Memory consistency over time
- Reliability in longer workflows
- Knowing when to stop vs continue
Most failures weren’t because the model lacked capability.
They happened because the system lacked clarity.
Intelligence is already there. Structure is the constraint.
The Reality: Early Days, Unclear Future
It’s easy to overhype this space.
The reality is more nuanced.
What’s Genuinely Exciting
- Massive leverage, especially for people who think in systems
- The ability to turn ideas into working workflows quickly
- Early signs of autonomous processes replacing manual effort
What’s Not Solved
- Fragility: systems break in ways that aren’t always obvious
- Memory and persistence are still unreliable
- Debugging is opaque: it’s hard to trace where things went wrong
- Moving from “demo” → “production” is still a major gap
The problem isn’t what these systems can do. It’s whether you can rely on them to do it consistently.
The Real Shift: From Tools → Systems
The biggest mindset change for me:
Stop thinking about AI as a tool. Start thinking in systems.
Most people are still interacting with AI at the surface level.
Agentic AI isn’t:
- “better ChatGPT”
- “automation scripts”
It’s:
- workflows
- state machines
- feedback loops
- orchestrated execution
The moment you treat it like a system, everything changes:
- how you structure problems
- how you debug failures
- how you design outcomes
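To make “state machine” and “feedback loop” concrete, here is a toy example with invented states and transition rules: a task moves through explicit states, a failed review feeds back into execution, and a hard bound stops the loop from running away.

```python
# Toy agent task as an explicit state machine with a bounded feedback loop.
from enum import Enum, auto

class State(Enum):
    PLAN = auto()
    EXECUTE = auto()
    REVIEW = auto()
    DONE = auto()

MAX_ATTEMPTS = 3  # the feedback loop is bounded, not open-ended

def review_passed(attempt: int) -> bool:
    # Stand-in for a real check; here, work passes on the second try.
    return attempt >= 2

state, attempt = State.PLAN, 0
while state is not State.DONE:
    if state is State.PLAN:
        state = State.EXECUTE
    elif state is State.EXECUTE:
        attempt += 1
        state = State.REVIEW
    elif state is State.REVIEW:
        # Feedback: a failed review re-enters EXECUTE, up to the bound.
        if review_passed(attempt) or attempt >= MAX_ATTEMPTS:
            state = State.DONE
        else:
            state = State.EXECUTE
    print(f"attempt {attempt}: now in {state.name}")
```

Debugging becomes “which state were we in, and which transition fired?” instead of rereading a transcript.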
The difference isn’t the model. It’s whether you’re building prompts, or designing systems.
Where This Goes (My Current View)
In the short term, most people will struggle to get consistent results.
Not because the models aren’t capable, but because the systems around them aren’t.
The advantage won’t come from better prompts.
It will come from being able to structure workflows, manage state, and design systems that actually hold together.
Memory and orchestration are already emerging as the core problems.
This is where things start to shift from experimentation → infrastructure.
“Agent infrastructure” won’t just be a concept; it will be a category.
Longer term, the direction feels clear:
- persistent digital workers
- domain-specific agent systems
- human + agent collaboration as a default way of working
Right now feels similar to early cloud or early mobile.
Messy. Fragmented. Unclear.
But directionally obvious.
The future isn’t AI answering questions.
It’s AI participating in work.
We’re not figuring out how to use AI. We’re figuring out how to work with it.
What I’m Focusing On Next
- Building structured memory systems, not chat-based (see the sketch below)
- Improving state tracking across agent workflows
- Turning OpenClaw into a repeatable system, not just an experiment
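As a first cut at structured memory, I’m thinking along these lines: an append-only store keyed by topic, independent of any chat session. The file name and schema are assumptions, not a finished design:

```python
# Minimal sketch of structured, persistent memory: facts accumulate
# under topics in a JSON file and survive across sessions, instead of
# being replayed through chat history.
import json
from pathlib import Path

MEMORY_PATH = Path("agent_memory.json")

def load_memory() -> dict:
    if MEMORY_PATH.exists():
        return json.loads(MEMORY_PATH.read_text())
    return {}

def remember(topic: str, fact: str) -> None:
    memory = load_memory()
    memory.setdefault(topic, []).append(fact)
    MEMORY_PATH.write_text(json.dumps(memory, indent=2))

def recall(topic: str) -> list:
    return load_memory().get(topic, [])

remember("openclaw", "loops must be bounded")
print(recall("openclaw"))
```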