Sammy John Rawlinson

Diving into Agentic AI — One Month In

2026-04-12

Overview: A Shift That Feels Bigger Than It Looks

Since the beginning of the year, agentic AI hasn't just improved; it's changed direction.

Not in capability, but in how it's being applied.

  • Jensen Huang talking about AI systems as active participants, not passive tools
  • The rise of frameworks like OpenClaw and examples like FelixcraftAI
  • Andrej Karpathy treating memory as a wiki of knowledge for agents
  • Claude moving toward structured workflows and tool-driven execution

Individually, these look incremental. Together, they point to a shift:

We are moving from “AI that responds” → to “AI that acts.”


Timeline: What Actually Changed

Looking back over the past few months, these aren’t just updates; they signal a shift in how AI systems are being built.

1. OpenClaw & Local Agent Runtimes

  • A move toward local, controllable agent systems
  • Prompting becoming less central → orchestration becoming core
  • Early foundations of an “agent OS”

2. FelixcraftAI & Tool-Oriented Agents

  • Tight coupling of reasoning, tools, and execution
  • Chat interfaces fading → workflows taking over
  • Agents performing work as a business model

3. Claude’s Agent Direction

  • Tool use becoming the default interaction model
  • Structured outputs enabling more control and predictability

4. Karpathy’s Memory Direction

  • Memory shifting from transient chat → persistent knowledge systems
  • The emergence of a real “second brain” layer for agents

My Experience: Where It Breaks (and Where It Clicks)

After working hands-on with this space for a month, one thing is clear:

The limitation is no longer intelligence. It’s structure.

What works surprisingly well:

  • Breaking problems into steps
  • Delegating simple workflows to agents
  • Rapid iteration with tool use

What breaks constantly:

  • State management
  • Memory consistency
  • Multi-step reliability
  • Knowing when to stop

Most failures aren’t because the model is “wrong”.

They happen because:

  • the system doesn’t know its state
  • the context isn’t clear
  • or the loop isn’t controlled

Version 1: Over-Controlled and Stuck

My first version was heavily controlled and over-engineered.

I tried to build a fully automated system around OpenClaw using:

  • Markdown files as the core state and control layer
  • Custom Python functions to manage workflows
  • Telegram commands as the interface

Structured into layers:

  • Layer 1 — Automated command handling
  • Layer 2 — Local LLM (Ollama) for basic tasks
  • Layer 3 — ChatGPT-4o for higher-quality outputs
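The routing between those layers could be sketched roughly like this. The function name, task fields, and complexity threshold are illustrative assumptions, not the actual implementation:

```python
# Hypothetical sketch of the three-layer routing described above.
# Fields and thresholds are illustrative, not the real system.

def route_task(task: dict) -> str:
    """Pick which layer should handle a task."""
    if task["type"] == "command":
        return "layer1"              # automated command handling
    if task.get("complexity", 0) <= 3:
        return "layer2"              # local LLM (Ollama) for basic tasks
    return "layer3"                  # stronger hosted model for harder work

tasks = [
    {"type": "command", "name": "/status"},
    {"type": "generate", "name": "summarise notes", "complexity": 2},
    {"type": "generate", "name": "draft architecture", "complexity": 8},
]
print([route_task(t) for t in tasks])  # ['layer1', 'layer2', 'layer3']
```

The appeal of this kind of routing is exactly the one listed below: cheap tasks stay local, expensive models are reserved for hard work.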

On paper, it made sense:

  • optimise cost
  • control behaviour
  • create a clean system

In practice, it failed.

  • The system got stuck in loops
  • Tasks never progressed meaningfully
  • The stronger models were rarely used effectively
  • Everything was “working” — but nothing useful was happening

I had built a system that was technically correct, but practically useless.

An important caveat: I never actually used the system for real work.

It was built on assumptions.

Because it wasn’t tested in real workflows:

  • it never exposed where it was breaking
  • it never “learned” it wasn’t working
  • and I had no clear signal on what to fix

The system wasn’t just flawed; it was unvalidated.

I had built a system that controlled too much, too early.


What I Actually Learned From It

Even though it didn’t produce a usable system, it was valuable.

It forced me to go deeper than surface-level “agent demos”:

  • Refreshing Linux fundamentals (permissions, processes, environment setup)
  • Working with Python in a system context, not just scripts
  • Managing SSH, networking, and remote execution
  • Understanding how OpenClaw actually runs under the hood

It made one thing very clear:

OpenClaw isn’t magic: it’s a system coordinating tools, models, and execution.

A lot of what’s being shown online skips this layer entirely.


What Was Actually Breaking

It wasn’t the models.

It was the system:

  • State wasn’t visible — everything was implied through files
  • Control was too rigid — no flexibility once flows started
  • Loops weren’t bounded — tasks continued without clear outcomes

So even when parts worked, the system didn’t produce anything meaningful.
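The third failure, unbounded loops, has a simple structural fix: an explicit step budget and an explicit stop condition. A minimal sketch, where `run_step` is a stand-in for real agent work:

```python
# Minimal sketch of a bounded agent loop: visible state, a step budget,
# and an explicit outcome instead of one implied through files.

def run_step(state: dict) -> dict:
    """Stand-in for one unit of agent work."""
    state["progress"] += 1
    return state

def bounded_loop(state: dict, max_steps: int = 5, goal: int = 3) -> dict:
    for _ in range(max_steps):
        state = run_step(state)
        if state["progress"] >= goal:     # clear outcome, checked every step
            state["status"] = "done"
            return state
    state["status"] = "budget_exhausted"  # the loop always terminates
    return state

result = bounded_loop({"progress": 0})
print(result)  # {'progress': 3, 'status': 'done'}
```

Either exit path leaves the state inspectable, which addresses the first failure (invisible state) as well.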


Version 2: Letting the System Work

This time, instead of forcing structure upfront, I let the agent lead more of the system development.

That came with trade-offs.

At the time, OpenAI allowed Codex access via OAuth — I used that heavily early on and hit limits quickly.
Switching to GPT-5.4 burned through my restricted monthly allowance just as fast.

Ironically, that constraint was useful.

A good system doesn’t just enable output; it controls cost and usage.

Hitting those limits forced a pause.

And that pause forced clarity.


Instead of trying to design the full system, I focused on building something real:

  • A workflow from Ideation → Analysis → Decision Making
  • Agents handling specific roles within that flow
  • Structure emerging from use, not assumptions

I started experimenting with sub-agents:

  • Software Architect
  • Designer
  • Researcher

Not as isolated “agents”, but as parts of a system working toward an outcome.
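That role-based flow can be sketched as a simple pipeline, with each role as a plain function. The role functions and their signatures are illustrative stand-ins, not the actual sub-agents:

```python
# Illustrative sketch of the Ideation → Analysis → Decision Making flow.
# Each role is a stage in a pipeline working toward one outcome.

def researcher(idea: str) -> dict:
    """Ideation: gather findings on the idea."""
    return {"idea": idea, "findings": f"notes on {idea}"}

def architect(work: dict) -> dict:
    """Analysis: turn findings into a design."""
    work["design"] = f"design based on {work['findings']}"
    return work

def decide(work: dict) -> dict:
    """Decision making: act on the accumulated work."""
    work["decision"] = "proceed" if work.get("design") else "revisit"
    return work

# Structure emerges from the sequence of roles, not from upfront control.
pipeline = [researcher, architect, decide]
work = "local agent runtime"
for stage in pipeline:
    work = stage(work)
print(work["decision"])  # proceed
```

Because each stage reads and writes the same shared record, failures show up as missing fields rather than silent loops.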

This approach felt very different.

  • The system evolved organically
  • Work was actually moving forward
  • Failures were visible and actionable

Not perfect — but finally usable.

What Actually Worked

  • Breaking work into simple, sequential steps
  • Letting agents handle defined roles within a workflow
  • Iterating based on real outputs instead of assumptions

What Still Breaks

  • State drifting across steps
  • Memory consistency over time
  • Reliability in longer workflows
  • Knowing when to stop vs continue

Most failures weren’t because the model lacked capability.

They happened because the system lacked clarity.

Intelligence is already there. Structure is the constraint.


The Reality: Early Days, Unclear Future

It’s easy to overhype this space.

The reality is more nuanced.

What’s genuinely exciting

  • Massive leverage, especially for people who think in systems
  • The ability to turn ideas into working workflows quickly
  • Early signs of autonomous processes replacing manual effort

What’s not solved

  • Fragility: systems break in ways that aren’t always obvious
  • Memory and persistence are still unreliable
  • Debugging is unclear: it’s hard to trace where things went wrong
  • Moving from “demo” → “production” is still a major gap

The problem isn’t what these systems can do. It’s whether you can rely on them to do it consistently.


The Real Shift: From Tools → Systems

The biggest mindset change for me:

Stop thinking about AI as a tool. Start thinking in systems.

Agentic AI isn’t:

  • “better ChatGPT”
  • “automation scripts”

Most people are still interacting with AI at that surface level.

What it actually is:

  • workflows
  • state machines
  • feedback loops
  • orchestrated execution
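The "state machine" framing can be made concrete with a toy example: explicit states, explicit transitions, and a terminal state. The state names here are purely illustrative:

```python
# A toy state machine for one agent task: the "system" framing in miniature.
# Every transition is declared, and 'done' is terminal.

TRANSITIONS = {
    "planned":   "executing",
    "executing": "reviewing",
    "reviewing": "done",
}

def advance(state: str) -> str:
    """Move to the next declared state; terminal states stay put."""
    return TRANSITIONS.get(state, state)

state = "planned"
history = [state]
while state != "done":
    state = advance(state)
    history.append(state)
print(history)  # ['planned', 'executing', 'reviewing', 'done']
```

The point is not the code; it's that once transitions are explicit, you can see exactly where a task is and where it got stuck.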

The moment you treat it like a system, everything changes:

  • how you structure problems
  • how you debug failures
  • how you design outcomes

The difference isn’t the model. It’s whether you’re building prompts, or designing systems.


Where This Goes (My Current View)

In the short term, most people will struggle to get consistent results.

Not because the models aren’t capable, but because the systems around them aren’t.

The advantage won’t come from better prompts.

It will come from being able to structure workflows, manage state, and design systems that actually hold together.

Memory and orchestration are already emerging as the core problems.

This is where things start to shift from experimentation → infrastructure.

“Agent infrastructure” won’t just be a concept; it will be a category.

Longer term, the direction feels clear:

  • persistent digital workers
  • domain-specific agent systems
  • human + agent collaboration as a default way of working

Right now feels similar to early cloud or early mobile.

Messy. Fragmented. Unclear.

But directionally obvious.

The future isn’t AI answering questions.
It’s AI participating in work.

We’re not figuring out how to use AI. We’re figuring out how to work with it.


What I’m Focusing On Next

  • Building structured memory systems (not chat-based)
  • Improving state tracking across agent workflows
  • Turning OpenClaw into a repeatable system, not just an experiment
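As a starting point for the first item, structured memory can be as simple as a persistent store on disk instead of transient chat history. A minimal sketch; the file name and schema are illustrative assumptions:

```python
# Minimal sketch of structured, persistent memory: a JSON-backed store
# that survives across runs, unlike transient chat context.
import json
import os
import tempfile

class MemoryStore:
    def __init__(self, path: str):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            try:
                with open(path) as f:
                    self.data = json.load(f)
            except (json.JSONDecodeError, OSError):
                self.data = {}  # start fresh if the file is empty or corrupt

    def remember(self, key: str, value) -> None:
        self.data[key] = value
        with open(self.path, "w") as f:  # persist on every write
            json.dump(self.data, f)

    def recall(self, key: str, default=None):
        return self.data.get(key, default)

path = os.path.join(tempfile.gettempdir(), "agent_memory.json")
store = MemoryStore(path)
store.remember("current_task", "refactor workflow")

# A fresh instance reads the same state back from disk:
print(MemoryStore(path).recall("current_task"))  # refactor workflow
```

A real system would layer structure on top (tasks, decisions, outcomes as separate namespaces), but the core property is the same: state outlives the conversation.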