Diving into Agentic AI — One Month In
2026-04-12
Overview: A Shift That Feels Bigger Than It Looks
Since the beginning of the year, agentic AI hasn’t just improved; it has changed direction.
Not in raw capability, but in how it’s being applied.
- Jensen Huang talking about AI systems as active participants, not passive tools
- The rise of frameworks like OpenClaw and examples like FelixcraftAI
- Andrej Karpathy treating memory as a wiki of knowledge for agents
- Claude moving toward structured workflows and tool-driven execution
Individually, these look incremental. Together, they point to a shift:
We are moving from “AI that responds” → to “AI that acts.”
Timeline: What Actually Changed
Looking back over the past few months, these aren’t just incremental updates; they signal a shift in how AI systems are being built.
1. OpenClaw & Local Agent Runtimes
- A move toward local, controllable agent systems
- Prompting becoming less central → orchestration becoming core
- Early foundations of an “agent OS”
2. FelixcraftAI & Tool-Oriented Agents
- Tight coupling of reasoning, tools, and execution
- Chat interfaces fading → workflows taking over
- Agents that perform work becoming a business model in their own right
3. Claude’s Agent Direction
- Tool use becoming the default interaction model
- Structured outputs enabling more control and predictability
4. Karpathy’s Memory Direction
- Memory shifting from transient chat → persistent knowledge systems
- The emergence of a real “second brain” layer for agents
My Experience: Where It Breaks (and Where It Clicks)
After working hands-on with this space for a month, one thing is clear:
The limitation is no longer intelligence. It’s structure.
What works surprisingly well:
- Breaking problems into steps
- Delegating simple workflows to agents
- Rapid iteration with tool use
What breaks constantly:
- State management
- Memory consistency
- Multi-step reliability
- Knowing when to stop
Most failures aren’t because the model is “wrong”.
They happen because:
- the system doesn’t know its state
- the context isn’t clear
- or the loop isn’t controlled
Version 1: Over-Controlled and Stuck
My first version was heavily controlled and over-engineered.
I tried to build a fully automated system around OpenClaw using:
- Markdown files as the core state and control layer
- Custom Python functions to manage workflows
- Telegram commands as the interface
The system was structured into three layers (sketched below):
- Layer 1 — Automated command handling
- Layer 2 — Local LLM (Ollama) for basic tasks
- Layer 3 — GPT-4o for higher-quality outputs
On paper, it made sense:
- optimise cost
- control behaviour
- create a clean system
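In a minimal sketch, the routing amounted to something like this. Every name below (is_simple, local_generate, hosted_generate) is an illustrative stand-in, not the actual code, which wired these decisions through Markdown state files and Telegram handlers:

```python
# Illustrative sketch of Version 1's three-layer routing; all names
# are stand-ins, not the real implementation.

COMMAND_HANDLERS = {
    "/status": lambda: "all tasks idle",
    "/tasks": lambda: "no open tasks",
}

def is_simple(task: str) -> bool:
    # Crude heuristic: short requests stay on the local model.
    return len(task.split()) < 20

def local_generate(task: str) -> str:
    # Layer 2: in the real system, this called a local model via Ollama.
    return f"[local model] {task}"

def hosted_generate(task: str) -> str:
    # Layer 3: reserved for tasks judged to need higher quality.
    return f"[hosted model] {task}"

def route(message: str) -> str:
    # Layer 1: known commands are handled without any model call.
    if message in COMMAND_HANDLERS:
        return COMMAND_HANDLERS[message]()
    if is_simple(message):
        return local_generate(message)
    return hosted_generate(message)

print(route("/status"))
print(route("summarise today's notes and draft a plan"))
```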
In practice, it failed.
- The system got stuck in loops
- Tasks never progressed meaningfully
- The stronger models were rarely used effectively
- Everything was “working” — but nothing useful was happening
I had built a system that was technically correct, but practically useless.
An important caveat: I never actually used the system for real work.
It was built on assumptions.
Because it wasn’t tested in real workflows:
- it never exposed where it was breaking
- it never “learned” it wasn’t working
- and I had no clear signal on what to fix
The system wasn’t just flawed; it was unvalidated.
I had built a system that controlled too much, too early.
What I Actually Learned From It
Even though it didn’t produce a usable system, the attempt was valuable.
It forced me to go deeper than surface-level “agent demos”:
- Refreshing Linux fundamentals (permissions, processes, environment setup)
- Working with Python in a system context, not just scripts
- Managing SSH, networking, and remote execution
- Understanding how OpenClaw actually runs under the hood
It made one thing very clear:
OpenClaw isn’t magic: it’s a system coordinating tools, models, and execution.
A lot of what’s being shown online skips this layer entirely.
What Was Actually Breaking
It wasn’t the models.
It was the system:
- State wasn’t visible — everything was implied through files
- Control was too rigid — no flexibility once flows started
- Loops weren’t bounded — tasks continued without clear outcomes
So even when parts worked, the system didn’t produce anything meaningful.
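The inverse of those failures is easy to sketch: make state explicit, keep the loop bounded, and end every task with a clear outcome. A toy version, with every name hypothetical:

```python
# Toy sketch: explicit, inspectable state plus a bounded loop,
# the opposite of state implied through files and unbounded flows.

from dataclasses import dataclass, field

MAX_STEPS = 5  # hard bound: every task ends with a clear outcome

@dataclass
class TaskState:
    goal: str
    steps_done: list = field(default_factory=list)
    done: bool = False

def run_step(state: TaskState) -> TaskState:
    # Stand-in for one agent step; a real system would call a model here.
    state.steps_done.append(f"step {len(state.steps_done) + 1}")
    state.done = len(state.steps_done) >= 3
    return state

def run(goal: str) -> TaskState:
    state = TaskState(goal=goal)
    for _ in range(MAX_STEPS):  # bounded, unlike Version 1's loops
        state = run_step(state)
        if state.done:
            break
    return state

print(run("draft a summary"))  # the full state is visible at every point
```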
Version 2: Letting the System Work
This time, instead of forcing structure upfront, I let the agent lead more of the system development.
That came with trade-offs.
At the time, OpenAI allowed Codex access via OAuth — I used that heavily early on and hit limits quickly.
Switching to GPT-5.4 burned through my restricted monthly allowance just as fast.
Ironically, that constraint was useful.
A good system doesn’t just enable output; it also controls cost and usage.
Hitting those limits forced a pause.
And that pause forced clarity.
Instead of trying to design the full system, I focused on building something real:
- A workflow from Ideation → Analysis → Decision Making
- Agents handling specific roles within that flow
- Structure emerging from use, not assumptions
I started experimenting with sub-agents:
- Software Architect
- Designer
- Researcher
Not as isolated “agents”, but as parts of a system working toward an outcome.
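A hypothetical sketch of that flow: each role is one defined step in a fixed sequence, consuming the previous step’s output. The prompts and call_model below are placeholders, not what I actually ran:

```python
# Hypothetical Ideation -> Analysis -> Decision Making pipeline,
# with sub-agent roles as defined steps. All prompts are illustrative.

ROLE_PROMPTS = [
    ("Researcher", "Gather the key facts and constraints for: {input}"),
    ("Software Architect", "Propose a technical approach for: {input}"),
    ("Designer", "Refine the user-facing shape of: {input}"),
]

def call_model(role: str, prompt: str) -> str:
    # Stand-in for a model call; the real system used hosted models here.
    return f"<{role} output for: {prompt[:40]}...>"

def run_pipeline(idea: str) -> str:
    result = idea
    for role, template in ROLE_PROMPTS:
        # Each role sees only the previous output, keeping handoffs visible.
        result = call_model(role, template.format(input=result))
    return result

print(run_pipeline("an agent that drafts weekly status reports"))
```

Because every handoff is explicit, a failure points at a specific role and a specific input.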
This approach felt very different.
- The system evolved organically
- Work was actually moving forward
- Failures were visible and actionable
Not perfect — but finally usable.
What Actually Worked
- Breaking work into simple, sequential steps
- Letting agents handle defined roles within a workflow
- Iterating based on real outputs instead of assumptions
What Still Breaks
- State drifting across steps
- Memory consistency over time
- Reliability in longer workflows
- Knowing when to stop vs continue
Most failures weren’t because the model lacked capability.
They happened because the system lacked clarity.
Intelligence is already there. Structure is the constraint.
The Reality: Early Days, Unclear Future
It’s easy to overhype this space.
The reality is more nuanced.
What’s Genuinely Exciting
- Massive leverage, especially for people who think in systems
- The ability to turn ideas into working workflows quickly
- Early signs of autonomous processes replacing manual effort
What’s Not Solved
- Fragility: systems break in ways that aren’t always obvious
- Memory and persistence are still unreliable
- Debugging is opaque: it’s hard to trace where things went wrong
- Moving from “demo” → “production” is still a major gap
The problem isn’t what these systems can do. It’s whether you can rely on them to do it consistently.
The Real Shift: From Tools → Systems
The biggest mindset change for me:
Stop thinking about AI as a tool. Start thinking in systems.
Most people are still interacting with AI at the surface level.
Agentic AI isn’t:
- “better ChatGPT”
- “automation scripts”
It’s:
- workflows
- state machines
- feedback loops
- orchestrated execution
The moment you treat it like a system, everything changes:
- how you structure problems
- how you debug failures
- how you design outcomes
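To make “state machine” and “feedback loop” concrete, here is a toy example with invented states and transition rules: a task moves through explicit states, a failed review feeds back into execution, and a hard bound stops the loop from running away.

```python
# Toy agent task as an explicit state machine with a bounded feedback loop.
from enum import Enum, auto

class State(Enum):
    PLAN = auto()
    EXECUTE = auto()
    REVIEW = auto()
    DONE = auto()

MAX_ATTEMPTS = 3  # the feedback loop is bounded, not open-ended

def review_passed(attempt: int) -> bool:
    # Stand-in for a real check; here, work passes on the second try.
    return attempt >= 2

state, attempt = State.PLAN, 0
while state is not State.DONE:
    if state is State.PLAN:
        state = State.EXECUTE
    elif state is State.EXECUTE:
        attempt += 1
        state = State.REVIEW
    elif state is State.REVIEW:
        # Feedback: a failed review re-enters EXECUTE, up to the bound.
        if review_passed(attempt) or attempt >= MAX_ATTEMPTS:
            state = State.DONE
        else:
            state = State.EXECUTE
    print(f"attempt {attempt}: now in {state.name}")
```

Debugging becomes “which state were we in, and which transition fired?” instead of rereading a transcript.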
The difference isn’t the model. It’s whether you’re building prompts, or designing systems.
Where This Goes (My Current View)
In the short term, most people will struggle to get consistent results.
Not because the models aren’t capable, but because the systems around them aren’t.
The advantage won’t come from better prompts.
It will come from being able to structure workflows, manage state, and design systems that actually hold together.
Memory and orchestration are already emerging as the core problems.
This is where things start to shift from experimentation → infrastructure.
“Agent infrastructure” won’t just be a concept; it will be a category.
Longer term, the direction feels clear:
- persistent digital workers
- domain-specific agent systems
- human + agent collaboration as a default way of working
Right now feels similar to early cloud or early mobile.
Messy. Fragmented. Unclear.
But directionally obvious.
The future isn’t AI answering questions.
It’s AI participating in work.
We’re not figuring out how to use AI. We’re figuring out how to work with it.
What I’m Focusing On Next
- Building structured memory systems, not chat-based (see the sketch below)
- Improving state tracking across agent workflows
- Turning OpenClaw into a repeatable system, not just an experiment
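As a first cut at structured memory, I’m thinking along these lines: an append-only store keyed by topic, independent of any chat session. The file name and schema are assumptions, not a finished design:

```python
# Minimal sketch of structured, persistent memory: facts accumulate
# under topics in a JSON file and survive across sessions, instead of
# being replayed through chat history.
import json
from pathlib import Path

MEMORY_PATH = Path("agent_memory.json")

def load_memory() -> dict:
    if MEMORY_PATH.exists():
        return json.loads(MEMORY_PATH.read_text())
    return {}

def remember(topic: str, fact: str) -> None:
    memory = load_memory()
    memory.setdefault(topic, []).append(fact)
    MEMORY_PATH.write_text(json.dumps(memory, indent=2))

def recall(topic: str) -> list:
    return load_memory().get(topic, [])

remember("openclaw", "loops must be bounded")
print(recall("openclaw"))
```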