Four Hours, Fourteen Architectural Mistakes

The most important skill in the AI era is not coding

By Alex P. Wang | May 29, 2026 | Tags: AI-Assisted Development, Future of Engineering, Software Architecture, Reflections

A week ago I wrote about the team I never had before - Codex and Claude Code, working together, taking a project from design to deployment. I promised a follow-up on how to work with AI engineers well. This is that follow-up. And the lesson is not what I expected.

For the past four weeks I have worked with AI every day, as part of my engineering team.

The results have been remarkable. I am very good at finding bugs, but AI finds them faster than I can. It refactors large codebases in minutes. It sets up development environments, configures cloud infrastructure, and deploys applications to AWS. It writes code at a speed that would have been hard to imagine a year ago.

I ran an experiment one day, running three AI instances at once - one Codex instance and two Claude Code sessions. One built a tool to curate my photo collection. Another built a tool to trace and analyze backend logs. A third made enhancements to an existing application. The work would have taken me days, if not weeks, alone. That was probably the most productive day of my life.

The productivity gains are real. They are not marketing claims. They are happening right now.

Which is exactly why the next lesson surprised me. As AI became more capable, I expected my job to get easier. Instead, a new responsibility emerged. The biggest challenge was no longer writing code. The biggest challenge was learning how to challenge the AI.

The more capable the AI became, the more convincing its mistakes became. And those mistakes often did not look like mistakes at all.

The Day I Asked AI to Log Its Own Mistakes

I have been working with the latest advanced models - GPT-5.5, Opus 4.7, and Opus 4.8. This design session on improving a key process inside my knowledge agent app ran on Claude Opus 4.8, the most capable model I had access to. It lasted four hours. It took that long because most of the time went into proving to the AI that its proposed design was wrong or imperfect. I asked the AI to keep a log of its own design mistakes as we discovered them. This was not a bug list. No code had been written yet. These were design and reasoning mistakes found during the discussion itself.

By the end of the session, the AI had documented fourteen significant architectural mistakes. Not syntax errors. Not programming errors. Architectural mistakes that an experienced software architect would be unlikely to make.

What the fourteen looked like

Decision points placed at the wrong stage of a process
Responsibilities of different parts of the system confused
Conclusions drawn before the code was verified
Costs invented that did not exist
One proposal that eventually created a dead loop
A clean six-step process ballooning to fifteen steps, padded with steps that carried no real meaning

What surprised me was not that the AI made mistakes. It was how reasonable those mistakes sounded before they were challenged. Every one of the fourteen appeared plausible. Not one was caught because the answer looked obviously wrong.

All fourteen were discovered only because the design was challenged, questioned, debated, and traced back to its assumptions. Several required multiple rounds before the flaw became visible. In all fourteen cases, the AI withdrew its own recommendation after being forced to explain exactly how it would work.

The problem is not that AI lacks ideas. The problem is that AI can generate convincing explanations for ideas it has not fully reasoned through.

The Dead Loop That Looked Like a Good Idea

One incident stands out. While discussing a possible improvement, the AI proposed a new approach. The idea sounded useful. It explained the benefits. It described where the change would fit. Then it asked whether I wanted it to start implementing.

At first glance the proposal looked completely reasonable. But instead of approving it, I asked a simple question.

How exactly does this work underneath?

We walked through the execution path step by step. The deeper we went, the more uncomfortable the design became. Eventually it became clear the proposal would create a circular dependency. Under certain conditions, the process would trigger itself repeatedly and never reach a terminal state. A dead loop: an infinite loop with no exit.

Once the execution path was traced completely, the AI agreed and withdrew the recommendation. The lesson was simple. Many bad designs survive a high-level discussion. Very few survive a detailed walkthrough of how they actually execute.
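The kind of walkthrough described above can be sketched in a few lines of Python. This is a minimal illustration, not the design from the session: the step names (`ingest`, `summarize`, `refresh_index`, `store`) and the `find_cycle` helper are hypothetical, and the real process was traced by hand in conversation. The idea is only that once each step's triggers are written down explicitly, a path that loops back on itself becomes visible.

```python
from typing import Dict, List, Optional


def find_cycle(triggers: Dict[str, List[str]], start: str) -> Optional[List[str]]:
    """Trace every path from `start`; return one that loops back on itself, else None."""
    path: List[str] = []   # steps on the path currently being traced
    on_path = set()        # same steps, for fast membership checks
    done = set()           # steps already proven to terminate

    def visit(step: str) -> Optional[List[str]]:
        if step in on_path:
            # We reached a step that is already on the current path: a loop.
            return path[path.index(step):] + [step]
        if step in done:
            return None
        on_path.add(step)
        path.append(step)
        for nxt in triggers.get(step, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(step)
        done.add(step)
        return None

    return visit(start)


# Hypothetical process: each step maps to the steps it can trigger next.
original = {
    "ingest": ["summarize"],
    "summarize": ["store"],
    "store": [],  # terminal state
}

# Hypothetical "improvement": a new step that re-triggers the start of the process.
proposed = {
    "ingest": ["summarize"],
    "summarize": ["refresh_index"],
    "refresh_index": ["ingest"],
}

print(find_cycle(original, "ingest"))  # None
print(find_cycle(proposed, "ingest"))  # ['ingest', 'summarize', 'refresh_index', 'ingest']
```

At the level of "add a step that refreshes the index," the proposed version sounds harmless. Only when the triggers are followed one by one does the missing terminal state show up.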

A plausible solution is not the same thing as a valid solution.

Why I Was Able to Catch These Problems

The interesting question is not why the AI made mistakes. It is why I noticed them. The answer has little to do with AI. It has everything to do with years of software design experience.

When the AI proposed a branch, I asked how it knew that information at this point. When it proposed a new mechanism, I asked what happens next. When it suggested a warm-up strategy, I asked it to show me where that happens in the code. When it proposed a process improvement, I asked whether the path could terminate.

These are not AI skills. These are engineering skills. Years of building systems teach you where problems hide - hidden assumptions, circular dependencies, misplaced responsibilities, architectural drift, scalability traps, maintenance nightmares.

The AI did not know where those problems were. It only knew how to generate a solution.

My Changing View of AI Coding Tools

In my last article I said Codex was strong everywhere except data modeling, where it gave textbook answers with no real-world judgment. I accepted many of its architectural suggestions and corrected its schemas. Over time, I realized that was the wrong conclusion.

The issue is not that AI is weak at data modeling. The issue is that AI can sound persuasive at every level - database design, architecture, caching, APIs, process flow, data structures, system boundaries. Every one of these areas can contain hidden assumptions that only become visible under scrutiny.

You must debate with AI on every important design decision.

Not because AI is always wrong. Because the consequences of being wrong are often delayed. Without that debate, you can build a system that appears to work perfectly today while quietly accumulating flaws that surface months later as bugs, maintenance problems, scalability issues, or architectural dead ends.

This is a limitation of AI at its current stage. The models are capable, but they do not yet know which of their own ideas will hold up. I expect future advances to narrow that gap. Until they do, the debate is not optional.

The Real Risk for the Next Generation

This leads to a question that concerns me. Many future engineers will grow up with AI writing most of the code. They may never spend years debugging memory leaks, race conditions, recursive loops, distributed systems failures, or production outages.

That sounds wonderful. But it may also mean they never develop the instinct to stop and ask: wait a minute, how does this actually work?

For decades we trained engineers to write code. The next generation may need to learn something different. How to interrogate designs. How to interrogate assumptions. How to interrogate AI.

The future skill may not be "How do I build this?" but "How do I verify this?"

The Human Role Is Changing

Many people imagine the future development process as a straight line. Human asks. AI answers. Human approves. AI implements. That is a dangerous workflow.

The dangerous workflow

Human asks → AI answers → Human approves → AI implements

The better workflow

Human asks → AI proposes → Human challenges → AI revises → Human challenges again → AI revises again → Human validates → AI implements

The human is not there to type faster. The human is there to protect the architecture. The AI can generate solutions. The human must verify assumptions. The AI can suggest improvements. The human must determine whether they actually work. The AI can write code. The human must understand the system.

Until AI can reason about systems as well as an experienced architect, this division holds. We are not there yet.

Becoming a Master of the AI

I do not believe AI will eliminate the need for engineers. But I do believe it will change what makes a great engineer. The best engineers of the AI era may not be the people who write the most code. They may be the people who know how to ask the questions that force the AI to reveal whether it truly understands the system.

The questions worth more than the code

How is this decision made?
Where does this information come from?
What assumptions are hidden here?
Can this path terminate?
What happens under load?
What happens six months from now?
What breaks if the requirements change?

The goal is not to become an assistant to the AI. The goal is to become a master of the AI. Because the greatest risk of AI-assisted development is not that AI writes bad code. It is that we stop asking whether the design makes sense.

Final Thoughts

Last week I wrote that the team I never had finally showed up, and that I did not think they were leaving. That is still true. But a week of working inside that power taught me the other half of the lesson. The team is only as good as the architect leading it.

The future belongs to engineers who can work effectively with AI. But working effectively with AI does not mean accepting its answers. It means challenging them. The AI can generate solutions. The human must provide judgment. And as AI becomes more capable, that judgment may become the most valuable engineering skill of all.

When the mistakes stop looking like mistakes, asking "how does this actually work?" becomes the most important thing an engineer can do.
