Most AI Agents Are Just Fancy Prompt Wrappers. I Built One That Actually Understands Its Own Output
Grammar-validated AI generation with language server infrastructure: AI systems that reason about structured domains with the same rigor as a compiler.
Why the gap between AI strategy and AI code is the most expensive problem in enterprise tech — and why I refuse to let it persist in my own work.
There's no shortage of people who can draw you an architecture diagram on a whiteboard. I've been one of them for thirty years, and I still do — strategy matters. But somewhere around the fifth time I heard a "senior AI consultant" confess they'd never actually wired up an LLM to do anything useful, I realized the real differentiator isn't choosing between strategy and implementation. It's doing both.
So alongside the roadmaps and architecture reviews, I built something. A semantic AI agent — in TypeScript, end to end — that reads a domain-specific language, reasons about its structure, and generates validated output. No Python. No Jupyter notebooks. No "just call the OpenAI API and hope for the best." A real, typed, testable system with a language server, a VS Code extension, and an AI backbone that actually understands the grammar it's working with.
Here's what the journey looked like, and what it taught me about where AI development is actually heading.
The Problem With Most AI Integrations
Most AI-powered tools today follow the same pattern: take user input, stuff it into a prompt, fire it at an LLM, and pray the response is parseable. It works — until it doesn't. And when it doesn't, you get hallucinated JSON, broken schemas, and a support ticket from someone who trusted your "intelligent" system.
The root issue is structural ignorance. The AI doesn't know the shape of what it's producing. It's pattern-matching against training data, not reasoning against a formal specification.
That's the gap I set out to close.
The Stack: TypeScript All the Way Down
Here's the architecture in plain terms, followed by what each layer actually does.
1. The Grammar (Langium)
Langium is a TypeScript-native framework for building language servers — the same technology that powers autocompletion and error-checking in your code editor. I used it to define a domain-specific language (DSL) that describes the exact structure my AI agent needs to produce.
A simplified Langium grammar rule - defines what 'valid output' looks like.
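A sketch of what such a rule can look like. The rule and terminal names here are hypothetical, chosen for illustration; the real grammar is domain-specific:

```langium
grammar TaskModel

// Entry rule: a document is a sequence of tasks.
entry Model:
    (tasks+=Task)*;

// Each task has a name, an optional cross-reference to a
// predecessor task, and a quoted action string.
Task:
    'task' name=ID '{'
        ('after' predecessor=[Task:ID])?
        'action' action=STRING
    '}';

hidden terminal WS: /\s+/;
terminal ID: /[a-zA-Z_][a-zA-Z0-9_]*/;
terminal STRING: /"[^"]*"/;
```

The cross-reference syntax (`[Task:ID]`) is the part that does real work: it lets the generated language server resolve references between elements and flag dangling ones automatically, rather than leaving that to ad-hoc post-processing.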
This isn't decoration. This grammar generates a full language server — a background process that validates, autocompletes, and navigates the language in real time. When the AI produces output, the language server tells me instantly whether it's structurally valid, and exactly where it went wrong if it isn't.
For the executive reading this: Think of it as giving the AI a rulebook it can't ignore, with an automated referee checking every move.
2. The AI Layer (LLM + Structured Prompting)
The agent uses a large language model, but it doesn't just throw text at it. The prompt includes the grammar specification itself, along with typed examples and validation constraints. When the model responds, the output is parsed through the same Langium parser that powers the editor.
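A minimal sketch of that prompt assembly, with hypothetical function and section names (the point is only that the grammar source text itself travels into the prompt alongside known-good examples):

```typescript
// Hypothetical prompt builder: the Langium grammar source becomes part of
// the prompt, together with validated examples and the user's request.
function buildPrompt(
  grammarSource: string,
  validExamples: string[],
  request: string
): string {
  return [
    'You generate documents in the following DSL.',
    'Output ONLY a document that conforms to this grammar.',
    '--- GRAMMAR (Langium) ---',
    grammarSource,
    '--- VALID EXAMPLES ---',
    ...validExamples,
    '--- REQUEST ---',
    request,
  ].join('\n\n');
}
```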
The key insight: the language server becomes the AI's type checker. If the model hallucinates an invalid structure, the parser catches it and feeds the specific errors back for self-correction. No regex hacks. No "close enough." Either the output conforms to the grammar, or the agent tries again with precise diagnostic feedback.
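The generate-validate-retry loop can be sketched like this. The `generate` and `parse` functions are stand-ins, not a real API: in practice `generate` wraps the LLM call and `parse` wraps the Langium parser, which reports lexer and parser errors with positions:

```typescript
// Assumed shapes: `generate` wraps the LLM call, `parse` wraps the
// Langium parser and returns structural diagnostics.
type ParseResult = { valid: boolean; errors: string[] };

interface AgentDeps {
  generate: (prompt: string) => Promise<string>;
  parse: (text: string) => ParseResult;
}

async function generateValidated(
  prompt: string,
  deps: AgentDeps,
  maxAttempts = 3
): Promise<string> {
  let currentPrompt = prompt;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const output = await deps.generate(currentPrompt);
    const result = deps.parse(output);
    if (result.valid) return output;
    // Feed the precise diagnostics back for self-correction.
    currentPrompt =
      `${prompt}\n\nYour previous output was structurally invalid:\n` +
      result.errors.join('\n') +
      '\nProduce a corrected document.';
  }
  // Retry cap reached: escalate instead of looping forever.
  throw new Error('No grammar-valid output after retries; escalating to human review.');
}
```

The retry cap is the guardrail discussed later: either the output conforms within a few attempts, or a human looks at it.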
3. The Editor Experience (VS Code Extension)
The whole system surfaces in VS Code through an extension built on Langium's LSP (Language Server Protocol) support. Users get syntax highlighting, autocompletion, real-time error detection, and AI-assisted generation — all in one cohesive experience.
Registering an AI command in the VS Code extension.
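In outline, the registration looks like this. The command id and the `generateValidatedOutput` helper are hypothetical placeholders, and the command would also need to be declared in the extension's `package.json`; only the `vscode` API calls themselves are standard:

```typescript
import * as vscode from 'vscode';

// Placeholder for the grammar-validated generation pipeline.
declare function generateValidatedOutput(context: string): Promise<string>;

export function activate(context: vscode.ExtensionContext) {
  const disposable = vscode.commands.registerCommand(
    'semanticAgent.generate', // hypothetical command id
    async () => {
      const editor = vscode.window.activeTextEditor;
      if (!editor) return;
      // Use the current document as context for the agent.
      const generated = await generateValidatedOutput(editor.document.getText());
      // Insert the validated result at the cursor.
      await editor.edit(edit =>
        edit.insert(editor.selection.active, generated)
      );
    }
  );
  context.subscriptions.push(disposable);
}
```

Because the extension runs in the same process family as the Langium language server, the generated text is re-validated by the same parser the editor already uses for diagnostics.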
For the executive reading this: Your team gets an AI assistant that lives inside their existing editor, speaks the language of your domain, and never produces output that violates your business rules.
What This Approach Gets You
Let me translate the technical architecture into business outcomes, because that's what actually matters.
Deterministic validation, not probabilistic hope. Every AI-generated artifact is parsed against a formal grammar. You know — not guess, know — whether the output is structurally valid before it reaches production.
Domain specificity without fine-tuning. You don't need to train a custom model. The grammar is the domain knowledge. Change the grammar, and the agent immediately adapts to new structures. No retraining. No six-figure ML infrastructure bills.
Developer experience that doesn't require a PhD. The VS Code extension means your team works with familiar tools. Autocompletion and error messages come from the language server, not the LLM — so they're reliable, instant, and deterministic.
Composable intelligence. Because the grammar, parser, and AI layer are separate concerns, you can swap any of them independently. Upgrade the LLM? The grammar still validates. Change the domain? The AI pipeline still works. This is engineering, not a science experiment.
The Honest Part
I'll be direct about what's hard.
Grammar design is a skill. It's not something you pick up in a weekend. I've been working with formal languages since my LISP days — building an expert system for legal norm analysis using deontic logic — and Langium still demands careful thought about ambiguity, precedence, and cross-references.
The self-correction loop needs guardrails. You can't let the agent retry indefinitely. I cap retries and fall back to human review when the model can't produce valid output after a few attempts. In practice, with a well-designed grammar and good prompt engineering, the first-attempt success rate is high — but "high" isn't "always."
TypeScript isn't the fashionable choice for AI work. The Python ecosystem has more ML libraries, more tutorials, more Stack Overflow answers. But TypeScript gives you something Python doesn't: a type system that actually works at scale, seamless full-stack development from the language server to the VS Code extension to the web frontend, and an ecosystem (Node.js, Langium, GLSP) purpose-built for the kind of tooling infrastructure that makes AI agents useful rather than just impressive.
Where This Is Going
The pattern I've described — grammar-validated AI generation with language server infrastructure — isn't just a cool demo. It's the foundation for what I'm calling Semantic AI Agents: AI systems that don't just generate text, but reason about structured domains with the same rigor as a compiler.
Imagine applying this to compliance rules. To API contracts. To infrastructure-as-code. To any domain where "close enough" isn't good enough and the cost of structural errors is measured in real money or real risk.
That's the work I'm doing now. Designing the strategy and writing the code. Because in 2026, you shouldn't have to choose.