Building AI Into Real Products (Not Just Demos)

What it actually looks like to build AI-powered enterprise software. LangChain, multi-agent systems, document processing, and a lot of plumbing.

I have spent the better part of two years building AI into actual products. Not chatbot wrappers. Not “we added a button that calls GPT.” Real, production AI systems that enterprise customers depend on daily. Here is what that actually looks like.

What I Built at Hapax AI

Hapax AI is an enterprise platform serving banking and financial services clients. My job was to build the full stack, frontend and backend, that made the AI actually useful to people who are not engineers.

The core of the platform is a conversational AI system. Not a simple chat window. A multi-agent system where different AI models handle different tasks, pass context to one another, and produce structured outputs that feed into workflows. I built both the legacy chat system and the next-generation agent chat, including file viewing across CSV, PDF, images, and various document types. When a user uploads a 200-page PDF and asks the AI a question about page 47, the system needs to actually work. I made it work.

The Stack Behind It

The backend runs on FastAPI with Python, talking to both Anthropic Claude and OpenAI models through LangChain and LangGraph. We used semantic routing to direct queries to the right agent, Neo4j as a graph database for relationship-heavy enterprise data, and a document processing pipeline built on PyMuPDF and Docling for ingesting everything from scanned PDFs to Word docs.
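To make "semantic routing" concrete, here is a minimal sketch of the idea, not the Hapax implementation. It assumes each agent is described by a few example queries and uses a toy bag-of-words vector as a stand-in for a real embedding model. The agent names, the `route` function, and the threshold value are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical agents, each described by example queries (illustrative only).
ROUTES = {
    "document_qa": [
        "what does the contract say on page 47",
        "summarize this pdf document",
    ],
    "data_analysis": [
        "average balance per account in this csv",
        "total the transactions by month",
    ],
}


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(query: str, threshold: float = 0.1) -> str:
    # Pick the agent whose best example is most similar to the query.
    # Queries that resemble nothing go to a "fallback" agent.
    q = embed(query)
    best_name, best_score = "fallback", threshold
    for name, examples in ROUTES.items():
        score = max(cosine(q, embed(example)) for example in examples)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

With a real embedding model in place of `embed`, the structure stays the same: compare the query to each route's examples and fall back when nothing is close enough.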

On the frontend, I built the entire chat UI in Next.js and TypeScript. Optimized rendering so responses stream in without lag. Fixed a React error in the markdown renderer that was crashing on certain AI outputs. Built rich table rendering for structured workflow outputs with nested JSON column support. Added CSV pagination for large datasets. These are the unsexy problems that make AI products actually usable.

Agent Workflows and Orchestration

The workflow engine was one of the bigger features I shipped. Users could define multi-step AI workflows with triggers, schema validation, and structured outputs. I built the output editor with an accordion UI, a schema builder, and a step selector dropdown. When workflows produced data, I implemented rich table rendering that could handle nested JSON and dot-notation column expansion. This is the kind of thing that sounds boring until you realize it is the difference between an AI demo and an AI product.
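Dot-notation column expansion is easier to picture with an example. The sketch below is one way to do it, under the assumption that each workflow output row is a JSON object. It is written in Python to keep the examples in one language, and the `flatten` and `to_table` names are made up for illustration.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Expand nested dicts into dot-notation keys: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            # Scalars, lists, and empty dicts stay in a single cell.
            flat[path] = value
    return flat


def to_table(rows: list[dict[str, Any]]) -> tuple[list[str], list[list[Any]]]:
    """Build (columns, data) where columns are the union of all flattened keys."""
    flat_rows = [flatten(row) for row in rows]
    columns: list[str] = []
    for row in flat_rows:
        for col in row:
            if col not in columns:  # keep first-seen column order
                columns.append(col)
    data = [[row.get(col) for col in columns] for row in flat_rows]
    return columns, data
```

Rows that lack a nested field simply get an empty cell (`None`) in that column, which is what keeps ragged AI output from breaking the table.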

Web Scraping and Data Extraction

One of the features that needed the most iteration was web scraping for the AI to reference. The initial implementation was pulling in garbage: ads, navigation chrome, and cookie banners, all mixed into the content the AI was trying to reason about. I rewrote the scraping pipeline with ad stripping, template detection, and cache TTL improvements. The accuracy improvement was significant enough to change how reliable the AI's answers were for customers.
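As a rough illustration of the ad-stripping step only (template detection and caching are not shown), here is a sketch using Python's standard `html.parser`. The tag list and the class/id prefixes are assumptions chosen for the example, not the rules the real pipeline used, and the depth tracking assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser

# Elements treated as page chrome rather than content (an assumption).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}
# Hypothetical class/id prefixes that often mark ads and cookie banners.
SKIP_PREFIXES = ("ad-", "advert", "cookie", "banner", "sidebar")
# Void elements never get a closing tag, so they must not affect depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "source", "wbr"}


class ContentExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth += 1  # nested element inside skipped region
            return
        tokens = " ".join(
            value for name, value in attrs if name in ("class", "id") and value
        ).lower().split()
        flagged = any(tok.startswith(SKIP_PREFIXES) for tok in tokens)
        if tag in SKIP_TAGS or flagged:
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Real pages are messier than this handles (unclosed tags, ads injected by scripts), which is part of why this feature took so much iteration.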

What I Actually Use

  • LLM Providers: Anthropic Claude, OpenAI GPT models
  • Orchestration: LangChain, LangGraph, semantic routing
  • Vector Search: pgvector for semantic similarity
  • Document Processing: PyMuPDF, Docling, python-docx
  • Backend: FastAPI, Python async/await, Pydantic
  • Frontend: Next.js, TypeScript, streaming responses
  • Database: Neo4j for graph data, PostgreSQL for relational
  • Monitoring: Datadog for observability

The Honest Part

Most of building AI-powered applications is not prompt engineering. It is plumbing. It is making sure the document ingestion pipeline does not choke on a 50MB Excel file. It is handling the edge case where the AI returns malformed JSON and your frontend needs to not crash. It is optimizing chat rendering so that a 3,000-token response streams in smoothly instead of freezing the browser. It is building permissions so that User A cannot see User B’s AI-generated reports.
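The malformed-JSON case is a good example of this plumbing. The sketch below shows one defensive approach, assuming the model was asked for a JSON object but may wrap it in chatter or cut it off. The function names and the shape of the fallback payload are hypothetical.

```python
import json
from typing import Any, Optional


def parse_model_json(raw: str) -> Optional[dict[str, Any]]:
    """Return the first JSON object in raw that parses, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(raw):
        if char != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores any trailing text after it.
            obj, _end = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            continue  # not valid from this brace; try the next one
        if isinstance(obj, dict):
            return obj
    return None


def to_ui_payload(raw: str) -> dict[str, Any]:
    """Always hand the frontend a predictable shape, even on bad output."""
    parsed = parse_model_json(raw)
    if parsed is None:
        # Fall back to showing the raw text instead of crashing the renderer.
        return {"status": "unparseable", "text": raw}
    return {"status": "ok", "data": parsed}
```

The point is less the parsing trick than the contract: the frontend only ever sees one of two known shapes, so a bad model response degrades to plain text rather than an error screen.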

I shipped 166 commits and 48 merged PRs on the Hapax AI backend alone. The AI is the exciting part. The 10,000+ lines of Python that make it reliable are the job.
