c10r.io (kompot.ai) is a CRM / business-systems platform. Contacts, opportunities, jobs, expenses, agreements, telephony, inboxes, scheduling, AI assistants — all of it lives in a single Next.js process backed by MongoDB. One repository, one deploy artifact, one set of types, one auth boundary.
Then over the last two months we shipped three microservices.
This is the story of how we decided which work earns a separate process and which doesn't — and the discipline we use to keep "another microservice" from becoming the default answer.
## Why a monolith in the first place
The shape of the product is a lot of things that touch each other. An expense links to an opportunity, which links to a contact, which has a loyalty profile, which fires telephony events, which… you get it. In a CRM, almost everything is one query away from almost everything else.
The cheapest way to write code where everything talks to everything is to put it all in the same process:
- One type system. A `Contact` in the expenses module is the same `Contact` in the opportunity module — not because we agreed on a schema, but because TypeScript will refuse to compile if it isn't.
- One database connection pool per workspace. Multi-tenancy is a query filter, not an HTTP boundary.
- One auth check. The same `requirePermission('contacts:read')` works for a UI page, an API route, an MCP tool, and a background job — because they all share the same session middleware.
- One deploy. `git push` → stage → production. No "did the contacts service get the new field?" choreography.
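To make the shared-auth point concrete, here is a minimal sketch — a hypothetical implementation, not c10r.io's actual middleware — of one `requirePermission` check guarding both an API route and a background job:

```ts
// Hypothetical sketch: one permission check, every entry point.
// Names besides requirePermission('contacts:read') are illustrative.
type Session = { userId: string; permissions: Set<string> };

function requirePermission(session: Session, perm: string): void {
  if (!session.permissions.has(perm)) {
    throw new Error(`Forbidden: missing ${perm}`);
  }
}

// The same function guards an API route…
function listContactsApi(session: Session): string[] {
  requirePermission(session, 'contacts:read');
  return ['Ada', 'Grace'];
}

// …and a background job, with no second auth system to keep in sync.
function nightlyContactExport(session: Session): number {
  requirePermission(session, 'contacts:read');
  return listContactsApi(session).length;
}

const session: Session = { userId: 'u1', permissions: new Set(['contacts:read']) };
console.log(nightlyContactExport(session)); // 2
```

In a split system, each process on the far side of the wire would need its own copy of this check — and its own way of learning the session's permissions.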
The cost of a monolith — slow builds, big blast radius on a bad deploy, the temptation to couple unrelated modules — is real, but it's all engineering hygiene cost. You can pay it down with tests, modular boundaries, and code review. The cost of distributed systems is complexity cost, and you cannot pay it down. You can only pay it.
So our default answer is: it goes in the monolith. The bar to leave is high.
## What clears the bar
Looking back, three things have moved a piece of work out of the main process. None of them are "this module is big" or "this team owns it." Size is not a reason. Org chart is not a reason.
### 1. Language / runtime mismatch
The receipt parser does OCR (Tesseract + vision-LLM), structured extraction, and embedding generation. The Python ecosystem for this is twenty years old. Pillow, OpenCV, PyMuPDF, the entire HuggingFace stack — they're all in Python, and they all assume Python.
Could we rewrite OCR pipelines in TypeScript? Technically. Should we, given the goal is "scan a receipt and fill an expense form"? Absolutely not. The right call here was to keep the OCR engine where its libraries live, and treat it as a remote function the monolith calls.
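The "remote function" framing can be sketched as follows — a hypothetical shape with an injected transport, not the real client (only the `POST /receipts/process` path comes from the actual contract; field names are made up):

```ts
// Hypothetical sketch: the monolith sees the Python OCR pipeline as one
// async function. The transport is injected so the shape is testable.
type ReceiptFields = { merchant: string; total: number };
type Post = (path: string, body: unknown) => Promise<ReceiptFields>;

async function parseReceipt(imageUrl: string, post: Post): Promise<ReceiptFields> {
  // One hop to wherever the parser lives; the caller never imports Tesseract.
  return post('/receipts/process', { url: imageUrl });
}

// Usage with a fake transport standing in for the remote service:
const fakePost: Post = async () => ({ merchant: 'Cafe Rio', total: 12.5 });
parseReceipt('https://example.com/receipt.jpg', fakePost).then((fields) =>
  console.log(fields.merchant) // "Cafe Rio"
);
```

The caller's code is identical whether the OCR engine runs in-process or on another host — which is exactly what made the extraction cheap.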
### 2. Hardware or resource isolation
Our models server runs LLMs on a GPU box. The monolith runs on commodity Linux VMs with no GPU. Co-locating them would mean either (a) renting GPU instances for the whole CRM or (b) hairpinning GPU traffic through the CRM process — both wasteful. Drawing a network boundary at the GPU was the natural shape.
### 3. Third-party-managed code we don't want to vendor
The receipt parser started as an existing repo with its own CI, its own contributor, its own iteration cadence. Pulling that into our monorepo would have meant either taking over its development entirely or accepting cross-repo coupling. A microservice contract — "you expose POST /receipts/process, we'll call it" — let both repos move at their own speed.
## What did NOT clear the bar
Equally useful: the things people have suggested extracting, and our reasons for not doing it.
- Telephony. It's complex, it has its own provider integrations (Twilio, VAPI), and it generates a lot of events. But: it shares the same Contact/Opportunity/User graph as everything else. Extracting it would mean either duplicating the graph across processes or building an internal sync API — and at that point we've added a problem larger than the one we set out to solve.
- AI assistants. They run conversations, hold tool state, talk to ten different LLMs. Surely they want their own process? No — they're tightly coupled to the workspace's permissions, custom fields, and tool catalog. The tools they call are MCP handlers that live inside the monolith. Spinning them out would mean re-implementing half the monolith's auth + data layer on the other side of an HTTP hop.
- MCP server. Same reasoning — the tools are the monolith. The MCP transport layer is a thin shell over the same handlers the UI uses.
The pattern: if extracting would force you to rebuild a graph of relationships on both sides of the wire, you've found something that belongs in one process.
## The boundary discipline
Once you accept that some microservices are real, the next failure mode is letting each one invent its own conventions. We didn't want to end up with:
- "The parser uses Bearer tokens, but the models server uses mTLS, but the next one will use API keys in a header."
- "The parser exposes `/health`, but the models server uses `/status`, but the next one will dump JSON at `/`."
- "The parser logs to its own file, the models server logs to stdout, the next one ships logs to S3."
So we built a tiny in-house protocol — not a framework, not a service mesh. Just a single TypeScript file that says: here are the microservices, here are their endpoints, here are their auth tokens, here's what they're allowed to do.
```ts
// config/microservices/services.ts (simplified)
export const services = [
  {
    slug: 'parser',
    displayName: 'Receipt Parser',
    baseUrlEnvVar: 'PARSER_SERVICE_URL',
    tokenEnvVar: 'MS_PARSER_TOKEN',
    aiTokenEnvVar: 'MS_PARSER_AI_TOKEN',
    health: { intervalSeconds: 30, degradedAfterSeconds: 90, downAfterSeconds: 300 },
    capabilities: { logs: { pull: true } },
    aiTasks: [
      { id: 'extraction', shape: 'chat', label: 'Receipt extraction' },
      { id: 'embedding', shape: 'embedding', label: 'Receipt embedding' },
      { id: 'ocr-vision', shape: 'chat', label: 'Receipt OCR' },
    ],
    endpoints: [{ name: 'process', method: 'POST', path: '/receipts/process' }],
  },
] as const;
```
That single declaration drives:
- A typed client: `getMicroserviceClient('parser').call('process', { url, wsid })`. Misspell the slug or the endpoint name and TypeScript yells.
- A health probe loop that polls every service on the cadence each one declares.
- An admin UI that lists everything, shows live health, and renders logs if the service supports `logs.pull`.
- A typed React hook for feature components: `useMicroserviceHealth('parser')` returns live status, gates the "Scan Receipt" button, shows a banner when the service is degraded.
- An AI Task Routing page that auto-groups `aiTasks` by service — adding a new MS that needs AI is one block in this file, zero edits to the admin page.
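The typed-client derivation is ordinary TypeScript, no codegen. A minimal sketch of how the slug and endpoint types can fall out of the `as const` declaration — a simplified stand-in, not the real `getMicroserviceClient`:

```ts
// Simplified catalog (one service, fields trimmed to what the sketch needs).
const services = [
  {
    slug: 'parser',
    endpoints: [{ name: 'process', method: 'POST', path: '/receipts/process' }],
  },
] as const;

type Service = (typeof services)[number];
type Slug = Service['slug'];                              // 'parser'
type EndpointName = Service['endpoints'][number]['name']; // 'process'

function getMicroserviceClient(slug: Slug) {
  const svc = services.find((s) => s.slug === slug)!;
  return {
    // Misspell the endpoint name and this fails at compile time,
    // because EndpointName is a union of string literals, not string.
    call(endpoint: EndpointName, _body: unknown): string {
      const ep = svc.endpoints.find((e) => e.name === endpoint)!;
      return `${ep.method} ${ep.path}`; // the real client would fetch(baseUrl + path)
    },
  };
}

console.log(getMicroserviceClient('parser').call('process', {}));
// → "POST /receipts/process"
```

Because the catalog is `as const`, adding a service block automatically widens the `Slug` and `EndpointName` unions — the type safety comes for free with each new entry.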
Adding a new microservice is exactly: write the service, add a block to `services.ts`, set two env vars. No new auth code. No new health endpoint. No new UI panel.
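For illustration, a hypothetical second entry — every name below is made up — would be one more block in the same array:

```ts
// Hypothetical second service: same shape, zero new plumbing.
{
  slug: 'transcriber',
  displayName: 'Call Transcriber',
  baseUrlEnvVar: 'TRANSCRIBER_SERVICE_URL',
  tokenEnvVar: 'MS_TRANSCRIBER_TOKEN',
  health: { intervalSeconds: 30, degradedAfterSeconds: 90, downAfterSeconds: 300 },
  capabilities: { logs: { pull: true } },
  endpoints: [{ name: 'transcribe', method: 'POST', path: '/calls/transcribe' }],
},
```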
## What we gained
The parser MS shipped in about a week of integration work on the c10r.io side, plus a similar amount of time on the parser repo itself. The expense form has a Scan Receipt button that takes a photo, sends it to a Python process running on a different host, gets back structured fields, and auto-fills the form. None of that code lives in the monolith. The monolith just knows how to ask.
The Scan Receipt button knows when the parser is down (the catalog tells it). The admin page shows live logs from the parser (the catalog tells it). The parser's three AI calls (OCR, extraction, embedding) route through c10r.io's centralized AI gateway (the catalog tells it). When we add the second microservice that needs the same things, the answer to every "how do we…" question is the same way the parser does it.
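The health gating can be sketched as a pure threshold classifier over the time since the last successful probe, using the thresholds each service declares in the catalog (function name hypothetical):

```ts
type HealthStatus = 'up' | 'degraded' | 'down';

// Hypothetical classifier: status is a pure function of seconds since the
// last successful probe, using the per-service catalog thresholds.
function classify(
  secondsSinceLastOk: number,
  cfg: { degradedAfterSeconds: number; downAfterSeconds: number }
): HealthStatus {
  if (secondsSinceLastOk >= cfg.downAfterSeconds) return 'down';
  if (secondsSinceLastOk >= cfg.degradedAfterSeconds) return 'degraded';
  return 'up';
}

const parserHealth = { degradedAfterSeconds: 90, downAfterSeconds: 300 };
console.log(classify(45, parserHealth));  // "up" — last probe within 90 s
console.log(classify(120, parserHealth)); // "degraded" — banner in the UI
console.log(classify(600, parserHealth)); // "down" — Scan Receipt button disables
```

Keeping the thresholds in the catalog rather than in the probe loop means a latency-tolerant service and a latency-critical one can declare different cadences without touching shared code.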
## The bigger lesson
Modular monolith versus microservices is not a religious debate. It's a question of where your graph is.
If most of your nouns reference most of your other nouns — and in business systems they do — keep them in one process. You'll spend less time on plumbing and more time on the actual product.
If a piece of work has its own graph, or its own hardware needs, or its own lifecycle — let it live on the other side of an HTTP boundary. Just make sure you've built the boundary once, in a way every future service can reuse, before the second one shows up.
The monolith is not a stage you graduate out of. It is the destination for almost everything. Microservices are the exception you pay for individually, each one earning its place.