How do you architect large-scale Node.js for high concurrency?
Node.js Developer
Answer
A production Node.js architecture treats the event loop as a protected asset. Keep I/O-bound work on the main thread, push CPU-bound work to Worker Threads, and isolate slow dependencies behind queues. Choose a modular monolith for speed of delivery, or split into microservices for independent scaling and failure isolation. Use clustering, stateless processes, backpressure, and circuit breakers, and instrument latency to catch blocking before users do.
Long Answer
A production Node.js platform must keep the event loop responsive while scaling across cores and hosts. Choose structure by cost and risk: a modular monolith ships fastest; microservices buy independent scaling and failure isolation at the price of operational overhead and weaker consistency. This blueprint sets domain seams, offloads CPU, and enforces backpressure for stable p95 latency.
1) Structure
Start with a well-factored monolith (domain modules, internal packages). Split only when traffic shape, risk, data needs, or team autonomy demand it. Extract along real seams (payments, search, notifications) with versioned contracts.
2) Protect the event loop
The loop handles all connections; any CPU spike blocks everyone. Push CPU-bound work—image transforms, crypto, large JSON validation—into Worker Threads or separate worker services via queues. Prefer streaming and incremental parsing; avoid synchronous work on hot paths; yield with setImmediate() during long ticks.
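A minimal sketch of the offload pattern, assuming a hypothetical expensiveTransform and a ./cpu-task.js worker file; a real service would reuse a worker pool (for example via piscina) rather than spawn a Worker per request:

```js
// main thread: hand the CPU-bound payload to a Worker and await the result
import { Worker } from 'node:worker_threads';

export function runInWorker(payload) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(new URL('./cpu-task.js', import.meta.url), {
      workerData: payload,
    });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

// ./cpu-task.js (separate file, runs off the main thread):
//   import { parentPort, workerData } from 'node:worker_threads';
//   parentPort.postMessage(expensiveTransform(workerData)); // expensiveTransform is illustrative
```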
3) Concurrency topology
Run one Node process per core (cluster or PM2). Keep processes stateless; externalize sessions and caches (Redis). Terminate TLS and gate concurrency at a reverse proxy with timeouts.
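A minimal clustering sketch; the port and trivial handler are placeholders, and in production a process manager such as PM2 usually owns this supervision:

```js
import cluster from 'node:cluster';
import http from 'node:http';
import os from 'node:os';

if (cluster.isPrimary) {
  // one stateless worker per core; the primary only supervises
  for (let i = 0; i < os.cpus().length; i += 1) cluster.fork();
  cluster.on('exit', (worker) => {
    console.error(`worker ${worker.process.pid} exited; forking a replacement`);
    cluster.fork();
  });
} else {
  // workers share the listening socket; sessions and caches live in Redis
  http.createServer((req, res) => res.end('ok')).listen(3000);
}
```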
4) Data, backpressure, queues
Use connection pools and per-dependency timeouts plus circuit breakers. Adopt streams and async iterators so producers respect consumers. Buffer bursts through durable queues (Kafka, RabbitMQ) and make retries idempotent; decouple ingestion from processing to smooth spikes.
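A sketch of a per-call timeout combined with a naive in-process circuit breaker; the thresholds are illustrative, and production code would typically use a library such as opossum:

```js
// single shared breaker for one dependency; success closes it again
const breaker = { failures: 0, openUntil: 0 };

export async function callDependency(
  fn,
  { timeoutMs = 2000, maxFailures = 5, cooldownMs = 10_000 } = {},
) {
  if (Date.now() < breaker.openUntil) {
    throw new Error('circuit open: failing fast'); // shed load instead of queueing
  }
  let timer;
  try {
    const result = await Promise.race([
      fn(),
      new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error('dependency timeout')), timeoutMs);
      }),
    ]);
    breaker.failures = 0; // any success closes the breaker
    return result;
  } catch (err) {
    if (++breaker.failures >= maxFailures) breaker.openUntil = Date.now() + cooldownMs;
    throw err;
  } finally {
    clearTimeout(timer);
  }
}
```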
5) API and transport
Keep payloads lean; compress. Batch chatty interactions or use GraphQL with persisted queries and dataloaders to collapse N+1. Paginate with cursors; cap response size. For real-time, use WebSocket or SSE with heartbeats, backoff, and graceful shedding.
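A cursor-pagination sketch assuming a pg-style client and an orders table; the opaque base64url cursor avoids OFFSET scans, and the page cap bounds response size:

```js
export async function listOrders(db, { cursor, limit = 50 } = {}) {
  // decode the opaque cursor: base64url of the last id the client saw
  const afterId = cursor ? Number(Buffer.from(cursor, 'base64url').toString()) : 0;
  const pageSize = Math.min(limit, 100); // hard cap on response size

  const { rows } = await db.query(
    'SELECT id, total FROM orders WHERE id > $1 ORDER BY id LIMIT $2',
    [afterId, pageSize],
  );

  const last = rows[rows.length - 1];
  return {
    items: rows,
    nextCursor: last ? Buffer.from(String(last.id)).toString('base64url') : null,
  };
}
```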
6) Observability
Track p50/p95/p99 per route, event-loop lag, GC, and saturation. Collect RED and USE metrics. Profile with clinic.js or 0x to find sync hotspots. Alert on loop delay and heap growth; fail builds when budgets regress.
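Event-loop delay can be watched directly with perf_hooks; the 50 ms threshold below mirrors the senior-level guidance later in this answer and is a starting point, not a universal rule:

```js
import { monitorEventLoopDelay } from 'node:perf_hooks';

const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();

setInterval(() => {
  const p99Ms = histogram.percentile(99) / 1e6; // histogram reports nanoseconds
  if (p99Ms > 50) {
    console.warn(`event-loop p99 delay ${p99Ms.toFixed(1)} ms`); // wire to real alerting
  }
  histogram.reset(); // measure each window independently
}, 10_000).unref();
```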
7) Storage and caching
Choose stores per access pattern (OLTP, documents, search, analytics). Add read replicas and read-through caches; set TTLs per data class based on how much staleness it tolerates. Use the outbox pattern to keep database writes and published messages consistent. For multi-region, prefer leader/follower replication with per-region caches; avoid global transactions.
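A minimal outbox sketch, again assuming a pg-style client; the orders and outbox tables are illustrative, and a separate relay process drains the outbox to the broker:

```js
// write the order and its event in one transaction; a relay polls the
// outbox table, publishes to the broker, and marks rows as sent
export async function createOrder(db, order) {
  await db.query('BEGIN');
  try {
    await db.query('INSERT INTO orders (id, total) VALUES ($1, $2)', [
      order.id,
      order.total,
    ]);
    await db.query('INSERT INTO outbox (topic, payload) VALUES ($1, $2)', [
      'order.created',
      JSON.stringify(order),
    ]);
    await db.query('COMMIT'); // both rows commit or neither: no lost or phantom events
  } catch (err) {
    await db.query('ROLLBACK');
    throw err;
  }
}
```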
8) Delivery and ops
Ship minimal containers with frozen deps. Use blue-green or canary. Warm pools on boot. Load-test and run chaos drills; publish SLOs.
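A graceful-shutdown sketch so blue-green and canary rollouts drain in-flight requests; the port and the 10-second deadline are placeholders:

```js
import http from 'node:http';

const server = http.createServer((req, res) => res.end('ok')).listen(3000);

process.on('SIGTERM', () => {
  server.close(() => process.exit(0)); // stop accepting; finish in-flight requests
  setTimeout(() => process.exit(1), 10_000).unref(); // hard deadline for stragglers
});
```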
9) Security
Validate inputs with fast parsers; cap payloads and headers. Sandbox untrusted code in separate processes. Rate-limit per token/IP; verify OAuth/JWT with cached JWKs. Rotate secrets.
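A request-hardening sketch with Express-style middleware; the body cap and fixed-window limits are illustrative defaults, and a clustered deployment would keep the counters in Redis rather than in process memory:

```js
import express from 'express';

const app = express();
app.use(express.json({ limit: '100kb' })); // reject oversized payloads before parsing

const hits = new Map(); // fixed-window counters per IP; move to Redis when clustered
app.use((req, res, next) => {
  const now = Date.now();
  const entry = hits.get(req.ip) ?? { count: 0, windowStart: now };
  if (now - entry.windowStart > 60_000) {
    entry.count = 0;
    entry.windowStart = now;
  }
  entry.count += 1;
  hits.set(req.ip, entry);
  if (entry.count > 300) return res.status(429).send('rate limited');
  next();
});

app.listen(3000);
```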
10) When to split
Split when compute pattern, scaling curve, or failure domain truly differs or when teams must release independently. Avoid gratuitous splitting; each hop adds latency and consistency work. Keep a platform team to steward contracts.
Protect the event loop, offload CPU, manage backpressure, and instrument everything: with these guardrails, Node.js sustains high concurrency as a modular monolith or as focused microservices.
Common Mistakes
- Blocking the event loop with CPU-heavy JSON parsing, crypto, or template rendering on hot paths.
- Treating clustering as a fix for poor single-process hygiene.
- Letting dependency calls hang without timeouts or circuit breakers, causing cascades.
- Opening unbounded DB connections and streams, leading to memory bloat.
- Chatty APIs that issue N+1 queries or return unpaginated payloads.
- Building microservices too early, adding hop latency and operational drag without a clear need.
- Sharing state in memory (sessions, caches) across clustered workers.
- Skipping idempotency on retries, producing duplicate writes.
- Monitoring only averages while ignoring event-loop lag and p95/p99.
- Deploying huge images and cold boots without warming pools.
- Skipping input limits and safe parsers, enabling request smuggling or DoS.
- Treating queues as infinite and never back-pressuring producers.
- Forgetting locale and date-parsing cost in SSR, causing long ticks under peak load.
Sample Answers (Junior / Mid / Senior)
Junior:
I would start with a modular monolith, one Node process per core, and move CPU work to Worker Threads. I would keep processes stateless, store sessions in Redis, and add timeouts on every outbound call. I would use streams for file uploads and pagination to keep responses small. Monitoring would track p95 latency and event-loop lag.
Mid:
I design by domain and split only where scaling differs, such as image processing or search. I put a reverse proxy in front with sane limits, add circuit breakers per dependency, and buffer spikes through a queue with idempotency keys. I use GraphQL persisted queries or batching to reduce round trips and profile with clinic.js to remove synchronous hotspots.
Senior:
I operate guardrails: budgets on p95, loop delay alerts, and CI gates. I choose monolith or microservices by failure domain and scaling curve, and I prove it with load tests. I standardize contracts, tracing, and outbox consistency, and I run canary releases with feature flags and rollback.
Evaluation Criteria
Strong answers:
- Protect the event loop, distinguish I/O-bound from CPU-bound work, and explain how to offload heavy tasks to Workers or separate services behind queues.
- Propose one process per core, stateless workers, reverse-proxy limits, and dependency timeouts with circuit breakers.
- Show backpressure literacy: streams, async iterators, bounded pools, and idempotent retries.
- Choose monolith vs microservices with clear criteria (scaling curve, failure domain, team autonomy) and call out the cost of extra hops.
- Include observability (p95/p99, loop lag, RED/USE, tracing) and performance budgets enforced in CI, plus storage choices by access pattern and outbox consistency.
- Mention cache/session externalization, multi-region reads, and SLOs tied to p95.
Senior depth adds concrete thresholds (loop delay >50 ms, max body size, pool sizes), load and chaos tests, and canary rollback. Red flags: clustering as a silver bullet, blocking JSON or crypto on the main thread, no pagination, no timeouts, infinite queues, or splitting services without contracts and ownership.
Preparation Tips
- Build a small load-testable Node app with three modules (catalog, orders, media).
- Run one process per core and add Redis for sessions and caching.
- Create a Worker Thread for image transforms and a BullMQ queue for media jobs; demonstrate idempotent retries (see the sketch after this list).
- Instrument p95/p99, event-loop lag, and RED metrics; add alerts for loop delay and heap growth.
- Profile with clinic.js and remove a synchronous hotspot.
- Add a reverse proxy with timeouts, header/body limits, and rate limiting.
- Implement streams for uploads and cursor pagination for APIs.
- Split one module into a microservice behind a queue and prove the benefit with a load test.
- Document read-replica use and a cache TTL matrix.
- Add CI checks that fail on budget regressions and missing timeouts.
- Add an outbox for orders; verify message-DB consistency and crash recovery.
- Write a runbook: scaling knobs, breaker defaults, pool sizes, and failure playbooks.
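A BullMQ sketch for the media queue described above; the jobId doubles as an idempotency key so re-enqueued jobs deduplicate, and transformImage plus the connection details are placeholders:

```js
import { Queue, Worker } from 'bullmq';

const connection = { host: 'localhost', port: 6379 }; // placeholder Redis endpoint
const mediaQueue = new Queue('media', { connection });

// enqueue: the jobId is the idempotency key, so duplicate submits are dropped
await mediaQueue.add('transform', { assetId: 'a1' }, {
  jobId: 'transform:a1',
  attempts: 5, // retry failed jobs with exponential backoff
  backoff: { type: 'exponential', delay: 1000 },
});

// process: the handler must itself be idempotent, since a crash after the
// work but before the ack triggers a retry of the same job
new Worker('media', async (job) => {
  await transformImage(job.data.assetId); // transformImage is illustrative
}, { connection });
```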
Real-world Context
A marketplace kept a modular monolith but offloaded image transforms to Workers and a queue; p95 fell 35 percent and tail spikes disappeared during launches. A fintech extracted search into a microservice behind a read model and cache; query latency halved while checkout stayed in the monolith for transactional safety. A media site added circuit breakers and per-dependency timeouts; a flaky payment gateway no longer cascaded failures across the stack. A SaaS vendor enforced event-loop lag alerts, profiling in CI, and a budget gate on p95; blocking JSON-parse issues were caught before release. A logistics platform adopted streams for uploads, cursor pagination, and idempotent retries; memory stayed flat under bursty traffic. A global retailer moved sessions and caches out of process, enabled keep-alive and HTTP/2, and added a canary rollout; throughput rose without regressions, and rollback was instant when a dependency slowed. Across all cases, the wins came from protecting the loop, enforcing backpressure, and applying explicit split criteria rather than defaulting to microservices.
Key Takeaways
- Treat the Node.js event loop as a protected resource.
- Start modular; split into microservices only for clear scaling or failure-domain needs.
- Offload CPU to Workers or services; use streams and backpressure.
- Enforce timeouts, circuit breakers, and idempotent retries on dependencies.
- Measure p95/p99 and loop lag; gate changes with budgets and canaries.
Practice Exercise
Scenario:
You must design a high-concurrency Node.js service for checkout, media processing, and search. The system must serve p95 ≤ 150 ms at 2k RPS steady with 10k RPS bursts, without event-loop blocking. Leadership wants a clear call on monolith vs microservices and a rollback plan.
Tasks:
- Draft a decision rubric: start modular monolith; split only if compute pattern or failure domain differs (media, search).
- Topology: one process per core behind a reverse proxy with timeouts, header/body limits, and rate limiting.
- Event-loop safety: move CPU work (images, crypto, large validation) to Worker Threads and a BullMQ queue; prove no main-thread blocks with loop-lag metrics.
- Data: externalize sessions, add connection pools and per-dependency timeouts; add circuit breakers and retries with idempotency keys.
- Backpressure: use streams for uploads; paginate with cursors; cap response sizes.
- Observability: instrument p50/p95/p99, loop lag, GC, and RED/USE; set budgets and CI gates; add profiling.
- Delivery: minimal containers, blue-green + canary; warm pools on boot; feature-flag risky code.
- Storage: OLTP for orders, search index for queries, cache for hot reads; add outbox for order events.
- Load test: show p95 and latency distribution before/after offloading CPU and enabling queues.
- Decision: recommend what stays in the monolith and what splits, with risks, owners, and rollback triggers.
Deliverable:
A short architecture doc, dashboards, and a load-test report proving budgets and the split decision.

