How do you choose middleware: platform vs custom build?

Short Answer

I start with capabilities and constraints: integration patterns, latency and throughput, data models, security, compliance, and governance. If needs align with a platform’s strengths (connectors, API management, transformations, monitoring, SLAs), I prefer off-the-shelf middleware like MuleSoft, Kafka ecosystems, or WSO2 for faster time-to-value. If requirements are niche, ultra-low latency, or cost-sensitive at scale, I consider custom middleware. The decision hinges on total cost, lock-in, operability, and how quickly the team can deliver and evolve.

Long Answer

Choosing between off-the-shelf middleware (for example MuleSoft, Kafka platforms, WSO2) and custom middleware is a risk, cost, and speed decision framed by technical and organizational realities. I use a structured scorecard across ten dimensions, then pilot a thin slice to validate assumptions.

1) Functional fit and integration patterns

List the concrete patterns you need: request–reply APIs, event streaming, pub–sub, batch, orchestration, transformation, enrichment, idempotent retries, dead-letter queues, and workflow. Platforms excel when their native features match your required patterns. MuleSoft and WSO2 shine in API-led connectivity, mediation, transformations, and policy enforcement. Kafka ecosystems are strong at durable logs, high-throughput streams, and schema governance through a schema registry. Their exactly-once semantics apply to processing inside Kafka (transactions and idempotent producers), so side effects in external systems still need idempotent handling. If your flows combine complex stateful sagas, edge processing, or unusual transports, custom middleware may keep the design simpler and avoid platform contortions.

2) Latency, throughput, and performance envelopes

Define concrete service-level objectives: p99 latency, sustained throughput, fan-out, and back-pressure behavior. Kafka-based stacks suit high-throughput, horizontally scalable event pipelines. API gateways and ESBs add network hops and processing stages, which raises latency and makes it more variable. If you need microsecond-level latency or tightly bounded jitter (for trading, real-time bidding, or physics pipelines), minimal custom middleware written close to the metal, with careful I/O and memory design, often wins.
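To make a latency objective checkable during a spike, a sketch like the one below can compare a measured percentile against a budget. It uses the nearest-rank method; the function names, the sample values, and the 50 ms budget are assumptions for illustration, not measurements.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_slo(samples_ms, pct, budget_ms):
    """True when the chosen percentile is within the latency budget."""
    return percentile(samples_ms, pct) <= budget_ms


# Hypothetical samples: 95 fast requests, 4 slower ones, 1 outlier.
latencies_ms = [5.0] * 95 + [20.0] * 4 + [120.0]

print(percentile(latencies_ms, 95))       # 5.0
print(percentile(latencies_ms, 99))       # 20.0
print(meets_slo(latencies_ms, 99, 50.0))  # True
```

Averages would hide the 120 ms outlier entirely, which is why the objective is stated as a percentile.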

3) Time-to-value and delivery velocity

Catalog connector availability. If you must integrate with Salesforce, SAP, NetSuite, or legacy mainframes next quarter, off-the-shelf platforms with certified connectors and visual mapping can compress delivery time dramatically. Custom middleware will spend sprints building adapters, mappers, and error handling that platforms already provide. Conversely, if you control both ends or only need a few lightweight protocols, custom code can ship faster with fewer moving parts.

4) Operability, reliability, and SRE burden

Assess how each option handles observability, tracing, circuit breaking, retries, idempotency, and multi-region failover. Platforms typically include dashboards, audit logs, and governance out of the box. Kafka platforms provide strong durability semantics, consumer groups, and replay. Custom middleware must assemble these from libraries and infrastructure, which increases SRE ownership but can be tailored precisely to your run-books and golden signals.
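For a sense of what "assembling these from libraries" means in custom middleware, here is a minimal sketch of an idempotent handler with bounded retries and a dead-letter queue. Everything in it is hypothetical: the `TransientError` class, the message shape with an `id` key, and the in-memory set and list, which stand in for a durable deduplication store and a real dead-letter topic.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def process_with_retries(message, handler, processed_ids, dead_letters,
                         max_attempts=3, base_delay_s=0.1, sleep=time.sleep):
    """Run handler at most max_attempts times, skipping already-seen ids."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    msg_id = message["id"]
    if msg_id in processed_ids:
        return "duplicate"  # replayed message: do not repeat side effects
    for attempt in range(1, max_attempts + 1):
        try:
            handler(message)
        except TransientError as exc:
            if attempt == max_attempts:
                # Retries exhausted: park the message for later inspection.
                dead_letters.append({"message": message, "error": str(exc)})
                return "dead-lettered"
            sleep(base_delay_s * 2 ** (attempt - 1))  # exponential backoff
        else:
            processed_ids.add(msg_id)
            return "processed"


def always_down(message):
    raise TransientError("upstream unavailable")


seen, dlq = set(), []
no_sleep = lambda seconds: None  # skip real delays in this demo
print(process_with_retries({"id": "m-1"}, lambda m: None, seen, dlq, sleep=no_sleep))
print(process_with_retries({"id": "m-1"}, lambda m: None, seen, dlq, sleep=no_sleep))
print(process_with_retries({"id": "m-2"}, always_down, seen, dlq, sleep=no_sleep))
```

Exceptions other than `TransientError` propagate immediately, on the assumption that retrying a permanent failure only delays the alert.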

5) Governance, security, and compliance

Regulated environments need consistent policy enforcement: authentication, authorization, secrets, token exchange, throttling, schema validation, and data residency. Platforms offer policy engines and role-based control that speed audits. Custom middleware provides maximal transparency and can be stripped to the minimum attack surface, but you must implement controls, key rotation, audit trails, and evidence generation yourself.
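Throttling is one of the controls a custom build has to implement itself. A token bucket is a common way to do it; the sketch below is one possible shape, with the class name, parameters, and explicit `now` argument chosen for this example (a real service would more likely read a monotonic clock and keep one bucket per client or API key).

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_s` tokens per second."""

    def __init__(self, rate_per_s, capacity, now=0.0):
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = float(capacity)  # start full
        self.last = now

    def allow(self, now, cost=1.0):
        """Return True and spend tokens if the request fits, else False."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate_per_s=1, capacity=2)
print([bucket.allow(0.0) for _ in range(3)])  # [True, True, False]
print(bucket.allow(1.0))                      # True, one token refilled
```

Passing time in explicitly keeps the limiter deterministic under test, which also makes it easier to produce evidence for an audit.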

6) Data model and evolution

If your enterprise depends on canonical data models, versioned schemas, and change data capture, platforms with schema registries and transformation tooling reduce coupling. Kafka with a schema registry makes evolution predictable. Custom middleware gives full control over serialization, migrations, and edge cases, but demands discipline to avoid format drift and accidental tight coupling.
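The discipline custom middleware needs here can be partly automated. The sketch below checks a deliberately simplified backward-compatibility rule: a new schema can read old data if every added field has a default and no shared field changes type. The dictionary schema format and the function name are assumptions for this example; real schema registries apply richer rules than this.

```python
def backward_compatible(old_fields, new_fields):
    """Return a list of problems; an empty list means readers using
    new_fields can still read data written with old_fields
    (under the simplified rule described above)."""
    problems = []
    for name, spec in new_fields.items():
        if name not in old_fields:
            if "default" not in spec:
                problems.append(f"new field '{name}' has no default")
        elif old_fields[name]["type"] != spec["type"]:
            problems.append(
                f"field '{name}' changed type from "
                f"{old_fields[name]['type']} to {spec['type']}"
            )
    return problems


# Hypothetical order schemas.
v1 = {"id": {"type": "string"}, "amount": {"type": "double"}}
v2_ok = {**v1, "currency": {"type": "string", "default": "USD"}}
v2_bad = {"id": {"type": "string"},
          "amount": {"type": "string"},
          "region": {"type": "string"}}

print(backward_compatible(v1, v2_ok))   # []
print(backward_compatible(v1, v2_bad))  # two problems reported
```

Running a check like this in CI, before a producer deploys, is one way to catch format drift before it reaches consumers.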

7) Cost model and total cost of ownership

Add up licenses, cloud consumption, support, and headcount. Off-the-shelf platforms shift cost into licenses and support but lower engineering lift and incident risk. Custom middleware avoids licenses and can be cheaper at scale if workloads are predictable and the team is experienced, yet it incurs higher build and maintenance costs plus on-call overhead. Model three years of TCO, not just year one.
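A three-year model can start as simple arithmetic: one-time costs plus recurring costs multiplied by the horizon. In the sketch below every figure is a made-up placeholder, chosen only to show the calculation; substitute your own quotes and salary data before drawing any conclusion.

```python
def three_year_tco(one_time, annual_costs, years=3):
    """One-time cost plus recurring annual costs over the horizon."""
    return one_time + years * sum(annual_costs.values())


# Placeholder figures, not real pricing.
platform = three_year_tco(
    one_time=100_000,  # implementation and training
    annual_costs={"licenses": 250_000, "cloud": 60_000, "staff": 300_000},
)
custom = three_year_tco(
    one_time=600_000,  # initial build
    annual_costs={"cloud": 80_000, "staff": 450_000, "on_call": 50_000},
)

print(platform)  # 1930000
print(custom)    # 2340000
```

Keeping the cost categories explicit makes it harder to forget on-call overhead for the custom option or support fees for the platform option.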

8) Vendor lock-in and portability

Platforms can lock you into proprietary connectors, policies, or message formats. If multi-cloud or on-prem portability is strategic, prefer open standards, open source, or build with portable components (for example Kafka API compatibility, CNCF projects, or plain HTTP). Custom middleware can be cloud-agnostic by design, though it requires discipline to avoid cloud-specific conveniences.
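One way to keep portability realistic is to isolate the transport behind a small interface, so domain code never imports a vendor SDK directly. The sketch below assumes hypothetical names (`EventPublisher`, `InMemoryPublisher`, `OrderService`, and the `orders.placed` topic); a Kafka-backed or platform-backed adapter would implement the same `publish` method.

```python
from typing import Protocol


class EventPublisher(Protocol):
    """The only transport surface the domain code is allowed to see."""

    def publish(self, topic: str, payload: dict) -> None: ...


class InMemoryPublisher:
    """Stand-in adapter for tests; a vendor adapter would replace it."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, dict]] = []

    def publish(self, topic: str, payload: dict) -> None:
        self.sent.append((topic, payload))


class OrderService:
    """Domain logic depends on the interface, not on a specific broker."""

    def __init__(self, publisher: EventPublisher) -> None:
        self._publisher = publisher

    def place_order(self, order_id: str, total: float) -> None:
        self._publisher.publish(
            "orders.placed", {"order_id": order_id, "total": total}
        )


publisher = InMemoryPublisher()
OrderService(publisher).place_order("o-1", 9.5)
print(publisher.sent)
```

Switching vendors then means writing one new adapter rather than touching every service that emits events.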

9) Team skills and hiring market

Choose the path your team can support at 02:00. If you have strong Java or Node.js engineers comfortable with distributed systems, custom solutions are viable. If your organization benefits from admin-friendly tooling, low-code mappings, and centralized governance, platforms reduce the skills barrier and spread ownership beyond a few experts.

10) Roadmap and ecosystem longevity

Evaluate vendor roadmaps, community health, release cadence, and deprecation history. Prefer platforms with active ecosystems, long-term support, and clear migration paths. For custom middleware, commit to internal roadmaps, documentation, and upgrade budgets to avoid bit-rot.

Decision approach: build a weighted scorecard across these dimensions, then run a two-week spike integrating one real system of record and one system of engagement end-to-end. Measure latency, throughput, error handling, developer effort, and governance fit. Use the results to validate or falsify the scorecard.
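The weighted scorecard can be as small as the sketch below. The weights and the 1 to 5 scores are invented for illustration, and only three of the ten dimensions are shown for brevity; the function name is also an assumption of this example.

```python
def weighted_score(weights, scores):
    """Weighted average of per-dimension scores (same scale as the scores)."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[d] * scores[d] for d in weights) / total_weight


# Hypothetical weights (importance) and scores (1 = poor, 5 = excellent).
weights = {"latency": 5, "time_to_value": 3, "governance": 2}
platform_scores = {"latency": 2, "time_to_value": 5, "governance": 5}
custom_scores = {"latency": 5, "time_to_value": 2, "governance": 3}

print(weighted_score(weights, platform_scores))  # 3.5
print(weighted_score(weights, custom_scores))    # 3.7
```

With results this close, the outcome depends on the weights, which is a reason to agree on them before scoring and to let the spike measurements overrule the initial estimates.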

Typical outcomes:

Choose a platform when you need many connectors, strong API governance, quick delivery, centralized policies, and predictable support.
Choose custom when the workload is performance-critical, the integration surface is narrow and well understood, lock-in is a major risk, or cost must scale linearly with actual use.
Hybrid is common: Kafka for the event backbone, an API gateway for north–south traffic, and thin custom microservices for domain logic and specialized adapters.

Table

Aspect	Off-the-Shelf Platform (MuleSoft / WSO2 / Kafka)	Custom Middleware	Trade-off
Time-to-Value	Fast via connectors, visual mapping, policies	Build adapters and tools	Speed vs engineering lift
Performance	Adequate latency for most workloads; Kafka excels at throughput	Tailored for ultra-low latency	Generic scale vs bespoke speed
Governance & Security	Built-in RBAC, policies, auditing	Must implement and audit	Convenience vs control
TCO	Licenses/support but fewer incidents	No licenses; higher build/ops	Spend money vs spend time
Lock-in & Portability	Risk of proprietary features	Portable by design	Ease vs independence
Skills & Ops	Admin-friendly, vendor support	Deep distributed-systems and SRE expertise	Lower bar vs expert team
Data Evolution	Schema registry, mappings	Full control, higher discipline	Guardrails vs flexibility

Common Mistakes

Picking a platform for one simple use case, then paying license and ops overhead forever.
Building custom middleware to “save money,” ignoring SRE, observability, and error-handling costs.
Ignoring latency budgets and back-pressure; discovering the ESB adds unacceptable hops.
Underestimating schema evolution and idempotency requirements; brittle flows break on replays.
Treating governance as an afterthought; audits become painful.
Locking into proprietary connectors without an exit plan.
Skipping a real pilot; relying on vendor demos or internal slideware.
Not budgeting for training, documentation, and incident run-books whichever path you choose.

Sample Answers

Junior:
“I compare requirements to platform features. If we need many third-party connectors and policy enforcement fast, I choose a platform. If latency is critical and we own both ends, I consider a small custom service with clear retries and monitoring.”

Mid:
“I create a scorecard: patterns, latency, throughput, governance, cost, and skills. I run a spike integrating one real system and measure p95 latency, build effort, and error handling. Platforms win for breadth and governance; custom wins for tight latency and narrow scope.”

Senior:
“I recommend a hybrid: Kafka as an event backbone for durability and replay, an API gateway for governance, and custom microservices for domain adapters. I model three-year TCO, assess lock-in, and define exit strategies. Decision criteria include latency budgets, SLA targets, schema evolution, and audit readiness, validated by a two-week pilot.”

Evaluation Criteria

Evaluate the candidate’s ability to:

Translate business and technical needs into middleware patterns.
Compare platform versus custom across latency, throughput, governance, TCO, lock-in, and team skills.
Propose a pilot with measurable success metrics.
Address reliability features: retries, idempotency, DLQs, back-pressure, and observability.
Outline security and compliance: RBAC, policy enforcement, secrets, audit trails.
Present a pragmatic hybrid architecture and an exit strategy.
Red flags: tool absolutism, ignoring cost and ops, hand-waving on SLAs, or no plan for schema evolution and governance.

Preparation Tips

List your integration patterns and latency budgets before looking at tools.
Build a small spike: one API mediation flow and one event flow on a platform and with custom code; measure p95 and developer hours.
Learn Kafka basics (topics, partitions, consumer groups, schema registry) and API gateway policies.
Practice designing idempotent handlers with retries, timeouts, and dead-letter queues.
Create a TCO model: licenses, cloud costs, support, staff, incident budget.
Draft an exit plan from any vendor using open formats and adapter isolation.
Prepare an audit checklist: authentication, authorization, secrets, logging, data retention.

Real-world Context

A fintech needed to integrate eight external systems under strict audits. MuleSoft delivered connectors, policies, and reporting within one quarter, reducing audit effort by forty percent. Later, a real-time pricing path required sub-ten-millisecond p99. The team built a thin custom middleware with Kafka for transport and bespoke in-memory caches, meeting latency while keeping governance at the edge gateway.

An e-commerce company migrated from an aging ESB to Kafka plus WSO2 for APIs, retaining a few custom adapters for legacy sockets. The hybrid model balanced speed, cost, and compliance while avoiding single-vendor lock-in.

Key Takeaways

Match requirements to patterns; validate with a real pilot.
Platforms accelerate breadth, governance, and connectors; custom excels in specialized performance and portability.
Model three-year total cost of ownership, not just licenses.
Plan for observability, retries, idempotency, and schema evolution from day one.
Hybrid architectures are common and effective when boundaries are clear and exit strategies exist.

Practice Exercise

Scenario:
You must connect Salesforce, SAP, and a legacy warehouse system to a new order platform. Regulatory audits require policy enforcement and traceability. Mobile checkout needs sub-fifty-millisecond p95 for pricing calls during peak.

Tasks:

Define requirements: patterns (API mediation, event streaming, batch), latency budgets, SLAs, governance, and compliance.
Build a two-week spike with a platform (for example MuleSoft or WSO2) to expose an order API with authentication, rate limits, and a Salesforce connector; measure developer hours, p95 latency, and policy coverage.
Build a parallel spike with Kafka for order events and a custom middleware service for pricing with strict latency; implement retries, idempotency, and a dead-letter queue.
Create a scorecard: time-to-value, performance, TCO, lock-in risk, audit readiness, and team skills.
Propose an architecture: platform for north–south governance and connectors, Kafka for event backbone, custom pricing path for low latency.
Document an exit strategy and an observability plan: tracing, metrics, schema registry, policy audits.

Deliverable:
A decision memo with scorecard results, a recommended hybrid architecture, a three-year TCO model, and a rollout plan that balances speed, compliance, cost, and long-term flexibility.
