When to serve a model with an API vs a batch job

Technology, Automation
Not every model needs a real-time endpoint. Here is how to choose between online inference and scheduled batch scoring without overbuilding.

Most model work lands in one of two shapes: something answers when a request arrives (an API), or something runs on a schedule or after a file lands (a batch job). Both are valid. The mistake is picking the wrong one because “real-time” sounds more serious.

What an API is good for

An API path fits when a human or another system is waiting. Someone loads a screen, a service needs a score before the next step, or latency on the order of milliseconds to a few seconds is part of the product. You care about consistent preprocessing on that request, timeouts, and what happens when the model or dependencies fail mid-call.
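
The request path above can be sketched as a single scoring function. This is a minimal illustration, not a production server: the field names, the `MODEL_VERSION` tag, and the toy `predict` are all made up for the example, and the deadline check is a stand-in for a real timeout mechanism.

```python
import time

MODEL_VERSION = "2026-01-churn-v3"  # hypothetical version tag

def preprocess(payload: dict) -> list[float]:
    # The same transform must run on every request; drifting from the
    # training-time preprocessing is a classic online-serving bug.
    return [float(payload.get("tenure_months", 0)),
            float(payload.get("monthly_spend", 0.0))]

def predict(features: list[float]) -> float:
    # Stand-in for a real model call (e.g. model.predict_proba).
    return min(1.0, 0.1 + 0.02 * features[0])

def score_request(payload: dict, deadline_s: float = 0.5) -> dict:
    start = time.monotonic()
    try:
        score = predict(preprocess(payload))
        status = "ok"
    except Exception:
        # Fail with an explicit sentinel the caller can branch on,
        # rather than a bare 500 mid-call.
        score, status = None, "model_error"
    if time.monotonic() - start > deadline_s:
        status = "timeout"
    return {"score": score, "status": status, "model_version": MODEL_VERSION}
```

The point is the shape: every response carries a status and a model version, so the caller always knows what happened, even on failure.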

What batch scoring is good for

Batch fits when data shows up in chunks: nightly exports, hourly syncs, or “run this file when it is ready.” You write one row of outputs per input row (or per entity), stamp model version and run time, and keep the run repeatable. Nobody is staring at the browser; they need a ledger they can audit tomorrow.
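
A batch run with that shape might look like the sketch below: one output row per input row, each stamped with a model version, run ID, and run timestamp. The column names and the toy scoring rule are assumptions for illustration; a real pipeline would write to a table or file instead of returning a list.

```python
import datetime

MODEL_VERSION = "2026-01-churn-v3"  # hypothetical version tag

def score(row: dict) -> float:
    # Stand-in for the real model; assumes an "events" column exists.
    return min(1.0, 0.05 * float(row["events"]))

def run_batch(rows: list[dict], run_id: str) -> list[dict]:
    # One timestamp per run, so every row in the ledger traces back
    # to the same job execution.
    run_ts = datetime.datetime.now(datetime.timezone.utc).isoformat()
    return [
        {
            "entity_id": row["entity_id"],
            "score": score(row),
            "model_version": MODEL_VERSION,
            "run_id": run_id,
            "run_ts": run_ts,
        }
        for row in rows
    ]
```

Because the run ID and timestamp live on every row, re-running the job after a fix produces a new, distinguishable set of outputs instead of silently overwriting history.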

Latency and when data arrives

If the business decision only happens after a file is complete, batch is usually simpler than forcing a long-running API. If the decision is per click or per session, an API (or a queue worker that feels like an API to the caller) is the natural shape.

Audit trail and repeatability

Batch runs often make audit easier: one job ID, one input snapshot, one output table. APIs can log too, but you have to design it. If regulators or internal finance ask “what did the model say on Tuesday?”, batch artefacts answer that question bluntly.

Operational ownership

APIs need uptime, scaling, and safe rollbacks. Batch needs scheduling, retries, and clear failure alerts. Smaller teams sometimes ship batch first because the blast radius of a bad deploy is easier to contain—you fix the job and re-run.
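
The batch side of that ownership story, retries plus a clear alert on final failure, can be sketched in a few lines. The `alert` hook is a placeholder for whatever pager or chat integration the team actually uses.

```python
import time

alerts: list[str] = []  # stand-in for a real pager/Slack hook

def alert(message: str) -> None:
    alerts.append(message)

def run_with_retries(job, max_attempts: int = 3, backoff_s: float = 0.0):
    """Run a batch job, retrying on failure; alert a human if all attempts fail."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:
            if attempt == max_attempts:
                # Final failure: page someone instead of retrying forever.
                alert(f"batch job failed after {attempt} attempts: {exc}")
                raise
            time.sleep(backoff_s * attempt)  # simple linear backoff
```

This is the "smaller blast radius" in code: a failed run raises, someone gets alerted, and the fix is to re-run the job, not to roll back a live endpoint.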

Why not everything should be real-time

Real-time adds moving parts. If no one needs an answer in seconds, you pay for complexity without buying a better decision. Many use cases—risk flags for the next morning’s review, lead scoring for a weekly campaign, inventory hints for a daily planning sheet—are perfectly honest batch problems.

A simple checklist
  • Does a person or system block on the score in seconds? If yes, lean API (or async job with a poll). If no, batch is on the table.
  • Does the input arrive as a stable file or batch export? Strong signal for batch.
  • Do you need a row-level audit trail for a past run? Batch tables and run logs are a good fit.
  • Can you tolerate a short delay between new data and a new score? If yes, batch is often enough.
  • Who owns on-call for this path? If that team is thin, prefer the simpler shape.
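
The checklist above can be folded into a small decision helper. The weighting is my own simplification, not a formula from anywhere: a blocking caller forces the API shape, and otherwise a couple of batch signals are enough to prefer batch.

```python
def recommend_shape(blocks_in_seconds: bool,
                    input_is_batch_file: bool,
                    needs_row_audit: bool,
                    delay_ok: bool,
                    team_is_thin: bool) -> str:
    """Map the checklist answers to a serving shape: 'api', 'batch', or 'either'."""
    if blocks_in_seconds:
        # A caller waiting on the score settles it.
        return "api"
    batch_signals = sum([input_is_batch_file, needs_row_audit, delay_ok])
    if batch_signals >= 2 or team_is_thin:
        return "batch"
    return "either"
```

Treat the output as a starting point for the conversation, not a verdict; the checklist questions matter more than the tally.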

I keep runnable examples of both patterns on the Portfolio page—one churn-style serving surface and one batch scoring pipeline—so you can compare how each is scoped, not just how slides describe them. For fixed-scope work that picks one path and ships it, Services spells out how I structure those engagements.

Both API and batch can be the right answer. The goal is to match the business rhythm, not to impress with real-time by default.

© 2026 Vahdettin Karataş. All rights reserved.
Applied ML systems, APIs, and practical automation.