One feature table for training and scoring

Technology, Data Integration
If training and production scoring use different column logic, the model’s inputs quietly drift. Here is why one shared feature build beats ad-hoc spreadsheets for serious ML.

A model only sees the numbers you feed it. If those numbers are built one way in a notebook and another way in the weekly scoring job, you do not have “the same model”—you have two pipelines that happen to share a file name. That mismatch is one of the most common sources of silent quality loss after a model ships.

Why ad-hoc feature logic breaks trust

Spreadsheets and one-off scripts are fast for exploration. They are also hard to diff: someone adds a column, rounds differently, or filters rows in scoring but not in training. The model still “runs.” The business only notices when predictions feel wrong and nobody can explain which step diverged.

Why shared definitions matter

Features are not just column names—they are rules: how you handle nulls, caps, date math, joins, and category encoding. When those rules live in one place and compile into a single table (or artefact) that both training and scoring consume, you remove a whole class of arguments. Either the build passes validation or it fails loudly before anyone trusts the output.
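As a minimal sketch of what “rules in one place” can look like in Python: a small registry of named transforms that both training and scoring import. The column names (`age`, `income`, `signup_date`, `asof_date`) and the specific rules are assumptions for illustration, not a real schema.

```python
import pandas as pd

# Hypothetical registry: each feature is a named rule (null handling,
# caps, date math), not just a column. Training and scoring both
# import this one module, so the rules cannot silently diverge.
FEATURE_DEFS = {
    "age_capped": lambda df: df["age"].clip(upper=90),        # cap rule
    "income_filled": lambda df: df["income"].fillna(0.0),     # null rule
    "days_since_signup": lambda df: (
        df["asof_date"] - df["signup_date"]
    ).dt.days,                                                # date math
}

def build_features(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply every registered rule and return the feature table."""
    out = pd.DataFrame(index=raw.index)
    for name, rule in FEATURE_DEFS.items():
        out[name] = rule(raw)
    return out
```

Because the registry is a plain dictionary, adding or changing a rule is a reviewable diff in one file rather than an edit buried in a notebook.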

What a minimal compile step gives you

You do not need a vendor platform to get value. A batch compile step can mean: read raw extracts, apply a versioned definition list, write a deterministic feature table, and validate schema and sanity checks. Training reads that table. Scoring runs the same compile on fresh raw data and feeds the model. The “contract” is the built artefact, not tribal knowledge in three notebooks.
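The compile step above can be sketched in a few dozen lines. This is a toy version under assumed column names (`age`, `income`) and with CSV as the artefact format for simplicity (in practice Parquet or similar would be a common choice); the point is the shape: one build function, one validation gate, one written artefact.

```python
import pandas as pd

def build_features(raw: pd.DataFrame) -> pd.DataFrame:
    """The versioned transform: every rule lives here, nowhere else."""
    return pd.DataFrame({
        "age_capped": raw["age"].clip(upper=90),
        "income_filled": raw["income"].fillna(0.0),
    })

EXPECTED = {"age_capped", "income_filled"}

def validate(features: pd.DataFrame) -> None:
    """Schema and sanity checks: pass, or fail loudly before
    anyone trusts the output."""
    missing = EXPECTED - set(features.columns)
    if missing:
        raise ValueError(f"missing feature columns: {missing}")
    if features.isna().any().any():
        raise ValueError("null values survived the build")
    if (features["age_capped"] > 90).any():
        raise ValueError("cap rule violated")

def compile_features(raw: pd.DataFrame, out_path: str) -> pd.DataFrame:
    """Read raw -> apply definitions -> validate -> write the artefact."""
    features = build_features(raw)
    validate(features)
    features.to_csv(out_path, index=False)  # the contract both sides read
    return features
```

Training reads the written file; scoring calls `compile_features` on fresh raw extracts. Either path goes through the same validation, so a broken build stops before predictions do.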

Reducing train–score mismatch

When something breaks, you compare inputs to the same transform code path, not “Sarah’s Excel” versus “the Python job.” Drift you see in monitoring is easier to interpret because the feature layer is stable; what changed is usually data or the model, not a hidden formula edit.
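One way to make that interpretation concrete is a per-feature comparison between the training table and a scoring batch, both produced by the same build. This is a hypothetical helper, not a monitoring product; with the feature layer held stable, any shift it reports is a data change, not a formula edit.

```python
import pandas as pd

def drift_report(train_feats: pd.DataFrame,
                 score_feats: pd.DataFrame) -> dict:
    """Mean shift per numeric feature between the training table and a
    scoring batch. Both frames must come from the same build, so a
    nonzero shift points at the data, not at hidden transform drift."""
    report = {}
    for col in train_feats.columns:
        if col in score_feats.columns:
            report[col] = score_feats[col].mean() - train_feats[col].mean()
    return report
```

Mean shift is deliberately crude; the same scaffold works with quantiles or population-stability metrics once the inputs are trustworthy.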

This is the same idea behind the feature-store-style project on Portfolio: a small, explicit registry and a batch build that training and scoring both read—without pretending to be an enterprise product. On Services, I scope similar work when teams need one honest handover for how features are defined and rebuilt each run.

You can still use spreadsheets upstream for business input. The line worth drawing is: before the model sees data, there should be one build you trust—and one place definitions change.

© 2026 Vahdettin Karataş. All rights reserved.
Applied ML systems, APIs, and practical automation.