Portfolio

Machine learning engineering
I work on the engineering around models, not only notebook experiments. These projects span serving, batch scoring, RAG, keeping features consistent from training to scoring, and monitoring data and predictions over time.
  • Train and serve with the same assumptions: evaluation, thresholds, preprocessing, and what belongs in an API versus a scheduled job
  • Retrieval, embeddings, and LLMs as a service: latency, sources, and the boring parts that keep answers grounded
  • Features and monitoring: one place definitions live, drift and quality checks, outputs that are easy to re-run and compare

I care about applied ML that holds up after deployment: measuring performance, matching inference to training, and choosing batch versus online paths deliberately. These repositories exercise full-system work: reliable, observable model and retrieval stacks. Session-style demos live on the Data tools page.

GitHub profile: opens everything public on my account. On this page, each project card links to its repo and live demo in the card footer.

Projects in this track

Customer churn prediction system

Helps you see which customers are most at risk of leaving so you can intervene early.

  • FastAPI
  • scikit-learn
  • Classification
  • Joblib

Serves per-account churn probability, tier, and routing flag from one pinned model and cutoff. Inference uses a single frozen artifact; notebook workflows stay off the live request path.

An inference-only deploy: the artifact and API run on a teaching dataset for train/serve study. Retraining, drift, and production monitoring stay explicitly out of scope; the repo states that boundary for reviewers.
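
A minimal sketch of that serving shape, for readers who want the pattern in code. Everything here is illustrative: model.joblib, the CUTOFF value, and the Account fields stand in for the repo's actual artifact, threshold, and schema.

    # Illustrative sketch: one frozen artifact, one pinned cutoff, no notebooks in the path.
    from fastapi import FastAPI
    from pydantic import BaseModel
    import joblib
    import pandas as pd

    app = FastAPI()
    model = joblib.load("model.joblib")  # hypothetical frozen artifact, loaded once at startup
    CUTOFF = 0.42                        # placeholder for the pinned decision threshold

    class Account(BaseModel):
        tenure_months: int
        monthly_spend: float

    @app.post("/score")
    def score(account: Account):
        X = pd.DataFrame([account.dict()])           # preprocessing lives inside the fitted pipeline
        proba = float(model.predict_proba(X)[0, 1])
        return {
            "churn_probability": proba,
            "tier": "high" if proba >= CUTOFF else "low",  # tier derived from the pinned cutoff
            "route_to_retention": proba >= CUTOFF,         # routing flag for downstream action
        }

Keeping the artifact and cutoff at module scope is what keeps notebook workflows off the request path: nothing is re-fit or re-tuned when a request arrives.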

Batch scoring pipeline

Scores many records in one run on a schedule, with the same rules each time instead of one-off checks.

  • Python
  • Batch
  • pandas
  • pytest
  • joblib

The batch job scores CSV rows with training-aligned preprocessing and writes score, label, model version, and timestamp for each row. Built for scheduled runs, with deterministic row output and explicit failure exits.

The live URL is a static I/O showcase; scoring runs locally or in your own jobs. Extras include an optional JSON manifest, a non-zero exit on failure, and a notebook path to audit rows end to end.
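
A rough sketch of that scoring loop, under stated assumptions: the file paths, the 0.5 label threshold, and MODEL_VERSION below are placeholders rather than the pipeline's real settings.

    # Illustrative batch scorer: deterministic row output, explicit failure exit.
    import sys
    from datetime import datetime, timezone

    import joblib
    import pandas as pd

    MODEL_VERSION = "2024.1"  # hypothetical pinned version string

    def main(in_csv: str, out_csv: str) -> int:
        try:
            df = pd.read_csv(in_csv)
            model = joblib.load("model.joblib")          # same frozen artifact used in training
            df["score"] = model.predict_proba(df)[:, 1]  # training-aligned preprocessing sits in the pipeline
            df["label"] = (df["score"] >= 0.5).astype(int)
            df["model_version"] = MODEL_VERSION
            df["scored_at"] = datetime.now(timezone.utc).isoformat()
            df.to_csv(out_csv, index=False)
            return 0
        except Exception as exc:
            print(f"scoring failed: {exc}", file=sys.stderr)
            return 1  # non-zero exit so a scheduler can flag the run

    if __name__ == "__main__":
        sys.exit(main(sys.argv[1], sys.argv[2]))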

RAG document intelligence QA

Answers questions from your own documents in plain language and points to the exact source for each answer.

  • FastAPI
  • FAISS
  • sentence-transformers
  • Docker
  • OpenAI / Ollama

The ingest step chunks documents for retrieval; the service then answers questions using retrieved context and a hosted model. Each reply cites document, page, and chunk identifiers so answers stay traceable.

The public page wires try-it calls to the deployed API. Hardening covers TLS via Caddy, a GET /health endpoint, an optional API key, rate limits, a retrieval score floor, and CORS. A local Streamlit UI remains available when you do not want to hit the public API.
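
A small sketch of the retrieval-with-citations step. The chunk schema, embedding model name, and score floor below are illustrative; the repo's actual index layout and metadata fields may differ.

    # Illustrative retrieval: normalized embeddings, a score floor, and traceable citations.
    import faiss
    import numpy as np
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model

    chunks = [  # toy corpus; real ingestion would chunk whole documents
        {"doc": "handbook.pdf", "page": 3, "chunk_id": "c-003", "text": "Refunds take 5 business days."},
        {"doc": "handbook.pdf", "page": 7, "chunk_id": "c-011", "text": "Support hours are 9am to 5pm."},
    ]
    vectors = encoder.encode([c["text"] for c in chunks], normalize_embeddings=True)
    index = faiss.IndexFlatIP(vectors.shape[1])  # inner product == cosine on normalized vectors
    index.add(np.asarray(vectors, dtype="float32"))

    SCORE_FLOOR = 0.3  # placeholder floor: drop weak matches instead of letting the model guess

    def retrieve(question: str, k: int = 2):
        q = encoder.encode([question], normalize_embeddings=True)
        scores, ids = index.search(np.asarray(q, dtype="float32"), k)
        hits = []
        for score, i in zip(scores[0], ids[0]):
            if i >= 0 and score >= SCORE_FLOOR:
                c = chunks[i]
                # each hit keeps document, page, and chunk identifiers for the citation
                hits.append({"citation": (c["doc"], c["page"], c["chunk_id"]),
                             "text": c["text"], "score": float(score)})
        return hits

    print(retrieve("How long do refunds take?"))

The retrieved hits, citations included, are then packed into the prompt for the hosted model, which is what lets every answer point back to its source chunk.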

Feature store (mini)

Keeps training and day-to-day scoring aligned on the same field definitions so live use matches what the model was trained on.

  • pandas
  • pytest
  • FastAPI
  • Batch pipeline

A batch compile step turns raw extracts into a versioned feature table under locked column specs. Training and scoring both load the same built table, so transforms stay consistent from fit to score.

The CLI builds from bundled synthetic data or custom raw extracts; strict mode rejects required columns that are entirely null. Validation covers duplicates, schema, and sanity checks; an optional FastAPI layer adds an HTML demo, a JSON catalog, sample downloads, and POST /demo/transform with structured 422 responses. CI runs pytest and smoke tests on Python 3.11–3.12.
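
A compact sketch of the locked-spec idea. FEATURE_SPEC, REQUIRED, and the checks below are invented for illustration; the repo's real spec and validation rules live in its own config.

    # Illustrative compile step: pinned columns and dtypes, strict-mode null check,
    # plus the duplicate check from the validation pass.
    import pandas as pd

    FEATURE_SPEC = {  # hypothetical locked column spec shared by training and scoring
        "customer_id": "int64",
        "tenure_months": "int64",
        "avg_monthly_spend": "float64",
    }
    REQUIRED = ["customer_id", "tenure_months"]

    def compile_features(raw: pd.DataFrame, strict: bool = True) -> pd.DataFrame:
        missing = set(FEATURE_SPEC) - set(raw.columns)
        if missing:
            raise ValueError(f"raw extract missing columns: {sorted(missing)}")
        if strict:
            all_null = [c for c in REQUIRED if raw[c].isna().all()]
            if all_null:
                raise ValueError(f"strict mode: required columns all null: {all_null}")
        built = raw[list(FEATURE_SPEC)].astype(FEATURE_SPEC)  # lock column order and dtypes
        if built["customer_id"].duplicated().any():
            raise ValueError("duplicate customer_id rows in compiled table")
        return built

Because training and scoring both call the same compile step against the same spec, a column cannot silently change type or meaning between fit and score.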

ML monitoring & data quality

Highlights when new data or model outputs start to look unlike the baseline, before bad decisions stack up.

  • pandas
  • pytest
  • Streamlit
  • PSI / KS
  • JSON schema

Validates each scored batch against a fixed reference and surfaces shifts in inputs, categories, and prediction patterns. Emits batch-level reports and structured artifacts; retraining and online inference stay out of scope here.

The simulation emits a baseline plus current_batch_*.csv files; one command regenerates the HTML report and drift JSON. The public Streamlit app is a viewer over results; the heavy reports come from the pipeline run. Scope stops at batch files: no live inference API, feature store, Kafka, or per-request logging.
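
A back-of-the-envelope sketch of a PSI check against a fixed reference. The bin count and the 0.2 alert threshold are common conventions, not necessarily this repo's settings.

    # Illustrative PSI: compare a new batch's distribution to the frozen baseline.
    import numpy as np

    def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
        """Population Stability Index over quantile bins of the reference."""
        edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
        edges[0], edges[-1] = -np.inf, np.inf          # catch out-of-range values
        ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
        cur_pct = np.histogram(current, bins=edges)[0] / len(current)
        ref_pct = np.clip(ref_pct, 1e-6, None)         # avoid log(0) on empty bins
        cur_pct = np.clip(cur_pct, 1e-6, None)
        return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

    rng = np.random.default_rng(0)
    baseline = rng.normal(0, 1, 5_000)
    drifted = rng.normal(0.5, 1, 5_000)
    print(psi(baseline, drifted))  # values above ~0.2 usually warrant investigation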

This page is the portfolio index for these five projects: each card carries its GitHub repository and live-demo URL in one place. I update links when a repo moves or a deploy URL changes.

© 2026 Vahdettin Karataş. All rights reserved.
Applied ML systems, APIs, and practical automation.