Data tools

Interactive data systems

Small systems built around validation, explicit data limits, and reproducible exports, not undocumented notebook logic. Streamlit supplies the interactive browser session; each repo still owns its tests, config, and deploy behavior. The wider portfolio emphasizes batch and API-heavy ML work; here the footprint is smaller but the discipline is the same.

Schema checks, row caps, and structured reject paths so bad inputs fail loudly instead of polluting downstream numbers.
Pytest-backed pipelines and pinned assumptions where they matter; exports match what ran on the server.
Optional LLM steps only see metrics and aggregates—never row-level free text—so scope stays reviewable.
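The first habit can be illustrated with a minimal stdlib sketch. It assumes a hypothetical two-column schema (`order_id` as int, `amount` as float) and a hypothetical `MAX_ROWS` cap; the apps themselves use pandas and their own config values. A schema mismatch or an over-cap file raises immediately, and rows that fail type coercion go to a reject list with a reason instead of being silently dropped.

```python
import csv
import io

# Hypothetical schema and cap for illustration; real limits live in each app's config.
SCHEMA = {"order_id": int, "amount": float}
MAX_ROWS = 1000


class ValidationError(ValueError):
    """Raised when input breaks a hard limit (schema or row cap)."""


def validate(text: str):
    reader = csv.DictReader(io.StringIO(text))
    missing = set(SCHEMA) - set(reader.fieldnames or [])
    if missing:
        # Fail loudly on a schema mismatch instead of guessing columns.
        raise ValidationError(f"missing columns: {sorted(missing)}")
    accepted, rejected = [], []
    # Line 1 is the header; assumes no quoted fields with embedded newlines.
    for line_no, row in enumerate(reader, start=2):
        if len(accepted) + len(rejected) >= MAX_ROWS:
            raise ValidationError(f"more than {MAX_ROWS} data rows")
        try:
            accepted.append({col: cast(row[col]) for col, cast in SCHEMA.items()})
        except (TypeError, ValueError) as exc:
            # Structured reject: keep the raw row, its line number, and the reason.
            rejected.append({"line": line_no, "row": row, "reason": str(exc)})
    return accepted, rejected
```

For example, `validate("order_id,amount\n1,9.5\n2,oops\n")` accepts the first data row and rejects line 3 with the `float()` error message, so the bad value stays visible for review.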

Each repo contains tests and deploy notes; Services is the short pointer for contracted work.

GitHub profile: opens every public repository on my account. On this page, each project card's footer links to its own repo and live demo.

Apps in this track

Data cleaning toolkit

Streamlit
pandas
pytest
Parquet / JSON
Docker

An upstream prerequisite, not a side utility: downstream models and dashboards ingest the same reviewed tables. Multi-format inputs, capped near 100K rows, become auditable CSV/Parquet/JSON exports with an HTML step log and before/after views. Rules cover bad formats, duplicates, skewed categories, and optional outliers, and bundled samples support dry runs.
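One way to picture the step log is that each rule reports how many rows it touched. The sketch below is stdlib-only and assumes rows are dicts of strings; the two rules (exact-duplicate removal and date normalization) and all function names are hypothetical stand-ins, since the toolkit itself works on pandas DataFrames and renders its log to HTML.

```python
from datetime import datetime


def drop_duplicates(rows):
    """Remove exact duplicate rows, keeping first occurrences."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept, len(rows) - len(kept)


def normalize_dates(rows, column="date", formats=("%Y-%m-%d", "%d/%m/%Y")):
    """Rewrite known date formats to ISO; unparseable values become None (flagged, not guessed)."""
    changed, out = 0, []
    for row in rows:
        value, parsed = row.get(column), None
        for fmt in formats:
            try:
                parsed = datetime.strptime(value, fmt).date().isoformat()
                break
            except (TypeError, ValueError):
                continue
        if parsed != value:
            changed += 1
        out.append({**row, column: parsed})
    return out, changed


def run_steps(rows, steps):
    """Apply each step in order and record a log entry the report can render."""
    log = []
    for step in steps:
        before = len(rows)
        rows, touched = step(rows)
        log.append({"step": step.__name__, "rows_before": before,
                    "rows_after": len(rows), "touched": touched})
    return rows, log
```

Keeping the log as plain records makes the before/after story easy to test with pytest and easy to render later.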

JSON flattening stays one level deep by design. Pairs with the EDA report generator on this page (profile there, fix here). The deployed demo mirrors the repository's limits and validation logic.
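What "one level" means can be shown with a short stdlib sketch (the helper name is hypothetical, not the toolkit's code): keys of a nested object are lifted into dotted column names once, and anything nested deeper stays as a single value for review. In pandas, `pd.json_normalize(records, max_level=1)` yields a similar column shape.

```python
def flatten_one_level(record: dict, sep: str = ".") -> dict:
    """Lift keys of nested dicts one level up; deeper nesting is kept whole."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            for child_key, child_value in value.items():
                # A grandchild dict is stored as-is rather than flattened further.
                flat[f"{key}{sep}{child_key}"] = child_value
        else:
            flat[key] = value
    return flat
```

Stopping at one level keeps column names predictable; deeply nested payloads show up as obvious dict-valued cells instead of exploding into many sparse columns.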

GitHub · Live demo

EDA report generator

Streamlit
pandas
pytest
Jinja2
WeasyPrint

A read-only dossier you can archive or forward, with no cell edits. It combines capped sampling with a sheet picker, full per-column profiling, correlations, histograms, and warnings, all rendered to HTML in memory, with optional PDF export when WeasyPrint is available. Single-column junk files fail fast.
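A minimal stdlib sketch of the profiling idea, under stated assumptions: `MAX_SAMPLE_ROWS` is a hypothetical cap, taking the first N rows stands in for the app's capped sampling, and the per-column fields are illustrative rather than the app's full column report. The single-column check mirrors the fail-fast rule above.

```python
import csv
import io
import statistics

MAX_SAMPLE_ROWS = 5000  # hypothetical cap; the app's real cap lives in its config


def profile_csv(text: str) -> dict:
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    if len(header) < 2:
        # A one-column file is usually a mis-delimited or junk upload: stop early.
        raise ValueError("need at least two columns to profile")
    rows = []
    for row in reader:
        if len(rows) >= MAX_SAMPLE_ROWS:
            break  # first-N sample as a simple stand-in for capped sampling
        rows.append(row)
    profile = {}
    for i, name in enumerate(header):
        values = [r[i] for r in rows if i < len(r) and r[i] != ""]
        numbers = []
        for v in values:
            try:
                numbers.append(float(v))
            except ValueError:
                pass
        col = {"missing": len(rows) - len(values), "distinct": len(set(values))}
        if values and len(numbers) == len(values):
            # Only fully numeric columns get summary statistics.
            col.update(mean=statistics.fmean(numbers), min=min(numbers), max=max(numbers))
        profile[name] = col
    return profile
```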

A natural companion to the cleaning toolkit (inspect here, mutate there). PDF failures surface in the UI while HTML export still succeeds. docs/DEPLOY_VPS.md covers self-hosting with parity to the repo.
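The "HTML always, PDF if possible" behavior can be sketched as follows. Stdlib `string.Template` stands in for the app's Jinja2 templates, and the template and function name are hypothetical. WeasyPrint's `HTML(string=...).write_pdf()` returns PDF bytes when no target is given; any import or rendering error is captured as a message the UI could display, and the HTML is returned either way.

```python
from html import escape
from string import Template

# Hypothetical template; stdlib Template stands in for the app's Jinja2 templates.
REPORT = Template("<html><body><h1>$title</h1><p>$summary</p></body></html>")


def export_report(title: str, summary: str) -> dict:
    # HTML is rendered in memory and returned no matter what happens to the PDF.
    page = REPORT.substitute(title=escape(title), summary=escape(summary))
    result = {"html": page, "pdf": None, "pdf_error": None}
    try:
        from weasyprint import HTML  # optional dependency

        # With no target argument, write_pdf() returns the PDF as bytes.
        result["pdf"] = HTML(string=page).write_pdf()
    except Exception as exc:  # ImportError, missing system libraries, or render errors
        result["pdf_error"] = f"PDF export unavailable: {exc}"
    return result
```

Catching broadly here is a deliberate choice: the PDF is a convenience, so its failure becomes a visible message instead of blocking the HTML export.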

GitHub · Live demo

AI-assisted data analysis

Streamlit
Plotly
Aggregates-only AI
HTML report

Exploration plus written explanation, not KPI tracking. One business CSV yields a profile, a fixed chart pack, rule-based quality hints, and an optional OpenAI narrative built only from aggregates (never raw rows), bundled as a single HTML story for readers who need context rather than a wall of metrics.

Accepts UTF-8 CSV with config-driven caps. Without an API key the run still finishes, leaving the narrative empty and explaining why. Suited to marketing, sales, and customer-behaviour tables.
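The aggregates-only boundary can be illustrated with a small stdlib sketch: rows are reduced to per-group counts, totals, and means, and only that JSON would ever be handed to the model. The function name and columns are hypothetical, and the OpenAI call itself is omitted.

```python
import json
import statistics
from collections import defaultdict


def build_llm_payload(rows: list, group_col: str, value_col: str) -> str:
    """Summarize rows into per-group aggregates; only this JSON would reach the model."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(float(row[value_col]))
    summary = {
        "row_count": len(rows),
        "groups": {
            name: {
                "count": len(vals),
                "total": round(sum(vals), 2),
                "mean": round(statistics.fmean(vals), 2),
            }
            for name, vals in sorted(groups.items())
        },
    }
    # Other columns (e.g. free-text notes) never enter the payload.
    return json.dumps(summary)
```

Because the payload is a plain JSON string, a reviewer can inspect exactly what the narrative step sees.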

GitHub · Live demo

KPI dashboard app

Streamlit
Sales & marketing CSV
Mapping presets
Optional OpenAI

Metric tracking and stability, not narrative exploration. Preset mappings and hard row gates drive KPI cards, trends, breakdowns, optional deltas, and a rule-first “what changed?” summary. The optional OpenAI step reads only pre-aggregated KPI objects, never raw CSV rows.
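A rule-first "what changed?" can be as simple as comparing two KPI snapshots against a relative threshold before any model is involved. This is a sketch under assumptions: the flat KPI dict and the 10% default threshold are hypothetical, not the app's actual rules.

```python
def what_changed(previous: dict, current: dict, threshold: float = 0.10) -> list:
    """Flag KPIs whose relative move meets `threshold` (hypothetical rule)."""
    notes = []
    for kpi, now in current.items():
        before = previous.get(kpi)
        if before is None:
            notes.append(f"{kpi}: new metric ({now})")
        elif before == 0:
            # Relative change is undefined from zero; report any absolute move.
            if now != 0:
                notes.append(f"{kpi}: moved from 0 to {now}")
        else:
            change = (now - before) / abs(before)
            if abs(change) >= threshold:
                direction = "up" if change > 0 else "down"
                notes.append(f"{kpi}: {direction} {abs(change):.0%} ({before} -> {now})")
    return notes
```

For example, `what_changed({"revenue": 1000, "orders": 50}, {"revenue": 1200, "orders": 52})` flags only revenue, since orders moved 4%.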

Exports a timestamped snapshot ZIP for handoffs. This is a narrow MVP with no warehouse connectors or enterprise RBAC; the UI honours the same guardrails documented in config.
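A timestamped snapshot can be built entirely in memory with the stdlib `zipfile` module. The sketch assumes a hypothetical file layout (`kpis.json` plus one CSV extract) and naming scheme; the app's actual bundle contents may differ.

```python
import io
import json
import zipfile
from datetime import datetime, timezone
from typing import Optional


def snapshot_zip(kpis: dict, csv_text: str, now: Optional[datetime] = None) -> tuple:
    """Bundle KPI JSON and a CSV extract into an in-memory ZIP with a UTC-stamped name."""
    now = now or datetime.now(timezone.utc)  # expects a UTC datetime when passed in
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("kpis.json", json.dumps(kpis, indent=2))
        zf.writestr("breakdown.csv", csv_text)
    return f"kpi_snapshot_{stamp}.zip", buffer.getvalue()
```

In a Streamlit app, the returned bytes and name could be passed to `st.download_button(..., data=data, file_name=name, mime="application/zip")`.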

GitHub · Live demo

Forecasting app

Streamlit
Holt-Winters
Plotly
Rolling backtest

Univariate demand forecasting with honest baselines, not neural nets. Series are resampled by day, week, or month, and Holt-Winters seasonality is used only when the data can support it. A Plotly band shows the forecast alongside MAPE and MAE against a naive baseline, and warnings and fallback reasons stay on screen for reviewers.
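Scoring against a naive baseline with a rolling backtest can be sketched in plain Python. Each fold trains on an expanding window and scores the next `horizon` points; `forecaster` is any function of `(history, horizon)`, so a Holt-Winters model (statsmodels provides one as `ExponentialSmoothing`) could be wrapped to fit that shape. Names, fold counts, and the last-value naive rule are assumptions for illustration.

```python
def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    # Skips zero actuals, where percentage error is undefined.
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs) if pairs else float("nan")


def naive(history, horizon):
    return [history[-1]] * horizon  # repeat the last observed value


def rolling_backtest(series, forecaster, horizon=1, folds=3):
    """Average MAE of `forecaster` and of the naive baseline over expanding-window folds."""
    scores = {"model": [], "naive": []}
    for k in range(folds, 0, -1):
        cut = len(series) - k * horizon  # the final fold tests the last `horizon` points
        train, test = series[:cut], series[cut:cut + horizon]
        scores["model"].append(mae(test, forecaster(train, horizon)))
        scores["naive"].append(mae(test, naive(train, horizon)))
    return {name: sum(v) / len(v) for name, v in scores.items()}
```

Reporting the naive score next to the model score is what keeps the comparison honest: a model that cannot beat "repeat the last value" is visible as such.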

Each sidebar-triggered run exports a forecast CSV plus a summary JSON. Bundled sample series and a CLI smoke test live in the repo. Promo and holiday regressors are out of scope, and the app falls back explicitly to a naive forecast when the fit is weak.
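The explicit fallback and the two exports can be sketched together. Here "weak fit" is assumed to mean either a failed fit or a backtest MAE that does not beat naive; that rule, the `method` labels, and the file layouts are hypothetical rather than taken from the repo.

```python
import csv
import io
import json


def choose_forecast(model_fc, naive_fc, model_mae, naive_mae, fit_error=None):
    """Keep the model only if it fitted and beat naive on backtest MAE; otherwise say why."""
    if fit_error is not None:
        return naive_fc, {"method": "naive", "fallback_reason": f"fit failed: {fit_error}"}
    if model_mae >= naive_mae:
        return naive_fc, {"method": "naive",
                          "fallback_reason": "model did not beat naive backtest MAE"}
    return model_fc, {"method": "holt_winters", "fallback_reason": None}


def export(dates, forecast, summary):
    """Return (forecast CSV text, summary JSON text) for download."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["date", "forecast"])
    writer.writerows(zip(dates, forecast))
    return out.getvalue(), json.dumps(summary, indent=2)
```

Writing the fallback reason into the summary JSON means the exported files explain themselves, not just the on-screen warning.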

GitHub · Live demo

Each card links to source and a public instance; Services covers engagement scope.