MLOps doesn’t need to be a megaproject. Most teams we work with have been sold a vision of a unicorn platform (feature store, vector database, model registry, online experimentation, drift monitoring, end-to-end lineage) and are three quarters into the project with nothing in production. This guide is the antidote.
What problem are you actually solving?
Before touching Kubeflow or MLflow, answer three questions:
- How often do models change? Weekly, monthly, or “once and done”?
- What’s the cost of a bad prediction? A missed recommendation and a wrong clinical decision carry very different risks.
- Who operates this after launch? Your data scientists, a platform team, or a vendor?
Your answers settle most of the architecture decisions. A team that ships a model twice a year does not need a feature store.
The minimum viable MLOps stack
Here’s the smallest stack that still qualifies as “production”:
- Version control for code and data. Git for code; DVC, LakeFS, or object-storage snapshots for data.
- Reproducible training. A single command, dockerised, that can re-run training on any host.
- A model registry. Even a folder in S3 with a manifest file beats “whichever .pkl is in ops’ Dropbox”.
- A deployment target. Cloud endpoint, container, or Kubernetes — something a CI job can push to.
- Basic monitoring. Input distribution, output distribution, latency, error rate. Dashboards, not silence.
That’s it. You can bolt on experiment tracking, feature stores, and drift detection later — once real production pain demands them.
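The “folder with a manifest file” registry can be sketched in a few dozen lines. This is a minimal illustration, not a recommended implementation: it uses a local directory in place of S3, and the function names (register_model, promote, rollback) and the manifest layout are assumptions made for this example.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def _load(manifest_path: Path) -> dict:
    """Read the manifest, or start an empty one if none exists yet."""
    if manifest_path.exists():
        return json.loads(manifest_path.read_text())
    return {"production": None, "previous": None, "models": {}}


def register_model(registry: Path, artifact: Path, trained_by: str,
                   data_snapshot: str, hyperparameters: dict) -> str:
    """Copy an artifact into the registry and record who trained it, on what."""
    registry.mkdir(parents=True, exist_ok=True)
    manifest_path = registry / "manifest.json"
    manifest = _load(manifest_path)
    payload = artifact.read_bytes()
    # Content-addressed version id: the same bytes always get the same id.
    version = hashlib.sha256(payload).hexdigest()[:12]
    filename = f"{version}{artifact.suffix}"
    (registry / filename).write_bytes(payload)
    manifest["models"][version] = {
        "file": filename,
        "trained_by": trained_by,
        "data_snapshot": data_snapshot,
        "hyperparameters": hyperparameters,
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return version


def promote(registry: Path, version: str) -> None:
    """Mark a registered version as production, remembering the old one."""
    manifest_path = registry / "manifest.json"
    manifest = _load(manifest_path)
    if version not in manifest["models"]:
        raise KeyError(f"unknown model version: {version}")
    manifest["previous"] = manifest["production"]
    manifest["production"] = version
    manifest_path.write_text(json.dumps(manifest, indent=2))


def rollback(registry: Path) -> str:
    """Swap production back to the previously promoted version."""
    manifest_path = registry / "manifest.json"
    manifest = _load(manifest_path)
    if manifest["previous"] is None:
        raise RuntimeError("no previous version to roll back to")
    manifest["production"], manifest["previous"] = (
        manifest["previous"], manifest["production"])
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest["production"]
```

Because the manifest records trainer, data snapshot, and hyperparameters for every version, the same file also gives you a basic audit trail and a rollback path without extra tooling.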
When to introduce each additional piece
Feature store
Add one when more than two models share features and you need real-time inference. Before then, a Parquet file in S3 and a Python function are enough.
Experiment tracking (MLflow, W&B)
Add when multiple people train models weekly. If one person trains quarterly, a spreadsheet is fine.
Automated retraining
Add when you’ve seen drift hurt production metrics. Not before. Premature automation is a common failure mode — models retrain on bad data and regressions ship silently.
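One way to make “drift that hurts production metrics” concrete is to pair a drift score with a KPI check, and only flag retraining when both move. The sketch below uses the Population Stability Index (PSI) as the drift score. PSI is one option among several, and the thresholds (0.25 for PSI, 0.02 for KPI drop) are illustrative assumptions, not values from this guide.

```python
import math
from typing import Sequence


def psi(reference: Sequence[float], live: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def fractions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = fractions(reference), fractions(live)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def needs_review(psi_value: float, kpi_drop: float,
                 psi_threshold: float = 0.25,   # assumed rule-of-thumb cutoff
                 kpi_threshold: float = 0.02) -> bool:  # assumed KPI tolerance
    """Flag retraining only when drift coincides with a real KPI drop."""
    return psi_value > psi_threshold and kpi_drop > kpi_threshold
```

Requiring both conditions keeps the check from triggering on harmless input shifts, and a human still decides whether to retrain.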
Shadow & canary deployments
Add as soon as the cost of shipping a bad model outweighs the engineering cost of setting these up. For customer-facing systems, this is usually immediately.
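A shadow deployment can start as a thin wrapper around the existing predict call. The sketch below is a simplified illustration: make_shadow_predictor is a hypothetical helper, models are assumed to be plain callables, and disagreements are collected in a list where a real system would write to a log or metrics store.

```python
import logging
from typing import Any, Callable, List, Tuple

logger = logging.getLogger("shadow")


def make_shadow_predictor(
    primary: Callable[[Any], Any],
    candidate: Callable[[Any], Any],
    disagreements: List[Tuple[Any, Any, Any]],
) -> Callable[[Any], Any]:
    """Serve the primary model while silently scoring a candidate alongside it."""

    def predict(features: Any) -> Any:
        result = primary(features)  # only this value reaches the caller
        try:
            shadow = candidate(features)
            if shadow != result:
                disagreements.append((features, result, shadow))
        except Exception:
            # A failing candidate must never affect the production response.
            logger.exception("shadow model failed; primary response unaffected")
        return result

    return predict
```

The candidate runs synchronously here for simplicity. If its latency matters, the shadow call would typically move to a background queue.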
Governance you can’t skip
Whatever stack you pick, you need:
- Audit trail: who trained what, on which data, with which hyperparameters, and which version is in production.
- Rollback: a documented way to revert to the previous model in under 10 minutes.
- Data access controls: least-privilege from day one, with PII handling clearly separated.
- Offline evaluation: a golden test set every candidate model must beat.
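The golden test set requirement translates directly into a CI gate. Here is a minimal sketch, assuming models are callables that return a label and that accuracy is the metric that matters. Both are assumptions for illustration, and min_gain is a hypothetical margin parameter.

```python
from typing import Any, Callable, Sequence, Tuple

Example = Tuple[Any, Any]  # (features, expected label)


def accuracy(model: Callable[[Any], Any], golden: Sequence[Example]) -> float:
    """Fraction of golden examples the model labels correctly."""
    correct = sum(1 for features, label in golden if model(features) == label)
    return correct / len(golden)


def passes_gate(candidate: Callable[[Any], Any],
                production: Callable[[Any], Any],
                golden: Sequence[Example],
                min_gain: float = 0.0) -> bool:
    """True only if the candidate strictly beats production on the golden set."""
    return accuracy(candidate, golden) > accuracy(production, golden) + min_gain
```

In CI, a failing gate would exit non-zero so the candidate never reaches the registry’s production slot.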
Common MLOps anti-patterns
- The platform of Theseus: rebuilding Kubeflow from scratch “because it’s simple”. It never stays simple.
- Untested pipelines: training code that only runs on one laptop.
- Model registries as graveyards: hundreds of artefacts, none promoted, none retired.
- Monitoring without action: drift dashboards nobody owns.
A 12-week MLOps roadmap for most teams
- Weeks 1–2: Pick a cloud target, containerise training, and set up versioned data snapshots.
- Weeks 3–4: First CI/CD pipeline to registry and endpoint.
- Weeks 5–6: Golden evaluation set and automated offline eval in CI.
- Weeks 7–8: Monitoring dashboards + alerting on business KPIs, not just accuracy.
- Weeks 9–10: Shadow deployment and A/B framework.
- Weeks 11–12: Runbook, postmortem template, handover to operations.
How we help
We run MLOps consulting engagements that compress that roadmap — we’ve seen the landmines, and we know which features can wait. If you want an honest review of your current setup and a lean path forward, get in touch.