We are seeking an experienced MLOps Platform Engineer to design, build and maintain scalable infrastructure for machine learning lifecycle management. The role requires strong platform engineering skills, a collaborative mindset and a focus on reliability, security and observability.
MLOps Platform Engineer Job Profile
The MLOps Platform Engineer will be responsible for creating robust, repeatable pipelines that take models from experimentation to production. You will work closely with data scientists, software engineers and platform teams to ensure models are reproducible, monitored and governed.
This role suits an engineer with solid cloud experience, a strong grasp of container orchestration and a practical understanding of CI/CD, infrastructure as code and model monitoring. The successful candidate will help shape platform standards and best practices across the organisation.
MLOps Platform Engineer Job Description
As an MLOps Platform Engineer, you will design and operate the core tooling that supports machine learning workloads. Your work will include architecting platforms for model training and serving, implementing automation for continuous integration and continuous delivery of models, and ensuring that infrastructure scales reliably under variable demand.
You will create reproducible workflows using infrastructure as code and integrate experiments with version control for data, features and models. A key part of the role is to implement robust monitoring and alerting for model performance and data quality so that production issues are identified and resolved promptly.
Collaboration is central to the role. You will partner with data science teams to productionise models, advise on best practice for feature stores and data pipelines, and work with security and compliance teams to embed governance into the platform. You will also evaluate and introduce new tooling that improves developer productivity and operational resilience.
MLOps Platform Engineer Duties and Responsibilities
- Design, deploy and operate scalable ML platforms on cloud or on-premises infrastructure.
- Build and maintain CI/CD pipelines for model training, testing and deployment.
- Implement infrastructure as code using tools such as Terraform or CloudFormation.
- Containerise workloads and manage orchestration using Kubernetes or an equivalent orchestrator.
- Integrate model versioning, feature stores and data lineage tooling into workflows.
- Develop monitoring, logging and alerting for model health, drift and data quality.
- Automate repeatable processes to reduce manual intervention and deployment risk.
- Collaborate with data scientists to productionise models and optimise inference costs.
- Define and enforce platform security, access controls and compliance standards.
- Perform capacity planning and cost optimisation for ML workloads.
- Provide on-call support and incident response for platform-related outages.
- Document platform architecture, runbooks and onboarding guides for stakeholders.
MLOps Platform Engineer Requirements and Qualifications
- Bachelor's degree in Computer Science, Engineering, Mathematics or a related field, or equivalent experience.
- 3+ years of experience in platform engineering or DevOps, ideally with ML production experience.
- Proficiency with cloud providers such as AWS, Azure or Google Cloud Platform.
- Strong experience with Kubernetes, Docker and container networking.
- Hands-on experience with CI/CD tooling and automation frameworks.
- Familiarity with modelling frameworks and ML tooling such as TensorFlow, PyTorch or scikit-learn.
- Experience with infrastructure as code tools such as Terraform or CloudFormation.
- Knowledge of monitoring and observability tools such as Prometheus, Grafana and the ELK stack.
- Understanding of model monitoring, A/B testing and concept drift detection techniques.
- Strong scripting skills in Python, Bash or similar languages.
- Excellent communication skills and ability to work cross-functionally.
- Desire to standardise processes, mentor colleagues and drive platform improvements.
