We are seeking a Senior ML Engineer (independent consultant) to lead the engineering lifecycle of production machine learning systems on Google Cloud Platform (GCP). The role will partner closely with Data Scientists to productionise models, scale pipelines, and implement robust monitoring to meet production-grade reliability, latency, and throughput requirements.
You will define and enforce ML engineering standards (CI/CD for ML, modular coding, testing) and build the compute and data foundation for high-volume ML workloads using Vertex AI, Airflow (Cloud Composer), and BigQuery.
Ideal talent profile: an experienced ML engineer with deep cloud infrastructure expertise, proven MLOps delivery, and strong software engineering rigor (Python, SQL, Docker, container orchestration) who can operate autonomously and drive best practices across teams.
Key Activities
- Model productionisation: Transition experimental models into reliable, low-latency production services; optimise for latency, throughput, and fault tolerance in a cloud-native environment.
- Design & operate MLOps pipelines: Build repeatable, scalable, and automated training/evaluation/deployment pipelines using GCP services (e.g., Vertex AI, Cloud Composer), with clear promotion paths from staging to production.
- Engineering standards & governance: Define and implement versioning, CI/CD for ML, modular coding standards, and comprehensive testing (unit, integration, data, and ML-specific tests).
- Infrastructure & data engineering: Provision and manage compute, storage, and data workflows to support high-volume ML workloads; ensure efficient feature engineering and reliable data retrieval with BigQuery.
- Observability & reliability: Implement monitoring and alerting for model drift, performance degradation, and system health; establish thresholds, logging, and recovery procedures for long-term stability.
- Collaboration & stakeholder engagement: Work with Data Scientists, Platform/DevOps, and Product teams to align releases, SLAs, and acceptance criteria; communicate tradeoffs and timelines clearly.
- Documentation & knowledge transfer: Produce architecture diagrams, pipeline runbooks, and operating procedures to enable maintainability and handover.
Your Background
Essential
- Senior-level experience as an ML Engineer owning end-to-end model productionisation in cloud-native environments.
- GCP expertise with hands-on proficiency in Vertex AI, Airflow (Cloud Composer), and BigQuery.
- Strong programming skills in Python and SQL; extensive experience with Docker and container orchestration (e.g., Kubernetes).
- Proven track record delivering MLOps frameworks, including CI/CD pipelines and automated model monitoring in production.
- Solid software engineering fundamentals: design patterns, API development, testing strategies, and Git-based workflows.
- Ability to collaborate with cross-functional teams and drive engineering standards and operational excellence.
Desirable
- Experience optimising latency and throughput for real-time or near-real-time ML services.
- Familiarity with data quality validation (e.g., schema checks, feature consistency) and ML-specific testing approaches.
- Exposure to feature engineering at scale and best practices for reusable feature pipelines.
- Experience building monitoring dashboards/alerting for model performance and data drift.
- Strong communication skills to explain tradeoffs, risks, and design decisions to technical and non-technical stakeholders.