Machine Learning Systems Platform Engineer
PCRecruiter - Recruitment Software & Applicant Tracking System
Job Description
Confidential Opening: Machine Learning Systems Platform Engineer
Location: San Francisco, CA (Hybrid Preferred)
Company Overview:
A stealth-mode innovator at the forefront of AI infrastructure is seeking a dynamic Machine Learning Systems Platform Engineer to build the backbone of their next-generation ML ecosystem. This team is leading the charge in developing tools and platforms that empower world-class ML teams to experiment, scale, and deploy faster than ever before.
Position Overview:
In this key engineering role, you will architect and optimize the systems that make high-performance AI development possible. From training and tuning to inference and monitoring, your work will enable cutting-edge ML initiatives across the organization. You will work closely with ML scientists and engineers to ensure seamless integration of models into production environments.
Key Responsibilities:
- Build and maintain robust infrastructure to support machine learning workloads at scale, including training pipelines, tuning environments, and deployment frameworks.
- Develop and automate MLOps pipelines for reproducibility, experiment tracking, model versioning, and validation.
- Optimize cloud and on-prem GPU compute utilization across orchestration platforms.
- Lead the implementation of tools for model rollback, observability, and system health monitoring.
- Collaborate with cross-functional teams to ensure reliability, scalability, and maintainability of ML systems.
Qualifications:
- 3+ years of experience in designing and deploying ML infrastructure or production-grade MLOps tools.
- Fluency in backend development and infrastructure engineering, especially with Python, Go, Bash, Terraform, or Helm.
- Experience with ML orchestration tools such as Kubeflow, Airflow, MLflow, Ray, or Metaflow.
- Proficiency in containerization and cloud-native technologies, including Docker, Kubernetes, Argo, or managed ML platforms like SageMaker.
- Deep understanding of cloud environments (AWS, GCP, or Azure) and GPU-accelerated workloads.
Preferred Skills:
- Exposure to distributed training techniques (FSDP, DeepSpeed, Horovod).
- Knowledge of CI/CD strategies for ML and data drift detection methods.
- Awareness of privacy, compliance, and security practices in ML systems.
- Prior experience in infrastructure-first or developer-oriented AI organizations.
Compensation and Benefits:
- Base salary range: $160,000 to $230,000 DOE
- Significant equity package and comprehensive benefits
- Opportunity to work at the core of transformative AI innovation
Why Apply?
This is a rare opportunity to own and shape the ML platform behind the AI systems that will define the next era. If you thrive on system-level problem solving and want to leave your mark on how machine learning is built at scale, this role is for you.
Apply today to learn more about this confidential opportunity and how you can play a part in the future of AI engineering.