Machine Learning Systems Platform Engineer

PCRecruiter - Recruitment Software & Applicant Tracking System

San Francisco, CA 94199

United States

Posted June 30, 2025

Valid until July 30, 2025

Job Description

Confidential Opening: Machine Learning Systems Platform Engineer

Location: San Francisco, CA (Hybrid Preferred)

Company Overview:
A stealth-mode innovator at the forefront of AI infrastructure is seeking a dynamic Machine Learning Systems Platform Engineer to build the backbone of their next-generation ML ecosystem. This team is leading the charge in developing tools and platforms that empower world-class ML teams to experiment, scale, and deploy faster than ever before.

Position Overview:
In this key engineering role, you will architect and optimize the systems that make high-performance AI development possible. From training and tuning to inference and monitoring, your work will enable cutting-edge ML initiatives across the organization. You will work closely with ML scientists and engineers to ensure seamless integration of models into production environments.

Key Responsibilities:

Build and maintain robust infrastructure to support machine learning workloads at scale, including training pipelines, tuning environments, and deployment frameworks.
Develop and automate MLOps pipelines for reproducibility, experiment tracking, model versioning, and validation.
Optimize cloud and on-prem GPU compute utilization across orchestration platforms.
Lead the implementation of tools for model rollback, observability, and system health monitoring.
Collaborate with cross-functional teams to ensure reliability, scalability, and maintainability of ML systems.

Qualifications:

3+ years of experience designing and deploying ML infrastructure or production-grade MLOps tools.
Fluency in backend development and infrastructure engineering, especially with Python, Go, Bash, Terraform, or Helm.
Experience with ML orchestration tools such as Kubeflow, Airflow, MLflow, Ray, or Metaflow.
Proficiency in containerization and cloud-native technologies, including Docker, Kubernetes, Argo, or managed ML platforms like SageMaker.
Deep understanding of cloud environments (AWS, GCP, or Azure) and GPU-accelerated workloads.

Preferred Skills:

Exposure to distributed training techniques (FSDP, DeepSpeed, Horovod).
Knowledge of CI/CD strategies for ML and data drift detection methods.
Awareness of privacy, compliance, and security practices in ML systems.
Prior experience in infrastructure-first or developer-oriented AI organizations.

Compensation and Benefits:

Base salary range: $160,000 to $230,000, depending on experience (DOE)
Significant equity package and comprehensive benefits
Opportunity to work at the core of transformative AI innovation

Why Apply?
This is a rare opportunity to own and shape the ML platform behind next-generation AI. If you thrive on system-level problem solving and want to leave your mark on how machine learning is built at scale, this role is for you.

Apply today to learn more about this confidential opportunity and how you can play a part in the future of AI engineering.

