Principal ML Infrastructure Engineer (Relocation Available)

Franklin Fitch
Location Not Specified
💰$125 – $150/hr

Job Description

Overview

AI Infrastructure Engineer (GPU Systems & Model Deployment) (Principal and entry-level openings available)

We are seeking an AI Infrastructure Engineer to design and optimize high-performance systems that enable machine learning models to run reliably and efficiently in production environments.

This role is focused on GPU-accelerated inference, low-latency model serving, and bridging the gap between research models and real-world deployment.

You will work closely with ML researchers and software engineers to ensure models are production-ready, scalable, and performant.

This is a hands-on systems role with a strong emphasis on C++, CUDA, and GPU inference optimization.

Core Responsibilities

- Design and maintain GPU-accelerated infrastructure for deploying machine learning models in production
- Build and optimize high-throughput, low-latency inference pipelines
- Develop and maintain performance-critical components in C++
- Optimize GPU utilization through CUDA programming and kernel tuning
- Support model conversion, optimization, and deployment using inference runtimes
- Partner with ML researchers to transition models from experimentation to production
- Diagnose and improve system performance relative to baseline benchmarks
- Ensure deployed systems are reliable, observable, and maintainable in production environments

Required Qualifications

- Master's or PhD required
- Strong C++ expertise with experience writing and optimizing production-grade systems
- Hands-on CUDA programming experience and GPU performance optimization
- Solid understanding of GPU architectures and memory management

Preferred / Nice-to-Have Qualifications

- Experience with TensorRT or similar GPU inference runtimes
- 1–7 years of experience as a Software Development Engineer supporting production model deployment
- Experience with model optimization, quantization, or runtime acceleration techniques
- Exposure to ML frameworks (e.g., PyTorch, TensorFlow) from a systems or deployment perspective
- Experience working with containerized environments and CI/CD pipelines

Tech Environment (Representative, Not Exhaustive)

- C++, CUDA
- GPU inference runtimes (e.g., TensorRT)
- Linux, containers, cloud or on-prem GPU systems
