Location: United States (West Coast preferred, remote considered)
About the Company
We are a rapidly growing AI company delivering large language models at scale. Our mission is to ensure models not only perform well in research but also serve real-world applications reliably and efficiently. We are looking for engineers who enjoy solving high-scale inference and systems challenges.
Role Overview
We are seeking a Senior / Staff LLM Systems Engineer to lead the development, optimization, and deployment of large language model inference pipelines. This role focuses on high-throughput, low-latency serving and production reliability, bridging ML research and platform engineering.
This is not a training-focused role: the emphasis is on serving models at scale, optimizing systems, and ensuring production ML reliability.
Responsibilities
- Design, implement, and optimize inference pipelines for large language models
- Improve throughput and latency of model serving in production environments
- Collaborate closely with infrastructure, platform, and ML research teams to ensure smooth deployment
- Build monitoring, observability, and alerting systems for inference performance and reliability
- Identify and solve scaling challenges across GPU, TPU, and multi-node distributed environments
- Evaluate and adopt new technologies, frameworks, and architectures to improve inference efficiency
- Mentor other engineers and contribute to technical strategy for production ML systems
Qualifications
- 5+ years of software engineering experience, including hands-on ML systems experience
- Strong background in distributed systems, performance tuning, and low-latency architectures
- Experience with model serving frameworks (e.g., Triton, vLLM, Ray, TorchServe)
- Familiarity with GPU/TPU infrastructure, multi-node deployment, and system-level optimization
- Understanding of ML workloads and trade-offs between accuracy, latency, and cost
- Proven ability to deliver production-grade ML systems at scale
- Excellent collaboration and problem-solving skills
Why You’ll Enjoy This Role
- Work on cutting-edge LLM inference systems at scale
- Solve technically challenging, high-impact engineering problems
- Collaborate with top ML researchers and platform engineers
- Competitive compensation and flexible work arrangements
Darwin Recruitment is acting as an Employment Agency in relation to this vacancy.
Reece Waldon