Job Details
Job Information
Other Information
Job Description
Weekly Hours: 40
Role Number: 200635603-0836
Summary
Are you an open-source contributor passionate about building the next generation of cloud-native ML infrastructure? We’re looking for a hands-on technical leader with deep expertise in Kubernetes, Crossplane, Golang/Python, and agentic workflows to design and scale the platforms that power Apple’s Search and ML infrastructure ecosystems. If you’ve contributed to CNCF projects such as Kubernetes, Crossplane, or ArgoCD—and you’re driven to build intelligent, automated infrastructure for ML training and inference at massive scale—this role is for you. You’ll architect systems that are declarative, self-managing, and highly performant, enabling seamless ML experiences for billions of users.
Description
The MLPT Cloud Infrastructure Team within Apple’s Services organization designs, builds, and scales the foundational systems that power Search, and next-generation machine learning workloads. We are reimagining how infrastructure is managed through agentic, event-driven workflows, Crossplane compositions, and self-healing control planes. You’ll develop Model Context Protocol (MCP)-based infrastructure servers that integrate with ML and data workflows, delivering highly automated and observable infrastructure across hybrid and multi-cloud environments.
You will collaborate across ML engineering, SRE, and platform teams to deliver infrastructure that adapts intelligently to application needs, optimizes for cost and performance, and accelerates the development of ML training and inference pipelines.
Minimum Qualifications
BS/MS in Computer Science or equivalent practical experience.
5+ years of experience in leading distributed systems or cloud infrastructure engineering.
Strong programming experience in Golang and Python, including building controllers, operators, or automation systems.
Deep understanding of Kubernetes internals, controller-runtime, and Crossplane composition frameworks.
Experience with ArgoCD, Helm, and IaC (Terraform or Crossplane).
Hands-on experience with GitOps and reconciliation-driven workflows.
Proven ability to design and operate infrastructure for ML training and inference, including performance tuning and GPU optimization.
Experience leading technical teams and driving architectural decisions.
Strong grounding in cost efficiency, performance profiling, and system-level debugging.
Preferred Qualifications
9+ years in cloud infrastructure, SRE, or distributed systems roles.
Contributions to CNCF open-source projects (Kubernetes, Crossplane, ArgoCD, Envoy, Prometheus, etc.).
Deep expertise in Kubernetes API machinery, CRDs, and control plane development.
Experience with Model Context Protocol (MCP) or contextual infrastructure servers.
Familiarity with AIOps or agentic/LLM-driven automation in production environments.
Strong understanding of observability and distributed tracing (OpenTelemetry, Prometheus, Grafana).
Experience building ML infrastructure platforms (training clusters, inference systems, model registries).
Excellent communication, cross-functional leadership, and technical writing skills.
B.S., M.S., or Ph.D. in Computer Science, Computer Engineering, or equivalent practical experience is preferred
Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant (https://www.eeoc.gov/sites/default/files/2023-06/22-088_EEOC_KnowYourRights6.12ScreenRdr.pdf) .
Other Details

