Job Details

Job Information

Job Title :

Senior Software Engineer, Model Inference

Job Code :

AWM-9545-Senior Software Engineer, Model Inference

Job Announced :

3/25/2026

Job Closed :

3/30/2026

Pay Rate:

Negotiable

Duration:

Permanent

Other Information

Organization Name:

Apple

Organization Url:

www.apple.com

Address :

San Francisco, CA, 94103, USA

City :

San Francisco

State :

California

Country :

United States

Zip Code :

94103

Job Description

Weekly Hours: 40

Role Number: 200638185-3401

Summary

Join Apple Maps to help build the best map in the world. In this role on ML Platform, you will help bring advanced deep learning and large language models into high-volume, low-latency, highly available production serving, improving search quality and powering experiences across Maps. You will partner closely with research and product teams, take end-to-end ownership, and deliver measurable results at global scale.

Description

As a Software Engineer on the Apple Maps team, you will lead the design and implementation of large-scale, high-performance inference services that support a wide range of models used across Maps, including deep learning and large language models. You will collaborate closely with research and product partners to bring models into production, with a strong focus on efficiency, reliability, and scalability. Your responsibilities span the full server stack, including onboarding new use cases, optimizing inference across heterogeneous accelerated compute hardware, deploying services on Kubernetes, building and integrating inference engines and control-plane components, and ensuring seamless integration with Maps infrastructure.

Minimum Qualifications

Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).
5+ years of software engineering experience focused on ML inference, GPU acceleration, and large-scale systems.
Expertise in deploying and optimizing LLMs for high-performance, production-scale inference.
Proficiency in Python, Java, or C++.
Experience with deep learning frameworks like PyTorch, TensorFlow, and Hugging Face Transformers.
Experience with model serving tools (e.g., NVIDIA Triton, TensorFlow Serving, vLLM).
Experience with optimization techniques like Attention Fusion, Quantization, and Speculative Decoding.
Skilled in GPU optimization (e.g., CUDA, TensorRT-LLM, cuDNN) to accelerate inference tasks.
Skilled in cloud technologies like Kubernetes, Ingress, and HAProxy for scalable deployment.

Preferred Qualifications

Master’s or PhD in Computer Science, Machine Learning, or a related field.
Understanding of MLOps practices, continuous integration, and deployment pipelines for machine learning models.
Familiarity with model distillation, low-rank approximations, and other model compression techniques for reducing memory footprint and improving inference speed.
Strong understanding of distributed systems, multi-GPU/multi-node parallelism, and system-level optimization for large-scale inference.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant (https://www.eeoc.gov/sites/default/files/2023-06/22-088_EEOC_KnowYourRights6.12ScreenRdr.pdf).
