Job Details

Job Information

Senior Software Engineer, Cloud Platforms
Job Reference: AWM-1957-Senior Software Engineer, Cloud Platforms
Date Posted: 4/10/2026
Closing Date: 4/15/2026
Salary: Negotiable
Employment Type: Permanent

Other Information

Website: www.apple.com
Location: Austin, TX, 78703, USA
City: Austin
State: Texas
Country: United States
Zip Code: 78703

Job Description


Weekly Hours: 40

Role Number: 200655198-0157

Summary

Join the Apple Service Engineering team as a Software Engineer and be part of something extraordinary. At Apple, your ideas have the power to shape the future of our products, services, and customer experiences. Bring your passion and dedication, and watch your vision become reality.

As a Software Engineer, you’ll play a crucial role in supporting and scaling cloud services, including AI/ML and LLM-powered platforms, for thousands of development and operations engineers. Our services require uncompromising scalability, high availability, and seamless performance. This hands-on position involves establishing reliability practices for our private and public cloud services, which will accelerate our ability to deliver thousands of applications, including AI-driven services, reliably and consistently. If you’re passionate about designing, engineering, and running systems that make a tangible difference for our customers, Apple is the ideal place for you.

Description

We’re seeking a motivated, driven engineer to join our innovative team. As a cornerstone of our production software, you’ll ensure the reliability, security, and scalability of systems that span infrastructure for LLM inference, AI/ML training pipelines, and intelligent automation. Your expertise will be instrumental in maintaining constant uptime and seamless scalability, and in fostering a thriving environment for new applications and services. With a growing emphasis on AI-powered capabilities, you’ll design and implement solutions that enhance system stability, and you’ll leverage LLMs and AI to improve operational efficiency and system intelligence. Close collaboration with developers, architects, and AI/ML engineers will be essential in achieving these goals.

Minimum Qualifications

  • Kubernetes Expertise: Deep understanding of Kubernetes architecture, components, and best practices, including orchestration of AI/ML and LLM inference workloads.

  • Cluster Operations: Proficiency in managing Kubernetes clusters, deploying applications, and automating workflows using tools like Helm and Kustomize.

  • Cloud Platforms: Experience with major public cloud providers and their cloud-native services, including GPU-accelerated compute and AI/ML platform services. Familiarity with infrastructure as code (IaC) tools like Terraform or Ansible.

  • SRE Principles: Adherence to SRE principles, including monitoring, alerting, error budgets, fault analysis, and automation. Strong focus on reliability, availability, and performance.

  • Telemetry and Observability: Expertise in implementing and managing telemetry using tools like Splunk, Grafana, and Prometheus. Ability to analyze and troubleshoot complex system issues.

  • Programming: Proficiency in Go or Python for developing automation scripts, tools, and custom applications. Familiarity with Python-based AI/ML ecosystems is a plus.

  • AI/ML Fundamentals: Understanding of LLM serving infrastructure, model deployment patterns, and AI/ML pipeline concepts (e.g., model training, fine-tuning, inference optimization).

  • Collaboration: Excellent interpersonal and communication skills. Ability to work effectively in cross-functional teams — including AI/ML engineering teams — and foster a collaborative environment.

  • Education: BS or MS in Computer Science, or equivalent proven experience.

Preferred Qualifications

  • Production & Non-Production Environments: Operate, monitor, and prioritize tasks across all production and non-production environments, including AI/ML training and LLM serving clusters, demonstrating strong operational focus.

  • LLM & AI Infrastructure: Experience deploying and managing large language model (LLM) inference services, GPU clusters, and AI/ML pipelines at scale.

  • Innovative Problem Solver: Design, build, and implement software solutions, including AI-driven automation and intelligent observability tools, to address existing challenges and proactively anticipate future needs.

  • Documentation & Collaboration: Create clear alert handling procedures and runbooks, ensuring knowledge transfer and collaboration within and between SRE teams.

  • Automation Champion: Automate service deployment and orchestration in the cloud environment, leveraging AI/ML and LLM-based tooling to streamline operations and reduce toil.

  • Resilience & Growth: Actively participate in capacity planning, scale testing, and disaster recovery exercises to ensure our systems, including AI infrastructure, remain resilient.

  • Team Player: Foster strong relationships and provide support to partner teams like engineering, QA, AI/ML, and program management.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant (https://www.eeoc.gov/sites/default/files/2023-06/22-088_EEOC_KnowYourRights6.12ScreenRdr.pdf) .
