Walmart

Principal, Site Reliability Engineer

Published 07 Apr 2026
Bentonville, AR, USA
110K - 286K USD Annual
Part Time
Full Time

Role Highlights

Languages used

JavaScript
Python

Key skills

Machine Learning
Computer Science
Integrations
CICD
Information Systems
Distributed Systems
Software Architect
SRE
Site Reliability
System Administration
Software Architectures
Infrastructure
Automation
Cloud
Data
UX
Testing

Tools, Libraries and Frameworks

Docker
IBM

Description

Position Summary

Role summary:
The (USA) Principal, Site Reliability Engineer leads the design, development, and implementation of reliability programs for complex site environments. This role ensures system performance, scalability, and disaster recovery through advanced monitoring, root cause analysis, and infrastructure automation. The position requires expertise in software architecture, distributed systems, and cloud technologies to optimize operational efficiency and resilience. The Principal Engineer collaborates across teams to drive continuous improvement, establish reliability standards, and support business objectives by delivering robust, scalable, and secure solutions aligned with organizational goals.

About the team:
The CES team delivers exceptional customer service experiences to millions of Walmart customers and agents worldwide. Comprising software engineers, data scientists, and machine learning experts, the team advances GenAI technology within complex enterprise applications. As part of Walmart Global Tech's Enterprise Business Systems, CES collaborates closely with product, business, and UX teams to drive measurable business outcomes. The team focuses on innovation, reliability, and scalability to support Walmart's mission of helping customers save money and live better through cutting-edge technology and robust site reliability engineering practices.

What you'll do:
+ Design and develop reliability programs tailored to complex site environments, ensuring alignment with business goals and site safety engineering.
+ Lead and facilitate reliability testing and chaos experiments to validate application resiliency and system performance.
+ Analyze system architecture and performance to optimize scalability, disaster recovery, and operational efficiency.
+ Develop and implement monitoring strategies, establishing metrics and alerts to maintain system availability and reliability.
+ Guide root cause analysis efforts to identify and resolve defects, enhancing application stability and preventing incidents.
+ Drive infrastructure automation and telemetry integration to support continuous delivery and operational excellence.
+ Mentor team members on tools, coding standards, and reliability best practices.

What you'll bring:
+ Extensive experience in site reliability engineering with strong expertise in system monitoring, root cause analysis, and reliability analysis.
+ Proficiency in designing scalable, modular, and extensible software architectures aligned with business and technical requirements.
+ In-depth knowledge of disaster recovery planning, execution, and contingency procedures for complex site environments.
+ Skill with cloud computing platforms and containerization technologies such as Docker.
+ Ability to lead reliability testing and chaos engineering experiments using open-source tools.
+ Strong coding skills in languages such as JavaScript and Python, with automation experience in CI/CD pipelines.
+ Proven capability to analyze system performance and implement telemetry for continuous improvement.

At Walmart, we offer competitive pay as well as performance-based bonus awards and other great benefits for a happier mind, body, and wallet. Health benefits include medical, vision, and dental coverage. Financial benefits include 401(k), stock purchase, and company-paid life insurance. Paid time off benefits include PTO (including sick leave), parental leave, family care leave, bereavement, jury duty, and voting. Other benefits include short-term and long-term disability, company discounts, Military Leave Pay, adoption and surrogacy expense reimbursement, and more.

You will also receive PTO and/or PPTO that can be used for vacation, sick leave, holidays, or other purposes. The amount you receive depends on your job classification and length of employment. It will meet or exceed the requirements of paid sick leave laws, where applicable. For information about PTO, see .

Live Better U is a Walmart-paid education benefit program for full-time and part-time associates in Walmart and Sam's Club facilities. Programs range from high school completion to bachelor's degrees, including English Language Learning and short-form certificates. Tuition, books, and fees are completely paid for by Walmart.

Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to a specific plan or program terms. For information about benefits and eligibility, see One.Walmart.

Bentonville, Arkansas US-10735: The annual salary range for this position is $110,000.00 - $220,000.00.
Sunnyvale, California US-11807: The annual salary range for this position is $143,000.00 - $286,000.00.

Additional compensation includes annual or quarterly performance bonuses. Additional compensation for certain positions may also include stock.

Minimum Qualifications

Outlined below are the required minimum qualifications for this position. If none are listed, there are no minimum qualifications.
Option 1: Bachelor's degree in computer science, computer engineering, computer information systems, software engineering, or related area and 5 years of experience in site reliability engineering, site and system administration, infrastructure management, or related area.
Option 2: 7 years of experience in site reliability engineering, site and system administration, infrastructure management, or related area.

Preferred Qualifications

Outlined below are the optional preferred qualifications for this position. If none are listed, there are no preferred qualifications.
+ Experience in site reliability engineering, site and system administration, infrastructure management, or related area.
+ Master's degree in site reliability engineering, site and system administration, infrastructure management, or related area and 3 years of experience in site reliability engineering, site and system administration, infrastructure management, or related area.
+ SRE certification (for example, IBM Cloud Site Reliability Engineer).
+ We value candidates with a background in creating inclusive digital experiences, demonstrating knowledge in implementing Web Content Accessibility Guidelines (WCAG) 2.2 AA standards, assistive technologies, and integrating digital accessibility seamlessly. The ideal candidate would have knowledge of accessibility best practices and join us as we continue to create accessible products and services following Walmart's accessibility standards and guidelines for supporting an inclusive culture.

Primary Location

2501 Se J St, Ste A, Bentonville, AR 72716-3724, United States of America

Walmart and its subsidiaries are committed to maintaining a drug-free workplace and have a zero-tolerance policy regarding the use of illegal drugs and alcohol on the job. This policy applies to all employees and aims to create a safe and productive work environment.

Walmart, Inc. is an Equal Opportunity Employer - By Choice. We believe we are best equipped to help our associates, customers, and the communities we serve live better when we really know them. That means understanding, respecting, and valuing diversity (unique styles, experiences, identities, abilities, ideas, and opinions) while being inclusive of all people.

Required Qualifications and Skills

A Bachelor's degree in a relevant field and 5 years of experience in site reliability engineering, site and system administration, or infrastructure management is required. Alternatively, 7 years of experience in site reliability engineering, site and system administration, or infrastructure management is acceptable. Proficiency in designing scalable software architectures, in-depth knowledge of disaster recovery planning, and experience with cloud computing platforms and containerization technologies like Docker are necessary. The role also requires the ability to lead reliability testing and chaos engineering experiments, strong coding skills in JavaScript and Python, and proven capability in analyzing system performance and implementing telemetry.

Disclaimer

Disclaimer: Job and company description information and some of the data fields may have been generated via GPT-4 summarisation and could contain inaccuracies. The full external job listing link should always be relied on for authoritative information.

About the company

Walmart

Size

404042

Founded

HQ

Bentonville, US

Public/Private

Public Company

Description

Sixty years ago, Sam Walton started a single mom-and-pop shop and transformed it into the world's biggest retailer. Since those founding days, one thing has remained consistent: our commitment to helping our customers save money so they can live better. Today, we're reinventing the shopping experience, and our associates are at the heart of it. When you join our Walmart family of brands (Sam's Club, Bonobos, Moosejaw, and many more!), you'll play a crucial role in shaping the future of retail, improving millions of lives around the world. We are ecstatic to have been named a Great Place to Work® Certified (May 2023 to May 2024), a Disability:IN 2023 Best Place to Work, and one of Fast Company's 100 Best Workplaces for Innovators 2023. This is the place where your passions meet purpose. Join our family and build a career you're proud of.
