OpenAI

Software Engineer, Hardware Health

Published 13 May 2026
San Francisco, CA, USA
$250K - $445K
Full Time

Role Highlights

Languages used

Python
SQL

Key skills

AI
Distributed Systems
Information Technology
Data Security
Shell Scripting
Cloud
Infrastructure
Inference
Research
Debugging
Automation
Cluster
Reliability
Kernel
Operations
Deployment

Tools, Libraries and Frameworks

GPUs
Linux
OpenAI

Description

The role involves building critical infrastructure to maintain the health and operational status of large-scale compute clusters. The individual will define and maintain health signals across hardware components including GPUs, CPUs, and networking infrastructure. Responsibilities include developing automated remediation systems to detect and resolve failures with minimal manual intervention. The position requires managing node lifecycle workflows such as repair, quarantine, and return-to-service processes. Additionally, the role entails investigating system-level hardware failures and partnering with other teams to integrate health signals into training and inference workloads.

Required Qualifications and Skills

The role requires at least seven years of industry experience in software or infrastructure engineering. Candidates must demonstrate strong proficiency in Python and shell scripting, along with experience building large-scale distributed systems. They must also be able to analyze operational data using tools such as SQL or PromQL and have strong systems debugging instincts. The listing does not mention a specific degree or professional qualification as a requirement.

Disclaimer

Job and company description information and some of the data fields may have been generated via GPT-4 summarisation and could contain inaccuracies. The full external job listing link should always be relied on for authoritative information.

About the company

OpenAI

Size

1136

Founded

HQ

San Francisco, US

Public/Private

Partnership

Description

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. AI is an extremely powerful tool that must be created with safety and human needs at its core. OpenAI is dedicated to putting that alignment of interests first — ahead of profit. To achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity. Our investment in diversity, equity, and inclusion is ongoing, executed through a wide range of initiatives, and championed and supported by leadership. At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
