Ubisoft

Site Reliability Engineer - AI Platform

Published 18 Feb 2026
Shanghai, China
Full Time

Role Highlights

Languages used

Python
Go
JavaScript

Key skills

CI/CD
Full Stack
Cloud Infrastructure
IaC
System Design
Site Reliability
Cloud Native
Incident Management
Gaming
AI
Operations
DevOps
LLMs
RAG
Automation
Deployment
Logging
Architecture
Data
SRE
Machine Learning
Security

Tools, Libraries and Frameworks

iOS
Terraform
Docker
Kubernetes
AWS
GCP
Azure

Description

Site Reliability Engineer - AI Platform
Full-time
Contract: Fixed Term
Work flexibility: Office-based

Company Description

Ubisoft is a global leader in gaming with teams across the world creating original and memorable gaming experiences, from Assassin's Creed and Rainbow Six to Just Dance and more. We believe diverse perspectives help both players and teams thrive. If you're passionate about innovation and pushing entertainment boundaries, join our journey and help us create the unknown!

Created in 1996, the Ubisoft Shanghai studio is a vibrant and exciting place where our talents get opportunities to co-develop great AAA blockbuster games, create cutting-edge online games, or produce fun mobile games. To learn more, please visit: www.ubisoftgroup.com

Job Description

About the Role

Join the AI Initiatives team as a Site Reliability Engineer and help operate, scale, and evolve the foundation that powers AI products across the company. This role sits within the AI team and focuses on ensuring that AI platforms, services, and agent-based systems are reliable, scalable, observable, and secure in production.

This is not a pure operations role. The position requires strong software engineering skills combined with deep experience in cloud infrastructure, DevOps practices, and system reliability. A genuine interest in AI systems and how they behave in real-world production environments is essential.

Responsibilities

As a Site Reliability Engineer - AI Platform, you will play a critical role in enabling the reliable delivery and operation of AI-powered products and platforms used across the organization.

Build and Operate Reliable AI Infrastructure - Design, deploy, and operate cloud-native infrastructure supporting AI workloads, including LLM services, RAG pipelines, agent-based systems, and internal AI platforms.
Full-Stack DevOps & Engineering - Develop automation, tooling, and services to support CI/CD, deployment, configuration, and lifecycle management of AI systems. Balance hands-on development work with infrastructure ownership and operational responsibilities.
Infrastructure as Code & Automation - Define and manage infrastructure using Infrastructure as Code (e.g. Terraform, CloudFormation), and build automation for provisioning, scaling, recovery, and routine operations.
Observability & Incident Management - Design and maintain observability solutions (monitoring, logging, tracing, alerting) to ensure high availability, fast detection of issues, and effective incident response for AI services.
System Architecture & Reliability - Partner with AI engineers and product teams to review system designs, identify reliability risks, define SLOs/SLIs, and improve fault tolerance, scalability, and resilience of AI-powered systems.
Cloud Native Delivery - Operate and evolve containerized platforms using Docker and Kubernetes; support safe and frequent deployments through robust CI/CD pipelines.
AI-Aware Operations - Develop an understanding of AI-specific operational challenges such as model serving, LLM latency, rate limits, cost control, caching, retries, fallbacks, and data pipeline reliability.
Cross-Team Collaboration - Work closely with AI engineers, software engineers, and product teams to ensure that reliability, operability, and scalability are first-class concerns throughout the product lifecycle.

Qualifications

We are seeking a seasoned professional with a strong technical background and a passion for building world-class AI applications.

Must-Have Qualifications:

8+ years of experience in software engineering, SRE, DevOps, or platform engineering roles.
Strong programming skills (e.g. Python, Go, JavaScript, or similar), with experience building internal tools and automation.
Solid experience with cloud platforms (AWS, GCP, or Azure) and cloud-native architectures.
Hands-on experience with DevOps practices, CI/CD pipelines, and container orchestration (Docker, Kubernetes).
Strong knowledge of Infrastructure as Code (Terraform, CloudFormation, or equivalent).
Experience designing and operating observability systems (monitoring, logging, alerting).
Strong understanding of system architecture, reliability engineering, and production operations.
Passion for AI technologies and curiosity about how AI systems behave at scale.

Nice-to-Have Qualifications:

Experience supporting AI or data-intensive systems in production environments.
Familiarity with AI/ML workloads, such as model serving, RAG pipelines, or agent-based systems.
Understanding of reliability challenges specific to AI systems (latency, cost control, scaling, failure modes).
Experience operating enterprise-grade platforms with high availability, security, and compliance requirements.
Familiarity with AI service platforms, e.g. AWS Bedrock or Azure Foundry.
Experience with AI agents and Model Context Protocol (MCP), including operating, integrating, or supporting agent-based systems in production environments.

Additional Information

Growth Opportunities

Joining our team as a Senior Software Engineer in AI Applications offers a unique chance to work on industry-leading projects that shape the future of AI technology. You will have the opportunity to:

Engage in continuous learning and professional development to stay at the forefront of AI advancements.
Take on increased responsibilities, influence the strategic direction of our AI product offerings, and drive impactful innovation.

Required Qualifications and Skills

The role requires 8+ years of experience in software engineering, SRE, DevOps, or platform engineering. Strong programming skills in languages such as Python, Go, or JavaScript are necessary, along with experience building internal tools and automation. Candidates must have solid experience with cloud platforms like AWS, GCP, or Azure and cloud-native architectures. Hands-on experience with DevOps practices, CI/CD pipelines, and container orchestration using Docker and Kubernetes is essential. Strong knowledge of Infrastructure as Code tools like Terraform or CloudFormation is also required. Experience designing and operating observability systems and a strong understanding of system architecture, reliability engineering, and production operations are needed. A passion for AI technologies and curiosity about how AI systems behave at scale is essential.

Disclaimer

Job and company description information and some of the data fields may have been generated via GPT-4 summarisation and could contain inaccuracies. The full external job listing link should always be relied on for authoritative information.

About the company

Ubisoft

Size

22845

HQ

Saint-Mandé, FR

Public/Private

Public Company

Description

Ubisoft’s 19,000 team members, working across more than 30 countries around the world, are bound by a common mission to enrich players’ lives with original and memorable gaming experiences. Their commitment and talent have brought to life many acclaimed franchises such as Assassin’s Creed, Far Cry, Watch Dogs, Just Dance, Rainbow Six, and many more to come. Ubisoft is an equal opportunity employer that believes diverse backgrounds and perspectives are key to creating worlds where both players and teams can thrive and express themselves. If you are excited about solving game-changing challenges, working with cutting-edge technologies, and pushing the boundaries of entertainment, we invite you to join our journey and help us Create the unknown.
