Mastercard

Lead DevOps Engineer, Foundry RnD

Published 25 Feb 2026
Nairobi, Kenya
Remote
Full Time

Role Highlights

Languages used

Python
Go

Key skills

Generative AI
MLOps
Computer Science
Integrations
API
CI/CD
Code Reviews
Cloud Infrastructure
IaC
Technical Leadership
DevOps Engineer
Information Security
GitHub Actions
Shell Scripting
DevOps
Machine Learning
Automation
Inference
Servers
Reliability
Logging
Deployment
Agile
SRE
Cluster
RAG
Serverless

Tools, Libraries and Frameworks

GitLab CI
AWS
Azure
GCP
Databricks
MLflow
Terraform
Kubernetes
Docker
GitOps
Bash
Jenkins
Prometheus
Grafana
Splunk
ELK
EKS
GKE
Linux
Unity Catalog
SageMaker
LlamaIndex
TensorFlow
PyTorch
Scikit-learn
LangChain

Description

Our Purpose

Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we're helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.

Title and Summary

Lead DevOps Engineer, Foundry RnD

We are seeking a Lead DevOps Engineer to join the Mastercard Foundry R&D team. You will help build and scale AI/ML infrastructure to support our innovation efforts, with a focus on automation, observability, and developer experience. The ideal candidate is hands-on, curious, motivated, and comfortable working in fast-moving R&D environments.

What You'll Do

Drive Platform Infrastructure: Own DevOps and infrastructure for MLOps and agentic AI systems, establishing reusable patterns for CI/CD, scalable inference, orchestration, observability, and cost control. Design secure, scalable, repeatable systems using Infrastructure as Code (IaC) to support R&D workloads.
Build Secure CI/CD & Automation Systems: Enable secure tool access, workload isolation, and infrastructure for LLM-backed APIs and MCP servers, while partnering with security and compliance on access control, infrastructure governance, and auditability.
Ensure Reliability & Observability: Implement monitoring, logging, and alerting. Tune observability for ML-specific workloads to ensure performance, reliability, and operational insight.
Provide Technical Leadership: Offer hands-on leadership across DevOps and platform initiatives. Review code, enforce best practices, improve tooling, and promote clean, well-tested infrastructure.
Cross-Functional Collaboration: Partner with ML, software, and platform engineers to design deployment strategies, scope work, manage agile deliverables, and meet milestones.

What You'll Bring

Extensive DevOps Experience: 8-12+ years in DevOps, SRE, or platform engineering, including senior/lead roles. Experience designing end-to-end infrastructure systems, solving scale/performance challenges, and operating platforms in production.
Cloud & Infrastructure Expertise: Strong skills in cloud platforms (AWS, Azure, or GCP) and AI/ML components such as Databricks, Azure ML, and MLflow. Deep experience with Infrastructure as Code using Terraform and orchestration tools like Terragrunt.
Container & Orchestration Mastery: Expertise in Kubernetes and Docker, including how they optimise ML development workflows. Experience with container security, networking, and cluster management at scale.
AI/ML Platform Knowledge: Understanding of ML workflow requirements: model registries, feature stores, AI agents, Retrieval-Augmented Generation (RAG) techniques, and frameworks like LangChain/LlamaIndex.
Leadership & Mentorship: Ability to translate ambiguous goals into clear plans, guide engineers, and lead technical execution.
Problem-Solving Mindset: Approach issues systematically, using analysis and data to select scalable, maintainable solutions.

Required Skills

Education & Background: Bachelor's degree in Computer Science, Engineering, or a related field. 8-12+ years of proven experience architecting and operating production-grade infrastructure, especially for AI/ML workloads.
Infrastructure as Code: Expert in Terraform and IaC orchestration tools like Terragrunt. Strong experience with configuration management and GitOps practices.
Programming & Scripting: Advanced Bash and Python skills and strong software engineering fundamentals (version control, CI, code reviews). Familiarity with Go or other systems programming languages is a plus.
CI/CD & Automation: Hands-on experience with Jenkins, GitHub Actions, GitLab CI, or similar tools. Strong understanding of pipeline design, artifact management, and deployment strategies.
Monitoring & Observability: Experience with monitoring stacks such as Prometheus, Grafana, Splunk, and ELK. Skilled in building dashboards and alerts and in tuning observability for ML-specific use cases.
Cloud Infrastructure: Experience deploying systems on AWS/Azure/GCP. Familiar with cloud-native services, serverless computing, and managed Kubernetes offerings (EKS, AKS, GKE). Comfortable with Linux internals and shell scripting.
Security & Networking: Knowledge of security best practices for MLOps, including data privacy, compliance, access controls, and encryption. Understanding of modern secure-communication protocols such as mTLS and of secure service-to-service communication.
Collaboration & Agile Delivery: Strong communication skills and experience working with cross-functional teams. Ability to document designs clearly and deliver iteratively using agile practices.

Preferred Skills

Databricks Experience: Hands-on experience with Databricks, including workspace administration, cluster management, Unity Catalog, Delta Lake, and Lakehouse architectures. Familiarity with Databricks workflows, jobs orchestration, and MLflow integration.
Advanced Cloud & ML Platform Expertise: Experience with Azure ML, SageMaker, or similar ML platforms. Familiarity with model serving, feature stores, and ML pipeline orchestration.
ML Frameworks Familiarity: Knowledge of ML frameworks like TensorFlow, PyTorch, or Scikit-learn to better support ML engineering teams.
Enterprise Security: Experience working in complex enterprise environments with strict security and compliance requirements. Strong networking fundamentals, including configuring and maintaining secure mTLS-based communication between services.
DevOps & Platform Innovation: Experience implementing self-service platform automation, developer portals, or internal developer platforms (IDPs).
Continuous Learning: Motivation to explore emerging technologies, especially in AI, generative AI, and cloud-native infrastructure. Certifications, personal projects, or open-source contributions are a plus.

Corporate Security Responsibility

All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization. Every person working for, or on behalf of, Mastercard is therefore responsible for information security and must:

Abide by Mastercard's security policies and practices;
Ensure the confidentiality and integrity of the information being accessed;
Report any suspected information security violation or breach; and
Complete all periodic mandatory security trainings in accordance with Mastercard's guidelines.

Required Qualifications and Skills

The role requires 8-12 years of experience in DevOps, SRE, or platform engineering, with a focus on architecting and operating production-grade infrastructure, particularly for AI/ML workloads. Expertise in Infrastructure as Code using Terraform and orchestration tools like Terragrunt is essential, along with strong configuration management and GitOps practices. Advanced Bash and Python skills, coupled with solid software engineering fundamentals, are necessary. Experience with CI/CD tools such as Jenkins, GitHub Actions, or GitLab CI, and monitoring stacks like Prometheus, Grafana, Splunk, and ELK is also required. A Bachelor's degree in Computer Science, Engineering, or a related field is expected.

Disclaimer

Disclaimer: Job and company description information and some of the data fields may have been generated via GPT-4 summarisation and could contain inaccuracies. The full external job listing link should always be relied on for authoritative information.

About the company

Mastercard

Size

38185

HQ

Purchase, US

Description

Mastercard describes itself as a technology company in the global payments business that connects stakeholders worldwide and enables secure, convenient electronic forms of payment. Its stated mission is to connect and power an inclusive digital economy that benefits everyone, everywhere, by making transactions safe, simple, smart, and accessible. The company says its culture is driven by a decency quotient (DQ), which fosters an inclusive environment that values individual strengths, views, and experiences. It draws on secure data, networks, partnerships, and a passion for innovation to help individuals, financial institutions, governments, and businesses realize their potential.
