Job Title


MLOps Site Reliability Engineer


Company : KLA


Location : Chennai, Tamil Nadu


Created : 2025-06-04


Job Type : Full Time


Job Description

KLA Overview:

KLA is a global leader in diversified electronics for the semiconductor manufacturing ecosystem. Virtually every electronic device in the world is produced using our technologies. No laptop, smartphone, wearable device, voice-controlled gadget, flexible screen, VR device or smart car would have made it into your hands without us. KLA invents systems and solutions for the manufacturing of wafers and reticles, integrated circuits, packaging, printed circuit boards and flat panel displays.

The innovative ideas and devices that are advancing humanity all begin with inspiration, research and development. KLA focuses more than average on innovation, and in 2019 we invested 15% of sales back into R&D. Our expert teams of physicists, engineers, data scientists and problem-solvers work together with the world’s leading technology providers to accelerate the delivery of tomorrow’s electronic devices. Life here is exciting and our teams thrive on tackling really hard problems. There is never a dull moment with us.

Key Responsibilities:

We are seeking a highly skilled and motivated MLOps Site Reliability Engineer (SRE) to join our team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our machine learning infrastructure. You will work closely with data scientists, machine learning engineers, and software developers to build and maintain robust and efficient systems that support our machine learning workflows. This position offers an exciting opportunity to work on cutting-edge technologies and make a significant impact on our organization's success.

- Design, implement, and maintain scalable and reliable machine learning infrastructure.
- Collaborate with data scientists and machine learning engineers to deploy and manage machine learning models in production.
- Develop and maintain CI/CD pipelines for machine learning workflows.
- Monitor and optimize the performance of machine learning systems and infrastructure.
- Implement and manage automated testing and validation processes for machine learning models.
- Ensure the security and compliance of machine learning systems and data.
- Troubleshoot and resolve issues related to machine learning infrastructure and workflows.
- Document processes, procedures, and best practices for machine learning operations.
- Stay up to date with the latest developments in MLOps and related technologies.

Qualifications:

Required:

- Bachelor's degree in Computer Science, Engineering, or a related field.
- Proven experience as a Site Reliability Engineer (SRE) or in a similar role.
- Strong knowledge of machine learning concepts and workflows.
- Proficiency in programming languages such as Python, Java, or Go.
- Experience with cloud platforms such as AWS, Azure, or Google Cloud.
- Familiarity with containerization technologies like Docker and Kubernetes.
- Experience with CI/CD tools such as Jenkins, GitLab CI, or CircleCI.
- Strong problem-solving skills and the ability to troubleshoot complex issues.
- Excellent communication and collaboration skills.

Preferred:

- Master's degree in Computer Science, Engineering, or a related field.
- Experience with machine learning frameworks such as TensorFlow, PyTorch, or Scikit-learn.
- Knowledge of data engineering and data pipeline tools such as Apache Spark, Apache Kafka, or Airflow.
- Experience with monitoring and logging tools such as Prometheus, Grafana, or the ELK stack.
- Familiarity with infrastructure as code (IaC) tools like Terraform or Ansible.
- Experience with automated testing frameworks for machine learning models.
- Knowledge of security best practices for machine learning systems and data.

Equal Employment Opportunity:

We offer a competitive, family-friendly total rewards package. We design our programs to reflect our commitment to an inclusive environment, while ensuring we provide benefits that meet the diverse needs of our employees.

KLA is proud to be an Equal Opportunity Employer. We do not discriminate on the basis of race, religion, color, national origin, sex, gender identity, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other status protected by applicable law. We will ensure that qualified individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment.