Job Title


AI/ML Infrastructure Engineer


Company : Huxley


Location : Ernakulam, Kerala


Created : 2025-05-15


Job Type : Full Time


Job Description

Position:

Why We Hire

As an MLOps Engineer in the GPU Engineering team, you will be at the heart of Rakuten's ML operations, focusing on the deployment, monitoring, and management of ML models. You'll work closely with ML Engineers across the department to provide a reliable infrastructure that supports rapid model development, training, and deployment. Your expertise will contribute to the efficiency and scalability of our ML projects, directly impacting Rakuten's product innovation and service excellence.

Position Details

Key Responsibilities:
- Design, implement, and maintain ML pipelines for automated training, testing, and deployment of machine learning models, ensuring scalability and efficiency.
- Work collaboratively with ML engineers to troubleshoot and optimize model performance, ensuring models are production-ready and meet defined SLAs.
- Manage and monitor Kubernetes clusters and related infrastructure to support high-volume ML workloads, implementing best practices for security and resilience.
- Develop and maintain documentation on ML infrastructure, tools, and best practices, providing guidance and support to ML teams.
- Continuously evaluate and incorporate new technologies and tools to enhance the ML platform's capabilities and performance.

Mandatory Qualifications:
- Experience: 3 years or more of experience in MLOps, with a proven track record of managing ML infrastructure
- Kubernetes Proficiency: Deep understanding of Kubernetes (K8s) infrastructure and its application in managing ML workloads
- Programming Skills: Proficiency in Python or Golang
- Proven experience with Linux OS, with the ability to maintain system performance, ensure proper configuration, and leverage tools to troubleshoot software, hardware, and network-related issues
- Education: Bachelor’s or higher degree in Computer Science, Engineering, or a related technical discipline
- Strong communication and teamwork skills
- Passion for technology and solving challenging problems

Desired Qualifications:
- Familiarity with ML frameworks (e.g., TensorFlow, PyTorch) and CUDA
- CI/CD Tools: Experience with CI/CD tools (e.g., GitHub Actions, Jenkins, GitLab CI) and container technologies (e.g., Docker)
- Experience training large models, including LLMs