
Job Title


ML Ops Lead


Company : Epergne Solutions


Location : Mangaluru, Karnataka


Created : 2026-01-21


Job Type : Full Time


Job Description

Job Role: ML Ops Lead
Job Location: India
Job Type: Remote
Experience: 7+ Years

Roles & Responsibilities:
- Design, deploy, and manage scalable, secure, and automated ML/AI platforms across Azure and AWS.
- Lead the MLOps lifecycle: model deployment, monitoring, retraining, CI/CD orchestration, and drift management.
- Build, maintain, and optimize ML workflows using Azure Machine Learning, Databricks, and AWS SageMaker.
- Integrate ML services with data platforms (Azure Data Lake, Cosmos DB, S3, DynamoDB, RDS).
- Implement governance, observability, compliance, and audit practices across ML and GenAI environments.
- Manage containerized workloads using Docker and Kubernetes (AKS/EKS).
- Develop and maintain Infrastructure as Code using Terraform, Bicep, CloudFormation, or CDK.
- Collaborate with stakeholders to resolve ML pipeline issues and ensure efficient production delivery.
- Conduct unit and integration testing as part of CI/CD pipelines via Azure DevOps or AWS CodePipeline.
- Apply security best practices: RBAC, IAM, least privilege, authentication, and key management.
- Monitor systems using Grafana, Prometheus, Azure Monitor, and Log Analytics.

Skills & Requirements:
Experience: 7+ years in cloud platform engineering and ML operations.
Cloud Platforms: Azure (AI Services, ML, AKS, Functions) and AWS (SageMaker, Bedrock, Lambda).
ML & AI: Strong in Python, TensorFlow, PyTorch, Scikit-learn, and end-to-end ML lifecycle management.
GenAI Tools: Azure OpenAI, Bedrock, LangChain; understanding of prompt injection and jailbreak mitigation.
IaC & DevOps: Hands-on with Terraform, Bicep, CloudFormation, CDK, Azure DevOps, CodePipeline.
Security: IAM, RBAC, Azure Policy, AWS SCP, Key Vault, Audit Logging.
Networking: DNS, Load Balancers, VPNs, VNets.
Monitoring: Grafana, Prometheus, Application Insights, Azure Monitor.
Databases: Azure SQL, Cosmos DB, AWS S3, RDS, DynamoDB, Redshift.
Preferred Tools: GitHub Copilot, Cursor, Claude Code, M365 Copilot.
Soft Skills: Strong problem-solving, stakeholder collaboration, and documentation abilities.