Job Title


Site Reliability Engineer


Company : Triomics


Location : Lucknow, Uttar Pradesh


Created : 2026-03-20


Job Type : Full Time


Job Description

About Triomics:

Triomics is building the agentic AI layer for oncology EHRs. Cancer hospitals spend billions on highly trained staff who manually read unstructured patient records, such as pathology reports, clinical notes, and genomic panels, to power workflows like trial matching, registry curation, visit prep, and quality reporting. We replace that manual work with task-driven AI agents that sit inside the EMR and process records with >95% accuracy, at scale, in real time.

About the Role:

We are looking for a DevOps / Site Reliability Engineer to help design, build, and maintain scalable, secure, and reliable infrastructure. In this role, you will work closely with engineering teams to streamline deployments, improve system reliability, and ensure our platforms run efficiently in production.

You will be responsible for building automation, managing cloud infrastructure, monitoring systems, and responding to incidents to maintain high availability and performance.

Key Responsibilities:

- Design, implement, and manage cloud-based infrastructure and deployment pipelines.
- Build and maintain CI/CD pipelines to enable reliable and efficient software delivery.
- Manage and optimize containerized environments using Kubernetes and Docker.
- Automate infrastructure provisioning and configuration using Terraform and Helm.
- Develop and maintain automation scripts using Python and Bash.
- Monitor system health, performance, and reliability using logging and monitoring tools.
- Troubleshoot production issues and participate in incident response and root cause analysis.
- Ensure infrastructure security, network configuration, and system hardening follow best practices.
- Collaborate with development teams to improve reliability, scalability, and deployment processes.
- Maintain clear documentation for infrastructure, processes, and operational procedures.

Requirements:

- 1+ years of experience in DevOps, Site Reliability Engineering, or a related role.
- Hands-on experience with at least one cloud platform (AWS, Azure, or GCP).
- Strong experience with Kubernetes, Docker, Jenkins, Terraform, and Helm.
- Proficiency in Python and Bash scripting.
- Solid understanding of Linux system administration, networking concepts, and security practices.
- Experience with monitoring, logging, and incident response systems.
- Strong communication skills and the ability to document technical processes effectively.
- Software development experience is a plus.

Nice to Have:

- Experience deploying and scaling AI/ML workloads in production environments.
- Familiarity with single-tenant deployment models.
- Experience with MLOps/AIOps platforms such as SageMaker or Kubeflow.
- Knowledge of chaos engineering and disaster recovery strategies.
- Experience with cloud cost optimization strategies.
- Relevant cloud certifications (AWS, Azure, or GCP).

Why Join Us?

- Impact at scale: The AI you build directly accelerates cancer research and improves patient outcomes worldwide.
- Cutting-edge problems: You’ll work on some of the hardest and most interesting LLM engineering challenges in a highly regulated industry.
- World-class team: Collaborate with experts across AI, engineering, product, and oncology with best-in-industry compensation.
- Culture that ships: We’re a team that works hard and plays hard (company-sponsored workations in Bali, Sri Lanka, Goa, and more).

Perks & Benefits:

- Lunch Provided at the Office: one less daily decision, one happier employee.
- Flexible Working Hours: we care about output, not clock-ins.
- Health Insurance: comprehensive coverage for you and your family.
- Zomato Meal Benefit: breakfast and dinner can be ordered when you come in early or leave late, because effort deserves fuel.