DevOps/AIOps Engineer (Platform)
Experience: 3–5 Years

About the Company
We aim to bring about a new paradigm in medical image diagnostics: intelligent, holistic, ethical, explainable, and patient-centric. We're looking for innovative problem-solvers who empathize with clinicians and patients, understand business problems, and can design and deliver reliable, intelligent products.

Key Responsibilities
· CI/CD for services & models: Own pipelines (GitHub Actions/GitLab CI), environment gates, artifact/version governance (containers, models, SBOMs), safe rollouts & instant rollbacks.
· Kubernetes platform (EKS preferred): Operate multi-env clusters; Helm/Kustomize; GitOps (Argo CD/Flux); progressive delivery (canary/blue-green/Argo Rollouts/Flagger).
· Serving & APIs: Deploy and tune FastAPI services and Triton/ONNX/TensorRT inference; traffic shaping, runtime config, autoscaling signals.
· Event-driven orchestration: Build robust consumers/producers on RabbitMQ/ActiveMQ/Kafka with back-pressure, dead-lettering, idempotency, and retry patterns.
· Observability & AIOps: Define SLIs/SLOs and error budgets; metrics/logs/traces (Prometheus/Grafana/Loki/Tempo/ELK); intelligent alerting & noise reduction; basic model/data drift hooks.
· Security in SDLC: Supply-chain security (image signing/provenance, SBOM scans), SAST/DAST/IaC scanning, policy-as-code (OPA/Gatekeeper), secrets hygiene in pipelines/workloads.
· Data/Model platform integration: S3/MinIO for artifacts; integrate model registry (MLflow or similar) into CD; immutable, traceable releases.
· Resilience & performance: Capacity planning (incl. GPU), autoscaling (HPA/VPA/KEDA), caching/queue tuning; chaos/game-days; write runbooks and own incident response for platform services.
· Developer experience: Golden paths, starter repos, internal Helm charts, docs & enablement to make shipping boring and fast.
· FinOps mindset: Cost dashboards, right-sizing, bin-packing, GPU utilization policies, spot vs on-demand strategy.

Skills and Qualifications (Required)
· 3+ years in DevOps/SRE/MLOps with strong Docker & Kubernetes fundamentals.
· Production CI/CD expertise; canary/blue-green; artifact & version management.
· IaC (Terraform) and GitOps workflows (Argo CD/Flux).
· Observability: Prometheus/Grafana; logs/traces with Loki/Tempo/ELK.
· Production message queues (RabbitMQ/ActiveMQ/Kafka) with back-pressure & retries.
· Cloud experience (AWS/GCP/Azure), EKS preferred; object storage (S3/MinIO); model registries (MLflow or similar).
· Security in SDLC and compliance guardrails for PHI-like data (least-privilege IAM, secrets, auditability).
· Incident response experience; writing SLIs/SLOs, runbooks, and operating to error budgets.
· Scripting for platform tasks (Python/Bash).

Preferred
· Triton Inference Server, ONNX/TensorRT optimizations; GPU scheduling on K8s (NVIDIA device plugin, MIG, node pools).
· Argo Rollouts/Flagger, Karpenter, KEDA; caching layers (Redis/NVCache patterns).
· Policy-as-code (OPA/Gatekeeper), image signing (cosign), SBOM tools (syft/grype).
· Network savvy for app delivery (ingress, service meshes, egress policies).

Education
BE/B.Tech (MS/M.Tech a bonus) or equivalent experience.

Location & Work Setup
On-site - Gurugram

Job Title
AIOps Engineer