Role Overview
We are looking for a skilled Spark on Kubernetes Support Engineer to provide L1/L2 support for large-scale data platforms. This role involves monitoring, troubleshooting, and optimizing Spark workloads running on Kubernetes, ensuring high availability and performance of data pipelines.

Key Responsibilities
Act as first-level escalation for 24×7 monitoring of Spark (batch & streaming) workloads on Kubernetes
Troubleshoot Spark job failures, performance issues, and resource bottlenecks
Diagnose Kubernetes issues (pod failures, OOMKilled, evictions, DiskPressure, scaling issues)
Monitor Spark UI, cluster health, and resource utilization
Collaborate with development teams to debug and optimize pipelines
Handle Sev1/Sev2 incidents, including RCA and war-room coordination
Build and maintain monitoring dashboards and alerting frameworks (Prometheus/Grafana/ELK)
Support CI/CD pipelines and deployment automation using Azure DevOps
Maintain SOPs, runbooks, and drive continuous improvements

Required Skills
3–10 years in Big Data / Distributed Systems / Cloud Support
Strong expertise in Apache Spark (Core, SQL, Structured Streaming)
Hands-on experience with Spark on Kubernetes
Good understanding of Kubernetes architecture & troubleshooting
Experience with Azure DevOps (CI/CD pipelines, Git, deployments)
Strong knowledge of Linux, SQL, and scripting (Python/Shell)
Familiarity with monitoring tools: Prometheus, Grafana, ELK

Good to Have
Experience with Kafka / streaming ecosystems
Exposure to cloud platforms (Azure/AWS/GCP)

Job Title
Spark on Kubernetes Support Engineer