Job Title


Senior Site Reliability Engineer - ELK Expert


Company : iVedha Inc.


Location : Gurgaon, Haryana


Created : 2025-07-23


Job Type : Full Time


Job Description

Senior Site Reliability Engineer (SRE) – ELK Expert | Platform Engineering Practice

Location: India (Remote). Must be available to work in the EST (US/Canada) time zone.

Role Summary

Are you a Senior Site Reliability Engineer (SRE) with deep ELK expertise, ready to take ownership of large-scale observability infrastructure? We're looking for an SRE with 7+ years of experience, including 4+ years specializing in the ELK stack (Elasticsearch, Logstash, Kibana), to join our Platform Engineering Practice. In this role, you'll design, manage, and scale ELK clusters ingesting 2–3+ TB/day, enhance reliability across distributed systems, and drive automation within Azure cloud environments. This is a high-impact engineering opportunity focused on performance, observability, and operational excellence at scale.

Why Join Us

Career Growth: Work alongside industry experts on cutting-edge cloud technologies.
Competitive Compensation and Benefits: We recognize and reward top talent.
Exciting, Impactful Work: Design and build scalable, resilient cloud environments.
Strategic Platform Role: Contribute to the foundation of next-gen observability and reliability infrastructure.

What You Will Do

Design and Optimize Cloud Infrastructure: Architect scalable, fault-tolerant systems on Microsoft Azure.
Automate Everything: Use Terraform, Ansible, and GitHub Actions to streamline deployment and configuration.
Ensure Reliability and Performance: Proactively monitor, troubleshoot, and resolve production issues using Prometheus, Grafana, and Azure Monitor.
Enhance Security and Compliance: Implement security best practices across DevOps workflows.
Collaborate and Innovate: Work closely with engineering, security, and operations teams to drive automation and efficiency.
Manage and Scale Large ELK Clusters: Handle 2–3+ TB/day log volumes, ensuring high availability and performance.
Optimize ELK Architecture: Implement efficient index lifecycle policies, shard strategies, and hot-warm-cold tiered storage.
Build and Tune Log Pipelines: Scale Logstash and Beats pipelines across distributed environments.
Support Kibana Observability Layers: Create dashboards, visualizations, and custom alerting frameworks (e.g., Watcher, ElastAlert).

What You Bring

7+ years of experience in Site Reliability Engineering, DevOps, or Cloud Engineering.
4+ years of dedicated, hands-on experience with ELK (Elasticsearch, Logstash, Kibana).
Strong experience managing large-scale ELK clusters in production with heavy ingestion (multi-TB/day).
Deep knowledge of index tuning, shard allocation, ILM policies, and scaling ELK components.
Expertise in GitHub Actions, Terraform, Ansible, and Infrastructure as Code (IaC).
Proficiency in Python, Go, or Bash for automation and scripting.
Deep understanding of Kubernetes, Docker, and cloud-native architectures.
Experience with observability tools such as Prometheus, Grafana, and Azure Monitor.
Ability to work in a fast-paced, collaborative environment and solve complex operational issues.

Education

Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.

Certifications (Nice to Have)

Microsoft Azure certifications: AZ-104, AZ-400.