Job Title


Site Reliability Engineer (SRE) – Infrastructure & Automation


Company : InstaService


Location : Chittoor,


Created : 2025-12-19


Job Type : Full Time


Job Description

About InstaService
InstaService is revolutionizing the home services industry through AI-driven technology, connecting customers with trusted professionals instantly. We’re growing fast across 23+ states and expanding nationwide, backed by strong traction, rapid adoption, and a mission to simplify how people get work done at home. We’re looking for a Senior Site Reliability Engineer (SRE) to join our core engineering team and scale our infrastructure to serve millions of users reliably.

What You’ll Do
- Lead incident response, conduct root cause analysis, and put lasting preventive measures in place.
- Design and optimize CI/CD pipelines, automate deployments, and keep releases stable.
- Build and manage scalable infrastructure on AWS, GCP, or Azure using Terraform, Ansible, and Kubernetes.
- Continuously monitor system health with Prometheus, Grafana, ELK, and CloudWatch.
- Conduct load and performance testing (k6, JMeter, Locust) and optimize systems for high-traffic events.
- Improve observability and reduce alert noise so that alerts carry a clear signal and debugging is faster.
- Collaborate with developers and architects to define SLIs and ensure systems meet their SLOs and SLAs.
- Develop automation scripts and tools in Python, Go, Node.js, or Shell to streamline operations.
- Manage distributed systems and message queues such as Kafka or RabbitMQ.
- Drive a culture of reliability, automation, and scalability across teams.

What We’re Looking For
- 4–7 years of experience in SRE or DevOps roles (preferably in high-scale or e-commerce environments).
- Strong hands-on experience with Kubernetes, Docker, Terraform, Ansible, and CI/CD pipelines.
- Deep understanding of Linux systems, networking, and distributed architecture.
- Solid programming skills in Python, Go, or Node.js.
- Experience managing cloud platforms (AWS, GCP, or Azure).
- Proven track record of maintaining production uptime and optimizing system performance.

Nice to Have
- Experience with observability stacks, distributed tracing, and incident automation.
- Familiarity with microservices and event-driven systems.
- Exposure to cost optimization and capacity planning in multi-cloud environments.

Why Join InstaService?
- Fast-growing startup reshaping a massive industry
- Work on high-scale systems and impactful technology
- Collaborative and innovation-driven team
- Competitive compensation and growth opportunities