Site Reliability Engineer - Team Lead | Chandigarh (Onsite) | Permanent

POSITION:
We are looking for an experienced "Site Reliability Engineer - Team Lead" to lead an SRE team. The ideal candidate will have a strong background in enhancing the reliability and scalability of services, leading technical teams, and driving strategic initiatives to improve a Lodging-as-a-Service platform.

RESPONSIBILITIES:
Leadership & Mentorship: Lead, mentor, and develop a team of SREs, fostering a culture of reliability, collaboration, and continuous improvement.
Strategic Planning: Drive the design and implementation of scalable, sustainable solutions, and lead the transition towards a cloud-native, serverless, and NoOps environment.
Service Excellence: Oversee service availability, system performance, and capacity planning for critical
Cross-Functional Collaboration: Work closely with stakeholders across the organization to solve complex technical challenges and enhance user experiences.
Incident Management: Lead incident response efforts, perform root cause analysis, and implement preventative measures.
Process Optimization: Champion the adoption of best practices in monitoring, automation, and observability.
SLO Management: Define and manage Service Level Objectives (SLOs) to guide prioritization and ensure reliability.

REQUIRED EXPERIENCE:
Experience: 7+ years in site reliability engineering or related fields, with at least 2 years in a leadership role.
Education: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Technical Expertise:
Extensive experience with AWS cloud services and cloud engineering best practices.
Proficiency in programming languages such as Java and Python, and familiarity with React.
Deep understanding of software engineering methodologies and development cycles.
Expertise in monitoring and observability tools (New Relic, Kibana, Prometheus, Grafana, ElasticSearch).
Leadership Skills: Proven ability to lead technical teams, manage projects, and communicate effectively with stakeholders.
Problem-Solving Skills: Exceptional analytical abilities to perform root cause analysis and develop effective solutions.
Automation & Efficiency: Strong background in automating processes and driving operational efficiency.
Job Title
Site Reliability Team Lead