Site Reliability Engineer - Team Lead | Chandigarh (Onsite) | Permanent

POSITION: We are looking for an experienced "Site Reliability Engineer - Team Lead" to lead an SRE team. The ideal candidate will have a strong background in improving the reliability and scalability of services, leading technical teams, and driving strategic initiatives for a Lodging-as-a-Service platform.

RESPONSIBILITIES:
Leadership & Mentorship: Lead, mentor, and develop a team of SREs, fostering a culture of reliability, collaboration, and continuous improvement.
Strategic Planning: Drive the design and implementation of scalable, sustainable solutions, and lead the transition toward a cloud-native, serverless, and NoOps environment.
Service Excellence: Oversee service availability, system performance, and capacity planning for critical systems.
Cross-Functional Collaboration: Work closely with stakeholders across the organization to solve complex technical challenges and improve user experiences.
Incident Management: Lead incident response efforts, perform root cause analysis, and implement preventive measures.
Process Optimization: Champion the adoption of best practices in monitoring, automation, and observability.
SLO Management: Define and manage Service Level Objectives (SLOs) to guide prioritization and ensure reliability.

REQUIRED EXPERIENCE:
Experience: 7+ years in site reliability engineering or a related field, including at least 2 years in a leadership role.
Education: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Technical Expertise:
  Extensive experience with AWS cloud services and cloud engineering best practices.
  Proficiency in programming languages such as Java and Python, and familiarity with React.
  Deep understanding of software engineering methodologies and development life cycles.
  Expertise in monitoring and observability tools (New Relic, Kibana, Prometheus, Grafana, Elasticsearch).
Leadership Skills: Proven ability to lead technical teams, manage projects, and communicate effectively with stakeholders.
Problem-Solving Skills: Exceptional analytical abilities for performing root cause analysis and developing effective solutions.
Automation & Efficiency: Strong background in automating processes and driving operational efficiency.
Job Title: Site Reliability Team Lead