Job Description – Application SRE (L3 Support)

Experience: 5–8 years (Senior-level)

Role Summary:
The Application SRE (L3) role focuses on application-level reliability, monitoring, and production support. This role acts as a bridge between Development and Support teams, ensuring faster issue resolution, proactive alerting, and stable application operations.
The role emphasizes observability, incident management, and FCAPS ownership, enabling NOC and SQT teams to respond effectively to application alerts.

Key Responsibilities:
Act as L3 support for production applications and critical incidents
Collaborate with Development, Support (L1/L2), QA, and DevOps teams
Design and maintain application-level monitoring and alerts
Ensure alerts are actionable for NOC and SQT teams
Perform root cause analysis (RCA) and drive permanent fixes
Maintain FCAPS practices (Fault, Configuration, Accounting, Performance, Security) at the application level
Create and maintain runbooks and operational documentation
Support release validation and post-production monitoring
Performance management: diagnose and resolve high latency, queue build-up, and throughput degradation
Support DR drills and validate HA mechanisms

Required Skills:
Strong knowledge of Linux systems
Experience with scripting (Shell, Python, or similar)
Good understanding of databases, preferably PostgreSQL
Good understanding of REST APIs and Postman for troubleshooting issues
Experience with application monitoring, alerting, and logging tools
Experience with tcpdump and Wireshark for diagnosing network-level issues and bottlenecks
Strong problem-solving and communication skills
Hands-on AWS SysOps expertise

Nice to Have:
Experience in SRE, Production Support, or Platform Operations
AWS SysOps Administrator certification
Experience supporting high-availability or customer-facing systems

Success Metrics:
Reduction in production incidents and repeat issues
Improved alert quality and response time
Faster incident resolution (lower MTTR)
Effective enablement of NOC and SQT teams
Job Title
Site Reliability Engineer