Job Title : Senior Site Reliability Engineer


Company : Delta System & Software, Inc.


Location : Lucknow, Uttar Pradesh


Created : 2026-03-18


Job Type : Full Time


Job Description

Senior Site Reliability Engineer

Location: Remote (India - Offshore)
Job type: Full time or Contract
Shift time: 3 PM to 12 AM IST
Total Experience: 6+ years

The Senior Site Reliability Engineer is responsible for the availability, performance, serviceability, and recoverability of production systems supporting flight operations, maintenance, and compliance workflows.

This role owns production reliability outcomes as systems scale, migrate, and evolve within regulated aviation environments.

What You Will Own

- SQL to RDS migration experience
- Experience with DMS or similar migration tools

Reliability Ownership and Service Health

- Own availability, latency, throughput, and durability for production systems
- Define and maintain service level indicators and service level objectives
- Manage error budgets to guide engineering and operational decisions
- Ensure reliability targets are met consistently

Production Architecture and Resilience

- Design and operate highly available multi-availability-zone and multi-region architectures
- Ensure controlled and observable failure behavior
- Define redundancy, graceful degradation, and automated recovery strategies
- Validate failover and recovery through testing

Incident Response and Operational Maturity

- Lead response to production incidents
- Own root cause analysis focused on systemic contributors
- Drive remediation actions to completion
- Reduce incident frequency, severity, and blast radius over time

Observability and Operational Insight

- Design centralized logging, metrics, alerting, and dashboards
- Define observability standards tied to customer impact
- Ensure alerts are actionable and low-noise
- Use operational data for capacity planning and scaling decisions

Automation and Toil Reduction

- Identify and eliminate manual or repetitive operational tasks
- Build automation to reduce operational risk
- Standardize operational workflows
- Treat simplicity as a reliability requirement

Data and Database Reliability

- Own production database reliability
- Design replication, backup, restore, and failover strategies
- Validate recovery procedures regularly
- Lead migrations to managed cloud databases such as AWS RDS or Aurora

Technical Qualifications

Cloud and Infrastructure

- Hands-on experience operating production systems on AWS or Azure
- Strong understanding of networking, IAM, load balancing, and managed services
- Ability to balance cost, reliability, and operational complexity

Distributed Systems

- Experience operating distributed systems in production
- Strong understanding of partial failure and recovery patterns
- Ability to diagnose cross-stack production issues

Observability and Operations

- Experience with centralized logging, metrics, and alerting
- Ability to design alerts based on service impact
- Experience driving improvement from operational data

Programming and Automation

- Strong scripting skills using Python, Node.js, or shell
- Ability to write production-grade operational tooling
- Comfort modifying application code to improve reliability

Databases

- Experience operating relational databases in production
- Experience with replication, backup, restore, and failover
- Experience migrating legacy databases to managed services preferred

Preferred Experience

- Experience in regulated or safety-critical industries such as aviation
- Familiarity with compliance, auditability, and traceability requirements
- Experience supporting systems with direct operational impact