Job Title
Site Reliability Engineer

Role
Support system reliability, automation, and operational efficiency by developing automation tools, improving monitoring systems, and contributing to infrastructure management. The role focuses on reducing manual operational tasks while ensuring high availability and performance of production systems.

Years of Experience
3 to 5 years

Key Responsibilities

Automation Development
Develop and maintain automation scripts, tools, and workflows using technologies such as Python and Bash to automate manual operational tasks.

System Reliability
Help manage service reliability and availability, including monitoring, alerting, and incident response processes.

Infrastructure as Code (IaC)
Contribute to configuration management and Infrastructure as Code implementations using tools such as Ansible, Terraform, and Puppet.

Monitoring & Observability
Build, tune, and maintain monitoring dashboards and observability systems using tools such as Prometheus, Grafana, and Datadog to ensure system health and performance.

CI/CD Pipeline Maintenance
Improve and maintain continuous integration and continuous deployment (CI/CD) pipelines to streamline application deployments and infrastructure updates.

Troubleshooting & Incident Response
Participate in on-call rotations to diagnose, troubleshoot, and resolve production incidents efficiently.

What We Need
Bachelor’s or Master’s degree in Computer Science, Information Technology, Engineering, or a related field.

Job Location
Hyderabad, Indore, and Ahmedabad (India)