Job Title


Site Reliability Engineer


Company : TECEZE


Location : Chennai, Tamil Nadu


Created : 2025-12-20


Job Type : Full Time


Job Description

Job Title: Site Reliability Engineer (SRE) – Core IT Infrastructure
Location: Chennai / Pune / Bangalore
Company: Teceze

About Teceze

Teceze is a global IT services and consulting organization delivering innovative, scalable, and secure technology solutions. We specialize in infrastructure services, cloud transformation, DevOps, and managed services, helping enterprises achieve operational excellence and digital resilience.

Job Summary

Teceze is looking for a highly skilled Site Reliability Engineer (SRE) to join our Core IT Infrastructure team. The ideal candidate will focus on designing, building, and maintaining reliable, scalable, and highly available infrastructure platforms. This role blends software engineering, systems engineering, and operational excellence to ensure stability, performance, and automation across enterprise environments.

⸻

Key Responsibilities

Infrastructure Reliability & Operations
• Design, implement, and maintain highly available and fault-tolerant infrastructure
• Ensure reliability, performance, scalability, and security of core IT systems
• Monitor system health, capacity, and performance using proactive observability practices
• Lead incident response, root cause analysis (RCA), and post-incident reviews

Automation & SRE Development
• Develop and maintain automation tools, scripts, and frameworks to reduce manual operations
• Apply Infrastructure as Code (IaC) principles using tools such as Terraform, Ansible, or CloudFormation
• Build self-healing systems and automate repetitive operational tasks
• Improve deployment pipelines and operational workflows through engineering solutions

DevOps & Platform Engineering
• Collaborate with DevOps, development, and security teams to support CI/CD pipelines
• Enable seamless application deployments with minimal downtime
• Support containerized and orchestration platforms (Docker, Kubernetes, OpenShift)
• Implement best practices for configuration management and environment consistency

Monitoring, Observability & Performance
• Design and maintain monitoring, logging, and alerting systems
• Define and track SLIs, SLOs, and SLAs
• Optimize system performance, capacity planning, and cost efficiency
• Enhance observability using tools such as Prometheus, Grafana, ELK, Datadog, or similar

Security & Compliance
• Implement infrastructure security best practices
• Collaborate with security teams on vulnerability management and compliance requirements
• Ensure secure access, identity management, and audit readiness

⸻

Required Skills & Qualifications

Technical Skills
• Strong experience in Linux/Unix system administration
• Proficiency in programming/scripting (Python, Go, Bash, Shell, or similar)
• Experience with cloud platforms (AWS, Azure, or GCP)
• Hands-on experience with containerization and orchestration
• Knowledge of networking concepts (DNS, TCP/IP, load balancing, firewalls)
• Experience with monitoring, logging, and alerting tools