Job Description

Datacenter Observability and Site Reliability Engineer

Location: Remote, India
Contract Duration: 6+ months
Shift timings: 5:30 am to 1:30 pm IST

Must have managed an OE platform running Grafana, Mimir, Prometheus, Loki, etc., and have experience with IaC (Terraform) or coding.

**Key Requirements**

- 5+ years of observability engineering, with a deep understanding of the Grafana software stack and experience building and maintaining a large-scale enterprise observability stack.
- Must be responsive and flexible to Korea hours as needed, and able to support a fast-paced environment.

Preferred: someone experienced with:

- Loki, Grafana, Mimir, and the Alloy agent
- Infra: GPU/CPU/K8s metrics and logs

Qualifications:

Experience:

- 8+ years of experience in datacenter observability and site reliability engineering.
- Proven experience in managing and optimizing large-scale datacenter environments.

Technical Skills:

- Proficiency in observability tools and technologies (e.g., Prometheus, Grafana, ELK Stack).
- Experience with SRE practices and tools (e.g., Kubernetes, Docker, Terraform).
- Strong programming and scripting skills (e.g., Python, Go, Bash).
- Familiarity with cloud platforms (AWS, Azure, GCP) and their observability and reliability services.

Soft Skills:

- Strong problem-solving skills and attention to detail.
- Excellent communication and collaboration skills.
- Ability to work in a fast-paced, dynamic environment.

Job Title

Company : Tekgence Inc

Location : Coimbatore, Tamil Nadu

Created : 2025-06-21

Job Type : Full Time