Key Responsibilities:
- Define and implement Site Reliability Engineering (SRE) practices to ensure high availability, scalability, and security.
- Collaborate with infrastructure teams to design and deliver robust solutions across cloud, on-premises, and data center environments.
- Embed security throughout development and deployment workflows, ensuring secure-by-design principles.
- Champion automation and Infrastructure as Code (IaC) to streamline operations and improve consistency.
- Leverage AIOps for proactive monitoring, anomaly detection, and automated incident response.
- Establish observability frameworks and reliability standards, including SLIs, SLOs, and SLAs.
- Work closely with network teams to architect secure, scalable network solutions and manage data center connectivity.
- Optimize CI/CD pipelines and source control processes for efficiency and reliability.
- Ensure compliance, redundancy, and disaster recovery strategies are in place and continuously validated.

Required Skills & Qualifications:

Technical Competencies
- Cloud & Container Platforms: Azure, AKS, OpenShift.
- DevSecOps Tools: Jenkins, Azure DevOps, Bitbucket, GitHub.
- Infrastructure Automation: Terraform, Ansible.
- Networking: Cisco switches, firewalls, VPNs, load balancers.
- Monitoring & AIOps: New Relic, Azure Insights, ML-based anomaly detection.
- Programming: Python, Go, Bash.
- Data Center Operations: Capacity planning, redundancy, disaster recovery.

Behavioral Competencies
- Strong analytical and problem-solving skills.
- Ability to work independently and drive initiatives end-to-end.
- Excellent communication and stakeholder management.
- Adaptability to emerging technologies (AIOps, automation).

EDUCATION AND EXPERIENCE REQUIREMENTS:
- Bachelor’s degree in information technology or related field required.
- 8+ years in SRE/Infrastructure roles with architecture-level responsibilities.
- Hands-on experience in cloud platforms, CI/CD, and automation frameworks.
- Proven track record in network architecture and data center operations.
- Familiarity with AIOps and AI/ML concepts for predictive analytics.

Job Title
Site Reliability Engineering Architect