
Job Title


Site Reliability Engineer (SRE) / Infrastructure Engineer


Company : iVedha Inc.


Location : Waterloo, Ontario


Created : 2025-06-20


Job Type : Full Time


Job Description

About the Role

We are seeking an experienced Site Reliability Engineer (SRE) / Infrastructure Engineer to join our Platform Engineering team. This role requires a hands-on technologist with deep expertise in cloud infrastructure, Kubernetes, DevOps, and SRE practices to ensure the performance, availability, scalability, and security of mission-critical platforms.

Key Responsibilities

- Design, implement, and maintain highly available, scalable, and secure infrastructure across AWS, Azure, and GCP.
- Build and automate CI/CD pipelines using Azure DevOps, Jenkins, Ansible Tower, and Terraform.
- Manage containerized applications using Kubernetes, Docker, AKS, EKS, and GKE.
- Develop and enforce SRE best practices, including monitoring, incident response, capacity planning, and reliability automation.
- Implement Infrastructure as Code (IaC) using Terraform, Bicep, ARM templates, and CloudFormation.
- Collaborate with development, QA, and security teams to integrate DevSecOps pipelines.
- Use observability tools (e.g., ELK, Kibana) for metrics, logging, and alerting.
- Manage machine identities and certificate/key lifecycles using Venafi and TLS/PKI-based automation.
- Lead root cause analysis and deliver reliable fixes for complex infrastructure issues.
- Participate in architectural reviews, security audits, and disaster recovery planning.

Qualifications

Must-Have:
- 10+ years in infrastructure, DevOps, or SRE roles within enterprise-grade environments.
- Proven experience with AWS, Azure, and GCP cloud services.
- Hands-on expertise in Kubernetes (AKS/EKS/GKE), Helm, and Docker.
- Strong scripting skills in Python, Bash, and PowerShell.
- Experience with Terraform and Ansible.
- Familiarity with CI/CD tools: Jenkins, Azure DevOps, Octopus, and GitHub Actions.
- In-depth knowledge of Linux, Windows Server, and hybrid cloud environments.
- Solid understanding of networking, load balancing (NGINX, F5, ELB), and firewalls.
- Knowledge of security best practices and tools (e.g., IAM, TLS, PKI, SIEM, WAF, DAST/SAST).

Nice-to-Have:
- Experience with Apache Airflow, Snowflake, and big data pipelines.
- Familiarity with SRE maturity models and service level indicators, objectives, and agreements (SLIs, SLOs, SLAs).