Job Title


Application DevOps SRE


Company : Delpath


Location : Toronto, Ontario


Created : 2025-11-06


Job Type : Full Time


Job Description

L2 Application DevOps SRE

Location: Toronto

Contract (with potential for extension or conversion to FTE)

Schedule: Includes participation in a 24/7 rotating on-call support schedule

About the Role

We're seeking a proactive and technically skilled L2 Application Support Engineer with DevOps/Site Reliability Engineering (SRE) capabilities to join our team. This role is designed to cover a paternity leave, with the possibility of extension or conversion to full-time employment if the stars align.

You'll play a critical role in supporting application stability, driving automation, and ensuring seamless deployments across hybrid and public cloud environments. The ideal candidate brings a can-do attitude, strong engineering fundamentals, and a passion for operational excellence.

Key Responsibilities

- Provide L2 application support and DevOps/SRE expertise, including automation, deployment, and monitoring.
- Drive and implement service stability, automation, and optimization initiatives aligned with SLAs.
- Lead efforts to operationalize and stabilize workloads migrated to hybrid/public cloud environments.
- Support cloud workload migrations by validating readiness, ensuring observability, and automating Day 2 operations.
- Collaborate with DevSecOps and architecture teams to build automated deployment, monitoring, and recovery pipelines.
- Develop scalable solutions to reduce manual intervention and enable self-healing and auto-scaling mechanisms.
- Optimize performance using tools like Dynatrace, Grafana, and Splunk, and implement automated anomaly detection.
- Analyze testing and production trends, conduct root cause analyses, and drive performance tuning with Agile squads.
- Maintain comprehensive technical documentation, including runbooks, SOPs, post-mortems, and architecture overviews.
- Provide technical leadership in security, tech currency, and vulnerability remediation with a resiliency-first mindset.
- Participate in a 24/7 rotating on-call schedule, swiftly responding to escalations and mitigating service disruptions.

Must Haves

- Proficiency in version control systems (e.g., Git) for codebase management and collaboration.
- Strong scripting skills in PowerShell, Python, and Ansible.
- Experience with microservices and containerization (e.g., Docker).
- Hands-on experience with tools such as Azure Monitor, Splunk, Dynatrace, Grafana, and ServiceNow.
- Solid understanding of AKS architecture: clusters, namespaces, nodes, pods, services, autoscaling, ingress policies.
- Familiarity with Agile development methodologies and tools (e.g., JIRA, GitHub).
- Knowledge of the software lifecycle, including release planning, testing, incident, problem, and change management.
- Ability to analyze data and proactively mitigate risks to production systems.
- Strong written and verbal communication skills for seamless stakeholder engagement.
- Energetic team player with a collaborative mindset and stakeholder empathy.

Nice to Have

- Site Reliability Engineering (SRE) Certification
- Experience with OpenShift Kubernetes
- Familiarity with DevOps principles and practices
- Understanding of database systems such as Oracle and PostgreSQL
- Microsoft Certified: Azure Administrator Associate (or equivalent in AWS/GCP)
- Experience working in Agile squads
- Experience in Financial and Payments services
- ITIL Certification

Education

Bachelor's degree in Computer Science, Information Technology, or a related field.