Job Description

Site Reliability Engineer

We're working with a global technology consultancy that designs, builds, and supports modern software platforms for enterprise customers worldwide. They partner closely with clients to deliver reliable, scalable, cloud-native solutions.

The Role

As an SRE, you'll play a key role in ensuring the availability, performance, and scalability of production systems, supporting customers across the EMEA region. You'll also help to build, mature, and enhance the SRE function. This is a hands-on, technical role, focused on reliability, automation, and operational excellence across a distributed, cloud-based platform.

Key Responsibilities

Platform Reliability: Deploy, operate, and improve Kubernetes clusters across multiple cloud environments.
Service Performance: Design and implement processes to enhance system reliability, availability, and scalability.
CI/CD Enablement: Build and optimise CI/CD pipelines to support safe, repeatable deployments.
Observability & Incidents: Own monitoring, alerting, and incident response to minimise downtime and speed recovery.
Root Cause Analysis: Lead post-incident reviews and implement long-term preventative improvements.
Automation: Reduce operational toil through automation and performance optimisation.
On-Call: Participate in weekday coverage and a once-monthly weekend rota.

Collaboration & Stakeholder Engagement

Work closely with engineering, infrastructure, and product teams to embed SRE best practices.
Advocate for reliability, resilience, and operational excellence across teams.
Collaborate with a globally distributed engineering function.
Engage directly with customers to resolve incidents and improve user experience.

Skills & Experience

Proven experience (5+ years) as an SRE or in a similar role, supporting complex distributed systems.
Strong Kubernetes experience (AKS, EKS, GKE, or similar).
Hands-on experience with observability tools such as Prometheus, Grafana, Kibana, Vector, or Superset.
Experience with at least one major cloud platform: AWS, Azure, GCP, or Linode.
SQL database experience (PostgreSQL beneficial but not essential).
Proficiency in Python, Go, or Rust.
Strong Linux expertise, including performance tuning and troubleshooting.
Excellent communication skills, with the ability to work effectively with engineers, customers, and cross-functional teams.

Please apply now if you meet the above criteria, or contact Andrew Harrison directly.

Skills: SRE, AWS, Azure, Kubernetes, Terraform, CI/CD, Python

Benefits: Work From Home

Job Title

Company : Ocho

Location : Belfast

Created : 2026-01-10

Job Type : Full Time