Job Title


Site Reliability Engineer 3


Company : slice


Location : Bangalore, Karnataka


Created : 2026-03-18


Job Type : Full Time


Job Description

slice

A new bank for a new India

slice’s purpose is to make the world better at using money and time, with a major focus on building the best consumer experience for your money. We’ve all felt how slow, confusing, and complicated banking can be. So, we’re reimagining it. We’re building every product from scratch to be fast, transparent, and feel good, because we believe that the best products transcend demographics, like how great music touches most of us.

Our cornerstone products and services: slice savings account, slice UPI credit card, slice UPI, and slice business are designed to be simple, rewarding, and completely in your control. At slice, you’ll get to build things you’d use yourself and shape the future of banking in India. We tailor our working experience with the belief that the present moment is the only real thing in life. And we have harmony in the present the most when we feel happy and successful together. We’re backed by some of the world’s leading investors, including Tiger Global, Insight Partners, Advent International, Blume Ventures, and Gunosy Capital.

About the team

As a Site Reliability Engineer 3, you’ll be part of the Platform & Infrastructure Engineering team. The team is responsible for designing, automating, building, and operating highly reliable cloud-native systems.

About the role

In this role you will be a core owner of production infrastructure and reliability for high-scale, regulated systems. You will operate at the intersection of cloud infrastructure, Kubernetes platforms, security, observability, and incident management, enabling engineering teams to ship confidently while meeting stringent availability, scalability, and compliance requirements.
This role expects strong hands-on ownership, sound architectural judgment, and the ability to lead reliability initiatives end-to-end, not just execute tasks.

What you will do

- Design, build, and operate AWS-based production infrastructure spanning networking, compute, storage, security, and observability.
- Own Kubernetes (EKS) platforms and critical production workloads at scale, including upgrades, capacity planning, and operational stability.
- Architect and manage VPCs, routing, security groups, NACLs, ALB/NLB, and hybrid or private connectivity patterns.
- Implement and operate service mesh (Istio) for traffic control, security policies, and service-level observability.
- Design fault-tolerant, highly available architectures across multiple AZs and regions.
- Define, implement, and continuously improve Disaster Recovery (DR) strategies with clearly articulated RPO/RTO aligned to business SLAs.
- Lead production readiness reviews, capacity planning, and reliability improvements for critical services.
- Build and standardize infrastructure automation using Terraform, including reusable modules, guardrails, and a clean Terraform SDLC.
- Enable GitOps-driven deployments using ArgoCD and CI/CD pipelines (GitHub Actions or equivalent).
- Reduce operational toil through automation and by building self-service platform capabilities.
- Build scalable metrics, alerting, and logging pipelines that support high-traffic, low-latency systems.
- Lead incident response, drive blameless post-mortems, and translate learnings into systemic fixes.
- Partner closely with Security teams to implement defense-in-depth using Cloudflare, network firewalls, IAM, and AWS security primitives.
- Deep dive into Linux, networking, and system performance issues under real production load.
- Mentor junior engineers, set technical standards, and raise the overall reliability bar for the Infra team.

What you will need

- 7+ years of experience in SRE / DevOps / Platform / Infrastructure Engineering roles.
- Strong hands-on experience with AWS (EC2, ASG, ALB/NLB, IAM, VPC, networking).
- Proven experience operating Kubernetes (EKS) in production, high-availability environments.
- Deep understanding of networking fundamentals (routing, DNS, load balancing, firewalls).
- Strong experience with Terraform and infrastructure automation at scale.
- Excellent Linux fundamentals and production troubleshooting skills.
- Solid understanding of monitoring, alerting, and logging systems.
- Experience with Golang (preferred) or strong proficiency in Python / TypeScript.
- Prior exposure to service mesh architectures (Istio or similar).
- Experience operating systems with strict uptime, security, and compliance requirements.
- Experience leading incidents and reliability initiatives, not just participating in them.

Life at slice

Life so good, you’d think we’re kidding:

- Competitive salaries. Period.
- An extensive medical insurance that looks out for our employees & their dependents. We’ll love you and take care of you, our promise.
- Flexible working hours. Just don’t call us at 3AM, we like our sleep schedule.
- Tailored vacation & leave policies so that you enjoy every important moment in your life.
- A reward system that celebrates hard work and milestones throughout the year. Expect a gift coming your way anytime you kill it here.
- Learning and upskilling opportunities. Seriously, not kidding.
- Good food, games, and a cool office to make you feel like home. An environment so good, you’ll forget the term “colleagues can’t be your friends”.