Job Title: Technical Lead - SRE
Job Type: Full Time
Location: Remote

About the Job

What You Will Do

Technical Leadership & Architecture
- Own and drive the technical direction for your team's infrastructure systems, making architectural decisions that balance reliability, scalability, and cost.
- Design systems of moderate to high complexity using distributed systems best practices; anticipate future use cases and minimize technical debt.
- Conduct architectural reviews and advance design patterns across the organization.
- Identify and implement improvements to existing software architecture; define and expand design patterns to solve common platform problems.
- Define and enforce security best practices across team-owned systems; proactively surface gaps to senior leadership.

Reliability & Operational Excellence
- Own the reliability posture of team-owned services: establish SLOs, monitor SLAs, and hold the team accountable to them.
- Lead incident response for complex, multi-service issues; systematically debug, identify root causes, and ensure issues do not recur.
- Establish standards for logging, monitoring, and operationalization across all team-owned systems.
- Foresee potential operational issues and implement preventative measures to safeguard the customer experience.
- Participate in and help lead the on-call rotation; ensure production systems are appropriately instrumented.

Project & Delivery Ownership
- Act as DRI (Directly Responsible Individual) for medium-to-large SRE projects spanning months and involving cross-team collaboration.
- Partner with Engineering Managers and Product Managers to scope roadmap initiatives, break down work into actionable increments, and commit to delivery plans.
- Negotiate scope effectively when required, ensuring adjustments align with customer needs and project goals.
- Proactively identify and resolve project risks (dependencies, architectural drift, and staffing blockers) before they impact delivery.

AI-Augmented Engineering
- Demonstrate mastery of AI-driven development practices and integrate them into end-to-end feature and infrastructure delivery.
- Contribute improvements to internal AI prompt libraries, coding workflows, and AI usage best practices for the team.
- Use AI tools to accelerate creation of technical documents, design proposals, runbooks, and exploration of alternative solutions.
- Stay current with emerging AI development patterns and bring relevant innovations back to the team.
- Coach teammates on responsible, efficient, and effective use of AI tools (e.g., Cursor, Augment) across the software development lifecycle.

What We Are Looking For

Required Experience
- 7+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering in a production cloud environment.
- 5+ years of hands-on experience with AWS cloud services across compute, networking, storage, and security.
- 5+ years managing Linux-oriented production environments at scale.
- 5+ years using Infrastructure-as-Code (Terraform, CDK, CloudFormation) and/or GitOps best practices.
- 3+ years operating and troubleshooting production Kubernetes environments.
- 3+ years applying AWS Well-Architected Framework principles across reliability, security, performance, and cost pillars.
- 3+ years in cloud security best practices including IAM, secrets management, network security, and compliance.
- 3+ years working with PostgreSQL in production: performance tuning, replication, backup, and recovery.
- Demonstrated track record of leading multi-person technical projects from scoping through delivery.

Technical Skills
- Strong general programming skills; comfort writing automation scripts and tooling in Python, Go, or similar.
- Deep knowledge of observability tooling (metrics, logging, distributed tracing) and how to use them to drive reliability.
- Solid understanding of data retention, backup, and recovery processes across cloud-native systems.
- Experience with CI/CD pipelines, release management, and deployment automation.
- Familiarity with service mesh, API gateway patterns, and microservices architectures.

AI Fluency
- Proficient with agentic coding assistants (e.g., Cursor, Augment, GitHub Copilot) for day-to-day engineering tasks.
- Able to use AI to break down complex infrastructure tasks, accelerate design documentation, and improve code review quality.
- Ability to critically evaluate AI-generated outputs and identify when outputs are suboptimal or unsafe.

Leadership & Collaboration
- Proven ability to lead technical discussions, drive alignment across engineering and product, and communicate decisions clearly to stakeholders.
- Experience mentoring junior and mid-level engineers in both technical skills and professional development.
- Able to operate independently with minimal supervision; comfortable making final technical decisions as DRI.
- Strong communication skills in English (written and verbal), with experience influencing cross-functional partners.