Job Description

The Role

The rapid adoption of advanced software in vehicles marks a new era for automakers and consumers, bringing both advantages and challenges. As part of Site Reliability Engineering (SRE) at General Motors, you'll join a dedicated team focused on enhancing the reliability, efficiency, and scalability of our distributed systems. We apply engineering principles to manage operations effectively and build solutions that let us grow without sacrificing performance or quality. Our SREs work closely with software development teams, acting as specialists in reliability and production engineering, with a focus on automation, observability, and shared responsibility.

We are looking for individuals who are passionate about maintaining the health of our infrastructure while optimising for reliability and cost-efficiency. This role involves a blend of software engineering and systems engineering skills to keep our services resilient, robust, and scalable.

This is a hands-on Individual Contributor (IC) position. As a Software Engineer, Site Reliability Engineering IC, you will focus on enhancing the reliability, efficiency, and performance of our services. You'll work closely with other engineers to develop automated solutions, respond to incidents, and drive improvements across our infrastructure. You are expected to stay hands-on, whether through scripting, troubleshooting incidents, or improving observability. As an IC, you will be at the forefront of solving technical challenges and making improvements that directly enhance the quality of service for our users. The role requires a blend of software engineering and systems engineering skills to address complex production challenges effectively.

What You'll Do

+ Automation and Reliability Improvements: Develop tools and software to automate operational processes, improve system reliability, and reduce manual intervention.
+ Observability and Monitoring: Lead, implement, and improve monitoring and observability frameworks, enabling proactive detection and resolution of incidents.
+ Incident Response: Participate in an on-call rotation to diagnose, troubleshoot, and mitigate production incidents, ensuring minimal downtime and swift resolution.
+ Collaboration with Development Teams: Work alongside developers to ensure the quality, scalability, and reliability of our services. Practice shared ownership of services in production, fostering a
Job Title
Principal Software Engineer, Site Reliability Engineering