About T-Mobile:
T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience.

About TMUS Global Solutions:
TMUS Global Solutions is a world-class technology powerhouse accelerating the company’s global digital transformation. With a culture built on growth, inclusivity, and global collaboration, the teams here drive innovation at scale, powered by bold thinking.

TMUS India Private Limited operates as TMUS Global Solutions.

What You’ll Do:
- Own day-to-day reliability and operational support for assigned cybersecurity platforms and services.
- Design and implement automation scripts and workflows to eliminate manual operations and prevent recurring issues.
- Monitor service health, analyze alerts, and maintain operational KPIs and dashboards.
- Participate in incident response, troubleshooting, and root cause analysis, driving fixes for assigned issues.
- Contribute to service resilience, performance tuning, and capacity planning.
- Build and maintain CI/CD pipelines and support reliable, repeatable deployments.
- Operate and troubleshoot Docker and Kubernetes-based workloads.
- Support cloud-native services on AWS and Azure, including configuration, performance, and cost awareness.
- Maintain and enhance Power BI dashboards for reliability, incident, and automation metrics.
- Fix production bugs and remediate security vulnerabilities and configuration gaps.
- Apply and help evolve SRE practices such as SLIs, SLOs, error budgets, and automation-first operations.
- Collaborate with software engineering and cybersecurity teams to improve operational readiness and security posture.
- Perform additional duties and projects as needed.

What You’ll Bring:
- 5+ years of experience in Site Reliability Engineering, DevOps, platform engineering, or operations-focused engineering roles.
- Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience).
- Experience supporting production, security-critical systems.
- Hands-on experience with AWS and/or Azure.
- Strong scripting or programming skills using Python, Bash, PowerShell, Go, or Java.
- Experience with CI/CD pipelines, DevOps tooling, and automated deployments.
- Practical experience with Docker and working knowledge of Kubernetes.
- Experience with monitoring, logging, alerting, and operational troubleshooting.
- Familiarity with relational and/or NoSQL databases from an operational perspective.
- Understanding of secure operations, IAM concepts, and vulnerability remediation.
- Ability to work independently while collaborating effectively across teams.

Must Have Skills:
- 5+ years supporting production systems
- Automation and scripting (Python, Bash, PowerShell)
- Hands-on experience with AWS
- CI/CD and DevOps practices
- Docker and container-based operations
- Monitoring, alerting, and incident response
- Secure operations and configuration management
- Day-to-day operational ownership of hosted applications

Nice To Have:
- Power BI dashboard creation and maintenance
- Kubernetes troubleshooting experience
- Experience with SLIs, SLOs, and error budgets
- Infrastructure-as-code (Terraform, ARM, Bicep)
- Experience supporting cybersecurity platforms
- Exposure to AIOps or anomaly detection
- Experience building security automation or SOAR solutions

Job Title
Senior Engineer, Site Reliability [T500-22829]