Summary:
Lead a team responsible for designing, implementing, and maintaining the infrastructure and processes that support the development, deployment, and operation of our software systems. You will play a critical role in driving efficiency, reliability, and scalability across our software development lifecycle while ensuring alignment with business objectives. This role requires strong leadership skills, technical expertise, and a strategic mindset to manage resources, foster collaboration, and drive continuous improvement within the DevOps team.

Duties & Responsibilities:
Lead and manage a team of DevOps engineers, providing coaching, mentorship, and performance feedback.
Collaborate with senior leadership to define DevOps strategies and align them with overall business objectives.
Architect, implement, and manage CI/CD pipelines to automate software build, test, and deployment processes.
Design and maintain infrastructure as code using tools such as Terraform, Ansible, or Chef.
Oversee the implementation and management of containerization solutions using Docker and orchestration tools like Kubernetes.
Ensure production systems are monitored and issues are resolved promptly to maintain high availability and reliability.
Drive the adoption of best practices for infrastructure security, compliance, and governance.
Evaluate and recommend new tools and technologies to improve the efficiency and reliability of development and deployment processes.
Collaborate with cross-functional teams to define infrastructure requirements and design scalable solutions.
Champion a culture of collaboration, innovation, and continuous improvement within the DevOps team.
Manage relationships with external vendors and service providers as needed.
Develop and manage departmental budgets, forecasts, and resource allocation plans.

Minimum Qualifications:
Bachelor's degree in Computer Science, Engineering, or a related field (Master's degree preferred).
5+ years of experience in software development, IT operations, or a related field, with at least 2 years in a leadership or management role.
Strong proficiency in scripting and programming languages such as Python, Bash, or Ruby.
Experience with cloud computing platforms such as AWS, Azure, or Google Cloud Platform.
In-depth knowledge of containerization technologies such as Docker and container orchestration tools like Kubernetes.
Experience with configuration management tools such as Ansible, Puppet, or Chef.
Proficiency in infrastructure as code concepts and tools such as Terraform.
Hands-on experience with CI/CD tools such as Jenkins, GitLab CI, or CircleCI.
Excellent problem-solving and troubleshooting skills.
Strong leadership, communication, and collaboration skills, with the ability to motivate and inspire team members.

Preferred Qualifications:
Certifications in relevant technologies, such as AWS Certified DevOps Engineer or Certified Kubernetes Administrator (CKA).
Experience with monitoring and logging tools such as Prometheus, Grafana, ELK stack, or Splunk.
Knowledge of agile software development methodologies.
Experience with implementing and managing microservices architectures.

Job Title:
Reliability Engineering Manager