Job Description

About the role

As a Site Reliability Architect (SRE), you will make an impact by building and leading a modern, AI-enabled SRE organization that improves the availability, performance, and resilience of large-scale retail and supply chain platforms. You will be a valued member of a global engineering leadership team and work collaboratively with product, infrastructure, cloud, and business stakeholders to drive a transition from legacy operations to an SLO-driven reliability culture.

In this role, you will:

· Build and scale an enterprise SRE function from the ground up, defining standards, operating models, and career paths.
· Own availability, latency, and performance for a complex omnichannel ecosystem, including high-traffic web applications, APIs, and GraphQL layers.
· Define and execute a multi-year SRE strategy, transitioning legacy environments to modern, automation-first and SLO-based practices.
· Lead infrastructure reliability across hybrid environments, bridging cloud-native platforms with on-prem retail store systems and thin/thick client architectures.
· Design and operate scalable event-driven architectures, including high-throughput Kafka platforms supporting global inventory and POS systems.
· Standardize enterprise observability using tools such as Dynatrace, New Relic, and Google Cloud Monitoring to enable proactive issue detection and faster incident resolution.
· Architect and deploy AI-enhanced operations, leveraging LLMs, AI agents, and MCP-based workflows to automate root cause analysis, reduce toil, and enable self-healing systems.
· Partner with engineering, vendors, and external partners to align reliability goals with overall business outcomes.

Work model

We strive to provide flexibility wherever possible. Based on this role's business requirements, this is a remote position open to qualified applicants in the United States.
Regardless of your working arrangement, we are here to support a healthy work-life balance through our wellbeing programs. The working arrangements for this role are accurate as of the date of posting. This may change based on project, business, or client requirements. We will always be clear about role expectations.

What you need to have to be considered

· 12+ years of progressive experience across Site Reliability Engineering, DevOps, infrastructure, or platform engineering in large, distributed environments.
· Demonstrated experience building and leading SRE organizations within complex enterprise or global environments.
· Deep hands-on experience with cloud platforms (GCP preferred) or multi-cloud environments, including Kubernetes (GKE/EKS) and Infrastructure as Code (Terraform).
· Strong knowledge of modern microservices and middleware technologies, including Kafka and GraphQL, operating at scale.
· Proven ability to think strategically while operating hands-on, influencing cross-functional teams and senior stakeholders.
· Experience managing vendors, partner teams, and third-party solutions within a broader product or platform portfolio.
· Ability to translate complex technical concepts into clear business value for both engineering and non-technical stakeholders.

These will help you stand out

· Experience supporting large-scale retail, e-commerce, or supply chain platforms with hybrid (cloud + on-prem) architectures.
· Hands-on experience applying LLMs, AI agents, or automation frameworks to improve incident management and predictive maintenance.
· Deep understanding of retail store networking, local hardware constraints, and thin/thick client models.
· Successful track record driving cultural change toward reliability engineering, automation, and SLO-based operations.
· Strong leadership presence with the ability to mentor senior engineers and develop high-performing global teams.
Please note: This role will require an in-person meet and greet at our Cognizant offices or client location.

Bachelor's degree in computer science, IT, or equivalent.

Applications will be accepted until April 21st, 2026.

Salary and Other Compensation: The annual salary for this position is between $89,100 and $141,500, depending on the experience and other qualifications of the successful candidate. This position is also eligible for Cognizant's discretionary annual incentive program, based on performance and subject to the terms of Cognizant's applicable plans.

Benefits: Cognizant offers the following benefits for this position, subject to applicable eligibility requirements:

· Medical/Dental/Vision/Life Insurance
· Paid holidays plus Paid Time Off
· 401(k) plan and contributions
· Long-term/Short-term Disability
· Paid Parental Leave
· Employee Stock Purchase Plan

Disclaimer: The salary, other compensation, and benefits information is accurate as of the date of this posting. Cognizant reserves the right to modify this information at any time, subject to applicable law.

Cognizant is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to sex, gender identity, sexual orientation, race, color, religion, national origin, disability, protected Veteran status, age, or any other characteristic protected by law.

Job Title

Company : MSCCN

Location : Columbus, OH

Created : 2026-04-17

Job Type : Full Time