Job Title


Incident and Escalation Manager - AI/HPC/Storage


Company : DDN


Location : New Delhi, Delhi


Created : 2025-05-07


Job Type : Full Time


Job Description

This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, government, academia, research, and manufacturing.

"DDN's A3I solutions are transforming the landscape of AI infrastructure." - IDC

"The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments." - Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA

DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence.

Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project. This is a chance to make a significant impact at a company that is shaping the future of AI and data management.

Our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in the world of AI and data storage.

The Incident and Escalation Manager (IEM) plays a critical role within DDN's Global Services and Support (GSS) organization. This senior-level position is responsible for managing high-impact customer incidents, executive-level escalations, and systemic problems in mission-critical environments.
As a trusted leader, the IEM brings calm to chaos, drives rapid resolution of issues, and ensures structured communication internally and externally.

This hybrid role merges operational command, technical insight, and continuous improvement, all vital in supporting DDN's cutting-edge storage solutions that power the world's most advanced AI and HPC workloads.

Key Responsibilities

Incident & Escalation Leadership
- Lead cross-functional incident response for high-severity issues (e.g., system outages, critical performance degradation) across global enterprise and AI/HPC customer environments.
- Serve as the Incident Commander, coordinating internal SMEs across Engineering, Product, and Support to drive rapid triage, mitigation, and resolution.
- Own and manage customer case and account escalations involving complex technical, operational, or service challenges.
- Provide real-time updates to executive stakeholders and customer leadership through bridge calls, case notes, and formal summaries.

Root Cause & Problem Management
- Facilitate robust Post-Incident Reviews (PIRs) and Root Cause Analysis (RCA) for all critical events.
- Identify systemic gaps and ensure long-term corrective and preventative actions are implemented.
- Maintain and update Known Error records to support rapid future issue recognition and response.

Continuous Service Improvement
- Collaborate with Incident, Escalation, and Engineering teams to reduce incident recurrence and improve support processes.
- Lead and contribute to proactive programs focused on tooling, process automation, and data-driven service delivery.
- Analyze incident and escalation trends, presenting actionable insights to improve product reliability and customer experience.

Customer Advocacy
- Act as a key customer champion during critical situations, ensuring transparency, accountability, and empathy.
- Restore customer confidence in DDN through ownership, technical insight, and clear communication.
- Provide voice-of-customer insights to Product and Engineering teams to drive product enhancements.

Qualifications

Required
- 10+ years of experience in Incident Management, Escalation Management, Problem Management, or Technical Operations in the high-tech or enterprise IT space.
- Proven experience leading high-severity incidents and executive escalations in AI, HPC, or large-scale infrastructure environments.
- Strong technical background with the ability to grasp complex systems and collaborate with Engineering teams under pressure.
- Deep knowledge of ITIL frameworks, particularly around Incident, Problem, Change, and Escalation Management.
- Exceptional communication skills with the ability to manage both technical details and executive-level updates.
- Analytical thinker with strong data interpretation and reporting skills.

Preferred
- ITIL v3 or v4 Certification.
- Experience with tools such as Salesforce, Jira, Confluence, and monitoring platforms.
- Familiarity with DDN technologies, parallel file systems (e.g., Lustre), or large-scale data platforms.
- Background in managing customer-facing issues in a 24x7 support or cloud services model.
- Understanding of software development lifecycle and modern DevOps practices.

Additional Expectations
- Availability for on-call rotations, including weekend and holiday support when required.
- Ability to work across time zones and lead global teams in real-time incidents.
- Comfortable in high-pressure, customer-facing situations with strong decision-making capabilities.