Job Title


InfiniBand - L3


Company : Yotta Data Services Private Limited


Location : Mumbai, Maharashtra


Created : 2026-01-26


Job Type : Full Time


Job Description

Yotta Data Services Private Limited
Datacenter | Cloud | Managed IT | Network & Connectivity | Application Modernization | Cyber Security

CSPs and Hyperscalers around the world are using InfiniBand products to revolutionize deep learning and data analytics, and to power data centers. Join the team building many of the largest and fastest AI/HPC systems in the world!

About the Role

We are looking for someone who can work on a dynamic, customer-focused team that requires excellent interpersonal skills. This role interacts with customers, partners and internal teams to analyze, define and implement large-scale networking projects. The scope of these efforts includes a combination of networking, system design and automation, along with acting as the customer-facing point of contact.

Responsibilities

- 8 to 12 years of relevant experience
- Maintain the InfiniBand interconnect for AI/HPC infrastructure (primary responsibility)
- Handle day-to-day operations: diagnose the InfiniBand fabric, collect and analyse logs, and resolve issues
- Work closely with the server operations team
- Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, real-time monitoring, logging, and alerting
- Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health
- Work with OEMs by opening support tickets and documenting workarounds

Qualifications

- 8-12 years of professional experience in networking fundamentals, the TCP/IP stack, InfiniBand fundamentals and data center architecture

Required Skills

- Proficiency in configuring, testing, validating, and resolving issues in InfiniBand networks, especially in medium to large-scale HPC/AI environments
- Advanced knowledge of HPC/AI networking protocols
- Hands-on experience with InfiniBand network switch/router platforms
- Strong focus on customer needs and satisfaction
- Self-motivated, with leadership skills to work collaboratively with customers and internal teams
- Strong written, verbal, and listening skills
- InfiniBand certification and storage operational experience managing large HPC clusters with IB as interconnect
- Knowledge of Mellanox OS, Cumulus Linux, or SONiC, or networking certifications
- Knowledge of link-level performance and diagnostics
- Experience with high-performance computing architectures
- Experience with GPU (Graphics Processing Unit) focused hardware/software
- Knowledge of cluster and GPFS management technologies

Preferred Skills

- Bonus credit for node provisioning software such as Base Command Manager, xCAT, HPCM

Qualification

MS/PhD or equivalent experience in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or related fields.