Job Title


Senior System Administrator (HPC)


Company : Tata Consultancy Services


Location : Bengaluru, Karnataka


Created : 2025-12-18


Job Type : Full Time


Job Description

Role: Sr. HPC Administrator
Desired Experience Range: 7 - 12 yrs
Notice Period: Immediate to 60 Days only
Location of Requirement: Bangalore

JOB DESCRIPTION
● Strong experience in providing support for Linux HPC clusters.
● Strong working knowledge of the following:
  o IBM Platform LSF 9 and 10 administration.
  o Red Hat Enterprise Linux administration.
  o Lustre parallel file system.
  o Mellanox InfiniBand connectivity.
  o Cluster manager administration (HPCM or xCAT).
  o SSSD & NIS authentication mechanisms.
  o Bash & Python scripting.
  o Ansible playbooks.
● Experience with Abaqus and CFD applications (Fluent, StarCCM+, etc.).
● Strong knowledge of application installations and version management on shared file systems.
● IT infrastructure technical operations management under the ITIL framework.
● Security compliance and remediation management.

Intermediate Level
● DevOps, ITIL, Agile, SAFe (certifications are desirable).

Responsibilities
● Installation, configuration, troubleshooting and administration of Linux HPC clusters (compute, storage, and network) and applications in support of CAE environments.
● Monitor and analyze LSF job queues and resource utilization to optimize workload management.
● Troubleshoot and resolve any issues with LSF and its components, including master servers, compute nodes, and resource managers.
● Collaborate with users to understand their HPC requirements and design LSF job workflows to meet their needs.
● Develop and maintain LSF documentation, including standard operating procedures, installation guides, and troubleshooting procedures.
● Develop and maintain LSF scripts for automation and task scheduling.
● Diagnose and troubleshoot complex RHEL OS, application and HPC cluster technical problems.
● Interact with hardware and software vendors for external support.
● Develop and maintain technical solution documents (TSD) and standard operating procedures (SOP).
● Keep all HPC infrastructure systems, servers, and devices up to date and in working condition to support business continuity.
● Design and implement HPC network topology, including Mellanox connectivity.
● Create and maintain HPC capacity planning and periodic cluster utilization reports.
● Troubleshoot Abaqus, StarCCM+ and Fluent applications, and resolve any issues in a timely manner.
● Develop and maintain scripts for automation and task scheduling using Python and Bash scripting.