Job Description

Must have deep High Performance Computing (HPC), Linux and Python knowledge.

We are seeking an experienced HPC Engineer with deep expertise in NVIDIA GPU technologies to support and scale our high-performance computing infrastructure. You will play a key role in building and optimizing compute environments for intensive scientific, engineering, and AI workloads, ranging from large-scale simulations to deep learning training clusters.

This position is ideal for someone who thrives at the intersection of hardware acceleration, parallel computing, and high-throughput systems, and who’s passionate about enabling researchers, engineers, and scientists to work at scale.

Work from Home! Compensation 6-7 LPA. Please do not apply if compensation is not acceptable.

You'll work with our founders and US members plus off-shore cloud and AI team.

Responsibilities:

- Design, implement, and manage HPC clusters with a focus on GPU acceleration using NVIDIA hardware.
- Install, configure, and maintain GPU drivers, CUDA libraries, and NVIDIA software stacks (e.g., NCCL, cuDNN, Nsight).
- Tune performance of GPU-enabled applications in domains like simulation, machine learning, data analytics, and scientific computing.
- Support scheduling and workload management systems such as Slurm, PBS, or HTCondor.
- Work with containerized HPC environments using Singularity or Docker.
- Collaborate with researchers and engineers to profile, optimize, and troubleshoot GPU-intensive workloads.
- Monitor system health, job performance, and GPU utilization across nodes.
- Contribute to architecture decisions, scaling strategies, and automation of infrastructure provisioning and maintenance.

Required Skills and Experience:

- 2-3 years of experience in HPC systems engineering or scientific computing environments.
- Hands-on experience with NVIDIA GPUs in a production or research computing setting.
- Proficiency with CUDA programming, GPU performance tuning, and hardware benchmarking.
- Solid understanding of HPC workload managers (e.g., Slurm) and job scheduling policies.
- Familiarity with parallel computing (MPI, OpenMP) and large-scale system architecture.
- Experience with Linux system administration and scripting languages (e.g., Bash, Python).
- Familiarity with high-speed interconnects (e.g., InfiniBand, RDMA).
- Strong problem-solving and communication skills.

Preferred Qualifications:

- Experience with AI/ML workflows on HPC systems (e.g., TensorFlow, PyTorch on GPUs).
- Exposure to container orchestration in HPC, using tools like Singularity or Kubernetes.
- Experience with GPU monitoring and observability tools (e.g., DCGM, Prometheus, Ganglia).
- Knowledge of storage systems commonly used in HPC (Lustre, BeeGFS, GPFS).
- Familiarity with automation tools (e.g., Ansible, Terraform).
- NVIDIA certifications (e.g., DLI, CUDA Developer).

Job Title

Company : Qubrid AI

Location : Mysore, Karnataka

Created : 2025-04-29

Job Type : Full Time