Job Description

Must have deep Linux, GPU, and Python knowledge.

We are seeking an experienced Linux & GPU Engineer with deep expertise in NVIDIA GPU technologies to support and scale our cloud and on-premise software platform. You will play a key role in building and optimizing compute environments for intensive scientific, engineering, and AI workloads, ranging from large-scale simulations to deep learning training clusters.

This position is ideal for someone who thrives at the intersection of GPU computing, Linux, parallel computing, and high-throughput systems, and who is passionate about enabling researchers, engineers, and scientists to work at scale.

Work from Home! Compensation: 11 LPA. Please do not apply if this compensation is not acceptable.

You'll work with our founders and US team members, plus our offshore cloud and AI team.

Responsibilities:
- Design, implement, and manage GPU cloud and on-premise software, with a focus on NVIDIA GPUs.
- Install, configure, and maintain GPU drivers, CUDA libraries, and NVIDIA software stacks (e.g., NCCL, cuDNN).
- Create virtual machines for AI tools and packages.
- Use Linux and networking knowledge to simplify and streamline virtual machine deployment.
- Help customers in domains such as AI, simulation, machine learning, data analytics, and scientific computing with installation and operations.
- Work on advanced clustering, scheduling, and workload management systems such as Slurm.
- Work with containerized environments using Kubernetes and Docker.
- Collaborate with researchers and engineers to profile, optimize, and troubleshoot GPU-intensive workloads.
- Monitor system health, job performance, and GPU utilization across nodes.
- Contribute to architecture decisions, scaling strategies, and automation of infrastructure provisioning and maintenance.

Required Skills and Experience:
- 3 years of experience with Linux, GPU hardware, and GPU tools such as MIG (Multi-Instance GPU) for GPU partitioning.
- Hands-on experience with NVIDIA GPUs in a production or research computing setting.
- Understanding of HPC workload managers (e.g., Slurm) and job scheduling policies.
- Familiarity with parallel computing (MPI, OpenMP) and large-scale system architecture.
- Experience with Linux system administration and scripting languages (e.g., Bash, Python).
- Familiarity with high-speed interconnects (e.g., InfiniBand, RDMA).
- Strong problem-solving and communication skills.

Preferred Qualifications:
- Experience with AI/ML workflows (e.g., TensorFlow, PyTorch on GPUs).
- Exposure to containerization and container orchestration using tools such as Singularity or Kubernetes.
- Experience with GPU monitoring and observability tools (e.g., DCGM, Prometheus, Ganglia).
- Familiarity with automation tools (e.g., Ansible, Terraform).
- Proficiency with CUDA programming, GPU performance tuning, and hardware benchmarking.
- NVIDIA certifications (e.g., DLI, CUDA Developer).

Job Title

Company : Qubrid AI

Location : Dehradun, Uttarakhand

Created : 2025-07-23

Job Type : Full Time