
Job Title


Senior ML Systems Engineer


Company : Credflow AI


Location : Gurugram, Haryana


Created : 2026-03-15


Job Type : Full Time


Job Description

PrimaLabs builds systems that help enterprises run large-scale AI workloads efficiently on real hardware. Our focus is on optimizing inference performance, cost, and reliability across modern accelerator platforms. We work directly with enterprise customers deploying frontier models on next-generation GPUs and AI accelerators. Our platform continuously discovers optimal runtime configurations to maximize throughput, reduce latency, and improve cost efficiency.

Role Overview

PrimaLabs is hiring a Senior ML Systems Engineer to own the optimization engine that runs on real customer hardware. You will tune and benchmark inference systems across GPUs such as the NVIDIA H200 Tensor Core GPU, NVIDIA B200 Tensor Core GPU, and AMD Instinct MI300X. Your work will power PrimaLabs' automated optimization stack, including runtime tuning, benchmarking pipelines, and integration with large-scale hyperparameter search frameworks such as DeepHyper. You will also work directly with customers during deployments, ensuring our system delivers measurable performance gains on real production infrastructure.

Key Responsibilities

Inference Runtime Optimization
- Tune and optimize inference systems using vLLM and SGLang
- Profile model performance across different hardware and runtime configurations
- Identify and eliminate performance bottlenecks (memory bandwidth, kernel inefficiencies, batching behavior)

Benchmarking & Performance Analysis
- Design and execute benchmark suites for real customer workloads
- Measure throughput, latency, memory utilization, and cost efficiency
- Build standardized benchmarking frameworks for new models and hardware

Optimization Infrastructure
- Build systems for large-scale configuration sweeps and automated tuning
- Integrate runtime parameters, hardware constraints, and workload characteristics into search pipelines
- Maintain and extend the DeepHyper-based optimization pipeline

Customer Deployments
- Work directly on enterprise deployments running on modern AI accelerators
- Support benchmarking and optimization during customer onboarding
- Deliver performance improvements tailored to customer hardware environments

Hardware-Aware Systems Engineering
- Optimize workloads across GPUs including the NVIDIA H200 Tensor Core GPU, NVIDIA B200 Tensor Core GPU, and AMD Instinct MI300X
- Understand memory hierarchy, GPU scheduling, and model parallelism strategies

Required Background
- 5+ years of experience in ML infrastructure or high-performance ML systems
- Deep experience with LLM inference runtimes
- Strong skills in performance profiling, GPU utilization optimization, and systems debugging
- Hands-on experience with vLLM, SGLang, or similar inference runtimes; GPU profiling tools; and Python plus systems-level debugging

Nice to Have
- Experience with large-scale inference serving systems
- Familiarity with GPU kernel profiling tools (Nsight, ROCm profiler)
- Experience with distributed inference or model parallelism
- Exposure to hyperparameter optimization frameworks such as DeepHyper
- Previous work with cutting-edge AI hardware deployments

What Makes This Role Unique
- Work directly with next-generation AI hardware
- Solve real performance problems on enterprise deployments
- Build the core optimization engine of PrimaLabs
- Collaborate closely with the founders, with direct impact on customer success
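To make the benchmarking metrics above concrete (throughput, latency, percentiles), here is a minimal sketch of a run-summary helper, assuming per-request latencies and generated-token counts are collected elsewhere. The function name and result fields are illustrative, not part of PrimaLabs' actual stack.

```python
import statistics

def summarize_benchmark(latencies_s, tokens_per_request, wall_time_s):
    """Summarize one benchmark run into throughput and latency metrics.

    latencies_s: per-request end-to-end latencies (seconds).
    tokens_per_request: generated tokens per request, in the same order.
    wall_time_s: wall-clock duration of the whole run (requests may overlap,
                 so throughput must divide by wall time, not summed latency).
    """
    lat = sorted(latencies_s)
    p99_idx = min(len(lat) - 1, int(len(lat) * 0.99))
    return {
        "p50_latency_s": statistics.median(lat),
        "p99_latency_s": lat[p99_idx],
        "throughput_tok_s": sum(tokens_per_request) / wall_time_s,
    }
```

Dividing total tokens by wall-clock time (rather than summed latencies) is what makes the throughput number meaningful under concurrent batched serving.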
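The configuration-sweep responsibility can be illustrated with a deterministic grid sweep. This is only a stand-in: DeepHyper's actual API performs asynchronous, model-guided search rather than exhaustive enumeration, and the knob names below are vLLM-style assumptions, not a confirmed configuration schema.

```python
import itertools

# Hypothetical search space over vLLM-style runtime knobs; names and values
# are illustrative only.
SEARCH_SPACE = {
    "max_num_seqs": [64, 128, 256],               # max concurrent sequences per batch
    "gpu_memory_utilization": [0.85, 0.90, 0.95],  # fraction of VRAM for KV cache
    "tensor_parallel_size": [1, 2, 4],             # GPUs sharded per model replica
}

def grid_sweep(objective, space):
    """Score every configuration in the space and return the best one.

    In practice, objective(cfg) would launch the runtime with cfg and return
    a measured figure of merit (e.g. tokens/sec on the target GPU).
    """
    best_cfg, best_score = None, float("-inf")
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

A production pipeline would replace the exhaustive loop with an asynchronous search (the role names DeepHyper) so that GPU hours stay bounded as the space grows combinatorially.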