High-Performance Computing (HPC) infrastructures provide users with dedicated compute resources to run computation-intensive workloads such as weather simulations, artificial intelligence (AI), and machine learning (ML). Each job submitted by a user may consist of multiple tasks that run concurrently on different nodes, often requiring shared access to intermediate or final data. To facilitate this, HPC systems typically use a Parallel File System (PFS) that allows data to be accessed across nodes. However, this same PFS is commonly shared among all users, meaning that multiple jobs access the storage system simultaneously. This shared usage can lead to I/O interference, where one user's job slows down due to competing I/O demands from other users, thereby affecting overall job execution time.

To address this challenge, we are developing software that allows HPC infrastructure providers to provision isolated PFS instances for each user or job. This reduces interference by isolating I/O traffic. Additionally, we are designing our software to support dynamic performance scaling of PFS instances, integrate erasure-coded fault tolerance, and enable data tiering to object storage systems. If you are interested in contributing to this effort or would like to discuss it further, please reach out.

Key Responsibilities
Design, develop, and maintain high-performance software in C/C++ for system-level components.
Develop and optimize kernel modules and device drivers for Linux-based systems.
Lead the design and implementation of storage domain solutions, including filesystems and related technologies.
Utilize advanced data structures and algorithms to solve complex system problems.
Analyze and debug system-level issues, ensuring efficient problem resolution.
Collaborate with cross-functional teams to architect scalable and robust software solutions.
Perform code reviews, mentor junior engineers, and contribute to continuous process improvement.

Required Skills and Qualifications
Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
5-12 years of professional experience in system software development.
Proficiency in C/C++ programming, with a strong understanding of object-oriented and low-level programming concepts.
Expertise in Linux operating system internals, including process management, memory management, and I/O subsystems.
Deep knowledge of storage technologies, filesystems, kernel programming, and device driver development.
Solid understanding of data structures, algorithms, and their application in system-level programming.
Excellent debugging skills, with experience using tools like GDB, strace, perf, and system logs.
Strong problem-solving and analytical thinking abilities.
Excellent communication and collaboration skills.

Preferred Qualifications
Experience with distributed storage systems or cloud storage solutions.
Familiarity with virtualization, containers, or hypervisors.
Hands-on experience in performance tuning and optimization.
Knowledge of scripting languages (e.g., Python, Bash) for automation and testing.

Why Join Us?
Work on innovative, high-impact projects in system software engineering.
Collaborate with a team of passionate and highly skilled professionals.
Enjoy a culture that values creativity, innovation, and personal growth.
Competitive salary and comprehensive benefits package.

Job Title
Senior Software Engineer