Job Description

Maxonic maintains a close, long-term relationship with our direct client. In support of their needs, we are looking for an RLHF (Reinforcement Learning from Human Feedback) Expert.

Job Title: RLHF (Reinforcement Learning from Human Feedback) Expert
Job Type: Full-time
Job Location: Remote
Work Schedule: Day Shift (IST)

About this role

Maxonic, a growing AI firm, is looking for an energetic and committed individual with a passion for pushing the boundaries of AI and machine learning to join its India office. This is an unparalleled, high-impact opportunity to work on cutting-edge alignment research while helping drive Maxonic forward in the field of applied AI.

Dive deep into the world of AI safety and large language models. Work with some of the most advanced RLHF methodologies and help design systems that learn from, and align with, human intent. Become part of a forward-looking AI team and help bring responsible, human-centered AI to life.

Responsibilities:
- Design, develop, and optimize RLHF systems for fine-tuning large language models and other generative AI models.
- Build and refine reward models using human preference data (rankings, pairwise comparisons, scores).
- Collaborate with human annotators, data scientists, and product stakeholders to design effective feedback-collection workflows.
- Stay up to date with state-of-the-art alignment algorithms, such as PPO, DPO, and other preference-based learning methods, and apply them in real-world scenarios.
- Lead or contribute to projects that integrate RLHF pipelines into broader AI systems and production deployments.

Desired Skills and Experience
- Undergraduate degree, preferably in computer science, machine learning, or a related field.
- Strong analytical and problem-solving skills, with a curious mindset and eagerness to experiment.
- Excellent communication and interpersonal skills.
- Demonstrated passion for applied AI, reinforcement learning, and the real-world value that human-centered systems can deliver.
- Strong ability to reason through complexity and to design robust, safe, and scalable systems.
- Comfort working independently as well as collaboratively in a cross-functional team.

Technical Qualifications:
The following skills are valued but not essential:
- Experience with RLHF techniques such as supervised fine-tuning (SFT), reward modeling, and policy optimization using PPO or related algorithms.
- Proficiency in Python and familiarity with ML libraries such as PyTorch, Hugging Face Transformers, and TRL, and with RL libraries such as Ray RLlib or Stable Baselines.
- Exposure to large language model training or fine-tuning workflows.
- Familiarity with prompt engineering, natural language processing, or human-in-the-loop systems.
- Understanding of MLOps tools, distributed training, and experiment-tracking platforms.
- Experience with human data labeling, annotation platforms, or quality-control strategies is a plus.

About Maxonic:
Since 2002, Maxonic has been at the forefront of connecting candidate strengths to client challenges. Our award-winning, dedicated team of recruiting professionals is specialized by technology, listens carefully, and seeks to find positions that meet the long-term career needs of our candidates. We take pride in the more than 10,000 candidates we have placed and in the repeat business we earn from our satisfied clients.

Interested in Applying?
Please apply with your most current resume. Feel free to contact Paduka Padhy (paduka@) for more details.

Job Title : RLHF (Reinforcement Learning from Human Feedback) Expert

Company : Maxonic Inc.

Location : Eluru, Andhra Pradesh

Created : 2025-06-15

Job Type : Full Time