Lead QA – Automated Testing & AI ValidationExperience: 6+ YearsEmployment Type: ContractNotice Period: Immediate Role OverviewWe are seeking a Lead QA – Automated Testing & AI Validation with strong expertise in Python automation, LLM evaluation, RAG pipelines, observability, adversarial testing, and Azure monitoring. This role will lead quality assurance initiatives for cutting-edge Generative AI and Multi-Agent Systems.Key ResponsibilitiesLead and scale test automation initiatives with a focus on GenAI and AI systemsDesign and execute LLM evaluation frameworks using:LLM-as-a-Judge (G-Eval, custom evaluators)Hallucination detection, faithfulness, relevance, precision/recallImplement RAG evaluation frameworks (RAGAS or similar)Build Python-based automation frameworks using PyTest & DeepEvalIntegrate automation into CI/CD pipelines using GitHub ActionsDesign and validate multi-agent evaluation pipelines (tool usage, collaboration, reasoning chains)Perform adversarial and red-team testing:Prompt injectionJailbreak attacksBias and toxicity detectionConduct API testing for microservices (REST, async workflows)Monitor applications using Azure Application Insights & Log AnalyticsDefine automated scoring systems for GenAI outputsManage synthetic datasets and golden datasets for AI validationImplement observability and trace monitoring using LangFuse, LangSmith, or similar toolsMandatory SkillsTest Automation – Python, PyTest, DeepEvalLLM Evaluation – G-Eval, Custom Evaluators, LLM-as-a-JudgeRAG Evaluation – RAGAS, Retrieval MetricsEvaluation Metrics – Hallucination, Faithfulness, Relevance, Precision/RecallObservability & Monitoring – LangFuse, LangSmithCI/CD – GitHub ActionsMulti-Agent Testing – Reasoning & Tool ValidationAdversarial/Red Team Testing – Prompt Injection, Jailbreak, Bias/ToxicityAPI Testing – REST & Async WorkflowsAzure Monitoring – App Insights, Log AnalyticsSynthetic & Golden Dataset ManagementAutomated Scoring System Design for GenAI Outputs
Job Title
Lead QA – Automated Testing & AI Validation