Job Title


SwarmBench Task Engineer (Knowledge/Research) - 75064


Company : Turing


Location : Lucknow, Uttar Pradesh


Created : 2026-05-03


Job Type : Full Time


Job Description

About Turing:
Based in San Francisco, California, Turing is the world’s leading research accelerator for frontier AI labs and a trusted partner for global enterprises deploying advanced AI systems. Turing supports customers in two ways: first, by accelerating frontier research with high-quality data, advanced training pipelines, and top AI researchers who specialize in coding, reasoning, STEM, multilinguality, multimodality, and agents; and second, by applying that expertise to help enterprises transform AI from proof of concept into proprietary intelligence with systems that perform reliably, deliver measurable impact, and drive lasting results on the P&L.

Role Overview:
We are seeking a highly analytical and computationally proficient individual with a strong research background to join our team. You will contribute by either crafting challenging and insightful problems in your research domain or devising elegant computational solutions.

Responsibilities:
- Build multi-agent benchmark tasks that require reading, analyzing, and synthesizing large document collections
- Curate real-world research corpora (academic papers, case studies, technical reports) and design questions that require comprehensive analysis
- Write structured ground-truth oracles (JSON) with specific, verifiable answers that prove the agent actually read the source material
- Design LLM judge prompts that evaluate agent output field-by-field against the oracle
- Create decomposition guides that split research across multiple parallel sub-agents (one per document, one per domain, then synthesis)

Required Qualifications:
- 5+ years of research experience (academic or industry) in any scientific domain
- Strong reading comprehension with the ability to extract structured data from unstructured text
- Experience with JSON and data structures, including schema design and output validation
- Proficiency in Python scripting for data processing and evaluation (e.g., judge scripts)
- Familiarity with AI coding benchmarks such as SWE-bench and Terminal-bench
- Hands-on experience with Docker (writing Dockerfiles, building images, debugging containers)
- High attention to detail, especially for creating precise evaluation oracles without approximations

Nice to have:
- Experience with systematic reviews, meta-analyses, or large-scale literature surveys
- Familiarity with medical, legal, or scientific document analysis
- Experience with NLP or information extraction tasks
- Knowledge of LLM evaluation and benchmarking (e.g., MMLU, GPQA, SimpleQA)
- Experience curating datasets for AI evaluation

Perks of Freelancing With Turing:
- Work in a fully remote environment.
- Opportunity to work on cutting-edge AI projects with leading LLM companies.
- Potential for contract extension based on performance and project needs.

Offer Details:
- Commitments Required : 40 hours/week with 4 hours of PST overlap
- Engagement type : Contractor assignment/freelancer (no medical/paid leave)
- Duration of contract : 1 month; [expected start date is next week]