Job Title


Junior Gen AI Engineer – AWS Bedrock, Vertex AI


Company : Bryckel AI


Location : Bareilly, Uttar Pradesh


Created : 2026-02-17


Job Type : Full Time


Job Description

Junior ML Engineer – LLM Infrastructure & Orchestration

About Us

We are a legal AI platform that ingests entire contracts and runs long-context, multimodal LLM pipelines on AWS Bedrock (Claude) and Vertex AI (Gemini). We operate schema-constrained LLM systems: prompts define intent, and Pydantic models enforce structure, validation, and reliability across production workflows.

We're hiring an ML Engineer (~1 year of experience) to own LLM orchestration, latency, and scaling for workflows already live with customers. This role is production ML systems engineering, not model training. You should be available to join immediately or within one month.

What You'll Do

- Build and operate end-to-end LLM pipelines for full-document analysis (100–500+ page contracts)
- Implement schema-first LLM inference using Pydantic to produce deterministic, typed outputs (a minimal sketch appears at the end of this posting)
- Own LLM orchestration logic: prompt routing, validation, retries, fallbacks, and partial re-execution
- Optimize latency, throughput, and cost for long-context inference (batching, streaming, async execution)
- Build and scale OCR → document parsing → LLM inference pipelines for scanned leases (Textract)
- Develop streaming and async APIs using FastAPI
- Manage distributed background workloads with Celery: queues, retries, idempotency, backpressure (see the second sketch at the end of this posting)
- Productionize report generation (DOCX/Excel) as deterministic pipeline outputs
- Deploy, monitor, and scale inference workloads on AWS (Bedrock, EC2, S3, Lambda)
- Debug production issues: timeouts, schema failures, partial extractions, cost spikes

What You'll Own Technically

- Pydantic-based schemas for all LLM outputs
- Prompt ↔ schema contracts and versioning
- Validation, retry, and fallback mechanisms
- Latency and cost optimization for long-context inference
- Reliability of OCR + LLM pipelines at scale

Must Have

- Strong Python and async programming fundamentals
- ~1 year of experience working on production ML or LLM systems
- Hands-on experience with Claude, Gemini, and AWS Bedrock
- Experience with schema-constrained LLM outputs (Pydantic, JSON Schema, or similar)
- Experience with OCR and document-heavy pipelines
- Experience with Celery or distributed async job systems
- Comfort treating LLMs as non-deterministic services that require validation and retries
- Individual-contributor mindset in a lean startup
- Available to join immediately or within one month

Nice to Have (Strong ML Signals)

- Experience with streaming LLM responses
- Familiarity with long-context failure modes and truncation issues
- Experience with LLM output evaluation or regression testing
- Cost monitoring and optimization for LLM inference

Why Join Us

- Work on real production ML systems, not demos
- Own core LLM infrastructure end to end
- Direct exposure to long-context, document-scale AI
- Fully remote, fast-paced startup

CTC: ₹9,00,000 – ₹12,00,000 (based on experience and impact)
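
For candidates unfamiliar with the schema-first pattern named above, here is a minimal illustrative sketch, not Bryckel's actual code: a Pydantic model defines the output contract, and a retry wrapper treats the LLM as a non-deterministic service. The `call_llm` callable and the `LeaseClause` fields are assumptions made up for illustration.

```python
# Minimal sketch of schema-constrained LLM output (not Bryckel's actual code).
# `call_llm` is a hypothetical stand-in for a Bedrock/Vertex client call.
from typing import Callable

from pydantic import BaseModel, Field, ValidationError


class LeaseClause(BaseModel):
    """Typed contract the LLM's JSON output must satisfy (illustrative fields)."""
    clause_type: str
    page_number: int = Field(ge=1)
    text: str


def extract_clause(call_llm: Callable[[str], str], prompt: str,
                   max_retries: int = 3) -> LeaseClause:
    """Validate raw LLM JSON against the schema, retrying on schema failures."""
    last_error: Exception | None = None
    for _ in range(max_retries):
        raw = call_llm(prompt)  # non-deterministic: may return invalid JSON
        try:
            return LeaseClause.model_validate_json(raw)
        except ValidationError as err:
            last_error = err  # schema failure: retry (or route to a fallback)
    raise RuntimeError(f"Schema validation failed after {max_retries} tries") from last_error
```

The point of the pattern is that downstream code only ever sees typed, validated objects; the prompt and the schema form a versioned contract, and validation failures are handled at the orchestration layer rather than leaking into reports.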
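
Similarly, a minimal sketch of the Celery pattern from the same list: late acknowledgement so crashed workers get their jobs redelivered, bounded retries with exponential backoff, and an idempotency check so duplicate deliveries become no-ops. The broker URL, the in-process `_done` store, and the helper names are assumptions for illustration; production state would live in Redis or a database.

```python
# Minimal sketch of a retryable, idempotent background job (illustrative only).
from celery import Celery

app = Celery("pipelines", broker="redis://localhost:6379/0")  # assumed broker URL

_done: set[str] = set()  # toy in-process idempotency store (production: Redis/DB)


class TransientError(Exception):
    """Stand-in for a retryable failure such as a timeout or throttling."""


def run_ocr_and_extract(document_id: str) -> None:
    """Placeholder for the OCR -> parse -> LLM extraction pipeline."""


@app.task(bind=True, max_retries=3, acks_late=True)
def process_document(self, document_id: str) -> None:
    """Process one document; duplicate deliveries become no-ops."""
    if document_id in _done:
        return
    try:
        run_ocr_and_extract(document_id)
        _done.add(document_id)
    except TransientError as err:
        # Exponential backoff between redeliveries: 1s, 2s, 4s.
        raise self.retry(exc=err, countdown=2 ** self.request.retries)
```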