Job Title: Sr. Gen AI Engineer

Position Summary:
This role is hands-on and delivery-focused: you will move from ambiguity to production-ready designs while ensuring solutions are secure, monitored, and resilient, with clear human-in/on-the-loop checkpoints and strong controls against data leakage. You will also apply classical ML (supervised/unsupervised learning, feature engineering, evaluation, deployment) alongside GenAI to build hybrid solutions that perform reliably in production.

Role:
You will lead and contribute to high-complexity initiatives, including:

Agentic AI Delivery
- Architect and implement AI agents and agentic workflows for insurance use cases (e.g., claim triage, fraud detection signals, document understanding, decision support).
- Design multi-agent orchestration patterns (tool routing, planning, reflection/verification, delegation, prioritization, and fallback strategies).
- Build and maintain MCP-based (Model Context Protocol) integrations to connect agents with enterprise tools and services in a governed, repeatable way.
- Create reusable agent frameworks, templates, and accelerators for consistent delivery across multiple problem domains.

RAG, Retrieval, and Knowledge Systems
- Design and implement RAG pipelines using embeddings, indexing strategies, chunking, reranking, and query rewriting.
- Build and tune vector embedding strategies for insurance corpora (policies, claims notes, SIU artifacts, adjuster documents, knowledge bases).
- Implement vector cache patterns to reduce latency and cost and improve response stability.
- Apply and integrate graph-based retrieval and reasoning (knowledge graphs / graph networks) for entity relationships and multi-hop retrieval.

Traditional Machine Learning
- Design, train, and deploy classical ML models for risk and operations use cases (fraud scoring, triage/prioritization, anomaly detection, severity prediction, propensity/next-best-action).
- Perform feature engineering across structured and semi-structured data sources (claims, policy, billing, customer interactions, documents/metadata).
- Select appropriate algorithms and techniques (e.g., logistic regression, tree-based models such as XGBoost/LightGBM/CatBoost, random forests, time series, clustering, outlier/anomaly detection, graph-based features, and calibration methods).
- Build robust evaluation pipelines (AUC/PR, lift, calibration, stability/drift metrics, fairness checks) and model validation aligned to the business decision context.
- Implement ML lifecycle best practices: reproducible training, versioning, experiment tracking, packaging, deployment, and monitoring.
- Develop hybrid AI systems where LLMs and agents augment ML (e.g., LLMs handle enrichment/extraction while ML performs scoring, or ML acts as guardrail/routing logic for agent decisions).

Observability, Monitoring, and Automation
- Implement agentic workflow monitoring automation (latency, cost, tool success rates, retrieval hit rate, hallucination indicators, quality metrics).
- Build traceability across prompts, tools, retrieval sources, and model outputs to support debugging and audit needs.
- Establish model output lineage and run-level provenance (input → retrieval context → tool calls → output → downstream actions).

Governance, Risk Controls, and Fail-Safe Design
- Engineer solutions with explicit controls for:
  - Data leakage prevention (prompt injection defense, secrets handling, policy enforcement, data minimization).
  - Auditability (decision rationale, evidence capture, reproducibility, and retention of run artifacts).
  - Fail-safe behavior (timeouts, retries, circuit breakers, graceful degradation, safe defaults).
- Design human-in-the-loop / human-on-the-loop contact points for review, escalation, and override in high-risk steps.
- Partner with security and governance stakeholders to ensure solutions meet Erie’s enterprise controls and compliance expectations.

Fraud Detection & Insurance Analytics Enablement
- Collaborate with fraud/SIU and analytics teams to design agent-supported workflows for:
  - Case summarization and evidence gathering
  - Signal enrichment and prioritization
  - Pattern discovery across claims, payments, providers, and narratives
- Support experimentation and production rollout with measurable success criteria and controls.

Duties & Responsibilities

Essential Functions
- Designs and implements production-ready agentic AI systems with orchestration, tools, and retrieval under minimal supervision.
- Designs, trains, and operationalizes traditional ML models with strong evaluation, validation, and monitoring.
- Builds secure, auditable, and traceable AI workflows with end-to-end observability and run-level lineage.
- Leads optimization of agent workflows for quality, latency, cost, and reliability; identifies bottlenecks and eliminates failure modes.
- Troubleshoots complex distributed failures across AI services, retrieval systems, tool integrations, and CI/CD pipelines.

Additional Responsibilities
- Develops technical design documentation (architecture, data flow, threat modeling, observability plans, runbooks).
- Implements automated testing strategies (unit, integration, evaluation harnesses, regression suites for prompts/retrieval).
- Establishes best practices for prompt management, versioning, evaluation, and controlled rollout.
- Mentors engineers and contributes to engineering standards for enterprise AI delivery.

Required Qualifications

Education & Experience
- Bachelor’s degree in Computer Science, Engineering, Data Science, or a related field (or equivalent practical experience).
- Minimum of 5 years of hands-on experience building AI solutions, with deep, current delivery experience in modern agentic workflows.
- Extensive, hands-on background in machine learning, with proven industry experience.

Core Technical Requirements (Must Have)
- Proven experience architecting and implementing:
  - AI agents, agent orchestration, and tool-using systems
  - Agentic workflow optimization (quality/cost/latency/reliability tradeoffs)
  - Agent monitoring automation and operational runbooks
- Strong, hands-on traditional ML experience:
  - Supervised learning, model selection, feature engineering, evaluation, calibration, and deployment
  - Experience with at least one of: XGBoost/LightGBM/CatBoost, scikit-learn, PyTorch/TensorFlow (as appropriate)
  - Proven experience building production scoring systems and monitoring model performance and drift
- Strong understanding and implementation experience with:
  - Data leakage risks, prompt injection vectors, and mitigations
  - Traceability, auditability, and evidence-based outputs (citations/grounding)
  - Fail-safe system design, robust error handling, and workflow troubleshooting
- Hands-on experience with:
  - AWS Bedrock (or equivalent managed LLM platforms) and secure enterprise integration patterns
  - RAG, embeddings, vector databases, and retrieval tuning
  - Vector caching and performance optimization
  - Graph network / knowledge graph concepts applied to retrieval or reasoning
- Experience designing human-in/on-the-loop workflow checkpoints and escalation patterns.
- Strong system design skills: distributed components, reliability patterns, scaling, and production support.

Tooling & Engineering Practices
- Strong proficiency in Python and/or TypeScript/Node.js in production environments.
- Experience with modern CI/CD, infrastructure-as-code, and cloud-native practices.
- Comfort working across APIs, event-driven workflows, and integration patterns.

Preferred Qualifications (Nice to Have)
- Insurance domain experience: claims, underwriting, billing, SIU/fraud workflows, or document-heavy enterprise operations.
- Experience with AWS AI/ML services beyond Bedrock (or equivalents), such as document extraction, entity recognition, speech-to-text, vision, search, and personalization.
- Experience with feature stores, offline/online feature consistency, and real-time scoring architectures.
- Production experience with evaluation frameworks (offline/online evals, groundedness checks, red-teaming, regression testing).
- Experience implementing policy-as-code controls for AI (guardrails, content filters, tool allowlists, PII handling).
- Experience building knowledge graphs / graph retrieval systems (entity resolution, relationship inference, graph queries).

What Success Looks Like (First 30–60 Days)
- Delivers at least one end-to-end agentic workflow to a production-ready standard (or a strong pilot) with:
  - A measurable ML component (model + evaluation + monitoring) and/or a hybrid ML + agent architecture
  - Retrieval grounding and clear evidence trails
  - Monitoring dashboards and alerting
  - Documented failure modes and safe fallbacks
  - Human review points for high-risk decisions
- Establishes reusable patterns for MCP tools, agent orchestration, the ML lifecycle, and evaluation/monitoring that other teams can adopt.
- Demonstrates measurable improvements in quality and/or efficiency (latency, cost, throughput, or reduced manual effort).

Working Style & Collaboration
- Operates effectively in ambiguity and translates business problems into robust technical designs.
- Communicates clearly with both engineering peers and non-technical stakeholders.
- Comfortable partnering with governance, security, and data teams to ensure compliant delivery.

Keywords (for sourcing)
Agentic AI, AI Agents, Orchestration, MCP (Model Context Protocol), AWS Bedrock, RAG, Embeddings, Vector DB, Vector Cache, Knowledge Graph, Graph Networks, Observability, Traceability, Auditability, Prompt Injection Defense, Data Leakage Prevention, Human-in-the-Loop, Fraud Detection, Machine Learning.