Job Description

Location : Hybrid / Remote (global, aligned to target customer time zones)

Role Type : Full-time | Principal Level

Role Overview

Centific’s DAC (Digital Architecture & Cognitive) Command is expanding its global architecture unit to build and operationalize agentic, AI-driven business automation at production scale. In this role, you will act as the end-to-end design authority for agentic inference solutions, owning outcomes from blueprint to live operations. You will architect multi-agent systems, runtime orchestration, and operational guardrails that meet demanding non-functional requirements (latency, reliability, cost, and security).

This is a hands-on role. You will prototype reference implementations, tune runtime behavior, and partner with engineering, platform, security, and product stakeholders to deliver production-first agentic systems.

Key Responsibilities

1. Agentic System Architecture & Orchestration

- Design multi-agent architectures (planner–executor, supervisor loops, routing/dispatch, delegation, reflection/verification patterns) aligned to business workflows.
- Define orchestration mechanisms for state/session handling, memory (short/long-term), tool invocation, retrieval/RAG, and structured I/O.
- Establish standards for prompt/agent templates, tool/skill contracts, agent-to-agent messaging, and deterministic fallbacks.
- Create reference implementations that teams can extend safely (agent frameworks, orchestration services, reusable libraries).

2. NFR-Driven Design for Production Inference

- Own non-functional design (latency, throughput, scalability, reliability, availability, cost) as first-class requirements.
- Design for performance and cost: token budgeting, caching strategies, batching, streaming responses, concurrency controls, and adaptive routing.
- Define resilience patterns: timeouts, retries, circuit breakers, idempotency, queue back-pressure, graceful degradation, and safe-mode behavior.
- Drive architecture decisions that balance quality vs. cost vs. speed, documenting trade-offs and expected SLOs/SLAs.

3. Solution Blueprint Ownership & End-to-End Delivery

- Own the end-to-end solution blueprint from concept through production rollout (architecture, integration, testing, operations).
- Translate business intent into system decomposition (services, agents, tools, data flows) with clear ownership boundaries and contracts.
- Collaborate with Solution Blueprint Architects, Platform Architects, Data/Governance, and Security/Compliance to align constraints early.
- Deliver architecture artifacts: sequence diagrams, decision records (ADRs), integration specs, runbooks, acceptance criteria, and launch checklists.

4. Integration Governance & Platform Compatibility

- Set integration standards for APIs/events (versioning, compatibility contracts, error semantics, schema governance).
- Define interfaces for tool invocation (capabilities registry, permissions, rate limits, safe parameterization).
- Ensure agentic systems integrate cleanly with enterprise platforms (IAM, logging, monitoring, workflow engines, data platforms).
- Partner with enterprise architecture to ensure interoperability across domains and prevent fragmentation.

5. Operational Readiness & Reliability

- Design and enforce operational guardrails: monitoring, alerting, evaluation hooks, rollback plans, and safety kill-switches.
- Establish runbooks for incident response, model/agent degradation, and dependency failures (tools, data sources, external APIs).
- Define observability standards for agent traces, tool calls, prompts/responses, evaluation scores, and cost telemetry.
- Lead postmortems and reliability improvements; ensure corrective actions are implemented and verified.

6. Technical Leadership & Enablement

- Act as a principal technical leader, aligning cross-functional teams on architecture, roadmap, and delivery priorities.
- Mentor engineers/architects on agentic design patterns, evaluation, and production hardening.
- Drive reuse: shared components, gold-standard reference flows, and platform primitives that accelerate delivery.
- Contribute to architecture councils/design reviews; influence standards and best practices across DAC Command.

Required Experience & Skills

Core Experience

- 10–15+ years in software/platform engineering with 5+ years in solution/AI/platform architecture roles.
- Proven delivery of production-grade AI/LLM systems (not just prototypes), including operational ownership considerations.
- Strong background in distributed systems, API/event-driven integration, and reliability engineering.

Agentic AI & LLM Runtime Expertise (Hands-On)

- Deep experience with agentic patterns: multi-agent coordination, planning, tool calling, routing, memory, and state management.
- Experience optimizing LLM inference: caching, batching, token/latency management, throughput tuning, and quality-cost trade-offs.
- Strong understanding of evaluation strategies (offline/online), prompt/agent regression testing, and release gates.
- Familiarity with common orchestration frameworks and patterns (e.g., graph-based agent flows, tool registries, function calling).

Platform & Operations

- Strong cloud-native architecture experience (AWS/Azure/GCP), microservices, event streaming, and container/Kubernetes ecosystems.
- Hands-on with observability stacks (logs/metrics/traces), SLO/error budgets, incident response practices, and postmortems.
- Ability to design secure-by-default tool access patterns (least privilege, scoped tokens, auditability).

Soft Skills & Ways of Working

- Production-first mindset: design for operability, safety, and reliability from day one.
- Strong systems thinking: can reason across product, platform, data, security, and cost dimensions.
- Clear communicator: able to explain architecture trade-offs to engineers, product, and executive stakeholders.
- Bias for action: prototypes quickly, then codifies reusable standards and reference implementations.
- Collaborative leadership: aligns teams without relying on formal authority.

Nice-to-Have / Preferred

- Experience with large-scale workflow orchestration and automation platforms (BPM/workflow engines, event-driven pipelines).
- Experience implementing agent observability and evaluation harnesses at scale.
- Background in regulated environments (SOC2, HIPAA, PCI, CJIS) and designing AI systems with audit-ready traces.
- Open-source contributions, talks, or published work in agentic systems, LLM infrastructure, or reliability engineering.

What Success Looks Like (First 12–18 Months)

- Agentic reference architectures and runtime standards are adopted across DAC Command deliveries.
- Production deployments meet defined SLOs for latency, availability, and cost; incident rates reduce over time through reliability improvements.
- Reusable orchestration primitives (routing, memory, tool registry, evaluation hooks) accelerate new use cases and reduce duplication.
- Integration governance prevents fragmentation: APIs/events are versioned, compatible, and observable.
- Teams trust the platform: safe rollouts, clear runbooks, and measurable quality/cost improvements are in place.

Job Title

Company : Centific

Location : Hyderabad, Telangana

Created : 2026-02-23

Job Type : Full Time