
Job Title


HCI Administrator


Company : Tata Consultancy Services


Location : Panipat, Haryana


Created : 2026-04-16


Job Type : Full Time


Job Description

Role: HCI Administrator
Location: Chennai
Experience: 7–14 years
Apply only if you can join immediately or within a 30-day notice period.

Role Summary
We are looking for a proactive L2 Technical Support Lead to maintain operational stability, respond to incidents, and drive continuous improvement for our hyper-converged infrastructure (HCI) built on VMware vSphere and vSAN. In this role, you will lead the triage and resolution of storage and compute issues, optimize system performance and capacity, ensure lifecycle compliance for firmware, drivers, and ESXi, and mentor L1 and L2 engineers. The technology landscape includes vSAN clusters (ESA/OSA), Storage Policy-Based Management (SPBM), Skyline Health, Aria Operations dashboards, vSphere HA/DRS, fault domains, stretched clusters, data-at-rest and in-transit encryption, and vSphere Lifecycle Manager (vLCM).

Key Responsibilities

1) Incident & Problem Management (L2 Ownership)
Lead L2 triage, diagnosis, and restoration for vSAN and vSphere incidents, including performance degradation, resync operations, object health, latency, and host failures, restoring service quickly and within SLAs.
Conduct post-incident reviews and drive problem records through to root cause and a permanent fix.
Apply structured Incident and Major Incident practices: prioritize incidents by impact and urgency, follow defined escalation paths, and assign clear roles during high-severity events.

2) Health, Performance & Capacity Operations
Use vSAN Skyline Health to monitor cluster health, including hardware compatibility, network health, and storage objects. Apply health scoring and diagnostics to prioritize remediation and track operational trends.
Use Aria Operations to monitor vSAN performance, capacity, and configuration.
Use its dashboards, alerts, and recommendations to anticipate and prevent issues.
Analyze resync operations, I/O paths, and advanced performance data with tools such as vsantop and the I/O Trip Analyzer to optimize workloads and eliminate bottlenecks.

3) Configuration, Policy & Resiliency
Develop and maintain SPBM policies, including FTT/RAID settings, stripes per object, IOPS limits, and space-efficiency settings, keeping them aligned with workload SLAs and continuously compliant. Apply optimal reconfigurations after failures.
Administer vSphere HA and DRS with vSAN for automatic failover and balanced recovery after failure events. Manage HA admission control, VM-host affinity and anti-affinity rules, and DPM interactions.
Design and maintain fault domains to protect against rack and chassis failures, validating latency and placement rules for replica and witness components.
Operate stretched clusters across two sites with a witness host: configure storage policies for site affinity, manage failure scenarios, and verify HA/DRS behavior across sites.

4) Security & Compliance
Enable and manage vSAN data-at-rest encryption (AES-256), including the KEK/DEK workflow and integration with an external KMS or the Native Key Provider. Ensure key persistence with TPM, perform rekey operations, and maintain secure cluster practices.
Validate data-in-transit encryption where applicable, and enforce role-based access control for all encryption operations.

5) Lifecycle & Hardware Compatibility
Maintain vSphere Lifecycle Manager (vLCM) compliance for vSAN clusters by orchestrating ESXi images, vendor add-ons, drivers, and firmware, and by checking hardware compatibility against the vSAN HCL. Coordinate with the OEM Hardware Support Manager for full-stack remediation.
Apply vSAN build recommendations based on the release catalog, along with critical patches and baseline groups.
Remediate clusters and monitor release catalog currency through health checks.

6) Change, Release & Knowledge
Plan and execute changes, such as patching, driver and firmware updates, and policy adjustments, within approved maintenance windows. Maintain runbooks and knowledge base articles for common faults and recovery procedures.
Mentor L1 and L2 staff, establish operational checklists, and run pre-flight validations covering network MTU/NIOC, capacity slack space, and hardware balance.

Required Qualifications & Experience
6–10 years in enterprise infrastructure operations, including at least 3 years operating VMware vSphere and vSAN (ESA/OSA) in production.
Proven L2 ownership of incidents across compute, storage, and network in HCI environments, with strong ITSM Service Operation skills.
Hands-on experience with Skyline Health, Aria Operations, SPBM, vSphere HA/DRS, fault domains, stretched clusters, encryption (KMS/NKP/TPM), and vLCM/HCL compliance.

Key Technical Competencies
VMware vSAN operations and design, including capacity planning, resync management, performance tuning, and the differences between ESA and OSA.
Network design for vSAN, including redundant 10GbE or faster links, jumbo frames/MTU, NIOC QoS tuning, and less than 1 ms RTT across fault domains.
SPBM policy management, including FTT, RAID-5/6, IOPS limits, stripes, and space-efficiency trade-offs.
Resiliency topologies, including 2-node clusters with a witness, stretched clusters, and fault domains.
Security controls covering data-at-rest and in-transit encryption, KMS/NKP, TPM key persistence, and secure operational procedures.
Observability using Skyline Health scoring and diagnostics and Aria Operations dashboards for performance, capacity, and configuration.
Lifecycle management covering vLCM images and baselines, Hardware Support Manager (HSM) integration, and vSAN release catalog/build recommendations.

Tools & Platforms
VMware vCenter and ESXi; VMware vSAN (ESA/OSA).
VMware Aria Operations (formerly vRealize Operations, vROps) for comprehensive monitoring of vSAN clusters.
vSAN Skyline Health and Support Insight for health checks and performance diagnostics.
vSphere Lifecycle Manager for image-based patching and firmware compliance, with Hardware Support Manager integration.

Soft Skills
Clear, structured communication during incidents, with timely status updates to stakeholders during major events.
Strong documentation skills for runbooks, and the ability to mentor junior engineers and foster collaboration across infrastructure, network, and security teams.

Education & Certifications (Preferred)
VMware certifications (such as VCP-DCV) or vSAN-focused training, along with a commitment to ongoing learning through official documentation and hands-on labs.

Success Metrics (KPIs)
Mean Time To Recovery (MTTR) and SLA adherence for vSAN and vSphere incidents.
Improved Skyline Health scores and fewer repeat incidents through effective problem management.
Lifecycle and HCL compliance across clusters, plus measurable gains in capacity and performance headroom validated by dashboards.

Why This Role Matters
HCI with VMware vSAN simplifies operations, removes the need for external SANs, and enables policy-driven resiliency through features such as HA, DRS, and SPBM. The L2 Technical Support Lead ensures these benefits are fully realized in production, delivering secure, efficient, and predictable operations.