Job Title


TCS Hiring for Observability Tools Tech Lead_PAN India


Company : Tata Consultancy Services


Location : Mumbai, Maharashtra


Created : 2025-07-20


Job Type : Full Time


Job Description

Experience: 8 to 12 years only
Job Location: PAN India

Required Technical Skill Set

Core Responsibilities:

Designing and Implementing Observability Solutions: Selecting, configuring, and deploying tools and platforms for collecting, processing, and analyzing telemetry data (logs, metrics, traces).

Developing and Maintaining Monitoring and Alerting Systems: Creating dashboards, setting up alerts based on key performance indicators (KPIs), and ensuring timely notification of issues.

Instrumenting Applications and Infrastructure: Working with development teams to add instrumentation code to applications so that they generate meaningful telemetry data. This often involves using open standards such as OpenTelemetry.

Analyzing and Troubleshooting System Performance: Investigating performance bottlenecks, identifying root causes of issues, and collaborating with development teams to resolve them.

Defining and Tracking Service Level Objectives (SLOs) and Service Level Indicators (SLIs): Working with stakeholders to define acceptable levels of performance and reliability, and tracking these metrics.

Improving Incident Response and Post-Mortem Processes: Using observability data to understand incidents, identify contributing factors, and implement preventative measures.

Collaborating with Development, Operations, and SRE Teams: Working closely with other teams to ensure observability practices are integrated throughout the software development lifecycle.

Educating and Mentoring Teams on Observability Best Practices: Promoting a culture of observability within the organization.

Managing and Optimizing Observability Infrastructure Costs: Ensuring the cost-effectiveness of observability tools and platforms.

Staying Up to Date with Observability Trends and Technologies: Continuously learning about new tools, techniques, and best practices.

Key Skills:

Strong Understanding of Observability Principles: Deep knowledge of logs, metrics, and traces, and of how they contribute to understanding system behavior.

Proficiency with Observability Tools and Platforms: Experience with tools such as:
  Logging: Elasticsearch, Splunk, Fluentd, Logstash, etc.
  Metrics: Prometheus, Grafana, InfluxDB, Graphite, etc.
  Tracing: OpenTelemetry, Datadog APM, etc.
  APM (Application Performance Monitoring): Datadog, New Relic, AppDynamics, etc.

Programming and Scripting Skills: Proficiency in languages such as Python, Go, or Java, or in scripting languages such as Bash, for automation and tool integration.

Experience with Cloud Platforms: Familiarity with cloud providers such as AWS, Azure, or GCP and their monitoring and logging services.

Understanding of Distributed Systems: Knowledge of how distributed systems work and the challenges of monitoring and troubleshooting them.

Troubleshooting and Problem-Solving Skills: Strong analytical skills to identify and resolve complex issues.

Communication and Collaboration Skills: Ability to communicate technical concepts effectively to different audiences and to work collaboratively with other teams.

Knowledge of DevOps and SRE Practices: Understanding of continuous integration/continuous delivery (CI/CD), infrastructure as code, and site reliability engineering principles.

Data Analysis and Visualization Skills: Ability to analyze telemetry data and create meaningful dashboards and reports.

Experience with Containerization and Orchestration: Familiarity with Docker, Kubernetes, and related technologies.

Kind Regards,
Priyankha M