Greetings from TCS!

Job Title: PySpark
Skill: PySpark
Years of Experience: 5 - 8 years
Location: Pune / Mumbai / Chennai

Job Description:

Key Responsibilities:
- Develop, optimize, and maintain big data pipelines using PySpark on distributed computing platforms.
- Design and implement ETL workflows for ingesting, processing, and transforming large datasets in Hive.
- Work with structured and unstructured data sources to ensure efficient data storage and retrieval.
- Optimize Hive queries and Spark jobs for performance, scalability, and cost efficiency.
- Implement best practices for data engineering, including data governance, security, and compliance.
- Monitor, troubleshoot, and enhance data workflows to ensure high availability and fault tolerance.
- Work with cloud platforms (Azure) and big data technologies to scale data solutions.

Required Skills & Qualifications:
- Strong experience with PySpark for distributed data processing.
- Hands-on experience with Apache Hive and SQL-based data querying.
- Proficiency in Python and experience working with large datasets.
- Familiarity with HDFS, Apache Hadoop, and distributed computing concepts.
- Good to have: knowledge of cloud-based data platforms such as Azure Synapse and Databricks.
- Understanding of performance tuning for Hive and Spark.
- Strong problem-solving and analytical skills.

Thanks,
Ayushi Gupta