
Job Title


Apache Spark/Airflow Data Engineer


Company : Google


Location : Kochi, Kerala


Created : 2026-03-17


Job Type : Full Time


Job Description

We are looking for a skilled Data Engineer with strong experience in Apache Spark to design, build, and optimize large-scale data pipelines in a distributed environment. The ideal candidate has hands-on expertise in modern data engineering practices, cloud platforms, and scalable data processing frameworks.

Key Responsibilities

- Design, develop, and maintain ETL/ELT pipelines using Apache Spark (batch and/or streaming).
- Build and optimize distributed data processing workflows on Spark (PySpark/Scala/Java).
- Work with cloud-based data ecosystems (AWS, GCP, or Azure) to develop scalable data solutions.
- Collaborate with data scientists, analysts, and backend engineers to deliver reliable, high-quality data products.
- Implement and maintain data quality checks, monitoring, and alerting for data pipelines.
- Optimize Spark jobs for performance, cost efficiency, and scalability.
- Manage and model data in data lakes, data warehouses, and/or structured storage systems.
- Contribute to data architecture design, including schema modeling, partitioning, and data lifecycle management.
- Automate infrastructure and pipeline deployments using CI/CD and IaC frameworks.
- Ensure compliance with data governance, security, and privacy standards.

Required Skills & Qualifications

- Strong hands-on experience with Apache Spark (batch or streaming).
- Proficiency in Python, Scala, or Java for data processing.
- Experience with at least one cloud platform (AWS, GCP, or Azure).
- Solid understanding of distributed systems, data partitioning, and performance tuning.
- Hands-on experience with data lake technologies (e.g., S3, GCS, Azure Data Lake).
- Experience with relational databases and SQL.
- Familiarity with CI/CD workflows and version control (Git).
- Experience with Infrastructure-as-Code tools (Terraform, CloudFormation, etc.) is a plus.
- Knowledge of workflow orchestration tools such as Airflow, Dagster, or Prefect.
- Strong problem-solving skills and the ability to work in cross-functional teams.

Preferred Qualifications

- Experience with Spark on Kubernetes, Databricks, EMR, or Dataproc.
- Knowledge of streaming technologies (Kafka, Pub/Sub, Kinesis).
- Familiarity with Delta Lake, Iceberg, or Hudi.
- Background in data modeling (ETL/ELT design, star/snowflake schemas).
- Experience with real-time and near-real-time data pipelines.