Job Title
Data Engineer

Responsibilities:

Design & Architecture of Scalable Data Platforms
- Design, develop, and maintain large-scale data processing architectures on a Lakehouse platform to support business needs.
- Architect multi-layer data models with Bronze (raw), Silver (cleansed), and Gold (curated) layers for HRTech domains.
- Strong experience with at least one of Snowflake, Databricks, or Redshift.
- Leverage Delta Lake, Unity Catalog, and advanced Databricks features for governed data sharing, versioning, and reproducibility.
- MUST have experience with AWS technologies such as AWS Glue, Athena, Redshift, etc.

Data Pipeline Development & Collaboration
- Collaborate with data engineers and data scientists to develop end-to-end pipelines using Python, PySpark, and SQL.

Performance, Scalability, and Reliability
- Optimize Spark jobs for performance, cost efficiency, and scalability through appropriate cluster sizing, caching, and query optimization techniques.
- Implement monitoring and alerting using observability platforms and cloud-native tools.
- Design secure architectures using Unity Catalog, role-based access control (RBAC), encryption, token-based access, and data lineage tools to meet compliance policies.
- Establish data governance practices including a Data Fitness Index, quality scores, SLA monitoring, and metadata cataloging.
- Write PySpark, SQL, and Python code snippets for data engineering and ML tasks.
- Perform data profiling and schema inference.
- Stay abreast of emerging trends in Lakehouse architectures, Generative AI, and cloud-native tooling.

Requirements:
- 8-12 years of hands-on experience in data engineering, including at least 5 years with Python and Apache Spark.
- MUST have experience migrating from RDBMS-like systems to data lakes.
- Expertise in building high-throughput, low-latency ETL/ELT pipelines on AWS using Python, PySpark, SQL, Athena, AWS Glue, Redshift, etc.
- Excellent hands-on experience with workload automation tools such as Airflow, Prefect, etc.
- Familiarity with building dynamic ingestion frameworks for structured and unstructured data sources, including APIs, flat files, RDBMS, and cloud storage.
- Experience designing Lakehouse architectures with Bronze, Silver, and Gold layering.
- Strong understanding of data modelling concepts, star/snowflake schemas, dimensional modelling, and modern cloud-based data warehousing.
- Experience designing data marts on cloud data warehouses and integrating them with BI tools (Power BI, Tableau, etc.).
- Experience building CI/CD pipelines with tools such as AWS CodeCommit, Azure DevOps, and GitHub Actions.
- Knowledge of infrastructure-as-code (Terraform, ARM templates) for provisioning platform resources.
- In-depth experience with AWS cloud services such as Glue, S3, Redshift, etc.
- Strong understanding of data privacy, access controls, and governance best practices.
- Experience working with RBAC, tokenization, and data classification frameworks.