Job Title
AWS Data Engineer - PySpark, Python, SQL, ETL

Responsibilities
- Design, build, and maintain batch and streaming ETL pipelines using Python, PySpark, and orchestration tools (Airflow, AWS Step Functions, Glue workflows); a minimal orchestration sketch appears after the Skills list.
- Apply strong hands-on experience in Python, PySpark, SQL, AWS, and ETL to build and optimize scalable data pipelines and warehouse solutions.
- Work closely with Data Scientists, Analytics, and Business stakeholders to ensure reliable, high-quality data is available for reporting and advanced analytics.
- Develop optimized SQL for data modeling, transformations, and performance tuning across data warehouses and lakes.
- Implement robust data ingestion frameworks from APIs, files, and RDBMS sources, managing schema evolution and partitioning strategies (see the ingestion sketch below).
- Build and maintain data models (star and snowflake schemas, dimensional modeling) to support BI, analytics, and downstream ML workloads.
- Ensure data quality (validations, profiling, observability, lineage) and implement error handling and recovery patterns (see the data-quality sketch below).
- Optimize PySpark jobs (shuffle management, partitioning, broadcast joins, caching) and SQL queries (explain plans, indexes, sort keys); a tuning sketch follows below.
- Collaborate with stakeholders to translate requirements into technical designs; document pipelines, schemas, and runbooks.
- Maintain and automate Unix/Linux scripts for jobs, monitoring, and data operations.
- Uphold security and compliance (PII handling, encryption, role-based access, auditability).

Unix/Linux scripting and operations experience is a plus.

Skills
- Amazon Web Services (AWS)
- Cloud Computing
- Data Warehouse
- Python for Data Science
- PySpark
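Example Sketches

The sketches below illustrate the kind of work the responsibilities describe. First, a minimal daily batch ETL DAG in Airflow 2.x; the DAG id, schedule, and task callables are hypothetical placeholders, not part of the posting.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        # Pull raw records from the source system (API, file drop, or RDBMS).
        ...

    def transform():
        # Clean and reshape the extracted data (e.g., with PySpark).
        ...

    def load():
        # Write the transformed data to the warehouse.
        ...

    # "daily_sales_etl" and the schedule are assumed names for illustration.
    with DAG(
        dag_id="daily_sales_etl",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)

        # Linear dependency chain: extract -> transform -> load.
        t_extract >> t_transform >> t_load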
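Next, an ingestion sketch: a parallel JDBC read from an RDBMS landed as date-partitioned Parquet. The connection URL, table, split bounds, and partition column are assumptions; reading back with mergeSchema is shown as one way to tolerate columns added to the source over time.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("jdbc-ingest-sketch").getOrCreate()

    orders = (
        spark.read.format("jdbc")
        .option("url", "jdbc:postgresql://example-host:5432/shop")  # assumed source
        .option("dbtable", "public.orders")
        .option("user", "etl_user")
        .option("password", "***")
        # Parallelize the read by splitting on a numeric key.
        .option("partitionColumn", "order_id")
        .option("lowerBound", "1")
        .option("upperBound", "1000000")
        .option("numPartitions", "8")
        .load()
    )

    # Land as Parquet partitioned by date for efficient downstream pruning.
    orders.write.mode("append").partitionBy("order_date").parquet(
        "s3://example-bucket/raw/orders/"
    )

    # mergeSchema tolerates columns added to the source table over time.
    history = spark.read.option("mergeSchema", "true").parquet(
        "s3://example-bucket/raw/orders/"
    )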
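A sketch of lightweight data-quality checks in PySpark: a non-empty-load check, a null-rate validation on a required column, and a duplicate-key check. The path, column names, and thresholds are illustrative; in practice these assertions would feed an alerting or quarantine step rather than simply fail the job.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("dq-checks-sketch").getOrCreate()
    df = spark.read.parquet("s3://example-bucket/staging/orders/")

    # Guard against an empty or failed upstream load.
    total = df.count()
    assert total > 0, "empty load"

    # Null-rate check on a required column.
    null_rate = df.filter(F.col("order_id").isNull()).count() / total
    assert null_rate == 0.0, f"order_id null rate {null_rate:.2%} exceeds threshold"

    # Duplicate-key check on the primary key.
    dupes = df.groupBy("order_id").count().filter(F.col("count") > 1).count()
    assert dupes == 0, f"{dupes} duplicate order_id values found"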
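Finally, a PySpark tuning sketch showing three of the techniques the optimization bullet names: broadcasting a small dimension table to avoid a shuffle, repartitioning by the write key, and caching a DataFrame reused by multiple aggregations. Paths and column names are hypothetical.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("etl-optimization-sketch").getOrCreate()

    facts = spark.read.parquet("s3://example-bucket/facts/")  # large fact table
    dims = spark.read.parquet("s3://example-bucket/dims/")    # small dimension table

    # Broadcast the small dimension table so the join avoids a shuffle.
    joined = facts.join(F.broadcast(dims), on="customer_id", how="left")

    # Repartition by the write key to balance the shuffle and align output files.
    joined = joined.repartition("event_date")

    # Cache because two downstream aggregations reuse the same DataFrame.
    joined.cache()

    daily = joined.groupBy("event_date").agg(F.sum("amount").alias("total_amount"))
    by_customer = joined.groupBy("customer_id").agg(F.count("*").alias("orders"))

    daily.write.mode("overwrite").partitionBy("event_date").parquet(
        "s3://example-bucket/marts/daily_totals/"
    )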