About The Role

Lifesight is growing rapidly and seeking a strong Data Engineer to be a key member of the Data and Business Intelligence organization with a focus on deep data engineering projects. You will be joining as one of the few initial data engineers as part of the data platform team in our Bengaluru office. You will have an opportunity to help define our technical strategy and data engineering team culture in India.

You will design and build data platforms and services while managing our data infrastructure in cloud environments that fuels strategic business decisions across Lifesight products.

A successful candidate will be a self-starter who drives excellence, is ready to jump into a variety of big data technologies and frameworks, and is able to coordinate and collaborate with other engineers, as well as mentor other engineers in the team.

What You’ll Be Doing

- Build highly scalable, available, fault-tolerant distributed data processing systems (batch and streaming) that process hundreds of terabytes of data ingested every day, along with a petabyte-sized data warehouse and Elasticsearch cluster.
- Build quality data solutions and refine existing diverse datasets into simplified models that encourage self-service.
- Build data pipelines that optimize for data quality and are resilient to poor-quality data sources.
- Own the data mapping, business logic, transformations, and data quality.
- Perform low-level systems debugging, performance measurement, and optimization on large production clusters.
- Participate in architecture discussions, influence the product roadmap, and take ownership of and responsibility for new projects.
- Maintain and support existing platforms and evolve them to newer technology stacks and architectures.

We’re excited if you have

- Proficiency in Python and PySpark
- Deep understanding of Apache Spark, Spark tuning, creating RDDs, and building DataFrames, and the ability to create Java/Scala Spark jobs for data transformation and aggregation
- Experience in big data technologies like HDFS, YARN, MapReduce, Hive, Kafka, Spark, Airflow, Presto, etc.
- Experience in building distributed environments using any of Kafka, Spark, Hive, Hadoop, etc.
- Good understanding of the architecture and functioning of distributed database systems
- Experience working with various file formats like Parquet, Avro, etc. for large volumes of data
- Experience with one or more NoSQL databases
- Experience with AWS, GCP
- 5+ years of professional experience as a data or software engineer

Job Title
Senior Data Engineer