
Job Title : Data Engineer - Streaming (WkStream 2 - Kafka)


Company : TekWissen India


Location : Bangalore, Karnataka


Created : 2026-04-15


Job Type : Full Time


Job Description

Position: Data Engineer - Streaming (WkStream 2 - Kafka)
Location: Bangalore/Gurgaon
Work Type: Hybrid
Job Type: Full Time

Overview:
TekWissen is a global workforce management provider operating throughout India and many other countries worldwide. The job opportunity below is with one of our clients, part of a trusted global innovator of IT and business services headquartered in Tokyo. The client helps organizations transform through consulting, industry solutions, business process services, IT modernization, and managed services, enabling them to move confidently into the digital future. Committed to long-term success, the client combines global reach with local client attention to serve customers in over 50 countries.

Job Description:
The client is seeking a Streaming Integration Engineer to own the two streaming ingestion workstreams of the PNC Bank Hadoop-to-Iceberg POC. This role is responsible for designing and delivering production-grade PySpark Structured Streaming pipelines that ingest data into Apache Iceberg tables while operating under specific technical constraints. For example, workstream 2 requires building a Confluent Kafka-to-Iceberg ingestion application using only Apache-supported APIs, as PNC will not permit the use of the unsupported Confluent Iceberg Sink Connector. Workstream 3 requires delivering a syslog-ng-to-Iceberg batch ingestion pipeline via rolling log files, as syslog-ng has no native Iceberg sink.

The engineer will work closely with GitHub Copilot to scaffold, iterate, test, and document the streaming application code, acting as the technical reviewer and subject matter expert who ensures AI-generated pipelines are production-ready, PNC-compliant, and correctly integrated with the Iceberg catalog and Protegrity tokenization layer.

Workstream 2 – Confluent Kafka to Iceberg:
- Design and implement a PySpark Structured Streaming application that reads from Confluent Kafka topics, parses JSON and Avro payloads, applies schema mappings, and writes atomically to Iceberg tables using the Iceberg Spark runtime and the foreachBatch micro-batch pattern (a minimal sketch follows this list)
- Ensure all functionality relies exclusively on public Apache-supported APIs (Apache Spark, Apache Kafka, and Apache Iceberg) with no unsupported Confluent connectors or proprietary sinks
- Configure Kafka source parameters: bootstrap servers, consumer group IDs, offset management (startingOffsets, failOnDataLoss), checkpoint paths, and trigger intervals
- Implement PII detection and Protegrity tokenization hooks within the ingestion pipeline before data lands in the Iceberg Bronze layer
- Write comprehensive unit and integration tests: row count validation, schema conformance checks, Kafka offset commit verification, and data comparison against the source topic
- Support PNC UAT: walk PNC engineers through the code, demonstrate that no unsupported connectors are used, and address review findings
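The bullets above describe a fairly specific pattern, so a minimal sketch may help. The following is an illustration, not the project's actual code: the broker addresses, consumer group, topic, catalog name, warehouse and checkpoint paths, event schema, and table names are all placeholders, and tokenize_pii is a stub standing in for the Protegrity call. It shows the Kafka source configured with the listed options, JSON parsing with parse-error rows isolated into a dead-letter table, and an atomic Iceberg append per micro-batch via foreachBatch, using only Apache Spark, Kafka, and Iceberg APIs.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

# Requires the spark-sql-kafka and iceberg-spark-runtime packages on the classpath.
# Catalog name, warehouse path, topic, and schema below are all placeholders.
spark = (
    SparkSession.builder
    .appName("kafka-to-iceberg-bronze")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3a://example-bucket/warehouse")
    .getOrCreate()
)

# Hypothetical event schema; the real schema mapping comes from the source topics.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_ts", TimestampType()),
    StructField("payload", StringType()),
])

# Kafka source configured with the parameters listed above (values are placeholders).
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("kafka.group.id", "example-consumer-group")
    .option("subscribe", "example.events")
    .option("startingOffsets", "earliest")
    .option("failOnDataLoss", "false")
    .load()
)

# Keep the raw value alongside the parsed struct so parse failures can be isolated.
parsed = raw.select(
    col("value").cast("string").alias("raw_value"),
    from_json(col("value").cast("string"), event_schema).alias("e"),
)

def tokenize_pii(df):
    # Stub for the Protegrity tokenization hook; the real call is vendor-specific.
    return df

def write_batch(batch_df, batch_id):
    # foreachBatch performs one atomic Iceberg append per micro-batch.
    # Both target tables are assumed to have been created up front.
    good = batch_df.filter(col("e").isNotNull()).select("e.*")
    bad = batch_df.filter(col("e").isNull()).select("raw_value")
    tokenize_pii(good).writeTo("lake.bronze.example_events").append()
    if bad.take(1):  # dead-letter only when the batch actually had parse failures
        bad.writeTo("lake.bronze.example_events_dlq").append()

query = (
    parsed.writeStream
    .foreachBatch(write_batch)
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/example_events")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```

foreachBatch is the usual workaround when a native streaming sink is not permitted: Kafka offsets are tracked in the Spark checkpoint, and each micro-batch lands as a single Iceberg snapshot, so recovery after a failure replays into an atomic append rather than requiring a connector.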
Minimum Skills Required:

Apache Kafka – Producer & Consumer:
- 4+ years of hands-on experience with Apache Kafka, including both producer and consumer development in PySpark, Java, or Scala
- Deep understanding of Kafka internals: topics, partitions, consumer groups, offsets, rebalancing, and exactly-once delivery semantics
- Experience with Confluent Kafka: schema registry, Avro/JSON serialisation, and Confluent Cloud or on-prem cluster configuration
- Proven ability to build ingestion pipelines without relying on unsupported or third-party sink connectors, using only native Kafka consumer APIs and Spark integration
- Familiarity with Kafka Connect architecture, in order to evaluate trade-offs and articulate why application-level ingestion is preferred in constrained environments

PySpark Structured Streaming:
- Strong practical experience with PySpark Structured Streaming: Kafka source, file source, foreachBatch, output modes (append/update/complete), and checkpoint management
- Experience tuning micro-batch trigger intervals, watermarking, and late-data handling for production workloads
- Hands-on experience writing streaming data directly to Apache Iceberg tables using the Iceberg Spark runtime
- Ability to implement robust error handling: dead-letter queues, parse-error isolation, and recovery from checkpoint failures

Data Engineering & Iceberg:
- Working knowledge of Apache Iceberg: catalog configuration, schema definition, append writes, and partition strategy for event and log data
- Familiarity with S3-compatible object storage as an Iceberg warehouse destination
- Understanding of medallion architecture, with the ability to correctly land streaming data in the Bronze layer with appropriate schema governance

TekWissen® Group is an equal opportunity employer supporting workforce diversity.