AI Engineer - Big Data

Employment Type: Full-Time
Location: Remote, Singapore
Level: Entry to Mid Level (PhD Required)

Bridge Cutting-Edge AI Research with Petabyte-Scale Data Systems

About the Role

Work at the intersection of big data and AI, where you'll develop intelligent, self-healing data systems that process trillions of data points daily. You'll have the autonomy to pursue research in distributed ML systems and AI-enhanced data optimization, with your innovations deployed at unprecedented scale within months, not years.

This isn't traditional data engineering - you'll implement agentic AI for autonomous pipeline management, leverage LLMs for data quality assurance, and create ML-optimized architectures that redefine what's possible at petabyte scale.

Key Research Areas & Responsibilities

AI-Enhanced Data Infrastructure
- Design intelligent pipelines with autonomous optimization and self-healing capabilities using agentic AI
- Implement ML-driven anomaly detection for terabyte-scale datasets

Distributed Machine Learning at Scale
- Build distributed ML pipelines
- Develop real-time feature stores for billions of transactions
- Optimize feature engineering with AutoML and neural architecture search

Required Qualifications

Education & Research
- PhD in Computer Science, Data Science, or Distributed Systems (exceptional Master's candidates with research experience considered)
- Published research or demonstrated expertise in distributed computing, ML infrastructure, or stream processing

Technical Expertise
- Core Languages: Expert SQL (window functions, CTEs), Python (Pandas, Polars, PyArrow), Scala/Java
- Big Data Stack: Spark 3.5+, Flink, Kafka, Ray, Dask
- Storage & Orchestration: Delta Lake, Iceberg, Airflow, Dagster, Temporal
- Cloud Platforms: GCP (BigQuery, Dataflow, Vertex AI), AWS (EMR, SageMaker), Azure (Databricks)
- ML Systems: MLflow, Kubeflow, feature stores, vector databases, scikit-learn with search CV, H2O AutoML, auto-sklearn, GCP Vertex AI AutoML Tables
- Neural Architecture Search: KerasTuner, AutoKeras, Ray Tune, Optuna, PyTorch Lightning + Hydra

Research Skills
- Track record working with 100TB+ datasets
- Experience with lakehouse architectures, streaming ML, and graph processing at scale
- Understanding of distributed systems theory and ML algorithm implementation

Preferred Qualifications
- Experience applying LLMs to data engineering challenges
- Ability to translate complex AutoML/NAS research into practical production workflows
- Hands-on project examples of feature engineering automation or NAS experiments
- Proven success automating ML pipelines end to end, from raw data to an optimized model architecture
- Contributions to Apache projects (Spark, Flink, Kafka)
- Knowledge of privacy-preserving techniques and data mesh architectures

What Makes This Role Unique

You'll work with one of the few truly petabyte-scale production datasets outside the major tech companies, with the freedom to experiment with cutting-edge approaches. Unlike traditional big data roles, you'll apply the latest AI research to fundamental data challenges - from using LLMs to understand data quality issues to implementing agentic systems that autonomously optimize and heal data pipelines.

About Us

Pixalate is an online trust and safety platform that protects businesses, consumers, and children from deceptive, fraudulent, and non-compliant mobile apps, CTV apps, and websites.

We're seeking a PhD-level AI Engineer to lead cutting-edge research in agentic AI systems, multimodal analysis, and advanced reasoning architectures that will directly impact millions of users worldwide. Our software and data have been used to unearth multiple high-profile criminal and illegal surveillance cases, including:

UNICEF: