About Predictive Text Labs

PTL builds AI that predicts the future. Our hybrid reasoning engine has achieved a Brier score of 0.121, beating human superforecasters. We're backed by Blackbird Ventures and notable angels, including Balaji Srinivasan, the Synthesia founders, and the Supabase founders. To get there, we need a platform engineer whose mandate is to make our prediction-to-trade runtime reliable, reproducible, observable, and safe to evolve.

The role

You will own PTL's prediction-to-trade runtime: the platform that runs forecasting pipelines, persists auditable reasoning traces, supports backtests and live/paper evaluations, and turns forecasts into trade intents, alerts, and broker-executed orders. This is not a generic developer role and not a pure infrastructure role. You will work across API contracts, event streams, state machines, background orchestration, database schemas, broker adapters, reconciliation, observability, and deployment safety.

What you'll do

- Own runtime correctness across prediction batches, prediction stages, pipeline specs, schedule runs, question snapshots, strategy states, target snapshots, order records, fills, positions, and audit events.
- Harden orchestration across Trigger.dev: retries, idempotency, deterministic run keys, cancellation, lock strategy, failure classification, replay, and safe recovery.
- Own API and event contracts across multiple repos: SSE events, OpenAPI/Zod schemas, structured artifacts, and stage/agent attribution.
- Build headless prediction and evaluation workflows: scheduled batches, locked datasets, live-market and paper-trading probes, benchmark runs, and operator controls.
- Build production observability: structured logs, Sentry, OpenTelemetry, pino, provider and tool-call timing, run-level dashboards, cost and token tracking, actionable alerts, and incident workflows.
- Maintain data integrity across market ingestion, resolution syncing, cutoff dates, snapshot coverage, multi-choice market semantics, and source-specific schema quirks.
- Support leakage-safe research workflows: frozen evidence, cutoff-date validation, trace replay, postmortem capture, and experiment-card audit trails.
- Maintain deployment and environment hygiene across Vercel, Supabase, Trigger.dev, Doppler, AWS/EC2, and Cloudflare/SSM.
- Improve platform velocity: contract tests, replay and regression harnesses, local-to-prod parity, paper-trade smoke tests, and reduction of flaky behavior.
- Partner with Research, Data Science, Data Infrastructure, and Trading to turn evolving research logic into stable runtime contracts.

You will not own the research thesis; you will own the systems that make research executable, measurable, and safe.

Requirements

- Strong TypeScript/Node backend engineering experience in production systems with real operational risk.
- Experience designing stateful workflows where correctness depends on explicit status transitions, idempotency, and auditability.
- Deep familiarity with Postgres-backed systems: schema design, migrations, constraints, indexes, RLS and auth boundaries, and data-quality checks.
- Experience with asynchronous orchestration: queues, scheduled jobs, retries, cancellation, replay, compensating actions, and dead-letter or manual recovery paths.
- Strong API and event-contract instincts: OpenAPI/Zod-style schemas, SSE or other streaming protocols, versioning, backward compatibility, and structured artifacts.
- Practical observability experience: structured logs, tracing, Sentry or equivalent, dashboarding, alerting, and incident diagnosis.
- Ability to work across app, runtime, and integration layers in one codebase without losing architectural discipline.
- Fluency with AI-assisted development in large TypeScript systems; able to use agents and code assistants productively without sacrificing review discipline.
- Strong product judgment under uncertainty: you can ship pragmatic runtime improvements while preserving correctness in high-stakes paths.
- Ability to partner with research and data teams and translate evolving experimental logic into stable production contracts.

Nice to have

- Experience with Supabase, Trigger.dev, Drizzle, Hono, Next.js, or similar TypeScript runtime stacks.
- Experience with trading systems, broker APIs, prediction markets, exchange APIs, order lifecycle management, or execution-critical fintech systems.
- Experience with event-sourced or audit-ledger-style systems: order events, fills, positions, reconciliation, or payment-state machines.
- Familiarity with LLM pipelines, tool calling, structured outputs, reasoning traces, or model-evaluation infrastructure.
- Familiarity with ClickHouse or other OLAP systems and where analytical vs. transactional boundaries should live.
- Experience building deterministic replay or regression frameworks for workflows with external providers.
- Experience with leakage-safe backtesting, frozen data snapshots, or time-consistent evaluation.

Why PTL

- Australia's highest-powered team. Our founding team includes Australia's Kaggle champion, SIG's top Australian equities analyst, PhDs who reached 6th place on ARC-AGI, and the founder of a time-series foundation model lab. Our cofounders include the founder of Netlify, one of the world's largest DevOps unicorns, the creator of DLFinLab, and Forbes 30 Under 30 alumni.
- Real traction. Our forecasting system already outperforms human superforecasters in internal and live evaluation.
- High-leverage role. You own the runtime that connects forecasting, backtesting, evaluation, and trade execution.
- Technically dense domain. You will work across AI reasoning, prediction markets, trading systems, data quality, and reliability engineering.
- Compounding research loop. You will build the infrastructure that makes traces, evals, postmortems, paper and live probes, and production feedback compound over time.
- Small, senior team with high ownership and fast iteration.
- Backed by top-tier investors and operators.
- Remote-friendly with Sydney and San Francisco presence.

How to apply

Send your resume and a brief note covering:

- A production workflow you owned that required strict state transitions and idempotent background execution. What broke, how did you detect it, and how did you harden it?
- An incident where async orchestration (queues, jobs, webhooks, or streaming) caused user-facing, operational, or financial risk. How did you mitigate it and prevent recurrence?
- Design a recovery path for this scenario: a prediction batch is running, the SSE stream disconnects, the model provider times out, partial stage artifacts have been persisted, and a downstream trade alert depends on the final probability. What should the system persist, retry, replay, suppress, and alert on?
- How would you evolve a human-in-the-loop paper-trading stack into a reliability-first live trading platform without losing developer velocity or operator control?
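To give a flavor of the recovery-path scenario above, here is a deliberately minimal TypeScript sketch of one possible shape: an explicit status state machine plus idempotent, deterministically keyed stage persistence, with the trade alert suppressed until the batch is finalized. All names here (`PredictionBatch`, `BatchStatus`, `runKey`) are hypothetical illustrations, not PTL's actual schema, and durable storage, retries, and alerting are elided.

```typescript
type BatchStatus = "running" | "provider_timeout" | "replaying" | "finalized" | "failed";

// Explicit transition table: every status change is validated, so a crashed or
// replayed worker cannot move a batch along an illegal path.
const ALLOWED: Record<BatchStatus, BatchStatus[]> = {
  running: ["provider_timeout", "finalized", "failed"],
  provider_timeout: ["replaying", "failed"],
  replaying: ["finalized", "failed"],
  finalized: [],
  failed: [],
};

class PredictionBatch {
  status: BatchStatus = "running";
  private artifacts = new Map<string, unknown>();

  constructor(readonly id: string) {}

  // Deterministic run key: the same batch + stage always maps to the same key,
  // so replaying a stage after an SSE disconnect cannot double-persist it.
  runKey(stage: string): string {
    return `${this.id}:${stage}`;
  }

  // Idempotent persist: first write wins; replayed writes become no-ops.
  persistStage(stage: string, artifact: unknown): boolean {
    const key = this.runKey(stage);
    if (this.artifacts.has(key)) return false;
    this.artifacts.set(key, artifact);
    return true;
  }

  transition(to: BatchStatus): void {
    if (!ALLOWED[this.status].includes(to)) {
      throw new Error(`illegal transition: ${this.status} -> ${to}`);
    }
    this.status = to;
  }

  // Suppress the downstream trade alert until the final probability exists
  // AND the batch has reached a terminal, audited state.
  shouldEmitTradeAlert(): boolean {
    return this.status === "finalized" && this.artifacts.has(this.runKey("final_probability"));
  }
}

// Walk the failure scenario: partial artifacts were persisted, the provider
// timed out, the batch is replayed, and only then is the alert released.
const batch = new PredictionBatch("batch-42");
batch.persistStage("evidence", { docs: 3 });            // persisted before the timeout
batch.transition("provider_timeout");                    // SSE dropped, provider timed out
batch.transition("replaying");                           // replay from persisted artifacts
const rewritten = batch.persistStage("evidence", {});    // replay retries the stage: no-op
batch.persistStage("final_probability", { p: 0.62 });
batch.transition("finalized");
```

In a production system these transitions and artifacts would live in Postgres behind constraints and audit events rather than in memory; the sketch only shows the shape we care about: validated transitions, idempotent writes, and suppressed side effects.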
Job Title: Member of Technical Staff - Platform