Job Description

About Cut + Dry

Cut + Dry is a fast-growing FoodTech startup transforming the $300B U.S. food supply industry. We are revolutionizing how food distributors do business by providing a best-in-class e-commerce platform that connects them seamlessly with restaurants, caterers, schools, and other buyers. Our founders are lifelong Silicon Valley entrepreneurs with deep domain expertise who have built and exited multiple startups in the restaurant and food supply chain business. We're looking for flexible go-getters who welcome the challenge of meeting the needs of a rapidly expanding business.

What You'll Do

You will be a senior-level engineer embedded within our Production Engineering function, responsible for owning L3 escalations end-to-end. This is not a traditional support role: you'll be an engineer first, deep-diving into distributed systems behavior, application code, and database queries to diagnose and permanently resolve the most complex production issues across the Cut+Dry platform. This is a critical opportunity not only to solve immediate production challenges at the code level but also to build the tooling and automation that prevents recurrence. You will serve as the technical bridge between production operations and feature engineering teams, ensuring our platform is resilient, observable, and continuously improving.

Responsibilities

Root Cause Analysis & Deep Debugging: Own L3 production incidents end-to-end by reproducing, diagnosing, and resolving issues that span multiple services, databases, and integrations, focusing on code-level investigation rather than surface-level triage.
Code-Level Investigation: Analyze application code, logs, traces, and metrics across the full stack (APIs, background jobs, data pipelines, third-party integrations) to identify root causes, not just symptoms.
Tooling & Automation: Build internal tools, scripts, and automation that reduce mean time to detect (MTTD) and mean time to resolve (MTTR). If you fix it twice manually, you automate it the third time.
Observability & Monitoring: Improve alerting, monitoring dashboards, and runbooks so recurring issues are caught earlier or eliminated entirely. Contribute to shared libraries, health-check endpoints, and self-healing mechanisms.
Post-Incident Reporting: Write detailed post-incident reports with clear root cause identification, impact assessment, and corrective action plans.
Cross-Team Collaboration: Partner with feature engineering teams to review changes that carry production risk. Advocate for reliability improvements in architecture reviews and sprint planning.
Production Readiness: Help establish and evolve production readiness standards for new features and services. Translate production patterns and failure modes into actionable guidance for other engineers.

Who You Are

A strong software engineer who thrives on the detective work of debugging complex, distributed systems.
A builder at heart: you don't just fix problems, you create tools and systems that prevent them from happening again.
A clear communicator who can write detailed technical postmortems and translate production failure modes into actionable guidance for feature teams.
Passionate about system reliability and committed to engineering excellence in a fast-paced environment.
Passionate about helping people adopt new technologies and thrive in a digital-first world.
A collaborative team player who is eager to learn and contribute to a positive team environment.

What You Bring

5+ years of software engineering experience, with significant time spent debugging and maintaining production systems.
Strong proficiency in backend technologies such as PHP, Python, or Java, and the ability to read unfamiliar codebases quickly.
Deep experience with databases (SQL and NoSQL), query optimization, and data integrity troubleshooting.
Hands-on experience with observability tools such as Sentry and CloudWatch.
Solid understanding of distributed systems, including microservices, message queues, caching layers, and API gateways.
Track record of building internal tooling or automation that reduced incident frequency or resolution time.
Experience with API troubleshooting (interpreting documentation and error codes, and using tools such as Postman).
Familiarity with AWS infrastructure (e.g., EC2, S3, Lambda, CloudWatch) sufficient to understand and troubleshoot system-level issues.
Excellent written communication skills for post-incident reports, runbooks, and cross-team knowledge sharing.
Experience troubleshooting ETL (Extract, Transform, Load) processes is highly desirable.
Familiarity with CI/CD pipelines and deployment processes.

Why Work at Cut+Dry?

Make a real impact by helping businesses grow and thrive through both support and education.
Results-driven company culture that encourages a balanced lifestyle.
Stock options package.
Paid medical, dental, and vision coverage.
Unlimited PTO.
Flexible remote (work-from-anywhere) environment.
Laptop provided.

Job Title

Company : Cut+Dry

Location : Toronto, Ontario

Created : 2026-03-22

Job Type : Full Time