About Omniscient
Omniscient (o8t) is the world leader in using AI to decode the human brain, a field known as connectomics. Our mission is to improve the lives of billions through connectomics. Today, Omniscient's connectomic analysis platform, Quicktome, generates personalized, patient-specific maps of an individual's brain networks, or connectome. These critical insights inform prognosis and planning across neurologic conditions, from cranial surgery and neuro-oncology to stroke and beyond. Tomorrow, Omniscient is poised to revolutionize brain health and help conquer conditions such as Alzheimer's disease and depression through truly personalized brain medicine.

Our products deliver these insights with enterprise-grade efficiency and usability, enabling broader access to vital subject-specific neurological insights.

Since our founding, we have grown exponentially and achieved several world firsts, including the development of the world's first connectomic neurosurgical planning and visualization platform to be cleared by regulatory bodies. Omniscient recently expanded its product offering to include the first FDA-cleared neurological planning and visualization tool using resting-state fMRI, opening up new horizons for clinicians to assess brain connectivity and function in cases such as brain surgery, stroke, disorders of consciousness, and oncology.

With continued development, we intend to improve the lives of billions with both medical and non-medical products and services that drastically change how the human brain is understood, treated, and even enhanced.

Role (what your day to day might look like)
As a global company, we have development and commercial interests that span several continents and availability zones. Consequently, our SRE team is distributed between Australia and North America, supporting the Delivery and Data Science teams and our production environments.
As a member of this small team, you will design, maintain, and improve our practices across development, staging and production environments while working closely with the delivery teams. You will be working with best-in-class technologies, both open source and closed source. We host applications and services on Kubernetes (EKS) by default, and our cloud is composed of technologies including Istio, Helm, Argo CD, Argo Workflows, Cloudflare and Datadog. We run data science workflows on medical imaging datasets at scale to guide neurosurgical and neurological decision making. As such, the reliability and security of our environments are of very high importance. We seek team members who are excited about learning new things, improving existing things and challenging themselves to mastery. When incidents do occur, you will serve on the front line of our incident response: restoring availability, running post-mortems, and then developing monitoring solutions that notify us earlier, as well as practices and technical solutions that avoid similar pitfalls in the future.

Responsibilities
- Work closely with the application development team to improve our testing, release, and deployment processes.
- Work with the field and tech support teams to ensure smooth customer onboarding and reliability of our cloud services, particularly in the North American timezone.
- Build internal tools for debugging, performance analysis, compliance, monitoring and enforcement of code and security best practices.
- Prepare for the worst: build and conduct experiments that explore performance and induce failure to see how our systems respond. Translate those learnings into updates to our platform and practices to achieve greater resiliency.
- Design the future of our platform: as we expand into new features, products or markets, we will need to take on new technologies and architectural patterns. We want an SRE who is excited about exploring trade-offs and opinionated about the ways in which we should grow in order to maintain a world-class platform.

Key Requirements
- Bachelor's degree in Computer Science, Engineering or a relevant STEM field
- 5+ years of experience directly in DevOps, SRE or a similar role
- At least 5 years of demonstrated experience running AWS in production
- Extensive knowledge of AWS cloud services (and/or Google Cloud) and infrastructure orchestration tools, e.g. Terraform, Helm
- Demonstrated experience with Kubernetes management and tools, e.g. Argo CD, Istio
- Demonstrated experience with networking concepts, e.g. DNS, routing, TCP and UDP protocols, AWS VPC
- Experience with best practices for logging, monitoring, and alerting, e.g. Datadog

Highly Desired
- Management of CI/CD workloads, e.g. GitLab CI
- Designing and maintaining infrastructure for security
- Previous experience as a software engineer, e.g. Node.js, Python, Go
- Experience with computationally intensive workloads
- Experience working in a regulated environment or with personally identifiable information, e.g. Aerospace, Automotive, Banking

Perks and Benefits
- Competitive salary, plus ESOP
- Flexible and remote working: we value work-life balance

If you're seeking professional growth and enjoy working on large, distributed, cloud-based applications that change the world of brain care, apply now to be considered for the position!
Job Title
Site Reliability Engineer