Site Reliability Engineer
Each year people are diagnosed with cancer, autoimmune diseases, and more. The immune system is the ultimate determinant of health and disease; however, it remains a black box. Leveraging our AI-first technology, we want to provide a high-resolution understanding of the immune system to fully unlock its power. And with your help, we can do it.
Using technology built and proven at Fred Hutch, Ozette is creating a revolutionary immune profiling platform that turns data into actionable insights. Our software is integral to our mission: enhancing insights by 10x in volume and resolution for scientists, clinicians, and our teams will accelerate breakthroughs and drive better patient outcomes.
Ozette's collaborative platform enables unprecedented transparency and reproducibility through its version-controlled analysis pipeline. It is built on TypeScript and React/RecoilJS; our backend APIs are written primarily in Python/Flask, with most data stored in PostgreSQL and S3.
Our founders are leading experts in data science and medicine and we are backed by some of the best technology investors, including Madrona Venture Group (first investors in Amazon), Microsoft co-founder Paul Allen’s Institute for Artificial Intelligence, and Vulcan Capital. If you’re passionate about leveraging technology to improve health, explore joining our innovative, creative team. This is an excellent opportunity to showcase your technical prowess and be part of a culture that encourages personal and professional growth, values innovation, and provides opportunities to build products that have a meaningful impact on people's lives. Recognizing that the best candidates do not always match all criteria of the job description, we encourage you to apply if you think you would be a good fit for the role and are inspired by our mission.
What you’ll do
- Set up robust instrumentation, monitoring, and alerting systems to ensure the reliability and availability of our platform
- Measure and optimize system performance, continually pushing our capabilities forward to stay ahead of customer needs and drive innovation
- Architect scalable cloud systems, optimizing for both performance and cost, particularly for computationally heavy machine learning algorithms
- Make continuous improvements to security infrastructure
- Design, build, maintain, and support Infrastructure as Code automation and deployment processes (e.g., Pulumi)
- Serve as a gatekeeper for infrastructure and system deployments; establish controls and processes around these functions
- Build, maintain and support development, testing and production environments
- Deliver new features in conjunction with the application team by providing highly available and scalable infrastructure
- Work closely with the engineering team to define methodologies and goals around availability and information security
- Help define, implement, and manage the provisioning, software configuration and release process for all applications and environments
- Collaborate with cross-functional teams to enhance the continuous integration and deployment (CI/CD) pipelines, accelerating feature development and enhancing test coverage
- Contribute to the selection, adoption, and integration of relevant technologies to support our engineering goals
- Make continuous improvements to the security and cost of our infrastructure
- Provide production support and help debug issues across all levels of the stack (e.g., TypeScript, Python, C++, Go, PostgreSQL, Docker, and Linux)
- Collaborate with cross-functional teams (product, engineering, operations)
- Stay up to date with emerging technologies and industry best practices, actively sharing knowledge and insights with the team
- Follow software development lifecycle processes
- Report nonconformances (NCs) and deviations to management and complete assigned quality record tasks
- Help grow the Ozette team
We'd love to hear from you if you have:
- Bachelor's or higher degree in Computer Science, Engineering, or related field
- Proven experience (5+ years) as a site reliability engineer, working on cloud-based systems (AWS) in a 24/7 production environment
- Strong understanding of software development lifecycle and best practices, including version control systems (e.g., Git) and CI/CD pipelines
- Expertise in infrastructure as code principles and tools, with practical experience implementing and managing infrastructure using tools like Terraform or Pulumi
- Experience setting up instrumentation, monitoring, and alerting systems for large-scale distributed systems
- Experience with containerization (Docker, Kubernetes, ECR, EKS)
- Linux/UNIX OS experience (CentOS/RedHat, Ubuntu)
- Familiarity with technologies such as Flask, TypeScript, Bazel, GitHub Actions, Datadog, Airflow, and ECS is a plus
- Experience in build support for engineering teams (Bamboo, Jenkins, AWS CodeBuild)
- Demonstrated record of implementing strong security practices and helping develop a strong security culture
- Demonstrated commitment to operational observability, security, and engineering best practices
- Strong analytical and problem-solving skills with a track record of diagnosing problems within complex systems and identifying root causes of issues
- Excellent communication skills and the ability to collaborate effectively with cross-functional teams
- Able to collaborate and work in a diverse team and support a culture of inclusion
- Passion for innovation, continuous improvement, and staying at the forefront of technology trends
- Passion for operational excellence and strong attention to detail
- Strong project management skills and ability to manage multiple tasks in parallel
Why you will love working here:
Ozette is dedicated to building a world-class science and technology team known for its products and services, high standards, employee development and retention. At the heart of great companies are great teams. We strive to give you everything you need to thrive at work and beyond. We offer full benefits and a hybrid office schedule (with many employees opting for full-time remote), honor all 11 federal holidays plus an additional day off for every holiday, plan company-wide respites during Thanksgiving, Christmas, and New Year's, and offer a 'take what you need' vacation policy.
As an early-stage employee, you will help set standards and mental models that lay foundations for the future, whether related to product, culture, or operations. Exercise your skills across a breadth of responsibilities and evolve your role as the company grows. You will work on some of the most exciting initiatives that bridge AI/ML with hard science, packaged with a brilliant user experience.
Our work and teams thrive most when we are diverse and inclusive, so we take equal opportunity seriously. We commit to fostering a respectful, diverse, and inclusive environment where all team members can contribute and develop to their fullest potential. We welcome individuals of all backgrounds, orientations, and identities to join our community. Applications from women and members of underrepresented minority groups are particularly welcome.
Base salary range: $110,000 - $160,000 USD
Total compensation, which includes equity, is determined based on experience.