Join TechBio Companies Driving Patient Impact

Sponsored by Alix Ventures

BIOS Community

Scientist, Bioinformatics R&D (Remote)



Stamford, CT, USA
Posted on Friday, July 28, 2023

Sema4 is a patient-centered health intelligence company dedicated to advancing healthcare through data-driven insights. Sema4 is transforming healthcare by applying AI and machine learning to multidimensional, longitudinal clinical and genomic data to build dynamic models of human health and define optimal, individualized health trajectories. Centrellis®, our innovative health intelligence platform, enables us to generate a more complete understanding of disease and wellness and to provide science-driven solutions to the most pressing medical needs. Sema4 believes that patients should be treated as partners, and that data should be shared for the benefit of all.

We are looking for a talented Scientist, Bioinformatics R&D to join our team. The Bioinformatics Scientist leads translational bioinformatics and product development for NGS pipelines as part of the R&D Bioinformatics department. This scientist is an integral part of an interdisciplinary team that develops computational methods and pipelines to interpret large-scale human genome and transcriptome sequencing data from reproductive health, cancer, and other diseases. As part of a development team of engineers and scientists, this scientist will translate research prototypes into production-quality, scalable pipeline products used by a variety of clinical diagnostics and research projects across many teams at Sema4. This scientist will serve as an authority on these products for other users and teams and will optimize them to serve Sema4's data science needs.


  • Design, develop, and test NGS pipelines for clinical tests and research projects in oncology, reproductive health, and other indications.
  • Lead or support bioinformatics projects to translate NGS results, as well as public and internal genomic, phenotype, and clinical/EMR datasets, into features and optimizations of clinical utility.
  • Analyze and integrate heterogeneous NGS data (somatic and germline SNVs, indel variants, copy-number alterations, structural variants, gene fusions, transcript isoforms, RNA abundance, RNA editing and modification) from diverse next-generation sequencing assays (Illumina, Ion Torrent, Pacific Biosciences; targeted panels, whole-exome sequencing, whole-genome sequencing, RNA-Seq; bulk and single-cell) and microarrays.
  • Work with wet labs and clinical teams to plan and design experiments that generate such data, and analyze the resulting data.
  • Communicate effectively with collaborators (computational and bioinformatics scientists on R&D and production teams, IT/HPC, clinical lab directors, knowledgebase and curation teams, wet lab staff) to understand and satisfy product and research analysis needs.


  • PhD in Bioinformatics, Biomedical Informatics, Computational Biology, Genomics, or a related discipline requiring strong computational and analytical skills, supplemented with a biology background.
  • Hands-on experience and high proficiency with NGS tools, especially for sequence analysis and expression analysis.
  • Strong coding proficiency in R, Python, and SQL in a Linux environment.
  • Well-versed in the art of effective communication on interdisciplinary teams (scientists, programmers, and clinicians), especially graphical communication about high-complexity datasets to scientific audiences from different backgrounds.
  • Highly self-motivated, with a strong ability both to multitask and to work independently.
  • Good understanding of molecular, cell, and developmental biology, especially where relevant to cancer genomics, oncology, or endocrine neoplasms, as well as molecular cloning and NGS library preparation methodologies.
  • Experience developing code using distributed version control tools (especially Git) and software issue tracking/management systems (especially Jira).
  • Experience using or developing genome browsers or other tools for visualization of genomic datasets.