Description & Requirements
The Somatic Mosaicism across Human Tissues (SMaHT) Network was established by the NIH to catalog somatic genetic variation across human tissues and discover its patterns, causes and consequences. This effort includes all classes of somatic mutation: single nucleotide variants, short insertions and deletions, structural variants, and other large chromosomal aberrations. The somatic mutation catalog will enable downstream analyses, such as rates and burdens of mutations across tissues, mutational signature analysis, driver mutations and clonal expansions, and lineage tracing.
The Broad Institute is one of five Genome Characterization Centers (GCCs) tasked with delivering the sequencing data underpinning this somatic mutation catalog. We are looking for a highly motivated and talented individual with a computational background to join this effort and lead data curation and analyses for this ambitious project. Some of our recent work includes Coorens et al., Nature 2025.
The successful candidate will join an interdisciplinary team working with an unprecedented set of multimodal data from a wide range of human tissues and donors, including extensive deep short- and long-read genomics, transcriptomics, epigenomics, and duplex sequencing data. The scope of this project provides unique opportunities for developing novel analytical methods for data QC, integration, detection of somatic mutations, multi-tissue analyses, and integration with transcriptomic data.
Responsibilities will include overseeing the implementation of experimental work plans, pipelines for data processing, organization, data submission timelines, and analysis, as well as contributing to budgetary and operational considerations. In addition, the individual is expected to communicate scientific details, results, and strategic considerations clearly to others within the team and the SMaHT Network at large. This role will require strategic coordination of multiple groups at the Broad Institute and within the SMaHT Network. This individual will serve as a key contact for project leaders, project collaborators (specifically the Data Analysis Center), and other staff.
PRINCIPAL DUTIES AND RESPONSIBILITIES
- Design and execute data QC and analysis strategies involving multimodal human tissue datasets, and specifically lead analyses of whole-genome short- and long-read DNA sequencing data, RNA sequencing data, and somatic mutations. Prior experience working with long-read genomics and transcriptomics data is required.
- Apply and develop state-of-the-art computational tools and pipelines to a) assess data quality, b) integrate diverse data types and metadata, and c) detect somatic mutations and perform subsequent downstream analyses.
- Collaborate with, and provide analytical support for, internal technology development efforts for the application of novel strategies to detect somatic mutations.
- Together with others, develop new methodologies and evaluate new methods for integrative analysis of these genomic data types.
- Present ideas and results to the multi-disciplinary members of the SMaHT Network. Prepare written reports and presentations for internal use as well as presentations at SMaHT Network and other conferences.
QUALIFICATIONS
- PhD in Genomics, Bioinformatics, Computational Biology, Computer Science, Statistics, Math, or a related quantitative field is required, along with 2+ years of industry experience.
- Experience with computational analysis, algorithm development, and statistics is expected.
- Proven track record of leading complex data curation or analysis projects, ideally within large-scale consortia, is a strong plus.
- Deep Sequencing Expertise: Extensive experience analyzing high-throughput biological data, specifically long-read genomic (PacBio/Oxford Nanopore) and transcriptomic (RNA-seq) data, is required.
- Pipeline Development: Proficiency in developing and maintaining reproducible computational pipelines using languages such as Python, R, or C++, and workflow managers like WDL, Nextflow, or Snakemake.
- Cloud Computing: Experience working in cloud-based environments (e.g., Google Cloud Platform/Terra, AWS) to manage and process petabyte-scale datasets.
- Somatic Mutation Analysis: Strong background in detecting and interpreting single nucleotide variants (SNVs), indels, and structural variants (SVs) in human samples.
- Multimodal Data Integration: Demonstrated ability to integrate and analyze diverse data types, including transcriptomics (RNA-seq), epigenomics, and duplex sequencing, is a strong plus.
- Strategic Coordination: Ability to manage timelines and deliverables across multiple interdisciplinary groups, both within the Broad Institute and among external collaborators in the network (e.g., the Data Analysis Center).
- Scientific Communication: Excellent verbal and written communication skills, with the ability to present complex technical results to both specialist and generalist audiences.
- Adaptability: A high degree of motivation to work in a fast-paced, evolving research environment on a high-stakes, 5-year NIH-funded initiative.