Description & Requirements
The Machine Learning Engineer will participate in research and development efforts aimed at solving problems in analyzing large-scale clinical data, with a mission of improving human health. The candidate will develop and deploy machine learning models for disease diagnosis, risk assessment, and outcomes prediction across multiple data modalities derived from large electronic health record datasets, including tabular, clinical note, electrocardiogram (ECG), echocardiogram, magnetic resonance imaging (MRI), and genetic data. Rich representations of clinical data derived from deep learning models will also be used in conjunction with genetic data to investigate the genetic basis for disease.
The ideal candidate has both a theoretical and practical understanding of deep learning techniques, experience in areas such as clinical research, probability, statistics, and/or data engineering, as well as strong programming fundamentals.
The candidate will join a strong team of machine learning scientists and practitioners, will have access to vast amounts of clinical data, and will be encouraged to publish new methods and results in academic journals and conferences. The candidate will conduct research in clinical machine learning and disease biology, and must collaborate effectively with researchers at the Broad Institute, practicing clinicians, and external partners (e.g., collaborators from industry or other academic institutions). This position is suited to a person who is excited by the prospect of learning and is highly interested in adapting and applying modern machine learning techniques to emerging problems in clinical data analysis, with the potential to advance the state of the art in clinical practice.
Responsibilities
- Developing techniques for ingesting, curating, characterizing, and storing real-world clinical datasets for large-scale machine learning
- Applying and refining machine learning models and techniques (e.g., transformers, CNNs, diffusion models) for clinical datasets and applications, including but not limited to disease diagnosis, risk assessment, and outcomes prediction
- Developing novel, robust, and generalizable machine learning methods that advance the state of the art for unstructured clinical datasets, in collaboration with machine learning scientists
- Writing well-crafted, maintainable, scalable, and performant machine learning code
- Designing, developing, and maintaining testing and deployment frameworks for machine learning code
- Collaborating with machine learning practitioners and clinicians to determine project goals and strategies
- Drafting and submitting academic publications to leading journals and conferences, with dedicated time and resources provided to pursue novel research
Requirements
- Master’s degree in Computer Science, Electrical Engineering, Biomedical Engineering, Physics, Math, Statistics, Computational Biology or a related quantitative field
- Practical experience in designing, training, and evaluating deep neural networks, including experience developing and tuning custom neural network components
- Proficiency in Python, with a strong grasp of software design and programming principles and experience writing production-quality code
- Practical experience with Python-based deep learning frameworks (e.g., TensorFlow or PyTorch) and related Python packages (e.g., NumPy, SciPy, pandas)
- Proficiency with UNIX operating systems
- Strong oral and written communication skills and ability to collaborate effectively with clinicians, machine learning scientists, and software engineers on model requirements and design
Preferred Skills
- Experience working with clinical or omics data, or experience in machine learning for computer vision, natural language processing or signal processing
- Knowledge of software engineering best practices, including version control (git), writing tests, and code review
- Experience developing data pipelines to prepare data for modeling from large, messy, real-world data sets
- Experience with cloud computing (e.g., Azure) or clinical/EHR databases (e.g., OMOP)