About Me

I am a software engineer with a focus on bioinformatics and data engineering. My undergraduate degree was a BSc in Medical Bioscience, with a wet-lab focus on oncology and neurodegenerative disease. My thesis evaluated machine learning architectures for classifying cardiac MRI data.

Prior to university, I worked at a five-person biotech startup in Berlin, where I first started working as an engineer. Throughout undergrad I worked part-time or during the summer as an engineer, largely with genomics data. At the Chan Zuckerberg Biohub I focused on LIMS tooling and transcriptomics. At insitro, I continue to develop transcriptomics infrastructure and work on clinical data.

In my free time I swim and train Muay Thai. I was born in NYC, grew up in London, and have lived in Berlin and San Francisco. Currently based in NYC!

Experience

Software Engineer - Compute Core / ClinData
May 2023 – Present South San Francisco, California, US (remote)
Develop cloud compute infrastructure (K8s, Terraform, AWS) to support software across insitro’s scientific and computational domains. Build full-stack scientific data explorers. Engineer data structures to consolidate data infrastructure and support omics analysis and ML at scale. Produced analyses and reports for target screening with scientific teams. Stood up image viewing infrastructure and tooling for clinical datasets.

Lead Data Engineer - Quantitative Cell Science Data Science Team
December 2021 – May 2023 San Francisco, California, US
Led and built Datahub, a project that structures data to support research. Biomedical research is complex, with highly heterogeneous metadata that varies by sample, modality, and goal. Datahub uses graph databases to represent this highly interconnected metadata. A website and API allow users to upload, track, and analyze ongoing research. To support Datahub, I also developed infrastructure and analysis pipelines for the mass spectrometry and genomics platforms. Communicated with leadership and users, and coordinated software engineers to develop automation. Researched the use of graph theory to interrogate multiomic data generated at the Biohub, under the guidance of the Quantitative Cell Science data science team and Dr Angela Oliveira Pisco. Also analysed and interpreted transcriptomics and spatial transcriptomics datasets.

Bioinformatics and Data Analyst - Population Genomics Group
June 2021 – September 2021 Cambridge, UK
Worked as a bioinformatician for Illumina’s EMEA office, within the Population Genomics (PopGen) group, reporting to Dr Ole Schulz-Trieglaff. Contributed to methods for merging and saving terabytes of Variant Call Format (VCF) files. Developed test software for triaging bugs and identifying algorithmic slowdowns. Worked on the DRAGEN platform in C++ and Python, both locally and in Illumina’s HPC environment.

Machine Learning Researcher - Darrel P Francis Lab
October 2020 – March 2021 London, UK
Developed machine learning models under the guidance of Professor Darrel Francis and Dr James Howard in the Cardiology Department of Hammersmith Hospital, with the end goal of producing a CMRI (cardiac magnetic resonance imaging) pipeline for automated diagnosis of cardiovascular diseases. Developed a classifier that identified the ‘view’ of CMRIs, and retrospectively assessed model performance using several metrics. Investigated a potential novel transformer methodology for image data.

Bioinformatics Engineer
June 2020 – October 2020 Cambridge, UK
GGC collects bespoke exome data from underrepresented populations for investigations into disease genotypes, towards a 1 Million Genome Project. Developed an exome analysis pipeline in Nextflow for variant calling, and evaluated the called variants using bioinformatics tools and Exomiser in an HPC environment. Also created a PostgreSQL database to store the Exomiser analysis, an API backend, and a ReactJS frontend. My project enabled geneticists to review and autogenerate reports on likely causative variants for a subject’s disease phenotype.

Researcher
Imperial College London
July 2019 – April 2020 London, UK
This project investigated the effect of adiponectin on β-amyloid secretion in PDK1-knockdown HEK293 cells, as a potential molecular target for treating Alzheimer’s disease. I worked with a group of four other researchers to develop knockdown cells with CRISPR and to evaluate the impact of adiponectin on the β-amyloid secretion pathway through protein and gene expression.

Software Engineer
July 2019 – April 2020 Berlin, Germany (remote)
Automat develops turnkey software solutions. Worked within Automat’s ‘Workshop Mode’ and was responsible for the infrastructure behind the communication software pricing API and a web-scraping tool. Developed and deployed all of this on AWS. Worked with Docker, SQL, and Serverless.

Researcher
Imperial College London
October 2018 – March 2019 London, UK
Studied the inhibition of paclitaxel-induced apoptosis by epinephrine in a breast cancer cell line using cell viability assays, western blot, and qPCR alongside cell culture techniques.

R&D Engineer
March 2018 – July 2018 Berlin, Germany
Researched and developed a hardware device for i2x’s call assistance service. Developed audio streaming software in Python and Arduino to stream data to i2x’s ML infrastructure and visualise analytics.

R&D Engineer
Leonyte Biosystems
September 2017 – March 2018 Berlin, Germany
Leonyte was an early-stage startup aiming to provide real-time testing for pathogens in food. Worked with the engineering team to develop a prototype portable bacterial detection device. This involved reading data sheets from National Instruments, writing summaries, and constructing several prototypes. Visualized biological and system data for presentation and troubleshooting. Smoothed signals and fine-tuned detection algorithms to improve pathogen detection. Helped with administration related to inventory, monthly invoicing, and Scrum.

Recent Publications

Aging Fly Cell Atlas Identifies Exhaustive Aging Features at Cellular Resolution
Aging is characterized by a decline in tissue function, but the underlying changes at cellular resolution across the organism remain …

Public Projects

OpenPipelines
Extensible single-cell analysis pipelines for reproducible, large-scale single-cell processing. The pipelines are built using the Viash framework on top of the Nextflow workflow system.
Datahub
Simple, scalable, and open-source data infrastructure supporting scientific discovery and insight.
Classifying the orientation of the heart in a series of MRI scans
This paper presents the development of a machine learning algorithm that accurately classifies the ‘view’ of cardiac magnetic resonance imaging (CMR) images, achieving near human expert-level performance with the EfficientNet-B5 architecture, potentially forming the basis for future AI solutions in cardiac diagnosis.
ExomePipe
This is a Nextflow-based pipeline built to identify the variants causing a subject’s disease phenotype, based on their exome and phenotype.
ExomePipe Website
This website allows geneticists to evaluate the data generated by ExomePipe against their own knowledge, and information from reference databases, in order to autogenerate reports on causative variants.
PositiviTree
This project aims to improve users’ mental health through positive affirmation and self-affirmation. Research indicates that positively affirming users’ ‘good’ thoughts and opinions can lead to a more positive mental outlook. This decreases stress, increases well-being, improves academic performance, and can improve openness to behavioural change.
Correct-a-spine
Problems with posture arise from both upper and lower back problems, but current technologies focus on only one of these. We therefore used an accelerometer and a flex sensor to measure both, so that a single device can help prevent both kinds of back pain. The results are shown in an iOS application, which receives the data via Bluetooth.

MOOCs

Coursera
Build a Modern Computer from First Principles: From Nand to Tetris (Project-Centered Course)
Coursera
Computational Neuroscience
Coursera
DeepLearning.ai
Coursera
The Brain and Space
Coursera
Philosophy and the Sciences: Introduction to the Philosophy of Cognitive Sciences

Contact