Jenny Leopoldina Smith, MSc


Education

Applied Bioinformatics and Genomics, MSc

  • University of Oregon, Eugene, OR

Secondary Education in Science, MEd

  • Arizona State University, Phoenix, AZ

Biology, BA, magna cum laude

  • University of San Diego, San Diego, CA

Experience and Skills

Computational Biologist, Senior

Research Scientific Computing Team

Seattle Children’s, Seattle, WA

April 2022 – Sept 2024

  • Developed and adapted reproducible genomics workflows using Nextflow and nf-core tools for RNA-seq quantification, Cut&Run, ATAC-seq, and PacBio Isoseq.

  • Ran Nextflow pipelines with software containerized in singularity/apptainer, using the SLURM/PBSpro executors on a high-performance computing (HPC) cluster.

  • Analysis and visualization of multi-sample single-cell RNA-seq (scRNA-seq) and scATAC-seq datasets with Cell Ranger quantification, followed by doublet detection, ambient DNA correction, and dataset integration (scANVI and Seurat v4/v5). Production of multi-omics analyses for bulk RNA-seq and single-cell datasets.

  • Generation of complex visualizations, including heatmaps, circos plots, oncoprints, 3D scatter plots, genomic tracks, network graphs, and volcano plots, to identify actionable insights from multivariate biological datasets from NGS and public databases (Ensembl, UCSC, Genomic Data Commons).

  • Statistical analysis of complex clinical data elements using dplyr and the tidyverse framework.

  • Provided bioinformatics support through guidance and troubleshooting assistance at office hours and by teaching internal courses.


Bioinformatics Analyst

Principal Investigator: Soheil Meshinchi

Fred Hutchinson Cancer Center, Seattle, WA

January 2017 – March 2022

  • Development of genomics workflows for cloud computing on AWS using Nextflow for processing raw RNA-seq data for gene expression quantification and fusion detection.

  • Complex data manipulation and interpretation using dplyr and the principles of tidyverse on large datasets derived from NGS and public databases. Retrieval of genomics and clinical data from public databases using public APIs.

  • Profiled pediatric AML subtypes, including gene fusions (CBFA2T3-GLIS2 and NUP98 fusions) and mutations (FLT3-ITD), using multi-omic data (RNA-seq and miRNA-seq) with supervised and unsupervised clustering ML algorithms, as well as statistical regression and classification.

  • Collaborated in a multidisciplinary environment with molecular biologists, bioinformaticians, and lead investigators. These bioinformatics analyses and data visualizations were used in 18 manuscripts published in peer-reviewed journals, as well as in funded grant applications to the American Cancer Society, St. Baldrick's Foundation, Gabriella Miller Foundation, and TpAML.

  • Survival analysis of AML variants and application of the LSC17 prognostic score to a pediatric cohort using Kaplan-Meier estimates and Cox proportional hazards regression; resulted in an oral presentation at the American Society of Hematology (2017), followed by a collaborative analysis with UCSF.


Postbaccalaureate Fellow

Principal Investigator: Maria Morasso

National Institutes of Health, Bethesda, MD

June 2014 – July 2016

  • Conducted biomedical research on the function of homeodomain protein DLX3 at NIAMS.

  • Bioinformatics analysis of ATAC-seq and ChIP-seq data from mouse models using command-line interface (CLI) tools for genomic alignment, MACS2 peak calling, motif analysis with HOMER, and figure generation with ngs.plot and deepTools.


Science Teacher

Teach for America Phoenix Corps

Agua Fria High School, Avondale, AZ

June 2012 – May 2014

  • Certified for secondary biology and chemistry education; Recognition of Excellence (ROE) Award for Biology issued by Educational Testing Service (ETS).

  • Courses taught: general biology, AP biology, general chemistry and integrated science.


Research Assistant

Principal Investigator: Terry Bird

University of San Diego, San Diego, CA

August 2009 – May 2012

  • Investigated the role of CHPT in the cellular differentiation of the bacterium Rhodospirillum centenum.

  • Used site-directed mutagenesis and molecular cloning techniques to construct triple knockout mutants of CHPT, CTRA, and CYD2.

Language or Framework (Experience), with Tools / Packages
R programming (Expert)
  • base R, tidyverse, bioconductor

  • R package development

  • scripting with Rscript CLI and commandArgs

  • quarto / rmarkdown

  • Posit workbench / RStudio IDE

Bash / linux (Advanced)
  • rclone, rsync, scp data copying and management

  • git and github command line interface (CLI)

  • regular expressions with find, grep, sed, tr, etc.

  • shell scripting with control structures and iteration (for, if/else, etc.)

  • environment variables

  • modifying shell behavior and configurations (.bashrc, set built-in, CLI tool configurations)

Nextflow (Advanced)
  • modularized scientific workflows using domain-specific language (DSL2) syntax

  • nf-core and nf-core tools CLI

  • invoke Groovy language operators, object oriented methods, and closures in nextflow scripts

  • develop custom workflows using nf-core best practices

  • adapt nf-core community developed pipelines to meet institutional requirements

Python programming (Experienced)
  • python scripting using control structures, iteration, etc.

  • scanpy and SCVI-tools

  • conda, mamba, and pip package managers

  • venv virtual environments

  • jupyter notebooks

  • multi-lingual environments with reticulate in RStudio IDE

Version control (Advanced)
  • Git and github (gh) CLI

  • Github and Bitbucket remotes

  • gitpod cloud development environments

HPC (Advanced)
  • SLURM and PBS schedulers

  • generation of HPC modules / module files

  • array jobs

  • parallelization

  • resource allocation

Containerization (Advanced)
  • docker CLI

  • apptainer / singularity CLI

  • generation of custom containerized environments

  • adapt and build containers from existing dockerfiles to meet institutional requirements

  • utilize scientific software container images from biocontainers and galaxy hub repositories

  • use of quay.io and docker hub repositories

Cloud Platforms (Competent)
  • Amazon Web Services

    • AWS S3

    • Batch

  • Azure

    • blob storage
  • Google Cloud Platform

CI / CD (Competent)
  • development of CI/CD plans using Atlassian bamboo agent

  • implementation of CI/CD functional tests for Nextflow workflows

    • plan integrated with bitbucket remote

    • plan branches for main and dev

    • automatic triggers

    • scheduled tasks

  • github actions workflow for quarto website

SQL (Competent / Being Developed)
  • complex queries and database operations, such as filtering and joins

  • conversion to SQL using dbplyr R package

  • SQL scripting skills being developed

    • 2024 November: SQL programming course, Codecademy
    • 2024 Dec - present: Relational Database Certification course, FreeCodeCamp
  • utilization and creation of SQLite databases for genomic reference data

IDE (Expert)
  • integrated development environments for analysis projects and development of data pipelines and packages

  • VSCode and extensions

  • RStudio

  • Posit workbench

  • Jupyter notebooks

Bioinformatics and Statistical Analysis

  • Unsupervised clustering with dendrograms, PCA, NMDS, and UMAP

  • Statistical regression, classification, and regularization

  • Survival and time-to-event analysis

  • Data visualization

  • Differential gene expression, differential binding, and differential DNA accessibility analysis

  • De novo fusion transcript detection

  • Single-cell cell type classification / prediction


NGS Data Types

  • Transcriptomics

    • scRNA-seq (single-cell RNA-seq)

    • scATAC-seq (single-cell ATAC-seq)

    • RNA-seq (Illumina short read, PacBio long read)

    • miRNA-seq (microRNA-seq)

  • Epigenetics

    • ATAC-seq

    • Cut&Run

    • ChIP-seq

Continuing Education and Volunteering

  • 2024 November, SQL Programming, Codecademy, Virtual

  • 2024 August, DevOps for Data Scientists, POSIT Conference, Seattle, WA

  • 2024 March, Nextflow NF-Core Hackathon, University of Washington

  • 2024 February, IGNITE speaker to promote gender equity in STEM, Seattle Children’s

  • 2023 - 2024, International Society for Computational Biology (ISCB) member

  • 2018 - 2023, RLadies event organizer, Seattle, WA

  • 2022 July, Bioconductor conference organizer, Seattle, WA

  • 2021 - 2022, Mentor, UO Bioinformatics and Genomics Graduate Program

  • 2021 January, SnpReportR package, Carnegie Mellon and DNAnexus Hackathon

  • 2020 July, Pacific Biosciences (PacBio) Isoseq Transcriptome Analysis Training

  • 2020 May, Nextflow workflow development training, Fred Hutch Cancer Center

  • 2017 - 2019, Women In Biology, MAPS Mentorship Group, Seattle, WA

  • 2018 February, ConsensusML: Machine learning classification in AML, NCBI Hackathon

  • 2018 June, Mentor, Fred Hutch Summer High School Internship, Seattle, WA

  • 2017 June, Summer Institute in Statistics for Big Data (SISBID), University of Washington

References

Note: five levels of skill proficiency are used here. Definitions are listed below:

  1. Being Developed: the individual demonstrates minimal use of the competency and is currently developing it.
  2. Basic / Competent: the individual demonstrates use of the competency; can work independently but requires additional experience, and carries out deliberate planning and formulates routines to execute tasks effectively.
  3. Intermediate / Experienced: the individual demonstrates a working or functional proficiency level that enables the competency to be exercised effectively.
  4. Advanced: the individual demonstrates an in-depth proficiency level; is able to assist, consult, or lead others in the application of the competency.
  5. Expert: the individual demonstrates broad, in-depth proficiency; is recognized as an authority with mastery of the skills and in exercising the competency.

Adapted from UBC and Dreyfus