Katie Doroschak

PhD Student in Computer Science

Data science & machine learning in computational & synthetic biology



6th year PhD student, Computer Science & Engineering
      University of Washington, Seattle, WA
      MISL Lab, advised by Luis Ceze, Jeff Nivala, and Karin Strauss
      Advanced Data Science track
      Expected graduation: Autumn/Winter 2020
MA Computer Science & Engineering
      June 2017     University of Washington, Seattle, WA
BS Computer Science
      May 2014     University of Minnesota, Twin Cities, MN
      University Honors Program


Note that not all projects are represented here -- just my favorites, the most recent, and the most interesting!

  Molecular tagging with nanopore-orthogonal DNA strands

  University of Washington, Seattle, WA     Aug 2017 - present
We created a molecular tagging system using synthetic DNA-based tags designed for a portable, low-cost DNA sequencing device (Oxford Nanopore Technologies’ MinION). These tags are analogous to radio frequency identification (RFID) tags in the digital world, in that they encode only a short identifier and can be attached to objects to be re-identified later. In our system, digital information is converted to DNA not by synthesizing new DNA strands, but by the presence or absence of unique, pre-prepared molecular bits (molbits) representing 1’s and 0’s respectively. We have extensively designed these bits so they each produce visually unique data in the nanopore sequencing device, and developed deep learning classifiers to read the molecular tags rapidly with minimal amounts of data (seconds to minutes of sequencing), bringing costs down significantly.
  Lightning talk and poster at London Calling, June 2019
  Invited talk at Microsoft Research (Security group), November 2019
  Madrona prize runner-up at UW CSE Affiliates research day, November 2019
  NSF Innovation Corps (I-Corps) grant via UW CoMotion, January-July 2020
  Misc other posters & talks throughout project
  Manuscript in preparation
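
A minimal sketch of the presence/absence encoding described above, for illustration only and not the project's actual code. It assumes each bit position maps to one pre-prepared molbit, and that a classifier has already labeled each nanopore read with a molbit index. The function names, the `min_reads` threshold, and the example reads are all hypothetical.

```python
from collections import Counter


def encode_tag(bits):
    """Return the indices of the molbits to mix into a tag.

    A '1' at position i means molbit i is present; a '0' means it is absent.
    """
    return {i for i, b in enumerate(bits) if b == "1"}


def decode_tag(read_labels, n_bits, min_reads=2):
    """Recover the bit string from per-read molbit labels.

    read_labels: molbit index assigned to each read (hypothetical classifier output).
    min_reads: assumed threshold; a molbit seen fewer times is treated as noise.
    """
    counts = Counter(read_labels)
    return "".join("1" if counts[i] >= min_reads else "0" for i in range(n_bits))


# Hypothetical 5-bit tag and a handful of classified reads,
# including one stray read labeled as molbit 4.
tag_bits = "10110"
present = encode_tag(tag_bits)
reads = [0, 2, 3, 3, 0, 2, 4, 0, 3, 2]
decoded = decode_tag(reads, n_bits=len(tag_bits))
print(sorted(present), decoded)
```

The read-count threshold is one simple way to tolerate occasional misclassified reads; the real system may handle errors differently.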

  NanoporeTERs -- Multiplexed, barcoded reporter proteins for nanopore arrays

  University of Washington, Seattle, WA     Aug 2017 - Nov 2019
This project replaces the traditional fluorescent readout by excreting a reporter protein that is read directly on the ONT MinION nanopore device (with an unmodified R9.4.1 flowcell).
For this project, I built the data preprocessing and analysis pipeline from scratch, from raw data to peptide identification, plus the first version of the classifier (Random Forest) and additional computational analysis.
  BioRxiv preprint

  DNA barcoding with nanopore technology

  University of Washington, Seattle, WA     Aug 2017 - 2018
Identifying and quantifying DNA barcodes (short DNA strands attached to the output of other biochemical reactions) in Oxford Nanopore MinION data.

  Climate Conversations App

  UW, Seattle, WA     Dec 2016 - present
Developing an app to facilitate personal conversations about climate change. I write and maintain the app, and together we research content by seeking feedback at local community group events and conferences. This is a side project that is becoming much more than that, as a completely student-run collaboration with 3 postdocs in UW’s Program on Climate Change and 1 PhD student in UW DXARTS. See climateconversations.org
  Presentation at 2017 Northwest Climate Conference
  Poster at 2017 AGU Fall Meeting (abstract)

Allele-specific expression (ASE)

  University of Washington (UW), Seattle, WA     Sept 2016 - Aug 2017
Analyzed ASE in RNA-Seq data as a marker for inherited crossover events. Evaluated the effect of haplotype phasing and read start bias on a personal genome-based ASE calling algorithm, AlleleSeq.
  Masters qualifying document ("thesis")

RNA-Seq Bias quantification and correction

  UW, Seattle, WA     Sept 2015 - July 2016
Compared eight machine learning methods (a mix of regression & ensemble methods, all tuned) to successfully distinguish RNA-Seq laboratories. The sequencing dataset was of the same sample sequenced in six different labs, and training on just the reads themselves revealed enough experimental bias to distinguish the labs with these methods.
  CSE 526 (Machine Learning) final project

Bioinformatics Scientist Intern

  National Marrow Donor Program, Minneapolis, MN     Summer 2012, 2013-2014
Three selected projects: (1) Designed a pilot program to transmit HLA and KIR genotype data for transplant labs lacking IT support. (2) Updated the donor selection interface to give physicians more information as NGS data became more common. (3) Developed tools to retrieve donor-patient match results for mandatory governmental reporting.
  (1) Paper in Human Immunology, 2015 (pubmed, journal impact factor 2.127 in 2015)
  (1) Poster at 2014 EFI European Immunogenetics and Histocompatibility Conference
Projects (2) and (3) are non-public.

DREU Research Participant

  Tufts University, Medford, MA     Summer 2013
Derived publication-based confidence scores for protein interaction networks, for use in protein function prediction. Roughly, scores were based on the throughput of the source (e.g. a paper confirming 10-15 interactions would be more reliable than one confirming 1000+), with linear combinations of the individual scores when an interaction had multiple confirmations.
  Paper in Bioinformatics, 2014 (pubmed, impact factor 4.981 in 2014)
  Archived blog from this project
  Distributed Research Experience for Undergraduates Award (DREU) (awarded in order to do this work)
  Clare Boothe Luce Undergraduate Research Scholar (awarded as a result of this work)
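
A rough sketch of the throughput-based scoring idea described above, for illustration only. The scoring function `1 / (1 + log10(n))` and the equal default weights are assumptions chosen to make the example concrete; they are not the formula used in the paper, and all names are hypothetical.

```python
import math


def paper_score(n_interactions):
    """Assumed per-paper reliability: lower throughput gives a higher score."""
    if n_interactions < 1:
        raise ValueError("a paper must report at least one interaction")
    return 1.0 / (1.0 + math.log10(n_interactions))


def interaction_confidence(paper_sizes, weights=None):
    """Linear combination of per-paper scores for one interaction.

    paper_sizes: number of interactions reported by each confirming paper.
    weights: optional coefficients; equal weights are assumed by default.
    """
    scores = [paper_score(n) for n in paper_sizes]
    if weights is None:
        weights = [1.0 / len(scores)] * len(scores)
    return sum(w * s for w, s in zip(weights, scores))


# A small-scale study scores higher than a high-throughput screen.
small = paper_score(10)
large = paper_score(1000)
combined = interaction_confidence([10, 1000])
print(small, large, combined)
```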



  • Cardozo, Nicolas, Karen Zhang, Katie Doroschak, Aerilynn Nguyen, Zoheb Siddiqui, Karin Strauss, Luis Ceze, Jeff Nivala. "Multiplexed direct detection of barcoded protein reporters on a nanopore array", bioRxiv preprint, November 11, 2019. bioRxiv.

  • Milius, Robert P., Michael Heuer, Daniel Valiga, Kathryn J. Doroschak, Caleb J. Kennedy, Yung-Tsi Bolon, Joel Schneider, Jane Pollack, Hwa Ran Kim, Nezih Cereb, Jill A. Hollenbach, Steven J. Mack, and Martin Maiers. "Histoimmunogenetics Markup Language 1.0: Reporting Next Generation Sequencing-based HLA and KIR Genotyping." Human Immunology, December 2015. PubMed

  • Cao, Mengfei, Christopher M. Pietras, Xian Feng, Kathryn J. Doroschak, Thomas Schaffner, Jisoo Park, Hao Zhang, Lenore J. Cowen, and Benjamin J. Hescott. "New Directions for Diffusion-based Network Prediction of Protein Function: Incorporating Pathways with Confidence." Bioinformatics. Oxford University Press, 11 June 2015. PubMed


  • Molecular tagging system (Error correction and security perspectives). Invited talk at Microsoft Research (Security group), November 2019.

  • Molecular tagging overview (technical perspective). UW CSE Affiliates Research Day, November 2019.

  • “Molecular tagging with nanopore-orthogonal DNA strands.” Lightning talk at Oxford Nanopore Technologies London Calling, June 2019. (Vimeo)

  • “Reading DNA barcodes in electric current time series data from nanopore sequencing.” Paul G. Allen School Industry Affiliates Annual Research Day, 15 Nov 2017.


(Selected and recent posters only.)
  • Kathryn J. Doroschak, Karen Zhang, Melissa Queen, Aishwarya Mandyam, Karin Strauss, Jeff Nivala, Luis Ceze. “Molecular tagging system with nanopore-orthogonal DNA strands”, UW CSE Affiliates Research Day, Seattle, WA. November 2019.

  • Kathryn J. Doroschak, Karen Zhang, Melissa Queen, Aishwarya Mandyam, Karin Strauss, Jeff Nivala, Luis Ceze. “Molecular tagging system with nanopore-orthogonal DNA strands”, DNA 25, Seattle, WA. August 2019.

  • Kathryn J. Doroschak, Karen Zhang, Melissa Queen, Aishwarya Mandyam, Karin Strauss, Jeff Nivala, Luis Ceze. “Molecular tagging system with nanopore-orthogonal DNA strands”, London Calling, London, UK. June 2019.

  • Kathryn J. Doroschak, Thomas Schaffner, Lenore Cowen. “Improvements in protein function prediction using confidence in protein interactions”, RECOMB 2014 Philadelphia, PA.

  • Kathryn J. Doroschak, Robert P. Milius, Joel Schneider, Michael Heuer, Michael George, Jane Pollack, Seonghan Kim, Nezih Cereb, Jill A. Hollenbach, Steven J. Mack, Martin Maiers. “Enhancing HML for Electronic Reporting of NGS-based HLA and KIR Genotyping Results”, EFI European Immunogenetics and Histocompatibility Conference (July 2014), Stockholm, Sweden.



CSE 427 Computational Biology    Autumn 2017
Assisted Professor Ruzzo. Answered conceptual questions and graded all work for coding assignments.
CSE 312 Foundations of Computing II (essentially Probability and Statistics)     Spring 2015
Assisted Professor Ruzzo. Led and designed content for weekly discussion sections, held weekly office hours, met with struggling students 1:1, and graded weekly and daily assignments.
CSE 421 Introduction to Algorithms     Winter 2014
Assisted Professor Ruzzo. Held weekly office hours and graded sections of weekly assignments.
  Filled in for a guest lecture on dynamic programming.
CSE 527 Computational Biology     Autumn 2014
Assisted Professor Ruzzo. Professional Masters Course, solo TA. Answered conceptual questions and graded all assignments.


Mentor to lab undergraduates    Ongoing
Mentored 6 current and former undergraduates in the MISL lab or doing side projects adjacent to the lab.
Women's Research Day Organizer    Oct 2018 - Feb 2019
Coordinated efforts for inviting speakers and attendees.
Outreach via Science Communication Fellowship    Dec 2017 - present
Run demonstrations related to my research at the Pacific Science Center as a Science Communication Fellow.
Grad admissions application reading committee    Dec 2017, 2018, 2019
Help read incoming applications to the department for the following term.
Moderator of mid-size subreddit    Mar 2017 - present
Seniormost active moderator for an active community of 250k+.
Technical skills: Maintain a bot to flag posts when they reach the top pages of reddit, notifying both the moderators and the user. I also occasionally do some NLP on the content of the posts to predict their topic and identify subreddit-specific spam, like covert brand promotion (mostly just for fun).
Soft skills: Moderate internet trolls and rulebreakers, write wiki articles (advanced guides on the subreddit's topic), and keep things organized.


Madrona Prize runner-up
  November 2019   Article with more details on the award
At UW CSE Affiliates Day, Madrona Venture Group recognizes projects that combine exciting research with commercial potential.
Science Communication Fellow
  March 2018-present   More info
NIH Big Data in Genomics and Neuroscience Training Grant
  Sept 2015-2017   More info (training grant homepage)
Clare Boothe Luce Undergraduate Research Scholar
  Aug 2013
Distributed Research Experience for Undergraduates Award (DREU)
  Aug 2013

Contact Me

You can reach me at kdorosch cs.washington.edu.

Here's my LinkedIn (not updated too often anymore, best for experience before 2015).