Katie Doroschak

PhD Student in Computer Science

Data science & machine learning in computational & synthetic biology

4th year PhD student, Computer Science & Engineering
      University of Washington, Seattle, WA
MA Computer Science & Engineering
     June 2017     University of Washington, Seattle, WA
BS Computer Science
      May 2014     University of Minnesota, Twin Cities, MN
      University Honors Program


Note that not all projects are represented here -- just my favorites, the most recent, and the most interesting!

 DNA barcoding with nanopore technology

  University of Washington, Seattle, WA     Aug 2017 - present
Identifying and quantifying DNA barcodes (short DNA strands attached to the output of other biochemical reactions) in Oxford Nanopore MinION data. Methods development is in progress; currently working with HMMs and random forests for data preprocessing and analysis.

 Sequence alignment for nanopores

  University of Washington, Seattle, WA     Aug 2017 - present
Pursuing real-time nanopore sequence alignment. (Intentionally vague for now.)

 Nanopore proteomics

  University of Washington, Seattle, WA     Aug 2017 - present
Developing strategies for analyzing peptide data from nanopores. (Intentionally vague for now.)

 Climate Conversations App

  UW, Seattle, WA     Dec 2016 - present
Developing an app to facilitate personal conversations about climate change. I write and maintain the app, and our team researches content together by seeking feedback at local community group events and conferences. What began as a side project has grown into much more: a completely student-run collaboration with 3 postdocs in UW’s Program on Climate Change and 1 PhD student in UW DXARTS. See climateconversations.org
  Presentation at 2017 Northwest Climate Conference (slides)
  Poster at 2017 AGU Fall Meeting (abstract)

Allele-specific expression (ASE)

  University of Washington (UW), Seattle, WA     Sept 2016 - Aug 2017
Analyzed ASE in RNA-Seq data for use as a marker of inherited crossover events. Evaluated the effect of haplotype phasing and read start bias on AlleleSeq, a personal genome-based ASE calling algorithm.
  Master's qualifying document ("thesis")

RNA-Seq bias quantification and correction

  UW, Seattle, WA     Sept 2015 - July 2016
Compared eight machine learning methods (a mix of regression & ensemble methods, all tuned) on the task of distinguishing RNA-Seq laboratories. The dataset consisted of the same sample sequenced in six different labs, and training on the reads alone revealed enough experimental bias for these methods to tell the labs apart.
  CSE 526 (Machine Learning) final project

Bioinformatics Scientist Intern

  National Marrow Donor Program, Minneapolis, MN     Summer 2012, 2013-2014
Three selected projects: (1) Designed a pilot program to transmit HLA and KIR genotype data for transplant labs lacking IT support. (2) Updated the donor selection interface to give physicians more information as NGS data became increasingly ubiquitous. (3) Developed tools to retrieve donor-patient match results for mandatory governmental reporting.
  (1) Paper in Human Immunology, 2015 (pubmed, journal impact factor 2.127 in 2015)
  (1) Poster at 2014 EFI European Immunogenetics and Histocompatibility Conference
Projects (2) and (3) are non-public.

DREU Research Participant

  Tufts University, Medford, MA     Summer 2013
Derived publication-based confidence scores for protein interaction networks, for use in protein function prediction. Roughly, scores were based on the throughput of the source (e.g. a paper confirming 10-15 interactions was treated as more reliable than one confirming 1000+) and, when an interaction had multiple confirmations, on linear combinations of the individual scores.
  Paper in Bioinformatics, 2014 (pubmed, impact factor 4.981 in 2014)
  Archived blog from this project
  Distributed Research Experience for Undergraduates Award (DREU) (awarded in order to do this work)
  Clare Boothe Luce Undergraduate Research Scholar (awarded as a result of this work)



  • Milius, Robert P., Michael Heuer, Daniel Valiga, Kathryn J. Doroschak, Caleb J. Kennedy, Yung-Tsi Bolon, Joel Schneider, Jane Pollack, Hwa Ran Kim, Nezih Cereb, Jill A. Hollenbach, Steven J. Mack, and Martin Maiers. "Histoimmunogenetics Markup Language 1.0: Reporting Next Generation Sequencing-based HLA and KIR Genotyping." Human Immunology, December 2015. PubMed

  • Cao, Mengfei, Christopher M. Pietras, Xian Feng, Kathryn J. Doroschak, Thomas Schaffner, Jisoo Park, Hao Zhang, Lenore J. Cowen, and Benjamin J. Hescott. "New Directions for Diffusion-based Network Prediction of Protein Function: Incorporating Pathways with Confidence." Bioinformatics. Oxford University Press, 11 June 2014. PubMed


  • Katie Doroschak. Research talk (personal background and research at a high level). Paul G. Allen School Women in Research Day, 6 Apr 2018.
  • Katie Doroschak, Jeff Nivala, Karin Strauss, and Luis Ceze. “Global scale genomics: reducing the cost of nanopore sequence analysis.” UW Data Science Summit, 3 Apr 2018.
  • Kathryn J. Doroschak, Jeff Nivala, Walter L. Ruzzo, and Luis Ceze. “Reading DNA barcodes in electric current time series data from nanopore sequencing.” Paul G. Allen School Industry Affiliates Annual Research Day, 15 Nov 2017.


  • Kathryn J. Doroschak, Thomas Schaffner, Lenore Cowen. “Improvements in protein function prediction using confidence in protein interactions”, RECOMB 2014, Pittsburgh, PA. (Poster)

  • Kathryn J. Doroschak, Robert P. Milius, Joel Schneider, Michael Heuer, Michael George, Jane Pollack, Seonghan Kim, Nezih Cereb, Jill A. Hollenbach, Steven J. Mack, Martin Maiers. “Enhancing HML for Electronic Reporting of NGS-based HLA and KIR Genotyping Results”, EFI European Immunogenetics and Histocompatibility Conference (July 2014), Stockholm, Sweden. (Poster)



CSE 427 Computational Biology    Autumn 2017
Assisted Professor Ruzzo. Answered conceptual questions and graded all work for coding assignments.
CSE 312 Foundations of Computing II (essentially Probability and Statistics)     Spring 2015
Assisted Professor Ruzzo. Led and designed content for weekly discussion sections, held weekly office hours, met with struggling students 1:1, and graded weekly and daily assignments.
CSE 421 Introduction to Algorithms     Winter 2014
Assisted Professor Ruzzo. Held weekly office hours and graded sections of weekly assignments.
  Filled in for a guest lecture on dynamic programming.
CSE 527 Computational Biology     Autumn 2014
Assisted Professor Ruzzo. Professional Masters Course, solo TA. Answered conceptual questions and graded all assignments.


Grad admissions application reading committee    Dec 2017
Helped read incoming applications to the department for the following term.
Moderator of mid-size subreddit    Mar 2017 - present
Sole active moderator for an active community of ~80k. Technical skills: Maintain a simple bot to flag posts when they need my attention. I also occasionally do some NLP on the content of the posts to predict their topic and identify subreddit-specific spam (mostly just for fun). Soft skills: Patiently and kindly deal with internet trolls and rulebreakers, write wiki articles (beginner and advanced guides on the subreddit's topic), and keep things organized.


NIH Big Data in Genomics and Neuroscience Training Grant
  Sept 2015-2017   More info (training grant homepage)
Clare Boothe Luce Undergraduate Research Scholar
  Aug 2013
Distributed Research Experience for Undergraduates Award (DREU)
  Aug 2013

Contact Me

You can reach me at kdorosch [at] cs.washington.edu.

This is my LinkedIn (not updated very often anymore; best for experience before 2015).