5th-year PhD student, Computer Science & Engineering
University of Washington, Seattle, WA
MA Computer Science & Engineering
June 2017 University of Washington, Seattle, WA
BS Computer Science
May 2014 University of Minnesota, Twin Cities, MN
University Honors Program
Note that not all projects are represented here: just my favorites, the most recent, and the most interesting!
DNA barcoding with nanopore technology
University of Washington, Seattle, WA Aug 2017 - present
Identifying and quantifying DNA barcodes (short DNA strands attached to the output of other biochemical reactions) in Oxford Nanopore MinION data. Methods development is in progress; currently working with HMMs and random forests for data preprocessing and analysis.
Nanopore proteomics
University of Washington, Seattle, WA Aug 2017 - present
Developing strategies for analyzing peptide data from nanopores. (Intentionally vague for now.)
Climate Conversations App
UW, Seattle, WA Dec 2016 - present
Developing an app to facilitate personal conversations about climate change. I write and maintain the app, and as a team we research content by seeking feedback at local community group events and conferences. It started as a side project and is becoming much more than that: a completely student-run collaboration with 3 postdocs in UW's Program on Climate Change and 1 PhD student in UW DXARTS. See climateconversations.org.
Presentation at 2017 Northwest Climate Conference (slides)
Poster at 2017 AGU Fall Meeting (abstract)
Allele-specific expression (ASE)
University of Washington (UW), Seattle, WA Sept 2016 - Aug 2017
Analyzed the use of ASE in RNA-Seq data as a marker of inherited crossover events. Evaluated the effect of haplotype phasing and read start bias on AlleleSeq, a personal genome-based ASE calling algorithm.
Masters qualifying document ("thesis")
RNA-Seq bias quantification and correction
UW, Seattle, WA Sept 2015 - July 2016
Compared eight tuned machine learning methods (a mix of regression and ensemble methods) for distinguishing RNA-Seq laboratories. The dataset consisted of the same sample sequenced in six different labs; training on the reads alone revealed enough experimental bias for these methods to tell the labs apart.
CSE 526 (Machine Learning) final project
Bioinformatics Scientist Intern
National Marrow Donor Program, Minneapolis, MN Summer 2012, 2013-2014
Three selected projects: (1) Designed a pilot program to transmit HLA and KIR genotype data for transplant labs lacking IT support. (2) Changed the donor selection interface to give physicians more information as NGS data become increasingly common. (3) Developed tools to retrieve donor-patient match results for mandatory government reporting.
(1) Paper in Human Immunology, 2015 (PubMed; journal impact factor 2.127 in 2015)
(1) Poster at 2014 EFI European Immunogenetics and Histocompatibility Conference
Projects (2) and (3) are non-public.
DREU Research Participant
Tufts University, Medford, MA Summer 2013
Derived publication-based confidence scores for protein interaction networks for protein function prediction. Roughly, scores were based on the throughput of the source (e.g., a paper confirming 10-15 interactions would be considered more reliable than one confirming 1000+), and when an interaction had multiple confirmations, the individual scores were combined linearly.
Paper in Bioinformatics, 2014 (PubMed; impact factor 4.981 in 2014)
Archived blog from this project
Distributed Research Experience for Undergraduates Award (DREU) (awarded in order to do this work)
Clare Boothe Luce Undergraduate Research Scholar (awarded as a result of this work)
- Milius, Robert P., Michael Heuer, Daniel Valiga, Kathryn J. Doroschak, Caleb J. Kennedy, Yung-Tsi Bolon, Joel Schneider, Jane Pollack, Hwa Ran Kim, Nezih Cereb, Jill A. Hollenbach, Steven J. Mack, and Martin Maiers. "Histoimmunogenetics Markup Language 1.0: Reporting Next Generation Sequencing-based HLA and KIR Genotyping." Human Immunology, December 2015. PubMed
- Cao, Mengfei, Christopher M. Pietras, Xian Feng, Kathryn J. Doroschak, Thomas Schaffner, Jisoo Park, Hao Zhang, Lenore J. Cowen, and Benjamin J. Hescott. "New Directions for Diffusion-based Network Prediction of Protein Function: Incorporating Pathways with Confidence." Bioinformatics. Oxford University Press, 11 June 2015. PubMed
- Kathryn J. Doroschak, Jeff Nivala, Walter L. Ruzzo, and Luis Ceze. “Reading DNA barcodes in electric current time series data from nanopore sequencing.” Paul G. Allen School Industry Affiliates Annual Research Day, 15 Nov 2017.
- Kathryn J. Doroschak, Thomas Schaffner, Lenore Cowen. “Improvements in protein function prediction using confidence in protein interactions.” RECOMB 2014, Philadelphia, PA. (Poster)
- Kathryn J. Doroschak, Robert P. Milius, Joel Schneider, Michael Heuer, Michael George, Jane Pollack, Seonghan Kim, Nezih Cereb, Jill A. Hollenbach, Steven J. Mack, Martin Maiers. “Enhancing HML for Electronic Reporting of NGS-based HLA and KIR Genotyping Results.” EFI European Immunogenetics and Histocompatibility Conference, Stockholm, Sweden, July 2014. (Poster)
CSE 427 Computational Biology
Assisted Professor Ruzzo. Answered conceptual questions and graded all work for coding assignments.
CSE 312 Foundations of Computing II (essentially Probability and Statistics) Spring 2015
Assisted Professor Ruzzo. Led and designed content for weekly discussion sections, held weekly office hours, met 1:1 with struggling students, and graded weekly and daily assignments.
CSE 421 Introduction to Algorithms Winter 2014
Assisted Professor Ruzzo. Held weekly office hours and graded sections of weekly assignments.
Gave a fill-in guest lecture on dynamic programming.
CSE 527 Computational Biology Autumn 2014
Assisted Professor Ruzzo. Professional master's course; solo TA. Answered conceptual questions and graded all assignments.
Outreach via Science Communication Fellowship
Run demonstrations related to my research at the Pacific Science Center as a Science Communication Fellow.
Grad admissions application reading committee Dec 2017
Help read incoming applications to the department for the following term.
Moderator of mid-size subreddit Mar 2017 - present
Lead moderator for an active community of ~130k members. Technical skills: Maintain a simple bot that flags posts when they reach the top pages of reddit. I also occasionally run some NLP on post content to predict topics and identify subreddit-specific spam, like covert brand promotion (mostly just for fun). Soft skills: Patiently and kindly deal with internet trolls and rulebreakers, write wiki articles (beginner and advanced guides on the subreddit's topic), and keep things organized.
Science Communication Fellow
March 2018 - present
NIH Big Data in Genomics and Neuroscience Training Grant
Sept 2015 - 2017
Clare Boothe Luce Undergraduate Research Scholar
Distributed Research Experience for Undergraduates Award (DREU)
I'm poking around for interesting internships for Summer 2019. Feel free to email me if you're seeking a grad-level intern in computational biology, data science, or machine learning!
You can reach me at kdorosch [at] cs.washington.edu.
Here's my LinkedIn (rarely updated anymore; best for experience before 2015).