Horizon: Big Data Analytics for Science

I am the lead PI on two NSF grants that explore how cloud computing can support interactive, visual, exploratory science. Through an NSF Cluster Exploratory grant, and in partnership with visualization experts at the University of Utah, we are investigating MapReduce as a common framework for both scalable data processing and scalable visualization. Through an NSF EAGER grant, I am developing a new visualization algebra for the Microsoft Azure platform. The core goal of both projects is to let scientists analyze terabytes of data in the cloud as efficiently, conveniently, and deeply as they analyze megabytes of data on their laptops.
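
The idea of one programming model serving both data processing and visualization can be illustrated with a minimal, single-process sketch of the MapReduce pattern. This is not the projects' implementation (which targets Hadoop and Azure); the function names, the `(snapshot_id, x, y)` record layout, and `CELL_SIZE` are hypothetical. The same `map_reduce` driver computes a per-snapshot point count (a processing task) and a coarse 2D density grid (the raster behind a density plot); only the mapper changes.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Single-process MapReduce: map, shuffle (group by key), then reduce."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record may emit any number of (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: fold each key's values into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical records: (snapshot_id, x, y) positions from a simulation.
POINTS = [("t0", 0.2, 0.3), ("t0", 0.7, 0.1), ("t1", 1.5, 0.4), ("t1", 2.2, 1.9)]
CELL_SIZE = 1.0  # hypothetical width of one grid cell, in data units


def count_mapper(record):
    """Processing task: emit one count per point, keyed by snapshot."""
    snapshot, _x, _y = record
    yield snapshot, 1


def pixel_mapper(record):
    """Visualization task: emit one count per point, keyed by grid cell."""
    _snapshot, x, y = record
    yield (int(x // CELL_SIZE), int(y // CELL_SIZE)), 1


def sum_reducer(_key, values):
    """Both tasks reduce the same way: add up the counts."""
    return sum(values)


if __name__ == "__main__":
    print(map_reduce(POINTS, count_mapper, sum_reducer))
    print(map_reduce(POINTS, pixel_mapper, sum_reducer))
```

In Hadoop, the grouping step is performed by the framework across many machines; this sketch keeps everything in memory only to show the data flow. When a few keys (or a few records) carry far more work than the rest, that distributed step is one place where the skew studied in several of the papers below can appear.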

Students and Postdocs

  • Yongchul Kwon
  • Yingyi Bu
  • Marianne Shaw
  • Scott Moe
  • Paris Koutris

Papers

  1. Hadoop’s adolescence: an analysis of Hadoop usage in scientific workloads
    Kai Ren, YongChul Kwon, Magdalena Balazinska, Bill Howe.
    Proceedings of the VLDB Endowment 6(10) 2013
    @article{ren2013hadoop,
      title = {Hadoop's adolescence: an analysis of Hadoop usage in scientific workloads},
      author = {Ren, Kai and Kwon, YongChul and Balazinska, Magdalena and Howe, Bill},
      journal = {Proceedings of the VLDB Endowment},
      volume = {6},
      number = {10},
      pages = {853--864},
      year = {2013},
      publisher = {VLDB Endowment}
    }
    
  2. Managing Skew in Hadoop
    YongChul Kwon, Kai Ren, Magdalena Balazinska, Bill Howe, Jerome Rolia.
    IEEE Data Eng. Bull. 36(1) 2013
    @article{kwon2013managing,
      title = {Managing Skew in Hadoop},
      author = {Kwon, YongChul and Ren, Kai and Balazinska, Magdalena and Howe, Bill and Rolia, Jerome},
      journal = {IEEE Data Eng. Bull.},
      volume = {36},
      number = {1},
      pages = {24--33},
      year = {2013}
    }
    
  3. SkewTune: mitigating skew in MapReduce applications
    YongChul Kwon, Magdalena Balazinska, Bill Howe, Jerome Rolia.
    Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data 2012
    @inproceedings{kwon2012skewtune,
      title = {SkewTune: mitigating skew in MapReduce applications},
      author = {Kwon, YongChul and Balazinska, Magdalena and Howe, Bill and Rolia, Jerome},
      booktitle = {Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data},
      pages = {25--36},
      year = {2012},
      organization = {ACM}
    }
    
  4. Hadoop’s Adolescence; A Comparative Workloads Analysis from Three Research Clusters
    Kai Ren, Garth Gibson, YongChul Kwon, Magdalena Balazinska, Bill Howe.
    SC Companion 2012
    @inproceedings{ren2012hadoop,
      title = {Hadoop's Adolescence; A Comparative Workloads Analysis from Three Research Clusters},
      author = {Ren, Kai and Gibson, Garth and Kwon, YongChul and Balazinska, Magdalena and Howe, Bill},
      booktitle = {SC Companion},
      pages = {1452},
      year = {2012}
    }
    
  5. Optimizing large-scale semi-naive Datalog evaluation in Hadoop
    Marianne Shaw, Paraschos Koutris, Bill Howe, Dan Suciu.
    Proceedings of the Second International Conference on Datalog in Academia and Industry 2012
    @inproceedings{shaw2012optimizing,
      title = {Optimizing large-scale semi-naive Datalog evaluation in Hadoop},
      author = {Shaw, Marianne and Koutris, Paraschos and Howe, Bill and Suciu, Dan},
      year = {2012},
      booktitle = {Proceedings of the Second International Conference on Datalog in Academia and Industry},
      series = {Datalog 2.0}
    }
    
  6. SkewTune in action: mitigating skew in MapReduce applications
    YongChul Kwon, Magdalena Balazinska, Bill Howe, Jerome Rolia.
    Proceedings of the VLDB Endowment 5(12) 2012
    @article{kwon2012skewtunf,
      title = {SkewTune in action: mitigating skew in MapReduce applications},
      author = {Kwon, YongChul and Balazinska, Magdalena and Howe, Bill and Rolia, Jerome},
      journal = {Proceedings of the VLDB Endowment},
      volume = {5},
      number = {12},
      pages = {1934--1937},
      year = {2012},
      publisher = {VLDB Endowment}
    }
    
  7. Parallel visualization on large clusters using MapReduce
    Huy T Vo, Jonathan Bronson, Brian Summa, João Luiz Dihl Comba, Juliana Freire, Bill Howe, Valerio Pascucci, Cláudio T Silva.
    IEEE Symposium on Large Data Analysis and Visualization (LDAV) 2011
    @inproceedings{vo2011parallel,
      title = {Parallel visualization on large clusters using MapReduce},
      author = {Vo, Huy T and Bronson, Jonathan and Summa, Brian and Comba, Jo{\~a}o Luiz Dihl and Freire, Juliana and Howe, Bill and Pascucci, Valerio and Silva, Cl{\'a}udio T},
      booktitle = {2011 IEEE Symposium on Large Data Analysis and Visualization (LDAV)},
      pages = {81--88},
      year = {2011},
      organization = {IEEE}
    }
    
  8. A study of skew in MapReduce applications
    YongChul Kwon, Magdalena Balazinska, Bill Howe, Jerome Rolia.
    Open Cirrus Summit 2011
    @inproceedings{kwon2011study,
      title = {A study of skew in MapReduce applications},
      author = {Kwon, YongChul and Balazinska, Magdalena and Howe, Bill and Rolia, Jerome},
      booktitle = {Open Cirrus Summit},
      year = {2011}
    }
    
  9. Astronomy in the cloud: using MapReduce for image co-addition
    Keith Wiley, Andrew Connolly, Jeff Gardner, S Krughoff, Magdalena Balazinska, Bill Howe, Y Kwon, Yingyi Bu.
    Publications of the Astronomical Society of the Pacific 123(901) 2011
    @article{wiley2011astronomy,
      title = {Astronomy in the cloud: using MapReduce for image co-addition},
      author = {Wiley, Keith and Connolly, Andrew and Gardner, Jeff and Krughoff, S and Balazinska, Magdalena and Howe, Bill and Kwon, Y and Bu, Yingyi},
      journal = {Publications of the Astronomical Society of the Pacific},
      volume = {123},
      number = {901},
      pages = {366},
      year = {2011},
      publisher = {IOP Publishing}
    }
    
  10. Scalable clustering algorithm for N-body simulations in a shared-nothing cluster
    YongChul Kwon, Dylan Nunley, Jeffrey P Gardner, Magdalena Balazinska, Bill Howe, Sarah Loebman.
    Scientific and Statistical Database Management 2010
    @inproceedings{kwon2010scalable,
      title = {Scalable clustering algorithm for N-body simulations in a shared-nothing cluster},
      author = {Kwon, YongChul and Nunley, Dylan and Gardner, Jeffrey P and Balazinska, Magdalena and Howe, Bill and Loebman, Sarah},
      booktitle = {Scientific and Statistical Database Management},
      pages = {132--150},
      year = {2010},
      organization = {Springer}
    }
    
  11. Skew-resistant parallel processing of feature-extracting scientific user-defined functions
    YongChul Kwon, Magdalena Balazinska, Bill Howe, Jerome Rolia.
    Proceedings of the 1st ACM Symposium on Cloud Computing 2010
    @inproceedings{kwon2010skew,
      title = {Skew-resistant parallel processing of feature-extracting scientific user-defined functions},
      author = {Kwon, YongChul and Balazinska, Magdalena and Howe, Bill and Rolia, Jerome},
      booktitle = {Proceedings of the 1st ACM Symposium on Cloud Computing},
      pages = {75--86},
      year = {2010},
      organization = {ACM}
    }
    
  12. Astronomy in the Cloud: Using MapReduce for Image Coaddition
    Keith Wiley, Andrew J. Connolly, Jeffrey P. Gardner, K. Simon Krughoff, Magdalena Balazinska, Bill Howe, YongChul Kwon, Yingyi Bu.
    CoRR abs/1010.1015 2010
    @article{wiley2010astronomy,
      author = {Wiley, Keith and Connolly, Andrew J. and Gardner, Jeffrey P. and Krughoff, K. Simon and Balazinska, Magdalena and Howe, Bill and Kwon, YongChul and Bu, Yingyi},
      title = {Astronomy in the Cloud: Using MapReduce for Image Coaddition},
      journal = {CoRR},
      volume = {abs/1010.1015},
      year = {2010},
      url = {http://arxiv.org/abs/1010.1015},
      timestamp = {Fri, 20 Dec 2013 07:51:13 +0100},
      biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/abs-1010-1015},
      bibsource = {dblp computer science bibliography, http://dblp.org}
    }
    
  13. Analyzing massive astrophysical datasets: Can Pig/Hadoop or a relational DBMS help?
    Sarah Loebman, Dylan Nunley, YongChul Kwon, Bill Howe, Magdalena Balazinska, Jeffrey P Gardner.
    IEEE International Conference on Cluster Computing and Workshops (CLUSTER’09) 2009
    @inproceedings{loebman2009analyzing,
      title = {Analyzing massive astrophysical datasets: Can Pig/Hadoop or a relational DBMS help?},
      author = {Loebman, Sarah and Nunley, Dylan and Kwon, YongChul and Howe, Bill and Balazinska, Magdalena and Gardner, Jeffrey P},
      booktitle = {IEEE International Conference on Cluster Computing and Workshops (CLUSTER'09)},
      pages = {1--10},
      year = {2009},
      organization = {IEEE}
    }
    
  14. Gridfields: model-driven data transformation in the physical sciences
    Bill Howe.
    Portland State University, May 2006 (PhD dissertation)
    @phdthesis{howe2006gridfields,
      author = {Howe, Bill},
      title = {Gridfields: model-driven data transformation in the physical sciences},
      school = {Portland State University},
      year = {2006},
      month = may
    }
    

Sponsors

  • NSF Award #1060213 CIC: EAGER: Scalable Algebraic Visualization in the Cloud
  • NSF Award #0844572 Where the Ocean Meets the Cloud: Ad Hoc Longitudinal Analysis and Collaboration Over Massive Mesh Data
  • Microsoft Research
  • EMC




This webpage was built with Bootstrap and Jekyll. Last updated: Aug 02, 2021