I am currently a Computer Science and Engineering PhD student at the University of Washington. I feel very fortunate to be advised by Magdalena Balazinska.
My research is focused on the following topics: service level agreements (SLAs) for data analytics, cloud computing, distributed systems, and machine learning + databases.
I have been generously supported by the NSF Graduate Research Fellowship and the GO-MAP Dissertation Fellowship.
jortiz16 at cs.washington.edu
SLAOrchestrator: Reducing the Cost of Performance SLAs in the Cloud,
J. Ortiz, B. Lee, M. Balazinska, J. Gehrke, and J. L. Hellerstein, USENIX ATC 2018
Learning State Representations for Query Optimization with Deep Reinforcement Learning,
J. Ortiz, M. Balazinska, J. Gehrke, and S. S. Keerthi, DEEM Workshop with SIGMOD 2018
The Myria Big Data Management and Analytics System and Cloud Services,
The Myria Team, CIDR 2017.
PerfEnforce Overview: A Scaling Engine for Analytics with Performance Guarantees,
J. Ortiz, SIGMOD 2017 Student Research Competition (extended abstract). Awarded First Runner-up.
PerfEnforce Demonstration: Data Analytics with Performance Guarantees,
J. Ortiz, B. Lee, M. Balazinska, SIGMOD Demonstration 2016.
Changing the Face of Database Cloud Services with Personalized Service Level Agreements,
J. Ortiz, V. T. Almeida, M. Balazinska, CIDR 2015.
Towards a hybrid relational and XML benchmark for loosely-coupled distributed data sources,
M. B. Chaudhari, S. W. Dietrich, J. Ortiz, S. Pearson, Journal of Systems and Software 2015.
Big-Data Management Use-Case: A Cloud Service for Creating and Analyzing Galactic Merger Trees,
S. Loebman, J. Ortiz, L. Choo, L. Orr, L. Anderson, D. Halperin, M. Balazinska, T. Quinn, and F. Governato, Workshop on Data Analytics in the Cloud (DanaC) with SIGMOD 2014.
Demonstration of the Myria Big Data Management Service,
D. Halperin, V. Teixeira de Almeida, L. Choo, S. Chu, P. Koutris, D. Moritz, J. Ortiz, V. Ruamviboonsuk, J. Wang, A. Whitaker, S. Xu, M. Balazinska, B. Howe, D. Suciu, SIGMOD Demonstration 2014.
A Vision for Personalized Service Level Agreements in the Cloud,
J. Ortiz, V. T. Almeida, M. Balazinska, Workshop on Data Analytics in the Cloud (DanaC) with SIGMOD 2013.
Learning from Database Performance Benchmarks,
J. Ortiz, S. W. Dietrich, and M. B. Chaudhari, Consortium for Computing Sciences in Colleges, March 2012.
Deep reinforcement learning is quickly changing the field of artificial intelligence. These models are able to capture a high-level understanding of their environment, enabling them to learn difficult dynamic tasks in a variety of domains. In the database field, query optimization remains a difficult problem. Our goal in this work is to explore the capabilities of deep reinforcement learning in the context of query optimization.
We present PerfEnforce, a scaling engine designed to enable cloud providers to sell performance levels for data analytics cloud services. PerfEnforce scales a cluster of virtual machines for multiple users in a way that minimizes cost while probabilistically meeting the query runtime guarantees offered by a service level agreement. With PerfEnforce, we demonstrate how to scale a cluster in a way that minimally disrupts a user's query session. We further show when to scale the cluster using one of three methods: feedback control, reinforcement learning (through multi-armed bandits), or online learning. We find that online learning outperforms the other two methods when making cluster scaling decisions.
Update: You can try out the PSLAManager and PerfEnforce systems when launching Myria on EC2. Simply launch the cluster with the '--perfenforce' flag as follows: 'myria-cluster create test-cluster --perfenforce'. Please note that launching a cluster with this command will automatically provision 12 m4.xlarge machines. See more info here. (Source Code)
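As a rough illustration of the reinforcement learning option mentioned above, the sketch below uses an epsilon-greedy multi-armed bandit to choose a cluster size. It is not the PerfEnforce implementation: the class and function names, the candidate cluster sizes, the reward function (machine cost plus a fixed penalty for a missed runtime guarantee), and the simulated workload are all assumptions made for this example.

```python
import random


class EpsilonGreedyScaler:
    """Epsilon-greedy bandit whose arms are candidate cluster sizes (illustrative only)."""

    def __init__(self, cluster_sizes, epsilon=0.1, seed=0):
        self.cluster_sizes = list(cluster_sizes)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {size: 0 for size in self.cluster_sizes}
        # Estimates start at 0. Rewards below are always negative, so every
        # untried size looks attractive until it has been tried once.
        self.mean_reward = {size: 0.0 for size in self.cluster_sizes}

    def choose(self):
        # Explore a random size with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.cluster_sizes)
        return max(self.cluster_sizes, key=lambda size: self.mean_reward[size])

    def update(self, size, reward_value):
        # Incremental running mean of the rewards observed for this size.
        self.counts[size] += 1
        self.mean_reward[size] += (reward_value - self.mean_reward[size]) / self.counts[size]


def reward(size, runtime, sla_runtime, cost_per_machine=1.0, miss_penalty=20.0):
    # Hypothetical reward: pay per machine, pay extra when the guarantee is missed.
    penalty = miss_penalty if runtime > sla_runtime else 0.0
    return -(cost_per_machine * size + penalty)


def simulate(rounds=2000, work=64.0, sla_runtime=10.0, seed=0):
    # Toy workload: runtime shrinks linearly with cluster size, with +/-10% noise.
    scaler = EpsilonGreedyScaler([4, 6, 8, 10, 12], epsilon=0.1, seed=seed)
    noise = random.Random(seed + 1)
    for _ in range(rounds):
        size = scaler.choose()
        runtime = (work / size) * noise.uniform(0.9, 1.1)
        scaler.update(size, reward(size, runtime, sla_runtime))
    # Return the size with the best estimated reward after all rounds.
    return max(scaler.cluster_sizes, key=lambda s: scaler.mean_reward[s])


if __name__ == "__main__":
    print("preferred cluster size:", simulate())
```

In this toy setup the bandit is pushed toward the smallest cluster that still meets the runtime guarantee most of the time, which mirrors the trade-off between cost and SLA violations described above.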
Public clouds today provide a variety of services for data analysis, such as Amazon Elastic MapReduce and Google BigQuery. Each service comes with a pricing model and service level agreement (SLA). Today's pricing models and SLAs are described at the level of compute resources (instance-hours or gigabytes processed). They also differ from one service to the next. Both conditions make it difficult for users to select a service, pick a configuration, and predict the actual analysis cost. To address this challenge, we propose a new abstraction, called a Personalized Service Level Agreement, where users are presented with what they can do with their data in terms of query capabilities, guaranteed query performance, and fixed hourly prices. (Source Code)
We have created and are currently developing a service that enables astronomers to study the growth history of galaxies by following their 'merger trees' in large-scale astrophysical simulations. The service uses the Myria parallel data management system as back-end and the D3 data visualization library within its graphical front-end. We demonstrated the service at the 2014 DanaC workshop on a ~5 TB dataset.
This started as a data visualization class project for Laurel Orr and me. It piqued enough interest from the astronomers to continue to grow, and eventually we extended this service to work with Myria through the Ascot gadget.
In this work, we describe finger gesture recognition using a microphone array. First, we tried to use the reflection of sound waves by implementing an acoustic pulse radar. After finding that we could not obtain a fine enough resolution to detect a subtle gesture, we implemented a logistic regression classifier using Doppler shift features from each microphone. After testing the classifier on different data sets, we concluded that it can be trained successfully with a small number of samples for each person.
This work was a networking class project with SeongJae Lee (see paper)
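To give a feel for the two quantities involved, here is a small back-of-the-envelope sketch using the standard textbook approximations for pulse-radar range resolution and for the Doppler shift of a reflected tone. The pulse length, tone frequency, and finger speed are hypothetical values chosen for illustration, not measurements from the project.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at room temperature


def range_resolution_m(pulse_duration_s):
    """Range resolution of a simple unmodulated pulse: c * tau / 2."""
    return SPEED_OF_SOUND_M_S * pulse_duration_s / 2.0


def reflected_doppler_shift_hz(tone_hz, target_speed_m_s):
    """Approximate shift of a tone reflected off a target moving toward a
    co-located speaker and microphone: 2 * v * f / c, valid when v << c."""
    return 2.0 * target_speed_m_s * tone_hz / SPEED_OF_SOUND_M_S


if __name__ == "__main__":
    # Hypothetical numbers: a 1 ms pulse, a 20 kHz tone, a finger moving at 0.5 m/s.
    print("range resolution (m):", range_resolution_m(0.001))
    print("Doppler shift (Hz):", reflected_doppler_shift_hz(20000.0, 0.5))
```

Under these assumed numbers, the pulse resolves distances only to within several centimeters, while the same finger motion produces a frequency shift of tens of hertz, which is consistent with the move from pulse radar to Doppler-based features described above.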
The goal of this project is to detect research communities from the DBLP bibliography in order to predict the various research areas an author contributes to. We evaluate different unsupervised clustering techniques by seeing how well they distinguish between research areas and place authors in their corresponding areas. We focus on unsupervised techniques because it may not be known a priori what field a researcher is in.
This work was a result of a machine learning class project with Laurel Orr (see paper)