Getting Started


This page has links to the MusicNet dataset and Python tutorials showing how to load and use the dataset.

Downloading MusicNet

Direct download links to the MusicNet dataset are available below. MusicNet is available in three formats: raw, native Python, and HDF5. We recommend the raw version, which is largely self-documenting and also includes improved metrical annotations, if you intend to use that part of the labels. We also provide metadata for the recordings in MusicNet. This metadata is distributed in CSV format; the id column of the metadata file is cross-indexed with the MusicNet ids in the data files.
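As a rough illustration of the cross-indexing described above, the sketch below loads the metadata CSV into a dictionary keyed by MusicNet id, using only the Python standard library. The function name `load_metadata` and the UTF-8 encoding are assumptions; only the `id` column is taken from this page, and any other column you read from a row is hypothetical until you check the file's header.

```python
import csv

def load_metadata(path):
    """Return {musicnet_id: row_dict} for every row of the metadata CSV.

    Assumption: the file is UTF-8 encoded and has a header row containing
    an "id" column, as described on this page. All other columns are kept
    as plain strings.
    """
    with open(path, newline="", encoding="utf-8") as f:
        # Convert the id to int so it matches ids parsed from data filenames.
        return {int(row["id"]): row for row in csv.DictReader(f)}
```

A lookup such as `load_metadata(path)[2494]` then returns that recording's metadata row as a dictionary of strings.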

(Raw - recommended) The raw data is distributed as standard WAV audio files, with corresponding label files in CSV format. The data and label filenames are MusicNet ids, which you can use to cross-index the data, label, and metadata files. For convenience, we also provide a PyTorch interface for accessing this data.

  • PyTorch - The PyTorch learning framework.
  • MusicNet in PyTorch - PyTorch Dataset class and demos for downloading and accessing MusicNet.
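If you prefer not to use the PyTorch interface, the raw files can be paired and read with the standard library alone. The sketch below assumes that audio and labels sit in two separate directories (for example a hypothetical `train_data/` and `train_labels/`) and that each file is named `<MusicNet id>.wav` or `<MusicNet id>.csv`, as the paragraph above describes. It does not assume particular label column names; `csv.DictReader` takes them from each file's header.

```python
import csv
import wave
from pathlib import Path

def index_by_id(data_dir, label_dir):
    """Map each MusicNet id to its (wav_path, csv_path) pair.

    The file stem is the MusicNet id, so it is the key that joins audio,
    labels, and metadata. Ids missing either file are skipped.
    """
    wavs = {int(p.stem): p for p in Path(data_dir).glob("*.wav")}
    csvs = {int(p.stem): p for p in Path(label_dir).glob("*.csv")}
    return {i: (wavs[i], csvs[i]) for i in sorted(wavs.keys() & csvs.keys())}

def read_wav_info(path):
    """Return (sample_rate, channels, frame_count) from a WAV header."""
    with wave.open(str(path), "rb") as w:
        return w.getframerate(), w.getnchannels(), w.getnframes()

def read_labels(path):
    """Return the label rows as dictionaries keyed by the CSV header."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

Reading the header first with `read_wav_info` is a cheap way to check sample rates and lengths before decoding any audio samples.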

(Python) The Python version of the dataset is distributed as a NumPy npz file, a binary format specific to Python. (WARNING: if you read this data in Python 3, you must pass encoding='latin1' to np.load; otherwise your process will hang without an informative error message.) This version has three dependencies:

  • Python - This version of MusicNet is distributed as a Python object.
  • NumPy - The MusicNet features are stored in NumPy arrays.
  • intervaltree - The MusicNet labels are stored in an IntervalTree.
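A minimal loading sketch for the npz version follows. Besides `encoding='latin1'` from the warning above, it passes `allow_pickle=True`, because NumPy 1.16.3 and later refuse to load pickled object arrays by default. The layout it assumes is not documented on this page: each entry is keyed by the MusicNet id as a string and holds an (audio, labels) pair, with the labels stored as an IntervalTree. The helper names are hypothetical.

```python
import numpy as np

def load_musicnet_npz(path):
    """Open the npz archive so it can be read under Python 3.

    encoding="latin1" lets Python 3 unpickle objects written by Python 2;
    allow_pickle=True is needed because the entries are pickled objects.
    """
    return np.load(path, encoding="latin1", allow_pickle=True)

def get_recording(data, musicnet_id):
    """Return (audio, labels) for one recording.

    Assumption: entries are keyed by the id as a string and each entry
    is a two-element (audio, labels) object array.
    """
    audio, labels = data[str(musicnet_id)]
    return audio, labels
```

With intervaltree installed, indexing the labels at a sample position (for example `labels[t]`) should return the set of notes sounding at that position; check this against the Introduction tutorial before relying on it.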

(HDF5) The HDF5 version of MusicNet requires an HDF5 parser for your language of choice. The data is organized into 330 groups, one per recording, named "/id_<MusicNet ID>". Each group contains a "data" dataset (a CArray containing the audio signal) and a "labels" dataset (a Table of labels).

  • HDF5 - This is the official webpage for HDF5.
  • Parsers - Wikipedia maintains an extensive list of HDF5 interfaces for various languages.
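In Python, one common parser is h5py; using it here is our assumption, not a requirement of the dataset. The sketch below follows the group layout described above: it lists the MusicNet ids in a file and reads one recording's "data" and "labels" datasets into NumPy arrays. The labels Table should come back as a NumPy structured array; its field names are not documented on this page, so inspect `labels.dtype.names` rather than guessing. The function names are hypothetical.

```python
def group_name(musicnet_id):
    """Return the HDF5 path of a recording's group, e.g. "/id_2494"."""
    return "/id_{}".format(musicnet_id)

def musicnet_ids(h5file):
    """Collect the integer ids of all "id_<N>" groups at the file root.

    Accepts any mapping with .keys(), such as an open h5py.File.
    """
    return sorted(int(name[len("id_"):]) for name in h5file.keys()
                  if name.startswith("id_"))

def load_recording(path, musicnet_id):
    """Read (audio, labels) for one recording with h5py.

    h5py is imported here so the id helpers above work without it.
    """
    import h5py
    with h5py.File(path, "r") as f:
        group = f[group_name(musicnet_id)]
        audio = group["data"][:]     # CArray -> NumPy array
        labels = group["labels"][:]  # Table -> NumPy structured array
    return audio, labels
```

Slicing with `[:]` copies each dataset into memory, so the returned arrays stay valid after the file is closed.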

Download Links

Here are some tutorials for getting started with MusicNet in Python. You can browse these tutorials in the HTML viewer, or download the notebooks and run them yourself using Jupyter. Some of the tutorials additionally depend on TensorFlow, scikit-learn, and matplotlib.

Each tutorial is available as a Jupyter notebook and as an HTML version:

  • Introduction
  • Spectrograms
  • Linear Model
  • Multi-Layer Perceptron