This page has links to the MusicNet dataset and Python tutorials showing how to load and use the dataset.
Direct download links to the MusicNet dataset are available below. MusicNet is available in three formats: raw, native Python, and HDF5. We recommend the raw version, which is relatively self-documenting and also has improved metrical annotations, which matters if you intend to use the metrical labels. We also provide metadata for the recordings in MusicNet. This metadata is distributed in CSV format; the id column of the metadata file is cross-indexed with the MusicNet ids in the data files.
(Raw - recommended) The raw data is available in standard WAV audio format, with corresponding label files in CSV format. Each data file and label file is named by its MusicNet id, which you can use to cross-index the data, labels, and metadata files. For convenience, we provide a PyTorch interface for accessing this data.
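The cross-indexing described above can be sketched with the Python standard library alone. This is a minimal sketch, not part of the MusicNet distribution: the function name cross_index, the directory layout (one folder of WAV files, one folder of label CSV files), and the file extensions are assumptions made for illustration. The only facts taken from this page are that filenames are MusicNet ids and that the metadata file has an id column.

```python
import csv
from pathlib import Path


def cross_index(wav_dir, label_dir, metadata_csv):
    """Pair each WAV file with its label file and metadata row by MusicNet id.

    Hypothetical helper: the directory layout is an assumption, not something
    the dataset prescribes. Returns a dict keyed by id (as a string).
    """
    # Read the metadata once and key each row by its "id" column.
    with open(metadata_csv, newline="") as f:
        metadata = {row["id"]: row for row in csv.DictReader(f)}

    index = {}
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        musicnet_id = wav_path.stem  # the filename without extension is the id
        label_path = Path(label_dir) / f"{musicnet_id}.csv"
        index[musicnet_id] = {
            "wav": wav_path,
            # None marks a missing label file or metadata row.
            "labels": label_path if label_path.exists() else None,
            "meta": metadata.get(musicnet_id),
        }
    return index
```

Keeping the ids as strings avoids a lossy conversion and matches both the filename stems and the values that csv.DictReader returns. Reading the audio itself is left out here; scipy.io.wavfile.read is one common way to load a WAV file into a NumPy array.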
(Python) The Python version of the dataset is distributed as a NumPy npz file. This is a binary format specific to Python. WARNING: if you read this data in Python 3, you must set encoding='latin1' when you call np.load; otherwise your process will hang without any informative error message. This format has three dependencies:
(HDF5) The HDF5 version of MusicNet requires an HDF5 parser for your language of choice. The data is organized into 330 groups, one for each song, under headings "/id_<MusicNet ID>". Each group contains a "data" dataset (a CArray containing the audio signal) and a "labels" dataset (a Table of labels).
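The group layout above can be navigated as in the following sketch. It assumes the h5py package as the HDF5 parser (one option among several; this page does not name one) and a hypothetical local file path. The helper names group_key, list_ids, read_recording, and summarize are illustrative and not part of MusicNet. The helpers accept any mapping-like object, which is how an open h5py.File behaves.

```python
def group_key(musicnet_id):
    """Build the group name for a recording, e.g. 'id_<MusicNet ID>'."""
    return f"id_{musicnet_id}"


def list_ids(h5):
    """Return the MusicNet ids found at the top level of the file, sorted."""
    prefix = "id_"
    return sorted(k[len(prefix):] for k in h5.keys() if k.startswith(prefix))


def read_recording(h5, musicnet_id):
    """Read the full 'data' and 'labels' datasets of one recording."""
    group = h5[group_key(musicnet_id)]
    audio = group["data"][:]     # slicing with [:] reads the whole dataset
    labels = group["labels"][:]
    return audio, labels


def summarize(path):
    """Print the dataset lengths for the first recording in the file.

    `path` is a hypothetical location of the downloaded HDF5 file.
    """
    import h5py  # imported here so the helpers above work without it

    with h5py.File(path, "r") as h5:
        ids = list_ids(h5)
        audio, labels = read_recording(h5, ids[0])
        print(ids[0], len(audio), len(labels))
```

Reading with [:] pulls the whole recording into memory; for long recordings, slicing a range of the dataset instead loads only that part.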
Here are some tutorials for getting started with MusicNet in Python. You can browse these tutorials using the HTML viewer, or download the notebooks and run them yourself using Jupyter. Some of the tutorials additionally depend on TensorFlow, scikit-learn, and matplotlib.