MusPy documentation


MusPy is an open source Python library for symbolic music generation. It provides essential tools for developing a music generation system, including dataset management, data I/O, data preprocessing and model evaluation.


  • Dataset management system for commonly used datasets with interfaces to PyTorch and TensorFlow.

  • Data I/O for common symbolic music formats (e.g., MIDI, MusicXML and ABC) and interfaces to other symbolic music libraries (e.g., music21, mido, pretty_midi and Pypianoroll).

  • Implementations of common music representations for music generation, including the pitch-based, the event-based, the piano-roll and the note-based representations.

  • Model evaluation tools for music generation systems, including audio rendering, score and piano-roll visualizations and objective metrics.
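To make the representations and metrics listed above more concrete, here is a minimal sketch in plain Python. It is not MusPy's implementation and does not use MusPy's API: the names Note, to_note_based, to_piano_roll and pitch_range_of are hypothetical helpers introduced only for illustration, and the sketch assumes integer time steps and MIDI pitches in the range 0 to 127.

```python
from typing import List, Tuple

# Illustrative only: a note is (onset time step, MIDI pitch, duration in steps).
# These helpers are hypothetical and are not part of MusPy.
Note = Tuple[int, int, int]


def to_note_based(notes: List[Note]) -> List[List[int]]:
    """Return one [time, pitch, duration] row per note, sorted by onset."""
    return [list(note) for note in sorted(notes)]


def to_piano_roll(notes: List[Note], n_pitches: int = 128) -> List[List[int]]:
    """Return a (time steps x pitches) binary matrix; 1 marks a sounding pitch."""
    length = max((time + duration for time, _, duration in notes), default=0)
    roll = [[0] * n_pitches for _ in range(length)]
    for time, pitch, duration in notes:
        for step in range(time, time + duration):
            roll[step][pitch] = 1
    return roll


def pitch_range_of(notes: List[Note]) -> int:
    """A simple objective metric: highest pitch minus lowest pitch."""
    if not notes:
        return 0
    pitches = [pitch for _, pitch, _ in notes]
    return max(pitches) - min(pitches)


if __name__ == "__main__":
    melody = [(0, 60, 2), (2, 64, 1)]  # C4 for two steps, then E4 for one
    print(to_note_based(melody))
    print(len(to_piano_roll(melody)), "time steps")
    print("pitch range:", pitch_range_of(melody))
```

The note-based form keeps one row per note, while the piano-roll form trades compactness for a fixed-size grid that is convenient as model input. A library such as MusPy provides standard versions of these conversions so that they do not have to be rewritten for each project.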

Here is an overview of the library.


Why MusPy

A music generation pipeline usually consists of several steps: data collection, data preprocessing, model creation, model training and model evaluation.


While some components need to be customized for each model, others can be shared across systems. For symbolic music generation in particular, a number of datasets, representations and metrics have been proposed in the literature. As a result, an easy-to-use toolkit that implements standard versions of such routines could save a great deal of time and effort and might lead to increased reproducibility.


To install MusPy, run pip install muspy. To build MusPy from source, download the source and run python setup.py install.
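After installing, one way to confirm that the package is visible to the current interpreter is to query its installed version. The sketch below uses only the standard library (Python 3.8 or later); installed_version is a hypothetical helper name, and the distribution name "muspy" is assumed to match the name used with pip above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "muspy" is assumed to be the distribution name installed via pip.
    version = installed_version("muspy")
    print(f"muspy {version}" if version else "muspy is not installed")
```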


Documentation is available here and as docstrings with the code.


Please cite the following paper if you use MusPy in a published work:

Hao-Wen Dong, Ke Chen, Julian McAuley, and Taylor Berg-Kirkpatrick, “MusPy: A Toolkit for Symbolic Music Generation,” in Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR), 2020.

[homepage] [video] [paper] [slides] [poster] [arXiv] [code] [documentation]


This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset’s license.

If you’re a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the community!