Hao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang, Yi-Hsuan Yang

Music and AI Lab,
Research Center for IT Innovation,
Academia Sinica

The Lakh Pianoroll Dataset (LPD) is a collection of 174,154 multitrack pianorolls derived from the Lakh MIDI Dataset (LMD).

Getting the dataset

We provide multiple subsets and versions of the dataset (see here). The dataset is available here.

Using LPD

The multitrack pianorolls in LPD are stored in a special format for efficient I/O and to save space. We recommend loading the data with Pypianoroll (the dataset was created using Pypianoroll v0.3.0). See here to learn how the data is stored and how to load the data properly.
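As a rough illustration of the loading step, here is a minimal sketch. It assumes the dataset has been extracted into a local directory of `.npz` files and that a recent Pypianoroll release providing `pypianoroll.load` is installed. Older releases, including the v0.3.0 used to create the dataset, expose a different loading API, so check the documentation for your installed version. The helper names `find_pianoroll_files`, `load_multitrack`, and `summarize_tracks` are hypothetical and are not part of Pypianoroll.

```python
from pathlib import Path


def find_pianoroll_files(root):
    """Return every .npz file under `root` (searched recursively), sorted by path."""
    return sorted(Path(root).rglob("*.npz"))


def load_multitrack(path):
    """Load one LPD file as a Multitrack object.

    Assumption: a recent Pypianoroll release where `pypianoroll.load` exists.
    The import is done lazily so that file discovery works without the library.
    """
    import pypianoroll

    return pypianoroll.load(str(path))


def summarize_tracks(multitrack):
    """Return (name, program, is_drum, pianoroll shape) for each track."""
    return [
        (track.name, track.program, track.is_drum, track.pianoroll.shape)
        for track in multitrack.tracks
    ]
```

A typical use would be to call `find_pianoroll_files` on the extraction directory, pass one of the returned paths to `load_multitrack`, and print the result of `summarize_tracks` to check the track layout before processing the whole collection.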


The Lakh Pianoroll Dataset is a derivative of the Lakh MIDI Dataset by Colin Raffel, used under CC BY 4.0. The Lakh Pianoroll Dataset is licensed under CC BY 4.0 by Hao-Wen Dong and Wen-Yi Hsiao.

Please cite the following papers if you use the Lakh Pianoroll Dataset in a published work.