Hao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang, Yi-Hsuan Yang

Music and AI Lab,
Research Center for IT Innovation,
Academia Sinica


Lakh Pianoroll Dataset

We use the cleansed version of the Lakh Pianoroll Dataset (LPD). LPD contains 174,154 unique multitrack pianorolls derived from the MIDI files in the Lakh MIDI Dataset (LMD); the cleansed version contains 21,425 pianorolls that are in 4/4 time and have been matched to distinct entries in the Million Song Dataset (MSD).

Training Data

The target output tensor has size 4 (bar) × 96 (time step) × 84 (pitch) × 5 (track).
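To make the tensor layout concrete, here is a minimal NumPy sketch of one training segment with that shape. The variable names, the boolean (note on/off) encoding, and the example note are assumptions for illustration only; the track order follows the Bass, Drums, Guitar, Strings, Piano ordering used for the sample pianorolls on this page.

```python
import numpy as np

# Dimensions taken from the text: 4 bars x 96 time steps x 84 pitches x 5 tracks.
N_BARS = 4
N_STEPS_PER_BAR = 96
N_PITCHES = 84
TRACKS = ("Bass", "Drums", "Guitar", "Strings", "Piano")

# Assumption: a binary pianoroll, where True means "note active".
segment = np.zeros(
    (N_BARS, N_STEPS_PER_BAR, N_PITCHES, len(TRACKS)), dtype=bool
)

# Hypothetical example note: hold pitch index 36 in the Piano track for the
# first 24 time steps of bar 0 (one beat, if a 4/4 bar is split evenly
# into 96 steps).
segment[0, :24, 36, TRACKS.index("Piano")] = True

# Merge the bar and time-step axes to get one continuous time axis,
# which is convenient for plotting each track as a 2-D pianoroll.
per_track = segment.reshape(N_BARS * N_STEPS_PER_BAR, N_PITCHES, len(TRACKS))

print(segment.shape)    # (4, 96, 84, 5)
print(per_track.shape)  # (384, 84, 5)
```

Keeping the track axis last means a single track can be sliced out as `per_track[..., i]` without copying.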

The following are two sample pianorolls seen in our training data. The tracks are (from top to bottom): Bass, Drums, Guitar, Strings, Piano.


