The multitrack pianorolls in LPD are stored in a special format for efficient I/O and to save space. We recommend loading the data with Pypianoroll (the dataset was created using Pypianoroll v0.3.0). See here to learn how the data is stored and how to load the data properly.
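As a rough illustration of the recommended loading path, the sketch below wraps Pypianoroll in two small helpers. The helper names `load_multitrack` and `describe` are hypothetical and only for illustration, and the `.npz` file extension is an assumption about how the files are named. The sketch relies on `pypianoroll.load` and on the `tracks`, `name`, and `pianoroll` attributes; check these against the documentation of the Pypianoroll version you have installed, since the API has changed between releases.

```python
from pathlib import Path


def load_multitrack(path):
    """Load one LPD file as a Pypianoroll Multitrack object (illustrative helper)."""
    path = Path(path)
    # Assumption: LPD files are NumPy archives with a .npz extension.
    if path.suffix != ".npz":
        raise ValueError(f"expected a .npz file, got {path.name!r}")
    # Imported lazily so the helpers can be imported without Pypianoroll installed.
    import pypianoroll

    return pypianoroll.load(str(path))


def describe(multitrack):
    """Return (track name, pianoroll shape) pairs for a loaded multitrack."""
    return [(track.name, track.pianoroll.shape) for track in multitrack.tracks]
```

Given a file from the dataset, `describe(load_multitrack(path))` lists each track's name together with the shape of its pianoroll array, which is a quick way to confirm that a file loaded as expected.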
Please cite the following papers if you use the Lakh Pianoroll Dataset in a published work.
Hao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang, and Yi-Hsuan Yang, “MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment,” in Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI), 2018.
Colin Raffel, “Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching,” PhD Thesis, 2016.