Base Dataset Classes

Here are the two base classes for MusPy datasets.

class muspy.Dataset

Base class for MusPy datasets.

To build a custom dataset, inherit from this class and override the methods __getitem__ and __len__ as well as the class attribute _info. __getitem__ should return the i-th data sample as a muspy.Music object, __len__ should return the size of the dataset, and _info should be a muspy.DatasetInfo instance storing the dataset information.

classmethod info(): Return the dataset information.

classmethod citation(): Print the citation information.

save(root, kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)

Save all the music objects to a directory.

Parameters

root (str or Path) – Root directory to save the data.
kind ({'json', 'yaml'}, default: 'json') – File format to save the data.
n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.
ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.
verbose (bool, default: True) – Whether to be verbose.
**kwargs – Keyword arguments to pass to muspy.save().

split(filename=None, splits=None, random_state=None)

Split the dataset into train, test and, optionally, validation subsets.

Parameters

filename (str or Path, optional) – Path to a split file. If given and the file exists, the split is read from it. If given but the file does not exist, the newly created split is saved to it.
splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
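The splitting behavior described above can be sketched in plain NumPy. This is not MusPy's implementation: the helper make_splits, using the split names as dictionary keys, reading a single float as the train ratio, and storing the split as JSON are all assumptions made for illustration, and the None case (no splitting) is left out.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional, Sequence, Union

import numpy as np

SPLIT_NAMES = ("train", "test", "validation")


def make_splits(
    n: int,
    splits: Union[float, Sequence[float]],
    random_state=None,
    filename: Optional[Union[str, Path]] = None,
) -> Dict[str, List[int]]:
    """Hypothetical helper: split the indices 0..n-1 by the given ratios."""
    # Reuse a previously saved split if the file exists (JSON is an assumption).
    if filename is not None and Path(filename).exists():
        with open(filename, encoding="utf-8") as f:
            return json.load(f)

    # A single float is treated here as the train ratio; the rest is test.
    if isinstance(splits, float):
        splits = [splits, 1.0 - splits]
    if len(splits) not in (2, 3):
        raise ValueError("splits must be a float or a list of 2 or 3 floats")

    # An int or array_like seeds a new RandomState; a RandomState is used as is.
    if not isinstance(random_state, np.random.RandomState):
        random_state = np.random.RandomState(random_state)

    # Shuffle the indices, then cut them at the cumulative ratio boundaries.
    indices = random_state.permutation(n)
    boundaries = np.cumsum(splits) / np.sum(splits)
    cuts = np.round(boundaries[:-1] * n).astype(int)
    parts = np.split(indices, cuts)
    result = {name: part.tolist() for name, part in zip(SPLIT_NAMES, parts)}

    # Save the split so that later calls with the same filename reuse it.
    if filename is not None:
        with open(filename, "w", encoding="utf-8") as f:
            json.dump(result, f)
    return result
```

Saving the split to a file keeps the train and test sets fixed across runs even if the random state changes later.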

to_pytorch_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)

Return the dataset as a PyTorch dataset.

Parameters

factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
split_filename (str or Path, optional) – Path to a split file. If given and the file exists, the split is read from it. If given but the file does not exist, the newly created split is saved to it.
splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.

Returns

Converted PyTorch dataset(s).

Return type

torch.utils.data.Dataset or Dict of torch.utils.data.Dataset

to_tensorflow_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)

Return the dataset as a TensorFlow dataset.

Parameters

factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.
representation (str, optional) – Target representation. See muspy.to_representation() for available representations.
split_filename (str or Path, optional) – Path to a split file. If given and the file exists, the split is read from it. If given but the file does not exist, the newly created split is saved to it.
splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.
random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.

Returns

Converted TensorFlow dataset(s).

Return type

tensorflow.data.Dataset or Dict of tensorflow.data.Dataset

class muspy.RemoteDataset(root, download_and_extract=False, overwrite=False, cleanup=False, verbose=True)

Base class for remote MusPy datasets.

This class extends muspy.Dataset to support remote datasets. To build a custom remote dataset, follow the documentation of muspy.Dataset. In addition, set the class attribute _sources to describe the source files to download, including their URLs (see Notes).

root

Root directory of the dataset.

Type: str or Path

Parameters

download_and_extract (bool, default: False) – Whether to download and extract the dataset.
overwrite (bool, default: False) – Whether to overwrite existing file(s).
cleanup (bool, default: False) – Whether to remove the source archive(s).
verbose (bool, default: True) – Whether to be verbose.

Raises

RuntimeError – If download_and_extract is False but the file {root}/.muspy.success does not exist (see below).

Important

muspy.RemoteDataset.exists() depends solely on a special file named .muspy.success in directory {root}. This file serves as an indicator of the existence and integrity of the dataset. It is created automatically when the dataset is successfully downloaded and extracted by muspy.RemoteDataset.download_and_extract(). If you download the dataset manually, create the .muspy.success file in directory {root} yourself to prevent errors.
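If you download the files yourself, a few lines of pathlib are enough to create the marker. The helper name mark_as_downloaded is hypothetical; pass it the directory that the note above gives as the location of .muspy.success.

```python
from pathlib import Path
from typing import Union


def mark_as_downloaded(directory: Union[str, Path]) -> Path:
    """Hypothetical helper: create the .muspy.success marker in a directory."""
    directory = Path(directory).expanduser()
    # Create the directory if needed, then an empty marker file inside it.
    directory.mkdir(parents=True, exist_ok=True)
    marker = directory / ".muspy.success"
    marker.touch(exist_ok=True)
    return marker
```

Only create the marker after checking that the files are complete, since its presence is treated as proof of the dataset's integrity.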

Notes

The class attribute _sources is a dictionary storing the following information for each source file.

filename (str): Name to save the file.
url (str): URL to the file.
archive (bool): Whether the file is an archive.
md5 (str, optional): Expected MD5 checksum of the file.
sha256 (str, optional): Expected SHA256 checksum of the file.

Here is an example:

_sources = {
    "example": {
        "filename": "example.tar.gz",
        "url": "https://www.example.com/example.tar.gz",
        "archive": True,
        "md5": None,
        "sha256": None,
    }
}
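The md5 and sha256 fields make it possible to detect corrupted or incomplete downloads. As a sketch of how such a check could work (not MusPy's own code; verify_source and file_digest are hypothetical names), the functions below hash a file in chunks with hashlib and compare the digests against one _sources entry, skipping any checksum set to None.

```python
import hashlib
from pathlib import Path
from typing import Optional, Union


def file_digest(path: Union[str, Path], algorithm: str) -> str:
    """Hash a file in 1 MiB chunks so large archives are not loaded at once."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_source(path: Union[str, Path], source: dict) -> bool:
    """Hypothetical check of a downloaded file against one _sources entry."""
    for algorithm in ("md5", "sha256"):
        expected: Optional[str] = source.get(algorithm)
        # A checksum of None means "not provided", so it is skipped.
        if expected is not None and file_digest(path, algorithm) != expected.lower():
            return False
    return True
```

With both checksums set to None, as in the example above, verify_source accepts any file, so providing at least one checksum is what makes the check meaningful.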