Remote Dataset Classes

The following base classes support remote datasets, whose sources are downloaded and extracted before use.

class muspy.RemoteFolderDataset(root, download_and_extract=False, overwrite=False, cleanup=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None, verbose=True)

Base class for remote datasets storing files in a folder.

root

Root directory of the dataset.

Type

str or Path

Parameters
  • download_and_extract (bool, default: False) – Whether to download and extract the dataset.

  • overwrite (bool, default: False) – Whether to overwrite existing file(s).

  • cleanup (bool, default: False) – Whether to remove the source archive(s).

  • convert (bool, default: False) – Whether to convert the dataset to MusPy JSON/YAML files. If False, check whether converted data exists. If it does, disable on-the-fly mode; if not, enable on-the-fly mode and emit a warning.

  • kind ({'json', 'yaml'}, default: 'json') – File format to save the data.

  • n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.

  • ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.

  • use_converted (bool, optional) – Whether to force disabling on-the-fly mode and use the converted data. Defaults to True if converted data exists, otherwise False.

  • verbose (bool, default: True) – Whether to be verbose.
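The interplay between use_converted and the presence of converted data can be sketched as a small decision function. This is only an illustration of the behaviour described in the parameter list, not MusPy's implementation; the function name resolve_use_converted and the warning text are hypothetical.

```python
import warnings
from typing import Optional


def resolve_use_converted(use_converted: Optional[bool], converted_exists: bool) -> bool:
    """Return True to read converted files, False to use on-the-fly mode.

    Hypothetical helper mirroring the documented defaults.
    """
    if use_converted is not None:
        # An explicit value overrides the default.
        return use_converted
    if not converted_exists:
        # No converted data found: fall back to on-the-fly mode and warn.
        warnings.warn("Converted data not found; enabling on-the-fly mode.")
    return converted_exists
```

With use_converted=None, the result simply follows whether converted data was found.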

See also

muspy.FolderDataset

Class for datasets storing files in a folder.

muspy.RemoteDataset

Base class for remote MusPy datasets.

read(filename)

Read a file into a Music object.

classmethod citation()

Print the citation information.

convert(kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)

Convert and save the Music objects.

Each converted file is named by its index and saved to root/_converted. The original filenames can be found in the filenames attribute. For example, the file at filenames[i] is converted and saved to {i}.json.

Parameters
  • kind ({'json', 'yaml'}, default: 'json') – File format to save the data.

  • n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.

  • ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.

  • verbose (bool, default: True) – Whether to be verbose.

  • **kwargs – Keyword arguments to pass to muspy.save().

Returns

Object itself.
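The index-based naming scheme can be illustrated with a short sketch that maps each original filename to the path its converted file would get. The helper converted_paths and the example paths are hypothetical; only the root/_converted directory and the {i}.json pattern come from the description above.

```python
from pathlib import Path
from typing import Dict, List


def converted_paths(root: str, filenames: List[str], kind: str = "json") -> Dict[str, Path]:
    """Map each original filename to its index-based converted path."""
    converted_dir = Path(root) / "_converted"
    return {
        name: converted_dir / f"{i}.{kind}"
        for i, name in enumerate(filenames)
    }


# Made-up filenames, used only to show the mapping.
mapping = converted_paths("data/my_dataset", ["a/song.mid", "b/tune.mid"])
```

Because the converted names carry no trace of the source names, the filenames attribute is the only link back to the original files.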

property converted_dir

Path to the root directory of the converted dataset.

converted_exists()

Return True if the converted dataset exists, otherwise False.

download(overwrite=False, verbose=True)

Download the dataset source(s).

Parameters
  • overwrite (bool, default: False) – Whether to overwrite existing file(s).

  • verbose (bool, default: True) – Whether to be verbose.

Returns

Object itself.

download_and_extract(overwrite=False, cleanup=False, verbose=True)

Download source datasets and extract the downloaded archives.

Parameters
  • overwrite (bool, default: False) – Whether to overwrite existing file(s).

  • cleanup (bool, default: False) – Whether to remove the source archive(s).

  • verbose (bool, default: True) – Whether to be verbose.

Returns

Object itself.

exists()

Return True if the dataset exists, otherwise False.

extract(cleanup=False, verbose=True)

Extract the downloaded archive(s).

Parameters
  • cleanup (bool, default: False) – Whether to remove the source archive after extraction.

  • verbose (bool, default: True) – Whether to be verbose.

Returns

Object itself.
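Since download(), extract() and convert() each return the object itself, calls can be chained. The toy class below is a stand-in, not a MusPy class; it only records which steps ran, to show the chaining pattern.

```python
class ChainableSteps:
    """Toy stand-in whose methods return the object itself."""

    def __init__(self) -> None:
        self.log = []

    def download(self) -> "ChainableSteps":
        self.log.append("download")
        return self  # returning self is what enables chaining

    def extract(self) -> "ChainableSteps":
        self.log.append("extract")
        return self

    def convert(self) -> "ChainableSteps":
        self.log.append("convert")
        return self


steps = ChainableSteps().download().extract().convert()
```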

get_converted_filenames()

Return a list of converted filenames.

get_raw_filenames()

Return a list of raw filenames.

classmethod info()

Return the dataset information.

load(filename)

Load a file into a Music object.

on_the_fly()

Enable on-the-fly mode and convert the data on the fly.

Returns

Object itself.

save(root, kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)

Save all the music objects to a directory.

Parameters
  • root (str or Path) – Root directory to save the data.

  • kind ({'json', 'yaml'}, default: 'json') – File format to save the data.

  • n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.

  • ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.

  • verbose (bool, default: True) – Whether to be verbose.

  • **kwargs – Keyword arguments to pass to muspy.save().
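The effect of ignore_exceptions can be sketched as a loop that either skips or re-raises failures. This is a sequential illustration under stated assumptions, not MusPy's code; convert_all and its arguments are hypothetical names, and parallel execution (n_jobs > 1) is left out.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def convert_all(
    items: Iterable[T],
    convert_one: Callable[[T], R],
    ignore_exceptions: bool = True,
) -> List[R]:
    """Apply convert_one to every item, optionally skipping failures."""
    results: List[R] = []
    for item in items:
        try:
            results.append(convert_one(item))
        except Exception:
            if not ignore_exceptions:
                raise  # surface the first failure to the caller
            # Otherwise skip the failed item and continue.
    return results


# "x" cannot be parsed as an integer, so it is skipped.
converted = convert_all(["1", "x", "3"], int)
```

Skipping keeps a long batch running when a few source files are corrupted, at the cost of silently producing fewer outputs than inputs.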

source_exists()

Return True if all the sources exist, otherwise False.

split(filename=None, splits=None, random_state=None)

Split the dataset into train, test and validation subsets according to the given ratios.

Parameters
  • filename (str or Path, optional) – If given and the file exists, path to the file to read the split from. If given but the file does not exist, path to save the split to.

  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.

  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
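How the splits ratios translate into subsets can be sketched with the standard library. This is not MusPy's implementation: split_indices is a hypothetical helper, random.Random stands in for numpy.random.RandomState, and treating a single float as the train ratio is an assumption.

```python
import random
from typing import Dict, List, Optional, Sequence, Union


def split_indices(
    n: int,
    splits: Union[float, Sequence[float]],
    seed: Optional[int] = None,
) -> Dict[str, List[int]]:
    """Shuffle range(n) and cut it into named subsets by ratio."""
    if isinstance(splits, float):
        # Assumption: a single float is the train ratio; the rest is test.
        splits = [splits, 1.0 - splits]
    ratios = list(splits)
    if len(ratios) not in (2, 3):
        raise ValueError("Expected a float or a list of two or three floats.")
    if any(r < 0 for r in ratios) or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("Ratios must be non-negative and sum to 1.")

    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility

    names = ("train", "test", "validation")
    result: Dict[str, List[int]] = {}
    start, cumulative = 0, 0.0
    for k, ratio in enumerate(ratios):
        cumulative += ratio
        # The last subset takes whatever remains, avoiding rounding gaps.
        end = n if k == len(ratios) - 1 else round(cumulative * n)
        result[names[k]] = indices[start:end]
        start = end
    return result


parts = split_indices(10, [0.8, 0.1, 0.1], seed=0)
```

Fixing the seed makes the split reproducible, which serves the same purpose as saving the split to a file and reading it back later.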

to_pytorch_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)

Return the dataset as a PyTorch dataset.

Parameters
  • factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.

  • representation (str, optional) – Target representation. See muspy.to_representation() for available representations.

  • split_filename (str or Path, optional) – If given and the file exists, path to the file to read the split from. If given but the file does not exist, path to save the split to.

  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.

  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.

Returns

Converted PyTorch dataset(s).

Return type

torch.utils.data.Dataset or dict of torch.utils.data.Dataset
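The role of factory can be sketched with a plain map-style dataset, that is, an object with __len__ and __getitem__, which is the interface PyTorch expects from a map-style dataset. The class FactoryDataset is hypothetical and does not depend on PyTorch or MusPy; lists of pitches stand in for Music objects.

```python
from typing import Any, Callable, List, Optional, Sequence


class FactoryDataset:
    """Map-style dataset sketch that applies factory to each item on access."""

    def __init__(
        self,
        items: Sequence[Any],
        factory: Callable[[Any], Any],
        indices: Optional[Sequence[int]] = None,
    ) -> None:
        self.items = items
        self.factory = factory
        # Restrict the dataset to a subset (for example, one split) if given.
        self.indices: List[int] = (
            list(range(len(items))) if indices is None else list(indices)
        )

    def __len__(self) -> int:
        return len(self.indices)

    def __getitem__(self, i: int) -> Any:
        return self.factory(self.items[self.indices[i]])


# Stand-ins for Music objects: plain lists of MIDI pitches.
songs = [[60, 62, 64], [67, 65], [72]]
train = FactoryDataset(songs, factory=len, indices=[0, 2])
```

Here factory is just len; in practice it would turn a Music object into an array or tensor.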

to_tensorflow_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)

Return the dataset as a TensorFlow dataset.

Parameters
  • factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.

  • representation (str, optional) – Target representation. See muspy.to_representation() for available representations.

  • split_filename (str or Path, optional) – If given and the file exists, path to the file to read the split from. If given but the file does not exist, path to save the split to.

  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.

  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.

Returns

Converted TensorFlow dataset(s).

Return type

tensorflow.data.Dataset or dict of tensorflow.data.Dataset

use_converted()

Disable on-the-fly mode and use converted data.

Returns

Object itself.

class muspy.RemoteMusicDataset(root, download_and_extract=False, overwrite=False, cleanup=False, kind=None, verbose=True)

Base class for remote datasets of MusPy JSON/YAML files.

Parameters
  • root (str or Path) – Root directory of the dataset.

  • download_and_extract (bool, default: False) – Whether to download and extract the dataset.

  • overwrite (bool, default: False) – Whether to overwrite existing file(s).

  • cleanup (bool, default: False) – Whether to remove the source archive(s).

  • kind ({'json', 'yaml'}, optional) – File formats to include in the dataset. Defaults to include both JSON and YAML files.

  • verbose (bool, default: True) – Whether to be verbose.

root

Root directory of the dataset.

Type

Path

filenames

Paths to the files, relative to root.

Type

list of Path
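How kind narrows the files that end up in filenames can be sketched as a suffix filter over the root directory. The helper list_music_files is hypothetical, and matching only the .json and .yaml suffixes is an assumption; the library may recognize other variants.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def list_music_files(root: str, kind: Optional[str] = None) -> List[Path]:
    """Return paths under root, relative to root, matching the given kind."""
    # Assumption: only these two suffixes count as MusPy files.
    suffixes = (".json", ".yaml") if kind is None else (f".{kind}",)
    base = Path(root)
    return sorted(
        p.relative_to(base)
        for p in base.rglob("*")
        if p.is_file() and p.suffix in suffixes
    )


# Demonstrate on a throwaway directory with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.json", "b.yaml", "notes.txt"):
        (Path(tmp) / name).write_text("")
    both = list_music_files(tmp)
    json_only = list_music_files(tmp, kind="json")
```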

See also

muspy.MusicDataset

Class for datasets of MusPy JSON/YAML files.

muspy.RemoteDataset

Base class for remote MusPy datasets.

classmethod citation()

Print the citation information.

download(overwrite=False, verbose=True)

Download the dataset source(s).

Parameters
  • overwrite (bool, default: False) – Whether to overwrite existing file(s).

  • verbose (bool, default: True) – Whether to be verbose.

Returns

Object itself.

download_and_extract(overwrite=False, cleanup=False, verbose=True)

Download source datasets and extract the downloaded archives.

Parameters
  • overwrite (bool, default: False) – Whether to overwrite existing file(s).

  • cleanup (bool, default: False) – Whether to remove the source archive(s).

  • verbose (bool, default: True) – Whether to be verbose.

Returns

Object itself.

exists()

Return True if the dataset exists, otherwise False.

extract(cleanup=False, verbose=True)

Extract the downloaded archive(s).

Parameters
  • cleanup (bool, default: False) – Whether to remove the source archive after extraction.

  • verbose (bool, default: True) – Whether to be verbose.

Returns

Object itself.

classmethod info()

Return the dataset information.

save(root, kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)

Save all the music objects to a directory.

Parameters
  • root (str or Path) – Root directory to save the data.

  • kind ({'json', 'yaml'}, default: 'json') – File format to save the data.

  • n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.

  • ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.

  • verbose (bool, default: True) – Whether to be verbose.

  • **kwargs – Keyword arguments to pass to muspy.save().

source_exists()

Return True if all the sources exist, otherwise False.

split(filename=None, splits=None, random_state=None)

Split the dataset into train, test and validation subsets according to the given ratios.

Parameters
  • filename (str or Path, optional) – If given and the file exists, path to the file to read the split from. If given but the file does not exist, path to save the split to.

  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.

  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.

to_pytorch_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)

Return the dataset as a PyTorch dataset.

Parameters
  • factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.

  • representation (str, optional) – Target representation. See muspy.to_representation() for available representations.

  • split_filename (str or Path, optional) – If given and the file exists, path to the file to read the split from. If given but the file does not exist, path to save the split to.

  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.

  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.

Returns

Converted PyTorch dataset(s).

Return type

torch.utils.data.Dataset or dict of torch.utils.data.Dataset

to_tensorflow_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)

Return the dataset as a TensorFlow dataset.

Parameters
  • factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.

  • representation (str, optional) – Target representation. See muspy.to_representation() for available representations.

  • split_filename (str or Path, optional) – If given and the file exists, path to the file to read the split from. If given but the file does not exist, path to save the split to.

  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.

  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.

Returns

Converted TensorFlow dataset(s).

Return type

tensorflow.data.Dataset or dict of tensorflow.data.Dataset

class muspy.RemoteABCFolderDataset(root, download_and_extract=False, overwrite=False, cleanup=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None, verbose=True)

Base class for remote datasets storing ABC files in a folder.

See also

muspy.ABCFolderDataset

Class for datasets storing ABC files in a folder.

muspy.RemoteDataset

Base class for remote MusPy datasets.

classmethod citation()

Print the citation information.

convert(kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)

Convert and save the Music objects.

Each converted file is named by its index and saved to root/_converted. The original filenames can be found in the filenames attribute. For example, the file at filenames[i] is converted and saved to {i}.json.

Parameters
  • kind ({'json', 'yaml'}, default: 'json') – File format to save the data.

  • n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.

  • ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.

  • verbose (bool, default: True) – Whether to be verbose.

  • **kwargs – Keyword arguments to pass to muspy.save().

Returns

Object itself.

property converted_dir

Path to the root directory of the converted dataset.

converted_exists()

Return True if the converted dataset exists, otherwise False.

download(overwrite=False, verbose=True)

Download the dataset source(s).

Parameters
  • overwrite (bool, default: False) – Whether to overwrite existing file(s).

  • verbose (bool, default: True) – Whether to be verbose.

Returns

Object itself.

download_and_extract(overwrite=False, cleanup=False, verbose=True)

Download source datasets and extract the downloaded archives.

Parameters
  • overwrite (bool, default: False) – Whether to overwrite existing file(s).

  • cleanup (bool, default: False) – Whether to remove the source archive(s).

  • verbose (bool, default: True) – Whether to be verbose.

Returns

Object itself.

exists()

Return True if the dataset exists, otherwise False.

extract(cleanup=False, verbose=True)

Extract the downloaded archive(s).

Parameters
  • cleanup (bool, default: False) – Whether to remove the source archive after extraction.

  • verbose (bool, default: True) – Whether to be verbose.

Returns

Object itself.

get_converted_filenames()

Return a list of converted filenames.

get_raw_filenames()

Return a list of raw filenames.

classmethod info()

Return the dataset information.

load(filename)

Load a file into a Music object.

on_the_fly()

Enable on-the-fly mode and convert the data on the fly.

Returns

Object itself.

read(filename)

Read a file into a Music object.

save(root, kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)

Save all the music objects to a directory.

Parameters
  • root (str or Path) – Root directory to save the data.

  • kind ({'json', 'yaml'}, default: 'json') – File format to save the data.

  • n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.

  • ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.

  • verbose (bool, default: True) – Whether to be verbose.

  • **kwargs – Keyword arguments to pass to muspy.save().

source_exists()

Return True if all the sources exist, otherwise False.

split(filename=None, splits=None, random_state=None)

Split the dataset into train, test and validation subsets according to the given ratios.

Parameters
  • filename (str or Path, optional) – If given and the file exists, path to the file to read the split from. If given but the file does not exist, path to save the split to.

  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.

  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.

to_pytorch_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)

Return the dataset as a PyTorch dataset.

Parameters
  • factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.

  • representation (str, optional) – Target representation. See muspy.to_representation() for available representations.

  • split_filename (str or Path, optional) – If given and the file exists, path to the file to read the split from. If given but the file does not exist, path to save the split to.

  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.

  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.

Returns

Converted PyTorch dataset(s).

Return type

torch.utils.data.Dataset or dict of torch.utils.data.Dataset

to_tensorflow_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)

Return the dataset as a TensorFlow dataset.

Parameters
  • factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.

  • representation (str, optional) – Target representation. See muspy.to_representation() for available representations.

  • split_filename (str or Path, optional) – If given and the file exists, path to the file to read the split from. If given but the file does not exist, path to save the split to.

  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.

  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.

Returns

Converted TensorFlow dataset(s).

Return type

tensorflow.data.Dataset or dict of tensorflow.data.Dataset

use_converted()

Disable on-the-fly mode and use converted data.

Returns

Object itself.