DatasetConfig

class health_azure.DatasetConfig(name, datastore='', version=None, use_mounting=None, target_folder='', local_folder=None)

Bases: object

Contains information to use AzureML datasets as inputs or outputs.

Parameters
  • name (str) – The name of the dataset, as it was registered in the AzureML workspace. For output datasets, this will be the name given to the newly created dataset.

  • datastore (str) – The name of the AzureML datastore that holds the dataset. This can be empty if the AzureML workspace has only a single datastore, or if the default datastore should be used.

  • version (Optional[int]) – The version of the dataset that should be used. This is only used for input datasets. If the version is not specified, the latest version will be used.

  • use_mounting (Optional[bool]) – If True, the dataset will be “mounted”, that is, individual files are read or written on demand over the network. If False, an input dataset is fully downloaded before the job starts, and an output dataset is fully uploaded when the job ends. Defaults: False (downloading) for datasets that are script inputs, True (mounting) for datasets that are script outputs.

  • target_folder (str) – The folder into which the dataset should be downloaded or mounted. If left empty, a randomly named folder in /tmp will be chosen.

  • local_folder (Optional[Path]) – The folder on the local machine at which the dataset is available. This is used only for runs outside of AzureML.
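To illustrate how the use_mounting default depends on whether a dataset is an input or an output, here is a minimal sketch. DatasetConfigSketch and effective_use_mounting are hypothetical names invented for this example; this is not the health_azure class, and the real library may resolve the default differently internally. The sketch only encodes the documented rule: an explicit True/False wins, otherwise inputs are downloaded and outputs are mounted.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class DatasetConfigSketch:
    """Hypothetical stand-in mirroring the documented DatasetConfig parameters.

    NOT health_azure.DatasetConfig: it only illustrates the documented
    defaulting rule for use_mounting.
    """
    name: str
    datastore: str = ""
    version: Optional[int] = None
    use_mounting: Optional[bool] = None
    target_folder: str = ""
    local_folder: Optional[Path] = None

    def effective_use_mounting(self, is_output: bool) -> bool:
        # An explicit True/False set by the caller always takes precedence.
        if self.use_mounting is not None:
            return self.use_mounting
        # Documented defaults: download script inputs, mount script outputs.
        return is_output


cfg = DatasetConfigSketch(name="my_dataset", datastore="my_datastore")
print(cfg.effective_use_mounting(is_output=False))  # False: input is downloaded
print(cfg.effective_use_mounting(is_output=True))   # True: output is mounted
```

Keeping use_mounting as Optional[bool] rather than a plain bool is what lets "not specified" be distinguished from an explicit False, so that the default can differ between inputs and outputs.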

Methods Summary

to_input_dataset(workspace, dataset_index)

Creates a configuration for using an AzureML dataset inside of an AzureML run.

to_output_dataset(workspace, dataset_index)

Creates a configuration to write a script output to an AzureML dataset.

Methods Documentation

to_input_dataset(workspace, dataset_index)

Creates a configuration for using an AzureML dataset inside of an AzureML run. This makes the AzureML dataset with the given name available as a named input, using INPUT_0 as the key for dataset index 0.
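The naming convention for inputs can be sketched in plain Python. input_key and keys_for_inputs are hypothetical helpers, not part of health_azure. The assumption that a dataset's position in a list of inputs is used as its dataset_index is made only for illustration; the documented part is the INPUT_{index} key format.

```python
from typing import Dict, List


def input_key(dataset_index: int) -> str:
    """Hypothetical helper: the named-input key documented for to_input_dataset."""
    return f"INPUT_{dataset_index}"


def keys_for_inputs(dataset_names: List[str]) -> Dict[str, str]:
    # Assumption: each dataset's position in the list serves as its dataset_index.
    return {input_key(i): name for i, name in enumerate(dataset_names)}


print(keys_for_inputs(["images", "labels"]))
# {'INPUT_0': 'images', 'INPUT_1': 'labels'}
```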

Parameters
  • workspace (Workspace) – The AzureML workspace to read from.

  • dataset_index (int) – Suffix for using datasets as named inputs. The dataset will be marked INPUT_{dataset_index}.

Return type

DatasetConsumptionConfig

to_output_dataset(workspace, dataset_index)

Creates a configuration to write a script output to an AzureML dataset. The name and datastore of this new dataset will be taken from the present object.
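The following sketch summarizes which pieces of information feed into an output dataset according to the description: the name and datastore come from the config object, and dataset_index only determines the OUTPUT_{index} key. describe_output is a hypothetical helper that returns a plain dictionary. It does not create an OutputFileDatasetConfig, and it does not call into AzureML.

```python
from typing import Dict


def describe_output(name: str, datastore: str, dataset_index: int) -> Dict[str, str]:
    """Hypothetical helper summarising the inputs to to_output_dataset.

    The new dataset's name and datastore are taken from the config object;
    the index only determines the OUTPUT_{index} key.
    """
    return {
        "key": f"OUTPUT_{dataset_index}",
        "dataset_name": name,
        "datastore": datastore,
    }


print(describe_output("processed_images", "my_datastore", 0))
# {'key': 'OUTPUT_0', 'dataset_name': 'processed_images', 'datastore': 'my_datastore'}
```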

Parameters
  • workspace (Workspace) – The AzureML workspace in which the output dataset will be created.

  • dataset_index (int) – Suffix for using datasets as named outputs. The dataset will be marked OUTPUT_{dataset_index}.

Return type

OutputFileDatasetConfig

Returns