DatasetConfig

class health_azure.DatasetConfig(name, datastore='', overwrite_existing=True, version=None, use_mounting=None, target_folder=None, local_folder=None)

Bases: object

Contains information to use AzureML datasets as inputs or outputs.

Parameters
  • name (str) – The name of the dataset, as it was registered in the AzureML workspace. For output datasets, this will be the name given to the newly created dataset.

  • datastore (str) – The name of the AzureML datastore that holds the dataset. This can be empty if the AzureML workspace has only a single datastore, or if the default datastore should be used.

  • overwrite_existing (bool) – Only applies to uploading datasets. If True, the dataset will be overwritten if it already exists. If False, the dataset creation will fail if the dataset already exists.

  • version (Optional[int]) – The version of the dataset that should be used. This is only used for input datasets. If the version is not specified, the latest version will be used.

  • use_mounting (Optional[bool]) – If True, the dataset will be “mounted”, that is, individual files are read or written on demand over the network. If False, an input dataset is fully downloaded before the job starts, and an output dataset is fully uploaded when the job ends. Defaults to False (downloading) for datasets that are script inputs, and to True (mounting) for datasets that are script outputs.

  • target_folder (Union[Path, str, None]) – The folder into which the dataset should be downloaded or mounted. If left empty, a random folder on /tmp will be chosen. Do NOT use “.” as the target_folder.

  • local_folder (Union[Path, str, None]) – The folder on the local machine at which the dataset is available. This is used only for runs outside of AzureML. If this is empty, the dataset is mounted or downloaded to target_folder (or to a temporary folder if target_folder is also empty).
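The documented constructor defaults and the rule for choosing between mounting and downloading can be summarized in a small stand-in class. This is a hypothetical sketch for illustration only: the class name `DatasetConfigSketch` and its `resolve_use_mounting` method are not part of health_azure. It mirrors the parameter descriptions above and does not talk to AzureML.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Union


@dataclass
class DatasetConfigSketch:
    """Hypothetical stand-in mirroring the documented DatasetConfig parameters."""

    name: str
    datastore: str = ""  # empty: use the workspace's only (or default) datastore
    overwrite_existing: bool = True  # only relevant when uploading output datasets
    version: Optional[int] = None  # None: use the latest version (inputs only)
    use_mounting: Optional[bool] = None  # None: fall back to the documented default
    target_folder: Optional[Union[Path, str]] = None
    local_folder: Optional[Union[Path, str]] = None

    def __post_init__(self) -> None:
        if not self.name:
            raise ValueError("The dataset name must be a non-empty string.")
        # The documentation forbids "." as target_folder; an empty value means
        # "not set", so only non-empty values are compared against ".".
        if self.target_folder and Path(self.target_folder) == Path("."):
            raise ValueError("Do not use '.' as the target_folder.")

    def resolve_use_mounting(self, is_output: bool) -> bool:
        """Return the effective mode: True means mount, False means download/upload."""
        if self.use_mounting is not None:
            return self.use_mounting
        # Documented defaults: download script inputs, mount script outputs.
        return is_output
```

For example, `DatasetConfigSketch(name="images").resolve_use_mounting(is_output=False)` returns False, because inputs are downloaded unless use_mounting is set explicitly.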

Methods Summary

to_input_dataset(dataset_index, workspace, …)

Creates a configuration for using an AzureML dataset inside of an AzureML run.

to_input_dataset_local(workspace)

Return a local path to the dataset when outside of an AzureML run.

to_output_dataset(workspace, dataset_index)

Creates a configuration to write a script output to an AzureML dataset.

Methods Documentation

to_input_dataset(dataset_index, workspace, strictly_aml_v1, ml_client=None)

Creates a configuration for using an AzureML dataset inside of an AzureML run. This makes the AzureML dataset with the given name available as a named input, using INPUT_{index} as the key (for example, INPUT_0 for dataset index 0).

Parameters
  • dataset_index (int) – Suffix for the named input: the dataset will be available under the key INPUT_{index}.

  • workspace (Workspace) – The AzureML workspace to read from.

  • strictly_aml_v1 (bool) – If True, use the AzureML SDK v1. Otherwise, attempt to use the AzureML SDK v2.

  • ml_client (Optional[MLClient]) – An AzureML SDK v2 MLClient object for interacting with Azure resources.

Return type

Optional[DatasetConsumptionConfig]
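The key format for named inputs can be illustrated without AzureML. The helper `named_input_keys` below is hypothetical and not part of health_azure; it assumes that each dataset's dataset_index is its position in a list, and only reproduces the INPUT_{index} key format described above.

```python
from typing import Dict, Sequence


def named_input_keys(dataset_names: Sequence[str]) -> Dict[str, str]:
    """Hypothetical helper: map each key INPUT_{index} to a dataset name.

    Assumes dataset_index is the position of the dataset in dataset_names.
    """
    return {f"INPUT_{index}": name for index, name in enumerate(dataset_names)}
```

With this convention, `named_input_keys(["images", "labels"])` returns `{"INPUT_0": "images", "INPUT_1": "labels"}`.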

to_input_dataset_local(workspace)

Return a local path to the dataset when outside of an AzureML run. If local_folder is supplied, it is assumed to already contain the dataset, and that folder is returned. Otherwise, the dataset is mounted or downloaded to either target_folder or a temporary folder, and that folder is returned. If self.name refers to an AzureML SDK v2 dataset, the data cannot be mounted here, and a tuple of Nones is returned instead.

Parameters

workspace (Workspace) – The AzureML workspace to read from.

Return type

Tuple[Path, Optional[MountContext]]

Returns

Tuple of (path to the dataset, optional MountContext)
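The order in which a local path is chosen can be sketched as follows. This is a simplified, hypothetical function (`resolve_local_path` is not part of health_azure): it reproduces only the documented precedence of local_folder, then target_folder, then a temporary folder, and it neither mounts nor downloads anything, so it returns a path rather than a (path, MountContext) tuple. Using a subfolder named after the dataset inside the temporary folder is an assumption made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union

PathOrStr = Union[Path, str]


def resolve_local_path(
    name: str,
    local_folder: Optional[PathOrStr] = None,
    target_folder: Optional[PathOrStr] = None,
) -> Path:
    """Hypothetical sketch of the documented folder precedence (no mount/download)."""
    if local_folder:
        # The dataset is assumed to already exist locally and is used as-is.
        return Path(local_folder)
    if target_folder:
        # The real method would mount or download the dataset into this folder.
        return Path(target_folder)
    # Fall back to a fresh temporary folder (under /tmp on most Linux systems).
    # Assumption: the data goes into a subfolder named after the dataset.
    return Path(tempfile.mkdtemp()) / name
```

Because local_folder is checked first, a run outside AzureML never touches the workspace when a local copy of the data is configured.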

to_output_dataset(workspace, dataset_index)

Creates a configuration to write a script output to an AzureML dataset. The name and datastore of the new dataset are taken from this DatasetConfig object.

Parameters
  • workspace (Workspace) – The AzureML workspace that contains the datastore to write to.

  • dataset_index (int) – Suffix for the named output: the dataset will be available under the key OUTPUT_{index}.

Return type

OutputFileDatasetConfig

Returns

An AzureML OutputFileDatasetConfig object, representing the output dataset.
