submit_to_azure_if_needed

health.azure.submit_to_azure_if_needed(compute_cluster_name='', entry_script=None, aml_workspace=None, workspace_config_file=None, snapshot_root_directory=None, script_params=None, conda_environment_file=None, aml_environment_name='', experiment_name=None, environment_variables=None, pip_extra_index_url='', private_pip_wheel_path=None, docker_base_image='', docker_shm_size='', ignored_folders=None, default_datastore='', input_datasets=None, output_datasets=None, num_nodes=1, wait_for_completion=False, wait_for_completion_show_output=False, max_run_duration='', submit_to_azureml=None, tags=None, after_submission=None, hyperdrive_config=None)

Submit a folder to AzureML, if needed, and run it. Use the commandline flag --azureml to submit to AzureML, and leave it out to run locally.
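
A minimal usage sketch, assuming the function is imported from health.azure as shown in the signature above; the cluster name and file paths are placeholders:

    from pathlib import Path

    from health.azure import submit_to_azure_if_needed

    # When the script is started with --azureml, this call packages the current
    # folder, submits it to the given cluster, and terminates the local process.
    # Without the flag, it returns immediately and the script continues locally.
    run_info = submit_to_azure_if_needed(
        compute_cluster_name="my-cpu-cluster",           # placeholder cluster name
        entry_script=Path("my_script.py"),               # placeholder script path
        conda_environment_file=Path("environment.yml"),  # placeholder environment file
        workspace_config_file=Path("config.json"),       # downloaded from the Azure portal
    )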

Parameters
  • after_submission (Optional[Callable[[Run], None]]) – A function that will be called directly after submitting the job to AzureML. The only argument to this function is the run that was just submitted. Use this to, for example, add additional tags or print information about the run (see the sketch after this parameter list).

  • tags (Optional[Dict[str, str]]) – A dictionary of string key/value pairs that will be added as metadata to the run. If set to None, a default metadata field will be added that contains only the commandline arguments that started the run.

  • aml_environment_name (str) – The name of an AzureML environment that should be used to submit the script. If not provided, an environment will be created from the arguments to this function.

  • max_run_duration (str) – The maximum runtime that is allowed for this job in AzureML. This is given as a floating point number with a string suffix s, m, h, d for seconds, minutes, hours, days. Examples: '3.5h', '2d'

  • experiment_name (Optional[str]) – The name of the AzureML experiment in which the run should be submitted. If omitted, this is created based on the name of the current script.

  • entry_script (Union[Path, str, None]) – The script that should be run in AzureML.

  • compute_cluster_name (str) – The name of the AzureML cluster that should run the job. This can be a cluster with CPU or GPU machines.

  • conda_environment_file (Union[Path, str, None]) – The conda configuration file that describes which packages are necessary for your script to run.

  • aml_workspace (Optional[Workspace]) – There are two optional ways to provide an existing AzureML Workspace. The simplest is to pass the Workspace object in directly via this parameter.

  • workspace_config_file (Union[Path, str, None]) – The second option is to specify the path to the config.json file downloaded from the Azure portal, from which the existing Workspace can be retrieved.

  • snapshot_root_directory (Union[Path, str, None]) – The directory that contains all code that should be packaged and sent to AzureML. All Python code that the script uses must be copied over.

  • ignored_folders (Optional[List[Union[Path, str]]]) – A list of folders to exclude from the snapshot when copying it to AzureML.

  • script_params (Optional[List[str]]) – A list of parameters to pass to the script as it runs in AzureML. If empty (or None, the default), these will be copied over from sys.argv, omitting the --azureml flag.

  • environment_variables (Optional[Dict[str, str]]) – The environment variables that should be set when running in AzureML.

  • docker_base_image (str) – The Docker base image that should be used when creating a new Docker image.

  • docker_shm_size (str) – The Docker shared memory size that should be used when creating a new Docker image.

  • pip_extra_index_url (str) – If provided, use this PIP package index to find additional packages when building the Docker image.

  • private_pip_wheel_path (Union[Path, str, None]) – If provided, add this wheel as a private package to the AzureML workspace.

  • default_datastore (str) – The data store in your AzureML workspace that points to your training data in blob storage. This is described in more detail in the README.

  • input_datasets (Optional[List[Union[str, DatasetConfig]]]) – The script will consume all data in the given folder in blob storage as its input. The folder must exist in blob storage, in the location that you gave when creating the datastore. Once the script has run, it will also register the data in this folder as an AzureML dataset (see the sketch after this parameter list).

  • output_datasets (Optional[List[Union[str, DatasetConfig]]]) – The script will create a temporary folder when running in AzureML; the data that the job writes to that folder will be uploaded to blob storage, into the given data store.

  • num_nodes (int) – The number of nodes to use in distributed training on AzureML.

  • wait_for_completion (bool) – If False (the default), return as soon as the run has been submitted to AzureML; if True, wait for the run to complete.

  • wait_for_completion_show_output (bool) – If wait_for_completion is True, this parameter indicates whether to show the run output on sys.stdout.

  • submit_to_azureml (Optional[bool]) – If True, the codepath to create an AzureML run will be executed. If False, the codepath for local execution (i.e., return immediately) will be executed. If not provided (None), submission to AzureML will be triggered if the commandline flag '--azureml' is present in sys.argv.

  • hyperdrive_config (Optional[HyperDriveConfig]) – A configuration object for Hyperdrive (hyperparameter search).
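
The sketch below combines the callback, tag, dataset, and runtime-limit parameters described above. All concrete names are placeholders; the Run type is assumed to come from the azureml-core package:

    from azureml.core import Run

    from health.azure import submit_to_azure_if_needed

    def report_submission(run: Run) -> None:
        # Called exactly once, directly after the job has been submitted.
        print(f"Submitted run {run.id}: {run.get_portal_url()}")

    run_info = submit_to_azure_if_needed(
        compute_cluster_name="my-gpu-cluster",     # placeholder
        entry_script="train.py",                   # placeholder
        conda_environment_file="environment.yml",  # placeholder
        default_datastore="my_datastore",          # placeholder; set up as described in the README
        input_datasets=["training_data"],          # folder that must already exist in the datastore
        output_datasets=["outputs"],               # uploaded to the datastore when the job writes to it
        tags={"model": "baseline"},                # replaces the default commandline-arguments metadata
        after_submission=report_submission,
        max_run_duration="12h",                    # stop the job after 12 hours
    )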

Return type

AzureRunInfo

Returns

If the script is submitted to AzureML, the Python process is terminated, because the script will instead be executed in AzureML; otherwise, an AzureRunInfo object is returned.
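
A sketch of consuming the returned object. The attribute names on AzureRunInfo used here (is_running_in_azure, and input_datasets as a list of local paths) are assumptions about this package, not part of the signature above:

    from pathlib import Path

    from health.azure import submit_to_azure_if_needed

    # This point is reached either locally, or inside the AzureML job itself once
    # the re-submitted script starts up on the cluster.
    run_info = submit_to_azure_if_needed(
        compute_cluster_name="my-gpu-cluster",  # placeholder
        input_datasets=["training_data"],       # placeholder
    )
    if run_info.is_running_in_azure:                # assumed attribute name
        input_folder = run_info.input_datasets[0]   # assumed: local mount point of the dataset
    else:
        input_folder = Path("local_data")           # fall back to local data outside AzureML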