submit_to_azure_if_needed

health_azure.submit_to_azure_if_needed(compute_cluster_name='', entry_script=None, aml_workspace=None, workspace_config_file=None, ml_client=None, snapshot_root_directory=None, script_params=None, conda_environment_file=None, aml_environment_name='', experiment_name=None, environment_variables=None, pip_extra_index_url='', private_pip_wheel_path=None, docker_base_image='mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.3-cudnn8-ubuntu20.04:20230509.v1', docker_shm_size='100g', ignored_folders=None, default_datastore='', input_datasets=None, output_datasets=None, num_nodes=1, wait_for_completion=False, wait_for_completion_show_output=False, max_run_duration='', submit_to_azureml=None, tags=None, after_submission=None, hyperdrive_config=None, hyperparam_args=None, strictly_aml_v1=False, identity_based_auth=False, pytorch_processes_per_node_v2=None, use_mpi_run_for_single_node_jobs=True, display_name=None)[source]

Submit a folder to AzureML, if needed, and run the entry script there. Use the commandline flag --azureml to submit to AzureML, and leave it out to run locally.

Parameters
  • after_submission (Union[Callable[[Run], None], Callable[[Job, MLClient], None], None]) – A function that will be called directly after submitting the job to AzureML. Use this to, for example, add additional tags or print information about the run. When using AzureML SDK V1, the only argument to this function is the Run object that was just submitted. When using AzureML SDK V2, the arguments are (Job, MLClient).

  • tags (Optional[Dict[str, str]]) – A dictionary of string key/value pairs, that will be added as metadata to the run. If set to None, a default metadata field will be added that only contains the commandline arguments that started the run.

  • aml_environment_name (str) – The name of an AzureML environment that should be used to submit the script. If not provided, an environment will be created from the arguments to this function.

  • max_run_duration (str) – The maximum runtime that is allowed for this job in AzureML. This is given as a floating point number with a string suffix s, m, h, d for seconds, minutes, hours, days. Examples: ‘3.5h’, ‘2d’

  • experiment_name (Optional[str]) – The name of the AzureML experiment in which the run should be submitted. If omitted, this is created based on the name of the current script.

  • entry_script (Union[Path, str, None]) – The script that should be run in AzureML

  • compute_cluster_name (str) – The name of the AzureML cluster that should run the job. This can be a cluster with CPU or GPU machines.

  • conda_environment_file (Union[Path, str, None]) – The conda configuration file that describes which packages are necessary for your script to run.

  • aml_workspace (Optional[Workspace]) – There are two optional ways to provide an existing AzureML Workspace. The simplest is to pass it in directly via this parameter.

  • workspace_config_file (Union[Path, str, None]) – The second option is to specify the path to the config.json file downloaded from the Azure portal, from which the existing Workspace can be retrieved.

  • ml_client (Optional[MLClient]) – An Azure MLClient object for interacting with Azure resources.

  • snapshot_root_directory (Union[Path, str, None]) – The directory that contains all code that should be packaged and sent to AzureML. All Python code that the script uses must be copied over.

  • ignored_folders (Optional[List[Union[Path, str]]]) – A list of folders to exclude from the snapshot when copying it to AzureML.

  • script_params (Optional[List[str]]) – A list of parameters to pass on to the script as it runs in AzureML. If None (the default), these will be copied over from sys.argv (excluding the --azureml flag, if found).

  • environment_variables (Optional[Dict[str, str]]) – The environment variables that should be set when running in AzureML.

  • docker_base_image (str) – The Docker base image that should be used when creating a new Docker image. The list of available images can be found at https://github.com/Azure/AzureML-Containers. The default image is mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.3-cudnn8-ubuntu20.04:20230509.v1

  • docker_shm_size (str) – The Docker shared memory size that should be used when creating a new Docker image. Default value is ‘100g’.

  • pip_extra_index_url (str) – If provided, use this PIP package index to find additional packages when building the Docker image.

  • private_pip_wheel_path (Union[Path, str, None]) – If provided, add this wheel as a private package to the AzureML workspace.

  • default_datastore (str) – The data store in your AzureML workspace, that points to your training data in blob storage. This is described in more detail in the README.

  • input_datasets (Optional[List[Union[str, DatasetConfig]]]) – The script will consume all data in this folder in blob storage as the input. The folder must exist in blob storage, in the location that you gave when creating the datastore. Once the script has run, it will also register the data in this folder as an AzureML dataset.

  • output_datasets (Optional[List[Union[str, DatasetConfig]]]) – The script will create a temporary folder when running in AzureML; the data that the job writes to that folder will be uploaded to blob storage, in the given data store.

  • num_nodes (int) – The number of nodes to use in distributed training on AzureML. When using a value > 1, multiple nodes in AzureML will be started. If pytorch_processes_per_node_v2=None, the job will be submitted as a multi-node MPI job, with 1 process per node. This is suitable for PyTorch Lightning jobs. If pytorch_processes_per_node_v2 is not None, a job with framework “PyTorch” and communication backend “nccl” will be started. pytorch_processes_per_node_v2 will guide the number of processes per node. This is suitable for plain PyTorch training jobs without the use of frameworks like PyTorch Lightning.

  • wait_for_completion (bool) – If False (the default) return after the run is submitted to AzureML, otherwise wait for the completion of this run (if True).

  • wait_for_completion_show_output (bool) – If wait_for_completion is True, this parameter indicates whether to show the run output on sys.stdout.

  • submit_to_azureml (Optional[bool]) – If True, the codepath to create an AzureML run will be executed. If False, the codepath for local execution (i.e., return immediately) will be executed. If not provided (None), submission to AzureML will be triggered if the commandline flag --azureml is present in sys.argv.

  • hyperdrive_config (Optional[HyperDriveConfig]) – A configuration object for Hyperdrive (hyperparameter search).

  • strictly_aml_v1 (bool) – If True, use Azure ML SDK v1. Otherwise, attempt to use Azure ML SDK v2.

  • pytorch_processes_per_node_v2 (Optional[int]) – For plain PyTorch multi-GPU processing: The number of processes per node. This is only supported with AML SDK v2, and ignored in v1. If supplied, the job will be submitted using the “pytorch” framework (rather than “Python”), with “nccl” as the communication backend.

  • use_mpi_run_for_single_node_jobs (bool) – If True, even single node jobs with SDK v2 will be run as distributed MPI jobs. This is required for Kubernetes compute. If False, single node jobs will not be run as distributed jobs. This setting only affects jobs submitted with SDK v2 (when strictly_aml_v1=False).

  • display_name (Optional[str]) – The name for the run that will be displayed in the AML UI. If not provided, a random display name will be generated by AzureML.
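When script_params is None, the parameters are taken from sys.argv, with the --azureml flag removed so that the script does not re-submit itself once it runs inside AzureML. A minimal sketch of that default behaviour (the helper name is illustrative, not part of the library):

```python
import sys
from typing import List


def default_script_params(argv: List[str]) -> List[str]:
    """Sketch of the default for script_params: copy the commandline
    arguments, dropping the --azureml flag if present."""
    # argv[0] is the script name itself, not a parameter to pass on.
    return [arg for arg in argv[1:] if arg != "--azureml"]


# A script started as "python myscript.py --azureml --epochs 10"
# would pass on only the remaining arguments:
print(default_script_params(["myscript.py", "--azureml", "--epochs", "10"]))
# → ['--epochs', '10']
```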

Return type

AzureRunInfo

Returns

If the script is submitted to AzureML, the Python process terminates, since the script will be executed in AzureML; otherwise, an AzureRunInfo object is returned.
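A minimal invocation might look like the following sketch. It assumes the hi-ml package (health_azure) is installed; the cluster name, script name, conda file, and workspace config path are placeholders for resources in your own workspace:

```python
from health_azure import submit_to_azure_if_needed


def main() -> None:
    # Submits this script to AzureML when --azureml is on the commandline;
    # otherwise the call returns immediately and the script runs locally.
    run_info = submit_to_azure_if_needed(
        compute_cluster_name="gpu-cluster",        # placeholder cluster name
        entry_script="myscript.py",                # this script
        conda_environment_file="environment.yml",  # placeholder conda spec
        workspace_config_file="config.json",       # downloaded from the Azure portal
        wait_for_completion=True,
        wait_for_completion_show_output=True,
    )
    # When running locally, or inside AzureML itself, execution continues here
    # with an AzureRunInfo object describing the current run.
    print("Is running in AzureML:", run_info.is_running_in_azure_ml)


if __name__ == "__main__":
    main()
```

Invoked as `python myscript.py`, this runs locally; invoked as `python myscript.py --azureml`, the folder is packaged, submitted to the named cluster, and the local process exits.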