Setting up Azure
If you already have an AzureML workspace available, you can go straight to the last step.
To set up all your Azure resources, you need to have:
A subscription to Azure.
An account that has “Owner” permissions to the Azure subscription.
There are two ways to set up all necessary resources, either via the Azure portal or via the Azure Command-line Interface (CLI). We recommend the CLI because all necessary resources can be easily created via a single script.
Creating an AzureML workspace via the Azure Portal
If you prefer to create your workspace via the web UI on the Azure Portal, please follow the steps below.
Creating an AzureML workspace via the Azure Command-line Tools
A purely command-line driven setup is possible via the Azure Command-line Tools. These tools are available for multiple platforms, including Linux, Mac, and Windows.
After downloading the command-line tools, you can run the following command to add the ml extension that is required to create an AzureML workspace:
az extension add --name ml
Documentation
Collecting the necessary information
Find out which Azure data centre locations you can use:
az account list-locations -o table
You will need the location names (second column in the table) to create resources in the right geographical regions. Choosing the right region can be particularly important if your data governance requires the data to be processed inside certain geographical boundaries.
For the storage account, choose an SKU based on your needs, as described here. Most likely, Standard_LRS will be the right SKU for you.
Creating resources
The script below will create:
an AzureML workspace.
a storage account to hold the datasets, with a container called datasets.
a datastore that links the AzureML workspace to the storage account.
a resource group for all of the above items.
In the script, you will need to replace the values of the following variables:
location - the location of the Azure datacenter you want to use.
prefix - the prefix you want to use for the resources you create. This will also be the name of the AzureML workspace.
export location=uksouth # The Azure location where the resources should be created
export prefix=himl # The name of the AzureML workspace. This is also the prefix for all other resources.
export container=datasets
export datastorefile=datastore.yaml
az group create \
    --name ${prefix}rg \
    --location ${location}
az storage account create \
    --name ${prefix}data \
    --resource-group ${prefix}rg \
    --location ${location} \
    --sku Standard_LRS
# --auth-mode accepts "login" (your Azure credentials) or "key" (the storage account key).
az storage container create \
    --account-name ${prefix}data \
    --name ${container} \
    --auth-mode login
az ml workspace create \
    --resource-group ${prefix}rg \
    --name ${prefix} \
    --location ${location}
# Quote the query so that the shell does not treat [0] as a glob pattern.
key=$(az storage account keys list --resource-group ${prefix}rg --account-name ${prefix}data --query "[0].value" -o tsv)
# The backticks are escaped so that the heredoc does not run ${container} as a command.
cat >${datastorefile} <<EOL
\$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: datasets
type: azure_blob
description: Pointing to the \`${container}\` container in the ${prefix}data storage account.
account_name: ${prefix}data
container_name: ${container}
credentials:
  account_key: ${key}
EOL
az ml datastore create --file ${datastorefile} --resource-group ${prefix}rg --workspace-name ${prefix}
rm ${datastorefile}
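The script derives the storage account name as ${prefix}data. Azure storage account names must be 3 to 24 characters long and may contain only lowercase letters and digits, so an unsuitable prefix makes the az storage account create step fail. Before running the script, a check along the following lines may help. This is a minimal sketch, assuming a POSIX shell with grep; check_prefix is a hypothetical helper and not part of the Azure CLI or the hi-ml toolbox.

```shell
# Hypothetical helper: checks that "<prefix>data" is a syntactically valid
# storage account name (3-24 characters, lowercase letters and digits only).
check_prefix() {
    name="${1}data"
    if printf '%s' "${name}" | grep -Eq '^[a-z0-9]{3,24}$'; then
        echo "ok: ${name}"
    else
        echo "invalid storage account name: ${name}" >&2
        return 1
    fi
}

check_prefix himl
```

The check only covers the naming syntax. Storage account names must also be globally unique, which can be checked with az storage account check-name --name ${prefix}data.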
Note that the datastore will use the storage account key to authenticate. If you want to use Shared Access Signature (SAS) instead, replace the creation of the datastore config file in the above script with the following command:
# Set --expiry to a date in the future; 2024-01-01 is only an example value.
key=$(az storage container generate-sas --account-name ${prefix}data --name ${container} --permissions acdlrw --https-only --expiry 2024-01-01 -o tsv)
# The backticks are escaped so that the heredoc does not run ${container} as a command.
cat >${datastorefile} <<EOL
\$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
name: datasets
type: azure_blob
description: Pointing to the \`${container}\` container in the ${prefix}data storage account.
account_name: ${prefix}data
container_name: ${container}
credentials:
  sas_token: ${key}
EOL
You can adjust the expiry date and the permissions of the SAS token (the script above grants full read/write permissions). The expiry date must lie in the future for the token to be usable. For further options, run az storage container generate-sas --help.
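Instead of hard-coding the expiry date, it can be computed relative to the current date. The sketch below assumes either GNU date (most Linux systems) or BSD date (macOS); sas_expiry is a hypothetical helper name and the 30-day lifetime is an arbitrary example.

```shell
# Hypothetical helper: prints a UTC date N days from now as YYYY-MM-DD.
sas_expiry() {
    days="${1:-30}"
    if date -u -d "+${days} days" +%Y-%m-%d >/dev/null 2>&1; then
        # GNU date
        date -u -d "+${days} days" +%Y-%m-%d
    else
        # BSD date (macOS)
        date -u -v+"${days}"d +%Y-%m-%d
    fi
}

expiry=$(sas_expiry 30)
echo "SAS token expiry: ${expiry}"
```

The result can then be passed to the command as --expiry "${expiry}".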
Creating compute clusters and permissions
Now that you have created the core AzureML workspace, you need to create a compute cluster.
To adjust permissions, find the AzureML workspace that you just created in the Azure Portal. Add yourself and your team members with “Contributor” permissions to the workspace, following the guidelines here.
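The compute cluster mentioned above can also be created from the command line with the ml extension. The following is a sketch under assumptions, not a prescribed hi-ml setup: the function name create_cluster, the cluster name cpu-cluster, the VM size Standard_DS3_v2 and the node limits are all example values, and the resource group and workspace names follow the ${prefix} convention of the setup script.

```shell
# Sketch: creates an AzureML compute cluster that scales down to zero nodes when idle.
# The cluster name, VM size and node counts are example values only.
create_cluster() {
    prefix="$1"                  # same prefix as in the setup script
    cluster="${2:-cpu-cluster}"  # hypothetical default cluster name
    az ml compute create \
        --name "${cluster}" \
        --type AmlCompute \
        --size Standard_DS3_v2 \
        --min-instances 0 \
        --max-instances 4 \
        --resource-group "${prefix}rg" \
        --workspace-name "${prefix}"
}
```

For example, create_cluster himl would create a cluster named cpu-cluster in the himl workspace. Run az ml compute create --help to see the options available in your installed version of the extension.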
Accessing the workspace
The hi-ml toolbox relies on a workspace configuration file called config.json to access the right AzureML workspace.
This file can be downloaded from the UI of the workspace. It needs to be placed either in your copy of the hi-ml repository, or in your repository that uses the hi-ml package.
In the browser, navigate to the AzureML workspace that you want to use for running your training job.
In the top right section, there will be a dropdown menu showing the name of your AzureML workspace. Expand that.
In the panel, there is a link “Download config file”. Click that.
This will download a file config.json. Copy that file to the root folder of your repository.
The file config.json should look like this:
{
  "subscription_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "resource_group": "myresourcegroup",
  "workspace_name": "myworkspace"
}
As an alternative to keeping the config.json file in your repository, you can specify the necessary information in environment variables. The environment variables are:
HIML_SUBSCRIPTION_ID: The subscription ID of the AzureML workspace, taken from the subscription_id field in the config.json file.
HIML_RESOURCE_GROUP: The resource group of the AzureML workspace, taken from the resource_group field in the config.json file.
HIML_WORKSPACE_NAME: The name of the AzureML workspace, taken from the workspace_name field in the config.json file.
When accessing the workspace, the hi-ml toolbox will first look for the config.json file. If it is not found, it will fall back to the environment variables. For details, see the documentation of the get_workspace function in readthedocs.
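As an illustration, the environment variables can be set in the shell that launches your jobs. The values below are the placeholders from the example config.json above. The workspace_source function is a hypothetical helper that only mirrors the lookup order described here (config file first, then environment variables). It is a simplified sketch and not the actual logic of get_workspace.

```shell
# Placeholder values from the example config.json; replace with your own.
export HIML_SUBSCRIPTION_ID="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
export HIML_RESOURCE_GROUP="myresourcegroup"
export HIML_WORKSPACE_NAME="myworkspace"

# Hypothetical helper: reports which source of workspace information is available.
# The config file takes precedence; the environment variables are the fallback.
workspace_source() {
    config="${1:-config.json}"
    if [ -f "${config}" ]; then
        echo "config file"
    elif [ -n "${HIML_SUBSCRIPTION_ID}" ] && [ -n "${HIML_RESOURCE_GROUP}" ] \
        && [ -n "${HIML_WORKSPACE_NAME}" ]; then
        echo "environment variables"
    else
        echo "none"
        return 1
    fi
}
```

Calling workspace_source in the root folder of your repository shows which of the two sources would apply in this simplified model.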