Ansible and Azure Stack deep dive

Following on from a post where we explored the complications with DevOps on Azure Stack, we are doing some deep dives into the tools we have looked at. Each post will take the same structure: some sample code working against Azure to provision a simple resource, and a breakdown of attempting to get it working against Azure Stack. This post explores Ansible.

Ansible is an awesome tool. I use it every day, from configuring my own machine, to configuring servers and applications, to configuring cloud infrastructure. What I used to do in bash I usually use Ansible for now. Ansible is also used by some of our customers interested in using Azure Stack at UKCloud to provision resources in Azure.

Beware: this is not for the faint-hearted, and only a handful of the issues we have raised have been fixed properly upstream.

Getting Ansible working against Azure

This is the simple bit. The docs are great and I’m not going to regurgitate them here. Get set up by following this guide. In our example we’re using Active Directory auth as it cuts down the number of configuration steps in Azure.

We have created some sample code which creates a simple VM resource.

NOTE: we already know we cannot use managed disks so we avoid them from the outset. Oh, and we select a storage type of “Standard_LRS” as replication isn’t supported yet.

- hosts: localhost
    location: uksouth

For this example we have created the resources separately which helps during troubleshooting. Anyway we run this with:

ansible-playbook -i localhost <playbook.yaml>

This results in some IaC goodness: Ansible completes successfully and deploys our VM.

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************
localhost                  : ok=9    changed=8    unreachable=0    failed=0

Working on Azure Stack

Happy days... so we just take this code, update our subscription ID, endpoints and location, right? Azure Stack is an extension to Azure after all.

To update our endpoints we add the following to /root/.azure/credentials


In our yaml above we change:

location: uksouth

to:

location: local

Or whatever your region is called.

Here goes nothing…

TASK [Create a resource group] ***************************************************************************************************************************************************************************************************************
changed: [localhost]

TASK [Create storage account] ****************************************************************************************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Error checking name availability: Azure Error: InvalidResourceType\nMessage: The resource type 'checkNameAvailability' could not be found in the namespace 'Microsoft.Storage' for api version '2017-06-01'. The supported api-versions are '2016-01-01,2015-06-15,2015-05-01-preview'."}
to retry, use: --limit @/root/.azure/test-azure.retry

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************
localhost : ok=2 changed=1 unreachable=0 failed=1

Good news: the resource group was created successfully, so authentication is working. When we first started this work we had two issues, one with self-signed certs and one with authentication failures. I’d like to call out the Microsoft Python Azure SDK team as they have worked fast to fix these issues. If you’re interested in these problems take a look at the following issues:

Issue 1 – API versions

So if you take a look at the error above, you can see that our first problem is that the API versions that Ansible and the Azure public cloud use are not supported by Azure Stack. But we can get round this. We just updated the hardcoded (yes hardcoded) API versions in Ansible and put in a hack to pull in the models (the bit that does serialisation of the request) to match the API version.

sed -i '716s/2017-06-01/2016-01-01/g' /usr/lib/python2.7/site-packages/ansible/module_utils/
sed -i 's/2017_06_01/2016_01_01/g' /usr/lib/python2.7/site-packages/azure/mgmt/storage/
sed -i '743s/2017-03-30/2016-03-30/g' /usr/lib/python2.7/site-packages/ansible/module_utils/
sed -i 's/See link below for details./See link below for details.\nfrom .v2016_03_30.models import */g' /usr/lib/python2.7/site-packages/azure/mgmt/compute/
sed -i '725s/2017-06-01/2015-06-15/g' /usr/lib/python2.7/site-packages/ansible/module_utils/
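
Rather than reading the supported versions off the error by eye, you could pull them out programmatically. The following is only a rough sketch in Python 3, not something Ansible or the SDK provides: the function names are ours, and it assumes the error message keeps the wording shown in the failure above.

```python
import re

# Error text copied from the failed "Create storage account" task above.
ERROR_MSG = (
    "The resource type 'checkNameAvailability' could not be found in the "
    "namespace 'Microsoft.Storage' for api version '2017-06-01'. "
    "The supported api-versions are '2016-01-01,2015-06-15,2015-05-01-preview'."
)


def supported_api_versions(message):
    """Return the api-versions listed in an InvalidResourceType message."""
    match = re.search(r"supported api-versions are '([^']+)'", message)
    if match is None:
        return []
    return [version.strip() for version in match.group(1).split(",")]


def newest_stable(versions):
    """Pick the newest non-preview version, or None if there is none.

    API versions are ISO dates, so plain string comparison orders them.
    """
    stable = [v for v in versions if not v.endswith("-preview")]
    return max(stable) if stable else None


versions = supported_api_versions(ERROR_MSG)
print(versions)
print(newest_stable(versions))
```

For the storage error above this picks 2016-01-01, which is the version the first sed line switches to. It only tells you which version to target; the SDK models still have to match that version, which is what the remaining sed lines deal with.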

Issue 2 – Storage API endpoint

….so close!!

TASK [Create virtual machine] ****************************************************************************************************************************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: msrestazure.azure_cloud.CloudSuffixNotSetException: The suffix 'storage_endpoint' for this cloud is not set but is used.
fatal: [localhost]: FAILED! => {"changed": false, "module_stderr": "Traceback (most recent call last):\n  File \"/tmp/ansible_Mo6zta/\", line 1554, in \n    main()\n  File \"/tmp/ansible_Mo6zta/\", line 1551, in main\n    AzureRMVirtualMachine()\n  File \"/tmp/ansible_Mo6zta/\", line 651, in __init__\n    supports_check_mode=True)\n  File \"/tmp/ansible_Mo6zta/\", line 285, in __init__\n  File \"/tmp/ansible_Mo6zta/\", line 716, in exec_module\n    self._cloud_environment.suffixes.storage_endpoint,\n  File \"/usr/lib/python2.7/site-packages/msrestazure/\", line 107, in __getattribute__\n    \"is not set but is used.\".format(name))\nmsrestazure.azure_cloud.CloudSuffixNotSetException: The suffix 'storage_endpoint' for this cloud is not set but is used.\n", "module_stdout": "", "msg": "MODULE FAILURE", "rc": 0}

In this case the issue occurs because the storage_endpoint suffix is not set. This is not something that is available in Azure Stack and therefore is never populated. The workaround is simple.

sed -i "s/requested_vhd_uri \= 'https:\/\/{0}.blob.{1}\/{2}\/{3}'.format(self.storage_account_name,/requested_vhd_uri \= '{0}{1}\/{2}'.format(properties.primary_endpoints.blob,/g" /usr/lib/python2.7/site-packages/ansible/modules/cloud/azure/
sed -i '/self._cloud_environment.suffixes.storage_endpoint,/d' /usr/lib/python2.7/site-packages/ansible/modules/cloud/azure/
sed -i 's/self.get_storage_account(self.storage_account_name)/properties = self.get_storage_account(self.storage_account_name)/g' /usr/lib/python2.7/site-packages/ansible/modules/cloud/azure/

It’s worth noting that we have made a pull request to resolve this properly upstream; it just hasn’t been merged yet.
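
To make the idea behind those sed lines easier to see, here is a small standalone Python sketch of the two ways of building the VHD URI. It is illustrative only: the function names, account name and domain are made up, and unlike the sed replacement (which relies on the blob endpoint ending in a slash) this version normalises the trailing slash.

```python
def vhd_uri_from_suffix(account, storage_suffix, container, blob):
    """Original approach: needs a storage endpoint suffix for the cloud."""
    return "https://{0}.blob.{1}/{2}/{3}".format(
        account, storage_suffix, container, blob)


def vhd_uri_from_endpoint(blob_endpoint, container, blob):
    """Patched approach: start from the account's primary blob endpoint.

    The endpoint is whatever the storage account itself reports, so no
    cloud-wide suffix is needed.
    """
    return "{0}/{1}/{2}".format(blob_endpoint.rstrip("/"), container, blob)


# Hypothetical values, for illustration only.
print(vhd_uri_from_suffix("mystorage", "example.test", "vhds", "vm01.vhd"))
print(vhd_uri_from_endpoint("https://mystorage.blob.example.test/",
                            "vhds", "vm01.vhd"))
```

Both calls produce the same URI here. The difference is that the second never asks the cloud environment for a suffix, so it cannot trip over one that isn't set.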

Issue 3 – No managed disk support

“Hang on, you told me we weren’t using managed disks!!”

Well, it’s true we’re not, but the issue here comes back to the API versions again. The model (schema) we are using is for a version of the API that doesn’t support managed disks. This is the error.

TASK [Create virtual machine] ****************************************************************************************************************************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: TypeError: __init__() got an unexpected keyword argument 'managed_disk'
fatal: [localhost]: FAILED! => {"changed": false, "module_stderr": "Traceback (most recent call last):\n  File \"/tmp/ansible_EU85eN/\", line 1551, in \n    main()\n  File \"/tmp/ansible_EU85eN/\", line 1548, in main\n    AzureRMVirtualMachine()\n  File \"/tmp/ansible_EU85eN/\", line 651, in __init__\n    supports_check_mode=True)\n  File \"/tmp/ansible_EU85eN/\", line 285, in __init__\n  File \"/tmp/ansible_EU85eN/\", line 867, in exec_module\n    caching=self.os_disk_caching,\nTypeError: __init__() got an unexpected keyword argument 'managed_disk'\n", "module_stdout": "", "msg": "MODULE FAILURE", "rc": 0}

The issue is that we have frigged Ansible to use the SDK versions that work with Azure Stack; however, the module itself assumes that it is still using a version of the API that has managed disks and thus passes in a managed disk key with a null value. The SDK model receives this key and then fails, as it doesn’t know about it for that API version. This is where the real severity of not supporting API versions becomes clear. Again we can work around this with:

sed -i 's/managed_disk=managed_disk,/#managed_disk=managed_disk,/g' /usr/lib/python2.7/site-packages/ansible/modules/cloud/azure/
sed -i 's/managed_disk=data_disk_managed_disk,/#managed_disk=data_disk_managed_disk,/g' /usr/lib/python2.7/site-packages/ansible/modules/cloud/azure/
sed -i 's/self.delete_managed_disks(managed_disk_ids)/#self.delete_managed_disks(managed_disk_ids)/g' /usr/lib/python2.7/site-packages/ansible/modules/cloud/azure/
sed -i -e '1193,1195d' /usr/lib/python2.7/site-packages/ansible/modules/cloud/azure/
sed -i -e 's/elif(vm.storage_profile.os_disk.vhd):/if(vm.storage_profile.os_disk.vhd):/g' /usr/lib/python2.7/site-packages/ansible/modules/cloud/azure/
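
Commenting lines out is about as blunt as it gets. A gentler option would be to drop keyword arguments the older model doesn't know about, but only when they are null. The sketch below shows that idea in Python 3 (it uses inspect.signature, which the Python 2.7 install above doesn't have). OSDiskOldApi is a made-up stand-in for an SDK model and build_model is our own hypothetical helper; neither exists in Ansible or the Azure SDK.

```python
import inspect


class OSDiskOldApi(object):
    """Hypothetical stand-in for a model from an API version without managed disks."""

    def __init__(self, name, vhd, caching=None):
        self.name = name
        self.vhd = vhd
        self.caching = caching


def build_model(model_cls, **kwargs):
    """Construct model_cls, dropping unknown keyword arguments that are None.

    An unknown argument with a real value raises, so a genuine request
    for an unsupported feature is not silently ignored.
    """
    accepted = inspect.signature(model_cls.__init__).parameters
    filtered = {}
    for key, value in kwargs.items():
        if key in accepted:
            filtered[key] = value
        elif value is not None:
            raise ValueError(
                "{0} is not supported by {1}".format(key, model_cls.__name__))
    return model_cls(**filtered)


params = dict(name="osdisk", vhd="https://example.invalid/vhds/os.vhd",
              caching="ReadOnly", managed_disk=None)

try:
    OSDiskOldApi(**params)  # same failure mode as the traceback above
except TypeError as exc:
    print("direct call failed:", exc)

disk = build_model(OSDiskOldApi, **params)
print(disk.name, disk.caching)
```

This only covers the constructor calls. The module's other managed disk paths, such as the delete handling touched by the sed lines above, would still need their own treatment.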


Well, there we go: we have deployed a VM on Azure Stack using Ansible.

TASK [Create virtual machine] ****************************************************************************************************************************************************************************************************************
changed: [localhost]

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************
localhost                  : ok=8    changed=1    unreachable=0    failed=0

Being realistic though, this really isn’t a case of being an extension to Azure. I totally get that this isn’t just a Microsoft problem. Azure Stack is really similar to Azure but when you get into the detail it’s different enough to break a whole load of preconceptions made by people only focused on supporting Azure’s public cloud. The projects consuming Azure just haven’t considered Azure Stack and Microsoft have given very little guidance around how to handle it, so there are going to be some real challenges for early adopters to get their code working seamlessly until the community get up to speed.


Whilst we have worked around a number of issues and technically have used Ansible to deploy Azure Stack infrastructure, it’s not a sustainable or production-ready solution.

In my view the real fix will come when Microsoft define a strategy for handling API versions in the SDKs they provide and give guidance to communities on how best to handle these differences. If that is solved I have no doubt a very seamless experience will be achieved, provided you are willing to run at a lower API version across both Azure and Azure Stack.

I hope Microsoft step up to this challenge and would suggest gathering ideas on this thread….wade in if you have some ideas!

Final thought

Downgrading packages could be another approach to solving this. It’s fair to assume that if the project is old enough, at some point it worked with the older version of the Azure API that Azure Stack supports. If the project pinned its dependency versions (e.g. the MS SDK that worked with that version) you might get lucky: just downgrade the lot and it’ll work like magic.

We’ll do some more exploration and let you know in a future post.