Microsoft’s vision describes Azure Stack as an extension to Azure. The mantra is: take the code you built for Azure and replay it against Azure Stack to get a seamless cloud experience, boosting operational effectiveness where public cloud is not the right fit. As a provider dedicated to bringing specialist cloud services to the UK public sector, it is important that we at UKCloud understand what this really means for our customers.
We’ve been having some good conversations with many of our customers who practice devOps with Azure to understand how their current practices will work with Azure Stack. The following tools have commonly come up in conversation, and in this post we’ll explore some challenges that need to be solved to really enable a seamless transition between Azure and Azure Stack.
- SALT Stack
In subsequent posts we will do a deep dive into the state of each of the tools and share any workarounds we have found to get things functioning. We will also explore some of the documentation from Microsoft around devOps and how that works with Azure Stack.
In our exploration we’ve tried a few different actions to test the tools.
- Creating new VMs
- Creating and using blob storage
- Creating and using Queues
- Building load balancers
- Creating DNS zones
It’s through these actions and the use of these tools we have started to see a number of common threads. Our aim is to catalog these and work directly with Microsoft, and the community, to define a common approach to solving these problems. Below is an overview of the common issues most tools appear to be affected by.
The first issue we ran into with some of the tools was an inability to accept self-signed certificates. In public cloud you develop and test against the same production endpoints, so this isn’t an issue. In our labs we have been using the Azure Stack Development Kit (ASDK), which is an Azure Stack lab deployed on a single physical node. The deployment generates self-signed certificates, which is something the Azure SDKs (used by most of the tools) never catered for. An example of this was resolved in one of the Azure Python libraries here.
This is not a big problem, but for those wishing to experiment on Azure Stack without cost it could be a blocker.
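As a rough illustration of the kind of workaround involved, the sketch below builds a TLS context that still verifies certificates but additionally trusts a lab root certificate. It uses only the Python standard library; the function name and the certificate path are our own placeholders and not part of any Azure SDK.

```python
import ssl
from typing import Optional


def build_tls_context(ca_bundle: Optional[str] = None) -> ssl.SSLContext:
    """Return a verifying TLS context, optionally trusting an extra CA file."""
    # Start from the platform defaults: verification and hostname checks stay on.
    context = ssl.create_default_context()
    if ca_bundle:
        # Trust the lab's self-signed root in addition to the system CAs.
        # The file must be a PEM certificate exported from the lab deployment.
        context.load_verify_locations(cafile=ca_bundle)
    return context


if __name__ == "__main__":
    # In a lab you would pass something like "asdk-root.pem" (hypothetical path).
    ctx = build_tls_context()
    print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

Adding the lab root to the trust store is generally preferable to switching verification off, since the same code can then run unchanged against public cloud endpoints.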
The second most common issue we encountered was to do with authentication. The problem is that Azure has many API endpoints for its resources, for example compute, network, storage, etc. On top of that there are different cloud platforms such as AzureChinaCloud, AzureGermanCloud or the standard AzurePublicCloud. Most of these environments have been hardcoded into an environments file, e.g. https://github.com/Azure/go-autorest/blob/master/autorest/azure/environments.go. Azure Stack makes this model much more complicated: firstly, Azure Stack supports a different set of API endpoints, and secondly, the customer chooses the domain suffix, meaning that hardcoding a small set of clouds is no longer viable.
This is not a huge issue to work around for the developers of these tools, but it adds complications and work for the various projects and starts to erode trust in the “extension to Azure” mantra.
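To make the alternative to a hardcoded list concrete, here is a minimal sketch in which the environment is derived from a region and a customer-chosen domain suffix. The class, the function and the hostname patterns are illustrative assumptions modelled on the ASDK defaults as we understand them; a real tool would more likely discover the endpoints from the platform itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloudEnvironment:
    """Hypothetical stand-in for an entry in an SDK's environments file."""
    name: str
    resource_manager_endpoint: str
    storage_endpoint_suffix: str
    key_vault_dns_suffix: str


def environment_from_suffix(name: str, region: str, domain_suffix: str) -> CloudEnvironment:
    """Build an environment from values supplied by the operator at runtime."""
    # Assumed naming pattern: <service>.<region>.<domain suffix>
    base = f"{region}.{domain_suffix}"
    return CloudEnvironment(
        name=name,
        resource_manager_endpoint=f"https://management.{base}",
        storage_endpoint_suffix=base,
        key_vault_dns_suffix=f"vault.{base}",
    )


if __name__ == "__main__":
    lab = environment_from_suffix("AzureStackLab", "local", "azurestack.external")
    print(lab.resource_manager_endpoint)
```

The point of the design is that nothing about the cloud is known at build time: the tool accepts the suffix as configuration rather than choosing from a fixed set of names.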
API versioning is probably the biggest and most complex problem to solve. Azure versions every API using a date format, e.g. 2017-01-01. The version of the API denotes which features that API supports, which has a direct impact on the API schema. Many of the Azure SDKs have hardcoded the API version to be in line with the Azure public cloud. These SDKs are in turn consumed by the devOps tool projects, which add their own provisioning logic and methodologies, resulting in hardcoded assumptions about specific API versions. The problem now is that Azure Stack runs a very different set of API versions, meaning many of the devOps tools fail to run.
We believe two fixes are needed to resolve this. The first is within the Microsoft SDKs, and this seems to be coming along well. For example, the azure-cli team has done some work to support versions, which is now being moved into the SDK: https://github.com/Azure/azure-cli/issues/2343. I only hope that the approach is the same across the various languages as, let’s be honest, when was the last time you only used a single language? This has started in some of the SDKs; for example, the Python SDK now supports passing in an API version, which maps to a model defining the correct API schema to be used for serialisation.
The second fix needs to happen within the devOps tools’ project code, which should also allow versions to be passed in. These versions need to map to the tool code that was specifically created to work with a given API version. Again, it seems like Microsoft could help to galvanise these communities, as a common approach would reduce context switching for end users.
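One way a tool could avoid a hardcoded version is to pick the newest API version that both it and the target cloud support. The sketch below shows that idea with the standard library only. The function names are ours, the version strings are illustrative rather than real compute API versions, and suffixed versions such as previews are deliberately not handled.

```python
from datetime import datetime
from typing import Iterable, Optional


def parse_api_version(version: str) -> datetime:
    """Parse a date-formatted API version such as 2017-01-01."""
    return datetime.strptime(version, "%Y-%m-%d")


def negotiate_api_version(
    tool_versions: Iterable[str], cloud_versions: Iterable[str]
) -> Optional[str]:
    """Return the newest version supported by both sides, or None."""
    common = set(tool_versions) & set(cloud_versions)
    if not common:
        return None
    # Compare as dates so ordering does not rely on string formatting.
    return max(common, key=parse_api_version)


if __name__ == "__main__":
    tool = ["2016-03-30", "2017-03-30"]   # versions the tool has code for
    cloud = ["2015-06-15", "2016-03-30"]  # versions the target cloud exposes
    print(negotiate_api_version(tool, cloud))
```

Returning `None` when there is no overlap lets the tool fail early with a clear message instead of sending a request the cloud cannot serve.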
Differing feature sets also bring some big challenges, and to some extent this is linked to the API version point covered above. There are two challenges here.
We have already discussed the fact that Azure Stack has a different feature set compared with Azure and that this is generally denoted by different API versions. The trouble for users is that even if we hardcode a specific API version from the tool through to the SDK it is using (we have tested this with Ansible), the tool still assumes certain features are available and so will pass values to the SDK that are not valid for that API version.
A common example here is the lack of managed disks in Azure Stack. Whilst this is a focus for Microsoft and is most definitely on the Azure Stack roadmap, it is causing a number of issues and won’t be the last example we see. In Ansible we can pin the API version and then make a request for a VM without using managed disks. Unfortunately, whilst this is possible, Ansible has hardcoded managed disks as a key for the API and just passes in null if they are not used. The SDK sees the managed_disks key and fails to serialise the object for the API request, as it doesn’t match the schema defined in the model. The offending line of code is here.
The fix is to remove the managed disks key from Ansible if it is not used, but this is only a sticking plaster. The real fix is to have code in the tool designed to work with a specific version of the API with the correct features that can be called.
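The sticking-plaster approach can be sketched generically: drop any key whose value is unset before the parameters reach the SDK, so an older schema never sees a field it does not know about. This is not the Ansible code itself; the helper name and the payload shape are hypothetical.

```python
from typing import Any


def strip_unset(value: Any) -> Any:
    """Recursively remove dictionary keys and list items whose value is None."""
    if isinstance(value, dict):
        return {k: strip_unset(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_unset(v) for v in value if v is not None]
    return value


if __name__ == "__main__":
    # Hypothetical VM parameters; managed_disks is unset because it is unused.
    params = {
        "name": "vm01",
        "os_disk": {"name": "osdisk", "managed_disks": None, "vhd": {"uri": "x"}},
    }
    print(strip_unset(params))
```

This only hides the mismatch for optional fields. It does nothing for features the tool requires but the pinned API version lacks, which is why version-specific code paths remain the longer-term answer.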
The final hurdle is that there are some really big differences in the way select services have been delivered. Take SQL databases, the Azure database as a service (DBaaS), for example. The way Azure Stack delivers DBaaS is so different from Azure that Microsoft has not wanted to confuse customers by letting them think the services are the same. So whilst there is DBaaS in Azure Stack, you cannot use the SQL database API from Azure. This means that the SDKs now need to support two entirely different DBaaS APIs, and the devOps tools need to be updated to allow the user to specify which one they use.
DevOps with Azure Stack right now is going to be challenging. I believe some of our customers will have some success deploying workloads on Azure Stack with their existing tools but need to be prepared to do two things:
- Make on-the-fly changes to the tooling and work with the community to get these changes pushed upstream
- Pare down the features that they use in Azure to match the capabilities of Azure Stack
We’ll be working with our customers over the coming months to get into the detail as every deployment will bring new challenges.
However, the real fix needs to come from collaboration between Microsoft and the devOps community. This needs to cover a number of factors.
- Where possible bring Azure Stack in line with Azure
This is the simplest solution for the end user, but given that it is the Azure Stack provider’s responsibility to patch, versions will often differ in reality.
- Microsoft should define a standard approach for all the SDKs they provide to handle API versioning
- Microsoft should work with communities to define how to use the standard approach to versioning to best support versioning within the devOps projects themselves
We hope that developers and engineers alike will contribute to these discussions. A good starting point would be to post on this GitHub issue, as these are not Ansible-specific problems but common across the board!
Happy Azure Stacking!!