Deploy Azure Policy with Azure DevOps
In my previous blog post I wrote about how to create an Azure Policy to tag VMs behind a load balancer. The end result of that post was a policy which automatically tags VMs that are behind a load balancer. In this post we are going to deploy this policy to Azure automatically and remove all manual labor. We will also add unit tests and integration tests to this pipeline to make sure we can create a proper CI/CD flow. Just like always, all the files used in this post are available on GitHub.
Designing the Pipeline
First we need to look at what different steps need to be taken and in what order. After that we can translate these steps to code. For the purpose of this blog post some steps might be made more complex than they need to be, to illustrate the possibilities. When you are implementing this yourself you could simplify these steps.
To deploy the Azure Policy we want to take these actions (each main stage is covered in its own section below):
- Build
  - Download all required repositories and dependencies
  - Put all files in one artifact which can be used in the pipeline
- Unit testing
  - Test the functionality of the scripts used
  - Test the code quality of the scripts used
  - Publish the test results
- Deploy
  - Deploy the managed identity
  - Assign permissions to the managed identity
  - Create the custom role for the policy system identity
  - Deploy the Policy Definition
  - Deploy the Policy Assignment
  - Assign permissions to the policy system identity
- Integration testing
  - Create a VM with a load balancer in Azure
  - See if the tag is applied to the VM
  - Clean up the resources
As you see in the overview we have four distinct phases which rely on each other but don't share much information except the initial build we did. So we are going to use an Azure DevOps multi-stage pipeline to create this workflow.
Azure DevOps pipelines are written in YAML, so we will need to create the pipeline in this language too. It is still possible in Azure DevOps to use release pipelines which can be configured with the classic editor, but Microsoft is moving away from this way of building pipelines and it is advised to use the YAML version when creating new pipelines.
Most multi-stage pipelines will start with a build stage. When creating a pipeline for a production environment it's important to keep in mind that a deployment might need to go through some approval. These approvals can be built into the pipeline (I will write about this later). Because of this, it's important to realize that in an environment with a short iteration time it's very well possible that code will change between the moment a pipeline starts and the moment it is deployed to your production environment. This is why we introduce the build step. In this step we gather everything from the main production branch (in most cases the master or main branch) and package it together in what we call an artifact. This artifact is then stored and used in every following stage. This way, if new code is pushed to our production branch in the meantime, it will start a new pipeline but won't affect this one. Imagine having a test stage where every test was successful and then going on to deploy to production, but in the meantime code was changed and this new untested code is what gets deployed. This is not a scenario you want, so this is why we make sure that the code at the start of the pipeline stays the same.
First we start off our pipeline with some basic stuff.
For the purpose of this blog we are going to put the pipeline in a separate repository than the one which hosts the policy files. It would be best to put these together so you could make it trigger automatically once code has been pushed; this would make it more continuous. For illustrative purposes the trigger for this pipeline was set to none, so it has to be triggered manually. Next up is the pool. You could use a self-hosted agent for this, but we are using the Microsoft-hosted agents in this blog post. By specifying the "windows-latest" vmImage we know we will always get a Windows Server host running the latest Windows Server version used by Azure DevOps.
The last part is the three variables which are defined. It's possible to have both variables and parameters in a pipeline. This blog post uses a lot of fixed hardcoded values to make it easier to follow and read, so only three variables are defined here, as they are used multiple times. Note that if you are building this yourself you probably want to turn more values into variables or parameters to make the pipeline easier to adapt and applicable in more situations.
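As a rough sketch, the top of such a pipeline could look something like this (the variable names and values below are assumptions, not the exact ones used in the original files on GitHub):

```yaml
trigger: none            # run manually; put the pipeline next to the policy repo to trigger on push

pool:
  vmImage: 'windows-latest'

variables:
  serviceConnection: 'sc-policy-deployment'   # assumed service connection name
  location: 'westeurope'                      # assumed deployment region
  artifactName: 'policyArtifact'              # assumed artifact name
```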
Next up is the actual build stage of the yaml file.
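A minimal sketch of what such a build stage could look like (the repository URLs, folder names and task inputs here are assumptions):

```yaml
- stage: Build
  jobs:
    - job: BuildArtifact
      steps:
        - checkout: self
          path: main                          # stores the repo in $(Pipeline.Workspace)/main
        - powershell: |
            git clone https://github.com/<your-account>/CodeQualityTests.git
            Copy-Item -Path ".\CodeQualityTests\*.Tests.ps1" -Destination "$(Pipeline.Workspace)\main" -Force
          displayName: 'Get code quality tests'
        - powershell: |
            git clone https://github.com/<your-account>/TagVMsBehindLoadBalancer.git
            Copy-Item -Path ".\TagVMsBehindLoadBalancer\*" -Destination "$(Pipeline.Workspace)\main" -Recurse -Force
          displayName: 'Get policy files'
        - task: PublishBuildArtifacts@1
          inputs:
            PathtoPublish: '$(Pipeline.Workspace)\main'
            ArtifactName: '$(artifactName)'
```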
Here we see the build stage written in YAML. Several steps are performed: the first checkout step gets the repository and stores it in the folder defined in the path attribute. Next up are two PowerShell tasks; both use git to clone these two repositories:
- Code Quality Testing
- Azure Policy for tagging VMs behind a load balancer
Both of them are discussed in earlier blog posts. Not everything in these repositories is relevant for us now, so we only copy the relevant folder or files to our main folder; this is done with the Copy-Item PowerShell commands.
Lastly, a build artifact is published with a fixed name so we can refer to it later. We select everything in our main folder to be added to this artifact; this way we don't have to retrieve the files again in every stage.
Next up is the stage for unit testing. If we want to deploy our code to a production environment it's important to incorporate testing into our pipeline. This way we can make sure that mistakes are caught before they go to production. In this stage you can test multiple things. For this blog we limit ourselves to testing only the PowerShell part of the solution, but I would strongly suggest to also look at testing your ARM templates if you are going to implement this in a production environment.
The stage looks like this in YAML:
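Roughly sketched (the script path and names are assumptions; the two publish tasks are shown a bit further down):

```yaml
- stage: Test
  jobs:
    - job: UnitTests
      steps:
        - checkout: none
        - task: DownloadBuildArtifacts@0
          inputs:
            buildType: 'current'
            artifactName: '$(artifactName)'
            downloadPath: '$(System.ArtifactsDirectory)'
        - powershell: '& "$(System.ArtifactsDirectory)\$(artifactName)\tests\Invoke-Tests.ps1"'
          displayName: 'Run Pester tests'
```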
You'll notice that in this stage we use the checkout: none step; this is because in the next step we download the artifact created in the build stage. If you want to be thorough you could add a dependency on the build stage to this stage. If you are working with more complex pipelines I would strongly suggest using those dependencies to better visualize which stage depends on which. For this simple pipeline I've opted not to include it, to keep things easier to read.
In the next step we run a PowerShell script to start the tests; this script looks like this:
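The idea, sketched with the Pester 5 configuration object (paths, tags and output file names are assumptions):

```powershell
$config = New-PesterConfiguration
$config.Run.Path = "$PSScriptRoot"
$config.TestResult.Enabled = $true

# First run: only the tests tagged as Error
$config.Filter.Tag = "Error"
$config.TestResult.OutputPath = "$PSScriptRoot\testresults_error.xml"
Invoke-Pester -Configuration $config

# Second run: warning and information tests go to their own report
$config.Filter.Tag = @("Warning", "Information")
$config.TestResult.OutputPath = "$PSScriptRoot\testresults_warning.xml"
Invoke-Pester -Configuration $config
```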
Most of this script is explained in this post already. A thing to note is that this time the Invoke-Pester command is called twice. As you can see, after the first run we change the OutputPath and the Filter.Tag properties. In the first run we only check for tests with the tag "Error"; in the second run we check for warning and information tests. This way the tests are divided over two different test reports.
As described in the post about code quality testing, the tests are tagged based on their severity. In the unit tests written for this repository we tagged the tests in the same way:
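As an illustration (the function and file names below are made up; the real ones are in the repository on GitHub), such a tagged test could look like this:

```powershell
BeforeAll {
    # Assumed function name; converts a script file to a single-line json string
    . "$PSScriptRoot\..\functions\Convert-ScriptToJson.ps1"
}

Describe "Convert-ScriptToJson" {
    It "converts a script to a single line json string" -Tag "Error" {
        $expected = Get-Content -Path "$PSScriptRoot\resources\expected.json" -Raw
        $result   = Convert-ScriptToJson -Path "$PSScriptRoot\resources\small-script.ps1"
        $result | Should -Be $expected
    }
}
```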
You see that in the It statement a -Tag parameter was added with the value "Error"; this way we tag these tests as Error too. Besides that, you see that these unit tests are quite simple. In the tests folder we have a resources folder which contains a small script; the function under test will get this script and convert it to single-line JSON (we will need this later on). We take the output from the function and our expected result and compare these with each other to see if our function behaves the way we want. In our other test we have an extra challenge:
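A sketch of how such a test can be set up with Pester mocks (the script name, its parameters and the Az cmdlet it calls are assumptions here, not the exact ones from the repository):

```powershell
BeforeAll {
    # Stand-in for the Az cmdlet, including its parameters,
    # so the test also runs on a machine without the Az module installed
    function Update-AzTag {
        param ($ResourceId, $Tag, $Operation)
    }
}

Describe "settagstovms" {
    It "passes the tag to the Az cmdlet exactly once" -Tag "Error" {
        # The script block is empty because nothing needs to be executed
        Mock Update-AzTag { }

        & "$PSScriptRoot\..\scripts\settagstovms.ps1" `
            -ResourceId "/subscriptions/00000000-0000-0000-0000-000000000000/fake-vm" `
            -TagName "LoadBalancer" -TagValue "lb-test"

        Assert-MockCalled Update-AzTag -Times 1 -Exactly -ParameterFilter {
            $Tag["LoadBalancer"] -eq "lb-test"
        }
    }
}
```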
The script under test expects to run in an environment which has the Az module installed. To still be able to test it without installing that module completely, we first define the function(s) used from this module. Notice that we also define the parameters used by this function; because of the way we are going to test this we will need them. Next up we mock the Az module function so it will run the code specified in the script block instead. This block is empty because we don't need anything to be executed.
Now in our tests we use Assert-MockCalled to test if the function is called with the right parameters the right number of times. This way we know that the data entered into our function is passed to the Az command in the right way.
The last step of this stage consists of publishing the test results. We have two tasks for this because we have two test reports. You'll notice that the "failTaskOnFailedTests" attribute is different in both tasks. This is because when a test tagged as "Error" fails we don't want the pipeline to continue. If a warning or information test fails we are okay with continuing the pipeline; we still want to see it in our test report, but it doesn't have to stop our deployment. You will have to decide for yourself whether you allow some tests to fail and still continue. In this blog I mostly wanted to show that it is possible to make this distinction.
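Within the steps of the test job, the two publish tasks could look roughly like this (the report file names are assumptions); note the different failTaskOnFailedTests values:

```yaml
- task: PublishTestResults@2
  displayName: 'Publish error test results'
  inputs:
    testResultsFormat: 'NUnit'
    testResultsFiles: '**/testresults_error.xml'
    failTaskOnFailedTests: true    # a failed Error test stops the pipeline
- task: PublishTestResults@2
  displayName: 'Publish warning and information test results'
  inputs:
    testResultsFormat: 'NUnit'
    testResultsFiles: '**/testresults_warning.xml'
    failTaskOnFailedTests: false   # warning and information tests may fail without blocking the deploy
```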
After unit testing it's time to deploy the resources. In the ideal situation you want to deploy to your testing environment first, then run the integration tests (stage 4) and only in a later stage deploy to production. Especially in smaller environments there often is no testing environment, so in those cases you can choose to deploy straight to production. You might then want to set up your pipeline so that if the next stage (integration testing) fails, it performs a rollback (this is something I will cover in a future blog post). For the simplicity of this blog post I've left that out. So in this stage we are going to deploy our resources to Azure. We do this with the following part of the YAML:
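In sketch form (task versions and the script path are assumptions):

```yaml
- stage: Deploy
  jobs:
    - job: DeployPolicy
      steps:
        - checkout: none
        - task: DownloadBuildArtifacts@0
          inputs:
            buildType: 'current'
            artifactName: '$(artifactName)'
            downloadPath: '$(System.ArtifactsDirectory)'
        - task: AzurePowerShell@5
          displayName: 'Deploy managed identity, custom role and policy'
          inputs:
            azureSubscription: '$(serviceConnection)'
            ScriptType: 'FilePath'
            ScriptPath: '$(System.ArtifactsDirectory)\$(artifactName)\deploy\deploy_policy.ps1'
            azurePowerShellVersion: 'LatestVersion'
```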
This stage starts just like the previous one with getting the artifact. After that an Azure PowerShell task is started. The difference between the normal PowerShell task and the Azure PowerShell task is that the Azure PowerShell task will call Connect-AzAccount with the provided service connection.
Note that to make this work the created service connection needs to have owner rights on the subscription you want to deploy to. By default it will only get contributor rights, but because we also need to set some permissions we need owner rights.
To deploy the resources we make use of ARM templates. The deployment is split up into two different templates, with the first being deploy_mid.json (see on GitHub).
I won't go into full detail of the template but I will point out some details. In the first part of the template the parameters and variables are defined:
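As a fragment to illustrate the idea (the parameter and variable names are assumptions and the resources section is left out):

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2018-05-01/subscriptionDeploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "roleDefinitionJson": {
      "type": "string"
    },
    "location": {
      "type": "string",
      "defaultValue": "westeurope"
    }
  },
  "variables": {
    "roleJson": "[json(parameters('roleDefinitionJson'))]",
    "roleExtraProperties": {
      "assignableScopes": [
        "[subscription().id]"
      ]
    },
    "roleProperties": "[union(variables('roleJson'), variables('roleExtraProperties'))]"
  }
}
```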
The […] means there is more code there, but for now it's irrelevant. What you see here is that we have a variable called roleJson; this variable holds the JSON file we created in the previous blog post. You can completely write out the role definition in the ARM template, but to make your repository easier to maintain and to prevent a massive ARM template, it is nicer to import this JSON file into the ARM template. So that is what is happening here. We give the content of the file as a parameter to this ARM template and in the variables section we transform it to an object. The ARM template needs a bit more information to actually create the role definition; these are things we would normally enter in the GUI, so they are defined in a separate object, and by using the union function we merge the two objects together.
Note that the template is deployed at subscription level; this is because we also need to create a resource group to store our user-assigned managed identity. After deploying this resource group we make use of a nested ARM template. Inside it we create the managed identity, and we define outputs to return the id and resource id of this identity because we are going to need them later.
Next in the template we actually create the role definition:
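Roughly, assuming the merged variable from the previous fragment:

```json
{
  "type": "Microsoft.Authorization/roleDefinitions",
  "apiVersion": "2018-07-01",
  "name": "[guid(subscription().id, 'vm-tag-remediation-role')]",
  "properties": "[variables('roleProperties')]"
}
```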
Here you see that, where the documentation of this resource lists several different attributes for the properties object, we only enter a single variable. That is because this variable is an object containing all those different attributes. It is the variable we built before, and it can be used just like this.
After this we assign the Reader and Tag Contributor roles to the managed identity we created. These are straightforward assignments, so they don't need any further explanation.
At the end of the template some outputs are defined which will be used in the next template.
The next template will deploy the actual policy definition. Just like with the other template, we start with some manipulation of the parameters and variables:
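A sketch of those manipulations (the attribute that gets replaced, the parameter names and the output name roleDefinitionId are assumptions):

```json
"parameters": {
  "policyJson": { "type": "string" },
  "scriptJson": { "type": "string" },
  "midOutput": { "type": "object" },
  "location": { "type": "string", "defaultValue": "westeurope" }
},
"variables": {
  "scriptUrl": "\"primaryScriptUri\": \"https://example.com/settagstovms.ps1\"",
  "scriptContent": "[concat('\"scriptContent\": ', parameters('scriptJson'))]",
  "policyWithScript": "[replace(parameters('policyJson'), variables('scriptUrl'), variables('scriptContent'))]",
  "policyWithRole": "[replace(variables('policyWithScript'), 'YOUR ROLE ID', parameters('midOutput').roleDefinitionId)]",
  "policyDefinitionProperties": "[json(variables('policyWithRole'))]"
}
```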
For this template we import two things. First we import the policy definition JSON we created in the previous blog post. Second, we also import the script that sets the tags on the VMs. In the previous blog post we simply retrieved this script from an online repository; if we are worried about code suddenly changing in production, we want to embed it in our policy definition instead.
So you see that we define two variables, being the scriptUrl and the scriptContent. The scriptContent variable is a concat of the script and the attribute used in the ARM template. The script needs to be in JSON representation on a single line; we will look at how to achieve this later in this blog.
You also see that the "YOUR ROLE ID" placeholder is replaced by the id of the role we created in the previous ARM template. We take the midOutput value, which is an object because we converted the output from the other ARM template back from JSON. These replacements are performed on the JSON of the policy definition; after this is done it is transformed into an object and stored in a variable.
After that the policy definition is created and assigned by the ARM template:
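In sketch form (names, API versions and the assignment parameter are assumptions):

```json
{
  "type": "Microsoft.Authorization/policyDefinitions",
  "apiVersion": "2020-09-01",
  "name": "tag-vms-behind-loadbalancer",
  "properties": "[variables('policyDefinitionProperties')]"
},
{
  "type": "Microsoft.Authorization/policyAssignments",
  "apiVersion": "2020-09-01",
  "name": "tag-lb-vms",
  "location": "[parameters('location')]",
  "identity": {
    "type": "SystemAssigned"
  },
  "dependsOn": [
    "[subscriptionResourceId('Microsoft.Authorization/policyDefinitions', 'tag-vms-behind-loadbalancer')]"
  ],
  "properties": {
    "policyDefinitionId": "[subscriptionResourceId('Microsoft.Authorization/policyDefinitions', 'tag-vms-behind-loadbalancer')]",
    "parameters": {
      "managedIdentityId": {
        "value": "[parameters('midOutput').managedIdentityResourceId]"
      }
    }
  }
}
```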
In the first resource you see the same approach as with the role definition. Every attribute is already in the variable, so we only need to define the properties with this single variable.
In the second resource you see we create the assignment: the parameters are defined and we link it to the previously created policy definition.
After that the last step is to assign the custom role we created to the managed identity which is created with the policy assignment. It is added in this ARM template, and by using the dependsOn attribute we can configure it to wait until after creation to assign the roles. In practice I've seen it happen that the policy was assigned and the identity created, but Azure Active Directory was lagging behind, and therefore assigning the roles errored out. This can be solved by introducing a delay; personally I normally use a deployment script with a Start-Sleep. When I deploy this ARM template to my environment it doesn't error out, but like I said, I've seen it happen, so if it happens to you consider adding a delay.
To start the ARM templates and make sure all data is retrieved, the following PowerShell script is started by the YAML pipeline in an Azure PowerShell task:
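A sketch of such a script (the file names, parameter names and output handling are assumptions; the single-line JSON conversion of the script is shown here with a plain ConvertTo-Json as a stand-in for the helper function from the repository):

```powershell
# Gather the files from the artifact
$roleJson   = Get-Content -Path "$PSScriptRoot\customrole.json" -Raw
$policyJson = Get-Content -Path "$PSScriptRoot\policydefinition.json" -Raw

# The remediation script has to end up as a single-line json string inside the policy
$scriptJson = Get-Content -Path "$PSScriptRoot\settagstovms.ps1" -Raw | ConvertTo-Json

# First template: resource group, managed identity and custom role
$midDeployment = New-AzSubscriptionDeployment -Name "deploy-mid" -Location "westeurope" `
    -TemplateFile "$PSScriptRoot\deploy_mid.json" -roleDefinitionJson $roleJson

# Convert the outputs of the first deployment to a plain hashtable and pass it on
$midOutput = @{}
foreach ($key in $midDeployment.Outputs.Keys) {
    $midOutput[$key] = $midDeployment.Outputs[$key].Value
}

New-AzSubscriptionDeployment -Name "deploy-policy" -Location "westeurope" `
    -TemplateFile "$PSScriptRoot\deploy_policy.json" `
    -policyJson $policyJson -scriptJson $scriptJson -midOutput $midOutput
```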
As you see here, we retrieve the role definition, policy definition and script. The output from the first ARM template is also passed on to the next template.
If you are using the New-AzSubscriptionDeployment PowerShell command it will have dynamic parameters based on the ones defined in the ARM template. This way you can add parameters like -midOutput to it even though they aren't shown in the documentation.
The last step of the pipeline is to validate that the solution we deployed is working. Especially with policies that do auto-remediation, it could lead to big problems if they misbehave. So to validate this there is one more stage in the pipeline. In YAML it's defined like this:
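A compact sketch (script and report file names are assumptions):

```yaml
- stage: IntegrationTest
  jobs:
    - job: IntegrationTests
      steps:
        - checkout: none
        - task: DownloadBuildArtifacts@0
          inputs:
            buildType: 'current'
            artifactName: '$(artifactName)'
            downloadPath: '$(System.ArtifactsDirectory)'
        - task: AzurePowerShell@5
          displayName: 'Run integration tests'
          inputs:
            azureSubscription: '$(serviceConnection)'
            ScriptType: 'FilePath'
            ScriptPath: '$(System.ArtifactsDirectory)\$(artifactName)\tests\Invoke-IntegrationTests.ps1'
            azurePowerShellVersion: 'LatestVersion'
        - task: PublishTestResults@2
          inputs:
            testResultsFormat: 'NUnit'
            testResultsFiles: '**/testresults_integration.xml'
        - task: AzurePowerShell@5
          displayName: 'Clean up test resources'
          inputs:
            azureSubscription: '$(serviceConnection)'
            ScriptType: 'FilePath'
            ScriptPath: '$(System.ArtifactsDirectory)\$(artifactName)\tests\Remove-TestResources.ps1'
            azurePowerShellVersion: 'LatestVersion'
```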
The first part is the same as in the other stages. After that an Azure PowerShell script is called to perform the tests. This script is almost the same as the script calling the unit tests, so let's look at the actual Pester test which is run:
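A sketch of the idea (the resource names, tag name and template file are assumptions):

```powershell
Describe "Tag policy integration test" {
    BeforeAll {
        # Deploys a resource group with a load balancer and a VM in its backend pool
        New-AzSubscriptionDeployment -Name "integrationtest" -Location "westeurope" `
            -TemplateFile "$PSScriptRoot\resources\deploy_testvm.json"
    }

    It "applies the tag to the VM behind the load balancer" {
        $timeout  = (Get-Date).AddMinutes(10)
        $tagFound = $false
        while ((Get-Date) -lt $timeout -and -not $tagFound) {
            $vm = Get-AzVM -ResourceGroupName "rg-policy-integrationtest" -Name "testvm" -ErrorAction SilentlyContinue
            if ($vm -and $vm.Tags.Keys -contains "LoadBalancer") {
                $tagFound = $true
            }
            else {
                Start-Sleep -Seconds 10
            }
        }
        $tagFound | Should -Be $true
    }
}
```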
You see here that in the test we deploy an ARM template. This ARM template will create a resource group with a load balancer and a virtual machine in its backend pool. While this is being created we check every 10 seconds if we can find the VM and if it has the tag applied to it. After 10 minutes we stop checking and the test is considered failed. You can tweak this value for yourself, but mostly when I was doing this I saw a result within 5 minutes. This is also what we would expect, as we set the evaluationDelay property in the policy to AfterProvisioning, so the policy should kick in right after the VM is created. The delay is mostly because an Azure Container Instance has to be started to run the deployment script.
We could add an AfterAll block to the Pester test to clean up the resources of this test. But if you want to have multiple integration tests it might be easier to have one script at the end to clean everything up, so that's what I've done here too. After the test results are published, the final step of the pipeline is cleaning up the resources. This is done by a very simple script:
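A sketch (the resource group name is an assumption):

```powershell
# Remove the resource group that holds all integration test resources
Remove-AzResourceGroup -Name "rg-policy-integrationtest" -Force
```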
If we make sure that all resources used for integration testing are in a single resource group, this cleanup step can be done by just removing that resource group.
The pipeline in Azure DevOps
Now that all the code is written it's time to put this pipeline in Azure DevOps. This can easily be done by creating a pipeline from the YAML file. After you run it you will see the different stages visualized: the stages are named and they show a status symbol.
You will notice it shows errors and one of the stages is orange. This is because not all of our tests will actually be successful when running this exact example. If we look at the test results we can see why.
We see that two of the PSScriptAnalyzer tests failed. These tests are tagged as Warning and Information, and because we only said that Error tags should fail the pipeline, it continued just like we asked it to. In a later revision of this product we could fix these findings, but they don't have to halt our deployment now.
One of the big advantages of using multi-stage pipelines is that during development you can run just the first two stages. When starting the pipeline it's possible to define which stages to skip. This way you can have your unit tests run but prevent the deployment. If you are using a testing environment you could even deploy to your test environment and run the integration tests, but then skip the deployment to the production environment.
Conclusion
This blog post has become longer than I originally intended, and I still had to leave out things I would have loved to include. When creating a pipeline to deploy things automatically it's very easy to go overboard and spend a lot of time on it. So my last piece of advice is to always check whether something will actually improve the value of your product before implementing it. It's also important to keep an eye on your dependencies. If you are retrieving dependencies (like scripts, modules or other kinds of resources) from remote repositories you don't manage, it is recommended to have your own artifact feed where you store these dependencies. By specifying the version you want in your pipelines you know for sure you will always get the same thing. When a new release of one of your dependencies is published you can test your solution with this new version; if it works you can include it and push it to your production environment. This way you have a lot more control.