Automatic rollback for Azure deployments with pipelines
Recently I gave a talk about several tips and tricks for your Azure DevOps pipelines. One of the tricks in this talk was to implement an automatic rollback for your Azure deployments. In this blog post I will explain how to automatically roll back to the last known good configuration once an Azure deployment fails.
In this blog I will be using ARM templates for the deployment, but this can just as easily be done with Bicep or Terraform.
The files used in the talk can be found on my GitHub page. In this blog we will only look at folder 4 in the repository. The pipeline in that folder deploys an Azure VM and, if the deployment fails, rolls back to an earlier version.
The start of the pipeline
Let’s look at the first part of the pipeline:
trigger: none

parameters:
  - name: rgName
    displayName: Name of resourcegroup?
    type: string
    default: "test-rg-pipeline"
  - name: useLastKnownGoodConfiguration
    displayName: Use last known stable configuration (if false the most recent commit will be used)?
    type: boolean
    default: false

variables:
  - group: "Deployment Variables"

pool:
  vmImage: ubuntu-latest
The trigger of the pipeline is set to none here; in a production environment you probably want to trigger this pipeline every time something changes in your IaC files.
After that, two parameters are defined. The resource group name is not that important; it was added to be able to run the pipeline multiple times at once against different resource groups to test quicker. The second parameter is the important one. This boolean parameter determines whether the pipeline should run with the currently selected commit or with the last known good configuration. Both parameters have default values, so the pipeline can be run without entering anything.
A variable group is defined. This variable group contains two variables:
LastKnownGoodConfiguration: holds the commit ID of the last known good deployment.
Password: holds the password used in the deployment of the Azure VM.
The variable group is a library linked to an Azure Key Vault.
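If you are setting this up yourself, the two secrets have to exist in the Key Vault before the first run. A minimal sketch of seeding them with the Azure CLI, where the vault name and both values are placeholders of my own and not part of the example repository:

# Seed the two secrets that the "Deployment Variables" group exposes.
# Replace the vault name and values with your own.
az keyvault secret set --vault-name <YOUR KEYVAULT NAME> --name LastKnownGoodConfiguration --value <INITIAL COMMIT ID>
az keyvault secret set --vault-name <YOUR KEYVAULT NAME> --name Password --value <VM ADMIN PASSWORD>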
The build stage
Next up is the first stage of the pipeline, which is called Build.
- stage: Build
  jobs:
  - job: Build
    displayName: Create Artifact
    steps:
    - checkout: self
      displayName: Clone the repository
      fetchDepth: 100
      persistCredentials: true
    - task: PowerShell@2
      displayName: Switch to last known good configuration
      inputs:
        targetType: 'inline'
        script: |
          git reset --hard $(LastKnownGoodConfiguration)
      condition: ${{ eq(parameters.useLastKnownGoodConfiguration, true) }}
    - task: PublishBuildArtifacts@1
      inputs:
        PathtoPublish: "$(Build.SourcesDirectory)"
        ArtifactName: "drop"
        publishLocation: "Container"
    - task: PowerShell@2
      name: StoreCommit
      displayName: Store commit
      inputs:
        targetType: 'inline'
        script: |
          Write-Host "##vso[task.setvariable variable=commit;isOutput=true]$(git log -n 1 --pretty=format:%H)"
In this stage a package is created which will be used in the rest of the pipeline. This is done because there can be a delay between when the pipeline is started and when it actually deploys something, and when doing multiple deployments there can even be time between those steps, especially when using approvals or manual interventions. In the meantime the code in the production branch of your source control could change. To make sure the pipeline only deploys what was approved for this run, it starts with a checkout to get all the files from the branch. It uses a fetchDepth of 100 here to ensure earlier commits are also retrieved.
The next step has a condition so it only runs when the parameter at the top is set to true. If that is the case it uses PowerShell to reset the HEAD of the branch to a different commit: the one stored in the Key Vault, which is available as a variable through the variable group.
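One thing to be aware of here: if the stored commit is older than the fetchDepth of 100, the reset will fail with a rather cryptic error. The guard below is my own addition, not part of the example repository; it is a variation of the same step that fails with a clearer message:

- task: PowerShell@2
  displayName: Switch to last known good configuration
  condition: ${{ eq(parameters.useLastKnownGoodConfiguration, true) }}
  inputs:
    targetType: 'inline'
    script: |
      # sketch: verify the stored commit exists in the shallow clone before resetting
      git cat-file -e $(LastKnownGoodConfiguration)
      if ($LASTEXITCODE -ne 0) {
        throw "Commit $(LastKnownGoodConfiguration) is not within the fetched history (fetchDepth 100)"
      }
      git reset --hard $(LastKnownGoodConfiguration)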
In the third step a package of the files is created. When the useLastKnownGoodConfiguration parameter is set to false this is simply the branch/commit selected when running the pipeline; if the parameter is true the package contains the files of the last known good configuration.
In the last step a git command retrieves the commit ID of the currently checked-out HEAD. For now this value is stored in an output variable, so it can be written to the Key Vault later if the deployment succeeds.
Deploying the resources
Next up in the pipeline it’s time to deploy the resources.
- stage: Deploy
  displayName: Deploy
  jobs:
  - deployment: Deploy
    displayName: Deploy resources
    environment: Test
    variables:
    - name: commit
      value: $[ stageDependencies.Build.Build.outputs['StoreCommit.commit'] ]
    strategy:
      runOnce:
        preDeploy:
          steps:
          - download: current
            artifact: drop
          - task: AzureResourceManagerTemplateDeployment@3
            displayName: Deploy Resource Group
            inputs:
              deploymentScope: 'Subscription'
              azureResourceManagerConnection: '<YOUR SERVICE CONNECTION>'
              subscriptionId: '<YOUR SUBSCRIPTIONID>'
              location: 'West Europe'
              templateLocation: 'Linked artifact'
              csmFile: '$(Pipeline.Workspace)/drop/CreateRG/template.json'
              csmParametersFile: '$(Pipeline.Workspace)/drop/CreateRG/parameters.json'
              overrideParameters: '-rgName ${{ parameters.rgName }}'
              deploymentMode: 'Incremental'
        deploy:
          steps:
          - task: AzureResourceManagerTemplateDeployment@3
            inputs:
              deploymentScope: 'Resource Group'
              azureResourceManagerConnection: '<YOUR SERVICE CONNECTION>'
              subscriptionId: '<YOUR SUBSCRIPTIONID>'
              action: 'Create Or Update Resource Group'
              resourceGroupName: '${{ parameters.rgName }}'
              location: 'West Europe'
              templateLocation: 'Linked artifact'
              csmFile: '$(Pipeline.Workspace)/drop/CreateVM/template.json'
              csmParametersFile: '$(Pipeline.Workspace)/drop/CreateVM/parameters.json'
              overrideParameters: '-adminPassword $(Password)'
              deploymentMode: 'Incremental'
In this pipeline we make use of deployment jobs to get better control over the deployment steps. To illustrate the use of the preDeploy and deploy phases in this job, the deployment is split into two steps: first a resource group is created, and afterwards a VM is created in that resource group.
You’ll notice that in the preDeploy phase a step is added to download the artifact, while this isn’t done in the deploy phase. That is because the deploy phase automatically downloads the artifacts associated with the pipeline, so it doesn’t need to be added there.
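If you ever want to opt out of that automatic download in a phase, a download: none step switches it off. A minimal sketch (the echo step is just a placeholder):

        deploy:
          steps:
          # explicitly opt out of the automatic artifact download for this phase
          - download: none
          - script: echo "deployment steps that do not need the artifact go here"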
In this example the tasks for ARM templates are used to deploy the files, but these could be changed to Bicep or Terraform steps without changing the rest of the functionality.
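For example, the VM deployment in the deploy phase could be replaced by a Bicep deployment through the Azure CLI task. This is a minimal sketch under the assumption that a Bicep file exists at CreateVM/main.bicep in the artifact (that file name is hypothetical, not part of the example repository):

        deploy:
          steps:
          - task: AzureCLI@2
            displayName: Deploy VM from Bicep
            inputs:
              azureSubscription: '<YOUR SERVICE CONNECTION>'
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              # az deployment group create understands .bicep files directly
              inlineScript: >
                az deployment group create
                --resource-group '${{ parameters.rgName }}'
                --template-file '$(Pipeline.Workspace)/drop/CreateVM/main.bicep'
                --parameters adminPassword='$(Password)'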
On success or failure
The interesting part is what follows now. The deployment job has a special section called on, which distinguishes two situations: success and failure. It is shown in the last part of the pipeline.
        on:
          success:
            steps:
            - task: AzureCLI@2
              inputs:
                azureSubscription: '<YOUR SERVICE CONNECTION>'
                scriptType: 'bash'
                scriptLocation: 'inlineScript'
                inlineScript: 'az keyvault secret set --vault-name ''<YOUR KEYVAULT NAME>'' --name ''LastKnownGoodConfiguration'' --value ''$(commit)'''
          failure:
            steps:
            - task: TriggerBuild@4
              inputs:
                definitionIsInCurrentTeamProject: true
                buildDefinition: '<YOUR PIPELINEID>'
                queueBuildForUserThatTriggeredBuild: false
                ignoreSslCertificateErrors: false
                useSameSourceVersion: false
                useCustomSourceVersion: false
                useSameBranch: true
                waitForQueuedBuildsToFinish: false
                storeInEnvironmentVariable: false
                templateParameters: 'useLastKnownGoodConfiguration: true'
                authenticationMethod: 'OAuth Token'
                enableBuildInQueueCondition: false
                dependentOnSuccessfulBuildCondition: false
                dependentOnFailedBuildCondition: false
                checkbuildsoncurrentbranch: false
                failTaskIfConditionsAreNotFulfilled: false
Let’s start by looking at the on.success part. Here the commit ID we stored earlier is written to the Azure Key Vault, so after the whole deployment is complete this commit ID becomes the new last known good configuration.
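One thing to check here (not covered in the repository, so treat the exact command as an assumption): the service principal behind the service connection needs permission to write secrets to the vault. If the vault uses access policies, granting that could look roughly like this:

# grant the service connection's service principal read/write access to secrets
# (vault name and client ID are placeholders)
az keyvault set-policy --name <YOUR KEYVAULT NAME> --spn <SERVICE PRINCIPAL CLIENT ID> --secret-permissions get list set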
In the on.failure part I make use of the Trigger Build task from the Azure DevOps marketplace. This could also be done with the REST API if you can’t or don’t want to use the marketplace extension (see the sketch below). The task queues a new run of the pipeline, but this time with the parameter set to true. So if for some reason your pipeline fails, a new run will show up automatically.
You’ll notice the new run is initiated by a different account. This could be changed, but personally I prefer it this way because it makes it more visible that this was an automated rollback run.
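For reference, here is a minimal sketch of the REST API variant: an inline PowerShell step that could replace the Trigger Build task in the on.failure phase. It assumes the job’s access token is allowed to queue builds; the pipeline ID stays a placeholder:

            - task: PowerShell@2
              displayName: Queue rollback run via REST API
              inputs:
                targetType: 'inline'
                script: |
                  # queue a new run of this pipeline with useLastKnownGoodConfiguration set to true
                  $body = @{ templateParameters = @{ useLastKnownGoodConfiguration = 'true' } } | ConvertTo-Json
                  $url = "$(System.CollectionUri)$(System.TeamProject)/_apis/pipelines/<YOUR PIPELINEID>/runs?api-version=7.1-preview.1"
                  Invoke-RestMethod -Uri $url -Method Post -Body $body -ContentType 'application/json' -Headers @{ Authorization = "Bearer $(System.AccessToken)" }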
Possible improvements
This pipeline was written purely to demonstrate the possibilities so there is a lot of room for improvement. Here are some things to keep in mind when writing this for yourself.
If an earlier version of the code is deployed, the last known good configuration will be downgraded to that version too. This might be the expected behavior, but you do need to keep it in mind.
You probably want to turn this pipeline into a template (or multiple templates) which can be used in your actual deployment pipelines.
You probably want to deploy the resources in Complete mode instead of Incremental mode. As it is now, if new resources are added and something fails, those new resources will persist after the rollback.
A failsafe should be included in case the deployment of the last known good configuration also fails, because right now that would keep triggering new rollback runs indefinitely (see the sketch after this list).
It would be useful to add something to the name of the run when a rollback is triggered, so this is instantly clear in the pipeline run overview.
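For that failsafe, one simple option (a sketch of my own, not what is in the example repository) is to only include the trigger task when the run is not itself already a rollback, using a compile-time expression on the useLastKnownGoodConfiguration parameter. The other Trigger Build inputs are omitted here for brevity:

failure:
  steps:
  # only queue a rollback when this run is not itself already a rollback,
  # so a failing last known good configuration cannot loop forever
  - ${{ if eq(parameters.useLastKnownGoodConfiguration, false) }}:
    - task: TriggerBuild@4
      inputs:
        definitionIsInCurrentTeamProject: true
        buildDefinition: '<YOUR PIPELINEID>'
        templateParameters: 'useLastKnownGoodConfiguration: true'
        authenticationMethod: 'OAuth Token'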
Conclusion
With this technique you can create an automatic rollback scenario without too much extra work. Do keep in mind that this could still break your environment, so it would be wise to add some approvals or extra tests somewhere. But with this as a basis, I wish you good luck implementing these rollbacks!