
Use Traffic Analytics to spot common Azure network mistakes

It’s that time of year again where we look at our Azure environments and think to ourselves “Is there something we can do to manage this better?”. The Azure Spring Clean was started to help you do exactly that. This time my focus will be on Azure networking. I often see environments with all kinds of complex Azure networking that have monitoring in place to test whether connections are working. But plenty of times I’ve been asked to help improve these environments, only to spot plenty of configuration errors which weren’t caught by the monitoring because there was no impact on availability. In this blog post I will explain how you can use some features of the Azure Network Watcher to spot problems in your Azure networking very easily.

Azure networking can become complex very quickly. Therefore it’s important to have tools to analyze these networks in a way where you can drill down into the details when needed, but also keep a high-level overview to spot macro problems. There are multiple third-party solutions available for this, but personally I find the built-in Traffic Analytics tool very nice to use.

Let’s say there is a network like this:

There is a central Virtual Network which contains two webhosts. These webhosts are exposed to the internet via an Azure Application Gateway. There is an Azure Key Vault which stores some secrets. This Key Vault is connected to the Virtual Network via an Azure Private Link.
Besides that there is a separate Virtual Network which contains a management machine.

A simple web application could have a network like this. There might be more webhosts, or other resources like databases might be involved. It’s possible the virtual machines are replaced by an AKS cluster, but there would still be networking between these different elements.

One thing that can be done very easily with Traffic Analytics is visualizing the network flows between different elements of the network. This will give an overview like this:

In the picture above you can see the different subnets in our network and how they interact. This brings us to the first important part of this visualization. It’s based on traffic between subnets.

When working with Azure networks it’s easy to create one big subnet where all resources sit together. But resources that share one subnet are harder to monitor and control. So my advice would be to separate resources as much as possible into different subnets. This gives you much more control over which data flows are allowed between the different resources, helping you work towards a zero-trust model.
In the Network Watcher it’s possible to create a topology view of your network, so you can see which subnet is part of which virtual network and which resources are in which subnet.
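If you prefer scripting over clicking through the portal, the same subnet layout can be pulled with the Azure SDK for Python. Below is a minimal sketch (assuming the azure-identity and azure-mgmt-network packages; the subscription ID is a placeholder) that lists every subnet per virtual network and whether an NSG is attached to it:

```python
# Minimal sketch: list every subnet per virtual network and check whether an NSG is attached.
# Assumes the azure-identity and azure-mgmt-network packages; the subscription ID is a placeholder.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = "<your-subscription-id>"  # placeholder
client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

for vnet in client.virtual_networks.list_all():
    print(f"VNet: {vnet.name}")
    for subnet in vnet.subnets:
        nsg_name = (subnet.network_security_group.id.split("/")[-1]
                    if subnet.network_security_group else "NO NSG ATTACHED")
        print(f"  Subnet: {subnet.name:<25} NSG: {nsg_name}")
```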

A few things to note in this topology view. The first is the kvsubnet: this subnet does have a network interface, but no resources are visible in it. This is due to the private link being used; the private link makes sure the Key Vault resource receives its data via this network interface.
Another thing to note is the “clients” box between two of the network interfaces. This shows that these interfaces are in a load balancing group, because we are using an application gateway to load balance the traffic between them.
The last thing to note are the shield logos, which are Azure Network Security Groups (NSGs).

If you want to set up Azure Traffic Analytics, it has to get its data from somewhere. The place it gets it from are the NSGs.

When looking at an NSG in the Azure portal there is an option to see “flow logs”. Here you can create a configuration. Flow logs record all information about the traffic going in and out of this NSG. They are stored in blob storage as JSON files. These files are not easy to parse by hand, as they contain a lot of information in a format which is not really human readable.
You can also enable Traffic Analytics here. This will process the flow logs on a regular interval and enrich the data with information about other resources.
If you have a network like this, I would suggest enabling both the flow logs and Traffic Analytics for every NSG in the network. I would also strongly suggest making sure an NSG is placed on every subnet. Even if the NSG is completely transparent (allows all traffic), it will still log the traffic.
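To give an idea of why these files are awkward to read by hand, here is a minimal sketch that flattens a downloaded flow log file into readable flow records. The structure used here (records → properties.flows → flows → flowTuples) follows the version 2 flow log schema as I know it, so treat it as an assumption and check it against your own files:

```python
# Minimal sketch: flatten an NSG flow log (version 2) JSON file into readable flow records.
# The schema used here (records -> properties.flows -> flows -> flowTuples) is an assumption
# based on the documented v2 format; verify it against your own flow log files.
import json

with open("PT1H.json") as f:  # a flow log blob downloaded from the storage account
    data = json.load(f)

for record in data["records"]:
    for rule in record["properties"]["flows"]:
        for flow in rule["flows"]:
            for tuple_str in flow["flowTuples"]:
                # timestamp, src IP, dst IP, src port, dst port, protocol, direction, decision, ...
                _ts, src_ip, dst_ip, src_port, dst_port, proto, direction, decision = tuple_str.split(",")[:8]
                print(f"{rule['rule']:<45} {src_ip}:{src_port} -> {dst_ip}:{dst_port} "
                      f"{'TCP' if proto == 'T' else 'UDP'} {'allowed' if decision == 'A' else 'denied'}")
```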

After the flow logs for all NSGs are set up, you can go to the Traffic Analytics area of the Azure Network Watcher and look at the dashboard there.

When clicking on the “View subnets” button, the visualization mentioned earlier will be shown. But there is much more information available there; when scrolling down, many overviews will be presented.
When I’m analyzing a network I often start with that visualization. I look at the data flows and check whether they go in the directions I would expect, and whether there are any data flows I wouldn’t expect.
After that you can delve deeper into the flows. For example, I would expect my webhosts to only have connections with the application gateway. So I can look at the frequent conversations happening in my network and check whether there are any conversations between one of my webhosts and something else. When doing this in my lab I spot this:

Here it shows me there is a frequent conversation between my “server02” and an endpoint in the Netherlands. This means my server is connecting to something else. I can press the “See more” button to get even more info.

When pressing “See more” in this overview I’m brought to the Log Analytics Workspace, where the Kusto query that generated this report is already filled in for me. I can use that to look at the raw data.
This overview shows me that there are connections between different IP addresses. I could try to look up these IP addresses and see if I can find them among the public IPs from Azure, which might give me some information.
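If you want to work with this raw data outside the portal, you can run a similar Kusto query yourself from a script. Below is a minimal sketch, assuming the azure-identity and azure-monitor-query packages; the workspace ID is a placeholder and the table and column names (AzureNetworkAnalytics_CL, SrcIP_s, DestIP_s, DestPort_d) are the ones Traffic Analytics used in my workspace, so double-check them against the query the portal generates for you:

```python
# Minimal sketch: list the most frequent conversations from the Traffic Analytics data.
# Assumes azure-identity and azure-monitor-query; the workspace ID is a placeholder and the
# table/column names should be verified against the query the portal generates for you.
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

workspace_id = "<your-log-analytics-workspace-id>"  # placeholder
client = LogsQueryClient(DefaultAzureCredential())

query = """
AzureNetworkAnalytics_CL
| where SubType_s == "FlowLog"
| summarize Flows = count() by SrcIP_s, DestIP_s, DestPort_d
| top 20 by Flows desc
"""

response = client.query_workspace(workspace_id, query, timespan=timedelta(days=1))
for table in response.tables:
    for row in table.rows:
        print(row)
```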
If I already have an idea of what it could be, I can also use a trick where I use my NSGs for troubleshooting. I could link an NSG to my server (if it didn’t have one already) and create some outbound rules. I can use service tags to create transparent rules for the Azure services I suspect it might be trying to access. In this case I’ve created a rule for Key Vault, one for Azure AD, and a general rule for the Azure Cloud.
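For completeness, here is a hedged sketch of how those outbound rules could be created with the Azure SDK instead of the portal. The resource group, NSG name and priorities are placeholders; the service tags are the ones I used in my lab:

```python
# Minimal sketch: add transparent (allow) outbound rules per Azure service tag to an existing NSG,
# so the rule hits show which service the server is actually talking to.
# Assumes azure-identity and azure-mgmt-network; names and priorities are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient
from azure.mgmt.network.models import SecurityRule

subscription_id = "<your-subscription-id>"  # placeholder
resource_group = "rg-lab"                   # placeholder
nsg_name = "nsg-server02"                   # placeholder

client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

service_tags = {"AzureKeyVault": 200, "AzureActiveDirectory": 210, "AzureCloud": 220}
for tag, priority in service_tags.items():
    rule = SecurityRule(
        protocol="*",
        source_address_prefix="*",
        source_port_range="*",
        destination_address_prefix=tag,  # service tag as the destination
        destination_port_range="*",
        access="Allow",
        direction="Outbound",
        priority=priority,
        description=f"Transparent rule to log traffic towards {tag}",
    )
    client.security_rules.begin_create_or_update(
        resource_group, nsg_name, f"Allow-{tag}-Outbound", rule
    ).result()
```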

I can now go back to Traffic Analytics and use the NSG hits area. This will show me which NSG rules are being hit and how often. When I look at the data I will spot this rule hit in my lab:

So here we see the Key Vault rule which was created. If I look further I will also see a hit for the Azure Cloud rule, as the Key Vault service tag is a subset of the Azure Cloud service tag. So it seems that server02 is contacting the Key Vault via a public IP instead of via the private link.
To dig deeper I could use tools like the packet capture tool in the Azure Network Watcher. This will generate a pcap file which can be opened in tools like Wireshark.
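If you’d rather script this step than click through Wireshark, a minimal sketch with the scapy package (my own choice of tool, not part of Network Watcher) can list the DNS queries in the capture and the server they were sent to; the file name is a placeholder for your own capture:

```python
# Minimal sketch: print every DNS query in a Network Watcher packet capture and the
# server it was sent to. Assumes the scapy package; the file name is a placeholder.
from scapy.all import rdpcap
from scapy.layers.dns import DNS, DNSQR
from scapy.layers.inet import IP

packets = rdpcap("server02-capture.cap")  # capture file produced by Network Watcher

for pkt in packets:
    if pkt.haslayer(DNSQR) and pkt.haslayer(IP) and pkt[DNS].qr == 0:  # qr == 0 means it's a query
        qname = pkt[DNSQR].qname.decode(errors="replace")
        print(f"DNS query for {qname} sent to {pkt[IP].dst}")
```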

When looking at the capture I could, for example, filter on the DNS queries being done. If I do this in my lab I will spot that it does a query for the Key Vault, but this goes to the 8.8.8.8 Google DNS server instead of the Azure DNS server. When using a private endpoint it’s required to also have a private DNS zone, which can be resolved by the Azure DNS server. If this isn’t being used, the name will resolve to the public IP of the Key Vault. When the public endpoint is not closed off or restricted, this will just work without any error, and my secrets will be transferred over the internet instead of over my private connection. The data will still be encrypted, but it’s not what I want. If I look in my lab I will see that a misconfiguration happened on the network interface of server02:

So here the wrong DNS server was entered by accident, and that is how this problem occurred.
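A quick way to catch this kind of misconfiguration across the whole environment is to list every network interface that has custom DNS servers configured. Another minimal sketch with the Azure SDK, assuming azure-identity and azure-mgmt-network; no output means the interfaces simply inherit the virtual network or Azure-provided DNS settings:

```python
# Minimal sketch: report every network interface that has custom DNS servers configured,
# which is how the wrong 8.8.8.8 entry on server02 would show up.
# Assumes azure-identity and azure-mgmt-network; the subscription ID is a placeholder.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = "<your-subscription-id>"  # placeholder
client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

for nic in client.network_interfaces.list_all():
    custom_dns = nic.dns_settings.dns_servers if nic.dns_settings else []
    if custom_dns:
        print(f"{nic.name}: custom DNS servers {custom_dns}")
```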

So this shows how you can spot an issue like this by using Azure Traffic Analytics and the tools in the Azure Network Watcher in general. This is only one of the problems you can spot this way. I would strongly suggest considering implementing this in your environments to get more visibility into your network traffic. Of course there is some cost involved: you will need to pay for the storage of the log data. There are retention policies you can set up, and with these policies you can keep the costs under control. Personally I’d say this small investment is worth it, as it will speed up your troubleshooting significantly and it could help you spot problems before they are found in (security) audits.

Thanks again to the Azure Spring Clean for organizing the event and allowing me to share this information with you.