Aidan Finn, IT Pro, 28 August 2024
Azure Route Server Saves The Day

 

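This article describes a case in which Azure Route Server branch-to-branch routing was used to help a client through a network outage. The client is a large organisation with a global footprint. Its network design uses Meraki SD-WAN to connect its global locations, with a leased line to another cloud vendor's region. When a digger accidentally tore up the leased line between the client's data centre and the other cloud region, Azure Route Server was used to establish branch-to-branch connectivity and keep the client's services running.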

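🤔 In this case, the client's network has three parts: the client's data centre, the other cloud vendor's region, and an Azure region. The data centre connects to the global locations over Meraki SD-WAN and to the other cloud region over a leased line, while the Azure region connects to the other cloud region via ExpressRoute and to the SD-WAN via a Meraki NVA in the Azure hub.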

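🚧 When a digger accidentally tore up the leased line between the client's data centre and the other cloud region, that connection went down, and application integrations and services between the data centre and the other cloud became unreachable.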

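💡 To solve the problem, the author used the branch-to-branch routing feature of Azure Route Server to turn the Azure region into a temporary transit point between the client's data centre, the other cloud region, and the global locations. Thanks to the peering between the Meraki NVA and Azure Route Server, the data centre and global locations can route traffic to Azure, and Azure reaches the other cloud region over ExpressRoute.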

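🚀 With this approach, connectivity between the client's data centre, the other cloud region, and the global locations was restored, and the client's services ran normally. User experience also improved, because the Azure region sits very close to the other cloud region and branch traffic no longer backhauls through the data centre.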

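💡 The case shows how useful Azure Route Server can be during a network failure: it can act as a temporary transit router, helping a client restore connectivity quickly and, in this case, improving user experience.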

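🤔 The author also suggests that the client consider adding the other cloud region to the Meraki SD-WAN to improve user experience further. In addition, to ensure the best path is chosen, the routing preference of Azure Route Server can be configured, for example to prefer ExpressRoute or VPN, or to select routes by AS path.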

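🚀 This triangle of connectivity gives the client higher network availability and recoverability: if any one path fails, the other two can still keep the network connected.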

In this post, I will discuss a recent scenario where we used Azure Route Server branch-to-branch routing to rescue a client.

The Original Network Design

This client is a large organisation with a global footprint. They had a previous WAN design that was out of scope for our engagement. The heart of the design was Meraki SD-WAN, connecting their global locations. I like Meraki – it’s relatively simple and it just works – that’s coming from me, an Azure networking person with little on-premises networking experience.

The client started using the services of a cloud provider (not Microsoft). The client followed the guidance of the vendor and deployed a leased line connection to a cloud region that was close to their headquarters and to their own main data centre. The leased line provides low latency connectivity between applications hosted on-premises and applications/data hosted in the other cloud.

Adding Azure

The customer wanted to start using Azure for general compute/data tasks. My employer was engaged to build the original footprint and to get them started on their journey.

I led the platform build-out, delegating most of the hands-on and focusing on the design. We did some research and determined the best approach to integrate with the other cloud vendor was via ExpressRoute. The Azure footprint was placed in an Azure region very close to the other vendor’s region.

An ExpressRoute circuit was deployed between the other cloud and a VNet-based hub in Azure. A VNet-based hub is always my preference because of the scalability, the security/governance concepts, and its superiority over a Virtual WAN hub when it comes to flexibility and troubleshooting. The Meraki solution from the Azure Marketplace was added to the hub to connect Azure to the SD-WAN, and BGP propagation with Azure was enabled using Azure Route Server. To be honest, that was relatively simple.

The customer now had two clouds.

Along Came a Digger

My day-to-day involvement with the client had ended months previously. I got a message early one morning from a colleague: the client was having a serious networking issue and could I get online? The issue was that an excavator/digger had torn up the lines that provided connectivity between the client’s data centre and the other cloud.

Critical services in the other cloud were unavailable.

I thought about it for a short while and checked out my theory online. One of the roles of Azure Route Server is to enable branch-to-branch connectivity between “on-premises” locations that are connected via ExpressRoute/VPN.

Forget that the other cloud is a cloud. Think of the other cloud’s region as an on-premises site that is connected via ExpressRoute and Microsoft’s branch-to-branch diagram makes sense: we can interconnect the two locations via BGP propagation through Azure Route Server.

I presented the idea to the client. They processed the information quickly and the plan was implemented quickly. How quickly? It’s one setting in Azure Route Server!

The Solution

The workaround was to use Azure as a temporary route to the other cloud. The client had routes from their data centre and global offices to Azure via the Meraki SD-WAN. BGP routes were propagating between the SD-WAN connected locations and Azure, thanks to the peering between the Meraki NVA in the Azure hub and Azure Route Server.

BGP routes were also propagating between the other cloud and Azure thanks to ExpressRoute.

The BGP routes that did exist between the SD-WAN and the other cloud were gone because the leased line was down – and was going to be down for some time.

We wanted to fill the gap – get routes from the other cloud and the SD-WAN to propagate through Azure. If we did that then the SD-WAN locations and the other cloud could route via the Meraki and the ExpressRoute gateway in the Azure Hub – Azure would become the gateway between the SD-WAN and the other cloud.

The solution was very simple: enable branch-to-branch connectivity in Azure Route Server. There’s a little wait when you do that and then you run a command to check the routes that are being advertised to the Route Server peer (the Meraki NVA in this case).
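The effect of that one setting can be sketched with a toy model. This is not the Azure API: `ToyRouteServer`, the peer names, and every prefix below are hypothetical, and the model only captures the idea that routes learned from one peer are re-advertised to the other peer once branch-to-branch is switched on.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRouteServer:
    """Toy stand-in for a route server; all names and prefixes are hypothetical."""
    vnet_prefixes: set
    branch_to_branch: bool = False
    learned: dict = field(default_factory=dict)  # peer name -> prefixes learned from it

    def learn(self, peer, prefixes):
        self.learned[peer] = set(prefixes)

    def advertised_to(self, peer):
        """Prefixes this toy model would advertise to `peer`."""
        routes = set(self.vnet_prefixes)  # the hub's own address space is always advertised
        if self.branch_to_branch:
            # With the setting on, routes learned from the other peers are passed along too.
            for other, prefixes in self.learned.items():
                if other != peer:
                    routes |= prefixes
        return routes


rs = ToyRouteServer(vnet_prefixes={"10.0.0.0/16"})
rs.learn("meraki-nva", {"10.10.0.0/16", "10.20.0.0/16"})  # SD-WAN sites
rs.learn("expressroute-gateway", {"172.16.0.0/16"})       # the other cloud

before = rs.advertised_to("meraki-nva")
rs.branch_to_branch = True
after = rs.advertised_to("meraki-nva")

# The only new route offered to the Meraki NVA is the other cloud's prefix.
print(sorted(after - before))
```

In this sketch the Meraki NVA sees only the hub prefix until the flag is flipped, which mirrors the gap described above: both sides could reach Azure, but neither could see the other through it.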

The result was near instant. Routes were advertised. We checked Azure Monitor metrics on the ExpressRoute circuit and could see a spike in traffic that coincided with the change. The plan had worked.

The Results

I had not heard anything in a while. This morning I heard that the client was happy with the fix. In fact, user experience was faster.

Go back to the original diagram before Azure and I can explain. Users are located in the branch offices around the world. Their client applications are connecting to services/data in the other cloud. Their route is a “backhaul”:

    SD-WAN to central data centre
    Leased line over long distance to the other cloud

When we introduced the “Azure bypass” after the leased line failure, a new route appeared for end users:

    SD-WAN to Azure
    A very short distance hop over ExpressRoute

Latency was reduced quite a bit, so user experience improved. On the other hand, latency between the on-premises data centre and the other cloud has increased because that traffic now takes extra hops through the SD-WAN and Azure, but at least the path is available. The original leased line is still down after a few weeks, which is not the fault of the client!
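The trade-off can be made concrete by summing per-hop latencies along each path. The figures in this sketch are invented purely for illustration; no measurements were published for this engagement, and the hop names are assumptions.

```python
# Hypothetical one-way latencies in milliseconds. These numbers are invented
# for illustration only; they are not measurements from the client's network.
HOP_MS = {
    ("branch", "data centre"): 60,       # SD-WAN backhaul to the central data centre
    ("data centre", "other cloud"): 15,  # leased line
    ("branch", "azure"): 60,             # SD-WAN to the Meraki NVA in the Azure hub
    ("data centre", "azure"): 20,        # SD-WAN from the data centre to Azure
    ("azure", "other cloud"): 2,         # short ExpressRoute hop
}


def path_latency(path):
    """Sum the latency of each consecutive hop along `path`."""
    return sum(HOP_MS[(a, b)] for a, b in zip(path, path[1:]))


backhaul = path_latency(["branch", "data centre", "other cloud"])
bypass = path_latency(["branch", "azure", "other cloud"])
dc_direct = path_latency(["data centre", "other cloud"])
dc_via_azure = path_latency(["data centre", "azure", "other cloud"])

print(f"branch users: backhaul {backhaul} ms, via Azure {bypass} ms")
print(f"data centre:  direct {dc_direct} ms, via Azure {dc_via_azure} ms")
```

With these made-up numbers the branch users gain and the data centre loses, which is the shape of the result described above. The actual sizes depend entirely on the real distances involved.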

Some Considerations

Ideally one would have two leased lines in place for failover. That incurs costs and it was not possible. What about Azure ExpressRoute Metro? That is still in preview at this time and is not available in the Azure metro in question.

However, this workaround has offered a triangle of connectivity. When the leased line is repaired, I will recommend that the triangle becomes their failover: if any one path fails, the other two will take its place, bringing the automatic recoverability that was part of the concept of the original ARPANET.
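The triangle claim is easy to check with a small graph sketch. The three link names and site names below are simplified labels for illustration, and the model ignores routing convergence time; it only asks whether every site can still reach every other site after any single link is removed.

```python
from collections import deque

# Simplified labels for the three paths in the triangle (illustrative only).
LINKS = {
    "leased line": ("data centre", "other cloud"),
    "SD-WAN (Meraki NVA)": ("data centre", "azure"),
    "ExpressRoute": ("azure", "other cloud"),
}


def reachable(links, start, goal):
    """Breadth-first search over the undirected links."""
    adjacency = {}
    for a, b in links.values():
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


def survives_single_failure(links):
    """True if every pair of sites stays connected after any one link fails."""
    sites = {site for pair in links.values() for site in pair}
    for failed in links:
        remaining = {name: pair for name, pair in links.items() if name != failed}
        for a in sites:
            for b in sites:
                if a != b and not reachable(remaining, a, b):
                    return False
    return True


print(survives_single_failure(LINKS))  # the full triangle

# Without ExpressRoute the design is a chain, and one failure isolates a site.
chain = {name: pair for name, pair in LINKS.items() if name != "ExpressRoute"}
print(survives_single_failure(chain))
```

The full triangle tolerates the loss of any one link, while the two-link chain does not, which is the argument for keeping branch-to-branch enabled after the repair.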

The other change is that the other cloud should become another site in the Meraki SD-WAN to improve the user app experience.

If we do keep branch-to-branch connectivity then we need to consider “what is the best path”? For example, we want the data centre to route directly to the other cloud when the leased line is available because that offers the lowest latency. But what if a route via Azure is accidentally preferred? We need control.
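One common lever for that kind of control is AS path length. The sketch below is a deliberately reduced model: real BGP implementations compare other attributes (such as weight and local preference) before AS path length, and the ASNs, prefix, and path lengths here are contrived to show an accidental preference and its fix through prepending.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    via: str
    as_path: tuple  # sequence of ASNs; all values here are hypothetical private ASNs


def best_route(candidates):
    """Reduced best-path rule: the shortest AS path wins (other BGP attributes ignored)."""
    return min(candidates, key=lambda route: len(route.as_path))


def prepend(route, asn, times):
    """Return a copy of `route` with `asn` prepended `times` times to lengthen its path."""
    return Route(route.prefix, route.via, (asn,) * times + route.as_path)


# Contrived inputs: the route via Azure happens to have the shorter AS path.
direct = Route("172.16.0.0/16", "leased line", (64610, 64600))
via_azure = Route("172.16.0.0/16", "azure", (64600,))

print(best_route([direct, via_azure]).via)  # the Azure path is accidentally preferred

# Lengthening the Azure path makes the leased line win while it is available.
via_azure_prepended = prepend(via_azure, 64620, 2)
print(best_route([direct, via_azure_prepended]).via)

# If the leased line fails, the prepended Azure route is still there as a fallback.
print(best_route([via_azure_prepended]).via)
```

The point of the sketch is the design choice rather than the mechanics: the backup path stays advertised at all times, but is made less attractive so it is only used when the preferred path disappears.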

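In Azure Route Server, we have the option to control connectivity from the Azure perspective (my focus) with the routing preference setting, which can prefer ExpressRoute, prefer VPN, or select routes by AS path.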

The post Azure Route Server Saves The Day first appeared on Aidan Finn, IT Pro.
