The move to open networking and next-generation software-defined networking has offered one of Italy’s leading banking institutions, Credito Valtellinese [Creval], significant technical and operational benefits. The organisation has upgraded its data centre infrastructure, with greater capacity for improved performance. Bruno Franchetti, Chief Network Architect, Creval Group, discusses how the organisation worked with Pluribus Networks and tells us how Creval now has world-class data centre infrastructure which meets the growth needs of its business and customer expectations.
Creval is a medium-sized banking institution with more than 3,400 employees, present in 11 Italian regions through a distribution network of 355 branches. It was established as a cooperative company in Sondrio in 1908. In October 2016, it took on the new legal form of a joint-stock company following the approval of the Shareholders’ Meeting, in accordance with the popular bank reform law passed by the government in 2015.
Over the years, Creval has expanded its business operations across Italy, adding new services, staff and branch offices. To meet the needs of a growing business, the IT team realised that its network needed to be modernised to handle ever-increasing IT workloads and respond to changing customer expectations.
The legacy network consisted of two data centres located approximately 150 km apart. The applications hosted by these data centres were announced over a private MPLS WAN towards 300+ bank branches. The legacy data centre architecture was based on older switches and created the following challenges:
- Disaster Recovery: The ability to migrate services from one data centre to another was not deterministic. The recovery time objective (RTO) for a full Disaster Recovery was four days, which was unacceptable from a business perspective.
- Localised services independently managed: From a load distribution perspective, all services were localised to a single data centre that was managed independently of the other one.
- Legacy systems: All networking switches and the majority of the storage systems and firewalls were old and required modernisation to meet the emerging requirements of the business.
Based on the network challenges, the Creval IT team came up with the following objectives for the network redesign:
- Streamline Disaster Recovery procedures: The key requirement for the network re-design was the need for a deterministic and streamlined Disaster Recovery strategy.
- Evolve to a scalable and flexible data centre architecture: Build two new data centres with an option to expand to a third or even more sites without having to re-architect the network.
- Operational agility: Have a unified overlay network fabric across the geographically separated data centres to enable simplified operations and workload mobility.
- Business Continuity: Continuity of business operations during the entire network re-design and migration process was a very important consideration for the team.
- Migration: Seamless migration of services from the old data centres to the new data centres was a key requirement of this project.
The Creval team undertook a detailed analysis of the solutions and technologies available in the market that could meet its requirements. After an extensive evaluation process and discussions with vendors/system integrators, the team decided to move ahead with Pluribus Netvisor ONE running on high-performance open networking Dell 5200 series switches.
Pluribus was chosen for its ability to meet all of the programme objectives and selection criteria. These included the ability to unify the data centres into a seamless multi-site fabric for operational simplicity and agility, ease of service migration, integrated analytics for visibility and the ability of the solution to scale.
“We decided to move ahead with a solution that supported disaggregated networking and a vendor that had local support and could meet our stringent testing and service migration requirements. The Pluribus deployment and engineering teams were very professional, and the few minor issues that came up during the course of deployment were resolved in a timely manner,” said Iacopo Salacrist, Head of Technology Division at Creval.
Creval built two new modern data centres within a 50 km distance of each other. The two new data centres are interconnected with two 10G low latency DWDM links. To make service migration seamless, a pair of Dell-Pluribus switches interface with the existing Cisco Nexus 5500 switch. The Pluribus Adaptive Cloud Fabric (ACF) was able to extend the fabric from the new data centre to the old one and seamlessly connect to migrate services.
The workload migration was completed smoothly within the allocated time and after extensive testing. The solution was certified for banking regulatory compliance at the end of 2020.
“The key benefit of this network refresh with Pluribus is that we have drastically streamlined the Disaster Recovery procedures and brought the RTO down to minutes for the most critical applications, and a few hours for the entire infrastructure. Before, the procedures were complex, leading to a longer and mostly non-deterministic RTO,” said Bruno Franchetti, Chief Network Architect at Creval.
Application services can be distributed across both new data centres for resilience and there is complete flexibility in performing granular workload migration without impacting service performance. The IT team now has the choice to control when to move an entire service or all services from one site to the other just by reconfiguring a pair of firewall interfaces and with minimal impact to the service availability.
Here, Bruno Franchetti, Chief Network Architect, Creval Group, discusses the solution implementation and its benefits.
Can you explain your role and the scope of your responsibility at Creval?
I am the Chief Network Architect at Creval. I am in charge of the design and management of the corporate network, both in the data centres and the branch offices spread across Italy.
What were the driving factors behind the need to modernise your network infrastructure?
We decided to completely redesign our data centres and our main goals were to:
- Increase reliability and scalability of our IT infrastructure with a focus on our data centre architecture
- Improve the Disaster Recovery procedure to achieve a near-zero Recovery Point Objective (RPO) and a reduced and deterministic Recovery Time Objective (RTO) where all apps are recovered in less than five minutes
- Reduce capital costs by leveraging disaggregated networking using white box switches and also reduce OpEx by deploying a fully-automated networking solution
To achieve these goals, we decided to migrate our two legacy data centres to two new data centre sites and replace all computing, networking and storage gear. The biggest change for us was a much more elegant and cost-effective network design that supports an active-active architecture across the two new sites.
What changing customer expectations did you need to address?
Creval has seen increasing usage of online banking applications by our consumer and business customers. Customers expect applications to be available 24×7. We worked to improve the overall reliability of the data centre infrastructure and designed the new data centres in a way where applications can fail over to the redundant site in just a few minutes.
We paid particular attention to scalability and manageability with the objective to put in place a solution that could scale easily and be managed by a small team like ours. Another key consideration for my team was to develop a migration strategy without service interruption.
How does the improved infrastructure design allow for better communication across your regions throughout Italy?
In the two new data centre sites, we implemented a state-of-the-art spine and leaf infrastructure and connected them with two 10G DWDM links. On top of this underlay network, we run a VXLAN overlay. The spine and leaf design guarantees the desired scalability, while the VXLAN overlay allows layer 2 and layer 3 services to be stretched across the sites while retaining the reliability of a layer 3 underlay transport. This overlay architecture enables us to deploy an active-active data centre architecture where a catastrophic failure of the primary data centre can be recovered by the backup active data centre almost instantly.
The network has been built using Pluribus Networks Netvisor ONE OS running on Dell switches, S5232-ON for the spines and S5248-ON for the leafs. This disaggregated network approach using the Dell – Pluribus solution is much more cost-effective than our legacy architecture.
The real engine in this solution is the Pluribus software. The Linux-based Netvisor ONE OS powers a unique SDN automation solution called the Pluribus Adaptive Cloud Fabric (ACF) that enables us to treat the two physically separated data centre sites as a single entity from a network management perspective, where the entire network fabric can be managed like a single switch. This radically simplifies the provisioning and management of the network.
To facilitate the migration from the old data centre to the new data centre, we installed a pair of switches to connect the two legacy data centres with the two new data centres, creating a full mesh connection. Leveraging the flexibility of the Pluribus ACF, we were able to create a single fabric spanning all four sites which allowed us to have a seamless migration of workloads from the legacy to the new data centres – all with no service interruption.
The two new data centres are identically equipped in terms of computing, storage and network resources. To better manage Disaster Recovery situations, we decided to use one data centre as primary and the other as secondary. The connections between each data centre and the internal MPLS network, the Internet and various network functions are redundant in both data centres. They are configured in such a way that, should a catastrophic failure occur in the primary site, the traffic is automatically steered towards the secondary data centre, with all applications back online in less than five minutes.
How would you now describe your Disaster Recovery strategy since the implementation?
As discussed earlier, the two new data centres behave as a single logical site from the network perspective, allowing seamless workload mobility between them.
Creval uses one site as primary and the other as secondary for Disaster Recovery. All internal and external services are provided by the primary site and the corresponding virtual machines (VMs) are constantly replicated on the secondary site by Zerto for Disaster Recovery. The Zerto console also gives us an immediate view of the status of the environment and reports the estimated RPO, i.e., the replication delay, for each protected VM.
The protected VMs are distributed in logically consistent groups. In a Disaster Recovery situation, they can be restarted on the Disaster Recovery site according to a predefined order. Each VM starts from the latest stored consistency point.
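As an illustration only, the ordered restart described above might be sketched in Python as follows. This is not Creval's or Zerto's actual tooling: the class names, the `boot_order` field, the group names and the checkpoint timestamps are all hypothetical, chosen to show groups restarting in a predefined order with each VM resuming from its latest stored consistency point.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ProtectedVM:
    name: str
    # Timestamps (in seconds) of stored consistency points; hypothetical data.
    checkpoints: List[float] = field(default_factory=list)


@dataclass
class RecoveryGroup:
    name: str
    boot_order: int  # hypothetical field: lower values restart first
    vms: List[ProtectedVM] = field(default_factory=list)


def recovery_plan(groups: List[RecoveryGroup]) -> List[Tuple[str, str, float]]:
    """Return (group, vm, checkpoint) steps in the predefined restart order."""
    steps = []
    for group in sorted(groups, key=lambda g: g.boot_order):
        for vm in group.vms:
            if not vm.checkpoints:
                raise ValueError(f"{vm.name} has no stored consistency point")
            # Each VM restarts from its most recent consistency point.
            steps.append((group.name, vm.name, max(vm.checkpoints)))
    return steps


# Hypothetical groups, deliberately listed out of order.
groups = [
    RecoveryGroup("online-banking", 2, [ProtectedVM("web-01", [100.0, 160.0])]),
    RecoveryGroup("core-database", 1, [ProtectedVM("db-01", [90.0, 150.0])]),
]
plan = recovery_plan(groups)
for step in plan:
    print(step)
```

Sorting by an explicit order field keeps the restart sequence independent of how the groups happen to be listed, which is one simple way to make a recovery run repeatable.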
One of the main challenges related to Disaster Recovery has always been network reconfiguration, a task that is minimised by leveraging the Pluribus ACF. We have the same network spanning both data centres and VMs don’t have to change their IP address when spun up at the Disaster Recovery site. Using anycast gateway constructs like VRFs and VSG, the network appears the same to the VMs in both data centres.
During the Disaster Recovery process, the only portion of the network we would need to reconfigure is the interfaces of systems providing stateful services like firewalls. To steer traffic to firewalls on the secondary site, we would reconfigure their interfaces with the same IP addresses assigned to those of the primary site.
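The firewall step above can be pictured as a simple mapping exercise. The following Python sketch is an assumption-laden illustration, not the procedure Creval actually runs: the role names, interface names and addresses (taken from documentation ranges) are hypothetical, and the function only computes which primary address each secondary interface would take over, without touching any real device.

```python
import ipaddress
from typing import Dict, Tuple

# Hypothetical model: role -> (interface name, address with prefix length).
FirewallInterfaces = Dict[str, Tuple[str, str]]


def failover_interface_plan(
    primary: FirewallInterfaces, secondary: FirewallInterfaces
) -> Dict[str, str]:
    """Map each secondary interface to the address of its primary counterpart."""
    plan = {}
    for role, (_, primary_addr) in primary.items():
        if role not in secondary:
            raise KeyError(f"no secondary interface for role {role!r}")
        secondary_ifname, _ = secondary[role]
        # ip_interface() raises ValueError on a malformed address.
        plan[secondary_ifname] = str(ipaddress.ip_interface(primary_addr))
    return plan


# Hypothetical interfaces; addresses come from documentation ranges.
primary_fw = {
    "inside": ("port1", "198.51.100.1/24"),
    "outside": ("port2", "192.0.2.1/30"),
}
secondary_fw = {
    "inside": ("port3", "198.51.100.2/24"),
    "outside": ("port4", "192.0.2.2/30"),
}

plan = failover_interface_plan(primary_fw, secondary_fw)
for ifname, addr in plan.items():
    print(f"{ifname} -> {addr}")
```

Because the fabric presents the same network at both sites, a plan like this only has to cover the stateful devices; the VMs themselves keep their addresses.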
To what extent has the solution enabled you to fulfil your network redesign objectives?
From the network perspective, we are quite satisfied because the solution we put in place helped us achieve our Disaster Recovery goals both in terms of RPO and RTO, and provides us with flexibility around workload mobility and Business Continuity. The overall solution is very easy to manage and provides an intrinsic level of redundancy that, during normal operations, is able to support multiple levels of failure without service interruption.
Critically, the Pluribus solution integrates seamlessly with our existing infrastructure, which allowed us to dramatically simplify the migration process, all with the economics of open networking and white box switches.
How has the project enabled you to operate with world-class data centre infrastructure and how will this benefit both business and customers?
Our team is extremely satisfied with what we have achieved with the network upgrade. We have built a state-of-the-art, flexible, scalable, reliable and easy-to-manage data centre that can safely support our company evolution for many years to come.