How can data centre leaders ensure resilience in today’s rapid growth market?


As the volume of data continues to grow at an unprecedented rate, business leaders must find new ways of managing it and, in turn, of securing it against data loss.

We hear from a number of industry experts who discuss ways to ensure resilience in today’s rapid growth market.

Nabil Khalil, Executive Vice-President of R&M Middle East, Turkey and Africa, said: “Internal data centre traffic is expected to grow by 80% over the next three years. As a result, there is a real risk of networks becoming bandwidth bottlenecks. As a standard practice, organisations must now move away from traditional low-density cabling to high-density modular structured cable solutions. By doing so, they can implement physical network infrastructure in a far more manageable and flexible manner. Furthermore, these systems enable data centres to easily migrate to 25, 100 and 200 Gb/s networks and solve some of the most critical network challenges.
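To put the 80% three-year growth figure quoted above into yearly terms, the short sketch below converts it into an equivalent compound annual rate. It assumes growth compounds evenly year on year, which the source does not state; the function name is illustrative.

```python
def annualised_rate(total_growth, years):
    """Compound annual rate equivalent to `total_growth` over `years`.

    Assumes even year-on-year compounding (an assumption, not a claim
    made in the article).
    """
    return (1 + total_growth) ** (1 / years) - 1


# 80% growth over three years, as quoted in the article
rate = annualised_rate(0.80, 3)
print(f"{rate:.1%} per year")
```

Under that assumption, 80% over three years works out to roughly 21.6% growth per year.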

“A data centre is an ever-expanding, interdependent, difficult-to-understand infrastructure that, once installed, is there to stay. Data centre managers must address constant demand for new services and booming data growth.

“When planning the installation or upgrade of network cabling, it is vital to remember that while the cost of cabling typically accounts for only 4-5% of the total expense of the data centre, reports have shown that 65% of system outages are related to cabling, and patching mistakes cause 28% of downtime in data centres. Sound cabling investments can therefore significantly reduce the possibility of data centre downtime.

“Today, no one asks for a slower network and while it is 25 Gigabit Ethernet (GbE) that is being deployed, all new implementations are factoring in the need to include easy 100G and even 200G migration as a part of their future. And for this, higher value cabling systems such as fibre are the way forward. Fibre systems facilitate the setup of high-density cabling systems for data networks with parallel-optical connection technology. Consequently, data centres can introduce 25 GbE or even 100 and 200 GbE as a bandwidth to connect the fastest servers and switches to each other.

“That being said, copper is still a good option for horizontal links (10G) and with the new Cat. 8, it will be able to provide 40G for 30 metres. So, it remains cheaper and sufficient for small distances.

“In terms of design, the traditional hierarchical network topology with core, aggregation and access level can no longer cope with today’s needs, resulting in congestion along traffic routes. To ensure data and applications are available in real time at all times, latencies have to be considerably reduced, calling for single-mode or multi-mode fibre and new kinds of network architecture. A spine-leaf architecture can be adapted to continuously changing needs and reduces latency. A network mesh with crisscrossing cabling guarantees that switches at access level are no longer more than a hop away from each other.
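The property described above, that access switches in a spine-leaf mesh sit a fixed, short distance apart, can be checked with a small graph model. This is a minimal sketch, assuming every leaf switch connects to every spine switch; the switch names and counts are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_leaf_spine(num_spines, num_leaves):
    """Return an adjacency map where every leaf links to every spine."""
    spines = [f"spine{i}" for i in range(num_spines)]
    leaves = [f"leaf{i}" for i in range(num_leaves)]
    graph = {node: set() for node in spines + leaves}
    for leaf in leaves:
        for spine in spines:
            graph[leaf].add(spine)
            graph[spine].add(leaf)
    return graph, leaves


def hops(graph, src, dst):
    """Breadth-first search: number of links on the shortest path."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # unreachable


# Hypothetical fabric: 2 spine switches, 4 leaf (access) switches
graph, leaves = build_leaf_spine(num_spines=2, num_leaves=4)
leaf_to_leaf = {hops(graph, a, b) for a, b in combinations(leaves, 2)}
print(leaf_to_leaf)
```

In this model every pair of leaf switches is exactly two links apart, that is, separated by a single spine switch, regardless of how many leaves are added.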

“Between ever increasing fibre densities, complicated fanout connections and meshed leaf-spine architectures, knowing the true state of the network cabling at any time has been extremely difficult. Changing network topologies therefore also warrant the use of Automated Infrastructure Management (AIM) solutions. Using a single, current, consistent database eliminates stranded capacity and facilitates end-to-end analysis, agile infrastructure management, predictive analysis, capacity utilisation and efficiency of operation and administration, and can bring 30 – 50% reduction in downtime.”

Brian Smith, Business Unit Manager: Managed Data Centre and Cloud Solutions, Datacentrix: “Data centre resilience is being tested like never before, particularly within the South African context where technology is under immense pressure to deliver remote working availability due to COVID-19 restrictions, while concurrently being faced with renewed power interruptions in the form of the latest bout of load-shedding.

“A data centre’s resilience can be defined as the measurement of its ability to continue to operate despite the failure of a component, such as its related equipment or power grid connection, or any other disruption. Resilience could be described as the inverse of a data centre’s vulnerability to failure.

“Data centre resilience can be considered from four aspects, namely telecommunications (the network), electrical (power), structural (the physical building or room housing the data centre), and mechanical (cooling). As prescribed by The Uptime Institute, there are standard data centre resilience measurements, known as tier ratings. There are four tiers of data centre resilience, Tier I to Tier IV from lowest to highest, with redundancy typically described as N, N+1, 2N and 2N+1.
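The redundancy notation mentioned above (N, N+1, 2N, 2N+1) can be made concrete with a small calculation. The sketch below is a simplification that only counts units such as UPS modules or chillers; it ignores distribution paths and maintenance procedures, and the function names are illustrative.

```python
def installed_units(n_required):
    """Units installed under each scheme when the load needs `n_required`."""
    return {
        "N": n_required,
        "N+1": n_required + 1,
        "2N": 2 * n_required,
        "2N+1": 2 * n_required + 1,
    }


def tolerated_failures(n_required):
    """Unit failures each scheme can absorb while still carrying the load."""
    return {
        scheme: count - n_required
        for scheme, count in installed_units(n_required).items()
    }


# Hypothetical example: a load that needs four units to run
units = installed_units(4)
spare = tolerated_failures(4)
for scheme in units:
    print(f"{scheme}: {units[scheme]} installed, {spare[scheme]} can fail")
```

For a load needing four units, the schemes install 4, 5, 8 and 9 units respectively, so each step up buys more headroom at the cost of more equipment.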

“Continuous resilience improvement is a must today, with connectivity being many a business’s life-support system. Local C-level executives therefore have to scrutinise their specific business requirements when it comes to improving it. They must address questions such as: how much downtime is acceptable; how much budget can go towards improving resilience – which would need to take infrastructure, maintenance costs and more into consideration; does the cooling system work, even if there is no power; is it possible to generate power within the data centre; and whether or not the correct technical skills are in place. Once these questions have been answered, a business’s uptime needs become clearer.

“Next is whether it makes more sense to host the data centre internally, or outsource it to an expert.

“Improving resilience to reach five or six nines of availability can be an expensive exercise within an existing, in-house data centre, costing millions of rand that are only recoverable over the following seven to 10 years. It would require the right equipment, the right staff and the right maintenance contracts to be in place. Location of the data centre is also critical, with access to more power if needed, as well as a seamless connection to more than one telecom provider. Essentially, as availability increases, so too does the cost of each additional percentage point gained.
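To show what "five or six nines" means in practice, the following sketch converts an availability target into the downtime it allows per year. It assumes a 365-day year and treats availability as a simple fraction of total time.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def annual_downtime_minutes(nines):
    """Allowed downtime per year for an availability of `nines` nines.

    Three nines is 99.9%, five nines is 99.999%, and so on.
    """
    unavailability = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5, 6):
    print(f"{n} nines: {annual_downtime_minutes(n):.2f} minutes/year")
```

Five nines allows a little over five minutes of downtime a year and six nines roughly half a minute, which helps explain why each additional nine is so costly to achieve.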

“The benefit of outsourcing data centre requirements is that the services partner will handle power and connectivity requirements, ensure that the right equipment is in place, and have the right staff and technical expertise on board – with uptime service level agreements (SLAs) in place. This not only eases remote working requirements, it also allows businesses to shift their budgeting towards an OPEX-focused model.

“One of the major advantages of a hosted environment is the economies of scale; there is a fixed investment spread over a larger group of companies all accessing the service, and ultimately making a high availability, higher tiered environment more accessible to many organisations from a cost point of view.”

Kevin Deierling, VP of Marketing, NVIDIA, said: “Advanced AI applications are becoming commonplace across cloud, enterprise and Edge, driving massive compute and data requirements and making data centre resiliency more critical than ever. Data centre resilience is achieved by adopting a cloud-native architecture, where applications are broken down into small, distributed microservices which are assembled — or composed — into scalable applications as needed and on-demand. Such cloud-native applications are far more resilient than apps developed as giant monolithic code behemoths, because the small, cooperating microservices dynamically come and go. These microservices are implemented within containers, so they are easy to launch or update and the application can quickly scale across hundreds and even thousands of nodes. Resilience to failure is a huge additional benefit of this cloud-native architecture, because the distributed application is designed to accommodate containers that come and go, whether intentionally or not. So failures of individual containers or entire servers are expected and accommodated by design and the microservices are quickly replaced by new containers running on different servers.
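The self-healing behaviour described above, where failed containers are replaced by new ones on different servers, can be illustrated with a toy reconciliation loop. This is a simplified model written for this article, not the API of Kubernetes or any other orchestrator; the function and server names are hypothetical.

```python
def reconcile(desired_replicas, placements, healthy_servers):
    """Return a new placement list that restores the desired replica count.

    `placements` lists the server each replica currently runs on.
    Replicas on servers that are no longer healthy are dropped and
    rescheduled onto the least-loaded healthy server.
    """
    if not healthy_servers:
        raise RuntimeError("no healthy servers available")

    # Keep only replicas whose server is still healthy
    surviving = [server for server in placements if server in healthy_servers]
    load = {server: surviving.count(server) for server in healthy_servers}

    # Start replacements until the desired count is reached
    for _ in range(desired_replicas - len(surviving)):
        target = min(sorted(load), key=load.get)
        surviving.append(target)
        load[target] += 1
    return surviving


# Three replicas on servers a, b and c; server b then fails
new_placements = reconcile(3, ["a", "b", "c"], {"a", "c"})
print(new_placements)
```

The application keeps its three replicas throughout; only their location changes, which is the sense in which failure is "accommodated by design".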

“Accelerated computing using GPUs and an intelligent network are critical elements needed to build this resilient, distributed cloud-native data centre. A good example is NVIDIA’s accelerated computing GPUs for AI applications, which deliver faster and more efficient natural language processing, Big Data analytics, task automation and recommendation engines for both consumers and IT staff. GPU-powered AI can recognise anomalies or problematic trends in power consumption, storage usage, network traffic, hardware reliability, or response time to let data centre professionals prevent outages or resource shortages. It can also recognise and stop security threats or intrusions more quickly. The AI acceleration is complemented by the intelligent NVIDIA networking switches, SmartNICs and Data Processing Units (DPUs) from the Mellanox acquisition. The SmartNICs offload SDN, virtualisation (for networking containers), data movement and encryption tasks from the CPUs. This allows applications to run more quickly while using fewer CPUs and servers, and also simplifies connecting new or moved containers with their microservices. The DPUs provide security isolation, a distributed software-defined, hardware-accelerated data and control plane, and storage virtualisation to servers and containers, making it faster and easier to spin up or spin down microservices with all the needed security protections and just the right amount of shared storage. Additionally, intelligent, open-networking switches provide multiple high-bandwidth paths between servers to avoid bottlenecks or outages due to congestion or broken links. The switches also provide programmable fabric automation and smart telemetry across the network, increasing resiliency and simplifying the management of composable microservices. This entire accelerated AI computing stack and cloud-native fabric are fully integrated within a Kubernetes container orchestration platform that is at the heart of achieving resilience and scale in next-generation data centres.”

Ashraf Yehia, Managing Director – Middle East at Eaton, said: “The factors contributing to the growth of the hyperscale data centre market are the increasing requirements for high-performance applications, the rising need to reduce capital and operational expenditure, and high spending on hyperscale data centre technologies. Also, the increasing number of users opting for these technological solutions is expected to improve the data infrastructure and this in turn is expected to boost the hyperscale data centre market.

“Geographically, North America accounted for the largest market share in the global hyperscale data centre market, whereas Asia Pacific (including the Middle East) is expected to grow at a fast pace over the forecast period, 2017-2023. The global hyperscale data centre market is expected to reach approximately US$96 billion by the end of 2023, with a 27% CAGR during the forecast period, 2017-2023.
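As a sanity check on the forecast figures above, the sketch below works backwards from the 2023 value and the 27% CAGR to the market size they imply for the start of the period. It assumes 2017 is the base year and that growth compounds over six annual periods; the source does not specify either, so the result is indicative only.

```python
def implied_base(final_value, cagr, periods):
    """Starting value implied by a final value and a compound annual rate."""
    return final_value / (1 + cagr) ** periods


# US$96 billion in 2023 at 27% CAGR, assuming six periods from 2017
base_2017 = implied_base(96.0, 0.27, 6)
print(f"Implied 2017 market size: US${base_2017:.1f} billion")
```

Under those assumptions the forecast implies a 2017 market of roughly US$23 billion, meaning the market would more than quadruple over the period.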

“Power efficiency and the increasing application of these data centres across various sectors are expected to boost the overall growth of the market over the forecast period. However, power failure and increasing generation of heat from these technological solutions are hampering the growth of the market.

“Current methods of data centre design involve a mix and match approach to component selection, often utilising products from multiple manufacturers to complete the power chain. This approach has served us well for the last 30 years, but cannot deliver the expectations of the next generation of customers and the data centres they require. Accelerated construction programmes and leaner builds require a more integrated system-level approach that can shave cost and improve uptime. Manufacturers are ideally positioned to combine complete component understanding with deep-dive design experience to produce a more finely tuned electrical system. This results in not just performance gains for the customer, but also significant cost savings across the complete life cycle of the data centre from the initial build to end of life. A systems level approach to design will define the future of data centres.

“Soon, we’ll also see a measured shift in the number of creative ways to cool data centres – particularly by direct liquid cooling. As the processing power of compute increases, so does the heat generated. For every kW of electrical energy consumed in a data centre, there is another kW needed to cool the equipment and keep it in its operational temperature range. Instead of cooling the entire space with either mechanical or free air cooling at a certain power density, a more direct or targeted cooling approach can be deployed. This can be done by immersion cooling, where entire servers are submerged into tanks of thermally conductive, but not electrically conductive liquid, or by direct liquid cooling whereby the electronics and heat sink on the motherboard can be encapsulated in a manifold and a liquid flow removes the heat.”
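The rule of thumb quoted above, one kW of cooling for every kW consumed by the equipment, can be expressed as a simple overhead calculation. In this sketch the 500 kW IT load is a hypothetical figure chosen for illustration, and other overheads such as lighting and power distribution losses are ignored.

```python
def facility_power_kw(it_load_kw, cooling_kw_per_it_kw=1.0):
    """Total power draw: IT load plus the cooling needed to support it.

    The default of 1.0 kW of cooling per kW of IT load is the figure
    quoted in the article; more targeted cooling would lower it.
    """
    cooling_kw = it_load_kw * cooling_kw_per_it_kw
    return it_load_kw + cooling_kw


it_load = 500.0  # hypothetical IT load in kW
total = facility_power_kw(it_load)
ratio = total / it_load  # total facility power per unit of IT power
print(f"Total: {total:.0f} kW, ratio to IT load: {ratio:.1f}")
```

At the quoted overhead the facility draws twice its IT load, so any reduction in the cooling figure feeds directly into the total power bill.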

Ihab Farhoud, Director, Solutions Engineering – METNA, VMware Middle East, Turkey and North Africa, said: “Data centre leaders must focus on Business Continuity planning, and resiliency should be a part of this. In terms of resources, it’s important to have disaster recovery sites and backup data centres, but true resiliency means Business Continuity and agility: organisations should be able to operate as normal, completely unhindered and be able to maintain all their SLAs at any given time.

“The Business Continuity plan should include various components including capacity planning, which should be part of the plan from day one. This is especially important for organisations that are expecting rapid growth or fluctuations in service demand that could require rapid upscaling or downscaling of capacity, such as compute, storage, network and security resources. Effective capacity planning involves making projections two to five years into the future. The Business Continuity plan should also include disaster recovery and maintenance of SLAs, which means you must have some sort of scalability plan in place.

“Data centre leaders should also look to implement automation, such as the automation of service provision, whether it’s Software-as-a-Service and applications, Infrastructure-as-a-Service, or anything else. Services should be automated as far as possible using the different tools available. Automation is important to meet increasing demands and pressures on operations at all levels. One of the alternatives to automation, when there is a dramatic increase in demand on services, is to try to hire additional staff with the right skills, but this is costly, time-consuming and difficult, especially in this part of the world where there is a shortage of many types of IT skills. In addition to increased resiliency, automation helps to overcome challenges presented by the skills shortage and reduces OPEX.

“Additionally, moving to the cloud and building a hybrid or multicloud model is a great way of adding resiliency into your data centre. It’s no longer sufficient to stick to your on-premises data centre, even with a good disaster recovery backup system in place, as it doesn’t offer the flexibility that organisations require.

“Modernising services and applications is another factor which contributes to additional resiliency.

“Furthermore, by adopting and scaling DevOps, organisations can manage and maintain their applications/services more efficiently, adding another layer of resiliency.”
