A combination of new technology trends and changing working models has required data centre professionals to adapt to remote monitoring and management, but the shift is not without its challenges. Herman Chan, President, Sunbird Software, outlines the 12 KPIs data centre managers need to manage their facilities effectively and remotely.
One of the biggest challenges facing today’s data centre professionals is managing their data centres effectively from a distance. Growing numbers of Edge data centre deployments, the shift to colocation facilities and the need to work from home during the COVID-19 pandemic all limit on-site management and heighten the need for remote monitoring and management of mission-critical infrastructure.
Data centre managers are now left asking questions such as: How much capacity do I have, and when will I run out? How do I manage moves, adds and changes with remote hands? Where can I deploy equipment if I can’t walk the data centre floor? How do I identify and manage hot spots? How do I ensure power loads don’t exceed capacity and cause downtime while I’m away from the site? Without easy access to the right information, decisions are delayed, problems mount and service delivery suffers.
The answer lies in identifying and monitoring Key Performance Indicators (KPIs) and using the resulting insights to optimise your data centre: increasing uptime, improving efficiency, making better use of capacity and boosting staff productivity.
The top 12 KPIs you need to monitor to remotely manage your data centres are:
- Power utilisation and capacity per cabinet. Power resources in the data centre are increasingly constrained, and the goal of maximising uptime competes with the goal of using power efficiently. Monitoring power utilisation and capacity at the cabinet level improves uptime, by ensuring you don’t exceed capacity, and saves money, by revealing stranded power capacity.
- Real-time power trends per cabinet. Many data centre managers measure their power consumption weekly or monthly, which leaves them vulnerable to short-term peaks and potential overloads that go undetected. Monitor power consumption per cabinet in real time, trend that data continuously, and set thresholds and alerts so you are notified and can react before a major issue occurs or users are affected.
- Stranded power capacity per cabinet. Data centre managers often plan, budget and allocate more power to each server than the IT equipment actually draws. The difference is stranded power: capacity that has been provisioned to a rack but is never used, and that could be put to work. For a single rack, a couple of kilowatts of stranded power may seem unremarkable, but across hundreds or thousands of racks it could account for as much as 50% of all available power. Monitor power consumption across your data centre to identify stranded capacity, then deploy that power with confidence and delay spending millions to build your next data centre.
- Available rack units trend. This KPI shows when you may run out of space and how many devices of a given height (in rack units, RU) can be installed in your data centre over time. It also helps identify trends in how efficiently you use space and lets you compare how much space versus power capacity you have available for new devices.
- Available floor space remaining. In addition to tracking available cabinet space, track available floor space by the number of open cabinet positions to know how much white space is available to deploy new cabinets on the data centre floor.
- Data and power ports capacity and usage trends. How effectively you plan and manage data centre capacity depends on how closely you manage capacity at the port level. Tracking capacity down to individual data and power ports provides granular data that shows how many ports remain available. Monitor usage and capacity by connector type to ensure you never run out of free data or power ports. By tracking physical port capacity at the cabinet level, you can provision new equipment intelligently, make better-informed capacity planning decisions, use power and network resources more efficiently and reduce operating expenses.
- Requests by requester, stage, type and location. To maintain SLAs while improving the efficiency and productivity of data centre staff, you must properly monitor and manage moves, adds and changes. Track the status and number of change requests, tickets and work orders: who is making them and where, what progress is being made and what types of changes are being requested. Follow each request from creation to approval to ensure work order quality and transparency, and improve staff efficiency through better collaboration.
- Completed requests over time. It’s important to know how much work is being done in the data centre. One method of doing this is by monitoring the number of completed requests by type of request over time. Tracking data centre activity and productivity in this manner allows you to manage human resource capacity, utilisation and productivity more effectively and find opportunities for process enhancements.
- Asset audit trail. Complete visibility into the information and history of every asset in your data centre helps drive efficiency and facilitates compliance. For the most effective remote data centre management, maintain a real-time audit log of all changes in your data centre that records what action was taken, by whom and when.
- Inlet temperature per cabinet. A common mistake in data centre monitoring is to monitor the temperature at the room level rather than at the rack inlet level, potentially leaving you blind to cabinets that are operating at unsafe temperatures. Instead, monitor each cabinet’s inlet temperatures in real time to ensure that your equipment is operating safely within ASHRAE standards, easily identify hot spots and save money by avoiding overcooling.
- Average max temperature trends. In addition to tracking the latest temperature per cabinet, add a level of sophistication to your monitoring by trending that data over time to identify spikes and irregularities. By monitoring the average of each cabinet’s maximum temperatures over time, you can ensure that your equipment operates within safe guidelines not just now, but all the time. If you see temperature spikes, you’ll have the data to identify the cause and prevent it from recurring.
- Energy consumption per location. Energy consumption per server grows each year as gains in performance drive energy demand, and the cost of the energy consumed can account for up to 50% of total data centre operating expenses. Energy consumption therefore needs to be monitored and intelligently reduced. Track your energy consumption and set targets to reduce it, bill back users, meet corporate sustainability and green initiatives, and collect energy rebates and carbon credits.
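The power utilisation and stranded capacity KPIs above come down to comparing each cabinet’s power budget with its measured peak draw. The sketch below is illustrative only: the `Cabinet` fields, the 20% safety headroom kept above peak and the 80% alert limit are assumptions, not figures from any particular DCIM product.

```python
from dataclasses import dataclass


@dataclass
class Cabinet:
    name: str
    budget_kw: float  # power allocated to the cabinet (hypothetical field name)
    peak_kw: float    # highest measured draw over the observation window


def utilisation(cab: Cabinet) -> float:
    """Measured peak draw as a fraction of the cabinet's power budget."""
    return cab.peak_kw / cab.budget_kw


def stranded_kw(cab: Cabinet, headroom: float = 0.2) -> float:
    """Budgeted power left over after keeping an assumed safety margin above peak."""
    needed = cab.peak_kw * (1 + headroom)
    return max(0.0, cab.budget_kw - needed)


def over_threshold(cabs: list[Cabinet], limit: float = 0.8) -> list[str]:
    """Names of cabinets whose peak draw exceeds `limit` of their budget."""
    return [c.name for c in cabs if utilisation(c) > limit]


def fleet_stranded_fraction(cabs: list[Cabinet], headroom: float = 0.2) -> float:
    """Share of total budgeted power that is stranded across all cabinets."""
    total = sum(c.budget_kw for c in cabs)
    return sum(stranded_kw(c, headroom) for c in cabs) / total
```

For two 8 kW cabinets peaking at 4 kW and 7 kW, the first has about 3.2 kW stranded even after headroom, while the second is flagged as running above 80% of its budget.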
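The real-time power trend KPI pairs continuous sampling with thresholds. A minimal sketch, assuming readings arrive one at a time from some poller, that warning and critical levels sit at 80% and 90% of cabinet capacity, and that a fixed-size rolling window is enough for trending; the class and its parameters are hypothetical.

```python
from collections import deque
from statistics import mean
from typing import Optional


class PowerTrend:
    """Rolling window of per-cabinet power readings with threshold alerts."""

    def __init__(self, capacity_kw: float, window: int = 60,
                 warn: float = 0.8, crit: float = 0.9):
        self.capacity_kw = capacity_kw
        self.samples = deque(maxlen=window)  # oldest readings drop off automatically
        self.warn = warn
        self.crit = crit

    def add(self, kw: float) -> Optional[str]:
        """Record a reading and return 'critical', 'warning' or None."""
        self.samples.append(kw)
        ratio = kw / self.capacity_kw
        if ratio >= self.crit:
            return "critical"
        if ratio >= self.warn:
            return "warning"
        return None

    def rolling_mean(self) -> float:
        return mean(self.samples)

    def rolling_peak(self) -> float:
        return max(self.samples)
```

Feeding readings of 5.0, 8.5, 9.5 and 6.0 kW into a 10 kW cabinet raises a warning and then a critical alert; a single periodic snapshot taken at the last reading would have recorded only 6.0 kW and missed the 9.5 kW peak entirely.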
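For the available rack units trend, one simple way to estimate when space runs out is a straight-line fit of free RU against time, and counting how many devices fit must respect contiguous gaps rather than total free RU. This sketch assumes linear growth and hypothetical inputs (day numbers, free-RU samples, gap sizes); real consumption is often lumpier.

```python
from typing import Optional


def projected_full_day(days: list[float], free_ru: list[float]) -> Optional[float]:
    """Least-squares fit of free RU against time; return the day the fit hits zero.

    Needs at least two distinct days. Returns None when free space is flat
    or growing, so no run-out is projected.
    """
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(free_ru) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, free_ru))
    slope = sxy / sxx
    if slope >= 0:
        return None
    intercept = mean_y - slope * mean_x
    return -intercept / slope


def devices_that_fit(free_gaps_ru: list[int], device_height_ru: int) -> int:
    """How many devices of a given height fit in the contiguous free gaps."""
    return sum(gap // device_height_ru for gap in free_gaps_ru)
```

With 400, 370, 340 and 310 RU free on days 0, 30, 60 and 90, the fit projects run-out at day 400. Gaps of 5, 3 and 2 RU total 10 RU free, yet only one 4U device fits.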
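Port capacity by connector type, as described in the data and power ports KPI, is essentially a grouped count of total versus in-use ports. A minimal sketch, assuming port records are available as hypothetical `(connector_type, in_use)` pairs and that a type is "running low" when under 10% of its ports are free.

```python
from collections import Counter


def port_summary(ports: list[tuple[str, bool]]) -> dict[str, tuple[int, int]]:
    """Map connector type -> (free, total) from (connector_type, in_use) records."""
    total = Counter(kind for kind, _ in ports)
    used = Counter(kind for kind, in_use in ports if in_use)
    return {kind: (total[kind] - used[kind], total[kind]) for kind in total}


def running_low(summary: dict[str, tuple[int, int]],
                min_free_ratio: float = 0.1) -> list[str]:
    """Connector types whose free share has dropped below `min_free_ratio`."""
    return sorted(k for k, (free, total) in summary.items()
                  if free / total < min_free_ratio)
```

Run per cabinet, the same summary shows where new equipment can be cabled without first adding patch panels or PDUs.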
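The two request KPIs (requests by requester, stage, type and location, and completed requests over time) are grouped counts over a work-order log. The `Request` record and its stage names below are hypothetical; a real ticketing or DCIM system will have its own schema.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Request:
    requester: str
    stage: str   # e.g. "submitted", "approved", "complete" (hypothetical stages)
    kind: str    # e.g. "move", "add", "change"
    location: str
    completed_on: Optional[date] = None


def count_by(requests: list[Request], field: str) -> Counter:
    """Count requests grouped by one attribute, e.g. 'requester' or 'location'."""
    return Counter(getattr(r, field) for r in requests)


def completed_per_month(requests: list[Request]) -> Counter:
    """Count completed requests by (year, month, request type)."""
    return Counter(
        (r.completed_on.year, r.completed_on.month, r.kind)
        for r in requests if r.completed_on is not None
    )
```

Plotting `completed_per_month` over several months gives the productivity trend the article describes, broken down by type of work.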
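The inlet temperature and average max temperature KPIs can be sketched as a range check on the latest reading per cabinet plus a daily-maximum trend. This assumes ASHRAE’s recommended inlet envelope of 18 to 27 °C (check the current ASHRAE thermal guidelines for your equipment class) and interprets "average max temperature" as the mean of each day’s highest reading, which is one reasonable reading of the KPI rather than a fixed definition.

```python
from statistics import mean

# Assumed limits: ASHRAE's recommended server inlet envelope, in degrees C.
RECOMMENDED_INLET_C = (18.0, 27.0)


def out_of_range(latest_inlet_c: dict[str, float]) -> list[str]:
    """Cabinets whose latest inlet reading falls outside the recommended range."""
    low, high = RECOMMENDED_INLET_C
    return sorted(cab for cab, t in latest_inlet_c.items() if not low <= t <= high)


def daily_max(readings: list[tuple[str, float]]) -> dict[str, float]:
    """Highest reading per day from (day, temperature_c) pairs for one cabinet."""
    maxima: dict[str, float] = {}
    for day, t in readings:
        maxima[day] = max(t, maxima.get(day, t))
    return maxima


def average_daily_max(readings: list[tuple[str, float]]) -> float:
    """Mean of the daily maximum inlet temperatures."""
    return mean(daily_max(readings).values())


def hot_days(readings: list[tuple[str, float]],
             limit: float = RECOMMENDED_INLET_C[1]) -> list[str]:
    """Days whose maximum inlet temperature exceeded the upper limit."""
    return sorted(day for day, t in daily_max(readings).items() if t > limit)
```

Keeping the daily maxima, not just their average, is what lets you go back to a specific spike and work out its cause.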
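Energy consumption per location can be derived from power samples by integrating power over time. A minimal sketch, assuming each meter yields `(time_hours, power_kw)` samples sorted by time and using the trapezoidal rule; the tariff used for bill-back is a hypothetical flat rate.

```python
def energy_kwh(samples: list[tuple[float, float]]) -> float:
    """Energy from time-sorted (time_hours, power_kw) samples via the trapezoidal rule."""
    return sum(
        (t1 - t0) * (p0 + p1) / 2
        for (t0, p0), (t1, p1) in zip(samples, samples[1:])
    )


def energy_by_location(
    meters: dict[str, list[list[tuple[float, float]]]],
) -> dict[str, float]:
    """Total kWh per location, where each location maps to a list of meter series."""
    return {
        loc: sum(energy_kwh(series) for series in series_list)
        for loc, series_list in meters.items()
    }


def bill_back(kwh_by_location: dict[str, float], tariff_per_kwh: float) -> dict[str, float]:
    """Cost per location at an assumed flat tariff."""
    return {loc: kwh * tariff_per_kwh for loc, kwh in kwh_by_location.items()}
```

The same per-location totals can be compared against reduction targets or reported for sustainability initiatives.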
It’s more critical than ever to integrate, analyse and act on the KPIs that have the most impact on your daily IT operations, but how do you begin to remotely monitor these metrics? With a comprehensive remote Data Centre Infrastructure Management (DCIM) solution, it’s easy.
A modern DCIM tool provides all your most important KPIs right out of the box with zero-configuration dashboard widgets, reports and visual analytics. An enterprise-class data and health poller gathers data directly from facility equipment to ensure accurate, high-quality information that leads to deeper, more reliable insights. Second-generation DCIM makes it simple for data centre professionals to make smarter, more informed remote data centre management decisions to improve data centre health and efficiency while dramatically simplifying capacity management.