DCU Study Guide
Advantages of Row and Rack-Oriented Cooling Architectures Part I Data Center University Course Transcript
Slide 1 Welcome to the Data Center University™ course on the Advantages of Row and Rack-Oriented Cooling Architectures, Part I. Slide 2 If this is your first time participating in a Data Center University™ course, please note some of the screen controls. For best viewing results, we recommend that you maximize your browser window now. The Pause/Play icon lets you pause and play the course. Use the Previous and Next slide icons to move back or ahead. Using your browser controls may disrupt the normal play of the course. Finally, click the Notes tab to read a transcript of the narration. Slide 3 At the end of this course, you will be able to: Compare and contrast the various approaches to cooling; Explain why there is a trend away from room-based cooling and toward row-based cooling when deploying higher density applications; and Determine the appropriate cooling solution for your data center. Slide 4 The agenda for this course is as follows: First, we will have a brief introduction, and then we'll have an overview of cooling architectures, including room, row, rack, mixed, and hybrid. We'll spend the bulk of our time exploring the Challenges and Solutions for Cooling Architectures, including Agility, Availability,
Lifecycle Costs, Serviceability, and Manageability, and we'll wrap up with a brief Summary. Please Note! Many of the cooling architecture concepts reviewed in this course have been introduced and discussed in depth in the course Fundamentals of Cooling Architecture. If you have not already done so, you may wish to participate in Fundamentals of Cooling Architecture prior to taking this course. Slide 5 All of the electrical power delivered to the IT loads in a data center ends up as waste in the form of heat. This heat must be removed to maintain consistent, optimal temperatures in the data center. This heat removal function is critical to maintaining consistent uptime. Virtually all IT equipment is air-cooled, meaning each piece of IT equipment takes in the surrounding air and ejects waste heat into its exhaust air. Since a data center may contain thousands of IT devices, there may be thousands of hot airflow paths within the data center that together represent the total waste heat output of the data center; waste heat that must be removed. The purpose of the air conditioning system for the data center is to efficiently capture this complex flow of waste heat and eject it from the room. Slide 6 This course will explain and contrast the various approaches to cooling the data center, and while each of the three approaches, rack, row and room, has an appropriate application, it will also demonstrate why there is a trend away from room-based cooling and toward row-based cooling when deploying higher density applications.
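Slide 5's point that essentially every watt delivered to the IT load must be removed as heat can be made concrete with a quick unit conversion. The sketch below is illustrative only: the 200 kW IT load is an assumed example, while the BTU/hr and refrigeration-ton conversion factors are standard.

```python
# Rough sketch: IT electrical load -> heat that the cooling system must remove.
# Assumes (as the course states) that virtually all IT input power ends up as heat.
BTU_PER_HR_PER_KW = 3412.14   # standard conversion: 1 kW = 3,412.14 BTU/hr
BTU_PER_HR_PER_TON = 12000    # standard definition of one ton of refrigeration

def cooling_required(it_load_kw: float) -> dict:
    """Return the heat load in kW, BTU/hr, and refrigeration tons."""
    btu_per_hr = it_load_kw * BTU_PER_HR_PER_KW
    return {
        "heat_kw": it_load_kw,          # heat out is roughly equal to electrical power in
        "btu_per_hr": btu_per_hr,
        "tons": btu_per_hr / BTU_PER_HR_PER_TON,
    }

# Example: a hypothetical 200 kW room of IT equipment
print(cooling_required(200))   # ~682,428 BTU/hr, ~57 tons of cooling
```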
Slide 7 As we learned in the Fundamentals of Cooling Architectures course, room-based cooling is the historical method for accomplishing data center cooling. The basic principle of this approach is that the air conditioners not only provide raw cooling capacity, but they also serve as a large mixer, constantly stirring and mixing the air in the room to bring it to a homogeneous average temperature, preventing hot spots from occurring. Slide 8 In the course Fundamentals of Cooling Architecture, we also explored the features and benefits of row-oriented cooling architecture, which allows one row of racks to run high density applications such as blade servers, while another row satisfies lower power density applications such as communication enclosures. Additionally, redundancy can be targeted at specific rows. A major benefit of row-oriented architecture is that it can be implemented without a raised floor. This is especially useful for high density installations where a raised floor height of one meter or more may be required. Slide 9 The simple and pre-defined layout geometries of row-oriented architecture allow for predictable performance. In addition, these layouts are relatively immune to the effects of room geometry or other room constraints. Slide 10 As we explored in Fundamentals of Cooling Architecture, in rack-oriented architecture, the CRAC units are associated with a rack and are assumed to be dedicated to a rack for design purposes.
The major benefits of deploying a rack-oriented cooling architecture include allowing the entire rated capacity of the CRAC to be utilized, so that the highest power density (up to 50 kW per rack) can be achieved. The reduction in the airflow path length also reduces the CRAC fan power required, increasing efficiency. This is an advantage because in many lightly loaded data centers without a rack-oriented cooling architecture, the CRAC fan power losses alone can exceed the total IT power consumption. Slide 11 The principal drawback of this approach is that it requires a large number of air conditioning devices and associated piping when compared to the other approaches, especially at lower power densities. Slide 12 When a data center must operate with a broad spectrum of power densities, it may be beneficial to deploy a mixed-use cooling architecture, whereby room, row and rack architectures are all used together in the same installation. Slide 13 Another effective argument for the deployment of a mixed architecture is for density upgrades within an existing low density room-oriented design. In this case, small groups of racks within an existing data center are outfitted with row or rack-oriented cooling systems.
Slide 14 Let's take a moment to review. Use the drag and drop feature to match the item in column two with its counterpart in column one.
Slide 15 Ducted exhaust air scavenging systems are an example of a hybrid cooling architecture previously discussed in Fundamentals of Cooling Architectures. In this case, the device will capture exhaust air at the rack level and duct it directly back to a room-oriented cooling system. This system has some of the same benefits as a rack-oriented cooling system and can integrate into an existing or planned room-oriented cooling system. Slide 16 To make effective decisions regarding choice of architecture for new data centers or upgrades, it's essential to relate the performance characteristics of the architectures to practical issues that face today's data center personnel. A survey of data center operators suggests that these issues can be placed into one of the following categories: Agility, System availability, Lifecycle costs (TCO), Serviceability, and Manageability. Let's review each of these categories, and focus on how the alternative architectures address key cooling challenges. In our illustrations, please note that the highest priority challenges identified in the survey are listed first under each category. Let's begin with agility.
Slide 17 Data center users have identified agility as one of the critical cooling-related challenges. Agility can be defined as the ability of a system to adapt to change. One agility challenge deals with having a plan for power density that is increasing and unpredictable. The rack-oriented architecture is modular, and deployable at rack-level increments which may be targeted at specific densities. Row-oriented architecture is also modular, and deployable at row-level increments targeted at specific densities. However, room-oriented architecture is complex to upgrade or adapt, and is typically built out in advance of the requirement. The second agility challenge is aimed at reducing the extensive engineering required for custom installations. Rack-oriented architecture triumphs here in that it is totally immune to room effects, because rack layout may be completely arbitrary. Row-oriented cooling architecture is immune to room effects when rows are laid out according to standard designs, and may be configured with simple tools. Room-oriented cooling architectures, however, require a complex Computational Fluid Dynamics (CFD) analysis for each unique room layout. A third agility challenge deals with adapting to ever-changing business requirements or any range of power densities. Rack cooling does not work well here, in that cooling capacity that goes unused at one rack cannot be used by other racks. However, in row-oriented architectures, the cooling capacity is well defined and can be shared across a group of racks. This is not so with a room-oriented cooling architecture, as any change may result in overheating. In addition, a complex analysis is required to assure that redundancy and density are achieved.
The fourth agility challenge we'll explore deals with allowances for cooling capacity to be added to an existing operating space. While new loads may be added that are completely isolated from the existing cooling system, support for these loads is limited by rack cooling capacity. In this case, a better solution is to deploy a row-oriented cooling architecture. New loads may be added that are completely isolated from the existing cooling system, and as each additional cooling module is added to the row system, it increases the density that the entire row can support. This contrasts sharply with the deployment of a room-oriented cooling architecture, whereby a shutdown of the existing cooling system may be required, resulting in the need for extensive engineering work. The last agility challenge we'll discuss is the challenge of providing a highly flexible cooling deployment with minimal reconfiguration. A drawback of a rack-oriented cooling architecture in this scenario is that the racks may need to be retrofitted or IT equipment moved to accommodate the new architecture. A row-oriented cooling architecture would require that the rack rows be re-spaced to accommodate the changes or that overhead infrastructure be modified to accommodate the new architecture. In this case, a room-oriented cooling architecture may be the best solution in that floor tiles can be quickly reconfigured to change cooling distribution patterns for power densities of less than 3 kW. Slide 18 A quick review of the chart highlights which architecture is best suited to addressing each agility challenge. Before we review availability challenges, let's do a quick review of agility challenges. Slide 19 A survey of data center operators suggests that issues dealing with upgrades can be placed into which of the following categories?
Slide 20 The first availability challenge is to eliminate hot spots. The rack-oriented architecture closely couples heat removal with the heat generation to eliminate mixing. The airflow is completely contained in the rack. In a row-oriented architecture, the cooling system is also closely coupled with the heat source for heat removal, and mixing is minimized, but the airflow is not contained to the rack. A room-oriented architecture promotes mixing of supply and return paths, requiring engineered ductwork to separate air streams.
The second challenge of availability is to assure redundancy when required. Rack-oriented architectures will require 2N cooling capacity for each rack, and many rack cooling systems are not capable of redundancy. A row-oriented cooling architecture utilizes shared N+1 capacity across a common air return. A room-oriented architecture, on the other hand, requires a complex CFD analysis to model failure modes as well as localized redundancy. The third availability challenge is to eliminate vertical temperature gradients at the face of the rack. Oftentimes, the temperature of the air going into the lower parts of the rack is much cooler than the temperature of the air going into the top parts of the rack. Servers in the upper part of the rack could suffer poor performance as a result. Best performance honors are shared by both rack and row cooling architectures in that heat is captured at the rear of the rack before mixing with the cold air supply. This contrasts sharply with a room-oriented cooling system in that warm air may re-circulate to the front of the rack as a result of insufficient heat removal or supply. The next challenge of availability is to minimize the possibility of liquid leaks in the mission-critical installation. Both the row and rack architectures operate at warmer return temperatures to reduce the need for moisture production and removal, but rack-targeted cooling requires additional piping and thereby increases the potential for leakage points. A room-oriented architecture, on the other hand, promotes the
production of condensation through a mixed air return and increases the requirement for humidification. The last availability challenge we'll explore deals with human error. In this case, both rack and row cooling architectures provide for standardized solutions that are well documented and can be operated by any user. However, a room-oriented system requires a highly trained and specialized operator to run the uniquely engineered system.
Slide 21 A quick review of the chart highlights which architecture is best suited to addressing each availability challenge. Let's move on to Lifecycle Cost Challenges next. Slide 22 The first lifecycle cost challenge deals with optimizing capital investment and available space. The deployment of a rack-oriented architecture requires a dedicated cooling system for each rack, and may result in oversizing and wasted capacity. A much better solution is the row-oriented architecture, where the ability to match the cooling requirements to a much higher percentage of installed capacity is achieved. Deploying a room-oriented architecture will result in system performance that is difficult to predict, resulting in frequent oversizing. The second lifecycle cost challenge focuses on accelerating the speed of deployment. Both the rack and row solutions have pre-engineered systems that eliminate or reduce planning and engineering. However, the room-oriented architecture requires unique engineering that may exceed the organizational demand. Lowering the cost of service contracts is the third area of lifecycle cost challenges. Standardized components, which are common to both rack and row-oriented architectures, reduce service time
and allow users the flexibility of self-service. Rack-oriented systems will likely have a higher number of units, with a 1:1 ratio of cooling devices to IT enclosures. Specialized service contracts are required for the room-oriented architecture's customized components. Quantifying the ROI (return on investment) for cooling system improvements is best addressed by either the rack or row-oriented architectures, because standardized components are used, and this results in very accurate measurements of the system's performance. Custom engineered solutions deployed in a room-oriented architecture make system performance difficult to predict.
The last area of lifecycle cost we'll discuss deals with the challenge of maximizing operational efficiency by matching capacity to load. Rack-oriented cooling systems will likely be oversized and, therefore, will not be able to operate at maximum efficiency. With the row-oriented architecture, the cooling load can match the heat load and, therefore, rightsizing to capacity can be achieved. Room-oriented architecture would not be advised in this case, as air delivery dictates oversized capacity, and pressure requirements for under-floor delivery are a function of the room size and floor depth. Slide 23 A quick review of the chart highlights which architecture is best suited to addressing each lifecycle cost challenge. Now let's discuss the challenges associated with serviceability. Slide 24 Decreasing Mean-Time-to-Recover is another serviceability challenge that survey respondents identified.
Modular components in both the rack and row-oriented architectures reduce downtime, but a 2N redundancy is required for the rack solution when systems are in need of repair and maintenance. The row-oriented solutions can be deployed in an N+1, or excess capacity scenario, which allows for repair without interruption to system performance. Recovery time challenges are highlighted by the deployment of a room-oriented architecture when custom spare parts are not readily available and require a trained technician, extending recovery time. Simplifying the complexity of the system is readily addressed by both the rack and row architectures which deploy standardized components to reduce the technical expertise required for routine service and maintenance. Conversely, room-oriented systems require trained experts to operate and perform repairs. The third challenge deals with implementing simpler service procedures. Row and rack
architectures allow for in-house staff members to perform routine service procedures, because modular sub-systems have interfaces that make for mistake-proof service procedures. Routine service procedures for room-oriented cooling systems require disassembly of unrelated subsystems. Some service items are not easily accessed when the system is installed. Highly experienced personnel are required for many service procedures. The challenge of minimizing vendor interfaces is best addressed by deploying modular units designed to integrate, in both the rack and row-oriented architectures, with a limited set of ancillary systems, while room-oriented systems are engineered solutions with multi-vendor subsystems. The final serviceability challenge, learning from past problems and sharing learning across systems, is best addressed by a rack-oriented cooling architecture, because its standardized building block approach with single rack and cooling unit interaction maximizes learning. Row-oriented systems also use a standardized building block approach with low interactions to increase learning, but with
fewer systems to learn from. Room-oriented architectures possess unique floor layouts, and all have unique problems; therefore, very limited learning will occur. Slide 25 A quick review of the chart highlights which architecture is best suited to addressing each serviceability challenge. Let's review what we have learned about serviceability and then discuss the challenges associated with manageability. Slide 26 True or false: The modular rack-oriented architecture is inflexible, time consuming to implement, and performs poorly at higher density but has a cost and simplicity advantage at lower density. Slide 27 The first manageability challenge requires that the system menu be clear and provide ease of navigation. In this instance, rack and row-oriented architectures have basic option configurations that allow the user to navigate through the menu interface quickly. The room-oriented architecture employs a highly configurable systems interface, which complicates the menu structure. As a result, advanced service training is required. Whether or not the system has the capability to provide predictive failure analysis is also a manageability challenge. Rack-oriented cooling architectures have the ability to provide real-time models of current and future performance. Row-oriented systems have the ability to provide near real-time models of current or future performance. In this respect, the row-oriented systems have more limited controls than do rack systems. With room-oriented cooling architectures, it is virtually impossible to provide real-time models of current or future performance, due to room-specific considerations.
The ability to provide, aggregate, and summarize cooling performance data is the final area of manageability challenges identified. Rack-oriented systems are clearly the best option in this case, as the cooling capacity information at the rack level is determined and available in real time. Row-oriented systems also capture cooling capacity information at the row level, which can be determined and made available in real time, and rack-level information can be effectively estimated. Cooling capacity information is not available at the rack or row level when opting to deploy a room-oriented architecture. Slide 28 A quick review of the chart highlights which architecture is best suited to addressing each manageability challenge. Now, let's spend some time discussing what conclusions can be drawn.
Slide 29 A review and analysis of our challenges suggests the following conclusions: The modular rack-oriented architecture is the most flexible, fast to implement, and achieves extreme density, but at an additional expense. Room-oriented architecture is inflexible, time consuming to implement, and performs poorly at higher density, but has cost and simplicity advantages at lower density. The modular row-oriented architecture provides many of the flexibility, speed, and density advantages of the rack-oriented approach, but with a cost similar to the room-oriented architecture. There are a number of practical issues that require additional explanation and discussion regarding the architectures; these will be discussed in Part II of Advantages of Row and Rack-Oriented Cooling Architectures.
Slide 30 Let's wrap up with a summary of the information we have covered today. The conventional legacy approach to data center cooling using room-oriented architecture has technical and practical limitations. The need of next-generation data centers to adapt to changing requirements, to reliably support high and variable power density, and to reduce electrical power consumption and other operating costs has directly led to the development of row and rack-oriented cooling architectures. These two architectures are more successful at addressing these needs, particularly at operating densities of 3 kW per rack or greater. The legacy room-oriented approach has served the industry well, and remains an effective and practical alternative for lower density installations and those applications where IT technology changes are minimal. Row and rack-oriented cooling architectures provide the flexibility, predictability, scalability, reduced electrical power consumption, reduced TCO, and optimum availability that next-generation data centers require. Users should expect that many new product offerings from suppliers will utilize these approaches. It is expected that many data centers will utilize a mixture of the three cooling architectures. Rack-oriented cooling will find application in situations where extreme densities, high granularity of deployment, or unstructured layout are the key drivers. Room-oriented cooling will remain an effective approach for low density applications and applications where change is infrequent. For most users with newer high density server technologies, row-oriented cooling will provide the best balance of high predictability, high power density, and adaptability, at the best overall total cost of ownership. Slide 31 Thank you for participating in this Data Center University™ course.
Slide 32 To test your knowledge of the course material, click the Knowledge Checkpoint link on your Data Center University™ personal homepage. Important Point! The Knowledge Checkpoint link is located under BROWSE CATALOG on the left side of the page. Slide 33 Here at DCU, we value your opinion! We are dedicated to providing you with relevant, cutting-edge education on topics pertinent to data center design, build, and operations, when and where you need it. So, please take our brief survey and tell us how we're doing. How do you begin? It's easy! 1) Click on the Home icon, located in the right corner of your screen. 2) Click on the "We Value Your Opinion" link on the left side of the screen under Browse DCU Courses. 3) Select the course title you have just completed and take our brief survey.
Power Redundancy Data Center University Course Transcript Slide 1 Welcome to Power Redundancy in the Data Center course by Data Center University. Slide 2 For best viewing results, we recommend that you maximize your browser window now. The screen controls allow you to navigate through the eLearning experience. Using your browser controls may disrupt the normal play of the course. Click the attachments link to download supplemental information for this course. Click the Notes tab to read a transcript of the narration. Slide 3 At the completion of this course, you will be able to: Understand the impact that planning for redundancy has on the availability of a data center or network room Recognize various types of Uninterruptible Power Supplies, including Standby, Line Interactive, Standby-Ferro, Double Conversion On-Line, and Delta Conversion On-Line Determine the benefits, limitations, and common applications for these UPS types Recognize the five UPS System Design Configurations, including Capacity or N System, Isolated Redundant, Parallel Redundant or N+1 System, Distributed Redundant, and System plus System Redundant Understand dual- and single-power path environments, and the impact they have on mission critical applications Comprehend the importance of generators in mission critical applications Slide 4
A key element relative to all data centers is the need for power. In most countries, the public power distribution system is fairly reliable. However, studies have shown that even the best utility systems are inadequate to meet the strict operating needs of critical nonstop data processing functions. Most companies, when faced with the likelihood of downtime and data processing errors caused by faulty utility power, choose to implement a back-up strategy for their mission-critical equipment. Slide 5 These strategies may involve the inclusion of additional hardware such as Uninterruptible Power Supplies (or UPSs) and generators, and system designs such as N+1 configurations and dual-corded equipment. This course will address various strategies to consider when planning for redundancy in the data center. Slide 6 In our rapidly changing global marketplace, the demand for faster, more robust technologies in a smaller footprint is ever-increasing. In addition, there is a further requirement that these technologies be highly available as well. Slide 7 Availability is the primary goal of all data centers and networks. Five 9s of availability of a data center is a standard most IT professionals strive to achieve. Availability is the estimated percentage of time that electrical power will be online and functioning properly to support the critical load. It is of critical importance, and is the foundation upon which successful businesses rely. According to the National Archives and Records Administration in Washington, D.C., 93% of businesses that have lost availability in their data center for 10 days or more have filed for bankruptcy within one year. The cost of one episode of downtime can cripple an organization. The availability of the public power distribution, while sufficient for many organizations, is ill-equipped to support mission-critical functions.
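As a concrete illustration of the "five 9s" standard mentioned above, the short sketch below converts an availability percentage into the downtime it allows per year; the availability values are illustrative inputs and the arithmetic is standard.

```python
# Convert an availability percentage into the downtime it permits per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_minutes_per_year(availability_pct: float) -> float:
    """Allowed downtime per year, in minutes, for a given availability percentage."""
    return (1 - availability_pct / 100.0) * MINUTES_PER_YEAR

for pct in (99.0, 99.9, 99.99, 99.999):   # "two nines" through "five nines"
    print(f"{pct}% availability -> {downtime_minutes_per_year(pct):.2f} min/year")
# Five 9s (99.999%) allows only about 5.26 minutes of downtime per year.
```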
Therefore, planning for redundancy, or the introduction of alternate or additional means of support, is a necessity. Redundancy can be thought of as a safety net or Plan B should utility power fail or be inadequate. One of the ways in which to increase data center power availability is through a UPS. Slide 8 An uninterruptible power supply, or UPS, in simplistic terms is a device which provides battery back-up power to IT equipment should utility power be unavailable or inadequate. UPSs provide power in such a way that the transition from utility power to battery power is seamless and uninterrupted. UPSs can range in size and capacity to provide power to small individual desktop computers, all the way up to large megawatt data centers. Many UPS systems incorporate software management capabilities which allow for data saving and unattended shutdown should the need or application warrant it. Slide 9 There are many different types of UPSs on the market, and choosing the correct one can often be a confusing endeavor. For example, it is a very common belief that there are only two types of UPSs: standby and on-line. In reality, there are actually five different UPS topologies, or designs. These are: Standby, Line Interactive, Standby-Ferro, Double Conversion On-Line, and Delta Conversion On-Line.
Understanding how these various UPS designs work is critical to choosing the best UPS for a particular application.
Slide 10 The Standby UPS is the most common design configuration used for personal computers. The operating principle behind the standby UPS is that it contains a transfer switch which, by default, uses filtered AC power as the primary power source. When AC power fails, the UPS switches to the battery by way of the transfer switch. The battery-to-AC power converter, also known as the inverter, is not always on, hence the name standby. The primary benefits of this type of UPS are high efficiency, small size, and low cost. Some
models are also able, with proper filter and surge circuitry, to provide adequate noise filtration and surge suppression. The limitations are that this type of UPS uses its battery during brownouts, which degrades overall battery life. Also, it is an impractical solution over 2kVA. Slide 11 The Line Interactive UPS is the most common design used for small business, Web, and departmental servers. In this design, the battery-to-AC power converter (inverter) is always connected to the output of the UPS. Operating the inverter in reverse during times when the input AC power is normal provides battery charging.
When the input power fails, the transfer switch opens and the power flows from the battery to the UPS output. With the inverter always on and connected to the output, this design provides
additional filtering and reduces switching transients when compared with the Standby UPS topology. In addition, the Line Interactive design usually incorporates a transformer which adds voltage regulation as the input voltage varies. Voltage regulation is an important feature when variable voltage conditions exist; otherwise the UPS would frequently transfer to battery and eventually drop the load. This more frequent battery usage can cause premature battery failure. The primary benefits of the Line Interactive UPS topology include high efficiency, small size, low cost and high reliability. Additionally, the ability to correct low or high line voltage conditions makes this the dominant type of UPS in the 0.5-5kVA power range. The Line Interactive UPS is ideal for rack or distributed servers and/or harsh power environments. Over 5kVA, the use of a Line Interactive UPS becomes impractical. Slide 12 The Standby-Ferro UPS was once the dominant form of UPS in the 3-15kVA range. This design depends on a special saturating transformer that has three windings (power connections). The primary power path is from AC input, through a transfer switch, through the transformer, and to the output. In the case of a power failure, the transfer switch is opened, and the inverter picks up the output load. In the Standby-Ferro design, the inverter is in the standby mode, and is energized when the input power fails and the transfer switch is opened. The transformer has a special "Ferro-resonant" capability, which provides limited voltage regulation and output waveform "shaping". The isolation from AC power transients provided by the Ferro transformer is as good as or better than any filter available. But the Ferro transformer itself creates severe output voltage distortion and transients, which can be worse than a poor AC connection. Even though it is a standby UPS by design, the
Standby-Ferro generates a great deal of heat because the Ferro-resonant transformer is inherently inefficient. These transformers are also large relative to regular isolation transformers, so Standby-Ferro UPSs are generally quite large and heavy. Standby-Ferro UPS systems are frequently represented as On-Line units, even though they have a transfer switch, the inverter operates in the standby mode, and they exhibit a transfer characteristic during an AC power failure. The primary benefits of this design are high reliability and excellent line filtering. The limitations include very low efficiency combined with instability when used with some generators and newer power-factor-corrected computers, causing the popularity of this design to decrease significantly. Slide 13 The Double Conversion On-Line UPS is the most common type of UPS above 10kVA. The design configuration is the same as the Standby UPS, except that the primary power path is the inverter instead of the AC main. The Double Conversion On-Line UPS converts AC power to DC and then converts the DC back to AC to power the connected equipment. The batteries are directly connected to the DC level. This effectively filters out line noise and all other anomalies from the AC power. Failure of the AC power does not cause activation of the transfer switch, because the input AC is charging the backup battery source which provides power to the output inverter. Therefore, during an AC power failure, on-line operation results in no transfer time. There are certainly benefits and limitations of this UPS. A benefit is that it provides nearly ideal electrical output performance, with no transfer time. But the constant wear on the power components reduces reliability over other designs. Additionally, both the battery charger and the inverter convert the entire load power flow, resulting in reduced efficiency and increased heat generation. Furthermore, the inefficiency of electrical energy consumption is a significant part of the life-cycle cost of the UPS.
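To illustrate why conversion losses can become a significant life-cycle cost, the sketch below estimates the annual electricity cost of UPS losses. The 100 kW load, the 92% and 96% efficiency figures, and the $0.10/kWh tariff are illustrative assumptions, not values from the course.

```python
# Estimate the annual cost of UPS conversion losses (illustrative assumptions only).
HOURS_PER_YEAR = 8760

def annual_loss_cost(load_kw: float, efficiency: float, price_per_kwh: float) -> float:
    input_kw = load_kw / efficiency          # power drawn from the utility
    loss_kw = input_kw - load_kw             # dissipated in the UPS as heat
    return loss_kw * HOURS_PER_YEAR * price_per_kwh

# A hypothetical 100 kW load at $0.10 per kWh:
print(f"92% efficient: ${annual_loss_cost(100, 0.92, 0.10):,.0f} per year in losses")
print(f"96% efficient: ${annual_loss_cost(100, 0.96, 0.10):,.0f} per year in losses")
```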
Slide 14 The Delta Conversion On-Line design was introduced to eliminate the drawbacks of the Double Conversion On-Line design and is available in sizes ranging from 5kVA to 1.6MW. Similar to the Double Conversion On-Line design, the Delta Conversion On-Line UPS always has the inverter supplying the load voltage. However, in this configuration the primary power source is blended with power from the additional Delta Converter. As the primary power varies away from its normal value, the inverter comes to life to make up the difference. The Double Conversion On-Line UPS converts the power to the battery and back again, whereas the Delta Converter moves components of the power from input to output. In the Delta Conversion On-Line design, the Delta Converter serves dual purposes. The first is to control the input power characteristics. This active front end draws power in a sinusoidal manner, minimizing harmonics reflected onto the utility. This ensures optimal utility and generator system compatibility, reducing heating and system wear in the power distribution system. The second function of the Delta Converter is to control input current in order to regulate charging of the battery system. This input power control makes the Delta Conversion On-Line UPS compatible with generators and reduces the need for wiring and generator oversizing. Delta Conversion On-Line technology is the only core UPS technology today protected by patents and is therefore not likely to be available from a broad range of UPS suppliers. The benefits of the Delta Conversion On-Line UPS include high efficiency, excellent voltage regulation, and an overall reduction in life-cycle costs of energy in large installations. It is impractical in installations under 5kVA. Slide 15 In addition to the 5 main types of UPSs, there are different approaches to the configurations in which these UPSs are used to achieve appropriate levels of redundancy. These are called UPS System Design Configurations, and there are 5 main types.
These are: Capacity or N System, Isolated Redundant, Parallel Redundant or N+1 System, Distributed Redundant, and System plus System Redundant. Before we can fully explain these redundancy configurations, we must first talk about the concept of N. Slide 16 Using the letter N is a common nomenclature for describing the redundancy of a given system. N can simply be defined as the need of the critical load; the minimum requirement for the system to operate. Slide 17 For example, in considering a RAID (Redundant Array of Independent Disks) system, we can further illustrate the use of N. If 4 disks are needed for storage capacity, and the RAID system contains 4 disks, this is an N design. On the other hand, if there are 5 disks and only 4 are needed for storage capacity, this is an example of an N+1 design. If 4 disks are required for storage capacity, and there are 2 RAID systems, each with 4 disks, this is considered 2N. Occasionally, N+2 designs are seen, but often these come about by accident rather than by purposeful design. For example, a 6-disk RAID may be utilized with the thought that that much storage capacity would be used; however, the capacity never grows larger than the capacity of 4 disks. In this case, this is actually an N+2 design. N+1 and 2N designs are those that have some form of built-in redundancy.
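A minimal sketch of the N nomenclature just described, using the RAID example from the narration: it compares the number of disks required (N) with the number provided and labels the design accordingly. The function is a hypothetical illustration, not part of the course material.

```python
# Classify a design using the "N" nomenclature from the RAID example above.
def redundancy_label(required: int, provided: int, independent_systems: int = 1) -> str:
    """required = units needed for capacity (N); provided = units per system."""
    if independent_systems >= 2 and provided >= required:
        return f"{independent_systems}N"            # e.g. two complete 4-disk RAID sets
    spare = provided - required
    if spare < 0:
        return "under-provisioned (less than N)"
    return "N" if spare == 0 else f"N+{spare}"

print(redundancy_label(4, 4))                         # N    (4 disks needed, 4 installed)
print(redundancy_label(4, 5))                         # N+1  (one spare disk)
print(redundancy_label(4, 6))                         # N+2  (the accidental case above)
print(redundancy_label(4, 4, independent_systems=2))  # 2N   (two complete RAID systems)
```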
Slide 18 Now let's get back to UPS redundancy designs. The first is the Capacity or N System. An N system, simply stated, is a system comprised of a single UPS, or a paralleled set of UPSs, whose capacity is equal to the load. This type of system is by far the most common of the configurations in the UPS industry. The small UPS under an office desk protecting one desktop is an N configuration. Likewise, a very large 400 kW computer room is an N configuration whether it has a single 400 kW UPS, or two paralleled 200 kW UPSs. An N configuration can be looked at as the minimum requirement to provide protection for the critical load.
The advantages of the N System include:
- Conceptually simple and cost-effective hardware configuration
- Optimal efficiency of the UPS, because the UPS is used to full capacity
- Provides availability over that of the utility power
- Expandable if the power requirement grows (it is possible to configure multiple units in the same installation; depending on the vendor or manufacturer, you can have up to 8 UPS modules of the same rating in parallel)
Disadvantages:
- Limited availability in the event of a UPS module breakdown, as the load will be transferred to bypass operation, exposing it to unprotected power
- During maintenance of the UPS, batteries, or downstream equipment, the load is exposed to unprotected power (this usually takes place at least once a year, with a typical duration of 2-4 hours)
- Lack of redundancy limits the load's protection against UPS failures
- Many single points of failure, which means the system is only as reliable as its weakest point
Slide 19 An isolated redundant configuration is sometimes referred to as an N+1 system; however, it is considerably different from a parallel redundant configuration, which is also referred to as N+1. In this configuration, there is a main or primary UPS module that normally feeds the load.
The isolation or secondary UPS feeds the static bypass of the main UPS module(s). This configuration requires that the primary UPS module have a separate input for the static bypass circuit. This is a way to achieve a level of redundancy for a previously non-redundant configuration without completely replacing the existing UPS. In a normal operating scenario, the primary UPS module will be carrying the full critical load, and the isolation module will be completely unloaded. Upon any event where the primary module(s) load is transferred to static bypass, the isolation module would accept the full load of the primary module instantaneously. The isolation module has to be chosen carefully to ensure that it is capable of assuming the load this rapidly. If it is not, it may, itself, transfer to static bypass and thus defeat the additional protection provided by this configuration.
The advantages of this configuration include:
- Flexible product choice; products can be mixed with any make or model
- Provides UPS fault tolerance
- No synchronizing needed
- Relatively cost effective for a two-module system
Disadvantages:
- Reliance on the proper operation of the primary module's static bypass to receive power from the reserve module
- Requires that both UPS modules' static bypasses operate properly to supply currents in excess of the inverter's capability
- Complex and costly switchgear and associated controls
- Higher operating cost due to a 0% load on the secondary UPS, which draws power to keep it running
- A two-module system (one primary, one secondary) requires at least one additional circuit breaker to permit choosing between the utility and the other UPS as the bypass source; this is more complex than a system with a common load bus and further increases the risk of human error
- Two or more primary modules need a special circuit to enable selection of the reserve module or the utility as the bypass source (static transfer switch)
- Single load bus per system, a single point of failure
Slide 20 Parallel redundant configurations allow for the failure of a single UPS module without requiring that the critical load be transferred to the utility source. A parallel redundant configuration consists of paralleling multiple, same-size UPS modules onto a common output bus. The system is N+1 redundant if the spare amount of power is at least equal to the capacity of one system module; the system would be N+2 redundant if the spare power is equal to two system modules; and so on. Parallel redundant systems require UPS modules of the same capacity from the same manufacturer. The UPS module manufacturer also provides the paralleling board for the system. The paralleling board may contain logic that communicates with the individual UPS modules, and the UPS modules will communicate with each other to create an output voltage that is completely synchronized. The number of UPS modules that can be paralleled onto a common bus is different for different UPS manufacturers. The UPS modules in a parallel redundant design share the critical load evenly in normal operating situations. When one of the modules is removed from the parallel bus for service (or if it were to remove itself due to an internal failure), the remaining UPS modules are required to immediately accept the load of the failed UPS module. This capability allows any one module to be removed from the bus and be repaired without requiring the critical load to be connected to straight utility.
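The rule just stated, that a parallel redundant system is N+1 when the spare capacity equals at least one module and N+2 when it equals two, can be expressed directly. The sketch below is illustrative; the 400 kW load and 200 kW module size echo the capacity example from Slide 18 and are assumptions for demonstration only.

```python
import math

# Parallel redundant bus: how many whole spare modules does the installed capacity leave?
def parallel_redundancy(load_kw: float, module_kw: float, modules_installed: int) -> str:
    needed = math.ceil(load_kw / module_kw)        # N: modules required to carry the load
    spares = modules_installed - needed            # whole modules of spare capacity
    if spares < 0:
        return "insufficient capacity"
    return "N" if spares == 0 else f"N+{spares}"

print(parallel_redundancy(400, 200, 2))   # N    (two 200 kW modules just carry 400 kW)
print(parallel_redundancy(400, 200, 3))   # N+1  (one module can fail or be serviced)
print(parallel_redundancy(400, 200, 4))   # N+2
```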
Advantages:
- Higher level of availability than capacity configurations because of the extra capacity that can be utilized if one of the UPS modules breaks down
- Lower probability of failure compared to isolated redundant because there are fewer breakers and because modules are online all the time (no step loads)
- Expandable if the power requirement grows; it is possible to configure multiple units in the same installation
- The hardware arrangement is conceptually simple and cost effective
Disadvantages:
- Both modules must be of the same design, same manufacturer, same rating, same technology and configuration
- Still single points of failure upstream and downstream of the UPS system
- The load is exposed to unprotected power during maintenance of the UPS, batteries, or downstream equipment, which usually takes place at least once a year with a typical duration of 2-4 hours
- Lower operating efficiencies because no single unit is being utilized 100%
- Single load bus per system, a single point of failure
- Most manufacturers need external static switches in order to load-share equally between the two UPS modules; otherwise they will share within a wide window of 15%. This adds to the cost of the equipment and makes it more complex
- Most manufacturers need a common external service bypass panel. This adds to the cost of the equipment and makes it more complex
Slide 21
Distributed redundant configurations are commonly used in the market today. This design was developed in the late 1990s in an effort by an engineering firm to provide the capabilities of complete redundancy without the cost associated with achieving it. The basis of this design uses three or more UPS modules with independent input and output feeders. The independent output buses are connected to the critical load via multiple Power Distribution Units and Static Transfer Switches. From the utility service entrance to the UPS, a distributed redundant design and a system plus system design (discussed in the next section) are quite similar. Both provide for concurrent maintenance and minimize single points of failure. The major difference is in the quantity of UPS modules that are required in order to provide redundant power paths to the critical load, and the organization of the distribution from the UPS to the critical load. As the load requirement, N, grows, the savings in the quantity of UPS modules also increase. Distributed redundant systems are usually chosen for large, complex installations where concurrent maintenance is a requirement and many or most loads are single-corded. Savings over 2N also drive this configuration.
Advantages:
- Allows for concurrent maintenance of all components if all loads are dual-corded
- Cost savings versus a 2(N+1) design due to fewer UPS modules
- Two separate power paths, from any given dual-corded load's perspective, provide redundancy from the service entrance
- UPS modules, switchgear, and other distribution equipment can be maintained without transferring the load to bypass mode, which would expose the load to unconditioned power; many distributed redundant designs do not have a maintenance bypass circuit
Disadvantages:
- Relatively high cost solution due to the extensive use of switchgear compared to previous configurations
- Design relies on the proper operation of the STS equipment, which represents single points of failure and complex failure modes
- Complex configuration; in large installations that have many UPS modules and many static transfer switches and PDUs, it can become a management challenge to keep systems evenly loaded and know which systems are feeding which loads
- Unexpected operating modes: the system has many operating modes and many possible transitions between them. It is difficult to test all of these modes under anticipated and fault conditions to verify the proper operation of the control strategy and of the fault-clearing devices
- UPS inefficiencies exist due to less than full load normal operation
Slide 22 System plus System, Multiple Parallel Bus, Double-Ended, 2(N+1), 2N+2, [(N+1) + (N+1)], and 2N are all nomenclatures that refer to variations of this configuration. With this design, it now becomes possible to create UPS systems that may never require the load to be transferred to the utility power source. These systems can be designed to eliminate every conceivable single point of failure. However, the more single points of failure that are eliminated, the more expensive this design will be to implement. Most large system plus system installations are located in standalone, specially designed buildings. It is not uncommon for the infrastructure support spaces (UPS, battery, cooling, generator, utility, and electrical distribution rooms) to be equal in size to the data center equipment space. This is the most reliable, and most expensive, design in the industry. It can be very simple or very complex depending on the engineer's vision and the requirements of the owner. Although a name has been given to this configuration, the details of the design can vary greatly and this, again, is in the vision and knowledge of the design engineer responsible for the job. The 2(N+1) variation of this configuration revolves around the duplication of parallel redundant UPS systems. Optimally, these UPS systems would be fed from separate switchboards, and even from separate utility services and possibly separate generator systems. The extreme cost of building this type of facility
has been justified by the importance of what is happening within the walls of the data center and the cost of downtime to operations. Many of the world's largest organizations have chosen this configuration to protect their critical load. The fundamental concept behind this configuration requires that each piece of electrical equipment can fail or be turned off manually without requiring that the critical load be transferred to utility power. Common in 2(N+1) designs are bypass circuits that will allow sections of the system to be shut down and bypassed to an alternate source that will maintain the redundant integrity of the installation.
For example, if a critical load is 300 kW, the design requires that four 300 kW UPS modules be provided, two each on two separate parallel buses. Each bus feeds the necessary distribution to feed two separate paths directly to the dual-corded loads. For a single-corded load, a transfer switch can bring redundancy close to the load. However, Tier IV power architectures require that all loads be dual-corded. Companies that choose system plus system configurations are generally more concerned about high availability than the cost of achieving it. These companies also have a large percentage of dual-corded loads.
Advantages:
- Two separate power paths allow for no single points of failure; very fault tolerant
- The configuration offers complete redundancy from the service entrance all the way to the critical loads
- In 2(N+1) designs, UPS redundancy still exists, even during concurrent maintenance
- UPS modules, switchgear, and other distribution equipment can be maintained without transferring the load to bypass mode, which would expose the load to unconditioned power
- Easier to keep systems evenly loaded and to know which systems are feeding which loads
Disadvantages:
- Highest cost solution due to the amount of redundant components
- UPS inefficiencies exist due to less than full load normal operation
- Typical buildings are not well suited for large, highly available system plus system installations that require compartmentalizing of redundant components
Slide 23 So how does a company choose the correct redundancy configuration? There are many factors to consider. The first is the cost and impact of downtime: how much money is flowing through the company every minute, and how long will it take to recover systems after a failure? If the answer to this question is $10,000,000 per minute versus $1,000,000 per hour, the degree of redundancy needed will be different. The second factor is risk tolerance. Companies that have not experienced a major failure are typically more risk tolerant than companies that have. Smart companies will learn from what companies in their industry are doing. This is called benchmarking, and it can be done in many ways. The more risk intolerant a company is, the more internal drive there will be to have more reliable operations and disaster recovery capabilities. Another factor to consider is availability requirements: how much downtime can the company withstand in a typical year? If the answer is none, then a high availability design should be in the budget. However, if the business can shut down every night after 10 PM, and on most weekends, then the UPS configuration wouldn't need to go far beyond a parallel redundant design. Every UPS will, at some point, need maintenance, and UPS systems do fail periodically, and somewhat unpredictably. The less time that can be found in a yearly schedule to allow for maintenance, the more a system needs the elements of a redundant design.
Types of loads (single vs. dual-corded) are another area to examine. Dual-corded loads provide a real opportunity for a design to leverage a redundant capability, but the system plus system design concept was created before dual-corded equipment existed. The computer manufacturing industry was clearly listening to its clients when it started making dual-corded loads. The nature of the loads within the data center will help guide a design effort, but it is much less of a driving force than the issues stated above. Finally, budget. The cost of implementing a 2(N+1) design is significantly higher, in every respect, than a capacity design, a parallel redundant design, or even a distributed redundant design. As an example of the cost difference in a large data center, a 2(N+1) design may require thirty 800 kW modules (five modules per parallel bus; six parallel buses). A distributed redundant design for this same facility requires only eighteen 800 kW modules, a huge cost savings.
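To make the module arithmetic concrete, here is a minimal sizing sketch. It is not from the course; the per-bus load of 3,200 kW is an assumption chosen so that each parallel bus needs four 800 kW modules plus one redundant module, which reproduces the thirty-module total quoted above, and the eighteen-module distributed redundant figure is simply taken from the transcript.

```python
import math

MODULE_KW = 800          # UPS module rating from the example above
BUSES = 6                # assumed number of parallel buses in the 2(N+1) design
LOAD_PER_BUS_KW = 3200   # assumed critical load per bus (gives N = 4)

def modules_2n_plus_1(load_per_bus_kw, module_kw, buses):
    """Each parallel bus carries N modules for capacity plus one redundant module."""
    n = math.ceil(load_per_bus_kw / module_kw)
    return buses * (n + 1)

total_2n1 = modules_2n_plus_1(LOAD_PER_BUS_KW, MODULE_KW, BUSES)  # 30 modules
distributed_redundant = 18                                        # figure quoted above

print(f"2(N+1): {total_2n1} modules")
print(f"Distributed redundant: {distributed_redundant} modules")
print(f"Additional modules required by 2(N+1): {total_2n1 - distributed_redundant}")
```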
Slide 24 Data centers and network rooms are routinely oversized to three times their required capacity; it is therefore extremely important that UPSs are sized correctly. Oversizing of UPSs is an extremely common practice, but the costs of doing so are often unknown. The utilization of the physical and power infrastructure in a data center or network room is typically much less than 50%. Oversizing drives excessive capital and maintenance expenses, which are a substantial fraction of the overall lifecycle cost. Slide 25 In a recent industry study, it was found that the power and cooling systems in a typical 100 kW data center have a capital cost on the order of $500,000, or $5 per watt. This analysis indicates that on the order of 70%, or $350,000, of this investment is wasted. In the early years, this waste is even greater. When the time-cost of money is figured in, the typical loss due to oversizing nearly equals 100% of the entire capital cost of the data center! That is, the interest alone on the original capital is almost capable of paying for the actual capital requirement.
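The waste figures above follow from simple arithmetic. The sketch below only illustrates that arithmetic, using the course's $5 per watt figure and an assumed long-term utilization of 30% (the complement of the roughly 70% waste cited above).

```python
DESIGN_KW = 100          # design capacity from the example above
COST_PER_WATT = 5.0      # capital cost per watt from the example above
UTILIZATION = 0.30       # assumed long-term utilization of that capacity

capital = DESIGN_KW * 1000 * COST_PER_WATT      # $500,000 of power and cooling capital
stranded = capital * (1 - UTILIZATION)          # roughly $350,000 never used

print(f"Capital cost: ${capital:,.0f}")
print(f"Stranded by oversizing: ${stranded:,.0f}")
```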
Most of this excess cost can be recovered by implementing a method and architecture that can adapt to changing requirements in a cost-effective manner while at the same time providing high availability. Slide 26 Most high availability data centers use a power system providing dual power paths, or 2N, all the way to the critical loads. This means that they have redundant power supplies and power cords to maintain the dual power paths all the way to the IT equipment's internal power bus. In this way the equipment can continue to operate with a failure at any point in either power path. Slide 27 However, equipment with a single power path (single-corded) introduces a weakness into an otherwise highly available data center. Transfer switches are often used to enhance single-corded equipment availability by providing the benefits of redundant utility paths. A transfer switch is a common component in data centers and is used to perform the following functions: 1. Switching UPS and other loads from utility to generator during a utility power failure 2. Switching from a failed UPS module to utility or another UPS (depending on the design) 3. Switching critical IT loads from one UPS output bus to another in a dual path power system Slide 28 There are three fundamental approaches to powering single-corded equipment in a dual path environment. They are:
Power the equipment from one feed
Use a transfer switch at the point of use to select a preferred source, and switch to the second power path when that source fails
Use a large centralized transfer switch, fed from the two sources, to create a new power bus that supplies a large group of single-corded loads
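A conceptual sketch of the second, point-of-use approach follows. The voltage limits and function name are illustrative assumptions, not part of the course or of any vendor's transfer-switch logic.

```python
def select_source(v_a: float, v_b: float, low: float = 200.0, high: float = 215.0) -> str:
    """Choose which of two 208 V feeds should supply a single-corded load."""
    a_ok = low <= v_a <= high
    b_ok = low <= v_b <= high
    if a_ok:
        return "A"      # stay on the preferred path while it is healthy
    if b_ok:
        return "B"      # transfer to the alternate path
    return "none"       # both paths are out of tolerance

print(select_source(208.0, 207.5))   # -> "A"
print(select_source(180.0, 207.5))   # -> "B", preferred feed sagged
```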
Slide 29 One of the most efficient ways to provide redundant powering is with a generator. Standby power generation is a key component of a high availability power system for data centers and network rooms. Information technology systems may operate for minutes or even a few hours on battery, but local power generation capability is key to achieving high availability. In locations with poor utility power, local power generation may be needed to achieve even a minimal requirement of 99.9% availability. Generator systems with diesel or natural gas engines are, in most cases, the solution for standby power generation. A generator system includes not only the standby generator, but also the automatic transfer switch (ATS), the output distribution, and the communication or management system. The ATS is fed by two sources, the utility and the generator, with the utility as the preferred source.
When the preferred source is unacceptable, the ATS automatically switches to the generator. Standby generator systems are often used in conjunction with UPS systems. Slide 30 Planning for redundancy has a significant impact on the availability of a data center or network room
Various types of Uninterruptible Power Supplies exist, including Standby, Line Interactive, Standby-Ferro, Double Conversion On-Line, and Delta Conversion On-Line
Determining the benefits, limitations, and best applications for these UPS types helps to ensure a more available network
There are five UPS System Design Configurations, including Capacity or N System, Isolated Redundant, Parallel Redundant or N+1 System, Distributed Redundant, and System plus System Redundant
Dual- and single-power path environments play an important role in the availability of mission critical applications
Generators are an important safeguard when trying to increase runtime in mission critical applications
Slide 31 Thank you for participating in this Data Center University course.
Slide 4 Power distribution is the key to maintaining availability in the data center. Many instances of equipment failure, downtime, and software and data corruption are the result of a failure to provide adequate power distribution. Sensitive components require consistent power distribution as well as power that is free of interruption or distortion.
The consequences of large-scale power incidents are well documented. Across all business sectors, an estimated $104 billion to $164 billion per year is lost due to power disruptions, with another $15 billion to $24 billion per year in losses attributed to secondary power quality problems. It is imperative that critical components within the data center have an adequate and steady supply of power. Slide 5 It is important to provide a separate, dedicated power source and power infrastructure for the data center. The building in which a data center is located could have a mixture of power requirements, such as air conditioners, elevators, office equipment, desktop computers, and kitchen area microwaves and refrigerators. If the data center shares a common power source with the rest of the building, and power consumption is at a high level, it could impact the data center's air handlers, for example, and greatly increase the risk of unanticipated downtime. This course will explore the topic of power distribution within the data center. Let's begin with a review of how power is transmitted to the data center. Slide 6 This is a diagram that shows the transmission of power from the utility to a building which houses a data center. Voltage is transformed many times before it reaches the user. Voltage is either stepped up or stepped down by a series of transformers. The power generation facility at the utility generates three phases. Thus, three wires are used to transmit power. Generating and distributing 3-phase power is more economical than distributing single-phase power. Single-phase power only has one hot wire.
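One way to see the economy of 3-phase distribution is to compare the current each conductor must carry for the same delivered power. The load and voltage below are illustrative assumptions, not figures from the course.

```python
import math

P_W = 100_000   # assumed load to deliver: 100 kW
V = 480         # assumed line voltage

i_single = P_W / V                       # ~208 A on each of two conductors
i_three = P_W / (math.sqrt(3) * V)       # ~120 A on each of three conductors

print(f"Single-phase: {i_single:.0f} A per wire across 2 wires")
print(f"Three-phase:  {i_three:.0f} A per wire across 3 wires")
```

Lower current per conductor means smaller, lighter wire for the same power, which is the point made on the next slide.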
Slide 7 Since the size of the wire directly affects the amount of current that can pass, it also determines the amount of power that can be delivered. If power were distributed only as a single phase, huge, heavy transmission lines would be needed and it would be nearly impossible to suspend them from a pole. It is much more economical to distribute AC power using three wires. Now that we've reviewed the basic concepts of power transmission, let's move on to nominal versus normal voltage. Slide 8 As power is distributed across long distances over power lines, losses in voltage caused by resistance and inductive losses can occur as the power works its way through various transformers. The voltage received, therefore, can vary depending upon the consumer's position along the power line and depending upon the total load that the line is expected to supply. By the time source power reaches the computer installation site, it can suffer voltage losses of up to 11%, even under optimal conditions. Slide 9 Nominal voltage is the voltage that the power company guarantees. Normal voltage is what is typically supplied at the site due to distribution losses. The two often represent different voltages. If the voltage coming into the data center is either too high or too low, it can impact equipment by causing it to run hot. This is corrected with the utilization of a transformer. Now let's explore transformers. Slide 10
Transformers are essential to transmit and distribute power. They must transform 3-phase voltage and provide a mechanism to break out single phase power from 3-phase power. To achieve both of these tasks, different types of transformers are used. Stepping up or down 3-phase power requires what is called a Delta transformer. It is called Delta because its circuit diagram looks like the Greek letter Delta. Slide 11 Transformers are built by taking two wires and wrapping them around an iron core. Iron is used due to its magnetic qualities. AC power is supplied to the first wire, called the primary coil.
As the current flows through the primary coil, it induces current in the second wire (called the secondary coil). This phenomenon is called the law of induction. The strength of the induced current depends upon the number of times the second wire is wrapped around the iron core. By adjusting the number of turns on the secondary coil, the transformer's output current and voltage can be determined. Slide 12 A step down transformer will take the voltage coming into the transformer and produce an output of decreased voltage. For example, 600 volts come in and 480 volts go out. The transformation takes place without any electrical connection between the input and output. Slide 13 The step up transformer works in a similar manner. The only difference is that the primary coil has fewer turns or windings than the secondary coil. In the case of the step up transformer, the voltage coming into the transformer is less than the voltage going out of the transformer. Slide 14
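As a quick numeric illustration of the turns-ratio relationship described on the last two slides, the sketch below applies the ideal-transformer rule that secondary voltage scales with the ratio of secondary to primary turns (losses are ignored, and the turn counts are made-up examples).

```python
def secondary_voltage(v_primary: float, n_primary: int, n_secondary: int) -> float:
    """Ideal transformer: output voltage scales with the turns ratio."""
    return v_primary * n_secondary / n_primary

print(secondary_voltage(600, 100, 80))   # step-down: 600 V in -> 480 V out
print(secondary_voltage(480, 80, 100))   # step-up:   480 V in -> 600 V out
```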
Another type of transformer is a Wye transformer. This transformer also gets its name from a Greek letter because its circuit diagram looks like the Greek letter Wye. The Wye transformer is different from the Delta transformer because it outputs not just three phases but also a neutral wire. Wye to Wye transformers are not as common as Delta to Delta transformers, but can sometimes be found supporting distribution in cases where the utility is not the primary power source. An example would be upstream of a UPS and downstream of a generator. Slide 15 To break out single phase from a 3-phase source, a Delta to Wye transformer is required. A Delta to Wye transformer takes in three phases and a ground, and it outputs three phases and a neutral. Slide 16 A transformer that contains an equal number of turns or windings in both the primary and secondary coils is called an isolation transformer. The voltage coming into the transformer is equal to the voltage coming out of the transformer. Remember that within the transformer, the law of induction dictates that transformation takes place without any electrical connection between the input and output. The benefit of the isolation transformer is that it filters out electrical spikes on the input, thereby providing better power quality on the output. Now that we've covered transformers, let's discuss the service entrance. Slide 17 A continuous connection exists between the power generation station and the wiring inside a building which houses a data center. The point where the responsibility of the electrical infrastructure shifts from the utility to the owner or tenants of the building is called the service entrance. Power meters are typically placed at the service entrance to accurately track the power
usage. A good way to find the service entrance is to look for the building's main earth ground. This is the ground for the entire building electrical system. It is typically a steel rod driven into the building foundation or the earth around the foundation. Slide 18 A service transformer will sit just outside of the service entrance. This transformer will vary from building site to building site. For most businesses, the service transformer will be a 480 Volt delta transformer. Beyond the Main Service Entrance, the power is distributed within the facility. Power distribution within the facility can be broken down into six areas: main electrical service panel, transformers, feeders, subpanels, branch circuits, and receptacles. Let's explore each of these items in more detail, beginning with the main electrical service panel. Slide 19 The service transformer is wired directly to the main Electrical Service Panel. This panel has several key components, so let's take a look at a typical diagram of the main electrical service panel and examine each of these components. The first component is the neutral bus. The neutral bus is a bar to which all the neutral wires are connected. This is done to keep all of the neutral wires referencing the same voltage. Slide 20 The next key component is the neutral to ground bond. This bond connects the neutral bus to the electrical ground of the building. If this bond is not made, neutral problems, such as high impedance, can occur. Impedance is the opposition that the flow of alternating current (AC) encounters in an electrical circuit.
Slide 21 The earth ground connection is the next component we'll examine. This connection acts as the ground reference for the entire electrical infrastructure. It is made by driving a grounding electrode into the earth. This electrode is then connected to the Main Service Panel via the neutral bus, which is bonded to the ground bus using the neutral to ground bond. Slide 22 The final component we'll explore deals with the service transformer. The ground of the service transformer is also connected to the neutral bus to keep the ground reference consistent between the incoming power and the distributed power. Slide 23 Within the facility, transformers are used to provide either Delta or Wye power for isolation, for stepping voltage up or down, or to break out a single phase from a 3-phase source. They are also useful in breaking down a facility's power requirements into zones. Each zone can be provided with a dedicated transformer with a specific VA rating. Typical ratings range from 30 kVA to 225 kVA. Transformers are ideal for this partitioning effect because they isolate loads from the Main Service Panel. Thus, power problems such as harmonics and overloaded neutrals can be isolated from the main electrical service. However, whenever a transformer of 1000 VA or larger is used within the facility, the secondary winding must be grounded to building steel. In this case the transformer is considered a separately derived power source and must be grounded as such. Slide 24 Subpanels are metal boxes that contain all the buses and breakers for distribution to receptacles and loads. They are sized by the number of circuit breakers and bus configurations.
Typical subpanels include 240/120V single phase with three wires and 208/120V 3-phase with four wires. Subpanels are constructed and configured to ensure that all phases are equally loaded.
Slide 25 Feeders are the conductors and conduits that run between main service panels, transformers, and subpanels. They are wired according to the National Electric Code (NEC). Feeders are subject to very strict voltage drop parameters: only 2% of the voltage available at the main service panel can be lost over the entire length of the feeder circuit. Slide 26 Branch circuits connect the load to the final overcurrent protection device. In most cases, the final overcurrent protection is a sub-panel with circuit breakers. Branch circuits consist of conductors and conduit. The sizes of the conductor cables in both feeder and branch circuits are outlined in National Electric Code (NEC) article 310. Slide 27 Dedicated branch circuits are usually needed for sensitive equipment such as computers and medical instrumentation, unless power conditioning equipment is employed for that equipment. A dedicated branch circuit is one that has all three wires (hot, neutral and ground) isolated from all other equipment outlets so that noisy appliances or equipment on nearby general-purpose branches will not interfere with the sensitive equipment. Slide 28 The degree to which an electrician can achieve a noise-free circuit for sensitive equipment is usually dependent upon a number of factors:
The quality of power delivered by the utility
The age and design of the building
The integrity of the grounding system throughout the building
The amount of electrical noise generated within the building
The degree to which electrical loads are balanced throughout the building
In some cases, it is necessary to isolate the circuit all the way back to the main distribution panel at the service entrance. Slide 29 Receptacles are the final piece of the distribution puzzle. Receptacles allow loads to be attached to the electrical distribution using a cord and plug. They come in many sizes and shapes, due to the wide range of power requirements of the electrical loads in existence today. (For more information on plugs, please refer to the Data Center University course entitled Fundamentals of Power.) Now that we've addressed each of the components in the service entrance, let's move on to the different methods of power distribution, beginning with direct connect. Slide 30 Many factors come into play when deciding on a power distribution layout from the PDUs to the racks. The size of the data center, the nature of the equipment being installed, and budget are all variables. However, be aware that two approaches are commonly utilized for power distribution in the data center. One approach is to run conduits from large wall mounted or floor mounted PDUs to each cabinet location. This works moderately well for a small server environment with a limited number of
conduits. This doesn't work well for larger data centers when cabinet locations require multiple power receptacles. Slide 31 Running each electrical conduit directly from the source power panel in a more or less straight line to a destination cabinet requires rivers of conduits to cross over one another. Over time, both power and data cables can become quite congested under the floor. This is problematic when it comes time to relocate whips or perform electrical work in the data center. It is also problematic for maintaining unobstructed air distribution to the servers if a raised floor is used as a plenum for cooling. Slide 32 Another option for power distribution to the racks is to install the PDU in the rack unit itself. In this case the distribution is as close as possible to the load, fewer feet of cable are required, and the solution is completely mobile. In both the direct connect and distributed power schemes, significant amounts of cabling would have to be removed from the raised floor in order to move the PDU to a new location. Slide 33 Data center power distribution systems have evolved in response to the needs of the modern data center. Improvements to power distribution systems have been introduced over time. Today, an updated power distribution system could have several enhanced features, most notably:
Branch circuit power metering
Overhead cable tray with flexible power cords
Overhead fixed busway with removable power taps
High power, pluggable rack power distribution units
Transformerless Power Distribution Units, and
Power capacity management software
Slide 34 Here we see a diagram of a data center employing one example of a modern power distribution system. In this example, 480 volts of power comes from the panel to the UPS and then goes through an attached distribution panel to the IT load. The power is stepped down to 208 volts via a transformer built into the rack housing the PDU. A series of branch circuits bring the power to the servers located in the associated IT enclosures.
Slide 35 Here is another example of a similar power distribution system, one that distributes to IT rows using one or more overhead busways. The busways are installed up front and traverse the entire planned IT rack layout. When a group of racks is to be installed, a low-footprint modular PDU is installed at the same time and plugged into the overhead busway. The connection to the busway is also shown here. Instead of traditional circuit breaker panels with raw wire terminations, the modular PDU has a backplane into which pre-terminated shock-safe circuit breaker modules are installed. This arrangement allows the face of the PDU to be much narrower, and eliminates on-site termination of wires. The modular PDU initially has no branch circuit modules installed. The power circuits from the modular PDU to the IT racks are flexible cables that are plugged into the front of the modular PDU on site to meet the requirements of each specific rack as needed. The branch circuit cables to the IT enclosures are pre-terminated with breaker modules that plug into the shock-safe backplane of the modular PDU. Slide 36 For equipment that requires a dedicated branch circuit, such as most blade servers, a single cable from the PDU carries one, two, or three branch circuits that plug directly into the blade server, with
no additional rack PDU (i.e., power strip) required. For mixed equipment in the rack, an assortment of rack PDUs is available that provide various receptacles and current ratings and may be interchanged. In this system, a PDU for a new row of IT enclosures, along with all of the associated branch circuit wiring and rack outlet strips, can be installed in an hour, without any wire cutting or terminations. Options also exist for the deployment of transformerless, rack-based distribution units. An example of such a deployment would include a 415 volt line-to-line UPS that directly feeds the transformerless PDUs that distribute to the racks. In the case of North America, a 480 volt to 415 volt step down transformer could be installed upstream of the UPS. Slide 37 The standard power distribution system for a typical data center in North America is a 277/480V 3-phase power system supplying distributed Power Distribution Units (PDUs), which convert the voltage to the 208V and 120V single-phase branch circuits utilized by IT equipment. This arrangement is represented by the one-line diagram shown here. Other parts of the world typically receive 400V from the utility and convert it to 220V at the service entrance. Slide 38 Let's conclude with a brief summary. It is imperative that critical components within the data center have an adequate and steady supply of power. The delivery of power is the key to maintaining availability in the data center. Avoiding instances of equipment failure, downtime, and software and data corruption lies in the management of power distribution.
As power is distributed across long distances over power lines, losses in voltage caused by resistance and inductive losses can occur as the power works its way through various transformers. Voltage is either stepped up or stepped down by a series of these transformers. Transformers are essential to transmitting and distributing power, because if the voltage coming into the data center is either too high or too low, it can impact the equipment by causing it to run hot. The wide range of receptacles used throughout the world today is due to the wide range of power requirements of the electrical loads currently in existence. The degree to which an electrician can achieve a noise-free circuit for sensitive equipment is dependent on a number of factors, including: quality of power; building age and design; grounding system integrity; the amount of electrical noise; and the degree to which electrical loads are balanced. Distributed power designs are emerging as the preferred configuration for larger server environments, because they are easier to manage, less expensive to install, and more resistant to a physical accident than direct-connect power distribution. This ends Power Distribution Part I. Part II will explore the issue of power distribution in new high density data center environments. Slide 39 Thank you for participating in this Data Center University course.
For best viewing results, we recommend that you maximize your browser window now. The screen controls allow you to navigate through the eLearning experience. Using your browser controls may disrupt the normal play of the course. Click the attachments link to download supplemental information for this course. Click the Notes tab to read a transcript of the narration.
Slide 3 At the completion of this course, you will be able to:
Identify physical infrastructure challenges for incident, availability, capacity, and change management
Summarize physical infrastructure management strategies for Enterprise Management Systems (EMS) and Building Management Systems (BMS)
Recognize physical infrastructure management standards, and
Provide examples of physical infrastructure management solutions
Slide 4 The key to managing physical infrastructure is to employ the same strategies used in the management of servers, storage, switches, and printers. The core issues of maintaining system availability as well as managing problems and change are similar, although each device may have specific problems based on its unique characteristics. Essential categories of management for physical infrastructure include Incident Management, Change Management, Capacity Management, and Availability Management. Implementing the strategies suggested in this course will contribute to a successful application of the ITIL (Information Technology Infrastructure Library) framework to all aspects of data center operations.
The purpose of this course is to demonstrate a systematic approach to identifying, classifying, and solving the management challenges of next-generation data centers. Slide 5 Physical infrastructure is the foundation upon which Information Technology (IT) and telecommunication networks reside. Physical infrastructure includes:
Power
Cooling
Racks and physical structure
Cabling
Physical security and fire protection
Management systems
Services
For more information about physical infrastructure, please participate in the DCU course: An Overview of Physical Infrastructure Slide 6 Any discussion of management issues must first define what is meant by management. The topic of management is a broad one, which is easy to get lost in without a logical framework for discussing it. The Information Technology Infrastructure Library (ITIL) is one such framework that many customers and equipment suppliers have found helpful in understanding the various aspects of management. Slide 7 ITIL is a set of guidebooks defining models for planning, delivery, and management of IT services, created by the British Standards Institute and owned by the UK Office of Government Commerce. ITIL is not a standard but a framework whose purpose is to provide IT organizations with tools, techniques, and best practices that help them align their IT services with their business objectives. IT organizations typically select and implement the pieces that are most relevant to solving their business problems. The categories and guidelines defined by ITIL can be extremely helpful in determining and
achieving IT service management objectives, and many IT vendors such as HP, IBM, and Microsoft have used ITIL as a model for their operations framework. Slide 8 ITIL's Service Support and Service Delivery models each include several processes, and although they are often depicted in introductory presentations as having a simple organization with just a few connections (as seen here in this image), when one reads the ITIL documentation in detail, it becomes clear that all the processes are interconnected via a myriad of process flows. Slide 9 Although the ITIL processes are all related in one way or another, it is not necessary to analyze the entire spectrum of processes and flows. Identifying which ones are critical and relevant to managing physical infrastructure is a helpful aid in achieving success in the Zero Layer of the data center hierarchy. ITIL is a wide-encompassing framework, and a complete explanation of it is outside the scope of this course. Throughout this course, we will identify the most critical management processes as defined by ITIL for management of physical infrastructure, and outline key problems as well as requirements for effective physical infrastructure management in each area. (You are encouraged to visit www.itil.co.uk for further information on ITIL itself.) Slide 10 Although most ITIL methodologies contain useful suggestions and describe various connected processes, the most important ones to consider when managing physical infrastructure are outlined here. The remainder of this course addresses the key management challenges that each of these processes presents. Slide 11 Using the ITIL process model, the challenges and underlying problems in the physical infrastructure layer are presented in four charts corresponding to the four key ITIL management processes. Let's spend some time discussing each of these challenges. Slide 12
Our first challenge is incident management. This process is concerned with returning to the normal service level (as defined in a negotiated Service Level Agreement, or SLA, between the IT group and the internal business process owner) as soon as possible, with the smallest possible impact on the business activity of the organization and user. Physical infrastructure, like any other IT equipment, should be monitored, with events fed into an incident management process, either via a physical infrastructure incident management system or a general-purpose incident management tool such as a network management or building management system. Slide 13 Specific Incident Management challenges include identifying the problem location, identifying the resolution owner, prioritizing incident urgency, and executing proper corrective action. Let's begin with the challenge of identifying the problem location. Because physical infrastructure includes diverse yet interconnected components, troubleshooting is complicated. A system-level view that indicates the relationships between interconnected components and that identifies the impact of individual component problems can help identify the problem location. Slide 14 Once the problem has been identified, the next challenge is to identify the resolution owner. Responsibility for physical infrastructure availability is often shared, potentially leading to redundant and conflicting efforts to resolve incidents. Different people can be responsible for different locations at different times of the day or week. A solution is to establish a management system that provides the ability to set and assign owner roles. Slide 15 Managers must also prioritize incident urgency. A solution is to have a management tool that alerts the user to the impact, urgency, and priority of individual events that threaten system availability.
The final challenge of incident management is to execute proper corrective actions. Because it can be difficult for one person to have all the expertise necessary to troubleshoot all issues, a system that provides recommended actions and guidance can help ensure the proper corrective action is executed. Slide 16 The next challenge deals with availability management. Availability management is concerned with systematically identifying availability and reliability requirements, comparing them against actual performance, and, when necessary, introducing improvements to allow the organization to achieve and sustain optimum quality IT services at a justifiable cost. Once physical infrastructure requirements have been established, service levels must be monitored, with particular care given to understanding the potential downtime that can result from individual components failing and their impact on the entire system. Slide 17 Specific Availability Management challenges include: availability metrics reporting; advance warning of failure; planned downtime; and infrastructure improvement. Let's explore each one, beginning with availability metrics reporting. Availability metrics are necessary in order to track achievement against service levels agreed upon between IT and the internal business customer. A solution is to provide a tool that reports uptime and downtime, physical infrastructure versus non-physical infrastructure downtime summaries, causes of downtime, incident timestamp and duration, as well as time to recovery. Slide 18 The next availability management challenge is advance warning of failure. Easily correctable problems with physical infrastructure often go unnoticed until a failure occurs. A solution is to use a system that does not require training or expert knowledge, and that provides alerting and global thresholds for UPS runtime, power distribution unit load by phase, battery health, as well as rack temperature and humidity. Slide 19
Moving on, we find the third availability management challenge: planned downtime. Planned downtime is necessary for many data centers, but tools that do not account for planned downtime can have two negative effects. The first is false alerts leading to incorrect actions by personnel; the second is maintenance modes left uncorrected after maintenance is complete, such as a UPS left in bypass or a cooling unit left offline. A solution is to provide a system that allows scheduled maintenance windows, both suppressing alerts during the window and alerting the user to any maintenance conditions left uncorrected after the window has closed. Slide 20 Our final availability management challenge deals with infrastructure improvement. Expertise is often lacking to improve infrastructure availability. A solution is to use management tools that provide a risk assessment summary to identify potential areas for improvement, such as insufficient runtime, options for adding cooling redundancy, moving loads to a different phase, and moving IT equipment to different racks. Now let's move on and discuss capacity management. Slide 21 The third obstacle data center managers face is the challenge of capacity management. This process is concerned with providing the required IT resources at the right time, at the right cost, aligned with the current and future requirements of the internal customer. Power, cooling, rack space, and cabling are all IT resources that require capacity management. Product architectures that allow incremental purchases of these resources on short time frames are preferable to legacy architectures that require specifying, engineering, purchasing, and installing over yearlong timeframes, especially with regard to Total Cost of Ownership (TCO) considerations. Slide 22 Specific Capacity Management challenges include: monitoring and recording data center equipment and infrastructure changes; providing physical infrastructure capacity; optimizing the physical layout of existing
and new equipment; as well as incrementally scaling the data center infrastructure. Let's discuss each one, beginning with the challenge to monitor and record data center equipment and infrastructure changes. As additional equipment is added to the data center over time, existing power and cooling capacity may be inadvertently exceeded, resulting in downtime. UPS batteries age and may need servicing. A solution is to have a system that monitors current draw for each branch circuit or rack and alerts the appropriate person to potential overload situations. The system could also monitor UPS runtimes and load thresholds. Slide 23 Another capacity management challenge data center managers encounter is the challenge to provide physical infrastructure capacity. Because IT refreshes tend to be dynamic and are difficult to predict, physical infrastructure capacity requirements often go unnoticed until it is too late. Physical infrastructure capacity can be managed with a system that provides trending analysis and threshold violation information on UPS load, runtime, power distribution, cooling, rack space utilization, and patch panel port availability. Such a system can ensure adequate advance notice and the information necessary for procurement and deployment of additional capacity. Slide 24 Optimizing the physical layout of existing and new equipment is another capacity challenge. Sometimes, when a data center is updated, the new configuration is not efficient. A poorly configured data center may use more space and cost more to operate than is necessary. A management tool can optimize updates and reconfiguration by analyzing the placement and layout of new IT equipment to meet power, rack space, cooling, and cabling needs. And finally, data center managers need to solve the challenge of scaling data center infrastructure incrementally. As infrastructure is added to the data center, it can be difficult to reconfigure tools to monitor the new objects. To avoid these problems, use tools that leverage existing IT infrastructure investment and monitor additional new physical infrastructure devices in an economical, simple, and quick way.
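The branch-circuit monitoring described above amounts to comparing measured current against a fraction of each breaker's rating. The sketch below only illustrates that check; the circuit names, ratings, and the 80% limit are assumptions, not part of any particular management product.

```python
def overloaded_circuits(readings, limit_fraction=0.8):
    """readings maps a circuit name to (measured_amps, breaker_rating_amps)."""
    return [name for name, (amps, rating) in readings.items()
            if amps > limit_fraction * rating]

sample = {
    "rack-12-A": (17.5, 20),   # above 80% of a 20 A breaker -> flag
    "rack-12-B": (9.0, 20),    # comfortably loaded
    "rack-14-A": (26.0, 30),   # above 80% of a 30 A breaker -> flag
}
print(overloaded_circuits(sample))   # -> ['rack-12-A', 'rack-14-A']
```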
Now let's discuss change management challenges. Slide 25 Change management is the process concerned with methods and procedures for making changes to infrastructure with the lowest possible impact on service quality, and it is increasingly critical for optimizing business agility. Maximizing the ratio of planned to unplanned work in a data center requires a formalized change management process for all aspects of operation. Changes such as relocating a server, rewiring a patch panel, or moving equipment from a warmer area of the data center to a cooler area are examples of changes requiring preparation, planning, simulation, and an audit trail. Slide 26 Specific Change Management challenges include: executing MACs (also known as moves, adds, and changes) of IT equipment without impacting availability; implementing firmware changes in individual physical infrastructure components; maintaining all physical infrastructure components; and maintaining spares at compatible firmware revision levels. Let's start with a discussion of MACs. Moving servers can cause problems for power, cooling, rack space allocation, etc. These problems can be avoided by using a physical infrastructure management tool that can recommend workflows for planning, executing, and tracking changes. Implementing firmware changes in individual physical infrastructure components is our next challenge. Choices need to be made regarding when to perform physical infrastructure firmware upgrades. Some businesses opt for scheduling upgrades during off hours and weekends. This approach, however, can tax the personnel who need to work overtime hours and opens the door to more possible human error. Other businesses opt for scheduling upgrades during normal operating hours. If the data center is not properly equipped to operate physical infrastructure equipment in bypass mode, the risk of downtime during peak data center demand could be increased. If, however, the data center is equipped with modern physical infrastructure equipment and management support systems, that risk is greatly diminished.
Slide 27 Maintaining all physical infrastructure components is the third change management challenge we'll explore. Firmware upgrades are increasingly complex to manage. Using a system that notifies the administrator whenever new bug fixes or feature enhancements to firmware are available can provide mass remote upgrade capabilities. Maintaining spares at compatible firmware revision levels is the last change management challenge we'll discuss. When spares are swapped into a modular architecture, they may not be at a supported firmware revision or combination, causing downtime. Resolve this challenge by using a physical infrastructure management solution that ensures that spares match production equipment. Now let's put all the pieces together and develop a physical infrastructure management strategy. Slide 28 Although many organizations implement aspects of all the processes outlined in this course, most will develop their management strategy in the following order:
Implement an incident management system
Set and measure availability targets
Monitor and plan for long term changes in capacity
Then, get change management processes in place
Organizations typically focus on fully implementing each management process for three to six months before moving on to the next one. Slide 29 Computing and networking systems that require high availability need a physical infrastructure that is capable of supporting them. A physical infrastructure management strategy needs to consider:
Systematic identification of factors impacting availability
Easier management
Lower downtime risk, and
Increased IT personnel productivity
Slide 31 Physical infrastructure management ties traditional facility responsibilities and IT department responsibilities together within an organization. Either the facility or IT may be responsible for physical infrastructure, or both departments may share the responsibility together. This convergence of responsibilities creates new questions and conflicts for managing physical infrastructure. Slide 32 As mentioned previously, physical infrastructure is the foundation upon which Information Technology and Telecommunication Networks reside. This diagram shows how physical infrastructure supports Technology, Process, and People to provide a Highly Available Network. Slide 33 At first glance, IT management systems for power, cooling, racks, security, fire protection, etc. seem similar to the operations of building management systems. Almost all buildings have infrastructure in place for power, air conditioning, environmental monitoring, fire protection, and security. What makes physical infrastructure different than traditional building management systems is that physical infrastructure focuses on the availability of computing resources. The primary focus of building management systems is the comfort and safety of the people in the building. The needs of physical infrastructure and building management systems are therefore quite different.
Slide 34 Many IT departments have installed specific Enterprise Management Systems (EMS), such as HP's OpenView or IBM's Tivoli. They may also have specific device element managers for their servers, storage, and networking equipment. Facilities departments frequently use a Building Management System (BMS), such as Schneider Electric TAC. An EMS handles device-centric information, based on individual network IP addresses. EMS information may be the status of a single server, a networking device, or a storage device, and it is communicated over an existing IT network. Slide 35 A BMS handles data-point-centric information. BMS information does not reflect the condition of a device itself, but rather the information that the device reports. For example, if the BMS device is a temperature sensor, the BMS does not monitor how well the sensor itself is performing, but rather the temperature that the sensor reports. A BMS typically uses its own serial-based network, using either proprietary communication protocols or some level of standard protocols, such as MODBUS. Slide 36 It is likely that both IT and facility departments will want to continue to use the management systems that they each already understand and operate. Because physical infrastructure management ties traditional facility responsibilities and IT department responsibilities together within an organization, any physical infrastructure management solution must be able to support both EMS and BMS architectures. It is difficult to integrate these two management architectures; however, any management strategy must be able to provide device-level summary information for the IT package while at the same time providing a level of data point granularity that enables integration with the facility package. Slide 37 Physical infrastructure management requires more data than what has been traditionally monitored. A comprehensive strategy should incorporate information at the rack level in order to ensure reliable operation of the IT equipment. Until recently, this was not feasible. All key devices and data points need to be
monitored. These include all the devices in the physical infrastructure layer and the surrounding environment. Best practices dictate that the following list of devices be monitored at the rack level:
A minimum of two temperature data points
Individual branch circuits
Transfer switches
Cooling devices, and
UPS systems
Slide 38 Monitoring devices such as rack-based transfer switches, UPSs, and cooling devices is a well-understood practice. However, monitoring branch circuits and temperatures at the rack level is a relatively new concept in physical infrastructure management. Since branch circuit faults can occur from time to time, active branch circuit management contributes to increased availability. Investing in a quality brand of circuit breaker also helps to minimize instances of downtime. As IT equipment density increases, monitoring cooling devices is also increasingly critical to availability. Slide 39 In general, physical infrastructure management should meet the following criteria:
Easy to deploy and maintain: The system should support centralized management and require minimal training to operate.
Provide advance warning of critical events: Timely information allows corrective action before equipment is damaged or fails.
Able to analyze performance and predict failure: At a minimum, event and data logs should be stored, so that manual performance analysis can be done.
Adaptable to business change: Flexible systems support changes in configuration while minimizing downtime. Examples of changes that can be anticipated include changing runtime, changing power load and redundancy requirements, as well as adding support for branch offices or other network nodes. Slide 40 Let's briefly discuss a summary of management approaches, and provide an example of a physical infrastructure management solution. Starting in the late 1990s, several organizations quickly installed IT systems to solve urgent business needs. This quick effort created multiple point solutions. As a result, in many installations, IT departments tend to manage equipment using element managers for different categories of equipment.
Slide 41 As shown in this illustration, it is common to utilize a storage manager such as EMC ControlCenter for storage, a network manager such as CiscoWorks for the networking equipment, and a server manager such as HP Insight Manager for servers. The advantage of these element managers is that they are generally easy to deploy and use, since they are focused on managing one category of devices, in many cases devices specific to an individual vendor. The limitation of this strategy is that there is no coordination of the different element managers. Slide 42 An Enterprise Management System such as Tivoli or HP OpenView has better visibility across an entire network. These tools help to coordinate the different types of devices and provide a broad view of everything occurring on the network. However, neither element managers nor an Enterprise Management System can handle the management of the physical infrastructure layer. Slide 43 Similar to EMSs, BMSs frequently manage some of the data points of physical infrastructure, such as building power, comfort air, building environment, or building security. However, in small and medium data
centers, BMS systems are not as sophisticated as physical infrastructure management systems when it comes to monitoring and controlling devices such as in-row CRACs, PDUs, and UPSs. Slide 44 This diagram shows a typical approach to integrating existing Enterprise Management and Building Management systems into a physical infrastructure management system. Each individual EMS device or BMS data point is integrated into the high-level management system. The problem with this approach is that the integration is both complex and very expensive. This approach forces an organization either to buy or to develop unique systems for handling BMS and EMS information. Slide 45 This diagram shows how a physical infrastructure element manager fits into the high-level management system. The physical infrastructure element manager provides detailed information through direct connection, in the same way that server, storage, and network element managers provide detailed information through direct connection. However, physical infrastructure element managers have the advantage of being easier and less expensive to install. Physical infrastructure element managers automatically collect all individual device information and are pre-programmed with select rules and policies to manage physical infrastructure. Slide 46 A physical infrastructure element manager offers the following benefits:
Cost effective installation and maintenance
Easy integration with existing enterprise and building management systems
Optimized functionality for physical infrastructure management, and
Cost effective management of the large amounts of physical infrastructure data
Slide 47
Let's conclude with a brief summary. Physical infrastructure is the foundation upon which Information Technology and Telecommunication Networks reside. Physical infrastructure includes power, cooling, racks and physical structure, cabling, physical security and fire protection, management systems, and services. The challenge for physical infrastructure incident management is to return systems to their normal service level, with the smallest possible impact on the business activity of the organization and the user. The challenge for physical infrastructure capacity management is to provide the required IT resources at the right time, at the right cost, and to align those resources with the current and future requirements of the internal business customer. The key standards for a physical infrastructure management system are:
It should be easy to deploy and maintain
It should provide advance warning of critical events
It should be able to analyze performance and predict failure, and
It should be adaptable to business change
Two forms of physical infrastructure management include: integrating each individual EMS device or BMS data point directly into the high-level management system, and integrating enterprise management and building management systems through a physical infrastructure element manager. Slide 48 Thank you for participating in this Data Center University course.
optimal system performance, creates a significant cooling challenge for IT and Facility Managers. Humidity and temperature levels outside the recommended range can rapidly deteriorate sensitive components inside the computers, making them vulnerable to future failures. As high-density equipment becomes more prevalent in IT rooms, proper configuration of cooling equipment becomes paramount to the availability of the data center. The concept of proper cooling not only includes the supply of adequate cool air, but should also take into account the distribution of that cool air and the removal of hot air. Data center cooling has emerged as a predominant area of focus for today's IT and Facility Managers. Slide 5 There are many facets of data center cooling. Types of cooling equipment, humidity levels, raised floor designs, floor tile cooling configurations, and cabling arrangements all affect data center cooling. In this course, we will specifically address how the layout of cooling equipment affects data center cooling. However, if you wish to learn more about these additional cooling factors, please refer back to the Data Center University course catalog, where these topics are addressed in more detail. Slide 6 Data center and network room cooling systems consist of a Computer Room Air Conditioner (or CRAC) and an air distribution system. In larger data centers, a Computer Room Air Handler, or CRAH, may be used instead of a CRAC. Although CRACs and CRAHs may differ in design and capacity, the configuration of the associated distribution systems is what differentiates the different types of data center cooling systems. Slide 7 Every air distribution system has a supply system and a return system. The supply system distributes the cool air from the CRAC to the IT equipment. The return system takes the hot exhaust air from the IT equipment and returns it back to the CRAC, where it can then be cooled.
For both the supply and return, there are three basic methods used to
transport air between the CRAC and the equipment load. These methods are: the Flooded air distribution system, the Locally Ducted system, and the Fully Ducted system.
Slide 8 In a Flooded distribution system, there is no ductwork between the CRAC and the equipment load. In this case, the air moves freely between all pieces of equipment. In a Locally Ducted distribution system, air is provided or returned via ducts which have vents located near the loads. In a Fully Ducted system, supply or return air is directly ducted into or out of the loads. Each of these methods could be used on the supply side, on the return side, or both.
Slide 9 In addition to providing a sufficient amount of cool air, proper distribution and removal of hot air is essential for maximum performance. There are five basic approaches for collecting and transporting unwanted heat from the IT environment to the outdoor environment. One or more of these methods are used to cool virtually all mission critical computer rooms and data centers. Each method uses the refrigeration cycle to transport or pump heat from the data center or computer room to the outside environment. Some approaches relocate the components of the refrigeration cycle away from the IT environment and some add additional loops (self-contained pipelines) of water and other liquids to aid in the process. These five approaches include: Air Cooled DX systems, Air Cooled self-contained systems, Glycol Cooled systems, Water Cooled systems, and Chilled Water systems.
Slide 10 The first heat removal approach involves the use of air cooled computer room air conditioners, which are widely used in IT environments of all sizes and are considered the staple for small and medium rooms. This type of system is often referred to as a DX, or Direct Expansion system, or split system. In an air cooled system, half the components of the refrigeration cycle are in the computer room air conditioner (also known as a CRAC unit) and the rest are outdoors in the air cooled condenser. Refrigerant, typically R-22, circulates between the indoor and outdoor components in pipes called refrigerant lines. Heat from the IT environment is pumped to the outdoor environment using this circulating flow of refrigerant. The advantages of this system include low overall cost and ease of maintenance. One of the prime disadvantages revolves around the need for refrigerant piping to be installed in the field. Only properly engineered piping systems that carefully consider the distance and change in height between the IT and outdoor environments will deliver reliable performance. In addition, refrigerant piping cannot be run long distances reliably and economically. Finally, multiple computer room air conditioners cannot be attached to a single air cooled condenser. The Air Cooled DX system is commonly used in wiring closets, computer rooms and small-to-medium data centers with moderate availability requirements.
Slide 11 Air cooled self-contained systems locate all the components of the refrigeration cycle in one enclosure that is usually found in the IT environment. Heat exits the self-contained system as a stream of hot air called exhaust air. This stream of hot air must be routed away from the IT room to the outdoors or into an unconditioned space to ensure proper cooling of computer equipment. If mounted above a drop ceiling and not using condenser air inlet or outlet ducts, the hot exhaust air from the condensing coil can be rejected directly into the drop ceiling area. The building's air
conditioning system must have available capacity to handle this additional heat load. Air that is drawn through the condensing coil (becoming exhaust air) should also be supplied from outside the computer room. This will avoid creating a vacuum in the room that would allow warmer, unconditioned air to enter. Self-contained indoor systems are usually limited in capacity because of the additional space required to house all the refrigeration cycle components and the large air ducts required to manage exhaust air. Self-contained systems that mount outdoors on a building roof can be much larger in capacity but are not commonly used for precision cooling applications. One of the key advantages of air cooled self-contained systems is that they have the lowest installation cost. No components need to be installed on the roof or outside of the building. In addition, all refrigeration cycle components are contained inside one unit as a factory-sealed and tested system for highest reliability. Disadvantages of this system include less heat removal capacity per unit compared to other configurations. Also, air routed into and out of the IT environment for the condensing coil usually requires ductwork and/or a dropped ceiling. Air cooled self-contained systems are commonly used in wiring closets, laboratory environments and computer rooms with moderate availability requirements. They are sometimes utilized to address hot spots in data centers.
Slide 12 Glycol cooled systems contain all refrigeration cycle components in one enclosure (like a self-contained system) but replace the bulky condensing coil with a much smaller heat exchanger. The heat exchanger uses flowing glycol (a mixture of water and ethylene glycol, similar to automobile anti-freeze) to collect heat from the refrigerant and transport it away from the IT environment. Heat exchangers and glycol pipes are always smaller than condensing coils (2-piece air cooled systems) and condenser air ducts (self-contained air cooled systems) because the glycol mixture has the capability to collect and transport much more heat than air does. The glycol flows via pipes to an outdoor-mounted device called a fluid cooler. Heat is rejected to the outside atmosphere as fans force outdoor air through the warm glycol-filled coil in the fluid cooler. A pump package (pump,
motor and protective enclosure) is used to circulate the glycol in its loop to and from the computer room air conditioner and fluid cooler. Advantages: The entire refrigeration cycle is contained inside the computer room air conditioning unit as a factory-sealed and tested system for highest reliability, with the same floor space requirement as a two-piece air cooled system. Glycol pipes can run much longer distances than refrigerant lines (air cooled system) and can service several computer room air conditioning units from one fluid cooler and pump package. In cold locations, the glycol within the fluid cooler can be cooled so much (below 50°F [10°C]) that it can bypass the heat exchanger in the CRAC unit and flow directly to a specially installed economizer coil. Under these conditions, the refrigeration cycle is turned off and the air that flows through the economizer coil, now filled with cold flowing glycol, cools the IT environment. This process is known as free cooling and provides excellent operating cost reductions when utilized. Disadvantages: Additional components are required (pump package, valves) and this increases capital and installation costs when compared with air cooled DX systems. Maintenance of glycol volume and quality within the system is required. This system introduces an additional source of liquid into the IT environment. Glycol cooled systems are commonly used in computer rooms and small-to-medium data centers with moderate availability requirements.
Slide 13 Water cooled systems are similar to glycol cooled systems in that all refrigeration cycle components are located inside the computer room air conditioner. However, there are two important differences between a glycol cooled system and a water cooled system:
A water (also called condenser water) loop is used instead of glycol to collect and transport heat away from the IT environment
Heat is rejected to the outside atmosphere via a cooling tower instead of a fluid cooler. A cooling tower rejects heat from the IT room to the outdoor environment by spraying warm condenser water onto sponge-like material (called fill) at the top of the tower. The water spreads out and some of it evaporates away as it drips and flows to the bottom of the cooling tower (a fan is used to help speed up the evaporation by drawing air through the fill material). In the same manner as the human body is cooled by the evaporation of sweat, the small amount of water that evaporates from the cooling tower serves to lower the temperature of the remaining water (a brief worked example follows this slide). The cooler water at the bottom of the tower is collected and sent back into the condenser water loop via a pump package. Condenser water loops and cooling towers are usually not installed solely for the use of water cooled computer room air conditioning systems. They are usually part of a larger system and may also be used to reject heat from the building's comfort air conditioning system (for cooling people) and water chillers (water chillers are explained in the next section). Advantages: All refrigeration cycle components are contained inside the computer room air conditioning unit as a factory-sealed and tested system for highest reliability. Condenser water piping loops can easily run long distances and almost always service many computer room air conditioning units and other devices from one cooling tower. In leased IT environments, usage of the building's condenser water is generally less expensive than chilled water (chilled water is explained in the next section). Disadvantages: High initial cost for cooling tower, pump, and piping systems. Very high maintenance costs due to frequent cleaning and water treatment requirements. Introduces an additional source of liquid into the IT environment. A non-dedicated cooling tower (one used to cool the entire building) may be less reliable than a cooling tower dedicated to the Computer Room Air Conditioner. Commonly Used:
In conjunction with other building systems in small, medium and large data centers with moderate-to-high availability requirements.
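To give a sense of how little evaporated water it takes to reject a large heat load, here is a minimal Python sketch. The latent-heat figure is standard physics rather than a number from the course, and the 100 kW load is purely hypothetical.

# Rough worked example (latent heat is standard physics, not a course figure):
# evaporating water absorbs roughly 2,400 kJ per kg at typical cooling tower
# temperatures, so a modest evaporation rate rejects a large heat load.
heat_load_kw = 100.0                 # hypothetical IT heat load to reject
latent_heat_kj_per_kg = 2400.0       # approximate latent heat of vaporization

evaporation_kg_per_s = heat_load_kw / latent_heat_kj_per_kg
print(f"About {evaporation_kg_per_s * 3600:.0f} kg of water per hour "
      f"evaporates to reject {heat_load_kw:.0f} kW")   # roughly 150 kg/h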
Slide 14 In a chilled water system, the components of the refrigeration cycle are relocated from the computer room air conditioning systems to a device called a water chiller. The function of a chiller is to produce chilled water (water refrigerated to about 46°F [8°C]). Chilled water is pumped in pipes from the chiller to computer room air handlers (also known as CRAH units) located in the IT environment. Computer room air handlers are similar to computer room air conditioners in appearance but work differently. They cool the air (remove heat) by drawing in warm air from the computer room through chilled water coils filled with circulating chilled water. Heat removed from the IT environment flows out with the (now warmer) chilled water exiting the CRAH and returning to the chiller. At the chiller, heat removed from the returning chilled water is usually rejected to a condenser water loop (the same condenser water that water cooled computer room air conditioners use) for transport to the outside atmosphere. Chilled water systems are usually shared among many computer room air handlers and are often used to cool entire buildings. Advantages: Computer room air handlers generally cost less, contain fewer parts, and have greater heat removal capacity than computer room air conditioners with the same footprint. Chilled water piping loops are easily run very long distances and can service many IT environments (or the whole building) from one chiller plant. Chilled water systems can be engineered to be extremely reliable. Chilled water systems have the lowest cost per kW for large installations. Disadvantages: Chilled water systems generally have the highest capital costs for installations below 100kW of electrical IT loads.
CRAHs generally remove more moisture from data center air than their CRAC counterparts, requiring more money to be spent on humidifying the room in many climates. Introduces an additional source of liquid into the IT environment. Commonly Used: In conjunction with other systems in medium and large data centers with moderate-to-high availability requirements, or as a high availability dedicated solution in large data centers.
Slide 15 There are 2 basic model types of precision cooling equipment: ceiling mounted and floor mounted. Variants, such as wall-mounted or mini-split systems, are similar to ceiling mounted systems and are employed similarly when adequate wall space is available.
Slide 16 Ceiling mounted systems are small (300-500 pounds [136-227 kg]) precision cooling devices suspended from the IT room's structural ceiling. They cool 3-17kW of computer equipment and utilize any of the 5 IT environment heat removal methodologies. One of the key benefits of ceiling mounted systems is that they do not require floor space in the IT environment. A drawback, however, is that installation and maintenance activities are more complicated due to their overhead placement. As a result, it is recommended that IT professionals, facilities personnel and manufacturers' representatives or mechanical contractors handle the specification, installation and maintenance of ceiling mounted precision cooling systems.
Slide 17 Floor mounted systems: Floor mounted precision cooling systems usually offer the greatest range of features and capabilities. They are increasingly being used to cool or to assist in the cooling of smaller IT environments as power consumption of computer equipment continues to increase.
Slide 18 Portable systems (also known as spot coolers) are considered part of the floor mounted category; however, they almost always have wheels and can be easily located anywhere precision cooling is required. Portable systems cool 2-6kW of computer equipment, and often a normal wall outlet can be used to supply electrical power (2-4kW models). Portable systems are almost always self-contained systems. Specification, installation and maintenance of most portable cooling systems can be accomplished by IT professionals without the assistance of facilities personnel or mechanical contractors.
Slide 19 Large floor mounted precision cooling systems have been extensively used to cool mission critical computing environments since their inception. These are usually the highest capacity cooling devices found in the IT environment, with the ability to cool 20kW to over 200kW of IT equipment per chassis. Floor mounted systems utilize IT environment floor space and must be strategically located in the room for maximum effectiveness. Specification, installation and maintenance of large floor mounted precision cooling systems is highly dependent on the existing electrical, mechanical and structural capabilities of the building they are to be operated in. For this reason it is important for IT professionals to work closely with facilities management and manufacturers' representatives during the specification process. Often the services of a State-Registered Professional Engineer are required to design and certify the solution. Most mechanical contracting firms familiar with IT environments can install and, if desired, maintain the solution. Recent developments in large floor mounted systems have reduced their energy consumption and the overall space they require in the computer room or data center. Their outer dimensions and appearance have changed so they fit in spaces sized for IT rack enclosures. This allows for operational cost savings and more flexibility in IT environment planning.
Slide 20
Cooling equipment for an IT environment can be implemented in 10 basic configurations. The selection of the appropriate configuration for a particular installation is affected by the existing facility infrastructure, the total power level of the installation, the geographical location, and the physical constraints of the building.
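As a rough illustration of the ten configurations mentioned in Slide 20 (and detailed in the next slide), the short Python sketch below pairs the five heat removal methods with the two precision cooling model types. The lists come from the course text; the code itself is only an illustrative aid and not part of the original material.

# Illustrative sketch: the 10 basic cooling configurations come from combining
# the 5 heat removal methods with the 2 precision cooling model types.
from itertools import product

heat_removal_methods = [
    "Air Cooled DX",
    "Air Cooled Self-Contained",
    "Glycol Cooled",
    "Water Cooled",
    "Chilled Water",
]
model_types = ["Ceiling Mounted", "Floor Mounted"]

configurations = list(product(heat_removal_methods, model_types))
assert len(configurations) == 10  # 5 methods x 2 model types

for method, mounting in configurations:
    print(f"{method} / {mounting}")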
Slide 21 The 10 configurations are achieved by combining the 5 methods of heat removal with the ceiling and floor mounted model types. The Air Cooled DX System requires roof access, an air cooled condenser and refrigerant piping for both the ceiling and floor mounted configurations. The floor and ceiling configurations are typically found in computer rooms, and the floor mounted arrangement is also found in medium data center applications. The Air Cooled Self-Contained System must have a dropped ceiling or ducts installed for ceiling mounted configurations. The floor mounted system also must have a dropped ceiling for condenser air tubes. In addition, large floor mounted systems require outdoor heat rejection components. This system is typically found in wiring closets, computer rooms and small data centers. With Glycol Cooled Systems, both ceiling and floor mounted arrangements require the building to have roof access and a 10 ft (3 m) floor-to-structural-ceiling height. A fluid cooler, pump package and glycol piping are also required. Ceiling mounted systems are found in computer rooms and small data centers, while floor mounted versions can be used in medium and large data center installations. Water cooled systems require that the building have a 10 ft (3 m) floor-to-structural-ceiling height for ceiling arrangements, with a further requirement of a hookup to the building condenser water. In floor mounted configurations, the building must have a condenser water system with adequate
capacity. Ceiling mounted installations are not commonly seen; however, floor mounted configurations can be found in medium and large data center applications. In Chilled Water Systems, the ceiling mounted and floor mounted arrangements require a reliable chilled water system and hookup, with the ceiling mounted configuration also requiring a 10 ft (3 m) floor-to-structural-ceiling height. Chilled water systems that are ceiling mounted can be found in wiring closets, computer rooms and small data centers, while the floor mounted version can be used in computer rooms, and small, medium and large data center installations.
Slide 22 As computing equipment continues to shrink in size, the overall footprint of high power computing also decreases. This means that the amount of equipment needed to perform the same amount of computing has gone down, thereby also decreasing the total heat that is created. Despite an overall decrease in heat levels, however, the heat density per square foot has increased, as has the prevalence of hot spots.
Slide 23 The way in which individual equipment racks are placed in relation to one another can also have a significant impact on heat density, conditioned air distribution and service accessibility. When numerous racks and large amounts of equipment are a factor, careful planning of room layouts, and continued diligence in maintaining these layouts, is imperative for maximum performance.
Slide 24 It is common to enter a data center and see row after row of equipment racks all facing the same way. While this is aesthetically pleasing, and may seem to be the logical way in which to lay out numerous racks, this approach can be detrimental to the overall availability of the data center. Before we can understand how rack placement affects performance, let's first discuss how racks and equipment components housed within the racks are affected by airflow in the data center.
Slide 25 The majority of equipment racks on the market are designed to take in cooled air through the front, pass it over the IT equipment, and exhaust it from the rear. In either raised floor or non-raised floor environments, it's important to design layouts such that they maximize this airflow pattern.
Slide 26 The ideal placement of rack equipment is called a hot-aisle/cold-aisle configuration.
Slide 27 This configuration involves the installation of racks in a face-to-face (and back-to-back) orientation. Configuring units in this way prevents the hot exhaust air of one unit from being drawn into the intake of another unit.
Slide 28 In a hot aisle/cold aisle configuration, in addition to racks being placed face to face, CRAC units must also be strategically placed to create a cold aisle by properly distributing the cold air to the face of the racks, and to maximize the return of hot exhaust air out of the back of the racks and into the hot aisle.
Slide 29 In a large data center application, this would mean every other row would be forward facing to create the hot aisle/cold aisle arrangement.
Slide 30 Taking this concept one step further has prompted manufacturers such as APC to develop hot aisle containment units. Hot aisle containment systems ensure proper air distribution by completely
separating supply and return air paths. The hot aisle is sealed off using doors and transparent ceiling tiles that extend the width of the hot aisle. These units ensure that there is little to no mixing of hot and cold air streams in the data center. The warmest air possible is returned to the CRAC units, thus increasing the efficiency and capacity of the system.
Slide 31 To summarize: The three basic methods used to transport air between the CRAC and the equipment load are Flooded, Locally Ducted, and Fully Ducted. There are five basic systems for data center heat removal: Air Cooled DX Systems, Air Cooled Self-Contained Systems, Glycol Cooled Systems, Water Cooled Systems and Chilled Water Systems. There are 2 basic model types of precision cooling equipment: ceiling mounted and floor mounted. The ideal placement of rack equipment is in a hot-aisle/cold-aisle configuration, where racks are installed face-to-face and back-to-back.
Slide 32 Thank you for participating in this Data Center University course.
Slide 5 Consider these statistics. According to Contingency Planning Research, power-related events such as blackouts and surges account for 31% of computer downtime episodes lasting more than 12 hours, power failure and surges account for 45.3% of data loss, and according to IDC, power disturbances account for about 33% of all server failures. A standby generator is one critical equipment component that will keep you from becoming one of these statistics. Understanding the basic functions and concepts of standby generator systems helps provide a solid foundation allowing IT professionals to successfully specify, install, and operate critical facilities. This course is an introduction to standby generators and the subsystems that power a facility's critical electrical loads when the utility cannot.
Slide 6 A standby generator system is a combination of an electrical generator and a mechanical engine mounted together to form a single piece of equipment. The components of a generator include the prime mover, the alternator, the governor, and the distribution system. The distribution system is made up of several subcomponents which include the Automatic Transfer Switch (ATS) and associated switchgear and distribution. In many instances, generators also include a fuel tank and are equipped with a battery and electric starter. As we begin the course, let us first focus on the internal combustion engine, or the prime mover.
Slide 7 The internal combustion engine is the well-respected workhorse of the latter half of the 20th century, and has carried this role into the new millennium. In basic terms, an internal combustion engine converts its fuel source into mechanical motion via its internal moving parts. As outside air mixes with the fuel inside the engine, these moving parts ignite the air/fuel mixture to create a controlled internal explosion (combustion) within the cavities known as cylinders.
Slide 8 Although there are numerous variations of the internal combustion engine, the most commonly used for standby generator systems is the 4-stroke engine. It is referred to as a 4-stroke engine because of the four distinct stages that occur in the combustion cycle. These stages include intake of the air/fuel mixture, compression of that mixture, combustion or explosion, and exhaust. When referring to generators, the 4-stroke engine is generally referred to as the prime mover. The following slide describes the core attributes of the prime mover.
Slide 9 There are four main fuels used to power generators. These include diesel, natural gas, liquid petroleum (LP), and gasoline. The selection of a fuel type depends on variables such as storage, cost, and accessibility. Generator systems with diesel or natural gas engines are the most common standby power generators utilized to support data centers. Fuel availability generally dictates the type of standby generator selected. For example, if a generator is located in an isolated area where public utilities are not available, LP or diesel fuel are logical choices. Additionally, the generator's fuel type, as well as the magnitude of potential step-load changes, or whether the generator will be expected to support an instantaneous change in load current, from zero to full load for example, will influence the selection of the governor. Because these factors contribute to the accuracy and stability of the prime mover's speed, they must be considered in the overall design. Let's review some of the advantages and disadvantages of the different fuel types.
Slide 10 Diesel fuel is often chosen in many applications because of easy onsite storage, fewer problems with long term storage, reduced fire hazard, and more operating hours between overhauls. Disadvantages to using diesel fuel are its low volatility at low ambient temperatures, and the fact that diesel does not burn as cleanly as natural gas or liquid petroleum and therefore has potentially harmful effects on the environment.
Slide 11 Natural gas is used quite often due to its numerous advantages. Natural gas is a clean burning fuel with fewer exhaust emissions, and the exhaust is less harmful to the environment than diesel. Because natural gas is a cleaner fuel, there is minimal carbon build up and cleaner crankcase oil. Additionally, there are no fuel storage problems, and there is less engine maintenance than with diesel or gasoline generators. The main disadvantage of using natural gas is its high cost: gas generators tend to cost more than other generator types, and you are also limited in the size of the generator. Natural gas is provided by a single source, and it's regulated by the pipeline it is connected to. Subsequently, during emergencies, gas lines can get sucked dry by other consumers, and if you're towards the end of the line, you might not be able to get supply. Safety is also another factor: if there are leaks, it has a high explosive capability. Lastly, the energy content of natural gas is lower than most other fuel sources, so you need more of it to generate electricity for your generator.
Slide 12 The advantages of using a generator powered by liquid petroleum are similar to those of natural gas. It's a clean burning fuel with less environmental impact than diesel, and the exhaust is less harmful to the environment. Additionally, there are no fuel storage problems, and there is less engine maintenance than with diesel or gasoline generators. The biggest disadvantage to using a liquid petroleum powered generator is that liquid petroleum presents the greatest hazard. If any liquid petroleum vapors are leaked or released, being heavier than air, the liquid petroleum will flow to low areas such as basements and potentially create an explosion hazard.
Slide 13 Gasoline is often used in smaller engine generator sets due to the fact that it is readily available and that gasoline powered engines start easier than diesel engines in cold temperatures. The disadvantages are that the storage of gasoline is a fire hazard and that long term storage and
usage of old gasoline can be detrimental to the performance of the generator engine. Lastly, there is the option of using dual fuel engines. For example, if a generator is capable of using both natural
gas and liquid petroleum, it offers that much more flexibility when considering environmental safety needs and also the need for redundant power. Now that we have examined fuel types, let's look at another important aspect of generator function: cooling.
Slide 14 The majority of prime movers for generator applications are cooled with a radiator cooling system, much like the cooling system in an automobile. A fan is used to move sufficient air over the radiator to maintain moderate engine temperature. The waste heat is drawn off the radiator to the outside, with ductwork of the same cross-sectional area as the radiator face. The intake air opening, such as louvers into the room, is typically 25-50% larger than its ductwork. Rigorous maintenance of the cooling system is needed for reliable operation. Coolant hoses, coolant level, water pump operation, and antifreeze protection must be diligently reviewed for acceptable performance.
Slide 15 Next is lubrication. Modern 4-stroke engines utilize full-flow filter systems, which pump the lube oil through externally mounted filters to prevent harmful particles and contaminants from damaging the moving parts or bearings. Make-up oil reservoirs are used to maintain proper oil level, and external oil coolers assist in preventing lubrication breakdown due to high temperatures.
Slide 16 Air and fuel filters are critical elements for the reliable operation of the prime mover. Like the other components previously mentioned, it is essential that a proper maintenance schedule be followed. A system that includes dual redundant fuel lines and filters is a significant benefit in mission-critical
applications where long runtime must be supported. This is because fuel lines and filters can be isolated and changed while the engine remains running. Not having spare parts for filters and other "consumables" can result in downtime. Proactive monitoring of these filters is done with Differential Pressure Indicators. They show the pressure difference across a filter or between two fuel lines during engine operation. When applied to air filters, these proactive monitoring devices are known as Air Restriction Indicators. These provide a visual indication of the need to replace a dry-type intake air filter while the generator engine runs.
Slide 17 The last component of the prime mover that we will be discussing is the starter motor. As shown in this illustration, the starter motor system is one of the most critical elements to the successful use of a generator. The majority of generator systems use a battery-operated starter motor, as in automotive applications, although pneumatic or hydraulic alternatives are sometimes found on the heaviest prime movers. The critical element in the conventional starter is clearly the battery system. For example, the battery-charging alternator present on some engines does nothing to prevent battery discharge during the unused periods. Providing a separate, automatic charging system with remote alarm is considered a "best practice." It is also essential to keep the battery warm and corrosion-free. Engine block heaters also contribute to the startup success rate by reducing the frictional forces that the starter motor must work against when energized. Numerous studies have found startup failures to be the leading cause of generator system failures.
Slide 18 When considering a standby generator, the minimum time to detect a power problem, start the prime mover, establish stable output frequency and voltage, and connect to loads is usually at least 10-15 seconds. However, many systems in use today do not reliably perform to this very quick deployment due to such factors as uncharged or stolen batteries. Other factors include improper
maintenance and human error. Conscientious maintenance and design of a starter motor is absolutely critical to achieving a respectable success rate for generator startup systems.
Slide 19 The alternator is another critical component of the generator. The main function of the alternator is to convert mechanical energy from the prime mover into alternating current. This is similar to the alternator in an automobile; however, in an automobile it is usually driven by a belt, whereas in a generator it is driven by the main drive shaft of the prime mover. Through the years, certain characteristics of alternator components have been improved to increase the efficiency, capacity and reliability of the alternator. Let's start by looking at some of these improvements used in today's data center generators.
Slide 20 The following diagram illustrates a cross sectional view of a self excited, externally regulated, brushless alternator. The brushless designation refers to the fact that this design requires no contacts be placed against any revolving parts to transfer electrical energy to or from the components. Brushes in motors and very small generators may still be an acceptable design, but predictably the brushes wear out with use, and are impossible to inspect in a proactive manner. A large generator design that relies on brushes is not up to the reliability standards needed for mission-critical operation. When a generator is described as self excited, it means that the electricity used to create the electromagnetic field is created within the alternator itself, thereby allowing the alternator to produce large amounts of electricity with no other energy than what is provided by the prime mover.
Slide 21 The Main Stator or Armature Windings are the stationary coils of wire where the electricity for the critical loads begins to be generated. The characteristics of the alternating current produced are
related to the quantity and geometry of the coil windings. A large variety of configurations are available to deliver combinations of ampacity and voltage requirements.
Slide 22 The next component we will be discussing is the governor. The governor maintains constant RPM of the prime mover under a variety of conditions by adjusting the fuel that feeds the prime mover. A stable AC frequency is required and is directly proportional to the accuracy and response time of the governor. This item is a key component in determining the AC output power quality. Frequency variation and its impact on power quality is not a problem that users must contend with when connected to a stable utility grid. However, sensitive electronics are vulnerable to disruption due to abrupt changes in frequency under the influence of generator power. The generator's capability to produce a constant frequency is directly proportional to the RPM speed of the prime mover, which is controlled by the governor. Many system designs exist, from simple spring-types to complex hydraulics and electronic systems that dynamically adjust the fuel throttle to keep the engine at constant RPM. Simply adding or removing loads, or cycling those loads on and off, creates conditions to which the governor must respond.
Slide 23 An isochronous (same speed) governor design maintains constant speed regardless of load level. Small variations in the speed of the prime mover still occur, and their extent is a measure of the governor's stability. Today, governor technology exists to maintain frequency regulation to within 0.25%, with response times to changing loads on the order of 1 to 3 seconds. Modern electronic solid-state designs deliver high reliability and the needed frequency regulation for sensitive loads.
Slide 24 Sophisticated electronic governor systems for paralleling have recently been developed that provide superior coordination and frequency stability under a variety of conditions. When two or more generators are paralleled for capacity or redundancy, they must all be governed at the same
speed, using either the utility or another generator as the primary frequency reference. This is because if the two sources are out of sync, one of them will carry a larger fraction of the load, which will result in a needed correction. These advances are a welcome enhancement to the high availability requirements of today's data centers, due to their reliability, reduced maintenance, and coordination efforts. Now let's review the voltage regulator.
Slide 25 The basic function of a voltage regulator is simply to control the voltage produced at the output of the alternator. The operation of the voltage regulator is vital to critical loads dependent on computer grade power. The goal is to configure a system with an appropriate response time to minimize sags and surges that occur as the load changes. Another issue to be aware of is the behavior of the regulator when subjected to non-linear loads such as older switch-mode power supplies. Non-linear loads draw current in a manner that is inconsistent with the voltage waveform, while resistive loads (like a light bulb) draw current in sync with the voltage waveform. Non-linear loads can interact negatively with a generator system, thereby jeopardizing the availability of the critical load during standby operation.
Slide 26 This diagram illustrates how the Automatic Transfer Switch (ATS) monitors the utility source and initiates engine starting and transfer of the load from the utility to the generator as soon as the generator is available and stable. The ATS also re-transfers the load to the utility when normal conditions are restored. Other common features of ATS-related equipment include automatic generator test scheduling and monitoring of important cool-down cycles for the generator after the utility is restored. Traditionally this hardware is sourced from a variety of vendors, including the generator manufacturers, distribution switchgear manufacturers, and specialists in ATS design.
Slide 27
When considering best practices for a generator, proper grounding should be first on the list. Grounding is a necessity when it comes to both safety and the reliability of a generator. Grounding performs such critical functions as: preventing the shock or electrocution of maintenance or repair technicians, ensuring that circuit breakers trip before electrical malfunctions develop into fires, and providing a low-impedance path for internal processing signals.
Slide 28 One key issue when discussing generators is that of air and noise pollution. Environmental laws, building permits, and duration of generator use vary considerably by locale. For example, if the facility is located in a stringent area, generator system declarations on emissions may be required when applying for permits. Industry professionals are usually experienced with the approval process in the locations they serve. As we discussed earlier, generators are very similar to automobile engines. Just as automobile noise and exhaust are a significant issue, so is the noise and air pollution caused by a generator. While the concept of minimizing noise and ducting exhaust air is straightforward, the environmental and regulatory issues are not.
Slide 29 Local ordinances on noise pollution typically dictate the highest recordable background noise allowed in a 24-hour period. Exhaust mufflers are generally categorized as industrial, residential, or critical, with critical offering the highest level of sound reduction. To spare the expense of a retrofit design, one should consider the noise rating of the system prior to purchase and have these numbers qualified by the zoning authority in the planning stages. Mechanical vibration also contributes to the overall noise level and the perception of noise for occupants in the surrounding area. Mounting and isolation techniques exist to minimize this concern. Now that we have explored the topic of environmental pollution, let's take a closer look at a few organizations that set the regulatory standards for standby generators.
Slide 30 In the United States, the Federal Environmental Protection Agency (EPA) has delegated to each state the jurisdictional authority and discretion on how to achieve air quality goals established on the national level. Other countries have similar regulatory bodies that set limits on generator emissions. For instance, Defra (the Department for Environment, Food and Rural Affairs) sets policies for environmental protection in the United Kingdom. And in India, the Ministry of Environment and Forests (MoEF) plays this role. In addition, there are worldwide organizations that offer a wealth of information about emissions and other standby generator considerations. One such organization is the EGSA (Electrical Generating Systems Association). EGSA serves as a source of information, education, and training, and develops performance standards for on-site power technology.
Slide 31 Another important consideration for generators is that of aesthetics. Some municipalities have requirements in terms of placement of the generator, including that it be housed within concrete block walls that match the main building's appearance. This keeps the generator from being noticed and keeps it aesthetically neutral to the surrounding neighborhood. Now that we have discussed some of the regulatory issues of standby generators, let's look more closely at some key maintenance measures that ensure optimal performance.
Slide 32 Maintenance is the key to ensuring the seamless operation of your generator. As we have mentioned throughout the course, there are many different ways in which you can optimize your standby generator. When you consider the fact that malfunctions can be almost completely eliminated with the proper preventative maintenance, the importance of such preventative steps hits home. Here are a few steps to take to ensure that your generator is ready when you need it.
It's important to note that a good preventative maintenance program is going to require shutting off all power for approximately 10-20 hours once a year. This will help to check the tightness of all medium voltage (MV) and switchgear terminations.
If a total shutdown is not a possibility, thermal imaging can be used to detect any hot spots caused by electrical terminations and connections.
The trip settings of major circuit breakers should be tested.
Transformers and cables should be tested, and lastly,
Coolant samples should be taken for signs of insulation deterioration.
Slide 33 Additionally, a management system that monitors all generator subsystems and provides early warning through preventative maintenance reminders is a great way to ensure reliability. Here are a few key preventative maintenance initiatives that can be taken to ensure reliability of your generator:
Coolant hoses, coolant level, water pump operation, and antifreeze protection must be diligently reviewed for acceptable performance.
A real-time remote and local monitoring system that provides critical information and alarms at every interface
Battery monitoring and weak battery detection
An automatic alarm that is sent whenever the controller is not set to automatic, the emergency stop is engaged, or the generator output breaker is not closed
Block temperature and coolant level monitoring and alarms
Fuel level and load power measurements, available even when the generator is in standby
Oil level monitoring so that oil can be added or leaks repaired, rather than waiting for the generator to start and shut down due to low pressure
Slide 34
To summarize, let's review some of the information that we have covered throughout the course.
Slide 35 Thank you for participating in this Data Center University TM course.
Slide 36 To test your knowledge of the course material, click the Knowledge Checkpoint link on your Data Center University TM personal homepage. Important Point! The Knowledge Checkpoint link is located under BROWSE CATALOG on the left side of the page.
Slide 37 Here at DCU, we value your opinion! We are dedicated to providing you with relevant, cutting edge education on topics pertinent to data center design, build, and operations, when and where you need it. So, please take our brief survey and tell us how we're doing. How do you begin? It's easy! 1) Click on the Home icon, located in the right corner of your screen. 2) Click on the "We Value Your Opinion" link on the left side of the screen under Browse DCU Courses. 3) Select the course title you have just completed and take our brief survey.
It is imperative that servers are isolated from utility power failures, surges, and other potential electrical problems. The building in which a data center is located could have a mixture of power requirements: air conditioners, elevators, office equipment, desktop computers, and kitchen area microwaves and refrigerators. It is important to provide a separate, dedicated power source and power infrastructure for the data center. This course will explore the topic of power, and how it is utilized within the data center. Let's begin by refreshing ourselves with definitions of some basic electrical terms.
Slide 5 The Volt is a unit of measurement of potential difference, or electrical pressure, between two points. If the two points are connected together, they form a circuit and current will flow. An Ampere measures the amount of electrical current flowing through a circuit during a specific time interval. The Ohm is the unit of measurement which describes the amount of resistance electricity encounters as it flows through a circuit.
Slide 6 Hertz is the unit of frequency measurement. One complete cycle of change in voltage direction per second is equal to one Hertz (Hz). Alternating Current, or AC, is constantly being reversed back and forth through an electrical circuit. Power supplied to a building by a nearby utility is an example of AC power. Direct Current, or DC, is electrical current that only flows in one direction. The power supplied by a battery is one example of a DC power source. To fully demonstrate how all of these terms relate to one another, let's compare the flow of electricity through a power cable to the flow of water through a garden hose.
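Although the course does not state it explicitly, the three units just defined are tied together by Ohm's law: current equals voltage divided by resistance. The short Python sketch below uses made-up example values purely to show the relationship before we move on to the garden hose analogy.

# Ohm's law relates the three units just defined: I = V / R.
# The numbers here are made-up example values, not from the course.
voltage_v = 120.0       # volts (electrical pressure between two points)
resistance_ohms = 60.0  # ohms (opposition to current flow)

current_a = voltage_v / resistance_ohms  # amperes (rate of current flow)
print(f"{voltage_v} V across {resistance_ohms} ohms drives {current_a} A")  # 2.0 A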
Slide 7 Let's use a typical garden hose as an illustration for how electricity can work. Water will flow through the hose at a slow rate, or a fast rate, depending on how far the faucet is opened. Water pressure (equivalent to voltage) usually remains constant whether the faucet is opened or closed. Current is controlled by the faucet position (resistance). The faucet is either more open or less open at any given time. The current can also be controlled by an increase or loss of water pressure (voltage). The amount of water that moves through a hose in gallons, or liters, per second can be compared to the quantity of electrons that flow per second through a conductor, as measured in amperes.
Slide 8 Our garden hose analogy can also help to explain resistance. Consider a garden hose which is partially restricted by a large rock. The weight of the rock will slow the flow of water in the garden hose. We can say that the restricted garden hose has more resistance to water flow than does an unrestricted garden hose. If we want to get more water out of the hose, we would need to turn up the water pressure at the faucet. The same is true of electricity. Materials with low resistance let electricity flow easily. Materials with higher resistance require more voltage to make the electricity flow.
Slide 9 When discussing the concept of power, it is important to understand the term electrical load. The load is the sum of the various pieces of equipment in a data center which consume and are supplied with electrical power. A typical data center load would consist of computers, networking equipment, cooling equipment, power distribution equipment and all equipment supported by the electrical infrastructure. We will now address some of the differences between AC and DC power.
Slide 10
As mentioned in our section on key terms, Alternating Current (AC) and Direct Current (DC) are two forms of power. Let's begin to explore the ways in which each is utilized. When the direction of current flowing in a circuit constantly reverses direction, it is known as Alternating Current (AC). The electrical current coming into your home is an example of alternating current. Alternating Current, which comes from the utility company, is switched back and forth approximately 60 times each second, measured as 60 Hertz. This measurement is called the frequency. The utility determines the frequency for the AC power that reaches the data center. In the US, frequency is set at 60 Hertz (Hz). In other countries, 50 Hz is more common. AC power is a combination of voltage and current. AC voltage at a generating station is stepped up via high voltage transformers to voltage levels that enable power to be distributed over long distances with minimal loss of energy.
Slide 11 Direct Current (DC) has several applications in the typical data center, most commonly in telecom equipment, where banks of batteries supply power at 48 Volts DC, or in battery systems supporting uninterruptible power supplies, which can be at potentials over 500 Volts DC. However, whether the supply is available from banks of batteries, or from DC generators, DC systems are not practical in data centers because of heavy resistive losses and the large cable sizes required to power information technology equipment. Almost all data center equipment is designed for the local nominal AC supply voltages. Now that we have discussed the forms of current, let's compare single-phase and 3-phase power.
Slide 12 Two common forms of AC power provided to data centers are single phase and 3-phase power. Single-phase power has only one basic power waveform, while 3-phase power has three basic power waveforms that are offset from each other by 120 degrees.
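To see what "offset from each other by 120 degrees" means in practice, the following Python sketch computes the instantaneous value of each of the three waveforms at one moment in time. The 120-degree spacing is standard for 3-phase power; the specific voltage, frequency, and time values are illustrative only.

import math

# Three sinusoidal waveforms offset from each other by 120 degrees.
# Illustrative values: 120 V RMS at 60 Hz (the US utility frequency named in the course).
v_rms, freq_hz, t = 120.0, 60.0, 0.002   # t is an arbitrary sample time in seconds
peak = v_rms * math.sqrt(2)

for phase, offset_deg in (("A", 0), ("B", 120), ("C", 240)):
    v = peak * math.sin(2 * math.pi * freq_hz * t - math.radians(offset_deg))
    print(f"Phase {phase}: {v:+.1f} V at t = {t*1000:.0f} ms")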
When AC power comes into a building as a single voltage source, it is referred to as single phase. If the power comes into the building utilizing three voltage sources, or three phases, or three hot wires, with accompanying neutrals and grounds, it is referred to as 3-phase power.
Slide 13 Single phase electricity is usually distributed to residential and small commercial customers. Single phase implies that power comes in with only one hot wire, along with an accompanying neutral and ground. Generating and distributing 3-phase power is more economical than distributing single phase power. Since the size of the wire affects the amount of current that can pass, it also determines the amount of power that can be delivered. If a large amount of power were distributed as a single phase, huge heavy transmission wires would be needed and it would be nearly impossible to suspend them from a pole. It is much more economical to distribute AC power using 3-phase voltage sources. Next, let's talk about 120/240 and 208 volt configurations.
Slide 14 120 Volts and 240 Volts AC are the most common single phase voltages supplied to residential customers. Single phase 240 Volts tends to supply larger domestic appliances, such as clothes dryers, electric cooking stoves, and water heaters. Single phase 120 Volts is also available in some data centers. Many IT devices, including computer monitors and individual desktop computers, accept 120 Volts. 3-phase 208 Volt power usually supports commercial environments, including most data centers. (Please note: In many countries, such as in parts of Europe and Asia, voltages such as 220-240V and 400V are also common.) Next, we'll explore the concept of watts and volt-amps.
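Before moving on to watts and volt-amps, here is a quick numeric illustration of why 3-phase distribution is more economical for the same conductor current. The √3 factor is standard electrical practice rather than something stated in the course, and the 20 A figure is just an example.

import math

current_a = 20.0  # example branch-circuit current, chosen for illustration only

single_phase_120 = 120 * current_a                 # apparent power on a 120 V single phase circuit
three_phase_208 = math.sqrt(3) * 208 * current_a   # apparent power on a 208 V 3-phase circuit

print(f"Single phase 120 V at {current_a:.0f} A: {single_phase_120/1000:.1f} kVA")   # 2.4 kVA
print(f"3-phase 208 V at {current_a:.0f} A per phase: {three_phase_208/1000:.1f} kVA")  # about 7.2 kVA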
Slide 15 The Watt measures the real power drawn by the load equipment, and is used as a measurement of both power and heat generated by the equipment. The wattage rating is typically stamped on the nameplate of the load equipment. However, the nameplate rating is rarely the same as the measured wattage in IT equipment. Many data centers have metering available on UPS or power distribution units (PDUs), or even on rack mounted power strips, all of which allow accurate recording of power at the site.
Slide 16 The Volt-Amps (VA) rating, or apparent power, represents the maximum load that the device in question can draw. It is the product of the applied AC voltage times the current drawn by the device. VA is used in sizing and specifying wire sizes, circuit breakers, switchgear, transformers and general power distribution equipment. VA ratings represent the maximum power capable of being drawn by the equipment. VA ratings are always greater than or equal to the watt rating of the equipment. The significance of the difference between Watts and Volt-Amps is that power supplies, wiring, and circuit breakers may need to be rated to handle more current and more power than what may be expected.
Slide 17 The terms Watts (W) and Volt-Amps (VA) are often used interchangeably when discussing load sizing for power infrastructure components, such as UPS devices. These terms are, however, not the same. The key to understanding the relationship between Watts and VA is the Power Factor. Watts represent real power and Volt-Amps represent apparent power. The power factor is the ratio of real power to apparent power. Power factor can be expressed as a number between 0 and 1 or as a percentage. If a given UPS has a Watt rating of 8 and a VA rating of 10, then its power factor is .8 (or 80%). A UPS with a power factor of .8 is more efficient than a UPS with a power factor of .7.
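The relationship from Slide 17 can be written as real power (W) = apparent power (VA) × power factor. The short sketch below simply reproduces the course's UPS example (an 8 W rating against a 10 VA rating) as arithmetic; the function name is my own.

def real_power_watts(apparent_power_va: float, power_factor: float) -> float:
    # Real power (W) = apparent power (VA) x power factor.
    return apparent_power_va * power_factor

# The course's example UPS: an 8 W rating and a 10 VA rating give a power factor of 0.8.
va_rating, watt_rating = 10.0, 8.0
pf = watt_rating / va_rating
print(f"Power factor: {pf:.2f}")                                          # 0.80
print(f"Real power at full VA: {real_power_watts(va_rating, pf):.1f} W")  # 8.0 W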
Next, we will look at one type of electronic switching power supply: Power Factor Corrected.
Slide 18 Power Factor Corrected power supplies were introduced in the mid-1990s and have the characteristic that Watt and VA ratings are equal. That is, they have a power factor of nearly 1. Power Factor Correction is simply a method of offsetting inefficiencies created by electrical loads. All large computing equipment such as servers, routers, switches, and drive arrays made after 1996 uses the Power Factor Corrected power supply. Personal computers, small hubs and personal computer accessories can have a power factor of less than 1. For a small UPS designed for computer loads which only has a VA rating, it is appropriate to assume that the Watt rating of the UPS is 60% of the published VA rating. For larger UPS systems, it is becoming common to focus on the Watt rating of the UPS. State-of-the-art larger UPS systems are rated for unity power factor. In other words, they are designed so that their capacity in kVA is the same as in kW. Next, let's discuss plugs and receptacles.
Slide 19 Many different types of power plugs are used throughout the world. Two of the more common plug standards in data centers are: the International Electrotechnical Commission (IEC) standard, which is based in Switzerland, but used globally; and the National Electrical Manufacturers Association (NEMA) standard, which is commonly used in North America. Most plugs in the data center have three prongs, and the receptacles are designed to accept these three prong configurations. In the US, a typical 3-prong plug consists of two flat prongs and one rounded prong. The larger of the flat prongs is the neutral, the smaller of the two flat prongs is the hot, and the rounded prong on the bottom is the ground.
The most common plug/receptacle combination for IT equipment is of an IEC design. These receptacles are often recessed for safety reasons; the design helps to prevent a person from touching the pins while they are live. Also common are plugs and receptacles of the twist-lock variety, where the plug is twisted to lock into the receptacle. This is particularly useful if you choose to deploy overhead cabling rather than cabling below a raised floor: with a twist lock, gravity and vibration are less likely to dislodge the plug from its receptacle.
Let's discuss IEC and NEMA plugs in greater detail. Slide 20 Among the most common IEC plugs found in data centers are: the IEC-320-C13 and IEC-320-C14, which are rated over a range from 100 to 240 Volts AC and a current of about 10 Amps; and the IEC-320-C19 and IEC-320-C20, which are rated over a range from 100 to 240 Volts AC and a current range of about 16 to 20 Amps. Also common are the IEC 309 series of 208-Volt, single-phase Russell Stoll connectors. The IEC 309 2P3W 208V, 20A, for example, is rated at 20 Amps, and the IEC 309 2P3W 208V, 30A is rated at 30 Amps. Clues to the makeup of a plug can be found by analyzing its name. In the case of the IEC 309 2P3W 208V, 30A, for example, the letter P identifies the number of poles, the letter W identifies the number of wires, V identifies the voltage, and A designates the current in Amperes. Receptacles are installed in rack-mounted power strips as well as on power whips, and the mating plugs are most commonly attached to the power cords of IT equipment. (Please note: In many countries, such as in parts of Europe and Asia, voltages such as 220-240V and 400V are also common.)
Slide 21 There are many examples of NEMA standard plug types. Each NEMA plug and receptacle type follows a naming convention. For example, a common plug type may read L5-15P. If the code begins with the letter L, the plug or receptacle locks; if the code does not begin with a letter, it does not lock. In this example, the plug locks. The first number can be a digit between 1 and 24, where 3 and 4 are never used. That number represents a certain combination of voltage, number of poles, number of wires, and whether or not it is a grounding-type plug. In this example, the plug is a Number 5 plug. The number after the hyphen indicates the amperage rating; in this example it is 15, which means the plug is rated to handle 15 Amps. The final letter, being a P, indicates that the device is, indeed, a plug. If the device were a receptacle, the final letter would be an R. (A short illustrative decoding sketch appears after the circuit breaker introduction below.) Now that we have learned what we need to know about plugs and receptacles, let's explore some common areas where power failures can occur. Slide 22 According to M Technology, Inc., an expert in the field of Probabilistic Risk Assessment, the most common areas of power system failure in data center electrical infrastructure are: the power distribution unit (PDU) and its respective circuit breakers at 30%, all other circuit breakers at 40%, UPS failure at 20%, and the balance of the system at 10%. We will now discuss the topic of circuit breakers and their importance in the data center. Slide 23 A circuit breaker is a piece of equipment, or a type of switch, that is designed to protect electrical equipment from damage caused by overload or short circuit. Circuit breakers are designed to trip at a given current level. Unlike fuses and switches, circuit breakers can be reset. Large circuit breakers have adjustable trip mechanisms, while smaller circuit breakers, designed for branch circuits, have their trip levels internally preset according to their electrical current rating.
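Returning briefly to the NEMA naming convention from Slide 21, here is a small illustrative Python sketch of how such a designation could be decoded. The function name and dictionary layout are hypothetical; only the field meanings come from the course narration:

```python
import re

def decode_nema(code):
    """Decode a NEMA designation such as 'L5-15P' or '5-20R'.

    Fields, per the convention described above:
      leading 'L'      -> locking device
      first number     -> voltage/pole/wire/grounding configuration
      number after '-' -> amperage rating
      trailing 'P'/'R' -> plug or receptacle
    """
    match = re.fullmatch(r"(L?)(\d{1,2})-(\d+)([PR])", code.upper())
    if not match:
        raise ValueError(f"Not a recognizable NEMA designation: {code}")
    locking, config, amps, kind = match.groups()
    return {
        "locking": bool(locking),
        "configuration": int(config),  # e.g. 5 = a specific voltage/pole/wire combination
        "amperage": int(amps),
        "device": "plug" if kind == "P" else "receptacle",
    }

print(decode_nema("L5-15P"))
# {'locking': True, 'configuration': 5, 'amperage': 15, 'device': 'plug'}
```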
As mentioned earlier, in the data center's electrical infrastructure, most failures can be traced back to the circuit breaker. Circuit breakers can fail in a number of ways: failure to close; failure to open under fault conditions; spurious trip, where a breaker opens with no fault; and failure to operate within the time-current specifications of the unit. Slide 24 Circuit breakers are designed to interrupt excessive current flow and come in a wide range of sizes. The number of times they trip or switch should be monitored, as most have a rated lifetime of only 1 to 10 fault-current interruptions. Slide 25 If you trace the path of power into your data center, from the utility through the transformer and UPS down to the load, you will see that there are multiple breaker types all along the way. Some are larger breakers (600 Amps or greater) and some are commodity breakers, such as branch circuit and PDU breakers. Circuit breaker coordination is important: the breaker closest to the fault should open faster than the circuit breakers upstream. Since the larger breakers are usually located upstream, a fault could affect most of the building instead of just part of it if the breakers are not properly coordinated. Coordination of breakers is complicated and must be done carefully; both the rating and the speed of the breakers must be considered. It is recommended that data center staff consult with electricians who are well versed in this area. Let's discuss two popular circuit breaker types that may be found in IT equipment: thermal circuit breakers and magnetic circuit breakers. Slide 26 Increasing current raises the temperature inside a thermal circuit breaker. If the current is too high, the thermal circuit breaker gets hot enough to trip. A common thermal circuit breaker uses a bimetallic strip to trip the breaker. A bimetallic strip sandwiches two different metals
together. Current flows through the bimetallic strip, and causes it to heat. Because one metal expands faster than the other metal as the temperature rises, the strip bends. If the current is too high, the metal strip bends enough to break the contact in the electric circuit.
Slide 27 A magnetic circuit breaker uses an electromagnetic coil to pull a switch open when a circuit carries too much current. As current increases, the electromagnetic coil pulls with greater force against the spring that keeps the switch closed. When the current is too high for the circuit, the force from the electromagnetic coil overcomes the force of the spring and forces the switch contact to break the circuit. These two breaker types can also be combined into another type of breaker, called a thermal-magnetic circuit breaker. Slide 28 Circuit breakers are designed to be either fast acting or slow acting. A circuit breaker may need to switch short-circuit currents as high as 15 times its rated current. A 30 Amp breaker, for example, may need to switch 450 or more Amps of current in an emergency. Slide 29 Circuit breakers are designed to trip at 110% of their rated current. This allows for normal short-term overloads such as the start-up currents of electrical motors. For example, a 20 Amp circuit breaker is not guaranteed to trip until the current exceeds 22 Amps. Circuit breaker tripping thresholds may vary according to design specifications or safety code requirements. To avoid downtime and unnecessary circuit breaker tripping, a circuit breaker needs to be sized according to both its rated current and its tripping current. Trip settings are adjusted so that the circuit breaker in question will trip in a timely fashion on overload, and before the upstream breaker trips.
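The 110% tripping threshold and the coordination principle can be illustrated with a short sketch. The inverse-time trip curve below is a made-up, simplified model (real breakers have manufacturer-published time-current curves), so treat it only as a way to see why the breaker nearest the fault should clear first:

```python
def trips(rated_amps, load_amps):
    """A breaker is not guaranteed to trip until the load exceeds ~110% of its rating."""
    return load_amps > 1.1 * rated_amps

def trip_time(rated_amps, fault_amps, k=10.0):
    """Hypothetical inverse-time trip curve: the larger the overload relative to the
    rating, the faster the breaker opens. Purely illustrative, not a real curve."""
    overload = fault_amps / rated_amps
    if overload <= 1.1:
        return float("inf")           # within rating: no trip
    return k / (overload - 1.0) ** 2  # seconds (illustrative only)

# A 20 A branch breaker is not guaranteed to trip at 21 A, but will at 23 A.
print(trips(20, 21), trips(20, 23))   # False True

# Coordination check: for a 300 A fault, the 20 A branch breaker should open
# well before a hypothetical 100 A upstream breaker.
fault = 300
print(trip_time(20, fault) < trip_time(100, fault))   # True
```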
It is advisable to choose a breaker designed for the characteristics of the load. For example, some breakers carry an HACR rating for heating, air conditioning, and refrigeration applications; breakers without this rating should not be used for HVAC systems. Circuit breakers with delayed action may be needed for heavy electrical loads, such as motors, transformers, and air conditioners, that draw temporarily high surge currents. The circuit breaker also needs to be rated high enough to prevent an electric arc from forming that could jump across the contacts of the switch. Slide 30 Certain types of circuit breakers are designed to trip a circuit if they detect a small amount of ground current. These breakers are known as Ground Fault Circuit Interrupters (GFCI), Earth Leakage Circuit Breakers (ELCB), or Residual-Current Devices (RCD). Because they are highly sensitive to small ground currents, and therefore pose a risk to availability, GFCI units are not used in data centers; however, they are commonly installed in damp environments such as swimming pools, bathrooms, kitchens, and construction sites to protect personnel from electric shock. Larger data centers instead use resistor banks to limit possible ground currents to safer levels and protect personnel from electric shock. Next, we'll discuss why convenience outlets are so important in the data center environment. Slide 31 A convenience outlet is an outlet used for non-computer devices. It is important to provide this additional outlet resource for electronic devices that may be needed in the data center environment; data center personnel need a place to plug in office equipment or lighting without the worry of tripping a circuit breaker or taxing the power supply. Installing convenience outlets is a way to ensure that enough power is provided to supply not only the critical load, but also any additional power that may be required. Next, we'll discuss safety issues such as electrical grounding and ground loops.
Slide 32 Grounding is principally a safety measure to protect against electric shock. The ground wire is connected to the exterior metal cases of appliances to protect against a hot-wire short inside the appliance. If a short occurs, the ground wire will limit the touch voltage to less than 30 volts and will also provide a return path for the excessive current, tripping the branch circuit breaker. Wires that are not grounded are considered hot. Slide 33 Ground loops occur when there is a varying quality of connection to earth at different points in an electrical installation. The result is that current may flow in unexpected loops between ground connections, which is a potentially hazardous situation. The way to prevent ground loops is to confirm the quality of the ground connections at all points in an electrical installation. Now, let's discuss seven categories of common power problems and their solutions. Slide 34 Impulsive transients are sudden high-peak events that raise the voltage and/or current level in either a positive or a negative direction. Electrostatic discharge (ESD) and lightning strikes are both examples of impulsive transients. Impulsive transients can be very fast, rising in as little as 5 nanoseconds and lasting less than 50 nanoseconds. For example, an ESD event may have a peak of over 8,000 volts but last less than 4 billionths of a second; the transient may still be strong enough to damage sensitive electronic equipment. One approach to the problem of impulsive transients is the use of a Transient Voltage Surge Suppressor (TVSS), a device that either absorbs the transient energy or short-circuits it to ground before it can reach sensitive equipment.
Slide 35 Motors turning on or off commonly cause oscillatory transients on power systems: the voltage quickly rises above its normal level and then gradually fades back to normal over several wave cycles. Slide 36 Interruptions occur when there is a temporary break in the power supplied. There are four types of interruptions: instantaneous (0.5 cycles to 30 cycles), momentary (30 cycles to 2 seconds), temporary (2 seconds to 2 minutes), and sustained (longer than 2 minutes). An uninterruptible power supply (UPS) can provide short-term backup power during an interruption. Slide 37 A sag, or dip, is a reduction in AC voltage, at a given frequency, for a duration of 0.5 cycles to 1 minute. Sags are usually caused by system faults, and are also often the result of switching on loads with heavy start-up currents. Common causes of sags include starting large loads, such as a large air conditioning unit, and remote fault clearing performed by utility equipment. Power line conditioners and UPSs can compensate for sags or dips. Slide 38 According to the IEEE, undervoltage is a Root Mean Square (RMS) decrease in the AC voltage, at the power frequency, for a period of time greater than one minute. An undervoltage is the result of long-term problems that create sags. The term brownout has commonly been used to describe this problem, but it has been superseded because it is ambiguous: it also refers to a deliberate commercial power delivery strategy during periods of extended high demand. Undervoltages can overheat motors and can lead to the failure of non-linear loads such as computer power supplies. Power line conditioners and UPSs can compensate for undervoltages.
Slide 39 A swell, or surge, is the reverse of a sag: an increase in AC voltage for a duration of 0.5 cycles to 1 minute. Common sources of swells include high-impedance neutral connections, sudden load reductions, and a single-phase fault on a three-phase system. Swells are also prevalent when large loads are switched out of a system. Power line conditioners and UPSs can compensate for swells. Slide 40 According to the IEEE, overvoltage is an RMS increase in the AC voltage, at the power frequency, for a duration greater than a few seconds. Overvoltage is common in areas where supply transformer tap settings are incorrectly set, and where loads have been reduced but commercial power systems continue to compensate for load changes that are no longer present. This is common in seasonal regions where communities shrink during the off-season. Overvoltage conditions can create high current draw and unnecessary tripping of downstream circuit breakers, as well as overheating and stress on equipment. Power line conditioners and UPSs can compensate for overvoltage. Slide 41 Many different causes of waveform distortion exist. DC offset occurs when direct current is added to an AC power source. DC offset can damage electrical equipment, such as motors and transformers, by overheating them. Harmonic waveforms are another form of waveform distortion; harmonics appear on the power distribution system as distorted current. Keep in mind that all equipment that does not have the advantage of modern harmonic-correction features should be isolated on separate circuits.
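The duration-based categories from Slides 36 through 40 can be summarized in a small sketch. The durations come from the narration (with cycles converted at an assumed 60 Hz); the percent-of-nominal boundaries (10%, 90%, 110%) are typical illustrative values, not figures quoted in this course:

```python
CYCLE = 1 / 60.0   # seconds per cycle, assuming a 60 Hz system

def classify_rms_disturbance(voltage_pct, duration_s):
    """Roughly classify an RMS voltage disturbance from its magnitude
    (percent of nominal) and its duration, using the categories above."""
    if voltage_pct < 10:                      # near-total loss of voltage
        if duration_s <= 30 * CYCLE:
            return "instantaneous interruption"
        if duration_s <= 2:
            return "momentary interruption"
        if duration_s <= 120:
            return "temporary interruption"
        return "sustained interruption"
    if voltage_pct < 90:                      # reduced voltage
        return "sag (dip)" if duration_s <= 60 else "undervoltage"
    if voltage_pct > 110:                     # elevated voltage
        return "swell (surge)" if duration_s <= 60 else "overvoltage"
    return "within normal range"

print(classify_rms_disturbance(0, 1.0))     # momentary interruption
print(classify_rms_disturbance(70, 0.5))    # sag (dip)
print(classify_rms_disturbance(120, 300))   # overvoltage
```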
Slide 42 Voltage fluctuation is a systematic variation of the voltage waveform, or a series of random voltage changes of small magnitude, typically 95 to 105% of nominal, occurring at a low frequency, generally below 25 Hz. Power line conditioners and UPSs can compensate for voltage fluctuations. Slide 43 Frequency variation is extremely rare in stable utility power systems, especially systems interconnected through a power grid. Where sites have dedicated standby generators or poor power infrastructure, frequency variation is more common, especially if the generator is heavily loaded. IT equipment is frequency tolerant, and generally not affected by minor shifts in local generator frequency. Next, we will follow the path of power distribution in the data center. Slide 44 Standby power can be defined as any power source available to the data center that takes over the function of supplying power when utility power is unavailable. Two common forms of standby power are mechanical generators, which use electromagnetism to produce electricity, and electrochemical systems, which use batteries and fuel cells to generate electrical current. Mechanical generator systems provide power on large and small scales, for entire cities or for individual use. Electrochemical generation is typically for smaller or temporary use. So, how is power distributed in the data center? Let's explore this concept next. Slide 45 Electricians often refer to one-line diagrams. A one-line diagram can range from very simple to very complex. At a minimum, it should illustrate the primary electrical components of the electrical system and show how they link and interact with each other.
This one-line diagram lets us see how electrical power is distributed in the data center: from a server plug, to outlet strips, to Power Distribution Units (PDUs), to the UPS and its bypass, to the Automatic Transfer Switch, to the primary power source (the utility) and the emergency power source (the generator). Let's describe the function of each of these components. Slide 46 The utility provides the primary electrical power source for the data center. Ideally, multiple utility feeds should be provided from separate substations or power grids; while not essential, this provides backup and redundancy. An emergency backup power source, in the form of a generator, can be positioned to carry the load of the data center components as well as all essential support equipment, such as air conditioners, in case of a power disruption. Slide 47 A circuit is a path for electrical current to flow. A branch circuit is one, two, or more circuits whose main power is connected through the same main switch. Each branch circuit should have its own grounding wire, and all wires must be of the same gauge. An uninterruptible power supply, or UPS, is a device or system that maintains a continuous supply of electric power to essential equipment that must not be shut down unexpectedly. The UPS is inserted between a primary power source, such as a commercial utility, and the primary power input of the equipment to be protected, for the purpose of eliminating the effects of temporary power outages and transient anomalies. An automatic transfer switch is a switch that automatically transfers the power supply from one power source to another in case of a power disruption, or when operating in bypass mode. For example, if the utility fails, the automatic transfer switch immediately switches to UPS or generator power. Slide 48
A Power Distribution Unit (PDU) is a device that distributes electric power, usually by taking a higher voltage and amperage and reducing it to more common and useful levels, for example from 220V 30A single-phase to multiple 110V 15A or 110V 20A outlets. It is used in computer data centers and sometimes has features such as remote monitoring and control, down to the individual outlet level. (Please note: In many countries, such as in parts of Europe and Asia, voltages such as 220-240V and 400V are also common.) An outlet strip is a strip of sockets that allows multiple devices to be plugged in at one time, and usually includes a switch to turn all devices on and off. In a few cases, every outlet may be individually switched. Outlet strips are often used when many electrical devices are in close proximity, especially with audio/video and computer systems. A server plug is the power plug or other type of electrical connector that mates with a socket or jack, and in particular is used with electrical or electronic equipment in the data center. Slide 49 To summarize, let's review some of the information that we have covered throughout the course. Power infrastructure is critical to the uptime of any data center. Understanding basic power terms helps to better evaluate the interaction between the utility, the standby power equipment, and the load. Failures can occur at various points in the power infrastructure, but special care should be given to the condition and coordination of circuit breakers. Numerous power anomalies exist that can impact the uptime of data center equipment; understanding the threats and applying practical power solutions such as uninterruptible power supplies and generators can help to minimize the risk. Slide 50 Thank you for participating in this course.
Risk Tolerance versus Cost Summary Slide 5 Data center physical security means keeping unauthorized or ill-intentioned people out of places where they do not belong, such as a data center or other locations that may contain critical physical infrastructure. Slide 6 Beyond this course, physical security can also refer to protection from catastrophic damage due to fire, flood, earthquake, violent attack, or utility malfunction, such as power loss or HVAC (Heating, Ventilating, and Air Conditioning) failure for a building. Several companies offer Certified Information Systems Security Professional (CISSP) training, which covers these areas of physical security. Slide 7 How does physical security compare with network security? Physical security screens people who want to enter a data center. Network security screens data that comes to a data center, to protect against such things as computer viruses. Slide 8 Although physical security protects against sabotage, espionage, and theft, it offers even more protection against human error. Based on studies by The Uptime Institute and the 7x24 Exchange, human error, meaning accidents and mistakes such as improper procedures, mislabeled equipment, things dropped or spilled, and mistyped commands, can account for 60% of data center downtime. Physical security reduces data center downtime by reducing human presence in a data center to essential personnel only.
Slide 9 To identify what needs protecting, create a conceptual map of the physical facility. Then, locate the areas that need to be secured, and classify them by the strength or level of security required. The areas that need to be secured might have concentric boundaries, with security strongest at the core. Slide 10 Or, the areas that need to be secured might have side-by-side boundaries that require comparable security levels. Slide 11 As illustrated in this screen, concentric areas can have different or increasingly stringent access methods, providing added protection called depth of security. The darkest shading indicates the deepest, strongest security. With depth of security, an inner area is protected both by its own access methods and by those of the areas that enclose it. In addition, any breach of an outer area can be met with another access challenge at a perimeter further in. Computer racks stand at the innermost depth of security, because they house critical IT equipment. It is important to include in the security map not only areas containing the functional IT equipment of the facility, but also areas containing elements of the physical infrastructure which, if compromised, could result in downtime. For example, someone could accidentally shut down the HVAC equipment or deliberately steal generator starting batteries, or a system management console could be fooled into thinking the fire sprinklers should be activated. To summarize, successful physical security needs to consider any form of physical access that can adversely impact business-critical equipment. Slide 12 Physical security asks two main questions:
1. Who are you? and 2. Why are you here? The first question, Who are you?, establishes or verifies personal identity. The second question, Why are you here?, provides justification for physical access, a reason to be there. Slide 13 Certain individuals who are known to the facility need access to the areas relevant to their position. For example, the security director will have access to most of the facility, but not to client data stored at the installation. The head of computer operations might have access to the computer rooms and operating systems, but not to the mechanical rooms that house the power and HVAC facilities. The CEO of the company might have access to the offices of the security director and IT staff and to the public areas, but not to the computer rooms or mechanical rooms. Access to extremely sensitive areas can be granted to specific people for a specific purpose, that is, if they need to know, and only for as long as they have that need. Because a person's organizational role typically implies the reason for access, security focuses on identity verification. The next section discusses physical security identification methods and devices. Slide 14 Identification uses three basic approaches: 1. What you have 2. What you know 3. Who you are
Slide 15 What you have is something you wear or carry, such as a key, a card, or a small object, such as a token, that can be worn or attached to a key ring. Several types of cards and tokens are currently being used for access control, ranging from the simple to the sophisticated. They have varying performance based on several factors, including: ability to be reprogrammed; resistance to counterfeiting; type of interaction with the card reader (swipe, insert, flat contact, or no contact/proximity); convenience (physical form and how it is carried or worn); amount of data carried; computational ability; cost of cards; and cost of readers. Slide 16 The magnetic stripe card is the most common type of card, with a simple magnetic stripe of identifying data. When the card is swiped in a reader, the information is read and looked up in a database. This system is inexpensive and convenient; its drawback is that it is relatively easy to duplicate the cards or to read the information stored on them. The barium ferrite card (also called a magnetic spot card) is similar to the magnetic stripe card but offers more security without adding significant cost. The Wiegand card is a variation of the magnetic stripe card. Unlike readers for proximity cards and magnetic stripe cards, Wiegand readers are not affected by radio frequency interference (RFI) or electromagnetic fields (EMF). The robustness of the reader, combined with the difficulty of duplicating the card, makes the Wiegand system extremely secure (within the limits of a what-you-have method), but also more expensive. Slide 17
The bar-code card carries a bar code, which is read when the card is swiped in the reader. This system is very low-cost but easy to fool: an ordinary copy machine can duplicate a bar code well enough to fool a bar-code reader. Bar-code cards are good for minimum-security requirements, especially those requiring a large number of readers throughout the facility or a large volume of traffic traversing a given access point. This is not so much a security system as it is an inexpensive access-monitoring method. The infrared shadow card improves upon the poor security of the bar-code card by placing the bar code between layers of PVC plastic. The reader passes infrared light through the card, and the shadow of the bar code is read by sensors on the other side. The proximity card, sometimes called a prox card, is a step up in convenience from cards that must be swiped or touched to the reader; as the name implies, the card only needs to be in "proximity" to the reader. The smart card is a card with a built-in silicon chip for onboard data storage and/or computation; the general term for objects that carry such a chip is smart media. Smart cards offer a wide range of flexibility in access control. For example, the chip can be attached to older types of cards to upgrade and integrate with pre-existing systems, or the cardholder's fingerprint or iris scan can be stored on the chip for biometric verification at the card reader, thereby elevating the level of identification from what you have to who you are. Contactless smart cards with vicinity range offer nearly ultimate user convenience: a half-second transaction time with the card never leaving the wallet. Slide 18 What you know is more secure than what you have. An example of what you know is a password, code, or procedure, used for something such as opening a coded lock, typing a verification code at a card reader, or accessing a computer from a keyboard. Slide 19
A password or code presents a security dilemma: if it is easy to remember, it will likely be easy to guess; if it is hard to remember, it will likely be hard to guess, but it will also likely be written down, thus reducing its security.
Slide 20 Who you are refers to identification by recognition of unique physical characteristics. Physical identification is the natural way people identify one another with nearly total certainty. When accomplished (or attempted) by technological means, it is called biometrics. Slide 21 Researchers have developed several computer-based biometric scanning techniques that look for human features, such as: fingerprints and hands, or the shape of the fingers and the thickness of the hands; the iris, or the pattern of colors within the iris; the face, or the relative positions of the eyes, nose, and mouth; the retina, or the pattern of blood vessels within the retina; handwriting, or the dynamics of the pen as it moves; and voice. Slide 22 This chart shows the relative reliability of each of the three basic security identification approaches. What you have is the least reliable form of identification, because there is no guarantee that the correct person will use it: it can be shared, stolen, or lost and found. What you know is more reliable than what you have, but passwords and codes can still be shared, and if they are written down, they carry the risk of discovery.
Who you are is the most reliable security identification approach, because it is based on something physically unique to you. Biometric devices are generally very reliable if recognition is achieved; that is, if the device thinks it recognizes you, then it almost certainly is you. The main source of unreliability for biometrics is not incorrect recognition or spoofing by an imposter, but the possibility that a legitimate user may fail to be recognized (false rejection).
Slide 23 A typical security scheme uses methods of increasing reliability. For example, entry into the building might require a combination of swipe card plus PIN; entry to the computer room might require a keypad code plus a biometric. Combining methods at an entry point increases reliability at that point; using different methods for each level significantly increases security at inner levels, since each is secured by its own methods plus those of outer levels that must be entered first. Slide 24 A safe location enhances physical security. Best practices include keeping a data center in its own building, away from urban areas, airports, high voltage power lines, flood areas, highways, railways, or hazardous manufacturing plants. If a data center has to be in the same building as the business it supports, the data center should be located toward the center of the building, away from the roof, exterior walls, or basement. Data center location within a building is further discussed in the next section, Building Design. Slide 25 When building a new facility or renovating an old one, physical security can be addressed from the ground up by incorporating architectural and construction features that discourage or thwart intrusion. Security considerations in the structure and layout of a building generally relate to
potential entry and escape routes, access to critical infrastructure elements such as HVAC and wiring, and potential sources of concealment for intruders. Slide 26 Specific suggestions include: Data Center Position -- Position the data center door in such a way that only traffic intended for the data center is near the door. Steel Doors -- Use steel doors and frames, with solid doors instead of hollow-core. Make sure that hinges cannot be removed from the outside.
Walls -- Data center walls should use materials sturdier than the typical sheet rock used for interior walls. Sensors -- Sensors can be embedded in the walls to detect tampering. Exteriors -- The room used for the data center should not abut any outside walls. Sight Lines -- Allow long and clear lines of sight for any security stations or cameras within the data center. Slide 27 Concealment -- Make use of barriers to obstruct views of the entrances and other areas of concern from the outside world. This prevents visual inspection by people who wish to study the building layout or its security measures. Ducts -- Be aware of the placement of ventilation ducts, service hatches, vents, service elevators, and other possible openings that could be used to gain access. Avoid Clutter -- Avoid creating spaces that can be used to hide people or things. Locks -- Install locks and door alarms on all roof access points so that security is notified immediately upon attempted access. Avoid points of entry on the roof whenever possible. The keys should be of a type that cannot be casually duplicated.
Plumbing -- Take note of all external plumbing, wiring, HVAC, etc., and provide appropriate protection. Slide 28 And lastly, some other methods of physical security are: Locked Cages -- Keep critical equipment stored in locked cages that have individual locks. Key Holder List -- Keep a list of who has keys. Backup Lighting -- Critical areas should have backup lighting in case of power interruption. Limited Number of Entrances -- Entrances to a data center or other areas that contain critical equipment should be limited to what is required by fire and safety regulations.
Slide 29 A common and frustrating loophole in an otherwise secure access control system can be the ability of an unauthorized person to follow an authorized person through a checkpoint. This is called piggybacking, or tailgating if the unauthorized person slips through undetected. The traditional solution is an airlock-style arrangement called a mantrap, which consists of doors at entry and exit with room for only one person in the space between them. Mantraps can be designed with access control for both entry and exit, or for exit only, in which case a failed attempt to exit the enclosure causes the entry door to lock and an alert to be issued indicating that an intruder has been caught. A footstep-detecting floor can be added to confirm that only one person is passing through. A newer technology for solving this problem uses an overhead camera for optical tracking and tagging of individuals as they pass, issuing an alert if it detects more than one person per authorized entry. Slide 30 Still cameras can be used for such things as recording license plates at vehicle entry points, or in conjunction
with footstep sensors to record people at critical locations. Closed circuit TV (CCTV) cameras, hidden or visible, can provide interior or exterior monitoring, deterrence, and post-incident review. Several types of camera views can be used: fixed, rotating, or remotely controlled. Some things to consider when placing cameras: Is it important that a person in camera view be easily identifiable? Is it only necessary to determine if the room is occupied? Are you watching to see if assets are being removed? Is the camera simply meant to serve as a deterrent? Slide 31 If CCTV signals are recorded, there must be procedures in place to address the following issues: How will tapes be indexed and cataloged for easy retrieval? Will the tapes be stored on site or off site? Who will have access to the tapes? What is the procedure for accessing tapes? How long will the tapes be kept before being destroyed? Slide 32 Despite all the technological advancements in the field of physical security, experts agree that a quality staff of protection officers tops the list of methods for backing up and supporting access control. Guards provide the surveillance capability of all the human senses, plus the ability to respond with mobility and intelligence to suspicious, unusual, or disastrous events. The International Foundation for Protection Officers (IFPO) is a non-profit organization founded for the purpose of facilitating standardized training and certification of protection officers. Its Security Supervisor Training Manual is a reference guide for protection officers and their employers. Slide 33
Everyone is familiar with traditional house and building alarm systems and their sensors: motion sensors, heat sensors, contact (door-closed) sensors, and the like. Data center alarm systems might use additional kinds of sensors as well: laser beam barriers, footstep sensors, touch sensors, and vibration sensors. Data centers might also have some areas where a silent alarm is preferred over an audible one, in order to catch perpetrators in the act. If the sensors are network-enabled, they can be monitored and controlled remotely by a management system, which could also include personnel movement data from access-control devices. Slide 34 Any security system design must consider how to handle visitors. Typical solutions are to issue temporary badges or cards for low-security areas, and to require escorting for high-security areas. Mantraps, designed to prevent two people from passing an entry point with one authorization, require a provision for a temporary override or for issuance of visitor credentials to allow passage.
Staff and visitors should be required to wear badges that are visible at all times while in a facility. Visitors should be required to sign in through a central reception area, and should be made aware of all visitor policies prior to facility access. Slide 35 Technology can't do the job all by itself, particularly since we are calling upon it to perform what is essentially a very human task: assessing the identity and intent of people. While people are a significant part of the security problem, they are also part of the solution: the abilities and fallibilities of people uniquely qualify them to be not only the weakest link, but also the strongest backup. Slide 36 In addition to mistakes and accidents, there is inherent risk in the natural human tendency toward friendliness and trust. A known person entering the facility could be a disgruntled employee; the
temptation to bend rules or skip procedures for a familiar face could have disastrous consequences; a significant category of security breach is the inside job. Even strangers can have surprising success overcoming security: the ability of a clever stranger to use ordinary guile and deceit to gain access is so well documented that it has a name, social engineering. Anyone in an area where harm could be done must be well trained not only in operational and security protocols, but also in resistance to creative social engineering techniques. An asset removal policy should be in place for company materials. Security guards should be trained and authorized to monitor, document, and restrict the removal of company assets, such as computer media and computer equipment like laptops. Slide 37 Protection from a security breach often comes down to the recognition and interpretation of unexpected factors, a skill in which technology is no match for alert people. Add an unwavering
resistance to manipulation and shortcuts, and human presence can be a priceless adjunct to technology. Beyond an alert staff, the incomparable value of human eyes, ears, brains, and mobility also qualifies people for consideration as a dedicated element in a security plan: the old-fashioned security guard. The presence of guards at entry points and roving guards on the grounds and inside the building, while expensive, can save the day when technological security fails or is hacked. The quick response of an alert guard when something isn't right may be the last defense against a potentially disastrous security breach. In protecting against both accidental and deliberate harm, the human contribution is the same: strict adherence to protocols.
Slide 38 The two overall considerations for physical security are: define the problem, and apply the technology. The first objective identifies what needs protecting and who is permitted access. The second objective selects a set of access control methods for authorized personnel, and adds other elements to back up the overall security strategy. Slide 39 One of the last topics we will discuss is risk tolerance versus cost. This chart summarizes the relationship between costs due to loss and costs due to security; finding the right balance between cost and need is the key. Slide 40 Physical security means keeping unauthorized or ill-intentioned people out of places where they do not belong, such as a data center or other locations that may contain Network Critical Physical Infrastructure. Physical security screens people who want to enter a data center, while network security screens data that comes to a data center. Human error accounts for 60% of data center downtime. Create a map of the physical facility to identify what needs protecting, then identify the areas that need to be secured. The two main questions that security access asks are: Who are you? and Why are you here? Slide 41 The three main security identification methods are:
What you have, what you know, and who you are. Security identification methods should be combined to improve reliability. Biometric security identification is not yet efficient enough to use on a mass scale or as an exclusive form of identification. The two overall considerations for physical security are: define the problem, and apply the technology. Potential loss needs to be weighed against the known costs of security when designing and implementing a physical security strategy. Slide 42 Thank you for participating in this Data Center University course. Slide 43 To test your knowledge of the course material, click the Knowledge Checkpoint link on your Data Center University personal homepage. Important Point! The Knowledge Checkpoint link is located under BROWSE CATALOG on the left side of the page. Slide 44 Here at Data Center University, we value your opinion! We are dedicated to providing you with relevant, cutting-edge education on topics pertinent to data center design, build, and operations, when and where you need it. So, please take our brief survey and tell us how we're doing. How do you begin? It's easy! 1) Click on the Home icon, located in the right corner of your screen. 2) Click on the "We Value Your Opinion" link on the left side of the screen under Browse DCU Courses. 3) Select the course title you have just completed and take our brief survey.
Heat Transfer and Heat Generation, The Ideal Gas Law, The Refrigeration Cycle, and Summary
Slide 5 Every Information Technology professional who is involved with the operation of computing equipment needs to understand the function of air conditioning in the data center or network room. This course explains the function of the basic components of an air conditioning system for a computer room. Slide 6 Whenever electrical power is being consumed in an Information Technology (IT) room or data center, heat is being generated. We will talk more about how heat is generated a little later in this course. In the data center environment, heat has the potential to create significant downtime, and therefore must be removed from the space. Data center and IT room heat removal is one of the most essential yet least understood of all critical IT environment processes. Improper or inadequate cooling significantly detracts from the lifespan and availability of IT equipment. A general understanding of the fundamental principles of air conditioning and the basic arrangement of precision cooling systems facilitates more precise communication among IT and cooling professionals when specifying, operating, or maintaining a cooling solution. Slide 7 Despite revolutionary changes in IT technology and products over the past decades, the design of cooling infrastructure for data centers has changed very little since 1965. Although IT equipment has always required cooling, the requirements of today's IT systems, combined with the way those IT systems are deployed, have created the need for new cooling-related systems and
strategies which were not foreseen when the cooling principles for the modern data center were developed over 30 years ago.
Slide 8 Today's technology rooms require precise, stable environments in order for sensitive electronics to operate at their peak. IT hardware produces an unusual, concentrated heat load, and at the same time, is very sensitive to changes in temperature or humidity. Most buildings are equipped with Comfort Air Conditioning units, which are designed for the comfort of people. When compared to computer room air conditioning systems, comfort systems typically remove an unacceptable amount of moisture from the space and generally do not have the capability to maintain the temperature and humidity parameters specified for IT rooms and data centers. Precision air systems are designed for close temperature and humidity control. They provide year-round operation, with the ease of service, system flexibility, and redundancy necessary to keep the technology room up and running. As damaging as the wrong ambient conditions can be, rapid temperature swings can also have a negative effect on hardware operation. This is one of the reasons hardware is left powered up, even when not processing data. Design conditions are recommended to be in the range of 72-75F (22-24C). Precision air conditioning is designed to constantly maintain temperature within 1F (0.56C). In contrast, comfort systems are unable to provide such precise temperature and humidity controls. Slide 9 A poorly maintained technology room environment will have a negative impact on data processing and storage operations. A high or low ambient temperature or rapid temperature swings can corrupt data processing and shut down an entire system. Temperature variations can alter the electrical and physical characteristics of electronic chips and other board components, causing
faulty operation or failure. These problems may be transient or may last for days. Even transient problems can be very difficult to diagnose and repair. Slide 10 High Humidity High humidity can result in tape and surface deterioration, condensation, corrosion, paper handling problems, and gold and silver migration leading to component and board failure. Low Humidity Low humidity greatly increases the possibility of static electric discharges. Such static discharges can corrupt data and damage hardware.
Slide 11 Now that we know that heat threatens availability of IT equipment, its important to understand the physics of cooling, and define some basic terminology. First of all, what is Heat? Heat is simply a form of energy that is transferred by a difference in temperature. It exists in all matter on earth, in varied quantities and intensities. Heat energy can be measured relative to any reference temperature, body or environment. What is Temperature? Temperature is most commonly thought of as how hot or cold something is. It is a measure of heat intensity based on three different scales: Celsius, Fahrenheit and Kelvin. What is Pressure? Pressure is a basic physical property of a gas. It is measured as the force exerted by the gas per unit area on its surroundings.
What is volume? Volume is the amount of space taken up by matter. The example of a balloon illustrates the relationship between pressure and volume: as the pressure inside the balloon becomes greater than the pressure outside the balloon, the balloon gets larger. Therefore, as the pressure increases, the volume increases. We will talk more about the relationship between pressure, volume, and temperature a little later in this course. Slide 12 Now that we know the key terms related to the physics of cooling, we can explore the three methods of heat transfer. Before doing so, it helps to note three basic properties of heat energy. The first is that heat energy can only flow in one direction, from hot to cold. For example, if an ice cube is placed on a hot surface, it cannot drop in temperature; it can only gain heat energy and rise in temperature, thereby causing it to melt. The second property is that heat energy cannot be destroyed. The third property is that heat energy can be transferred from one object to another. Considering the ice cube placed on a hot surface again, the heat from the surface is not destroyed; rather, it is transferred to the ice cube, which causes it to melt. Slide 13 There are three methods of heat transfer: conduction, convection, and radiation. Conduction is the process of transferring heat through a solid material. Some substances conduct heat more easily than others: solids are better conductors than liquids, and liquids are better conductors than gases. Metals are very good conductors of heat, while air is a very poor conductor of heat. Slide 14
Convection is the transfer of heat through the movement of a liquid or gas. Radiation, in the context of heat transfer, is the process of transferring heat by means of electromagnetic waves, emitted due to the temperature difference between two objects. Slide 15 For example, blacktop pavement gets hot from radiant heat delivered by the sun's rays; the light that warms the blacktop is a form of electromagnetic radiation. Radiation is a method of heat transfer that does not rely upon any contact between the heat source and the heated object. If you step barefoot onto the pavement, the pavement feels hot. This sensation is due to the warmth of the pavement being transferred to your feet by means of conduction. Conduction occurs when two objects at different temperatures are in contact with each other; heat flows from the warmer to the cooler object until they are both the same temperature. Finally, if you look down a road of paved blacktop, you may see wavy lines in the distance emanating up from the road, much like a mirage. This visible effect is caused by convection: the transfer of heat from the surface of the blacktop to the cooler air above. Convection occurs when warmer regions of a liquid or gas rise toward cooler regions, and cooler liquid or gas takes the place of the warmer regions that have risen. This cycle results in a continuous circulation pattern, and heat is transferred to cooler areas. "Hot air rises and cool air falls to take its place" is a description of convection in our atmosphere. Slide 16 As mentioned earlier, heat energy can only flow from hot to cold. For this reason, we have air conditioners and refrigerators. They use electrical or mechanical energy to pump heat energy from one place to another, and are even capable of pumping heat from a cooler space to a warmer space. The ability to pump heat to the outdoors, even when it is hotter outside than it is in the data center, is a critical function that allows high-powered computing equipment to operate in an enclosed space. Understanding how this is possible is a foundation for understanding the design and operation of cooling systems for IT installations.
Slide 17 Whenever electrical power is being consumed in an Information Technology (IT) room or data center, heat is being generated that needs to be removed from the space. This heat generation occurs at various levels throughout the data center, including the chip level, the server level, the rack level, and the room level. With few exceptions, over 99% of the electricity used to power IT equipment is converted into heat. Unless this excess heat energy is removed, the room temperature will rise until the IT equipment shuts down or potentially even fails. Slide 18 Let's take a closer look at heat generation at the server level. Approximately 50% of the heat energy released by servers originates in the microprocessor. A fan moves a stream of cold air across the chip assembly. The server or rack-mounted blade assembly containing the microprocessors usually draws cold air into the front of the chassis and exhausts it out of the rear. The amount of heat generated by servers is on a rising trend: a single blade server can release 4 kilowatts (kW) or more of heat energy into the IT room or data center. Such a heat output is equivalent to the heat released by forty 100-Watt light bulbs, and is actually more heat energy than the capacity of the heating element in many residential cooking ovens. Now that we have learned about the physics and properties of heat, we will talk next about the Ideal Gas Law. Slide 19 Previously, we defined pressure, temperature, and volume. It is also important for understanding data center cooling to recognize how these terms relate to each other. The relationship between pressure (P), volume (V), and temperature (T) is known as the Ideal Gas Law, which states that PV/T = constant for a fixed quantity of gas. In this equation, P is the pressure of the gas, V is the volume it occupies, and T is its temperature. In simpler terms, if pressure is constant, an increase in temperature results in a
proportional increase in volume. If volume is constant, an increase in temperature results in a proportional increase in pressure. Conversely, if volume is decreased and pressure remains constant, the temperature must decrease. Basically, pressure and volume are directly proportional to temperature and inversely proportional to each other. Slide 20 In the atmosphere, pressure and temperature are both governed by the ideal gas law. However, because the volume is not held constant (that is, the atmosphere can expand and contract), the relationship between pressure and temperature is complex: temperature decreases linearly with increasing altitude, whereas pressure decreases exponentially. As another example, you may have noticed the outside of an aerosol can becoming colder as you spray it. This is because the can is a fixed volume, and as the pressure within the can decreases as it is sprayed, the temperature also decreases, causing the can to feel cold. Slide 21 The refrigeration cycle is a closed cycle of evaporation, compression, condensation, and expansion that has the net effect of moving heat energy out of one environment and into another, in this case from inside the data center to the outdoors. The working fluid used in the refrigeration cycle is known as the refrigerant. Modern systems primarily use fluorinated hydrocarbons that are nonflammable, non-corrosive, nontoxic, and nonexplosive. Refrigerants are commonly referred to by their ASHRAE numerical designation. The most commonly used refrigerant in the IT environment is R-22. Environmental concerns about ozone depletion may lead to legislation encouraging or requiring the use of alternative refrigerants such as R-134a. Slide 22 Refrigerant changes its physical state from liquid to gas and back to liquid again each time it traverses the various components of the refrigeration cycle. As the refrigerant changes state from
Slide 21 The refrigeration cycle is a closed cycle of evaporation, compression, condensation and expansion that has the net effect of moving heat energy out of one environment and into another; in this case, from inside the data center to the outdoors. The working fluid used in the refrigeration cycle is known as the refrigerant. Modern systems primarily use fluorinated hydrocarbons that are nonflammable, non-corrosive, nontoxic, and nonexplosive. Refrigerants are commonly referred to by their ASHRAE numerical designation. The most commonly used refrigerant in the IT environment is R-22, although environmental concerns over ozone depletion may lead to legislation encouraging or requiring the use of alternate refrigerants such as R-134a.

Slide 22 Refrigerant changes its physical state from liquid to gas and back to liquid again each time it traverses the various components of the refrigeration cycle. As the refrigerant changes state from liquid to gas, heat energy flows into the refrigerant from the area to be cooled (for example, the IT environment). Conversely, as the refrigerant changes state from gas to liquid, heat energy flows out of the refrigerant to a different environment (the outdoors or a water source).

Slide 23 Evaporation is the first step in removing heat energy from a computer room, and is the first step in the refrigeration cycle. The evaporator coil acts like an automobile radiator operating in reverse.

Slide 24 Warm air from the computer room is blown across the evaporator coil by a fan, while the tubes comprising the coil are supplied with the refrigerant exiting the expansion valve. When the warm computer room air passes through the cold evaporator coil it is cooled, and this cool air is delivered back to the computer room. Even though the evaporator coil is cold, at approximately 46°F (7.8°C), the refrigerant inside is evaporating, or boiling, changing from a liquid to a gaseous state. It is the heat from the computer room that boils the refrigerant, passing heat energy to the refrigerant in the process. The refrigerant at this point is a cool gas in a small pipe that is carrying the heat energy away from the computer room.

Slide 25 Compression is the next step in removing heat energy from a computer room. The vaporized but still cool refrigerant carrying the heat from the data center is drawn into a compressor.

Slide 26 This compressor has two important functions: 1. It pushes the refrigerant carrying the heat energy around the refrigeration loop. 2. It compresses the gaseous refrigerant from the evaporator coil to over 200 psi. It is a fundamental property of gases that compressing a gas causes its temperature to rise. Therefore, the moving gaseous refrigerant exiting the compressor is hot, over 125°F (52°C), as well as compressed. This temperature rise due to compression is the key to the ability of the refrigeration cycle to eject heat into the outdoor environment.
Slide 27 The next stage of the refrigeration cycle is condensation. In this stage, the hot compressed refrigerant carries the computer room heat energy from the compressor to the condenser coil.

Slide 28 The coil is made of small tubes coiled up into a block of metal fins and resembles an automobile radiator. This coil transfers heat to the air and operates at a temperature HIGHER than the air. This means that the air flowing across the coil is heated by the coil, and the hot gaseous refrigerant flowing through the coil is in turn cooled: heat flows from the refrigerant to the air. The air is typically blown across the hot coil by a fan, which exhausts the hot air to the outdoors. In this way the heat energy from the computer room is pumped to the outdoors. The condenser coil acts much like the radiator in a car, which carries heat from the engine to the air outside the car.

Slide 29 In the next stage, the expansion stage, the refrigerant exits the condenser coil as a high-pressure liquid, although at a lower temperature than when it entered.

Slide 30 The refrigerant then passes through an expansion valve, which has two key functions that are critical to the refrigeration cycle: first, it precisely regulates the flow of high-pressure refrigerant at a rate that maintains an optimal difference in pressure to ensure efficient cooling; second, the refrigerant leaves the expansion valve as a cooled, low-pressure refrigerant.

Slide 31 Once this cooled refrigerant reaches the evaporator coil, it changes to a gas. This is because the boiling point of the liquid refrigerant is extremely low; as the warm air from the computer room blows across the coils of the evaporator, the refrigerant that enters the coil gets
heated and starts boiling, changing to a gas. In this way, the cold refrigerant absorbs the heat energy from the air and carries it away from the data center. At this point the refrigeration cycle repeats, and the net result of the process is that heat flows continuously into the evaporator coil and continuously out of the condenser coil.

Slide 32 To summarize, let's review some of the information that we have covered throughout this course. When IT equipment is operating, heat is generated, and the removal of this heat is critical to the proper functioning of data center environments. Precision cooling systems are required to provide adequate cooling conditions for IT spaces. Heat, pressure, temperature and volume are interrelated, as demonstrated by the Ideal Gas Law. Heat is transferred via conduction, convection and radiation, and can only move from areas of higher temperature to areas of lower temperature. The refrigeration cycle is a closed cycle of evaporation, compression, condensation and expansion that serves to remove heat from the data center.

Slide 33 Thank you for participating in this Data Center University course.

Slide 34 To test your knowledge of the course material, click the Knowledge Checkpoint link on your Data Center University personal homepage. Important Point! The Knowledge Checkpoint link is located under BROWSE CATALOG on the left side of the page.

Slide 35 Here at DCU, we value your opinion! We are dedicated to providing you with relevant, cutting-edge education on topics pertinent to data center design, build, and operations, when and where you
need it. So, please take our brief survey and tell us how we're doing. How do you begin? It's easy! 1) Click on the Home icon, located in the right corner of your screen. 2) Click on the "We Value Your Opinion" link on the left side of the screen under Browse DCU Courses. 3) Select the course title you have just completed and take our brief survey.
Ideal Gas Law, the refrigeration cycle, and precision vs. comfort cooling. In this course, we will continue the discussion of cooling in the data center by addressing cooling-related devices for the IT space, humidity and static electricity, relative humidity and demand fighting, dew point control, humidification systems, factors affecting humidity control, humidity and temperature measurement, operational set points, short cycling, and finally, a summary.

Slide 5 Every Information Technology professional who is involved with the operation of computing equipment needs to understand the importance of air conditioning in the data center or network room. Data center and IT room heat removal and humidity management is one of the most essential yet least understood of all critical IT environment functions. Improper or inadequate cooling and humidity management significantly detracts from the lifespan and availability of IT equipment. A general understanding of these principles allows more precise communication among IT and cooling professionals when specifying, operating, or maintaining a cooling solution. This course explains the role humidity plays in data center cooling.

Slide 6 A data center must continuously operate at peak efficiency in order to maintain the business functions it supports and to decrease operational expenses. In this environment, heat has the potential to create significant downtime, and therefore must be removed from the space. In addition to heat, the control of humidity in Information Technology environments is essential to achieving high availability. Humidity can affect sensitive electronic equipment in adverse ways, so strict humidity controls are required. IT and cooling professionals need a general understanding of the effects of humidity on their mission-critical systems in order to achieve peak performance.

Slide 7 There are a few devices commonly used in data center cooling strategies that we will be referring to throughout this course.
The first device is a Computer Room Air Conditioning unit, or CRAC. This device, usually installed in the data center, uses a self-contained refrigeration cycle to remove heat from the room and direct it away from the data center through some kind of cooling medium. A CRAC must be used with a heat rejection system, which then transfers the heat from the data center into the environment. We will also refer to a CRAH, or Computer Room Air Handling unit. This is a device, usually installed in the data center or IT room, that uses circulating chilled water to remove heat. A CRAH must be used in conjunction with a chiller. A chiller is a device used to continuously refrigerate large volumes of water by way of the refrigeration cycle. This large volume of chilled water is distributed to Computer Room Air Handlers (CRAHs), which are designed to remove heat from the IT environment. Finally, later in the course we will discuss humidifiers, which are devices used to add moisture to the air. Additional information on these devices will be covered in Fundamentals of Cooling Part 3.

Slide 8 The control of humidity in Information Technology environments is essential to achieving high availability. The primary benefit of maintaining proper humidity levels is a reduction in static electricity, which by definition is an electrical charge at rest. Damage from electrostatic discharge can be catastrophic, but more often it causes low-grade damage that may initially be undetectable yet increases the potential for later failures. Static electricity results from low air humidity, or dry air. The movement of dry cooling air throughout the data center can itself be a source of static electricity every time it moves across an ungrounded insulated surface, and must be guarded against by maintaining proper humidity levels. Making the air itself slightly more electrically conductive, and the surfaces it touches very slightly moist, reduces the potential for the build-up of electrical charges that lead to an electrostatic discharge.
Slide 9 Many different things make up the air that surrounds us. It is a combination of gases consisting of nitrogen, oxygen, carbon dioxide, and water vapor. The water vapor in air is known as humidity. Air in the IT environment must contain the proper amount of humidity in order to maximize the availability of computing equipment. Too much or too little humidity directly contributes to reduced productivity and equipment downtime.

Slide 10 The IT environment is affected by many of the same conditions as the atmosphere around us. When watching the evening weather report, you have most likely heard the terms relative humidity, dew point and saturation. Relative humidity represents the actual amount of water vapor in the air relative to the maximum amount of water vapor the air can hold at a given temperature. As the air temperature increases, the air can hold more water vapor. Relative humidity is always expressed as a percentage from 0% to 100%. Think of the air in the data center as a giant sponge containing a constant amount of water. As the air increases in temperature, the sponge gets bigger and bigger, and therefore has the capacity to hold more water. Because the amount of water is held constant, however, the water molecules are less concentrated, causing the sponge to feel rather dry. This is an example of low relative humidity: as temperature increases with a fixed amount of water, relative humidity decreases. Now, as the temperature decreases, the sponge gets smaller, as does its capacity to hold water. The ratio of the actual amount of water in the sponge relative to the maximum amount it can hold is now much higher, meaning a higher relative humidity. As the temperature continues to decrease, the relative humidity reaches 100% and the sponge becomes saturated (it cannot hold any more water). It is at this temperature (known as the dew point) that water vapor leaves the air and appears as liquid water droplets on any object in the data center, including IT equipment. Dew point is always expressed as a temperature. As temperature decreases with a fixed amount of water, relative humidity increases.
A common example of this is when a cold drink is left outside on a warm summer day and droplets of water form on the can or glass. This is because the cold drink cools the surrounding air to a temperature lower than the air's dew point. The air has more water vapor than it can hold at its new, lower temperature, and the extra water vapor leaves the air as liquid water droplets on the glass. Relative humidity and dew point are related terms: the dew point for air at a given temperature will rise as the air's relative humidity increases. Another important term related to humidification is saturation. When air reaches 100% relative humidity, the air's dew point is equal to its temperature and the air is considered saturated.

Slide 11 The amount of water normally contained in air is actually very small. As an example, the air inside a small data center measuring 30 feet by 20 feet with a 10-foot ceiling will contain just over 40 ounces of water vapor under normal conditions. If the temperature in that small data center were 73°F, the 40 ounces of water vapor contained in the air would equate to a relative humidity of about 50%. If the relative humidity is zero, there is no water vapor present; if the relative humidity is 100%, the air is holding all the water vapor it possibly can. The amount of water that can be contained in this volume of air is not fixed, however. As the temperature of the air increases, it can hold more and more water vapor; as the temperature decreases, its ability to hold water also decreases.

Slide 12 As mentioned, relative humidity, dew point, and temperature are all related. Therefore, to control IT environment humidity and temperature you can either maintain the relative humidity or maintain the dew point temperature at the Computer Room Air Conditioner (CRAC) level.

Slide 13 Let's look at maintaining relative humidity first. Remember that as air increases in temperature, more moisture must be added to maintain the same relative humidity. Take for example a data
center with two Computer Room Air Conditioning (CRAC) units set to the same relative humidity setting, say 45%. If the air in that room is returning to the CRACs at different temperatures, for example one at 75 degrees and one at 70 degrees, the higher-temperature return air will have more water added to it by the humidifier in the CRAC unit than the lower-temperature return air will. When a room contains several CRAC units set to maintain the same RH setting, the unequal addition of moisture among the units can eventually trigger one or more of the units to go into dehumidification mode. The other CRAC units will detect the resulting drop in humidity and will increase their own humidification to compensate. In an unmonitored room containing several CRAC units, it is possible to have half the room's cooling units adding humidity while the other half work to remove it. This condition is known as demand fighting.

Slide 14 Let's look a little closer at the problem of demand fighting. If the Computer Room Air Conditioning (CRAC) units in a data center do not work together in a coordinated fashion, they are likely to fall short of their cooling capacity and incur a higher operating cost. CRAC units normally operate in four modes: cooling, heating, humidification and dehumidification. While two of these conditions may occur at the same time (for example, cooling and dehumidification), all systems within a defined area should always be operating in the same mode. Demand fighting can have drastic effects on the efficiency of the CRAC system, leading to a reduction in cooling capacity, and is one of the primary causes of excessive energy consumption in IT environments. If not addressed, this problem can result in a 20-30% reduction in efficiency, which in the best case results in wasted operating costs and in the worst case results in downtime due to insufficient cooling capacity.

Slide 15 Let's now look at maintaining dew point. Dew point control of IT environment humidity is more cost effective than relative humidity control, as it greatly reduces the frequency of demand fighting. This is because as air increases in temperature in an IT environment, its dew point stays the same. For example, air at 90°F exiting a piece of computer equipment has exactly the same dew
point as the 70°F air entering the computer. Relative humidity and measured air temperature are always related for any specific dew point temperature. When several CRAC units are set to maintain humidity via dew point, large differences in return air temperature will not drive excessive humidification or dehumidification in different units. All cooling units simply maintain humidity based on the actual amount of water required in each pound of air that passes through the unit.
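As an illustration of why dew point is a more stable control variable than relative humidity, here is a minimal sketch, not from the course, that estimates dew point from temperature and relative humidity using the commonly cited Magnus approximation; the coefficients and sample readings are illustrative assumptions.

```python
import math

# Magnus approximation coefficients (one commonly used set, roughly valid 0-60 degrees C)
A, B = 17.62, 243.12  # B in degrees C

def dew_point_c(temp_c: float, rh_percent: float) -> float:
    """Approximate dew point (deg C) from dry-bulb temperature and relative humidity."""
    gamma = math.log(rh_percent / 100.0) + (A * temp_c) / (B + temp_c)
    return (B * gamma) / (A - gamma)

def relative_humidity(temp_c: float, dew_point: float) -> float:
    """Approximate RH (%) of air at temp_c whose dew point is dew_point."""
    e_sat = lambda t: math.exp((A * t) / (B + t))  # proportional to saturation vapor pressure
    return 100.0 * e_sat(dew_point) / e_sat(temp_c)

# Air entering a server at 21 deg C (about 70 deg F) and 45% RH:
dp = dew_point_c(21.0, 45.0)
print(f"Dew point: {dp:.1f} deg C")

# The same air leaves the server at 32 deg C (about 90 deg F); its moisture content,
# and therefore its dew point, is unchanged, but the measured RH is much lower.
print(f"RH of exhaust air at 32 deg C: {relative_humidity(32.0, dp):.0f} %")
```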
Slide 16 Humidification systems are used to increase the moisture content of air. They exist in virtually all data centers and in some cases are almost continuously used. They are commonly installed in precision cooling systems but may also be stand-alone central systems. Humidifiers installed inside computer room air conditioners or air handlers replace water lost to condensation before the air exits the cooling unit. Slide 17 There are three types of humidification systems commonly installed in computer room air conditioners and air handlers: Steam canister humidifiers, Infrared humidifiers and Ultrasonic humidifiers. All three designs effectively humidify the IT environment. Slide 18 Steam canister humidifiers are composed of a water-filled canister containing electrodes. When the electrodes are powered, water is boiled and steam (water vapor) is produced. The steam is introduced via a tube into the air stream to be humidified. The latest steam canister designs have the capability to regulate the amount of steam they produce to the exact amount needed and also have the ability to compensate for electrode fouling. This results in better humidity control, less electrical consumption and fewer maintenance requirements.
Slide 19 Infrared humidifiers suspend quartz lamps over an open pool of water. The intense infrared light on the surface of the water releases water vapor, which migrates into the air stream requiring humidification.

Slide 20 Ultrasonic humidifiers rapidly vibrate water to create a fog or mist that is introduced into the air stream requiring humidification. Ultrasonic humidifiers require a reverse-osmosis water purification system to supply water, although smaller systems can sometimes use de-ionized water.

Slide 21 People in a data center and leaking or un-insulated water pipes can increase humidity in the IT environment, while the air conditioning process and infiltration by drier outside air can decrease humidity. Minimizing these factors that affect humidity inside the IT environment is just as important as controlling external factors. By controlling both the internal and external factors that affect humidity levels in the data center, IT professionals can maximize the performance of the systems designed to regulate that humidity.

Slide 22 Minimizing infiltration from external factors protects the IT environment from chronic humidity control problems that become acute with significant changes in outside weather. The use of vapor barriers in the construction or renovation of computer rooms and data centers will help to control infiltration. A vapor barrier is any form of protection that seals the IT environment against uncontrolled humidity gain or loss from outside the room. A vapor barrier could simply involve sealing doorways, or it could mean retrofitting the structure of the data center to seal the entire space. It is important to consider certain conditions when utilizing a vapor barrier. These include: Sealing perimeter infiltrations: this involves blocking and sealing all entrance points that lead to uncontrolled environments.
Sealing doorways: doors and doorways should be sealed with high-efficiency gaskets and sweeps to guard against air and vapor leaks. Painting perimeter walls: all perimeter walls from the structural deck to the ceiling should be treated with a paint impenetrable to moisture in order to minimize moisture infiltration. Avoiding unnecessary openings: this becomes particularly relevant in spaces that have been converted to IT rooms; open access windows, mail slots, and over-sized cable openings should all be blocked or sealed.

Slide 23 Office space that is converted into a computer room but still retains the building air conditioning system for ventilation purposes creates unique challenges and benefits. The benefit is that the outdoor air required for ventilation is already processed by the building climate control system to a moderate temperature and humidity level before it enters the computer room. The challenge is ensuring that the large volume of air that building systems typically introduce into office space (now converted to a computer room) does not conflict with the operation of the room's additional precision cooling equipment.

Slide 24 For example, if the volume of air entering the room from the building ventilation system is warmer or at a different relative humidity than the desired setting on the computer room air conditioner, a portion of the air conditioner's capacity will be used to cool or change the humidity of that air as necessary. Computer rooms with temperature and humidity problems that use both building and precision cooling systems require operational scrutiny to prevent demand fighting situations.

Slide 25 In order to evaluate overall cooling system performance, it is important to take periodic measurements of humidity and temperature in the data center. To measure the humidity levels in the data center, the single most important place to measure is at the cooling air intake on IT
equipment, which on most pieces of computing equipment is located at the front. Note that the exhaust air exiting the server has a higher temperature and a lower relative humidity, but its dew point is unchanged. This is because the heat a server generates raises the temperature of the entering air but does not change the amount of moisture in it.

Slide 26 Measurement at every piece of IT equipment is not normally possible. In environments that use rack enclosures, it is acceptable to monitor humidity inside the front door of the enclosure. Monitoring points should be 2 inches (50 mm) off the face of the rack equipment, in the top third of the rack enclosure. This is the elevation where damaging low-humidity conditions at the equipment air intake are most likely to occur. The use of a temperature-humidity probe that interfaces with the operating and control systems already in use will facilitate monitoring and provide proactive warning of out-of-range humidity conditions. There are also many hand-held monitoring devices available that allow for spot-checking of temperature and relative humidity anywhere in the room.

Slide 27 Measuring the temperature at the CRAC unit validates system performance. In order to do this, both return and supply temperatures must be measured. Three monitoring points should be used on the supply and return, at the geometric center of each. Under ideal conditions, the supply air temperature should be set to the temperature required at the server inlet. The measured return air temperature should be much greater than the temperature at the server inlet.

Slide 28 CRAC units should be tested to ensure that measured temperatures (supply and return) and humidity readings are consistent with design values. Set points for temperature and humidity should be consistent on all CRAC units in the data center; unequal set points will lead to demand fighting and fluctuations in the room. Heat loads and moisture content are relatively constant in a given area, and CRAC unit operation should be set in groups by locking out competing modes through either a
building management system (BMS) or a communications cable between the CRACs in the group. No two units should operate in competing modes during a recorded interval unless they are part of separate groups. When grouped, all units in a specific group operate together for a distinct zone. Set point parameters should fall within the following ranges to ensure system longevity and peak performance: temperature 68-77°F (20-25°C); humidity 40-55% RH. A simple set-point check is sketched below.
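The following is a minimal, hypothetical sketch, not part of the course, of validating CRAC set points against the ranges listed above and flagging units in a group that are running in competing modes; the data structure and mode names are illustrative assumptions.

```python
RECOMMENDED_TEMP_F = (68.0, 77.0)   # 20-25 deg C
RECOMMENDED_RH_PCT = (40.0, 55.0)

# Illustrative snapshot of one CRAC group (values are made up)
crac_group = [
    {"name": "CRAC-1", "temp_setpoint_f": 72.0, "rh_setpoint_pct": 45.0, "mode": "cooling"},
    {"name": "CRAC-2", "temp_setpoint_f": 72.0, "rh_setpoint_pct": 45.0, "mode": "humidification"},
    {"name": "CRAC-3", "temp_setpoint_f": 80.0, "rh_setpoint_pct": 45.0, "mode": "dehumidification"},
]

def check_group(units):
    warnings = []
    for u in units:
        lo, hi = RECOMMENDED_TEMP_F
        if not lo <= u["temp_setpoint_f"] <= hi:
            warnings.append(f'{u["name"]}: temperature set point outside {lo}-{hi} deg F')
        lo, hi = RECOMMENDED_RH_PCT
        if not lo <= u["rh_setpoint_pct"] <= hi:
            warnings.append(f'{u["name"]}: RH set point outside {lo}-{hi}% RH')
    # Competing modes within one group indicate potential demand fighting
    if {"humidification", "dehumidification"} <= {u["mode"] for u in units}:
        warnings.append("group contains units humidifying and dehumidifying at the same time")
    return warnings

for w in check_group(crac_group):
    print("WARNING:", w)
```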
Slide 29 Despite proper operational set points, a common cooling challenge occurs when cool supply air from the CRAC unit bypasses the IT equipment and flows directly into the CRAC unit's air return duct. This is known as short cycling, and it is a leading cause of poor cooling performance in a data center. Temperature measurement is one way to determine whether short cycling is occurring. Measurements should be taken at the CRAC supply duct, the CRAC return duct, and the server inlet. A return air temperature lower than the server inlet temperature indicates short-cycling inefficiencies. For example, if the CRAC supply AND return temperatures are both 70°F but the server inlet temperature measures 75°F, this is an indication of short cycling.
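A minimal sketch, not from the course, of the comparison just described; the temperatures are made-up sample readings.

```python
def short_cycling_suspected(supply_f: float, return_f: float, inlet_f: float) -> bool:
    """Flag probable short cycling: return air cooler than the server inlet air
    suggests supply air is looping back to the CRAC without passing through IT loads."""
    return return_f < inlet_f

# Sample readings matching the example above
supply_f, return_f, inlet_f = 70.0, 70.0, 75.0
if short_cycling_suspected(supply_f, return_f, inlet_f):
    print("Possible short cycling: CRAC return air is cooler than the server inlet air.")
```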
Slide 30 To summarize: Environmental factors such as dew point and relative humidity play an important role in data center cooling. Humidity measurement and control is vital for proper data center management. Managing the internal and external factors affecting humidity increases performance and decreases operational costs. Demand fighting occurs when data center air conditioners operate in competing modes, and it leads to increased waste and decreased efficiency. Maintaining proper operational thresholds ensures peak efficiency and maximizes system longevity. Short cycling, a common cause of overheating, occurs when cool supply air from the CRAC unit bypasses the IT equipment and flows directly back into the CRAC unit's air return duct.

Slide 31 Thank you for participating in this Data Center University course.

Slide 32 To test your knowledge of the course material, click the Knowledge Checkpoint link on your Data Center University personal homepage. Important Point! The Knowledge Checkpoint link is located under BROWSE CATALOG on the left side of the page.
Slide 33 Here at Data Center University, we value your opinion! We are dedicated to providing you with relevant, cutting-edge education on topics pertinent to data center design, build, and operations, when and where you need it. So, please take our brief survey and tell us how we're doing. How do you begin? It's easy! 1) Click on the Home icon, located in the right corner of your screen. 2) Click on the "We Value Your Opinion" link on the left side of the screen under Browse DCU Courses. 3) Select the course title you have just completed and take our brief survey.
Discuss the evolution of cabling; classify different types of common data center cables; describe cabling installation practices; identify the strategies for selecting cabling topologies; utilize cable management techniques; and recognize the challenges associated with cabling in the data center.
cable trays and cable management devices are critical to the support of IT infrastructure, as they help reduce the likelihood of downtime due to human error and overheating.

Slide 5 This course will address the basics of cabling infrastructure and will discuss cabling installation practices, cable management strategies and cable maintenance practices. We will take an in-depth look at both data cabling and power cabling. Let's begin with a look at the evolution of data center cabling.

Slide 6 The Ethernet protocol has been a data communications standard for many years. Along with Ethernet, several traditional data cabling practices continue to shape how data cables are deployed: high-speed data cabling over copper remains a cabling medium of choice; cable fed into patch panels and wall plates is common; and the RJ45 is the data cable connector of choice.
The functionality within the data cables and associated hardware, however, has undergone dramatic change. Increased data speeds have forced many physical changes: every time a new, faster standard is ratified by the standardization bodies, the cable and supporting hardware have been redesigned to support it, and new test tools and procedures follow each change in speed. These changes have primarily been required by newer, faster versions of Ethernet, which are driven by customers' needs for more speed and bandwidth. When discussing this, it is important to note the uses and differences of both fiber-optic cable and traditional copper cable. Let's compare the two.

Slide 7 Copper cabling has been used for decades in office buildings, data centers and other installations to provide connectivity. Copper is a reliable medium for transmitting information over shorter
distances, but its performance is only guaranteed up to 109.4 yards (100 meters) between devices. (This distance includes the structured cabling and the patch cords on either end.) Copper cabling that is used for data network connectivity contains four pairs of wires, which are twisted along the length of the cable. The twist is crucial to the correct operation of the cable; if the wires unravel, the cable becomes more susceptible to interference. Copper cables come in two configurations: solid cables provide better performance and are less susceptible to interference, making them the preferred choice for use in a server environment; stranded cables are more flexible and less expensive, and are typically used only in patch cord construction. Copper cabling, patch cords, and connectors are classified based upon their performance characteristics and the applications for which they are typically used. These ratings, called categories, are spelled out in the TIA/EIA-568 Commercial Building Telecommunications Wiring Standard.

Slide 8 Fiber-optic cable is another common medium for providing connectivity. Fiber cable consists of five elements. The center portion of the cable, known as the core, is a hair-thin strand of glass capable of carrying light. The core is surrounded by a thin layer of slightly purer glass, called cladding, that contains and refracts that light. The core and cladding glass are covered in a coating of plastic to protect them from dust or scratches. Strengthening fibers are then added to protect the core during installation. Finally, all of these materials are wrapped in plastic or another protective substance that serves as the cable's jacket. A light source, blinking billions of times per second, is used to transmit data along a fiber cable. Fiber-optic components work by turning electronic signals into light signals and vice versa. Light travels down the interior of the glass, refracting off of the cladding and continuing onward until it arrives at the other end of the cable, where it is seen by the receiving equipment.
When light passes from one transparent medium to another, such as from air to water, or in this case from the glass core to the cladding material, the light bends. A fiber cable's cladding consists of a different material from the core; in technical terms, it has a different refractive index, which bends the light back toward the core. This phenomenon, known as total internal reflection, keeps the light moving along a fiber-optic cable for great distances, even if the cable is curved. Without the cladding, light would leak out. Fiber cabling can handle connections over a much greater distance than copper cabling, 50 miles (80.5 kilometers) or more in some configurations. Because light is used to transmit the signal, the upper limit on how far a signal can travel along a fiber cable is related not only to the properties of the cable but also to the capabilities and relative location of the transmitters.

Slide 9 Besides distance, fiber cabling has several other advantages over copper: fiber provides faster connection speeds; fiber is not prone to electrical interference or vibration; fiber is thinner and lighter weight, so more cabling can fit into the same size bundle or into limited spaces; and signal loss over distance is lower along optical fiber than along copper wire.
Two varieties of fiber cable are available in the marketplace: multimode fiber and single-mode fiber. Multimode is commonly used to provide connectivity over moderate distances, such as those in most data center environments or among rooms within a single building. Single-mode fiber is used for the longest distances, such as among buildings on a large campus or between sites. Copper is generally the less expensive cabling solution over shorter distances (e.g., the length of data center server rows), while fiber is less expensive for longer distances (e.g., connections among buildings on a campus).
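To make the distance guidance concrete, here is a minimal sketch, not from the course, of a rule-of-thumb medium selector; the 100-meter copper limit comes from the text above, while the multimode/single-mode cutover distance is an illustrative assumption that in practice depends on speed and optics.

```python
def suggest_medium(link_length_m: float) -> str:
    """Rule-of-thumb cabling medium suggestion based only on link length.

    Real selections also weigh speed, cost, the existing plant, and transceiver types.
    """
    if link_length_m <= 100:      # copper channel limit cited in the course
        return "copper (Cat 5e/6a), within the 100 m channel limit"
    elif link_length_m <= 500:    # assumed cutover; depends on speed and optics
        return "multimode fiber (in-building or data center distances)"
    else:
        return "single-mode fiber (campus or inter-site distances)"

for length in (30, 250, 2000):
    print(f"{length:>5} m -> {suggest_medium(length)}")
```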
Slide 10 In the case of data center power cabling, however, historical changes have taken a different route. In traditional data centers, designers and engineers were not too concerned with single points of failure. Scheduled downtime was an accepted practice. Systems were periodically taken down to perform maintenance, and to make changes. Data center operators would also perform infrared scans on power cable connections prior to the shutdowns to determine problem areas. They would then locate the hot spots that could indicate possible risk of short circuits and address them. Traditional data centers, very often, had large transformers that would feed large uninterruptible power supplies (UPSs) and distribution switchboards. From there, the cables would go to distribution panels that would often be located on the columns or walls of the data center. Large UPSs, transformers, and distribution switchgear were all located in the back room. The incoming power was then stepped down to the correct voltage and distributed to the panels mounted in the columns. Cables connected to loads, like mainframe computers, would be directly hardwired to the hardware. In smaller server environments, the power cables would be routed to power strips underneath the raised floor. The individual pieces of equipment would then plug into those power strips, using sleeve and pin connectors, to keep the cords from coming apart. Slide 11 Downtime is not as accepted as it once was in the data center. In many instances, it is no longer possible to shut down equipment to perform maintenance. A fundamentally different philosophical approach is at work. Instead of the large transformers of yesterday, smaller ones, called power distribution units (PDUs) are now the norm. These PDUs have moved out of the back room, onto the raised floor, and in some cases, are integrated into the racks. These PDUs feed the critical equipment. This change
was the first step in a new way of thinking, a trend that involved moving away from the large transformer and switchboard panel. Modern data centers also have dual-cord environments. Dual cord helps to minimize single-point-of-failure scenarios. One of the benefits of the dual-cord method is that data center operators can perform maintenance work on source A while source B maintains the load; the server never has to be taken offline while upstream maintenance is being performed. This trend began approximately 10 years ago, and it was clearly driven by the user. It became crucial for data center managers to maintain operations 24 hours a day, 7 days per week. Some of the first businesses to require such operations were the banks, whose ATMs demanded constant uptime. The customer said, "We can no longer tolerate a shutdown." Now that we have painted a clear picture of the history of cabling infrastructure, we'll discuss the concept of modularity and its importance in the data center.

Slide 12 Modularity is an important concept in the contemporary data center. Modular, scalable Network Critical Physical Infrastructure (NCPI) components have been shown to be more efficient and more cost effective. The data cabling industry tackled the issue of modularity decades ago. Before the patch panel was designed, many adds, moves and changes were made by simply running new cable. After years of this "run a new cable" mentality, wiring closets and ceilings were loaded with unused data cables. Many wiring closets became cluttered and congested. The strain on ceilings and roofs from the weight of unused data cables became a potential hazard. The congestion of data cables under the raised floor also impeded proper cooling and greatly increased the potential for human error and downtime.

Slide 13
In the realm of data cabling, the introduction of the patch panel brought an end to the "run a new cable" philosophy and introduced modularity to network cabling. The patch panel, located either on the data center floor or in a wiring closet, is the demarcation point where the end points of bulk cable converge. If a data center manager were to trace a data cable from end to end, starting at the patch panel, he would probably find himself ending at the wall plate. This span is known as the backbone. The modularity of the system is in the use of patch cables. The user plugs a patch cable into a wall plate; if he needs to move a computer, for example, he simply unplugs his patch cable and connects into a different wall plate. The same is true on the other end, back at the patch panels: if a port on a hub or router malfunctions, the network administrator can simply unplug it and connect into another open port. Data center backbone cabling is typically designed to remain fixed rather than to scale. The data cabling backbone, 90% of the time, is located behind the walls, not out in the open. A network backbone, when installed, especially in new construction scenarios, typically accounts for future growth; adds, moves and changes can be very costly once the walls are constructed. In new construction it is best to wire as much of the building as possible with the latest cable standard. This reduces expenses once the walls are constructed. Now that we have discussed the concept of modularity, let's review the different types of data cables that exist in a data center.

Slide 14 So, what are the different types of common data center specific data cables? Category 5 (Cat 5) was originally designed for use with 100BASE-T. Cat 5e supports 1 Gigabit Ethernet. Cat 6a supports 10 Gigabit Ethernet. It is important to note that a higher-rated cable can be
used to support slower speeds, but the reverse is not true. For example, a Cat 5e installation will not support 10 Gigabit Ethernet, but Cat 6a cabling will support 100BASE-T. Cable assemblies can be defined as a length of bulk cable with a connector terminated onto both ends. Many of the assemblies used are patch cables of various lengths that match or exceed the cabling standard of the backbone; a Cat 5e backbone requires Cat 5e or better patch cables.

Slide 15 Data center equipment can require both standard and custom cables. Some cables are specific to the equipment manufacturer. One example of a common connection is a Cisco router with a 60-pin LFH connector connected to a router with a V.35 interface, which requires an LFH60-to-V.35 male DTE cable. An example of a less common connection would be a stand-alone tape backup unit with a SCSI interface; if the cable that came with the equipment does not match the SCSI card in a computer, the data center manager will find himself looking for a custom SCSI cable. A typical example of the diversity of cables required in the data center is a high-speed serial router cable. In a wide area network (WAN), routers are typically connected to modems, which are called DSU/CSUs. Some router manufacturers use unorthodox connectors on their routers, and depending on the interface that the router and DSU/CSU use to communicate with one another, several connector possibilities exist. Other devices used in a computer room can require any one of a myriad of cables; common devices besides the networking hardware are telco equipment, KVMs, mass storage, monitors, keyboards and mice, and terminal servers. Sometimes brand-name cables are expensive or unavailable, but a large market of manufacturer-equivalent cables exists from which the data center manager can choose.
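The backward-compatibility point above (a higher-rated cable supports slower speeds, but not vice versa) can be expressed as a small lookup; a minimal sketch follows, with the category ratings taken from the text and the ordering treated as an illustrative simplification.

```python
# Higher index = higher-rated cabling (simplified ordering from the course text)
CATEGORY_RANK = {"Cat 5": 0, "Cat 5e": 1, "Cat 6a": 2}

# Minimum category the text associates with each Ethernet speed
MIN_CATEGORY_FOR_SPEED = {
    "100BASE-T": "Cat 5",
    "1 Gigabit Ethernet": "Cat 5e",
    "10 Gigabit Ethernet": "Cat 6a",
}

def cabling_supports(installed_category: str, desired_speed: str) -> bool:
    """True if the installed cabling is rated at or above the minimum for the speed."""
    required = MIN_CATEGORY_FOR_SPEED[desired_speed]
    return CATEGORY_RANK[installed_category] >= CATEGORY_RANK[required]

print(cabling_supports("Cat 6a", "100BASE-T"))            # True  - higher rating, slower speed
print(cabling_supports("Cat 5e", "10 Gigabit Ethernet"))  # False - as noted in the text
```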
Slide 16 When discussing data center power cabling, it is important to note that American Wire Gauge (AWG) copper wire is the common medium for transporting power in the data center. This has been the case for many years and it still holds true in modern data centers. The formula for power is Amps x Volts = Power, and data center power cables are delineated by amperage. The more power that needs to be delivered to the load, the higher the amperage has to be. (Note: the voltage will not be high under the raised floor. It will be less than 480 V; most servers are designed to handle 120 or 208 V.) For a given power level, amperage and voltage are inversely related. As the amperage increases or decreases, the gauge of the wire needs to be larger or smaller to accommodate the change in amperage. AWG ratings organize copper wire into numerous recognizable and standard configurations. A relatively new trend in the domain of data center power cabling is the whip. Whips are pre-configured cables with a twist-lock cap on one end and insulated copper on the other end. The insulated copper end feeds a breaker in the main PDU; the twist-lock end feeds the rack-mounted PDU that supplies the intelligent power strips in the rack. Server equipment then plugs directly into the power strip. With whips, there is no need for wiring underneath the floor (with the possible exception of the feed to the main PDU breakers); thus, the expense of a raised floor can be avoided. Another benefit of whips is that a licensed electrician is not required to plug the whip's twist-lock connectors into the power strip's twist-lock receptacles.

Slide 17 Dual-cord, dual power supply configurations also introduced significant changes to the data center power cabling scheme. In traditional data centers, computers had one feed from one transformer or panel board, and the earliest PDUs still had only one feed to servers. Large mainframes required two feeds to
keep systems consistently available, and sometimes two different utilities were feeding power to the building. Now, many servers are configured to support two power feeds, hence the dual-cord power supply. Because data center managers can switch from one power source to another, maintenance can be performed on infrastructure equipment without having to take servers offline. It is important to understand that the power cabling required to support the dual-cord power supply configuration has doubled as a result. The same wire, the same copper, and the same sizes are required as in the past, but data center designers now need to account for double the power infrastructure cable, including power-related infrastructure that may be located in the equipment room that supports the data center. Now that we've covered a basic overview of both power and data cabling, let's take a look at some best practices for cabling in the data center.

Slide 18 Some best practices for data cabling include: Overhead deployments
Overhead cables that are in large bundles should run in cable trays or troughs. If the manufacturer of the tray or trough offers devices that keep the cable bend radius in check, they should be used as well. Do not over-tighten tie wraps or other hanging devices; over-tightening can interfere with the performance of the cable. Underfoot deployments
Be cognizant of the cable's bend radius specifications and adhere to them closely (a simple bend-radius check is sketched after this list). Do not over-tighten tie wraps; this can interfere with the performance of the cable.
Rack installations
As discussed previously, be cognizant of the cable's bend radius specifications and adhere to them closely. Don't over-tighten tie wraps; this can interfere with the performance of the cable. Use vertical and/or horizontal cable management to take up any extra slack. Testing cables
There are several manufacturers of test equipment designed specifically to test today's high-speed networks. Make sure that the installer tests and certifies every link; a data center manager can request a report that shows the test results. Are there any common practices that should be avoided? When designing and installing the network's backbone, care should be taken to route all Unshielded Twisted Pair (UTP, the U.S. standard) or Shielded Twisted Pair (STP, the European standard) cables away from possible sources of interference such as power lines, electric motors or overhead lighting.
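Here is a minimal sketch, not from the course, of checking a planned bend against a cable's minimum bend radius; the 4x-outer-diameter rule of thumb for copper patch cable is an assumption for illustration, and the actual minimum should always come from the manufacturer's specification.

```python
def min_bend_radius_mm(cable_outer_diameter_mm: float, multiplier: float = 4.0) -> float:
    """Minimum bend radius as a multiple of outer diameter.

    4x OD is a commonly quoted rule of thumb for copper patch cable; always
    defer to the manufacturer's published specification for the actual cable.
    """
    return multiplier * cable_outer_diameter_mm

def bend_is_acceptable(planned_radius_mm: float, cable_outer_diameter_mm: float) -> bool:
    return planned_radius_mm >= min_bend_radius_mm(cable_outer_diameter_mm)

# Example: a 6 mm OD patch cable routed around a 20 mm radius corner
print(bend_is_acceptable(planned_radius_mm=20.0, cable_outer_diameter_mm=6.0))  # False (needs >= 24 mm)
```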
Slide 19 Power cabling best practices are described in the National Electrical Code (NEC). When addressing best practices in power cabling, it is important that data center professionals understand the term continuous load. A continuous load is defined as any load left on for 3 hours or more, which in effect includes all equipment in a data center. Because of the requirements for continuous loads, data center operators must take the rules that apply to amperages and wire sizes and de-rate those figures by 20%. For example, if a wire is rated for 100 amps, the best practice is not to run more than 80 amps through it. Let's discuss this further. Over time, cables can become overheated. The de-rating approach helps avoid overheated wires that can lead to shorts and fires. If the quantity of copper in the cable is insufficient for the amperage required, it will heat to the point of melting the insulation. If the insulation fails, the copper is exposed to anything metal or grounded in its proximity; if it gets close enough, the electricity will jump, or arc,
and could cause a fire to start. Undersized power cables also stress the connections, and if any connection is loose, the excess load exacerbates the situation. The de-rating of power cables takes these facts into account. To illustrate, let's compare electricity to water: if too much water is pushed into a pipe that is too small, the force of the water will break the pipe. Likewise, amperage forces electricity through the wire, so an undersized wire will heat up. The manufacturer or supplier of the cable provides the information regarding the circular mils, or cross-sectional area, of the wires inside the cable. The circular mil measurement does not take into account the wire insulation; it determines how much amperage can safely pass through that piece of copper.
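A minimal sketch, not from the course, of the 80% continuous-load de-rating described above, together with the Amps x Volts = Power relation from Slide 16; the breaker rating and voltage are made-up example values.

```python
DERATE_FACTOR = 0.80  # run continuous loads at no more than 80% of the conductor/breaker rating

def max_continuous_amps(rated_amps: float) -> float:
    """Maximum recommended continuous current for a conductor or breaker rating."""
    return rated_amps * DERATE_FACTOR

def power_watts(volts: float, amps: float) -> float:
    """Power = Amps x Volts (simple single-phase approximation)."""
    return volts * amps

rated = 100.0   # amps, example rating
volts = 208.0   # one of the typical server voltages cited in the course
amps = max_continuous_amps(rated)
print(f"Continuous limit: {amps:.0f} A, about {power_watts(volts, amps) / 1000:.1f} kW at {volts:.0f} V")
```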
Next, let's compare overhead and under-the-floor installations.

Slide 20 The benefit of under-the-floor cabling is that the cable is not visible; many changes can be made and the wiring will not be seen. The disadvantage of under-the-floor cabling is the significant expense of constructing a raised floor. Data center designers also need to take into account the risk of opening up a raised floor and exposing other critical systems, such as the cooling airflow path, if the raised floor is used as a plenum. With overhead cabling, data center designers can use cable trays to guide the cables to the equipment. They can also run conduit from the PDU directly to the equipment or computer load; conduit is not flexible, however, which is a drawback if constant change is expected. A best practice is to use overhead cables that are pre-configured in the factory and placed in troughs running to the equipment. This standardization creates a more convenient, flexible environment for the data center of today.
Slide 21 Where your power source is, where the load is, and what the grid is like all affect the design and layout of the cabling in the data center. When planning overhead cabling, data center designers are tasked with figuring out the proper placement of cables ahead of time; they can then decide whether it is best to have the troughs directly over the equipment or over the aisle. Designers also have to take into account local codes for distributing power. For example, established rules require that sprinkler heads not be blocked: with a 24-inch (60.96 cm) cable tray, designers could not run that tray any closer than 10 inches (25.4 cm) below a sprinkler head, because it would cover up or obstruct the head. They would need to account for this upfront in the design stage. Now that we've touched upon best practices for installation, let's discuss some strategies for selecting cabling topologies.

Slide 22 Network topology deals with the different ways computers (and network-enabled peripherals) are arranged on or connected to a network. The most common network topologies are: Star, in which all computers are connected to a central hub. Ring, in which each computer is connected to two others, such that, starting at any one computer, the connection can be traced through each computer on the ring back to the first. Bus, in which all computers are connected to a central cable, normally termed the bus or backbone. Tree, in which a group of star networks are each connected to a linear backbone.
For data cabling in IEEE 802.3 UTP/STP Ethernet scenarios, a star network topology is used. Star topology means that all computers are connected to a central hub. In its simplest form, a UTP/STP Ethernet star topology has a hub at the center and devices (personal computers, printers, and so on) connected directly to it. Small LANs fit this simple model. Larger installations can be much more complicated, with segments connecting to other segments, but the basic star topology remains intact.
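As a simple illustration, not from the course, of how a star topology can be represented and checked, here is a minimal sketch using made-up device names.

```python
# A star topology: every device connects only to the central hub/switch
star = {
    "core-switch-1": ["server-a", "server-b", "printer-1", "nas-1"],
}

def is_star(topology: dict) -> bool:
    """True if there is exactly one hub and every other node attaches only to it."""
    if len(topology) != 1:
        return False
    hub, leaves = next(iter(topology.items()))
    return hub not in leaves and len(leaves) == len(set(leaves))

print(is_star(star))  # True
```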
Slide 23 Power cables can be laid out either overhead in troughs or below the raised floor. Many factors come into play when deciding on a power distribution layout from the PDUs to the racks: the size of the data center, the nature of the equipment being installed, and the budget are all variables. However, two approaches are commonly used for distribution of power cables in the data center.

Slide 24 One approach is to run the power cables inside conduits from large wall-mounted or floor-mounted PDUs to each cabinet location. This works moderately well for a small server environment with a limited number of conduits, but it does not work well for larger data centers where cabinet locations require multiple power receptacles.
Slide 25 Another approach, more manageable for larger server environments, is the installation of electrical substations at the end of each row in the form of circuit panels. Conduit is run from the power distribution units to the circuit panels and then to a subset of connections to the server cabinets. This configuration uses shorter electrical conduit, which makes it easier to manage, less expensive to install, and more resistant to a physical accident in the data center. For example, if a heavy object is dropped through a raised floor, the damage it can cause is greatly reduced in a room with segmented power, because fewer conduits overlap one another in a given area. Even more efficient is to deploy PDUs in the racks themselves and to have whips feed the various racks in the row.

Slide 26 What are the best practices for cable management and organization? Some end users purchase stranded bulk data cable and RJ45 connectors and manufacture their own patch cables on site. While doing this assures a clean installation with no excess wire, it is time consuming and
costly. Most companies find it more prudent to stock pre-made patch cables and use horizontal or vertical cable management to take up any excess cable. Patch cables are readily available in many standard lengths and colors. Are there any common practices that should be avoided? All of today's high-speed networks have minimum bend radius specifications for the bulk cable, and this is also true for the patch cables. Care should be taken not to bend patch cables more tightly than their minimum bend radius.

Slide 27 Proper labeling of power cables in the data center is a recommended best practice. A typical electrical panel labeling scheme is based on a split bus (two buses in the panel) where the labels represent an odd-numbered side and an even-numbered side: instead of normal sequential numbering, the breakers would be numbered 1, 3, 5 on the left-hand side and 2, 4, 6 on the right-hand side, for example. When labeling a power cable or whip, the PDU designation from the circuit breaker is the first identifier; this identifier indicates where the whip comes from. Identifying the source of a power cable can be complicated, because the PDU physically closest to the rack may not be the one feeding the whip. In addition, data center staff may want to access the B power source even though the A power source is physically closer. This is why power cables need to be properly labeled at each end. The cable label needs to indicate the source PDU (e.g., PDU1) and also identify the circuit (e.g., circuit B). Ideally, a label on the other end of the cable will indicate what load the cable is feeding (e.g., a SAN device, or Processor D23). To help clarify labeling, very large data centers are laid out in two-foot squares that match the raised floor, usually addressed with east/west and numbered designations. For example, 2 west by 30 east identifies the location of an exact square on the data center floor (which supports a particular piece or pieces of equipment). Therefore the label identifies the load that is being supported by the cable. Labeling both ends of the cable in an organized, consistent manner allows data center personnel to know the origin of the opposite end.
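A minimal sketch, not from the course, of composing labels following the convention just described; the PDU, circuit, load, and grid identifiers are hypothetical examples.

```python
def power_cable_labels(source_pdu: str, circuit: str, load: str, floor_grid: str) -> tuple[str, str]:
    """Return (source-end label, load-end label) following the scheme described above."""
    source_end = f"{source_pdu} / circuit {circuit} -> feeds {load} @ {floor_grid}"
    load_end = f"{load} @ {floor_grid} <- fed from {source_pdu} / circuit {circuit}"
    return source_end, load_end

# Hypothetical whip from PDU1, circuit B, feeding a SAN device at grid square 2 west x 30 east
src_label, load_label = power_cable_labels("PDU1", "B", "SAN device", "2W x 30E")
print(src_label)
print(load_label)
```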
label identifies the load that is being supported by the cable. Labeling both ends of the cable in an organized, consistent manner allows data center personnel to know the origin of the opposite end. Slide 28 With network data cabling, once the backbone is installed and tested it should be fairly stable. Infrequently, a cable may become exposed or damaged and therefore need to be repaired or replaced. But once in place, the backbone of a network should remain secure. Occasionally, patch cables can be jarred and damaged; this occurs most commonly on the user end. Since the backbone is fairly stable except for occasional repair, almost all changes are initiated simply by disconnecting a patch cable and reconnecting it somewhere else. The modularity of a well designed cabling system allows users to disconnect from one wall plate, connect to another, and be back up and running immediately. In the data center, adds, moves and changes should be as simple as connecting and disconnecting patch cables. So what are some of the challenges associated with cabling in the data center? We'll talk about three of the more common challenges. Slide 29 The first challenge is associated with useful life. The initial design and cabling choices can determine the useful life of a data cabling plant. One of the most important decisions to make when designing a network is choosing the medium: copper, fiber or both? Every few years newer, faster, better copper cables are introduced into the marketplace, but fiber seems to remain relatively unchanged. If an organization chose to install FDDI-grade 62.5/125 fiber 15 years ago, that organization may still be using the same cable today, whereas if it had installed Cat 5 it would more than likely have replaced it by now. In the early days few large installations were done in fiber because of the
cost. The fiber was more expensive and so was the hardware that it plugged into. Now the costs of fiber and copper are much closer. Fiber cabling is also starting to change. The current state of the art is laser-optimized 50/125 fiber for 10 Gigabit Ethernet. Next, there is airflow and cooling. There are a few issues with cables and cabling in the data center that affect airflow and cooling. Cables inside an enclosed cabinet need to be managed so that they allow for maximum airflow, which helps reduce heat. When cooling is provided through a raised floor, it is best to keep that space as cable-free as possible. For this reason, expect to see more and more cables being run across the tops of cabinets as opposed to at the bottom or underneath the raised floor. Finally, there is management and labeling. Many manufacturers offer labeling products for wall plates, patch panels and cables. Software packages also exist that help keep track of cable management. In a large installation, these tools can be invaluable. Let's take a look at some expenses associated with cabling in the data center.
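Before turning to costs, the labeling convention described in Slide 27 can be made concrete with a minimal sketch. The PDU name, circuit letter, breaker number, load name, and floor-grid coordinate below are hypothetical illustrations, not values from this course, and the compact grid code (for example, 2W-30E for "2 west by 30 east") is simply one possible way to write the square on a label.

def breaker_positions(count):
    # Split-bus panel numbering: odd breakers on the left side, even breakers on the right.
    odds = [n for n in range(1, count + 1) if n % 2 == 1]
    evens = [n for n in range(1, count + 1) if n % 2 == 0]
    return odds, evens

def whip_label(pdu, circuit, breaker, load, grid):
    # Build matching labels for the source end and the load end of a whip.
    source_end = f"{pdu}-{circuit}-{breaker:02d}"   # e.g. PDU1-B-07
    load_end = f"{load} @ {grid}"                   # e.g. SAN device @ 2W-30E
    return source_end, load_end

if __name__ == "__main__":
    left, right = breaker_positions(12)
    print("Left bus:", left)     # [1, 3, 5, 7, 9, 11]
    print("Right bus:", right)   # [2, 4, 6, 8, 10, 12]
    source_end, load_end = whip_label("PDU1", "B", 7, "SAN device", "2W-30E")
    print("Source-end label:", source_end, "| Load-end label:", load_end)

Labeling both ends this way mirrors the practice above: the source end names the PDU and circuit, and the load end names the equipment and the floor square that supports it.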
Slide 30 For data cabling, the initial installation of a cabling plant and the future replacement of that plant are the two greatest expenses. Beyond installation and replacement costs, the only other expense is adding patch cables as the network grows. The cost of patch cables is minimal considering the other costs in an IT budget. Cabling costs are, for the most part, up-front costs. Regarding power cables, the design of the data center and the location of the PDUs will have a significant impact on costs. Dual-cord power supplies are driving up the cost because double the power cabling is required. Design decisions are critical. Where will the loads be located? How far
from the power distribution? What if PDUs are fed from different circuits? If not planned properly, unnecessarily long power cable runs will be required and will drive up overall data center infrastructure costs. Next, let's look at cabling maintenance. Slide 31 How are cables replaced? Patch cables are replaced by simply unplugging both ends and connecting the new one. However, cables do not normally wear out. Most often, if a cable shorts, it is due to misuse or abuse. Cable assemblies have a lifetime far beyond the equipment to which they are connected.
How are cables rerouted? If the cable that needs to be rerouted is a patch cable, it can simply be unplugged on one or both ends and rerouted. If the cable that needs to be rerouted is one of the backbone cables run through the walls, ceilings, or cable troughs, it could be difficult to access. The backbone of a cabling installation should be worry-free, but if problems come up they can sometimes be difficult to address; it depends on what the issue is, where it is, and what the best solution is. Sometimes running a new link is the best solution. Slide 32 The equipment changes quite frequently in the data center; on average, a server changes every 2-3 years. It is important to note that power cabling only fails at the termination points. The maintenance occurs at the connections. Data center managers need to scan those connections and look for hot spots. It is also prudent to scan the large PDU and its connection off the bus for hot spots.
Heat indicates that there is either a loose connection or an overload. By doing the infrared scan, data center operators can detect an impending failure before it happens. In a dual-cord environment, it becomes easy to switch to the alternate source, unplug the connector, and check its connection. Slide 33 Every few feet, the power cable is labeled with a voltage rating, an amperage rating, and the number of conductors that can be found inside the cable. This information is stamped on the cable. American Wire Gauge (AWG) is a rating of the size of the copper conductor; the cable marking also identifies the number of conductors in the cable. Inside a whip there are a minimum of 3 wires: one hot, one neutral, and one ground. It is also possible to have 5 wires (3 hot, 1 neutral, 1 ground) inside the whip. Feeder cables, which feed the Uninterruptible Power Supply (UPS) and the PDU, are thicker, heavier cables. Single-conductor cables (insulated cables with multiple strands of uninsulated copper wires inside) are usually placed in groups within metal conduit to feed power-hungry data center infrastructure components such as large UPSs and Computer Room Air Conditioners (CRACs). Multiple-conductor cables (cables inside the larger insulated cable that are each separately insulated) are most often found on the load side of the PDU. Single conductors are most often placed within conduit, while multiple-conductor cables are generally distributed outside of the conduit. Whips are multiple-conductor cables. Slide 34 To summarize, let's review some of the information that we have covered throughout the course. A modular, scalable approach to data center cabling is more energy efficient and cost effective Copper and fiber data cables running over Ethernet networks are considered the standard for data centers American Wire Gauge copper cable is a common means of transporting power in the data center
Cabling to support dual-cord power supplies helps to minimize single points of failure in a data center. To minimize cabling costs, it is important for data center managers to take a proactive approach to the design, build, and operation of today's data center.
Fundamentals of Availability Data Center University Course Transcript Slide 1 Welcome to Data Center University's course on Fundamentals of Availability. Slide 2 For best viewing results, we recommend that you maximize your browser window now. The screen controls allow you to navigate through the eLearning experience. Using your browser controls may disrupt the normal play of the course. Click the attachments link to download supplemental information for this course. Click the Notes tab to read a transcript of the narration. Slide 3 At the end of this course, you will be able to: Understand the key terms associated with availability Understand the difference between availability and reliability Recognize threats to availability Calculate the cost of downtime
Slide 4 In our rapidly changing business world, highly available systems and processes are of critical importance and are the foundation upon which successful businesses rely. So much so that, according to the National Archives and Records Administration in Washington, D.C., 93% of businesses that have lost availability in their data center for 10 days or more have filed for bankruptcy within one year. The cost of one episode of downtime can cripple an organization. Take, for example, an e-business: in a case of downtime, not only could it lose thousands or even millions of dollars in revenue, but its top competitor is only a mouse-click away. The loss therefore translates not only into lost revenue but also into lost customer loyalty. The challenge of maintaining a highly available network is no longer just the responsibility of the IT department; rather, it extends to management and department heads, as well as the boards which govern company policy. For this reason, having a sound understanding of the factors that lead to high availability, threats to availability, and ways to measure availability is imperative regardless of your business sector.
Slide 5 Measuring business value begins first with an understanding of the Network Critical Physical Infrastructure, or NCPI. The NCPI is the foundation upon which Information Technology (IT) and telecommunication networks reside. NCPI consists of the racks, power, cooling, fire prevention/security, management, and services. Slide 6 Business value for an organization, in general terms, is based on three core objectives: Increasing revenue Reducing costs Better utilizing assets Regardless of the line of business, these three objectives ultimately lead to improved earnings and cash flow. Investments in NCPI are made because they both directly and indirectly impact these three business objectives. Managers purchase items such as generators, air conditioners, physical security systems, and Uninterruptible Power Supplies to serve as insurance policies. For any network or data center, there are risks of downtime from power and thermal problems, and investing in NCPI mitigates these and other risks. So how does this impact the three core business objectives above (revenue, cost, and assets)? Revenue streams are slowed or stopped, business costs and expenses are incurred, and assets are underutilized or unproductive when systems are down. Therefore, the more efficient the strategy is in reducing downtime from any cause, the more value it has to the business in meeting all three objectives. Slide 7 Historically, assessment of NCPI (Network Critical Physical Infrastructure) business value was based on two core criteria: availability and upfront cost. Increasing the availability (uptime) of the NCPI system, and ultimately of the business processes, allows a business to continue to bring in revenues and better optimize the use (or productivity) of assets. Imagine a credit card processing company whose systems are unavailable: credit card purchases cannot be processed, halting the revenue stream for the duration of the downtime. In addition, employees are not able to be productive without their systems online. And minimizing the upfront cost of the NCPI results in a greater return on that investment. If the NCPI cost is low and the risk and cost of downtime are high, the business case becomes easier to justify. While these arguments still hold true, today's rapidly changing IT environments are dictating an additional criterion for assessing NCPI business value: agility. Business plans must be agile to deal with changing market conditions, opportunities, and environmental factors. Investments that
lock up resources limit the ability to respond in a flexible manner. And when this flexibility or agility is not present, lost opportunity is the predictable result. Slide 8 A term that is commonly used when discussing availability is 5 Nines. Although often used, this term is frequently misleading and misunderstood. 5 9s refers to a network that is accessible 99.999% of the time. We'll explain why it is misleading a little later on in the course. Slide 9 There are many additional terms associated with availability, business continuity and disaster recovery. Before we go any further, let's define some of these terms. Reliability is the ability of a system or component to perform its required functions under stated conditions for a specified period of time. Availability, on the other hand, is the degree to which a system or component is operational and accessible when required for use. It can be viewed as the likelihood that the system or component is in a state to perform its required function under given conditions at a given instant in time. Availability is determined by a system's reliability, as well as its recovery time when a failure does occur. When systems have long continuous operating times, failures are inevitable. Availability is often looked at because, when a failure does occur, the critical variable becomes how quickly the system can be recovered. In the data center, having a reliable system design is the most critical variable, but when a failure occurs, the most important consideration must be getting the IT equipment and business processes up and running as fast as possible to keep downtime to a minimum. Slide 10 Upon considering any availability or reliability value, one should always ask for a definition of failure. Moving forward without a clear definition of failure is like advertising the fuel efficiency of an automobile as miles per tank without defining the capacity of the tank in liters or gallons. To address this ambiguity, one should start with one of the two basic definitions of a failure given by the IEC (International Electrotechnical Commission): The termination of the ability of the product as a whole to perform its required function. The termination of the ability of any individual component to perform its required function, but not the termination of the ability of the product as a whole to perform.
Slide 11 MTBF, Mean Time Between Failures, is a basic measure of a system's reliability. It is typically represented in units of hours. The higher the MTBF number, the higher the reliability of the product. MTTR, Mean Time to Recover (or Repair), is the expected time to recover a system from a failure. This may include the time it takes to diagnose the problem, the time it takes to get a repair technician onsite, and the time it takes to physically repair the system. Like MTBF, MTTR is represented in units of hours. MTTR impacts availability, not reliability. The longer the MTTR, the worse off a system is. Simply put, if it takes longer to recover a system from a failure, the system is going to have a lower availability. As the MTBF goes up, availability goes up. As the MTTR goes up, availability goes down. Slide 12 As mentioned before, 5 9s is a misleading term because its use has become diluted. 5 9s has been used to refer to the amount of time that the data center is powered up and available. In other words, a data center that has achieved 5 9s is powered up 99.999% of the time. However, loss of power is only one part of the equation. The other part of the availability equation is reliability. Let's take for example two data centers that are both considered 99.999% available. In one year, Data Center A lost power once, but the outage lasted a full 5 minutes. Data Center B lost power 10 times, but for only 30 seconds each time. Both data centers were without power for a total of 5 minutes. The missing detail is the recovery time. Anytime systems lose power, there is a recovery time in which servers must be rebooted, data must be recovered, and corrupted systems must be repaired. The recovery process could take minutes, hours, days, or even weeks. Now, if you consider again the two data centers that have experienced downtime, you will see that Data Center B, which had 10 instances of power outages, will actually have a much longer total duration of downtime than the data center that had only one occurrence. Data Center B will spend significantly more total time recovering. It is because of this dynamic that reliability is equally important to this discussion of availability. Reliability of a data center speaks to the frequency of failures in a given time frame; there is an inversely proportional relationship in that as the operating interval increases, reliability decreases. Availability, however, is only the percentage of a given duration for which the system is up. Slide 13 It should be obvious that there are numerous factors that affect data center availability and reliability. Some of these include AC power conditions, lack of adequate cooling in the data center, equipment failure, natural and artificial disasters, and human errors.
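To make the MTBF and MTTR relationship concrete, here is a minimal sketch of the standard steady-state availability formula, Availability = MTBF / (MTBF + MTTR), along with the downtime per year implied by a given availability. The MTBF and MTTR figures used are illustrative assumptions, not values from this course.

HOURS_PER_YEAR = 8760

def availability(mtbf_hours, mttr_hours):
    # Steady-state availability: MTBF / (MTBF + MTTR).
    return mtbf_hours / (mtbf_hours + mttr_hours)

def downtime_minutes_per_year(avail):
    # Expected downtime per year for a given availability fraction.
    return (1 - avail) * HOURS_PER_YEAR * 60

if __name__ == "__main__":
    # Illustrative example: a 50,000-hour MTBF and a 4-hour MTTR.
    a = availability(50_000, 4)
    print(f"Availability: {a:.5%}")
    print(f"Downtime: {downtime_minutes_per_year(a):.1f} minutes per year")
    # Five nines (99.999%) works out to roughly 5.3 minutes of downtime per year.
    print(f"99.999% -> {downtime_minutes_per_year(0.99999):.2f} minutes per year")

The sketch also shows why a 5 9s figure alone says little about reliability: ten 30-second outages and one 5-minute outage yield the same availability percentage, but very different recovery burdens.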
Slide 14 Let's look first at the AC power conditions. Power quality anomalies are organized into seven categories based on wave shape: 1. Transients 2. Interruptions 3. Sag / Undervoltage 4. Swell / Overvoltage 5. Waveform distortion 6. Voltage fluctuations 7. Frequency variations Slide 15 Another factor that poses a significant threat to availability is a lack of cooling in the IT environment. Whenever electrical power is being consumed, heat is being generated. In the data center environment, where a massive quantity of heat is generated, the potential exists for significant downtime unless this heat is removed from the space. Slide 16 Oftentimes, cooling systems may be in place in the data center; however, if the cooling is not distributed properly, hot spots can occur. Slide 17 Hot spots within the data center further threaten availability. In addition, inadequate cooling significantly detracts from the lifespan and availability of IT equipment. It is recommended that when designing the data center layout, a hot aisle/cold aisle configuration be used. Hot spots can also be alleviated by the use of properly sized cooling systems, and supplemental spot coolers and air distribution units.
Slide 18 The health of IT equipment is an important factor in ensuring a highly available system, as equipment failures pose a significant threat to availability. Failures can occur for a variety of reasons, including damage caused by prolonged exposure to poor utility power. Other causes include prolonged exposure to elevated or depressed temperatures or humidity, component failure, and equipment age. Slide 19
Disasters also pose a significant threat to availability. Hurricanes, tornadoes, floods, and the often subsequent blackouts that occur after these disasters all create tremendous opportunity for downtime. In many of these cases, downtime is prolonged due to damage sustained by the power grid or the physical site of the data center itself.
Slide 20 According to Gartner Group, the largest single cause of downtime is human error or personnel issues. One of the most common causes of intermittent downtime in the data center is poor training. Data center staff or contractors should be trained on procedures for application failures/hangs, system updates/upgrades, and other tasks that can create problems if not done correctly. Slide 21 Another problem is poor documentation. As staff sizes have shrunk, and with all the changes in the data center due to rapid product cycles, it's harder and harder to keep the documentation current. Patches can go awry as incorrect software versions are updated. Hardware fixes can fail if the wrong parts are used. Slide 22 Another area of potential downtime is management of systems. System management has fragmented from a single point of control to vendors, partners, ASPs, outsource suppliers, and even a number of internal groups. With a variety of vendors, contractors and technicians freely accessing the IT equipment, errors are inevitable. Slide 23 It is important to understand the cost of downtime to a business, and specifically, how that cost changes as a function of outage duration. Lost revenue is often the most visible and easily identified cost of downtime, but it is only the tip of the iceberg when discussing the real costs to the organization. In many cases, the cost of downtime per hour remains constant. In other words, a business that loses at a rate of 100 dollars per hour in the first minute of downtime will also lose at the same rate of 100 dollars per hour after an hour of downtime. An example of a company that might experience this type of profile is a retail store, where a constant revenue stream is present. When the systems are down, there is a relatively constant rate of loss. Slide 24 Some businesses, however, may lose the most money after the first 500 milliseconds of downtime and then lose very little thereafter. For example, a semiconductor fabrication plant loses the most money in the first moments of an outage because when the process is interrupted, the silicon wafers that were in production can no longer be used and must be scrapped.
Slide 25 Still others may lose at a lower rate for a short outage (since revenue is not lost but simply delayed), and as the duration lengthens, there is an increased likelihood that the revenue will not be recovered. Regarding customer satisfaction, a short outage may often be acceptable, but as the duration increases, more customers will become increasingly upset. An example of this might be a car dealership, where customers are willing to delay a transaction for a day. With significant outages, however, public knowledge often results in damaged brand perception and inquiries into company operations. All of these activities result in a downtime cost that begins to accelerate quickly as the duration becomes longer. Slide 26 Costs associated with downtime can be classified as direct and indirect. Direct costs are easily identified and measured in terms of hard dollars. Examples include: 1) Wages and costs of employees that are idled due to the unavailability of the network. Although some employees will be idle, their salaries and wages continue to be paid. Other employees may still do some work, but their output will likely be diminished. 2) Lost revenues, the most obvious cost of downtime, because if you cannot process customers, you cannot conduct business. Electronic commerce magnifies the problem, as eCommerce sales are entirely dependent on system availability. 3) Wages and cost increases due to induced overtime or time spent checking and fixing systems. The employees that were idled by the system failure are probably the same employees that will go back to work and recover the system via data entry. They not only have to do their day job of processing current data, but they must also re-enter any data that was lost due to the system crash, or enter new data that was handwritten during the system outage. This means additional hours of work, most often on an overtime basis. 4) Depending on the nature of the affected systems, the legal costs associated with downtime can be significant. For example, if downtime problems result in a significant drop in share price, shareholders may initiate a class-action suit if they believe that management and the board were negligent in protecting vital assets. In another example, if two companies form a business partnership in which one company's ability to conduct business is dependent on the availability of the other company's systems, then, depending on the legal structure of the partnership, the first company may be liable to the second for profits lost during any significant downtime event. Indirect costs are not easily measured, but impact the business just the same. In 2000, Gartner Group estimated that 80% of all companies calculating downtime were including indirect costs in their calculations for the first time.
Examples include: reduced customer satisfaction; lost opportunity of customers that may have gone to direct competitors during the downtime event; damaged brand perception; and negative public relations. Slide 27 A business's downtime costs are directly related to its industry sector. For example, energy and telecommunications organizations may experience lost revenues on the order of 2 to 3 million dollars an hour. Manufacturing, financial institutions, information technology, insurance, retail and pharmaceuticals all stand to lose over 1 million dollars an hour. Slide 28 There are many ways to calculate the cost of downtime for an organization. For example, one way to estimate the revenue lost due to a downtime event is to look at normal hourly sales and then multiply that figure by the number of hours of downtime. Remember, however, that this is only one component of a larger equation and, by itself, seriously underestimates the true loss. Another example is loss of productivity. The most common way to calculate the cost of lost productivity is to first take an average of the hourly salary, benefits and overhead costs for the affected group. Then, multiply that figure by the number of hours of downtime. Because companies are in business to earn profits, the value employees contribute is usually greater than the cost of employing them. Therefore, this method provides only a very conservative estimate of the labor cost of downtime. Slide 29 To stay competitive in today's global marketplace, businesses must strive to achieve high levels of availability and reliability. While 99.999% availability is the ideal operating condition for most businesses, power outages, inadequate cooling, natural and artificial disasters, and human errors pose significant barriers to high availability. The direct and indirect costs of downtime in many business sectors can be exorbitant, and are often enough to bankrupt an organization. Therefore it is critical for businesses today to calculate their level of availability in order to reduce risks and increase overall reliability and availability. Slide 30 Thank you for participating in this Data Center University course.
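The two Slide 28 estimates, lost revenue and lost productivity, reduce to simple multiplications. The sketch below works them through with hypothetical figures chosen only for illustration; multiplying the per-person labor cost by a headcount is an added assumption about how the "affected group" average is applied.

def lost_revenue(hourly_sales, downtime_hours):
    # Lost revenue = normal hourly sales x hours of downtime.
    return hourly_sales * downtime_hours

def lost_productivity(hourly_salary, hourly_benefits, hourly_overhead,
                      headcount, downtime_hours):
    # Lost productivity = (salary + benefits + overhead) per hour x people x hours.
    hourly_cost = hourly_salary + hourly_benefits + hourly_overhead
    return hourly_cost * headcount * downtime_hours

if __name__ == "__main__":
    # Hypothetical example: a 4-hour outage affecting a 25-person group.
    revenue = lost_revenue(hourly_sales=12_000, downtime_hours=4)
    labor = lost_productivity(hourly_salary=40, hourly_benefits=12,
                              hourly_overhead=8, headcount=25, downtime_hours=4)
    print(f"Estimated lost revenue:      ${revenue:,.0f}")    # $48,000
    print(f"Estimated lost productivity: ${labor:,.0f}")      # $6,000
    print(f"Conservative total:          ${revenue + labor:,.0f}")

As the course notes, both figures are deliberately conservative: they ignore indirect costs such as lost customer loyalty and damaged brand perception.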
Slide 3 At the completion of this course, you will be able to: Explain the importance of fire protection for data centers Identify the main goals of a data center fire protection system Explain the basic theory of fire suppression Differentiate the classes of fire and the stages of combustion Recognize the different methods of fire detection, fire communication and fire suppression Identify the different types of fire suppression agents and devices appropriate for data centers Slide 4 Throughout history, fire has systematically wreaked havoc on industry. Today's data centers and network rooms are under enormous pressure to maintain seamless operations. Some companies risk losing millions of dollars with one data center catastrophe.
Slide 5 In fact, industry studies tell us that 43% of businesses that closed due to fire never reopen, and 29% of those that do reopen fail within 3 years. With these statistics in mind, it is imperative that all businesses prepare themselves for unforeseen disasters. The good news is that the most effective method of fire protection is fire prevention. At the completion of this course you will be one step closer to understanding the industry safeguarding methods that are used to protect a data center's hottest commodity: information. Slide 6 This course will discuss the prevention, theory, detection, communication and suppression of fire specific to data centers. Slide 7 Let us start by discussing the National Fire Protection Association, or NFPA. The NFPA is a worldwide organization that was established in 1896 to protect the public against the dangers of fire and electricity. The NFPA's mission is to reduce the worldwide burden of fire and other hazards on the quality of life by developing and advocating scientifically based consensus codes and standards, research, training, and education. The NFPA is responsible for creating fire protection standards, one of them being NFPA 75, the standard for protection of computer or data processing equipment. One notable addition to NFPA 75 that took place in 1999 allows data centers to continue to power electronic equipment upon activation of a Gaseous Agent Total Flooding System, which we will discuss later in detail. This exception was made for data centers that meet the following risk considerations: economic loss that could result from loss of function or loss of records, loss of equipment value, loss of life,
and the risk of fire threat to the installation, to occupants, or to exposed property within that installation. It's important to note that NFPA continually updates its standards to accommodate the ever-changing data center environment. Please note that NFPA does set the worldwide standards for fire protection, but in most cases the Authority Having Jurisdiction (AHJ) has final say in what can or cannot be used for fire protection in a facility. Now that we have identified the standards and guidelines of fire protection for a data center, let's get started with some facts about fire protection. Slide 8 Fire prevention provides more protection than any type of fire detection device or fire suppression equipment available. In general, if the data center is incapable of breeding fire, there will be no threat of fire damage to the facility. To promote prevention within a data center environment it is important to eliminate as many fire-causing factors as possible. A few examples to help achieve this are: When building a new data center, ensure that it is built far from any other buildings that may pose a fire threat to the data center Enforce a strict no-smoking policy in IT and control rooms The data center should be devoid of any trash receptacles All office furniture in the data center must be constructed of metal (chairs may have seat cushions) The use of acoustical materials such as foam or fabric or any material used to absorb sound is not recommended in a data center Even if a data center is considered fireproof, it is important to safeguard against downtime in the event that a fire does occur. Fire protection now becomes the priority. Slide 9 The main goal of a data center fire protection system is to contain a fire without threatening the lives of personnel and to minimize downtime. With this in mind, if a fire were to break out there are
three system objectives that must be met. The first objective is to detect the presence of a fire. The second objective is to communicate the threat to both the authorities and occupants. Finally, the last objective is to suppress the fire and limit any damage. Being familiar with common technologies associated with fire detection, communication, and suppression allows IT managers to better specify a fire protection strategy for their data center. Prior to the selection of a detection, communication or suppression system, a design engineer must assess the potential hazards and issues associated with the given data center. Slide 10 When discussing fire protection it's important that we first understand the basic theory behind fire. This section will provide a tutorial on fire. We will cover the following topics: the fire triangle, the classes of fire, and the stages of combustion. Slide 11 The fire triangle represents the three elements that must interact in order for fire to exist. These elements are heat, oxygen and fuel. Fuel is defined as a material used to produce heat or power by burning. When considering fire in a data center, fuel is anything that has the capability to catch fire, such as servers, cables, or flooring. As you can see, when one of these factors is taken away, the fire can no longer exist. This is the basic theory behind fire suppression. Slide 12 Fire can be categorized into five classes: Class A, B, C, D, and K. As you can see from the Classes of Fire chart, Class A represents fires involving ordinary combustible materials such as paper, wood, cloth and some plastics. Class B fires involve flammable liquids and gases such as oil, paint, lacquer, petroleum and gasoline. Class C fires involve live electrical equipment. Class C fires are usually Class A or Class B fires that have electricity present. Class D fires involve combustible metals or combustible metal alloys such as magnesium, sodium and potassium. The last class is
Class K. These fires involve cooking appliances that use cooking agents such as vegetable or animal oils and fats. Generally, Class A, B and C fires are the most common classes of fire that one may encounter in a data center. This chart represents the different classes of fire that can be extinguished successfully with a basic fire extinguisher. Later in the course, we will discuss several types of extinguishing agents used in data centers. Slide 13 The next step in categorizing a fire is to determine what stage of combustion it is in. The four stages of combustion are: the incipient stage or pre-combustion stage, the visible smoke stage, the flaming fire stage, and lastly, the intense heat stage. As these stages progress, the risk of property damage and the risk to life increase drastically. All of these categories play an important role in fire protection, specifically for data centers. By studying the classes of fire and the stages of combustion it is easy to determine what type of fire protection system will best suit the needs of a data center. Slide 14 Now that we have completed our tutorial on fire, let us look at some fire detection devices. There are three main types of fire detection devices: smoke detectors, heat detectors and flame detectors. For the purposes of protecting a data center, smoke detectors are the most effective. Heat detectors and flame detectors are not recommended for use in data centers, as they do not provide detection in the incipient stage of a fire and therefore do not provide early warning for the protection of high-value assets. Smoke detectors are far more effective forms of protection in data
centers simply because they are able to detect a fire at the incipient stage. For this reason we will be focusing on the attributes and impact of smoke detectors.
Slide 15 The two types of smoke detectors that are used effectively in data centers are: intelligent spot-type detectors and air sampling smoke detectors. Slide 16 Intelligent spot-type smoke detectors are much more sensitive than a conventional smoke detector. They utilize a laser beam which scans particles that pass through the detector. The laser beam is able to distinguish whether the particles are simply dust or actually a by-product of combustion such as smoke. Furthermore, intelligent spot-type smoke detectors are individually addressable. This means that each detector has the ability to send information to a central control station and pinpoint the exact location of the alarm. Another feature of intelligent spot-type smoke detectors is that the sensitivity of the detector can be increased or decreased during certain times of the day. For example, when workers leave an area, the sensitivity can be increased. Intelligent spot-type smoke detectors can also compensate for a changing environment due to factors such as humidity or dirt accumulation. Slide 17 Intelligent spot-type detectors are most commonly placed in the following areas: below raised floors, on ceilings, above drop-down ceilings, and in air-handling ducts to detect possible fires within an HVAC system. By placing detectors near the exhaust intake of the computer room air conditioners, detection can be accelerated.
Slide 18 Air sampling smoke detection systems, sometimes referred to as Very Early Smoke Detection (VESD) systems, are usually described as high-powered photoelectric detectors. These systems consist of a network of pipes attached to a single detector, which continually draws air in and samples it. The pipes are typically made of PVC but can also be CPVC, EMT or copper. Depending on the space being protected and the configuration of multiple sensors, these systems can cover an area of 2,500 to 80,000 square feet (232 to 7,432 square meters). This system also utilizes a laser beam, much more powerful than the one contained in a common photoelectric detector, to detect by-products of combustion. As the particles pass through the detector, the laser beam is able to distinguish them as dust or by-products of combustion. Slide 19 Now that we have talked about fire detection devices and systems, let's take a look at the next objective of fire protection: communication. All of the previously mentioned detection devices would be virtually useless if they were not directly tied into an effective signaling and notification system. Signaling devices provide audible alarms, such as horns, bells or sirens, or visual alarms, such as strobes, which warn building occupants once the system has been activated. Signaling devices are also an effective way of communicating danger to individuals who may be visually or hearing impaired. One of the most basic and common signaling devices is the pull station. These images represent a typical pull station. Slide 20 The next communication system we will be covering is the control system. Control systems are often considered the brains of a fire protection system. The computer programs used by control systems allow users to program and manage the system based on their individual requirements. The system can be programmed with certain features such as time delays, thresholds, and passwords. Once the detector, pull station or sensor activates the control system, the system has
the ability to set its preprogrammed list of rules into motion. Most importantly, the control system can provide valuable information to the authorities. Slide 21 One important safety feature to mention that is not directly related to communication or suppression systems is the Emergency Power Off (EPO). If a fire progresses to the point where all other means of suppression have been exhausted, the authorities that arrive on site will have the option to utilize this feature. The EPO is intended to power down equipment or an entire installation in an emergency to protect personnel and equipment. EPO is typically used either by firefighting personnel or by equipment operators. When used by firefighters, it ensures that equipment is de-energized during firefighting so that firefighters are not subjected to shock hazards. The secondary purpose is to facilitate firefighting by eliminating electricity as a source of energy feeding combustion. EPO may also be activated in case of a flood, electrocution, or other emergency. There is a high cost associated with abruptly shutting down a data center, and unfortunately EPO tripping is often the result of human error. Much debate has ensued over the use of EPO, and it may one day lead to the elimination of EPO in data centers. Slide 22 The last goal of a data center fire protection system is suppression. The next section will review suppression agents and devices that are often used in data centers or IT environments. Let's start with the most common suppression agents and devices: fire extinguishers and total flooding fire extinguishing systems. Slide 23 Fire extinguishers are one of the oldest yet most reliable forms of fire suppression. They are extremely valuable in data centers because they are a quick solution to suppressing a fire. Fire
extinguishers allow for a potentially hazardous situation to be addressed before more drastic or costly measures need to be taken. It is important to note that only specific types of gaseous agents can be used in data center fire extinguishers. HFC-236fa is a gaseous agent, specific to fire extinguishers, that has been approved for use in data centers. It is environmentally safe and can be discharged in occupied areas. Additionally, it exists as a gas and therefore leaves no residue upon discharge. Simply put, it extinguishes fires by removing heat and chemically preventing combustion. Slide 24 A more sophisticated form of fire extinguishing is the Total Flooding Fire Extinguishing System, sometimes referred to as a clean agent fire suppression system. Total Flooding Fire Extinguishing Systems consist of a series of cylinders or high-pressure tanks filled with an extinguishing or gaseous agent. A gaseous agent is a gaseous chemical compound that extinguishes the fire by removing heat or oxygen or both. Given a closed, well-sealed room, gaseous agents are very effective at extinguishing a fire while leaving no residue. When installing such a system, the total volume of the room and how much equipment is being protected are taken into consideration. The number of tanks or cylinders to be installed depends upon these factors. It is important to note that the standard that guides Total Flooding Suppression Systems is NFPA 2001. The next slide features a live demonstration of a Total Flooding Fire Extinguishing System in action. Slide 25 If a fire occurs and the system is activated, the gaseous agent discharges and fills the room in about 10 seconds. One of the best features of this system is that it is able to infiltrate hard-to-reach places, such as equipment cabinets. This makes Total Flooding Fire Extinguishing Systems well suited to data centers. Now that we have discussed the Total Flooding Fire Extinguishing System, let's start reviewing the agents that such systems deploy.
Slide 26 In the past, some agents were conductive and/or corrosive. Conductive and corrosive agents have negative impacts on IT equipment. For example, conductive agents may cause short circuits between electronic components within IT equipment, and corrosive agents may eat away at electronic components within IT equipment. The gaseous agents used in today's data centers are non-conductive and non-corrosive. An effective agent that is both non-conductive and non-corrosive and was widely used in data centers is Halon. Unfortunately, it was discovered that Halon is detrimental to the ozone layer and, as of 1994, the production of Halon is no longer permitted. This has led to the development of safer and cleaner gaseous agents. Let's review some of the more popular gaseous agents for data centers. Slide 27 Today, some of the most commonly used gaseous agents in data centers are inert gases and fluorine-based compounds. Let's review the characteristics of each agent. Slide 28 The most widely accepted inert gases for fire suppression in data centers are Pro-Inert (IG-55) and Inergen (IG-541). Inert gases are composed of nitrogen, argon, and carbon dioxide, all of which are found naturally in the atmosphere. Because of this, they have zero Ozone Depletion Potential and pose no threat to humans or to the environment. Inert gases can also be discharged in occupied areas and are non-conductive. Inergen requires a large number of storage tanks for effective discharge, but because it is stored as a high-pressure gas, it can be stored up to 300 feet (91.44 meters) away from the discharge nozzles and still discharge effectively. Inergen is used successfully in telecommunication offices and data centers.
Slide 29 Another suppression alternative for data centers is fluorine-based compounds. The fluorine-based compound HFC-227ea is known under two commercial brands: FM-200 and FE-227. HFC-227ea has a zero ozone depletion potential (ODP) and an acceptably low global warming potential. It is also odorless and colorless. Slide 30 HFC-227ea is stored as a liquefied compressed gas with a boiling point of 2.5 degrees F (-16.4 degrees C). It is discharged as an electrically non-conductive gas that leaves no residue and will not harm occupants; however, as in any other fire situation, all occupants should evacuate the area as soon as an alarm sounds. It can be used with ceiling heights up to 16 feet (about 4.9 meters). HFC-227ea has one of the lowest storage space requirements; the floor space required is only about 1.7 times that of a Halon 1301 system. HFC-227ea chemically inhibits the combustion reaction by removing heat and can be discharged in 10 seconds or less. An advantage of this agent is that it can be retrofitted into an existing Halon 1301 system, but the pipe network must be replaced or an additional cylinder of nitrogen must be used to push the agent through the original Halon pipe network. Some applications include data centers, switchgear rooms, automotive facilities, and battery rooms. Slide 31 There is also HFC-125, which is known under two commercial brands: ECARO-25 and FE-25. HFC-125 has a zero ozone depletion potential (ODP) and an acceptably low global warming potential. It is odorless and colorless and is stored as a liquefied compressed gas. This agent chemically inhibits the combustion reaction by removing heat and can be discharged in 10 seconds or less as an electrically non-conductive gas that leaves no residue and will not harm occupants. It can be used in occupied areas; however, as in any other fire situation, all occupants should be evacuated as soon as an alarm sounds. It can be used with ceiling heights up to 16 feet (about 4.9 meters). One of the main advantages of HFC-125 is that it flows more like Halon than any other
agent available today, and it can be used in the same pipe distribution network as an original Halon system. Slide 32 Other methods of fire suppression often found in data centers are water sprinkler systems and water mist suppression systems. Of the two options, water sprinklers are often present in many facilities due to national and/or local fire codes. Let's review a few of the key elements of water sprinkler and water mist suppression systems. Slide 33 Water sprinkler systems are designed specifically to protect the structure of a building. The system is activated when the given environment reaches a designated temperature and the valve fuse opens. A valve fuse is a solder link or glass bulb that opens when it reaches a temperature of 165-175 degrees F (74-79 degrees C). Slide 34 There are currently three configurations of water sprinkler systems available: wet-pipe, dry-pipe, and pre-action. Wet-pipe systems are the most commonly used and are usually found in insulated buildings. Dry-pipe systems are charged with compressed air or nitrogen to prevent damage from freezing. Pre-action systems prevent accidental water discharge by requiring a combination of sensors to activate before allowing water to fill the sprinkler pipes. Because of this feature, pre-action systems are highly recommended for data center environments. Lastly, it is important to note that water sprinklers are not typically recommended for data centers, but depending on local fire codes they may be required. Slide 35
The last suppression system we will be discussing is the water mist suppression system. When the system is activated it discharges a very fine mist of water onto a fire. The mist of water extinguishes the fire by absorbing heat. In doing so, vapor is produced, creating a barrier between the flame and the oxygen needed to sustain the fire. Remember the "fire triangle"? The mist system effectively takes away two of the main components of fire: heat and oxygen. This makes the system highly effective. Additionally, because a fine mist is used, less water is needed; therefore, the water mist system needs minimal storage space. Water mist systems are gaining popularity due to their effectiveness in industrial environments. Because of this, we may see an increase in the utilization of such systems in data centers. Slide 36 In summary: The three system objectives of a data center fire protection system are to detect the presence of a fire, to communicate the threat to the authorities and occupants, and to suppress the fire and limit any damage. The two types of smoke detectors that are used effectively in data centers are intelligent spot-type smoke detectors and air sampling smoke detectors. Signaling devices provide audible alarms, such as horns, bells or sirens, and visual alarms, such as strobes. Slide 37 The most common fire suppression devices used in data centers are fire extinguishers
and Total Flooding Fire Extinguishing Systems. The most commonly used gaseous agents in data centers are inert gases and fluorine-based compounds. Additional methods of fire suppression are water sprinkler systems and water mist suppression systems. Slide 38 Thank you for participating in this Data Center University TM course.
Slide 5 Two types of standards for racks and enclosures are: the 19-inch standard and earthquake standards. Slide 6 The Electronic Industries Association (EIA) established the EIA-310 standard to ensure physical compatibility between racks, enclosures, and rack-mounted equipment. The intent of the standard is to ensure compatibility and flexibility within the data center. EIA-310 is used worldwide for 19-inch rack-mounted equipment. Slide 7 EIA-310 defines the Rack Unit (U) to be the usable vertical space for a piece of rack-mounted equipment. The U is equal to 1.75 inches. If a rack is described as 10U, it means that there is a physical interior vertical space of 17.5 inches available for equipment mounting. Slide 8 There are several types of vertical mounting rails for standard equipment. These include square holes for cage (captive) nuts and clip nuts, or round holes, with or without threads. The 19-inch standard defines important dimensions for racks, enclosures, and rack-mounted equipment. For example, EIA-310 defines the minimum enclosure opening between rails to be 450 mm (17.72 inches), to provide clearance for equipment chassis widths. The width between the centers of the equipment mounting holes is 465 ± 1.6 mm (18.31 ± 0.063 inches).
The minimum enclosure width to provide clearance for equipment front panels/bezels/faceplates is 483.4 mm (19 inches). Slide 9 Most enclosures now use square holes and cage nuts, although some customers require threaded holes or non-threaded through holes. The more common square holes with cage nuts support several thread sizes and types. If a cage nut's threads get damaged, the repair is as easy as replacing the cage nut. Because the cage nut floats in its mount, the nut has some freedom to move, which makes nut and bolt alignment easier. The trend for open frame racks is to have threaded holes. There are many thread sizes, but #12-24 is the most common. The main advantage of threaded holes placed directly into the rack is that deployment is fast, since there are no cage nuts to install. Slide 10 The Uniform Building Code (UBC) and Eurocode specify how enclosures should be bolted to the floor in geographies where there is a high risk of earthquakes. The Network Equipment Building System (NEBS) and European Telecommunications Standards Institute (ETSI) standards have more stringent requirements than the UBC and Eurocode, and specify floor anchoring and reinforced frame structures for enclosures. Slide 11 The open frame rack comes in two basic types: two-post and four-post. Slide 12 The two-post frame, also known as a relay rack, holds equipment that can be front- or center-mounted. It is typically used for lightweight applications in IT environments. Although the two-post frame has a relatively low price, it offers no security, no airflow control, low weight capacity and low stability.
Depending upon the manufacturer, common rack accessories may include shelving, vertical cable organizers, brackets for power distribution, and baying kits which permit several racks to be joined together. Slide 13 The four-post frame allows equipment to be supported from the front and back, making it a more versatile option than the two-post frame. It is typically used for server, networking, and telecom applications in IT environments. The obvious advantage of the four-post frame is that it is physically stronger than the two-post frame and can support heavier equipment. Depending upon the manufacturer, common rack accessories may include light- and heavy-duty shelves, vertical cable organizers, brackets for power distribution, and baying kits. Slide 14 Open frames have the advantage of allowing easy access to equipment, and they can be easily assembled by the owner. They are also a low-cost, economical solution. Significant disadvantages of open frames are: they do not provide physical security or protection, the equipment is exposed, and they do not allow for optimized airflow in densely packed or high-heat-producing configurations.
The Open Frame rack typically relies on natural convection to dissipate heat from equipment. As the density of rack mounted equipment increases, natural convection has a limited ability to remove the heat that needs to be dissipated. Enclosures, discussed in the next section of this course, provide an improved means to control and manage airflow. Slide 15
Enclosures are advanced rack containment systems. As these illustrations show, there are several varieties of basic enclosure designs. However, most enclosures include front and rear doors, side panels, and a roof. Within an enclosure, channels are created for forced air to move through rack-mounted equipment. These channels provide enhanced air cooling capability over open racks. Slide 16 Depending upon the manufacturer, enclosures may also have cable management options, power distribution units, power protection devices, cooling devices, environmental management systems, and other accessories. Slide 17 Compared to open frame racks, enclosures offer improved static load capacity, cooling, security, and multi-vendor compatibility for rack-mounted equipment. Next, we will discuss some common enclosure types. Slide 18 This slide shows an example of a server enclosure. There are different enclosure sizes for different applications. Server applications most commonly use enclosures 42U high x 600 mm wide x 1070 mm deep. Server enclosures have been getting deeper to support the higher densities of power and cabling. Some applications have high cable density, combine network switches with server equipment, or use side-to-side cooling instead of front-to-back cooling. Those applications will require enclosures that are wider than 600 mm.
Some rooms that have high ceilings may permit enclosures to be as tall as 47U, and some 47U applications may also require wide enclosures. When using tall enclosures, be mindful of safety regulations and of clearance from overhead fire suppression sprinklers.
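To make these U counts concrete, here is a minimal sketch that converts rack units into physical height. It assumes the EIA-310 rack unit of 1.75 inches (44.45 mm); the frame and roof allowance used for overall height is an illustrative assumption, not a figure from this course.

```python
# A minimal sketch (not from this course) of how rack-unit counts
# translate into physical height, assuming the EIA-310 unit of
# 1.75 in (44.45 mm). The frame/roof allowance is an illustrative
# assumption.

RACK_UNIT_MM = 44.45          # 1U = 1.75 inches per EIA-310
FRAME_ALLOWANCE_MM = 125.0    # assumed frame, roof, and leveling-foot allowance

def enclosure_height_mm(u_count: int) -> float:
    """Approximate overall enclosure height for a given U count."""
    return u_count * RACK_UNIT_MM + FRAME_ALLOWANCE_MM

for u in (42, 45, 47):
    mounting = u * RACK_UNIT_MM
    overall = enclosure_height_mm(u)
    print(f"{u}U: ~{mounting:.0f} mm of mounting space, "
          f"~{overall / 1000:.2f} m overall")

# 42U works out to roughly 1.87 m of mounting space (about 2 m overall),
# while 47U approaches 2.2 m -- the reason ceiling height and sprinkler
# clearance must be checked before specifying tall enclosures.
```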
Slide 19 This slide shows an example of a network enclosure. High-density cabling or networking applications typically require enclosures 42U high x 750 mm wide. Slide 20 As shown in this illustration of the rear of the networking enclosure, networking applications require wider racks than server applications, to give room for cabling. A fully loaded networking enclosure can require up to 2000 Category 5 or Category 6 network cables. Slide 21 Here is an example of a seismic enclosure. Seismic enclosures are specially reinforced to protect equipment from earthquakes. To ensure equipment and personnel safety, seismic enclosure installations should conform to regional standards, such as NEBS or ETSI for Zone 4. Most commercial data centers and telecom central offices that are not in high-risk zones utilize less stringent standards like the UBC or Eurocode, rather than the stricter NEBS or ETSI standards. Slide 22 Here is an example of a wall mount enclosure. Wall mount enclosures are useful when only a couple of pieces of rack equipment need to be enclosed. One of the key features of the wall mount enclosure is its double-hinged frame construction, which allows easy access to the rear of the rack-mounted equipment. Wall mount enclosures conserve floor space and provide a neat, clean installation for wiring closets.
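As a rough check on the cable counts mentioned for networking enclosures in Slide 20, the sketch below estimates the cross-sectional pathway area that 2000 Category 6 cables occupy. The cable diameter and fill ratio are assumed typical values, not figures from this course.

```python
import math

# A back-of-the-envelope estimate (not from this course) of the vertical
# cable pathway area needed by a fully loaded networking enclosure.
# Cable diameter and fill ratio are assumed typical values.

CABLE_COUNT = 2000          # Cat5/Cat6 cables, per the course
CABLE_DIAMETER_MM = 6.0     # assumed typical Cat6 outside diameter
FILL_RATIO = 0.4            # assumed usable fraction of a cable pathway

cable_area_m2 = CABLE_COUNT * math.pi * (CABLE_DIAMETER_MM / 2) ** 2 / 1e6
pathway_area_m2 = cable_area_m2 / FILL_RATIO

print(f"Cable cross-section alone: ~{cable_area_m2:.3f} m^2")
print(f"Pathway needed at {FILL_RATIO:.0%} fill: ~{pathway_area_m2:.2f} m^2")

# Roughly 0.06 m^2 of cable grows to ~0.14 m^2 of required pathway,
# which suggests why the extra width of a 750 mm networking enclosure
# is devoted largely to vertical cable management channels.
```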
Slide 23 APC recently conducted a worldwide survey of CIOs, Facility Managers, and IT managers. Based on the survey findings, five areas have been identified for optimization with regard to rack system selection:
1. Lifecycle Costs
2. Availability
3. Maintenance and Serviceability
4. Adaptability and Scalability (Flexibility)
5. Manageability
Slide 24 The survey found that optimizing lifecycle costs was the most important requirement for most organizations. The most common problems that pose a challenge to the optimization of lifecycle costs with regard to rack systems are:
Non-standardized racks. Non-standardized racks lead to a higher total cost of ownership, due to the unique design features dictated by the IT equipment manufacturers. These non-standard design features result in difficulty with moves and the integration of multi-vendor equipment. A much better solution is to purchase vendor-neutral racks with guaranteed universal compatibility. Vendor-neutral racks allow for greater flexibility when purchasing and mounting equipment, and more standard processes for mounting and servicing equipment.
Slow speed of deployment. The time and work involved in the assembly of non-standard racks, or in migrations and refreshes, are costly in both downtime and labor. Pre-engineered solutions save time and simplify planning and installation.
Slide 25
The survey revealed that optimizing availability was also an important requirement. The most common problems that pose a challenge to optimizing availability are:
1. Inadequate airflow to IT equipment, which damages hardware. This problem has increased over the last few years with the dramatic increase in heat densities. It is important to note that there is no standard for measuring cooling effectiveness when comparing enclosures.
2. Inadequate power redundancy to the rack. The solution is to bring dual power paths to single- or dual-corded IT equipment.
3. Lack of physical security. Because of the increased demands to provide ample air, power, and data to racks, the number of individuals accessing enclosures for service tasks has increased, leaving the units more vulnerable to human error. Enclosures need to be physically secured with locking doors and locking side panels to prevent unauthorized or accidental access.
4. Non-compliance with seismic requirements. The solution is for all racks located in Zone 4 regions to comply with seismic building standards.
The following slides offer solutions for improving airflow as a means of increasing availability. Slide 26 Good door ventilation for front-to-back airflow is critical to effective cooling. This slide shows examples of perforated front and rear doors that provide maximum ventilation. Slide 27 Blanking panels are covers that are placed over empty rack spaces. Keeping blanking panels snugly in place prevents heated exhaust air from recirculating into IT equipment intakes. The main reason blanking panels are not commonly used is that their benefits are not always understood. People often fail to realize the cooling benefits they provide, and mistakenly think that they are for aesthetic purposes only or that they are difficult to install. Slide 28
Blanking panels that snap into any square-holed rack enclosure and install without tools significantly reduce the time and labor cost associated with installing panels. In addition, by standardizing on a panel size of 1U, racks can be populated easily, rather than dividing empty spaces into various-sized panels of 1, 2, 4, and 8U. Slide 29 This slide shows an Air Distribution Unit (ADU) installed in a rack system. An ADU is a cooling device for raised-floor applications that mounts in the bottom 2U of any EIA-310 19-inch rack that has an open base. The blue lines represent cooling airflow. The ADU connects into the raised floor and pulls supply air directly into the enclosure. This prevents the conditioned air from mixing with warmer room air before reaching the equipment. The ADU minimizes temperature differences between the top and bottom of the enclosure. It also prevents hot exhaust air from recirculating to the inlet of the enclosure. This is a detailed view of an ADU. An ADU is recommended only as a problem-solver for heat densities of up to 3.5 kW per rack, and it is good for overcoming low ventilation pressure under raised floors. Slide 30 This slide shows a side ADU installed above a rack-mounted device with side-to-side airflow. The blue lines represent cooling airflow, and the red lines represent warm airflow. The side ADU pulls air in from the cold aisle, and redirects and distributes it to the equipment inlet, located on the right side. Slide 31 This slide shows the airflow for an Air Removal Unit (ARU). The ARU is a scalable cooling solution because it can be added to an existing rack enclosure and requires no internal rack space or raised-floor connections to install. It replaces the rear door of an enclosure. This example shows a unit with a redundant fan for improved availability.
Cool air enters the rack, exhausts out the rear of the rack equipment, is pulled through the Air Removal Unit, and is released through the top. The high-powered fans in the Air Removal Unit overcome the air resistance of cables in the rear of the rack and prevent exhaust air recirculation. An optional ducted exhaust system delivers hot air to the space above a drop ceiling or some other type of enclosed overhead space, and eliminates the possibility of hot air mixing with room air.
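To put rack heat-removal figures such as the 3.5 kW ADU guideline in context, the sketch below applies the basic sensible-heat relationship, P = ρ · Q · cp · ΔT, to estimate the airflow a rack requires. The air properties and the assumed inlet-to-exhaust temperature rise are typical values, not figures from this course.

```python
# A minimal sketch (not from this course) of the airflow needed to carry
# away a rack's waste heat, using the sensible-heat relationship
# P = rho * Q * cp * dT. Air properties and the assumed temperature rise
# are typical values.

RHO = 1.2          # air density, kg/m^3 (near sea level, ~20 C)
CP = 1005.0        # specific heat of air, J/(kg*K)
DELTA_T = 12.0     # assumed inlet-to-exhaust temperature rise, K
M3S_TO_CFM = 2118.88

def required_airflow_m3s(heat_kw: float) -> float:
    """Volumetric airflow (m^3/s) needed to remove heat_kw of waste heat."""
    return heat_kw * 1000.0 / (RHO * CP * DELTA_T)

for kw in (3.5, 10.0, 20.0):
    q = required_airflow_m3s(kw)
    print(f"{kw:>5.1f} kW rack -> {q:.2f} m^3/s "
          f"({q * 3600:.0f} m^3/h, ~{q * M3S_TO_CFM:.0f} CFM)")

# At the 3.5 kW ADU guideline this is roughly 0.24 m^3/s (~510 CFM);
# a 20 kW rack needs several times that, which is why fan-assisted
# devices such as the ARU become attractive as densities climb.
```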
Slide 32 Maintenance and serviceability improve with practical experience. The most common problems that pose a challenge to maintenance and serviceability are:
1. Server migration delays. Limited space and deployment speed typically cause server migration delays. Enclosures that offer split doors save aisle space and make equipment access easier. Quick-release doors and side panels also save time.
2. Poor cable management, which leads to IT equipment damage because of airflow obstruction. Wires that form rat's nests make it difficult to identify individual power and data cables. Abandoned cables get intertwined with active cables and block airflow under raised floors. Storing power and data cables at the rear of the racks makes them easier to access. Routing data and power cables above the racks makes them more organized and accessible, and eliminates potential air dams under raised floors.
3. Non-standardized racks, which are a maintenance issue because server manufacturers often state that the warranty is void if a server is placed in a rack that does not comply with specific rack standards. Apart from aesthetics, this non-standard approach introduces complexity due to the unique characteristics of each rack.
Rack vendors should guarantee compatibility with all servers. The racks should meet or exceed a server manufacturer's ventilation and spatial requirements, and comply with the Electronic Industries Alliance (EIA) 310-D standard for rack mounting IT and networking equipment. Slide 33 The survey found that adaptability and scalability also need to be optimized. Some of the problems that hinder optimization are:
Frequently changing power and cooling requirements. Racks may have to support different power requirements, multiple supply voltages, or several outlet types. Rack systems therefore need to provide tool-less Power Distribution Units (PDUs) and three-phase power whips to support changeover capability for different voltages, power capacities, and outlets.
Changing room layouts, which cause migration and mobility problems. Rack enclosures should provide field-reversible doors, quick-release hinge pins, quick-release side panels, and casters for mobility. Racks should also adapt to new overhead power and data cabling systems.
Slide 34 Some of the problems that pose a challenge to optimizing manageability are:
Lack of environmental monitoring capability at the rack level. A lack of environmental monitoring leads to difficulty identifying thermal gradients from the top to the bottom of the rack. It also causes difficulty detecting hazards such as smoke and humidity extremes. Any large thermal gradient could lead to equipment damage or shutdown. The solution is to provide environmental management devices, and a graphical user interface that allows remote monitoring, along with automatic email, pager, or telephone notification of changes in the rack-level environment.
Lack of power monitoring capability at the rack level. Monitoring racks is critical to availability, and branch circuit monitoring is crucial. The solution is to provide mechanisms that can automatically report and manage power conditions on power strips at the rack level, locally through a digital display or remotely.
Lack of critical management of IT equipment. The study attributes this lack to the growing popularity of server clusters. According to Dell's senior manager of product marketing for clustering, clustering is increasingly used in mission-critical environments. IT personnel want a solution to centrally manage all equipment from one location.
Lack of security at the rack level. A solution is to provide rack locks, as well as display screens and automatic notification, to report and manage rack-level security breaches.
Slide 35 Lastly, physical considerations for rack layout are very important when designing a data center. Racks should be arranged to form alternating hot and cold aisles. When choosing a rack, it is important to select dimensions that work well with layout calculations. This illustration shows an optimal design with cold aisles that are four feet wide and hot aisles that are three feet wide. Slide 36 This course has covered Rack Standards, Rack Types and Rack Enclosures, Best Practices for Rack System Selection, and Physical Considerations for Rack Layout. Major points to remember include:
How racks are selected and configured has a profound and lasting impact on a data center's availability, agility, and total cost of ownership
Enclosures enhance rack system cooling by preventing hot and cold air from mixing
Enclosures should be universal, modular, organized, and scalable
Racks should be arranged to form alternating hot and cold aisles
Slide 37 Thank you for participating in this Data Center University course.
2012 Schneider Electric. All rights reserved. All trademarks provided are the property of their respective owners.