Many data centers include some level of redundancy to ensure fault tolerance and to allow maintenance without shutting down. Redundancy means that subsystems have extra components designated as backups. [[Data Centre Redundancy]]

To achieve redundancy, data centers need a basic approach to modularity. In traditional data center designs, predicting the performance of redundant systems over time can be complicated, which is why building to full capacity from the start is common. A modular data center design should clearly specify how to add IT capacity modules while maintaining redundancy: as the data center scales, the same level of redundancy should be preserved. Ideally, different areas of the data center would have adjustable redundancy levels to match varying needs and control costs.

There are multiple ways to implement redundancy, such as N+1, 2N, or System-plus-System, but these terms don't capture the full range of options. For example, in an N+1 UPS system, redundancy could be built into the UPS itself, achieved by paralleling UPS units, or implemented with a tri-redundant or "catcher" design using static transfer switches. Each of these leads to a different architecture and a different degree of modularity.

Effective modular architectures balance redundancy goals with module size. Smaller modules reduce costs but increase complexity, because more modules are needed.

> Design concept: the modular architecture of a data center is strongly influenced by redundancy requirements, and it is impractical for a single architecture to be effective in both low-cost and high-reliability data center applications.

---

## Fault Partitioning

A key aspect of availability architecture is **fault partitioning**: isolating devices within a subsystem. In a modular data center, devices such as chillers can either be paralleled on a single bus or assigned individually to different rooms or pods.
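The redundancy schemes discussed above (N+1, 2N) can be compared with a simple availability model. The sketch below is illustrative only: it assumes independent unit failures and a made-up per-unit availability, neither of which comes from the source.

```python
def avail_n_plus_1(n: int, p: float) -> float:
    """Availability of an N+1 system: n+1 identical units on a
    common bus, and the system is up if at least n units are up.
    p is the availability of a single unit (assumed independent)."""
    # Either all n+1 units are up, or exactly one of them is down.
    return p ** (n + 1) + (n + 1) * p ** n * (1 - p)

def avail_2n(n: int, p: float) -> float:
    """Availability of a 2N (System-plus-System) design: two
    independent paths of n units each; the load is up if at least
    one complete path is up."""
    path_up = p ** n
    return 1 - (1 - path_up) ** 2

# Hypothetical comparison: n = 4 required units, 99% unit availability.
print(avail_n_plus_1(4, 0.99))
print(avail_2n(4, 0.99))
```

For a single required unit (n = 1) the two schemes coincide (two units, either one sufficient); as n grows the trade-off between the shared spare and the duplicated path diverges, which is why the choice of scheme shapes the architecture.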
Paralleling devices on a single bus allows N+1 redundancy: one extra device can back up any failed unit. However, paralleling requires the bus to be designed and analyzed for all possible configurations, which adds complexity, especially for systems like chiller piping or UPS wiring. This complexity undermines the benefits of modularity.

Assigning each device to a specific pod or room eliminates this complexity, because the infrastructure is simpler and defined in advance, and adding new data center capacity doesn't disrupt existing power and cooling systems. However, this approach requires a separate redundant unit for each pod, which can be costly. To address this, modern modular devices with built-in N+1 redundancy are available, often at a lower cost than traditional parallel bussing.

![[Pasted image 20241013172333.png]]

> Design concept: different approaches to paralleling of power busses within a modular architecture are a key differentiating attribute between alternative designs. Systems with independent busses (least paralleling) are the most scalable and flexible, and the easiest to maintain and upgrade without risk of downtime. However, achieving redundancy cost-effectively this way typically requires devices such as UPS and chiller plants with device-level redundancy (an internal N+1 architecture within the device).
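The cost trade-off between a shared paralleled bus and per-pod assignment can be made concrete by counting units. This is a hedged sketch: the pod counts and function names are hypothetical, and it ignores the bus engineering cost that the text identifies as the real drawback of paralleling.

```python
def units_shared_bus(pods: int, units_per_pod: int) -> int:
    """Shared-bus N+1: all pods draw from one paralleled bus,
    so a single spare backs up a failure anywhere."""
    return pods * units_per_pod + 1

def units_per_pod_redundancy(pods: int, units_per_pod: int) -> int:
    """Per-pod assignment: each pod carries its own dedicated
    spare, with no paralleled bus between pods."""
    return pods * (units_per_pod + 1)

# Hypothetical example: 4 pods, each needing 2 chillers.
print(units_shared_bus(4, 2))           # 9 units total
print(units_per_pod_redundancy(4, 2))   # 12 units total
```

The per-pod approach buys its simplicity and fault isolation with extra units, which is why devices with internal N+1 redundancy (the spare built into the device itself) are attractive: they recover the per-pod layout without paying for a whole dedicated external spare per pod.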