A company had a hundred traditional physical Windows servers in production that they wanted to migrate into a new vSphere environment. They intended to duplicate each production server to create a hundred test virtual machines (VMs), each configured identically to its production counterpart except that the test VMs would run on an isolated network.
An assessment of the existing environment provided the following capacity and peak usage data.
- The total capacity of all the CPUs in the currently utilized environment was 480 GHz.
- The total capacity of all the RAM in the currently utilized environment was 400 GB.
- The peak measured CPU utilization was 160 GHz.
- The peak measured RAM utilization was 180 GB.
Although each test VM was configured with the same CPU and RAM capacity as its production counterpart, the company did not expect the test VMs to carry the same workload. In fact, they estimated that the typical test VM workload would be about 25% of the production workload.
Based on their goal of providing plenty of resources for peak demand, expected growth, overhead, and spare capacity, the company calculated that they should implement a total of 224 GHz of CPU and 270 GB of RAM for production use. Since they expected the new test environment to require 25% of the resources required by production, they decided to add 56 GHz and 68 GB for test (25% of 224 GHz, and 25% of 270 GB rounded up), resulting in totals of 280 GHz and 338 GB.
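The sizing arithmetic can be reproduced in a few lines. This sketch uses only the figures stated in the text; the headroom ratios are simply derived from them, and rounding 67.5 GB up to 68 GB is an assumption about how the company rounded.

```python
import math

PEAK_CPU_GHZ, PEAK_RAM_GB = 160, 180   # measured peaks from the assessment
PROD_CPU_GHZ, PROD_RAM_GB = 224, 270   # production targets chosen by the company
TEST_FRACTION = 0.25                   # test load estimated at 25% of production

# Headroom implied by the targets (growth, overhead, and spare capacity combined).
cpu_headroom = PROD_CPU_GHZ / PEAK_CPU_GHZ   # 1.4
ram_headroom = PROD_RAM_GB / PEAK_RAM_GB     # 1.5

# Test additions, rounded up to whole units (assumption: 67.5 GB -> 68 GB).
test_cpu = math.ceil(PROD_CPU_GHZ * TEST_FRACTION)
test_ram = math.ceil(PROD_RAM_GB * TEST_FRACTION)

print(test_cpu, test_ram)                                # 56 68
print(PROD_CPU_GHZ + test_cpu, PROD_RAM_GB + test_ram)   # 280 338
```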
To ensure that the Test VMs and Production VMs would run well together on the same hardware, the administrator planned to use resource pools. During design sessions, several options for configuring the pools were considered. Each option involved creating two resource pools, one named Production and one named Test, at the top level of a DRS cluster. Each option was configured and tested in a lab environment, using load-generation software in sets of VMs to create heavy CPU and memory demand and contention. This allowed the administrator to compare the results of each configuration and make the best design choice.
Option 1 –
- Production Pool: CPU Shares = High (8,000 Shares)
- Production Pool: RAM Shares = High (327,680 Shares)
- Test Pool: CPU Shares = Low (2,000 Shares)
- Test Pool: RAM Shares = Low (81,920 Shares)
The tests showed that, during periods of contention, this configuration ensured that the Production VMs collectively received at least 80% of the available CPU and RAM, because the High-to-Low share ratio is 4:1. When the Test VMs were also busy, they collectively received 25% of the amount used by the Production VMs. However, during periods of heavy Production demand and idle Test workload, the Production VMs were able to access resources beyond what was originally planned. In other words, nothing prevented the Production VMs from using CPU and RAM that had been planned and added for the Test VMs. Likewise, when the Production workload was idle, the Test VMs were able to access resources originally planned for the Production VMs.
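The 80/20 split follows directly from the share values: 8,000 / (8,000 + 2,000) = 0.8. The sketch below is a simplified two-pool model, not VMware's actual scheduler; the `split_by_shares` function and the demand figures (in GHz) are hypothetical. It applies the share ratio to the planned 280 GHz and reproduces the behavior described above: under contention the pools split 224 GHz and 56 GHz, but an idle pool's unused entitlement flows to its busy sibling.

```python
def split_by_shares(capacity, demand_a, shares_a, demand_b, shares_b):
    """Divide capacity between two sibling pools by shares (simplified model)."""
    # Under full contention, each pool's entitlement is proportional to its shares.
    ent_a = capacity * shares_a / (shares_a + shares_b)
    ent_b = capacity - ent_a
    # A pool that wants less than its entitlement releases the rest to its sibling.
    if demand_a < ent_a:
        return demand_a, min(demand_b, capacity - demand_a)
    if demand_b < ent_b:
        return min(demand_a, capacity - demand_b), demand_b
    # Both pools want at least their entitlement: shares decide the split.
    return ent_a, ent_b


HIGH, LOW = 8000, 2000  # CPU share values from Option 1
print(split_by_shares(280, 400, HIGH, 400, LOW))  # (224.0, 56.0)  both busy
print(split_by_shares(280, 400, HIGH, 10, LOW))   # (270, 10)      Test idle
print(split_by_shares(280, 20, HIGH, 200, LOW))   # (20, 200)      Production idle
```

The last two calls show why Option 1 imposes no ceiling on either pool: Production can take 270 GHz and Test can take 200 GHz, both well beyond their planned 224 GHz and 56 GHz.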
Option 2 –
- Production Pool: CPU Reservation = 224 GHz
- Production Pool: RAM Reservation = 270 GB
- Test Pool: CPU Limit = 56 GHz
- Test Pool: RAM Limit = 68 GB
The tests showed that this configuration ensured that the Production Pool always received at least 224 GHz of CPU and 270 GB of RAM whenever its VMs attempted to use that much or more. The Production Pool was able to access additional CPU and RAM during periods of low demand from the Test VMs; however, the Test Pool was never able to use more than 56 GHz of CPU and 68 GB of RAM, even during periods of low demand from Production. In other words, this configuration guaranteed adequate CPU and memory for the entire set of Production VMs and established a hard cap for the entire set of Test VMs. Although this configuration could be useful, an argument can be made that the Test Limits are somewhat redundant: with 280 GHz and 338 GB in total, reserving 224 GHz and 270 GB for Production already leaves only about 56 GHz and 68 GB for Test whenever Production is busy. The Limits therefore matter only when the Production workload is low, and then they work against the Test VMs, which cannot exceed their Limits even though unused CPU and RAM are available. As a result, the Test VMs could run slowly while capacity sits idle.
Option 3 –
- Production CPU Reservation = 160 GHz
- Production RAM Reservation = 180 GB
- Production CPU and RAM Shares = High
- Test CPU Reservation = 40 GHz
- Test RAM Reservation = 45 GB
- Test CPU and RAM Shares = Low
The Production pool’s CPU and RAM Reservations were set to match the measured peak CPU and RAM utilization (160 GHz and 180 GB), without including growth, overhead, or spare capacity. The CPU and RAM Reservations for the Test pool were set to 25% of the corresponding Reservations for the Production pool. Additionally, neither pool was given a Limit, so each pool could access resources beyond its Reservation, and during periods of contention the pools competed for the available resources based on their Shares.
The tests showed that this configuration ensured that each pool was able to access at least its reserved resources during periods of heavy workload. Additionally, each pool was able to access more as needed (for example, to accommodate growth and overhead). Whenever the workload grew to the point of producing resource contention, the Production Pool was allocated 80% of the remaining (unreserved) resources, and the Test Pool was allocated 20%.
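The Option 3 results can be reproduced with a small model. This sketch follows the description above (each pool first receives its Reservation, then whatever remains is split by Shares); it is not VMware's actual entitlement algorithm, the `allocate` function and the demand figures are hypothetical, and only the 4:1 High-to-Low share ratio matters here.

```python
import math

UNLIMITED = math.inf


def allocate(capacity, pools):
    """Split capacity among pools (hypothetical, simplified model).

    pools maps a name to a dict with reservation, limit, shares and demand.
    """
    # A pool never receives more than it demands or more than its limit.
    ceiling = {n: min(p["demand"], p["limit"]) for n, p in pools.items()}
    # Step 1: honor each reservation, up to the pool's ceiling.
    alloc = {n: min(p["reservation"], ceiling[n]) for n, p in pools.items()}
    remaining = capacity - sum(alloc.values())
    hungry = {n for n in pools if alloc[n] < ceiling[n]}
    # Step 2: split what remains in proportion to shares; a pool that reaches
    # its ceiling drops out and its unused portion is re-split next round.
    while remaining > 1e-9 and hungry:
        total_shares = sum(pools[n]["shares"] for n in hungry)
        granted = 0.0
        for n in list(hungry):
            take = min(remaining * pools[n]["shares"] / total_shares,
                       ceiling[n] - alloc[n])
            alloc[n] += take
            granted += take
            if ceiling[n] - alloc[n] <= 1e-9:
                hungry.discard(n)
        remaining -= granted
    return alloc


# Option 3 CPU settings under heavy demand from both pools (demand is hypothetical).
cpu = allocate(280, {
    "Production": {"reservation": 160, "limit": UNLIMITED, "shares": 8000, "demand": 400},
    "Test": {"reservation": 40, "limit": UNLIMITED, "shares": 2000, "demand": 400},
})
print(cpu)  # {'Production': 224.0, 'Test': 56.0}
```

The 80 GHz left after Reservations splits into 64 GHz and 16 GHz, giving exactly the planned 224 GHz and 56 GHz. For RAM (338 GB total, 180 GB and 45 GB reserved), the same model splits the 113 GB remainder into about 90.4 GB and 22.6 GB, for roughly 270.4 GB and 67.6 GB, which is essentially the planned 270 GB and 68 GB.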
Option 4 –
- Production CPU Reservation = 160 GHz
- Production RAM Reservation = 180 GB
- Production CPU Limit = 224 GHz
- Production RAM Limit = 270 GB
- Test CPU Reservation = 40 GHz
- Test RAM Reservation = 45 GB
- Test CPU Limit = 56 GHz
- Test RAM Limit = 68 GB
The tests showed that this configuration guaranteed the Production and Test VMs sufficient resources for the expected peak workload generated within the VMs. Additionally, each pool was able to access the additional resources planned for overhead, growth, and spare capacity. The tests also showed that neither pool could access more CPU and RAM than its configured Limits, which corresponded to the resources originally planned for each pool, including overhead, growth, and spare capacity.
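A quick tally of the Option 4 settings shows why neither pool can borrow the other's planned capacity: the Limits add up exactly to the planned totals of 280 GHz and 338 GB, while the Reservations leave 80 GHz and 113 GB that the pools can contend for. This is plain arithmetic on the figures above; the `summarize` helper and the dictionary layout are illustrative only.

```python
# Option 4 settings from the text: (reservation, limit) per pool, in GHz / GB.
OPTION_4 = {
    "Production": {"cpu": (160, 224), "ram": (180, 270)},
    "Test": {"cpu": (40, 56), "ram": (45, 68)},
}
CAPACITY = {"cpu": 280, "ram": 338}


def summarize(pools, capacity):
    """Per resource: (total reserved, sum of limits, unreserved capacity)."""
    summary = {}
    for resource, total in capacity.items():
        reserved = sum(p[resource][0] for p in pools.values())
        limits = sum(p[resource][1] for p in pools.values())
        summary[resource] = (reserved, limits, total - reserved)
    return summary


for resource, (reserved, limits, unreserved) in summarize(OPTION_4, CAPACITY).items():
    print(f"{resource}: reserved {reserved}, limits sum to {limits}, unreserved {unreserved}")
# cpu: reserved 200, limits sum to 280, unreserved 80
# ram: reserved 225, limits sum to 338, unreserved 113
```

Because the Limits consume the entire planned capacity, capacity left idle by one pool cannot be used by the other, which is the same trade-off seen with the Test Limits in Option 2, now applied to both pools.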
Eventually, Option 1 was selected because it was considered to be simple and flexible, yet effective. Option 3 was considered to be the next-best choice, but it was expected that the Reservations on each pool would have to be modified over time, as new VMs were added. Option 4 was considered to be too complex.