What is High Performance Computing (HPC) in the Data Center?


When it comes to High Performance Computing (HPC), we tend to solve several types of problems. These usually fall into one of four categories:

Compute intensive - A single problem requires a large amount of computation.
Memory intensive - A single problem requires a large amount of memory.
Data intensive - A single problem operates on a large data set.
Throughput intensive - Many unrelated problems are calculated simultaneously.

This article introduces HPC in detail to help you understand what these categories mean and how HPC approaches the common problems listed above.

Compute-intensive workloads
First, let's look at problems that require a large amount of computation. The goal is to spread the work for a single problem across multiple CPUs to reduce the execution time as much as possible. To do this, we need to run parts of the problem in parallel: each process or thread handles a piece of the workload, and all of them execute simultaneously. The CPUs usually need to exchange data quickly, which calls for specialized high-speed interconnect hardware. Examples of these problems appear when analyzing data for tasks such as modeling in finance, risk management, or healthcare. This is probably the largest portion of the HPC problem set and is the traditional domain of HPC.
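As a concrete illustration, here is a minimal sketch of spreading one computation across several CPUs (or nodes) with MPI, the message-passing standard that Open MPI implements. The partial-sum problem, the variable names, and the rank layout are illustrative choices, not something taken from this article.

```c
/* Build with an MPI compiler wrapper, e.g.: mpicc sum_mpi.c -o sum_mpi
   Run with, e.g.:                            mpirun -np 4 ./sum_mpi      */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which worker am I?  */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many workers?   */

    /* Each rank computes a partial sum over its own slice of the range. */
    const long N = 100000000L;
    double local = 0.0;
    for (long i = rank; i < N; i += size)
        local += 1.0 / (double)(i + 1);     /* stand-in for real per-element work */

    /* Combine the partial results on rank 0. */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum computed by %d ranks = %f\n", size, total);

    MPI_Finalize();
    return 0;
}
```

Here each rank communicates only once, at the final MPI_Reduce; real compute-intensive codes exchange data far more often, which is exactly why the fast interconnects mentioned above matter.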

When trying to solve compute-intensive problems, we might assume that adding more CPUs will keep reducing the execution time. This is not always true. Most parallel codebases have what we call a scaling limit: past a certain point, the overhead of managing many workers, on top of other fundamental constraints, outweighs the gains.

This is summarized in Amdahl's law.

In computer architecture, Amdahl's law is a formula that gives the theoretical speedup in latency of the execution of a task, at a fixed workload, that can be expected of a system whose resources are improved. It was named after computer scientist Gene Amdahl and was presented at the AFIPS Spring Joint Computer Conference in 1967.

Amdahl's law is commonly used in parallel computing to predict the theoretical speedup from using multiple processors. For example, suppose a program needs 20 hours to complete on a single processor core, and a one-hour portion of it cannot be parallelized, while the remaining 19 hours (p = 0.95) of execution time can be. Then no matter how many processors are devoted to the parallelized part, the minimum execution time cannot drop below that critical one hour. Hence, the theoretical speedup is limited to 20 times (1 / (1 - p) = 20). For this reason, parallel computing with many processors is useful only for programs that are highly parallelizable by nature. - Wikipedia
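A short worked restatement of that 20-hour example, using the same assumptions (one serial hour, 19 parallelizable hours, s processors):

```latex
% One hour stays serial; the remaining 19 hours are spread over s processors.
T(s) = 1 + \frac{19}{s}\ \text{hours}, \qquad
S(s) = \frac{T(1)}{T(s)} = \frac{20}{1 + 19/s},
\qquad \lim_{s \to \infty} S(s) = 20 .
```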

Amdahl's law can be formulated in the following way:

S_latency(s) = 1 / ((1 - p) + p / s)

where:

S_latency(s) is the theoretical speedup of the execution of the whole task;
s is the speedup of the part of the task that benefits from improved system resources;
p is the proportion of the execution time that the part benefiting from improved resources originally occupied.

Furthermore, S_latency(s) ≤ 1 / (1 - p), so no matter how large s becomes, the overall speedup is always limited by the part of the task that cannot benefit from the improvement.

Example chart: if 95% of the program can be parallelized, the theoretical maximum speedup from parallel computing is 20 times.
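That curve is easy to reproduce with a few lines of C; the function name amdahl and the processor counts below are just illustrative choices.

```c
#include <stdio.h>

/* Theoretical speedup from Amdahl's law for parallel fraction p and s processors. */
static double amdahl(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}

int main(void) {
    const double p = 0.95;                      /* 95% of the program parallelizes */
    const double procs[] = {1, 2, 4, 8, 16, 64, 256, 1024};

    for (unsigned i = 0; i < sizeof procs / sizeof procs[0]; i++)
        printf("%6.0f processors -> %5.2fx speedup\n", procs[i], amdahl(p, procs[i]));

    /* As s grows without bound, the speedup approaches 1 / (1 - p) = 20x. */
    return 0;
}
```

With 1024 processors the program already prints roughly 19.6x, visibly flattening toward the 20x ceiling.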

The bottom line: the more of a problem you can split into parts that run concurrently, the more processors you can divide the work between, and the more benefit you gain. However, because of coordination complexity and overhead, at some point adding more CPUs becomes detrimental rather than useful.

There are many libraries that help parallelize a problem, such as OpenMP or Open MPI, but before reaching for them we should first optimize performance on a single CPU and make p, the parallelizable fraction, as large as possible.
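For reference, here is a minimal OpenMP sketch of the shared-memory side of this: one loop's iterations are split across all available cores with a reduction. The harmonic-sum body is only a stand-in for real work.

```c
/* Build with OpenMP enabled, e.g.: gcc -fopenmp sum_omp.c -o sum_omp */
#include <stdio.h>
#include <omp.h>

int main(void) {
    const long N = 100000000L;
    double sum = 0.0;

    /* Iterations of one large loop are divided among all available cores;
       the reduction clause safely combines each thread's partial sum. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < N; i++)
        sum += 1.0 / (double)(i + 1);

    printf("harmonic sum of %ld terms = %f (threads available: %d)\n",
           N, sum, omp_get_max_threads());
    return 0;
}
```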

Memory-intensive workloads
Memory-intensive workloads need a large amount of memory more than they need a large number of CPUs. In my opinion, this is one of the most difficult classes of problem to handle, and it usually requires great care when designing the system. Programming and porting code is easier when the memory appears seamless, i.e. when the cluster presents a single system image. However, optimization gets harder once the system grows beyond its initial build, because the components are no longer uniform. Traditionally, a data center does not replace every server every three years, so if we want to add resources to the cluster while keeping performance consistent, mixing heterogeneous memory will introduce real latency differences. We also have to think about the connection between CPU and memory.
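To make the CPU-memory connection concrete, here is a hedged sketch using Linux's libnuma to place a buffer on a specific memory node; the 1 GiB size and node 0 are arbitrary illustrative choices, and the program assumes a NUMA-capable Linux host with libnuma installed.

```c
/* Build with: gcc numa_demo.c -o numa_demo -lnuma */
#include <stdio.h>
#include <numa.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not supported on this system\n");
        return 1;
    }

    /* How many memory nodes does this machine expose? */
    int nodes = numa_max_node() + 1;
    printf("NUMA nodes: %d\n", nodes);

    /* Pin a 1 GiB buffer to node 0. A CPU sitting on a different node will
       see higher latency when touching it, which is the kind of CPU-memory
       placement concern discussed above. */
    size_t bytes = 1UL << 30;
    void *buf = numa_alloc_onnode(bytes, 0);
    if (buf == NULL) {
        perror("numa_alloc_onnode");
        return 1;
    }

    numa_free(buf, bytes);
    return 0;
}
```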

Today, commodity servers have removed most of these concerns. We can request thousands of similar instances with identical hardware and specifications, and providers like Amazon Web Services are happy to supply them.

Data-intensive workloads
This is probably the most common workload we find today, and probably the one carrying the most buzzwords: these are the so-called Big Data workloads. Data-intensive workloads are a good fit for software frameworks such as Hadoop or MapReduce. We distribute the data for a single problem across multiple CPUs to reduce the overall execution time; usually, though not always, similar work is performed on each data segment. In a sense this is the inverse of a memory-intensive workload: moving data quickly to and from the drives matters more than the interconnect. Problems of this type typically come from the life sciences (genomics) and from research, and they have broad commercial applications, especially around user data and interoperability.
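Here is a tiny sketch of the data-splitting idea behind MapReduce-style frameworks; the in-memory string, the chunk count, and the newline-per-record convention are all illustrative, and a real Hadoop job would operate on file blocks spread across many machines, close to their own disks.

```c
#include <stdio.h>
#include <string.h>

#define CHUNKS 4

int main(void) {
    const char *data = "alpha\nbeta\ngamma\ndelta\n";   /* stand-in for a large dataset */
    size_t len = strlen(data);
    long partial[CHUNKS] = {0};

    /* "Map": each chunk of the data is scanned independently of the others. */
    for (int c = 0; c < CHUNKS; c++) {
        size_t lo = len * (size_t)c / CHUNKS;
        size_t hi = len * (size_t)(c + 1) / CHUNKS;
        for (size_t i = lo; i < hi; i++)
            if (data[i] == '\n')
                partial[c]++;                            /* count records in this chunk */
    }

    /* "Reduce": combine the per-chunk results into a single answer. */
    long total = 0;
    for (int c = 0; c < CHUNKS; c++)
        total += partial[c];

    printf("records: %ld\n", total);
    return 0;
}
```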

High-throughput workloads
Batch processing jobs (jobs that need almost no coordination to execute in parallel and have little or no communication between CPUs) are considered high-throughput workloads. In high-throughput workloads we focus on throughput over a period of time rather than on the performance of any single problem. We distribute many independent problems across multiple CPUs to reduce the overall execution time. These workloads need to:

Divide naturally into independent parts
Require little or no CPU-to-CPU communication
Execute in separate processes or threads on separate CPUs, simultaneously

A compute-intensive workload can often be broken down into a high-throughput workload; a high-throughput workload, however, does not necessarily require a lot of CPU.
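A minimal sketch of a high-throughput batch run on a single machine, assuming a POSIX system: each job is an independent process, there is no inter-job communication, and only the total wall time of the batch matters. The job count and the busy-loop job body are placeholders.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* One independent job: it shares no data with the other jobs, so throughput
   scales simply with the number of CPUs available to run them. */
static void run_job(int job_id) {
    long acc = 0;
    for (long i = 0; i < 50000000L; i++)   /* stand-in for one batch job's work */
        acc += i % 7;
    printf("job %d finished (result %ld)\n", job_id, acc);
}

int main(void) {
    const int jobs = 8;

    /* Launch every job in its own process; no CPU-to-CPU communication needed. */
    for (int j = 0; j < jobs; j++) {
        pid_t pid = fork();
        if (pid == 0) {
            run_job(j);
            _exit(0);                      /* child exits after its single job */
        }
    }

    /* The parent waits for the whole batch; total time for the batch, not the
       time of any single job, is what a high-throughput workload optimizes. */
    while (wait(NULL) > 0)
        ;
    return 0;
}
```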