Looking at an example in the Mechanical Computer Aided Design (MCAE) area, techniques have been developed to use memory efficiently and not allowing the OS to start swapping. For example, in simulating a car crash, running on a single thin node would take significant amounts of memory. To get maximum performance, the entire model, composed of breaking up the solid car definition into small elements ("finite elements"), the material properties, the external forces, and the like, would have to be loaded into memory. Depending on the resolution of the model-- basically, how many elements are created and how many time steps are solved for -- a tremendous amount of memory would have to be used. Applications like this can be solved either on one system, or broken into smaller parts and solved on a number of horizontal systems.
Application developers have been able to write software that runs across a number of systems. The problem is divided into smaller pieces, with each piece placed on a different computer. For example, the hood of the car can be simulated on one system, the engine compartment on another, the roof on a third, and so on. The different systems would need to communicate the boundary conditions to the other systems, which require fast communications. The benefit to this approach is that the simulation can be run in much less time, since a number of CPUs are working together, in parallel. Also, since each node or core only gets a part of the data, the memory requirements are less on each node. In total, the memory requirements will be similar (or slightly more) compared to running on one system, but the time to solution will be considerably less. In an era of high competitiveness in many markets, the time to get the results outweighs the additional cost of purchasing more computer systems.
An example of a crash analysis CAE code is LS-DYNA from Livermore Software Technology (LSTC). Using one of the test codes, called "Neon Refined," customers or systems vendors can run this test case across various machine types, and look at different parameters. Figure 2 shows how an MCAE application scales across multiple machines. The system used for this example was the Sun Fire X2100, which contains one socket per system. By dividing the problem and using the memory associated with each node, the LS-DYNA run is very scalable.
Another example where running an application in a horizontal scaling environment is in the area of Computational Fluid Dynamics (CFD). Since the simulation can be broken up and solved in a parallel manner, the scaling can be very good and even slightly "super-linear" (see Figure 3). Since the multiple CPUs or cores contain more cache than a single core, this typically happens when more data can be held closer to the computational unit. Thus, the overall processing will be faster, since there is less access to main memory.
The amount of memory is critical in the overall performance of the system. The general rule is not to skimp on memory purchases and then to buy the fastest CPU available. It is important to investigate whether paying more for RAM is more beneficial than paying more for a faster CPU. In addition, by scaling horizontally, more memory can be addressed, which may result in higher overall performance.