Author Archive for Christian Tanasescu

Extreme Scale-up with SGI® Altix® 4700 and Intel® Itanium® Processors

This week SGI published three new world records with the Standard Performance Evaluation Corporation (SPEC) for SPECjbb2005, SPECfp_rate_base2006 and SPECint_rate_base2006. The SGI benchmarking team achieved these results at the Leibniz Supercomputing Centre (LRZ) in Garching, Germany, on an Altix 4700 with 1024 Itanium 9040 cores (1.6GHz, 18MB cache) running SLES10 with ProPack. These new world records across the series of SPEC benchmarks reconfirm that SGI Altix 4700 is the most scalable platform for application fusion:

SPECjbb2005 evaluates the performance of servers running typical Java business applications for Internet, finance, enterprise and database use by emulating a three-tier client/server system, with emphasis on the middle tier. The benchmark exercises the Java Virtual Machine (JVM) and the underlying operating system, as well as the performance of the CPUs, caches and memory hierarchy and the scalability of the shared-memory system. SGI raised the bar again, recapturing undisputed leadership in this benchmark with 9,611,262 Business Operations per Second (BOPS) on an Altix 4700 with 512 Itanium 9040 cores (1.6GHz, 18MB cache) using the Oracle® JRockit JVM. The new SGI record is over 74 percent higher than the previous record.

The SPEC CPU2006 rate benchmarks measure the throughput of a system: how quickly it completes a fixed number of tasks when many copies of each benchmark run concurrently. In a large shared-memory environment, this test stresses the scalability of the operating system, the memory subsystem and, to some extent, the I/O subsystem.
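To make the rate metric concrete, here is a minimal Python sketch of how a SPECrate-style score is derived, simplified from the published SPEC CPU2006 run rules: each benchmark's rate is the number of concurrent copies times the ratio of its reference time to the measured elapsed time, and the overall score is the geometric mean of those per-benchmark rates. The function names and the timings in the usage section are hypothetical, and the sketch ignores details such as taking the median of several runs and the base/peak distinction.

```python
import math
from typing import Sequence


def rate_ratio(copies: int, ref_seconds: float, elapsed_seconds: float) -> float:
    """Per-benchmark rate: copies run concurrently x (reference time / elapsed time)."""
    return copies * ref_seconds / elapsed_seconds


def overall_rate(ratios: Sequence[float]) -> float:
    """Overall score: geometric mean of the per-benchmark rates.

    Computed in log space so that many large ratios do not overflow.
    """
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    # Hypothetical (copies, reference seconds, elapsed seconds) triples,
    # not real SPEC data.
    runs = [(1024, 9000.0, 1000.0), (1024, 12000.0, 1500.0), (1024, 7000.0, 800.0)]
    ratios = [rate_ratio(c, ref, t) for c, ref, t in runs]
    print([round(r, 1) for r in ratios], round(overall_rate(ratios), 1))
```

Because the copies factor multiplies every ratio, the score grows with core count only as long as the OS, memory subsystem and I/O keep the elapsed time per copy from rising, which is exactly the scalability these benchmarks stress.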

SPECfp_rate_base2006 indicates how a system responds to HPC workloads: it is a mix of floating-point-intensive applications from different domains that stress the platform in different ways. Running SPECfp_rate_base2006 at this scale is non-trivial because it puts significant stress on the kernel, scheduler, file system, memory bandwidth and I/O bandwidth. Using a partition of the Altix 4700 system at LRZ configured with 1024 Itanium cores (1.6GHz, 18MB cache), SGI achieved the world record with a SPECfp_rate_base2006 score of 10600. This result is more than five times higher than that of the closest Single System Image (SSI) competitor on the SPEC list.

SPECint_rate_base2006 is an industry-standard benchmark suite that measures system performance on an integer-intensive workload. The same challenges described above apply at this scale. The SGI benchmarking team also set the SPECint_rate_base2006 world record, with a score of 9030, which is four times higher than that of the next closest SSI competitor.

To complete the story, let me recall another important result. Since 2006, the SGI Altix 4700 installed at LRZ has held the world record for STREAM, the industry-standard benchmark for measuring aggregate memory bandwidth, with 4.35TB/s, which is 5x higher than the closest SSI competitor.
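For readers unfamiliar with STREAM, the sketch below shows its Triad kernel, a[i] = b[i] + q * c[i], and how a bandwidth figure is derived from it: STREAM counts two 8-byte reads and one 8-byte write per element and divides the bytes moved by the elapsed time. This is a Python illustration of that byte accounting only; the helper names are hypothetical, and a pure-Python loop measures interpreter overhead rather than memory bandwidth. The real benchmark is compiled C or Fortran run across all cores.

```python
import time
from array import array

WORD_BYTES = 8  # STREAM's default element type is a 64-bit double


def triad(a, b, c, scalar):
    """STREAM Triad kernel: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bytes(n, word_bytes=WORD_BYTES):
    """Bytes STREAM counts for one Triad pass: read b and c, write a."""
    return 3 * word_bytes * n


def bandwidth_gb_per_s(n, seconds):
    """Counted bytes divided by elapsed time, in units of 10^9 bytes per second."""
    return triad_bytes(n) / seconds / 1e9


if __name__ == "__main__":
    n = 200_000
    a = array("d", [0.0]) * n
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    start = time.perf_counter()
    triad(a, b, c, 3.0)
    elapsed = time.perf_counter() - start
    # Interpreter-bound: this number says nothing about real memory bandwidth.
    print(f"{bandwidth_gb_per_s(n, elapsed):.3f} GB/s")
```

An aggregate figure such as 4.35TB/s is the same calculation with every core running its share of the arrays at once, which is why it reflects the memory subsystem of the whole single system image.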

To summarize the facts, SGI Altix 4700 is proven to be:

4x higher in the number of cores in a Single System Image
5x higher in memory bandwidth
5x better in performance for floating point workloads
4x better in performance for integer workloads

This leads me to conclude that Altix 4700 with Intel Itanium processors defines a new platform class: an extreme scale-up architecture that pushes the scale-up concept to new limits.

Why is scale-up relevant? Many key HPC problems, such as cryptography, fraud detection, search engines, complex event processing and graph-based problems, simply do not run on clusters. As an IDC study revealed, the majority of ISV applications run on a single node, because massive in-memory computation enables full-scale system simulation without reducing resolution or precision and without breaking the problem apart. Applications bound by random I/O data access can achieve huge performance gains by bringing the entire dataset into memory. On a cluster of small nodes, load imbalance cannot be corrected without explicitly copying data to the node that takes over the work; in a scale-up system, the work is simply directed to an available processor. Most importantly for developers, the single system image does not restrict which parallel programming models can be used, including hybrid schemes. This enables key research into programming models, which is especially important as we move into the new world of multi-core CPUs. And of course, system administration is much easier.
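The load-balancing point can be illustrated with a small shared-memory sketch: all workers see the same in-memory dataset and pull chunks from one shared work queue, so whichever worker is free takes the next chunk and no data has to be shipped to it. The function `dynamic_sum` and its chunk size are hypothetical, and Python threads are limited by CPython's global interpreter lock, so this shows the scheduling pattern (similar in spirit to OpenMP's `schedule(dynamic)`), not a parallel speedup.

```python
import queue
import threading


def dynamic_sum(data, n_workers=4, chunk=1000):
    """Sum a shared list by letting idle workers pull index ranges from a queue."""
    tasks = queue.Queue()
    for lo in range(0, len(data), chunk):
        tasks.put((lo, min(lo + chunk, len(data))))
    partials = [0] * n_workers  # one slot per worker, so no lock is needed

    def worker(wid):
        while True:
            try:
                lo, hi = tasks.get_nowait()
            except queue.Empty:
                return  # no work left
            # Read directly from the shared list: nothing is copied to the worker.
            partials[wid] += sum(data[i] for i in range(lo, hi))

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)


if __name__ == "__main__":
    print(dynamic_sum(list(range(1_000_000)), n_workers=8))
```

On a cluster, the equivalent rebalancing would require moving the chunk's data to the idle node before it could start.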

SPEC results are available at:

http://www.spec.org/jbb2005/results/res2009q3/jbb2005-20090727-00756.html
http://www.spec.org/cpu2006/results/res2009q3/cpu2006-20090802-08313.html
http://www.spec.org/cpu2006/results/res2009q3/cpu2006-20090802-08312.html
http://www.cs.virginia.edu/stream/top20/Bandwidth.html

SPEC, SPECint, SPECfp and SPECjbb are registered trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect results published on www.spec.org as of 9/03/2009.

Data analysis — from science to enterprise

It’s a little-known fact that the same uniquely scalable shared-memory architecture that enables SGI Altix® 4700 with Intel® Itanium® processors to power technical and scientific breakthroughs also excels at high-performance reasoning on ontologies for enterprise data analytics. Ontologies are knowledge models: formal representations of a set of relevant concepts within a domain and of the relationships between those concepts.

A major advantage of ontologies for data analytics is the ability to share the meaning (semantics) of information in a knowledge model, capture complex relationships and integrate heterogeneous data sources. These characteristics can be achieved only if the ontology runtime can scale with a growing number of facts.
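As a rough illustration of what "reasoning on an ontology" means, here is a minimal Python sketch that stores facts as (subject, predicate, object) triples and applies two rules until no new facts appear (naive forward chaining to a fixpoint): subclass relationships are transitive, and an instance of a class is also an instance of its superclasses. The predicate names and the example facts are hypothetical, and OntoBroker's own rule language and evaluation strategy are not modeled here; the point is that inferred facts multiply as the fact base grows, which is why the runtime must scale.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def infer(facts: Set[Triple]) -> Set[Triple]:
    """Naive forward chaining until a fixpoint is reached."""
    closure = set(facts)
    while True:
        sub = [(s, o) for s, p, o in closure if p == "subClassOf"]
        typ = [(s, o) for s, p, o in closure if p == "type"]
        new = set()
        # Rule 1: A subClassOf B and B subClassOf C  =>  A subClassOf C
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, "subClassOf", d))
        # Rule 2: x type A and A subClassOf B  =>  x type B
        for x, cls in typ:
            for a, b in sub:
                if cls == a:
                    new.add((x, "type", b))
        new -= closure
        if not new:
            return closure
        closure |= new


if __name__ == "__main__":
    facts = {
        ("Sedan", "subClassOf", "Car"),
        ("Car", "subClassOf", "Vehicle"),
        ("myCar", "type", "Sedan"),
    }
    for triple in sorted(infer(facts) - facts):
        print(triple)
```

Running it derives that `myCar` is also a `Car` and a `Vehicle`, and that `Sedan` is a subclass of `Vehicle`.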

Silicon Graphics and Ontoprise GmbH have demonstrated that the OntoBroker® inference engine can load and process large ontologies in main memory. The SGI Altix 4700 platform has a unique configuration elasticity that allows processors and memory to be added independently of each other, easily accommodating user growth and protecting the hardware investment with a lower TCO. Here are some benchmark examples using the OntoBroker inference engine on an SGI Altix 4700 server:

–Complex graph traversal. Finding the possible traversable paths between the nodes of a graph comprising 1 million nodes takes only 15.7 seconds, making complex reasoning possible.

–Semantic retrieval and query processing with 10^4 facts. Five classes of 195 queries of varying complexity run in less than 18 milliseconds on the SGI Altix® 4700.

–SmartWeb®. The longest queries for multimedia web content, with 95 queries and 60 rules, take an average of 17 seconds from a disk-based database but an average of only 73 milliseconds from a non-materialized in-memory model.

–Wikipedia® knowledge base search. The most complex query over the Wikipedia® knowledge base, with average-sized result sets, takes less than 35 ms; the trade-off is a load time of 138 minutes for 48 million wiki facts.

–Automotive test. Loading a large ontology of 1.4 million facts into memory takes only 133 seconds, while loading the indexed database model used for the database queries took 381 seconds. The query times showed that large result sets benefit most from an in-memory model.
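The graph-traversal benchmark above is the kind of workload that benefits from keeping the whole graph in memory: every step of a traversal is a random lookup of a node's neighbors. A minimal Python sketch of one such traversal, breadth-first search for a shortest path over an in-memory adjacency map, is shown below. The function name and the toy graph are hypothetical and are not OntoBroker's algorithm; with the graph on disk instead, each neighbor lookup could become a random I/O.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search over an in-memory adjacency map.

    Returns the list of nodes on a shortest path from start to goal,
    or None if goal is unreachable.
    """
    parents = {start: None}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk parent links back to start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parents:  # visit each node at most once
                parents[nxt] = node
                frontier.append(nxt)
    return None


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
    print(shortest_path(graph, "a", "e"))
    print(shortest_path(graph, "e", "a"))
```

Each node and edge is touched at most once, so the cost is linear in the size of the graph, provided the adjacency data is resident in memory.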

So with a professional reasoning engine like OntoBroker, and scalable servers like SGI Altix 4700, users of many applications can find required information much more quickly and easily.

Massive memory

As a leader in computationally intensive computing, SGI tends to set the pace for the large-memory systems often required to crunch the numbers in massive data sets. In a recent press release, we announced that our Altix systems have now achieved 21 terabytes of globally addressable memory at customer sites. I’d like to explain what this means in more depth and offer some examples.

The Altix 450/4700 (Itanium) systems can accommodate 128 terabytes of globally shared memory under the control of a single instance of the Linux operating system. The system may also be partitioned among multiple instances of Linux and provide globally addressable shared memory among the OS instances via SGI’s unique NUMAlink® interconnect technology. What this means for the customer is, essentially, saved time: time-to-results, time-to-solution and time-to-innovation. It significantly simplifies application development and debugging for all parallel programming models, be it OpenMP, pthreads, MPI or SHMEM.

In addition, it offers an integrated platform for application fusion, which enables running a mix of different applications and workloads. As workloads usually change during the project life-cycle, a global shared memory platform lowers TCO compared to clusters that require node reconfiguration.

We have seen great success for memory-resident database applications used in Internet data centers and transaction processing, as well as for applications based on “graph theory,” an important area of mathematics with uses in defense and homeland security, multi-disciplinary science and data assimilation. Some customers who are already seeing the advantages of the SGI Altix product line are:

Wright-Patterson Air Force Base: the laboratory here uses an SGI Altix 4700 system with 4,608 Intel Itanium processors in a single supercomputer equipped with 20 TB of globally addressable memory and 440 TB of usable disk space. Globally addressable memory means that memory can be shared across multiple operating system instances via SGI NUMAlink. One of the largest computers in the Department of Defense, the SGI resource helps DoD researchers design faster, reduce risk by increasing the quality of modeling and simulation, and support an intensifying effort to develop “game-changing” computational science and engineering applications.

The Leibniz Supercomputing Centre Munich (LRZ): This facility operates a 4,864 Intel Itanium processor system with slightly over 39 TB of globally addressable memory that is hard at work solving increasingly complex simulations in physics and astrophysics, materials research, fluid dynamics, chemistry, geosciences and biological sciences.


Itanium vs. Russian Weather

In my last post I mentioned a Silicon Graphics hybrid solution being used at the Institute of Cancer Research. To expand on that story, we clearly see hybrid solutions becoming increasingly prevalent in high-end HPC implementations. High-performance computing environments often handle a mix of workflows that place a dual burden on IT platforms. Some workflows need shared-memory platforms to handle large problems or data sets. Other workflows require massively parallel clusters that can distribute jobs across dozens, hundreds or even thousands of processors and their allocated memory. While smaller computations can be handled effectively by a capacity platform like SGI Altix XE, larger problems require a shared-memory capability platform like SGI Altix 4700 Itanium.

One such hybrid solution was recently purchased by Russia’s Federal Service for Hydrometeorology and Environmental Monitoring (Roshydromet). Able to carry out more than 27 trillion calculations a second, the system gives Roshydromet enhanced capability to extend both the duration and the accuracy of its critical weather forecasts.

For engineers and scientists struggling to increase productivity while facing tight budgets, hybrid computing solutions offer the flexibility and efficiency that can only come from fitting the solution to the workflow, rather than force-fitting the workflow into a particular computing platform.

Itanium part of leading cancer research center solution

As Senior Director of Application and Performance Engineering at Silicon Graphics (SGI), I’ve been part of SGI’s efforts to deliver solutions powered by Intel Xeon and Itanium processors to customers around the world. Recently, we announced plans to outfit a new world-class cancer research facility in the UK. The Institute of Cancer Research in London will rely on SGI systems for groundbreaking work in integrative network biology. Unlike mainstream cancer research, which usually focuses on the structure and behavior of individual genes, proteins or cells, this new field studies how cancer cells interact within the larger biological network. The research could lead to new drugs or treatments that prevent metastasis, the stage when cancer can turn deadly. Studying the dynamics of cellular networks will generate enormous data sets and involve a wide range of distributed and shared-memory applications, requiring a true hybrid solution that addresses both sides of the computational coin.