Cerebras showcased its new AI supercomputer Andromeda at SC22. With 13.5 million cores in 16 Cerebras CS-2 systems, Andromeda boasts one exaflop of AI compute and 120 petaflops of dense compute. Its computing workhorse is Cerebras’ wafer-scale manycore processor, WSE-2.
Each WSE-2 wafer has three physical planes, which handle arithmetic, memory, and communications. The memory plane’s 40GB of onboard SRAM can hold an entire BERT-Large model by itself. The arithmetic plane, meanwhile, has around 850,000 independent cores and 3.4 million floating-point units. Those cores share an overall internal bandwidth of approximately 20 PB/s across the Cartesian mesh of the communication plane.
Cerebras is emphasizing what it calls “near-perfect linear scaling”: for a given job, two CS-2s will do it twice as fast as one, three will take a third as long, and so on. How? Andromeda’s CS-2 systems rely on data parallelization, Cerebras said, from the cores on each wafer to the SwarmX fabric that coordinates them all. And the supercomputer’s talents extend beyond its already impressive 16 nodes: using the same data parallelism, researchers can link up to 192 CS-2 systems together for a single job.
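As a rough illustration (our own toy numbers, not Cerebras benchmarks), here is what near-perfect linear scaling looks like next to the more common case, where a serial fraction of the work caps the speedup:

```python
# Toy comparison: ideal linear scaling vs. Amdahl's-law scaling.
# All figures below are made up for illustration.

def linear_scaling_time(t1: float, n: int) -> float:
    """Near-perfect data parallelism: n systems finish n times faster."""
    return t1 / n

def amdahl_time(t1: float, n: int, serial_fraction: float) -> float:
    """Scaling when some fraction of the work cannot be parallelized."""
    return t1 * (serial_fraction + (1 - serial_fraction) / n)

t1 = 96.0  # hypothetical hours for one system to train a model
for n in (1, 2, 4, 16):
    print(f"{n:>2} systems: linear {linear_scaling_time(t1, n):5.1f} h, "
          f"5% serial {amdahl_time(t1, n, 0.05):5.1f} h")
```

Even a 5 percent serial fraction means 16 machines finish in 10.5 hours instead of the ideal 6, which is why "near-perfect linear" is a strong claim.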
Andromeda grows with Epyc victories
Andromeda gets its data from a bank of 64-core AMD EPYC 3 processors. These processors, AMD said via email, work in tandem with the CS-2 wafers, performing “a wide range of data pre- and post-processing.”
“AMD EPYC is the best choice for this type of cluster,” Cerebras founder and CEO Andrew Feldman told us, “because it offers unmatched core density, memory capacity, and IO. This made it the obvious choice to provide data to the Andromeda supercomputer.”
Alongside its sixteen second-generation wafer-scale engines, Andromeda runs 18,176 Epyc 3 cores. That throughput comes at a price, however: all told, the system consumes around 500 kilowatts when running at full capacity.
Go Big or Go Home
Andromeda is not the fastest supercomputer on Earth. Frontier, an Oak Ridge National Lab supercomputer capable of running nuclear weapons simulations, broke the exaflop barrier earlier this year. Frontier also runs at higher precision: 64-bit, versus Andromeda’s 16-bit half precision. But not all operations require nuclear-grade accuracy, and Andromeda isn’t trying to be Frontier.
“Theirs is a bigger machine. We are not beating them. They cost $600 million to build. This is less than $35 million,” Feldman said.
Nor is Andromeda trying to usurp Polaris, a cluster of more than two thousand Nvidia A100 GPUs at the Argonne National Lab. Indeed, like Andromeda, Polaris itself uses AMD EPYC cores to do pre- and post-processing. Instead, each supercomputer excels at a slightly different kind of job.
In general, CPUs are generalist while ASICs (including GPUs) and FPGAs are more specialized. This is why cryptocurrency miners love GPUs. The blockchain involves a lot of repetitive math. But Andromeda is even more specialized. It excels at handling large sparse matrices: multidimensional arrays of tensor data that are mostly zeros.
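To make the point concrete, here is a small Python sketch (illustrative sizes, not an Andromeda workload) comparing dense and sparse storage for a matrix that is about 99.95 percent zeros:

```python
# Why sparsity matters: a mostly-zero matrix stored densely wastes memory,
# while a sparse (CSR) representation stores only the nonzero entries.
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
dense = np.zeros((2_000, 2_000), dtype=np.float32)
rows = rng.integers(0, 2_000, size=2_000)
cols = rng.integers(0, 2_000, size=2_000)
dense[rows, cols] = 1.0  # roughly 0.05% of entries are nonzero

csr = sparse.csr_matrix(dense)
csr_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes
print(f"dense: {dense.nbytes:,} bytes")   # 16,000,000 bytes
print(f"sparse (CSR): {csr_bytes:,} bytes")
```

Hardware that skips the zeros entirely, rather than multiplying through them, gets a proportional win in both memory and arithmetic.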
AI is highly data intensive, both in the training pipeline and in the computation itself. So, Feldman said, Andromeda uses Epyc processors to streamline the process. “The AMD Epyc-based machines are on servers outside of Cerebras CS-2,” Feldman said, where they coordinate and prepare the data. Then, Andromeda’s SwarmX and MemoryX fabrics take over.
A GPU cluster must coordinate across every core, card, and server rack. This leads to an inevitable delay. There is also exponential memory overhead as networks get larger and more complex. In contrast, WSE-2 manages much of its information pipeline within the same piece of hardware. At the same time, Cerebras’ manycore wafer-scale processors can do more on a single (giant) piece of silicon than a consumer CPU or GPU. This allows Andromeda to handle deeply parallel tasks.
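To see how that coordination cost adds up, here is a toy cost model of ring all-reduce, the synchronization step a data-parallel GPU cluster typically performs every training step. The link speed and latency figures are our assumptions, not measurements from any real cluster:

```python
# Toy model (assumed figures, not vendor data) of gradient synchronization
# cost in a data-parallel cluster using ring all-reduce.

def ring_allreduce_seconds(param_bytes: float, n_devices: int,
                           link_gbps: float = 100.0,
                           latency_us: float = 5.0) -> float:
    """Ring all-reduce: ~2*(n-1)/n of the data crosses each link, plus
    2*(n-1) latency-bound steps per synchronization round."""
    if n_devices == 1:
        return 0.0  # a single wafer-scale device needs no off-chip sync
    bandwidth_bytes = link_gbps * 1e9 / 8
    transfer = 2 * (n_devices - 1) / n_devices * param_bytes / bandwidth_bytes
    return transfer + 2 * (n_devices - 1) * latency_us * 1e-6

# A 1-billion-parameter model in fp16 is ~2 GB of gradients per step.
for n in (2, 8, 64, 512):
    print(f"{n:>3} devices: {ring_allreduce_seconds(2e9, n):.3f} s per sync")
```

The bandwidth term plateaus, but the latency term keeps growing with device count, and this model ignores the extra memory each device must reserve for communication buffers.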
Great language models
In the same way that a Formula 1 car is wasted on surface streets, Andromeda only finds its stride at large scale. Nowhere is this more evident than in its success with large language models (LLMs).
Imagine an Excel spreadsheet with one row and one column for every single word in the English language. Natural language processing models use matrices, spreadsheet-like grids of numbers, to map the relationships between words. These models can have billions, even tens of billions, of parameters, and their input sequences can be 50,000 tokens long. You’d think that as the training set grew, that exponential memory overhead would strike again. But LLMs often work with exactly the kind of sparse tensors that Andromeda loves.
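A toy version of that spreadsheet makes the sparsity obvious. This sketch builds a word co-occurrence matrix over a three-sentence corpus and counts how few cells ever get touched:

```python
# Toy word-relationship "spreadsheet": a co-occurrence matrix over a tiny
# corpus. Real vocabularies run to tens of thousands of tokens, so the full
# matrix is enormous, and mostly zeros.
from collections import Counter
from itertools import combinations

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "cats and dogs",
]
vocab = sorted({w for line in corpus for w in line.split()})
index = {w: i for i, w in enumerate(vocab)}

cooc = Counter()  # sparse storage: only nonzero cells exist at all
for line in corpus:
    for a, b in combinations(line.split(), 2):
        cooc[(index[a], index[b])] += 1

total_cells = len(vocab) ** 2
print(f"{len(vocab)} words, {total_cells} cells, {len(cooc)} nonzero")
```

Even with ten words, most cells stay zero; scale the vocabulary to 50,000 tokens and the zeros dominate overwhelmingly.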
Andromeda customers, including AstraZeneca and GlaxoSmithKline, report success using LLMs on Andromeda to search genomic data, including the COVID genome and epigenome. During an experiment at the National Energy Technology Lab, scientists described completing a “GPU-impossible” job with Andromeda that Polaris simply couldn’t finish. And while it may not crunch the numbers for nuclear bombs, Andromeda is also hard at work on fusion research.
“Coupling the AI power of the CS-2 with precision simulation from Lassen creates a CogSim computer that opens new doors for inertial confinement fusion (ICF) experiments at the National Ignition Facility,” said Brian Spears of Lawrence Livermore National Lab.
Andromeda meets the academic world
Andromeda currently lives in Colovore, an HPC data center in Santa Clara. But Cerebras has also made time for academics and college students to use Andromeda for free.
And there’s one more thing graduate students, in machine learning and elsewhere, might want to note: Andromeda plays well with Python. In machine learning that’s table stakes, but we mean it plays really well. You can submit AI work to Andromeda, Cerebras says, “quickly and painlessly from a Jupyter notebook, and users can switch between models with just a few keystrokes.”
“It’s remarkable that Cerebras has provided graduate students with free access to such a large cluster,” said Mateo Espinosa, a doctoral candidate at the University of Cambridge in the UK. Espinosa, who previously worked at Cerebras, is working with Andromeda on his thesis on explainable artificial intelligence. “Andromeda offers 13.5 million AI cores and near-perfect linear scaling across the largest language models, without the pain of distributed computing and parallel programming. This is every ML graduate student’s dream.”
Machine learning must swim upstream in an ever-growing river of data. Up to a point, we can simply dedicate more hardware to the task. But within and across networks, latency starts to add up quickly. To get the same amount of work done in the same amount of time, you have to spend more energy on the problem. And the sheer volume of data makes throughput the bottleneck. That triple point of latency, energy, and throughput is where Cerebras tries to make its mark.
All Andromeda images courtesy of Cerebras.