For nearly as long as we’ve had supercomputers, we’ve also had people asking themselves, “How do I build myself one of those, except with a tenth of the budget and using just a fraction of the power?” Several teams of scientists have built “Beowulf clusters,” supercomputers that are actually clusters of commodity-grade hardware sharing their own LAN. And remember all those PlayStation supercomputers? Now, a team of students at Southern Methodist University in Dallas has built a supercomputer by connecting 16 Nvidia Jetson Nano modules together, along with four power supplies, a network switch, some cooling fans, and roughly five dozen handmade wires. (Fact: The best prototypes always have hand-soldered wires hanging out the back.)
According to Conner Ozenne, a senior computer science major and one of the leads on the project, “We chose to use Nvidia Jetson modules because no other small compute devices have onboard GPUs, which would let us tackle more AI and machine learning problems.”
Architecturally, the Jetson Nano is most similar to the Nintendo Switch, which runs on Nvidia’s Tegra X1 SoC, so we’ll use that as a point of comparison.
The Switch and the Nano have the same theoretical maximum memory bandwidth (25.6 GB/s). They’ve also got the same quad-core Cortex-A57 CPU, but the Nano’s CPU is clocked considerably higher (1.43GHz versus 1.02GHz for the Switch when docked). As far as the two platforms’ relative GPU power goes, however, the situation is reversed. The Maxwell-based Tegra X1 SoC inside the Switch offers 256 shader cores compared to just 128 on the Jetson Nano.
While this implies the Nano’s GPU would be half the speed of the Switch’s in the same workload, the gap might not be quite that large. The Switch’s GPU reportedly tops out at 768MHz in docked mode, while the Jetson Nano’s GPU has a maximum clock of 921MHz. Altogether, the “baby” supercomputer combines 64 Cortex-A57 cores, 64GB of RAM, and 2,048 Maxwell cores across 16 boards.
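The arithmetic behind those figures is easy to check. What follows is a rough sketch, not a benchmark: it multiplies shader cores by clock speed as a crude stand-in for GPU throughput, which ignores memory bandwidth, thermals, and software. The 4GB-per-board figure is an assumption inferred from 64GB spread across 16 boards, and the function names are ours, purely for illustration.

```python
# Per-board figures quoted in the article.
BOARDS = 16
CPU_CORES_PER_BOARD = 4     # quad-core Cortex-A57
GPU_CORES_PER_BOARD = 128   # Maxwell shader cores
RAM_GB_PER_BOARD = 4        # assumption: 64GB total / 16 boards


def cluster_totals(boards=BOARDS):
    """Sum the per-board resources across the whole cluster."""
    return {
        "cpu_cores": boards * CPU_CORES_PER_BOARD,
        "gpu_cores": boards * GPU_CORES_PER_BOARD,
        "ram_gb": boards * RAM_GB_PER_BOARD,
    }


def relative_gpu_throughput(cores_a, mhz_a, cores_b, mhz_b):
    """Naive ratio of (cores x clock) for GPU A versus GPU B.

    This is only a first-order estimate; real workloads will differ.
    """
    return (cores_a * mhz_a) / (cores_b * mhz_b)


totals = cluster_totals()
# Jetson Nano (128 cores at 921MHz) versus docked Switch (256 cores at 768MHz).
nano_vs_switch = relative_gpu_throughput(128, 921, 256, 768)

print(totals)
print(f"Nano GPU vs. Switch GPU (cores x clock): {nano_vs_switch:.2f}")
```

By this crude measure, a single Nano’s GPU lands at roughly 60 percent of the docked Switch’s rather than 50 percent, which is why the gap is probably smaller than the core counts alone suggest.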
Nano Lives Up To Its Name
Let’s tackle the elephant in the room first. The objective specs of the SMU 16-board supercomputer are scarcely inspiring, considering that single-socket desktop systems now offer as many as 64 cores. Jetson Nano is really living up to the ‘nano’ part of its name here. Not only are the stats pretty pedestrian on their own, the entire cluster literally fits on a desk.
But all kidding aside, comparing the specs of a system like this to conventional PC hardware misses the point. The challenges associated with scaling workloads effectively across a large network of slow devices, with a relatively small amount of memory per device, are conceptually similar whether one is discussing true supercomputers or smaller-scale embedded device systems like this one.
“We started this project to demonstrate the nuts and bolts of what goes into a computer cluster,” said Eric Godat, the team lead for research and data science in SMU’s IT organization. “The mini-cluster is an effective teaching tool for how all this stuff really works — it lets students experiment with stripping the wires, managing a parallel file system, reimaging cards, and deploying cluster software.”
Price vs. Performance
Any given AI workload would likely run better on a single GTX 980 (2,048 cores on one chip) than on 16 Jetson Nano GPUs spread across 16 boards, but the latter is a much better, if still simplistic, simulation of some of the scaling challenges full-scale supercomputing engineers deal with on the job.
Nvidia’s blog post references the idea of upgrading the current 16-board system with Jetson Orin Nano hardware. The performance boost from any such jump would be considerable. As we’ve previously detailed, Orin Nano offers six Cortex-A78AE CPU cores at 1.5GHz and 512 Ampere GPU cores with 16 tensor cores. Jetson Nano is a comparative shrimp with its four Cortex-A57 CPU cores and 128 Maxwell cores. Orin Nano is more expensive than Jetson Nano, however, at $199 versus $129.
Orin Nano’s performance improvement ought to be much larger than the increase in price, but we hope Nvidia brings a still lower-cost Orin to the market in this space. A $129 Orin Nano with 256 Ampere cores and, say, eight tensor cores would still be a huge upgrade.
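For a back-of-the-envelope sense of that trade-off, here is a small sketch that compares GPU cores per dollar using the prices and core counts quoted above. Treat it as an illustration only: Ampere and Maxwell cores are not equivalent, so cores per dollar says nothing definitive about real performance, and the $129 Orin is our hypothetical, not an announced product. The function name is ours.

```python
def cores_per_dollar(gpu_cores, price_usd):
    """GPU cores per dollar of board price; a crude value metric."""
    return gpu_cores / price_usd


jetson_nano = cores_per_dollar(128, 129)        # 128 Maxwell cores at $129
orin_nano = cores_per_dollar(512, 199)          # 512 Ampere cores at $199
hypothetical_orin = cores_per_dollar(256, 129)  # hypothetical lower-cost Orin

print(f"Jetson Nano:       {jetson_nano:.2f} cores per dollar")
print(f"Orin Nano:         {orin_nano:.2f} cores per dollar")
print(f"Hypothetical Orin: {hypothetical_orin:.2f} cores per dollar")
print(f"Orin Nano vs. Jetson Nano: {orin_nano / jetson_nano:.2f}x")
```

Even before accounting for the newer architecture and the tensor cores, Orin Nano offers four times the GPU cores for roughly 1.5 times the price.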
At the same time, Nvidia has little reason to cut prices. Right now, the Jetson Nano really only competes with itself. While there are some other ARM-based boards that are compatible with accelerators, the Jetson Nano is the only product of its type in its price class with an onboard GPU.
The students will be showing off their mini cluster at the SC22 supercomputing conference in Dallas. This year, SC22 runs Nov. 13-18.