Meet the Cerebras WSE, the world's biggest chip at over 56 times the size of an NVIDIA V100




While our friendly chip giants are busy talking up double-digit performance gains, a startup called Cerebras Systems has leapfrogged them all and presented a prototype offering an absolutely incredible 5,600% increase in transistor count over the biggest chip currently available: the NVIDIA V100. Going from 21.1 billion to 1.2 trillion transistors, the startup has solved key engineering problems that no one else has been able to crack and has built the world's first wafer-scale processor.
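For the curious, the 5,600% figure checks out with a bit of back-of-the-envelope arithmetic using the transistor counts quoted above:

```python
# Quick check of the ~5,600% transistor-count jump quoted above.
v100_transistors = 21.1e9   # NVIDIA V100: 21.1 billion transistors
wse_transistors = 1.2e12    # Cerebras WSE: 1.2 trillion transistors

increase_pct = (wse_transistors - v100_transistors) / v100_transistors * 100
print(f"WSE has {wse_transistors / v100_transistors:.1f}x the transistors of the V100")
print(f"That is an increase of roughly {increase_pct:,.0f}%")
# -> ~56.9x, i.e. an increase of roughly 5,587%, or ~5,600%
```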

The Cerebras Systems Wafer Scale Engine (WSE): the world's first trillion-transistor chip

The Cerebras Wafer Scale Engine is the world's first wafer-scale processor. You may be wondering why no one else has done something so obvious before. The reason is that the core engineering challenge of communicating across scribe lines had never been overcome. You see, current lithography equipment is designed to expose relatively small dies onto a wafer; it cannot print a single processor spanning the entire wafer. That means scribe lines will exist one way or another, and the individual dies have to be able to talk to each other across those lines. That is exactly what Cerebras has solved to claim the throne of the first trillion-transistor processor.

The Cerebras WSE covers an area of 46,225 mm² and houses 1.2 trillion transistors. All cores are optimized for AI workloads, and the chip draws 15 kW of power. Since all of that power also has to be cooled, the cooling system must be just as revolutionary as the power-delivery system. Judging from their comments on vertical cooling, I suspect an immersive cooling setup with fast-moving Freon would be about the only thing capable of taming this beast, and the power-delivery system will have to be incredibly robust as well. According to Cerebras, the chip is roughly 1,000 times faster than traditional systems, simply because communication happens across the scribe lines instead of jumping through hoops (interconnects, DIMMs and so on).
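The "over 56 times the size of a V100" headline and the cooling concern both fall out of these numbers. Here is a quick sketch, assuming the V100's published die area of 815 mm²:

```python
# Back-of-the-envelope numbers behind the "56x larger than a V100" headline
# and the cooling challenge mentioned above.
wse_area_mm2 = 46_225        # Cerebras WSE die area
v100_area_mm2 = 815          # NVIDIA V100 die area (published spec)
wse_power_w = 15_000         # 15 kW quoted power draw

print(f"Area ratio vs. V100: {wse_area_mm2 / v100_area_mm2:.1f}x")          # ~56.7x
print(f"Average power density: {wse_power_w / wse_area_mm2:.2f} W/mm^2")    # ~0.32 W/mm^2
```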

The WSE contains 400,000 Sparse Linear Algebra (SLA) cores. Each core is flexible, programmable and optimized for the computations that underpin most neural networks. Programmability ensures the cores can run every algorithm in the constantly changing field of machine learning. The WSE's 400,000 cores are connected via the Swarm communication fabric in a 2D mesh with 100 Pb/s of bandwidth. Swarm is a massive on-chip communication fabric that delivers exceptional bandwidth and low latency at a fraction of the power drawn by the traditional techniques used to cluster graphics processing units. It is fully configurable: the software configures all the cores of the WSE to support the exact communication required to train the model specified by the user. Swarm thus provides a unique, optimized communication path for each neural network.
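To make the 2D-mesh idea concrete, here is a minimal, purely illustrative sketch of how neighbor links work in such a topology; the grid dimensions are hypothetical and not Cerebras' actual layout:

```python
# A minimal sketch of 2D-mesh neighbor routing, the topology a fabric like
# Swarm uses to connect cores. Grid dimensions are illustrative only.
from typing import List, Tuple

GRID_W, GRID_H = 800, 500  # hypothetical grid: 400,000 cores total

def mesh_neighbors(x: int, y: int) -> List[Tuple[int, int]]:
    """Return the north/south/east/west neighbors of core (x, y) in a 2D mesh."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(nx, ny) for nx, ny in candidates if 0 <= nx < GRID_W and 0 <= ny < GRID_H]

# A corner core has 2 links, an edge core 3, an interior core 4.
print(mesh_neighbors(0, 0))      # [(1, 0), (0, 1)]
print(mesh_neighbors(400, 250))  # four interior neighbors
```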

The WSE has 18 GB of on-chip memory, all accessible within a single clock cycle, and provides 9 PB/s of memory bandwidth. That is 3,000 times more capacity and 10,000 times more bandwidth than its closest competitor. More cores with more local memory enable fast, flexible computation at lower latency and with less energy.
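Assuming the baseline for those ratios is the V100's 6 MB of on-chip L2 cache and 900 GB/s of HBM2 bandwidth, the 3,000x and 10,000x figures line up neatly:

```python
# Sanity-checking the "3,000x capacity / 10,000x bandwidth" claims, assuming
# the baseline is the V100's 6 MB of on-chip L2 cache and 900 GB/s of HBM2
# bandwidth (published V100 specs).
wse_onchip_bytes = 18e9          # 18 GB of on-chip SRAM
wse_bw_bytes_s = 9e15            # 9 PB/s memory bandwidth

v100_onchip_bytes = 6e6          # 6 MB L2 cache
v100_bw_bytes_s = 900e9          # 900 GB/s HBM2

print(f"On-chip capacity ratio: {wse_onchip_bytes / v100_onchip_bytes:,.0f}x")  # 3,000x
print(f"Memory bandwidth ratio: {wse_bw_bytes_s / v100_bw_bytes_s:,.0f}x")      # 10,000x
```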

This should allow a considerable acceleration of artificial intelligence workloads, cutting training times from months down to mere hours. It is truly revolutionary, no doubt, provided the company can keep its promise and start shipping to customers soon. The Cerebras WSE is manufactured on a 300 mm TSMC wafer using their 16 nm process, which means it is state-of-the-art technology just a single node behind giants like NVIDIA. Of course, with 84 interconnected dies housing more than 400,000 cores, the process it is built on hardly matters.
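Taking the claimed ~1,000x speedup at face value, the "months to hours" framing is easy to reproduce; this is purely illustrative arithmetic, not a benchmark:

```python
# Illustrative only: what the claimed ~1,000x speedup would mean for a
# training run that currently takes about two months.
baseline_hours = 60 * 24        # ~2 months of wall-clock training
speedup = 1_000                 # claimed speedup over traditional systems

print(f"{baseline_hours} hours -> {baseline_hours / speedup:.1f} hours")  # ~1.4 hours
```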

The performance and yield of the Cerebras WSE are going to be very interesting. Here's the thing: if you use the entire wafer as the die, you either get 100% yield if the design can absorb defects, or 0% if it cannot. Clearly, since prototypes have been made, the design can absorb defects. In fact, the CEO stated that the design budgets for roughly 1% to 1.5% of the functional area being defective and that the microarchitecture simply reconfigures itself around the affected cores, with redundant cores placed on the chip to minimize any performance loss. There is no word on binning at the moment, but it goes without saying that this will not exactly be an affordable design.
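For a rough sense of what that 1% to 1.5% defect budget means in practice, assuming it applies uniformly across the 400,000 cores:

```python
# Rough defect budget implied by the 1%-1.5% figure quoted by the CEO:
# how many of the 400,000 cores would need to be routed around.
total_cores = 400_000

for defect_rate in (0.01, 0.015):
    bad_cores = int(total_cores * defect_rate)
    print(f"{defect_rate:.1%} defects -> ~{bad_cores:,} cores to disable "
          f"and replace with on-die spares")
# 1.0% -> ~4,000 cores, 1.5% -> ~6,000 cores
```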

We were also told that the company had to develop its own manufacturing and packaging techniques, since no existing tools are designed to handle a processor the size of a wafer. The software stack, too, had to be written to handle more than 1 trillion transistors in a single processor. Cerebras Systems is clearly a company with incredible potential, and after the splash it made at Hot Chips, we cannot wait to see what these wafer-scale engines do in testing.


