An optical neural network at 50 zJ per operation? No, but it's still a good idea




Image credit: BeeBright / Getty Images

Artificial intelligence (AI) has experienced a revival of fairly significant proportions over the last decade. We've gone from AI being mostly useless to letting it ruin our lives in obscure and opaque ways. We've even handed artificial intelligence the job of crashing our cars for us.

AI experts will tell us that we simply need bigger neural networks and then the cars will probably stop crashing. You can get there by throwing more graphics cards at an AI, but the power consumption becomes excessive. The ideal solution would be a neural network that can process and transfer data at near-zero energy cost, which is what optical neural networks may offer.

To give you an idea of the scale of the energies we are talking about here, a good GPU uses about 20 picojoules (1 pJ is 10⁻¹² J) for each multiply-and-accumulate operation. An integrated circuit designed specifically for this purpose can cut that to about 1 pJ. But if a team of researchers is right, an optical neural network could reduce that number to an astonishing 50 zeptojoules (1 zJ is 10⁻²¹ J).
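To put those figures side by side, here is a quick back-of-the-envelope comparison in Python. The workload of 10¹² multiply-and-accumulate operations per inference is an assumption of mine, chosen only to make the scales tangible.

```python
# Back-of-the-envelope comparison of the three per-operation energies quoted
# above. The operation count per inference is an illustrative assumption.
energies = {
    "GPU":     20e-12,   # 20 pJ per multiply-and-accumulate
    "ASIC":    1e-12,    # ~1 pJ on a purpose-built chip
    "optical": 50e-21,   # 50 zJ claimed for the optical scheme
}
ops_per_inference = 1e12  # assumed workload size

for name, joules_per_op in energies.items():
    total = joules_per_op * ops_per_inference
    print(f"{name:8s}: {total:.1e} J per inference")
# GPU: 2.0e+01 J, ASIC: 1.0e+00 J, optical: 5.0e-08 J
```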

How did the researchers come to this conclusion?

In layers, like onions

Let's start with an example of how a neural network works. A set of inputs is distributed over a set of neurons. Each input to a neuron is weighted and summed, and then the output of each neuron is given a boost. The strongest signals are amplified more than the weak ones, which exaggerates the differences between them. This combination of multiplication, addition, and enhancement happens in every single neuron, and the neurons are layered, with the output of one layer becoming the input to the next. As the signals propagate through the layers, this structure amplifies some and suppresses others.
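As a concrete illustration of that layered multiply-add-boost structure, here is a minimal sketch in Python/NumPy. The layer sizes and the choice of ReLU as the boost function are my own assumptions for the example, not details from the paper.

```python
# Minimal sketch of a layered neural network: weighted sum, then a nonlinear
# "boost", repeated layer by layer.
import numpy as np

def layer(inputs, weights, biases):
    pre = weights @ inputs + biases   # multiply each input by a weight, then sum
    return np.maximum(pre, 0.0)       # boost: strong signals pass, weak ones are cut

rng = np.random.default_rng(1)
x = rng.normal(size=8)                              # the input vector
w1, b1 = rng.normal(size=(16, 8)), np.zeros(16)     # layer 1 parameters
w2, b2 = rng.normal(size=(4, 16)), np.zeros(4)      # layer 2 parameters

hidden = layer(x, w1, b1)    # output of layer 1...
output = layer(hidden, w2, b2)  # ...becomes the input of layer 2
print(output)
```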

For this system to perform a useful calculation, the weights on all the inputs of all the layers must be set ahead of time, along with the parameters of the boost function (more properly, the nonlinear function). These weights are generally set by giving the neural network a set of training data to work on. During training, the weights and function parameters shift toward good values through repeated failures and occasional successes.
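To show what "shifting toward good values through repeated failures" looks like in practice, here is a toy gradient-descent loop for a single linear neuron. The task, data, and learning rate are illustrative assumptions and have nothing to do with the optical hardware.

```python
# Toy weight training: repeatedly nudge the weights to reduce the error on a
# small regression task.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))            # training inputs
y = x @ np.array([0.5, -1.0, 2.0])       # targets generated by a "true" weighting
w = np.zeros(3)                          # weights start out uninformative

for _ in range(500):
    pred = x @ w
    grad = 2 * x.T @ (pred - y) / len(x) # gradient of the mean squared error
    w -= 0.1 * grad                      # small correction after each failure
print(w)                                 # converges toward [0.5, -1.0, 2.0]
```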

There are two fundamental consequences here. First, a neural network requires a lot of neurons to deal with complex problems. Second, a neural network must be able to adjust its parameters as it accumulates new data. This is where our theoretical optical neural network flexes its nascent muscles.

All-optical AI

In the optical neural network, the inputs are light pulses that get split up. The weights are set by changing the brightness. If the weights are fixed in a physical material, they often cannot be changed afterward, which is undesirable. In the researchers' scheme, however, the weights come from a second set of optical pulses, which makes the system far more flexible.

At a single neuron, all the optical pulses arrive together and are summed through interference. The interfering pulses strike a photodetector, which is where the multiplication happens. Then, whatever boost function we like can be applied electronically to the photodetector's electrical output. The final value this produces is emitted as light and sent on to the next layer of the neural network.
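That multiplication-by-photodetection step can be sketched in a few lines. The version below is my own toy model of homodyne-style detection, in which two field amplitudes interfere on a 50/50 beam splitter and the difference between the two detector signals comes out proportional to their product; it is a simplification, not the paper's full scheme.

```python
# Toy model: multiplying a signal amplitude by a weight amplitude using
# interference and photodetection.
import numpy as np

def photoelectric_multiply(signal, weight):
    out_plus  = (signal + weight) / np.sqrt(2)   # 50/50 beam splitter outputs
    out_minus = (signal - weight) / np.sqrt(2)
    # Photodetectors measure intensity (|field|^2); subtracting the two
    # currents leaves 2 * signal * weight, i.e. the multiplication.
    return out_plus**2 - out_minus**2

x = np.array([0.3, -1.2, 0.7])    # input pulses
w = np.array([0.5,  0.4, -2.0])   # weight pulses
products = photoelectric_multiply(x, w)
print(products, 2 * x * w)        # identical; sum these, then apply the boost electronically
```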

The interesting thing about this is that the weight is an optical pulse that can be continuously adjusted, which should lead to an optical neural network with the flexibility of a computer-based neural network, but working much more quickly.

Remarkably, the researchers propose to do all this in free space rather than on optical integrated circuits. Their argument is that combinations of diffractive optical elements – elements that manipulate optical beams in complex ways – and photodiode arrays are much more precise and scalable than photonic circuits that we can print on chips.

They may be right. Manufacturing reliability is the bane of photonic circuits at the moment. I cannot imagine successfully creating a large-scale photonic circuit with current techniques. So I am willing to accept this argument, and I even agree that scaling to millions of neurons across multiple layers is feasible. So far, so good.

You have to count all the energy

But I don't buy the energy argument at all. The researchers calculated the amount of energy the optical pulses need in order to keep the output of a single neuron's detection stage accurate. That yields the impressive-sounding figure of 50 zJ per operation.

That may be right, but it ignores many important things. How much energy does the boost function take? How much energy does it take to turn electrons back into light? The researchers have put numbers to some of these, but their calculations essentially tell us that the energy per operation is hard to pin down, because the required electronic components don't yet exist.

Instead, the big gain is in the energy saved on data transport. A large neural network may be spread across multiple GPUs. That incurs two significant energy costs: shuttling data around within each GPU, and shuttling data between GPUs. The optical architecture virtually eliminates this cost while accommodating a larger number of gates.
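To get a rough sense of why data movement matters so much: commonly quoted figures for 45 nm silicon (Horowitz, ISSCC 2014) put a 32-bit floating-point multiply near 3.7 pJ but a 32-bit read from off-chip DRAM near 640 pJ. The network size and reuse factor in the sketch below are assumptions of mine.

```python
# Rough illustration of arithmetic energy vs. data-movement energy for one
# inference pass. Per-operation figures are commonly quoted 45 nm estimates
# (Horowitz, ISSCC 2014); the parameter count and reuse are assumptions.
E_FP_MULT = 3.7e-12    # J per 32-bit floating-point multiply
E_DRAM    = 640e-12    # J per 32-bit word read from off-chip DRAM

n_weights = 1e9        # assumed: a billion-parameter network
fetches   = n_weights  # worst case: every weight fetched from DRAM once

compute  = n_weights * E_FP_MULT
movement = fetches * E_DRAM
print(f"arithmetic:    {compute:.3f} J")   # ~0.004 J
print(f"data movement: {movement:.3f} J")  # ~0.640 J -- dominates by ~170x
```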

Even in terms of size, the optics won't be any worse than a box full of GPUs. I think you could fit a good-sized optical neural network on a medium-sized desk. It might, however, pay to keep the lights off while it's running.

So where is all this going? I expect the researchers will demonstrate this neural network within the next couple of years. It will consume much more energy per operation than the photodiode figure alone suggests: once you account for the supporting hardware, the total energy cost will exceed 1 pJ per operation. In the end, I think they will demonstrate excellent scalability but no significant energy savings.

Physical Review X, 2019. DOI: 10.1103/PhysRevX.9.021032
