Deep learning is such a powerful tool precisely because neural nets form the technology’s core machinery.
A traditional neural network is made up of many stacked layers of (technically speaking) simple computational nodes.
The job of these computational nodes is to work with each other and find any patterns in the given data.
These layers are, in essence, discrete.
Moreover, these discrete layers are exactly what keeps neural nets from effectively modeling processes that are continuous.
Now, a group of researchers has come up with a new design for neural nets that scraps the layers entirely.
Duvenaud, an AI researcher trying to overcome some challenges in the field of AI, recently noted that his team actually did not develop the new idea.
He says that his team was probably the first one to implement the new idea in a way that was generalizable.
In order to understand how such an implementation is possible, let’s briefly take a walk through what the stacked layers in neural nets actually do.
Currently, the most common technique used to take advantage of what neural networks have to offer is supervised learning.
And the most widely used way for researchers and engineers to train a neural network involves feeding it a large amount of labeled data.
So let’s assume for a second that someone wanted to build a system that could recognize different animals.
The way to go about it would be to feed the neural network animal pictures, each paired with its correct animal name.
Now, if one looked under the hood, one would find that the neural network begins to solve a rather crazy puzzle.
A mathematical puzzle.
That is, the neural network looks at every picture-and-name pair that is fed to it and tries to figure out a formula that reliably turns a given image into its correct category.
And once the neural network has cracked that mathematical puzzle, it can reuse the same formula time and time again to identify and categorize any new animal image.
That is, the majority of the time.
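To make the puzzle a little more concrete, here is a minimal sketch of learning from labeled pairs. It is not how a real image system works: the two numeric features standing in for a picture, the toy data, and the perceptron-style update rule are all assumptions chosen to keep the example small.

```python
# Toy labeled data: (features, label). The two features are hypothetical
# stand-ins for an image: (ear_floppiness, whisker_length), both in [0, 1].
labeled = [
    ((0.9, 0.2), "dog"),
    ((0.8, 0.1), "dog"),
    ((0.7, 0.3), "dog"),
    ((0.2, 0.9), "cat"),
    ((0.1, 0.8), "cat"),
    ((0.3, 0.7), "cat"),
]


def predict(weights, bias, features):
    """Apply the learned formula: a weighted sum followed by a threshold."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return "dog" if score > 0 else "cat"


def train(data, epochs=20, lr=0.1):
    """Perceptron-style training: adjust the formula after every mistake."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in data:
            if predict(weights, bias, features) != label:
                sign = 1.0 if label == "dog" else -1.0
                weights = [w + lr * sign * x for w, x in zip(weights, features)]
                bias += lr * sign
    return weights, bias


weights, bias = train(labeled)

# The same formula is then reused on an example it has never seen.
new_animal = (0.85, 0.15)
guess = predict(weights, bias, new_animal)
```

Once `train` has settled on a set of weights, `predict` is the reusable formula: it is applied unchanged to every new example.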
However, asking the neural network to find a single formula that describes the complete picture-to-name transformation would be overly broad.
More importantly, such a broad formula would result in a very low-accuracy model.
As an analogy, expecting a neural network to categorize all types of animals correctly with a single formula would be like using a single rule to differentiate dogs from cats.
One could, with reasonable accuracy, say that dogs tend to have floppy ears.
Of course, a lot of dogs simply do not have floppy ears.
In fact, some cats have floppy ears.
Hence, if a neural network used that one rule to differentiate between dogs and cats, it would produce lots of false positives and false negatives.
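A quick sketch shows how the floppy-ears rule goes wrong. The six animals below are made up for illustration, and "positive" is taken to mean "classified as a dog", which is our own convention here.

```python
# Hypothetical animals: (has_floppy_ears, true_label).
animals = [
    (True, "dog"),
    (True, "dog"),
    (False, "dog"),  # a dog with upright ears
    (False, "cat"),
    (False, "cat"),
    (True, "cat"),   # a cat with floppy ears
]


def single_rule(has_floppy_ears):
    """The one-rule classifier: floppy ears means dog."""
    return "dog" if has_floppy_ears else "cat"


# False positive: the rule says "dog" but the animal is a cat.
false_positives = sum(
    1 for floppy, label in animals
    if single_rule(floppy) == "dog" and label == "cat"
)
# False negative: the rule says "cat" but the animal is a dog.
false_negatives = sum(
    1 for floppy, label in animals
    if single_rule(floppy) == "cat" and label == "dog"
)
accuracy = sum(
    1 for floppy, label in animals if single_rule(floppy) == label
) / len(animals)
```

On this toy list the rule makes one mistake of each kind and gets four of the six animals right.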
As it turns out, this is exactly where the stacked layers of neural networks come in.
These layers break the transformation up into small steps and allow the network to find a series of formulas, each of which describes one stage of the process.
In simpler terms, the first layer might take in all the available pixels and use one formula to pick out the ones most relevant to differentiating between dogs and cats.
Then another layer might use a different formula to construct larger patterns from groups of pixels and figure out whether a given image has ears or whiskers.
In this way, each subsequent layer takes responsibility for identifying increasingly complex features of the given animal.
This continues until the final layer decides whether an image shows a dog or a cat on the basis of all the preceding calculations.
The end result is a fairly sophisticated model whose predictions are, most of the time, more accurate.
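The idea of stacked layers can be sketched as a chain of small formulas, each one fed the output of the one before it. The weights below are hand-picked placeholders rather than learned values, and the ReLU nonlinearity is our assumption; the point is only the step-by-step structure.

```python
def dense(weights, biases):
    """Build one layer: a weighted sum per node, followed by a ReLU."""
    def layer(inputs):
        return [
            max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)
        ]
    return layer


# Two stacked layers with hypothetical, hand-picked weights.
layers = [
    dense([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # 2 inputs -> 2 features
    dense([[1.0, 1.0]], [-0.5]),                   # 2 features -> 1 score
]


def forward(x, layers):
    """Pass the input through each layer in turn, one discrete step at a time."""
    for layer in layers:
        x = layer(x)
    return x


score = forward([2.0, 1.0], layers)
```

Each pass through the loop in `forward` is one discrete stage of the transformation, which is exactly the property the rest of this article is about.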
As some of our readers can already imagine, the stacked-layer approach to solving problems has served the field of artificial intelligence very well.
However, it has one major drawback.
If someone wanted to model anything that changes continuously over time, one would have no choice but to chunk it up into discrete steps.
What does that translate into in the practical sense?
Practically, if we take the example of health records, it would mean grouping a patient’s medical records into months, years, or some other finite periods.
It is not hard to see how such an approach would produce inexact predictions.
To take an example, if someone visited the doctor on February 10 and then again on October 12, the data from the two visits would be grouped together under the same year.
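Here is a small sketch of that grouping problem. The calendar year 2018 is an arbitrary assumption, since only the month and day are given above.

```python
from datetime import date

# The two visits from the example; the year is a hypothetical choice.
visits = [date(2018, 2, 10), date(2018, 10, 12)]

# Discrete view: group the visits into yearly buckets.
buckets = {}
for visit in visits:
    buckets.setdefault(visit.year, []).append(visit)

# Continuous view: keep the actual time that passed between the visits.
gap_days = (visits[1] - visits[0]).days
```

Both visits land in the same yearly bucket, so a model that only sees the bucket has no way of knowing that they were 244 days apart.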
Taking that further, the best way to model the real world as accurately as possible would be to add more layers in order to increase the granularity.
Some might ask what is stopping anyone from breaking the patient’s health records into days or even hours.
But even then, a patient could visit the doctor twice within the same 24 hours, and the two visits would again be lumped together.
Taken to the extreme, this suggests that the most accurate neural network for such a job would have an infinite number of stacked layers, with which it could model infinitesimal step-changes.
The obvious question that arises from all of this is whether such a neural network is practical.
We are aware of the fact that some readers might have started to say “this sounds familiar”.
And it is.
Because with such an approach, one arrives at exactly the type of problem that calculus was invented to solve.
That is, calculus offers neat equations that allow one to calculate a series of changes across infinitesimal steps.
In even simpler terms, calculus saves us from the nightmare of modeling continuous change in discrete units.
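A small numerical sketch may help here. Suppose, purely as an assumption for illustration, that each layer nudges a value h by a small step h + dt * f(h), and that the underlying continuous process is dh/dt = -h, for which calculus gives the exact answer directly. Adding layers then means taking more, smaller steps.

```python
import math


def f(h, t):
    """Hypothetical continuous dynamics: dh/dt = -h."""
    return -h


def stacked_steps(h0, n_layers, t_end=1.0):
    """Approximate the continuous change with n_layers discrete steps."""
    h, dt = h0, t_end / n_layers
    for k in range(n_layers):
        h = h + dt * f(h, k * dt)  # one layer = one small step
    return h


# Calculus gives the exact value at t = 1 for h(0) = 1.
exact = math.exp(-1.0)

# Error of the discrete approximation for a growing number of layers.
errors = {n: abs(stacked_steps(1.0, n) - exact) for n in (1, 10, 100, 1000)}
```

The error keeps shrinking as the number of steps grows, but it only vanishes in the limit of infinitely many steps, which is the limit that calculus handles with a single equation.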
And this is where we have to mention the magic that Duvenaud and his collaborators have brought forward in their new paper.
In essence, Duvenaud’s paper replaces the stacked layers with neat calculus equations.
As some might have already figured out, once the layers are replaced with calculus equations, the result is not really a network anymore.
In such a scenario, there are no nodes.
And there are no connections.
There is only a single, continuous slab of computation.
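To give a rough feel for what a continuous slab of computation could look like, here is a minimal sketch and not the paper’s actual implementation. The tiny `dynamics` function, its two parameters, and the fixed-step Runge-Kutta solver are all assumptions made for illustration.

```python
import math


def dynamics(h, t, theta):
    """Hypothetical learned rule for how the state changes: dh/dt = tanh(w*h + b)."""
    w, b = theta
    return math.tanh(w * h + b)


def ode_forward(h0, theta, t0=0.0, t1=1.0, steps=50):
    """Integrate dh/dt from t0 to t1 with a fixed-step RK4 solver."""
    h, t = h0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        k1 = dynamics(h, t, theta)
        k2 = dynamics(h + 0.5 * dt * k1, t + 0.5 * dt, theta)
        k3 = dynamics(h + 0.5 * dt * k2, t + 0.5 * dt, theta)
        k4 = dynamics(h + dt * k3, t + dt, theta)
        h += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += dt
    return h


theta = (0.5, 0.1)  # placeholder parameters, not trained values

# The state can be read out at any point in time, not only at layer boundaries.
h_early = ode_forward(1.0, theta, t1=0.37)
h_final = ode_forward(1.0, theta, t1=1.0)
```

Note that `steps` here is a setting of the numerical solver rather than a number of layers: the model itself is just the single function `dynamics`, and the output can be requested at any time `t1`.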
Putting all of that aside for a brief moment and sticking with established conventions, the AI researchers working with Duvenaud named the new design an ordinary differential equations net, or ODE net.
We’re not going to argue with those who say that this group of researchers needs to work on its branding a bit more.
Now, we know that this is where a lot of readers might have a headache or two.
So we’ll leave now.
But if you want to clear your head, do read more on Duvenaud and his methods to tie everything together with easy-to-understand analogies.