My last article presented an in-depth analysis of Neural Turing Machines (NTMs), which are generally regarded as the foundation of the Memory-Augmented Neural Networks research field. Other ideas have sprung up since, which may ultimately be better suited to an eventual computational complexity analysis. In this article, I will give a (somewhat short and shallow, I'm afraid) analysis of several Memory-Augmented Neural Network architectures. More precisely, I will give a short description of each, along with pros and cons with respect to my objectives:
- Formalism. Be able to treat the inputs as formal languages, in a classification setting. Put otherwise, I want the inputs to be words that do or do not belong to the set (language) to learn, and the labels to be whether they belong to it. The goal is to learn a machine that accepts words from the language and rejects all others. (I would like to formalize this in a future post, along with the implications for computational complexity theory.)
- Configurability. In other words, have an arbitrary memory size and number of steps. I want to be able to change those parameters at will to control computational complexity.
- Interpretability. The more we can understand what is going on in the network, the better we can analyze its complexity.
- Trainability. It should be possible for me to obtain good algorithms, at least on small instances. It should also be possible to learn with supervised learning.
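To make the first objective concrete, here is a toy sketch of what such a formal-language classification dataset could look like. The language {aⁿbⁿ : n ≥ 1} is my own choice of example; the sampling scheme is just one plausible way to mix positive and negative words.

```python
import random

def label(word):
    # Membership test for the toy language {a^n b^n : n >= 1}.
    n = len(word) // 2
    return len(word) >= 2 and word == "a" * n + "b" * n

def sample(max_len=10):
    # Half the time, draw a word from the language; otherwise a random a/b string
    # (which is occasionally a member too -- label() settles it either way).
    if random.random() < 0.5:
        n = random.randint(1, max_len // 2)
        return "a" * n + "b" * n
    return "".join(random.choice("ab") for _ in range(random.randint(1, max_len)))

# Words paired with membership labels: the supervised setting described above.
dataset = [(w, label(w)) for w in (sample() for _ in range(1000))]
```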
Disclaimer: what I say here may not always be true; it is the result of very fast readings of the papers.
Architectures based on attention
Differentiable Neural Computers [paper]
- Description. The differentiable computer is very close to the NTM (and by more or less the same authors!). In fact, we could say it is essentially an NTM with two add-ons: a temporal link matrix recording the order in which the memory cells were written to, and a measure of how much each memory cell has been used so far.
- Formalism. It has been tried especially on real world problems, such as Question&Answer, but it has also been tried successfully on graph traversal, which is a very algorithmic task. I think this kind of challenge could easily be turned into a decision problem.
- Configurability. Memory can be arbitrarily large. The usage measure is very appealing for my needs of measuring complexity.
- Interpretability. The authors are able to plot what happens in the memory, which is awesome for interpretability. I am less sure about the differentiable computer because their examples are more on real-world applications, but with the NTM the authors were able to give pseudocode for a learned algorithm.
- Trainability. To obtain 98.8% accuracy on graph traversal, about one million training examples were needed. That is a lot.
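The two add-ons can be sketched in a few lines of numpy. This is a rough reading of the paper's update rules, ignoring the free gates and with variable names of my own choosing; only hard (one-hot) writes are shown.

```python
import numpy as np

N = 4                      # number of memory cells
u = np.zeros(N)            # usage: how much each cell has been written so far
p = np.zeros(N)            # precedence: which cell was written last
L = np.zeros((N, N))       # temporal link matrix: L[i, j] ~ "i written right after j"

def write_step(w):
    """Update usage, links, and precedence for one write weighting w (entries in [0,1])."""
    global u, p, L
    u = u + w - u * w                      # cells written to become more "used"
    L = (1 - w[:, None] - w[None, :]) * L  # decay links touching the written cells
    L += np.outer(w, p)                    # link newly written cells to the previous write
    np.fill_diagonal(L, 0)                 # a cell is never its own successor
    p = (1 - w.sum()) * p + w              # move precedence onto the written cells

write_step(np.array([1.0, 0.0, 0.0, 0.0]))
write_step(np.array([0.0, 1.0, 0.0, 0.0]))
# After these two hard writes, L[1, 0] is 1: cell 1 was written right after cell 0.
```

The usage vector `u` is exactly the kind of quantity I would like to read off as a memory-complexity measure.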
(End-to-end) Memory Networks [paper 1/ paper 2]
- Description. In Memory Networks, four functions are learned: one maps the input to an embedding, one specifies how to write a memory cell, one combines the embedding and the memory into an output embedding, and one transforms this embedding into an output. The difference between Memory Networks and End-to-end Memory Networks is that the latter need less supervision to be trained.
- Formalism. In these networks, the memory is only written to in order to help the next inference. It does not perform algorithms per se. In fact, it was not designed for that purpose.
- Configurability. The memory size is configurable, but the number of time steps is fixed at 1.
- Interpretability. It cannot be turned into an algorithm.
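The four learned functions from the description can be sketched as four plain matrices (a single "hop", in my own naming; the real model learns these by backpropagation rather than drawing them at random):

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab = 8, 20

A = rng.normal(size=(vocab, d))   # 1: maps the input to a query embedding
B = rng.normal(size=(vocab, d))   # 2: specifies how an item is written as a memory key
C = rng.normal(size=(vocab, d))   # 3: value embedding, combined with the query below
W = rng.normal(size=(d, vocab))   # 4: transforms the output embedding into an output

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def one_hop(query_id, memory_ids):
    q = A[query_id]                 # embed the query
    att = softmax(B[memory_ids] @ q)  # soft attention over the memory cells
    o = att @ C[memory_ids]         # read: attention-weighted sum of values
    return softmax((q + o) @ W)     # answer distribution over the vocabulary

probs = one_hop(3, [1, 4, 7, 2, 9])
```

Note that the memory is only read to produce the answer; nothing like a program state is ever written back, which is why I say it does not perform algorithms.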
Pointer Networks [paper]
- Description. This one focuses on combinatorial problems, such as the Traveling Salesman Problem. It does not have a generic memory; rather, it can output an arbitrary number of pointers to discrete values in the input.
- Formalism. It does not correspond very much to what I am looking for.
- Configurability. No memory!
- Interpretability. It does not seem much more interpretable than vanilla neural networks.
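The pointing mechanism itself is simple: an attention score per input position, normalized into a distribution that *is* the output. A sketch, with randomly drawn stand-ins for the learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8

# Stand-ins for the learned parameters of the pointing mechanism.
W1 = rng.normal(size=(d, d))   # transforms the encoder states
W2 = rng.normal(size=(d, d))   # transforms the current decoder state
v = rng.normal(size=d)         # scoring vector

def point(enc_states, dec_state):
    """Return a probability distribution over input positions -- the 'pointer'."""
    scores = np.tanh(enc_states @ W1 + dec_state @ W2) @ v
    e = np.exp(scores - scores.max())
    return e / e.sum()

enc = rng.normal(size=(6, d))          # one encoder state per input element
dist = point(enc, rng.normal(size=d))
# dist has one entry per input position, not per vocabulary word --
# which is why the output size can follow the input size.
```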
Architectures based on data structures
Stack-augmented Recurrent Neural Networks [paper]
- Description. Here we are interested in learning simple algorithmic patterns. It is simply a recurrent neural network that has access to a stack memory. It can incorporate, or not, the top token of its stack into the computation of an output. The network learns when it should push, pop, or do nothing.
- Formalism. The patterns to learn are described in the form of formal languages! Strangely, it uses only a stack and is therefore not Turing-complete.
- Configurability. There can be numerous stacks (making it Turing-complete in principle). But like an RNN, it only takes one time step per token of the input sequence.
- Interpretability. The way it works is easy to understand. It could probably be turned into a pushdown automaton. The paper has great figures about the inner workings of the algorithm.
- Trainability. It is hard to train with SGD, but other techniques are proposed.
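The push/pop/no-op mechanism is made differentiable by executing all three actions softly and mixing the results by their probabilities. A minimal sketch with scalar stack elements (top at index 0); the hard 0/1 actions below are just the easiest case to check by hand:

```python
import numpy as np

def stack_update(stack, action, value):
    """One soft stack step; action = (push, pop, no-op) probabilities.

    Each candidate next-state is computed as if the action were taken with
    certainty, then the three are blended. The deepest slot falls off on push.
    """
    push, pop, keep = action
    up = np.roll(stack, 1)
    up[0] = value                    # state after a hard push
    down = np.roll(stack, -1)
    down[-1] = 0.0                   # state after a hard pop
    return push * up + pop * down + keep * stack

s = np.zeros(4)
s = stack_update(s, (1.0, 0.0, 0.0), 5.0)   # hard push 5 -> top is 5
s = stack_update(s, (1.0, 0.0, 0.0), 7.0)   # hard push 7 -> top is 7, then 5
s = stack_update(s, (0.0, 1.0, 0.0), 0.0)   # hard pop  -> top is 5 again
```

With fractional action probabilities the stack contents become blends, which is exactly what makes end-to-end gradient training possible (and, per the paper, still hard with plain SGD).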
Hierarchical Attentive Memory Networks [paper]
- Description. Here the focus is on the memory structure, which is not just a matrix as in the other papers, but a whole binary tree. Just like in Stack-augmented Recurrent Neural Networks, it is simply an RNN (or LSTM) that incorporates its memory content into its output. But here, the network learns all the operations needed to search a binary tree. The gain is that memory accesses cost O(log n) instead of O(n).
- Formalism. It should be able to recognize formal languages just like Stack-augmented Neural Networks, with the advantage of being Turing complete.
- Interpretability. I think the gain in computational complexity will decrease interpretability. The neural network is not just a controller; there are tiny neural networks here and there to access the memory.
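The source of the O(log n) cost is easy to see in a hardened sketch: one small learned decision per tree level, instead of attending over every cell. The per-node decision vectors below are random stand-ins for HAM's tiny search networks (my simplification; the real model also learns to build and update the tree).

```python
import numpy as np

rng = np.random.default_rng(0)
depth, d = 3, 8
leaves = rng.normal(size=(2 ** depth, d))    # 2**depth = 8 memory cells at the leaves

# Stand-in for the learned per-node search networks: one decision vector
# per internal node, heap-indexed (root = 1, children of i = 2i and 2i+1).
node_w = rng.normal(size=(2 ** depth, d))

def access(query):
    """Descend from the root to one leaf: depth decisions, not a full scan."""
    node, lo, hi = 1, 0, len(leaves)         # current subtree covers leaves[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        go_right = float(query @ node_w[node]) > 0.0   # one tiny decision per level
        node = 2 * node + (1 if go_right else 0)
        lo, hi = (mid, hi) if go_right else (lo, mid)
    return lo                                # index of the selected memory cell

idx = access(rng.normal(size=d))
```

Each of those thresholded decisions is one of the "tiny neural networks here and there": the controller never sees the whole memory, which is precisely what worries me for interpretability.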
Architectures based on pointers
Neural Programmer-Interpreter [paper]
- Description. This one is rather complicated. It consists of an LSTM that acts as a router between shorter, fixed programs. It introduces the concept of weak generalization: an LSTM can generalize to unseen data, but only to inputs of the same length as the training examples. The NPI (and other MANNs) can achieve strong generalization by outputting the right answer even for longer inputs.
- Formalism. Some base execution traces must be given as inputs, which disqualifies it.
- Configurability. This is too far from what I want to do.
- Interpretability. It is likely to be easy to interpret, as the learned algorithm will be a sequence of short programs.
- Trainability. It needs less data, because the labels contain rich information. Also, curriculum learning is used, and it seems a must for MANNs. It consists of training with gradually increasing input lengths.
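The curriculum idea is simple enough to state as code: schedule the training examples by length. A sketch on a toy copy task (my choice of task; the copy task is a classic MANN benchmark):

```python
import random

def curriculum_batches(max_len, per_stage, sample_fn):
    """Yield training examples with gradually increasing input lengths."""
    for length in range(1, max_len + 1):
        for _ in range(per_stage):
            yield sample_fn(length)

def sample_copy(length):
    # Copy task: the target is the input sequence itself.
    s = [random.randint(0, 1) for _ in range(length)]
    return s, s

batches = list(curriculum_batches(5, 10, sample_copy))
# 50 examples total; lengths go 1, 1, ..., then 2, 2, ..., up to 5.
```

Strong generalization is then probed by evaluating on lengths beyond `max_len`, which the network never saw during training.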
Neural Random-Access Machines [paper]
Since Neural Random-Access Machines appear to be the most interesting, I will dedicate a whole article to them.