In this post I shall define in more formal terms what I am trying to accomplish with a MANN (memory-augmented neural network).

### Formal Languages

In formal language theory, decision algorithms (such as Turing machines) take sequences of discrete tokens as input, and end up either *accepting* or *rejecting* them. We call the discrete tokens *letters* (e.g. $a$, $b$, $c$), the set of possible tokens the *alphabet* $\Sigma$ (with $\Sigma^*$ the set of all possible sequences of letters from $\Sigma$, including the empty one, $\varepsilon$), and the sequences *words* (e.g. $w = abba$), of length $|w|$ ($|abba| = 4$). A set of words is called a *language* $L$, and is a subset of all possible words, i.e. $L \subseteq \Sigma^*$. We write $\bar{L}$ for the complement of $L$, i.e. all words from $\Sigma^*$ which are not in $L$. Of note, the alphabet and the word lengths are always finite, but a language is typically infinite.
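As a concrete sketch of these definitions in Python (the particular membership predicate, $a^nb^n$, is just an illustrative choice; the helper names are mine):

```python
from itertools import product

SIGMA = ("a", "b")  # the alphabet

def words_up_to(max_len):
    """Enumerate Sigma*, restricted to words of length <= max_len."""
    for n in range(max_len + 1):
        for letters in product(SIGMA, repeat=n):
            yield "".join(letters)

def in_language(w):
    """Membership predicate for the language {a^n b^n : n >= 0}."""
    half = len(w) // 2
    return len(w) % 2 == 0 and w == "a" * half + "b" * half

# The (finite slice of the) language, and everything else is in its complement.
L = [w for w in words_up_to(4) if in_language(w)]
# L == ["", "ab", "aabb"]
```

Of course $\Sigma^*$ itself is infinite; any program can only ever enumerate a finite slice of it, which is precisely the sampling difficulty discussed below.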

By accepting only some words from a specific alphabet, a Turing machine $M$ defines a language, which is exactly the set of words it accepts. We denote it $L(M)$ to emphasize that it comes from the Turing machine. But a language can also remain abstract, meaning that we can define a set of words without having a machine to distinguish them from other words. We will start from such an abstract, *target* language and try to learn a machine that accepts it.

Following traditional supervised machine learning nomenclature, data would come as tuples $(x, y) \in \mathcal{X} \times \mathcal{Y}$, where $\mathcal{X}$ is the input vector space and $\mathcal{Y}$ is the label space ($\{0, 1\}$ in binary classification). In our setting, we keep $\mathcal{Y} = \{0, 1\}$, but set $\mathcal{X} = \Sigma^*$. Given a target language $L = \{w \in \Sigma^* : P(w)\}$, where $P$ is a boolean function that emphasizes that the words in the language must have some property, data will come as the tuples $(w, \mathbb{1}[w \in L])$. Put otherwise, examples are words, and labels are whether they belong to the target language: $y = \mathbb{1}[w \in L]$.
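This labeling step is trivial to sketch in Python (the helper name and the toy even-length language are mine, for illustration only):

```python
def make_dataset(words, in_language):
    """Supervised examples: each word w gets label 1 if w is in L, else 0."""
    return [(w, int(in_language(w))) for w in words]

# A toy target language over {a, b}: words of even length.
even_length = lambda w: len(w) % 2 == 0

make_dataset(["", "a", "ab", "aba"], even_length)
# -> [("", 1), ("a", 0), ("ab", 1), ("aba", 0)]
```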

### Data Generation

What is great about algorithmic tasks (at least those for which we do know a real algorithm!) is that we can generate synthetic data in great amounts. But we must be careful with the way we generate it. Suppose we have a language such as $\{a^nb^n : n \geq 0\}$. As a human programmer, it is rather trivial to come up with an algorithm that recognizes this language, but we would like to replace the human mind with a data-driven MANN. What should the training set look like?

Could we simply take a subset of $L$? Two issues come to my mind: firstly, the word lengths come in an infinite variety, so we cannot sample from $L$ in a uniform way! Secondly, we are going to be biased towards acceptance; the algorithm that simply always returns *accept* would always be right.

To tackle the second issue, we could instead generate words from $\Sigma^*$ at random, then run our hand-crafted algorithm to know what the label should be. But $L$ is much, much smaller than $\Sigma^*$, so we would land almost exclusively on rejected inputs, which would bias the MANN towards rejection. We will have to come up with a way of balancing accepted and rejected inputs. Maybe something like: with probability $1/2$, emit a word from $L$, and with probability $1/2$ emit a word from $\bar{L}$.
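A minimal sketch of such a balanced sampler, assuming the target language $a^nb^n$ (the function names are mine; sampling the complement by rejection works here because $L$ occupies a vanishing fraction of each length):

```python
import random

def is_anbn(w):
    """Membership test for {a^n b^n : n >= 0}."""
    half = len(w) // 2
    return len(w) % 2 == 0 and w == "a" * half + "b" * half

def balanced_sample(max_len):
    """Emit one (word, label) pair: accepted with prob 1/2, rejected otherwise."""
    if random.random() < 0.5:
        n = random.randint(0, max_len // 2)
        return "a" * n + "b" * n, 1       # direct construction of a word in L
    while True:                            # rejection-sample the complement
        length = random.randint(1, max_len)
        w = "".join(random.choice("ab") for _ in range(length))
        if not is_anbn(w):
            return w, 0
```

The rejection loop terminates almost immediately, since at any length $n$ there is at most one word of $a^nb^n$ among $2^n$ candidates.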

To tackle the first issue, the fact that we cannot sample uniformly from $L$, I suggest the curriculum learning method. Many articles I read about MANNs mentioned that this technique greatly helped training. It consists of starting the training phase with easy instances and gradually increasing the word length. For each word length, we can draw words in any way we like. But we must be careful: it is not true for every language that every word length has words both in $L$ and $\bar{L}$! For instance in $\{a^nb^n\}$, all words with an odd number of letters are rejected.
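A curriculum for $a^nb^n$ could look like the following sketch (my own hypothetical generator; it sidesteps the odd-length problem by visiting even lengths only, and note that at each length there is exactly one accepted word, so the positive half of every batch is repetitive):

```python
import random

def curriculum(max_len, batch_size):
    """Yield (length, batch) pairs of gradually increasing word length.
    Each batch is half accepted, half rejected words of that exact length."""
    for length in range(2, max_len + 1, 2):   # odd lengths have no accepted word
        n = length // 2
        batch = [("a" * n + "b" * n, 1)] * (batch_size // 2)
        while len(batch) < batch_size:
            w = "".join(random.choice("ab") for _ in range(length))
            half = len(w) // 2
            if w != "a" * half + "b" * half:  # keep only rejected words
                batch.append((w, 0))
        yield length, batch
```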

Designing a good data-generating algorithm is an interesting problem in itself. It should increase the word length gradually, but at the same time ensure balance between accepted and rejected words. It seems the data-generating algorithm must depend on the language: while we could use a straightforward approach for $a^nb^n$, I think this would not work properly for harder languages.

### Generalization

To recap, we suppose we are trying to make a MANN learn to accept the sequences of symbols in the alphabet $\Sigma$ that belong to a target language $L$, and reject all others. A carefully chosen data-generating algorithm will emit tuples of the form $(w, \mathbb{1}[w \in L])$, which will be fed to the MANN. The MANN's parameters are gradually optimized so that it minimizes an empirical loss. (Which loss function? I do not think this is crucial information for computational complexity issues. I believe I saw negative log-likelihood in my readings.) Given enough training examples, the MANN should then be able to generalize to unseen words.

I will distinguish two kinds of generalization: **weak** generalization means the MANN is able to correctly classify unseen words *of the same length* as words it saw during training. On the other hand, **strong** generalization means it can correctly classify unseen words *longer* than anything seen during training. For instance, suppose our data-generating algorithm for $a^nb^n$ stopped at words of length 10; then a MANN that can classify the unseen words $a^4b^4$ as accepted and $a^3b^2$ as rejected performs weak generalization, and if it can classify $a^{100}b^{100}$ as accepted and $a^{100}b^{99}$ as rejected, it generalizes strongly.

Weak generalization is not very interesting to me. It is already possible to achieve it with vanilla neural networks. Indeed, if large enough, such networks can approximate any function of the type $\{0,1\}^n \to \{0,1\}$ (you could argue that they in fact take inputs from $\mathbb{R}^n$, which is an infinite set, but in theory we can consider real (and natural) numbers as sequences of zeros and ones in order for them to fit in a classical computer). We could for instance have one neural network for every word length seen during training. In the theory of computation, however, we are interested in functions of the type $\{0,1\}^* \to \{0,1\}$. This is a significant distinction: the star means that every word length must be considered, including astronomically large lengths. To learn such functions, vanilla neural networks would need infinitely many parameters, because they do not use any recursive structure in their reasoning; they are only a long series of *ifs*. By using a memory and loops, MANNs can in principle learn the second type of function.

When a human programmer writes an algorithm, they test it on reasonably small instances and assume it is correct for arbitrarily large instances too, because they used their judgment to write an actually valid algorithm. But neural networks have no such judgment. There is a risk that even MANNs become stupid on sequences longer than those seen during training. Regularization procedures should ensure that they learn something general, not depending on the size. Maybe ensuring that the learned algorithm has a short description (i.e. small Kolmogorov complexity) would help. This is one of the major challenges of my research, so we will come back to it in the future.

### Computational Complexity

It is important to note that formal languages do not only concern bizarre sequences of *a*s and *b*s that no one cares about. With a very small alphabet ($\{a, b\}$, or the binary alphabet $\{0, 1\}$) we can encode everything. So a language on such an alphabet could be, for instance, *the set of all Travelling Salesman Problem (TSP) instances whose shortest solution is below a threshold $t$*. If we could learn to distinguish those words from the TSP instances which do not have a solution below $t$, then we would be able to solve an NP-complete problem.
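The decision variant is easy to state as a boolean function; here is a brute-force sketch (my own naming, exponential time, for illustration only, with the instance given as a distance matrix rather than an encoded word):

```python
from itertools import permutations

def tsp_decision(dist, t):
    """Decide whether some tour visiting every node exactly once
    (and returning to the start) has total cost <= t. Brute force: O(n!)."""
    n = len(dist)
    best = min(
        sum(dist[a][b] for a, b in zip((0,) + p, p + (0,)))
        for p in permutations(range(1, n))
    )
    return best <= t

dist = [[0, 1, 2, 3],
        [1, 0, 4, 5],
        [2, 4, 0, 6],
        [3, 5, 6, 0]]
tsp_decision(dist, 14)   # True: the best tour costs exactly 14
tsp_decision(dist, 13)   # False
```

Learning this boolean function from examples would of course be remarkable; the point here is only that the decision variant defines a language.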

I just used the *decision* variant of the TSP; the problem is more often described as *find the shortest path going through every node in the graph once and only once*. I changed the formulation in order to make it a boolean function that solves the problem. Having a boolean function, we always implicitly have a language: it is the set of inputs for which the boolean function returns true. In computational complexity theory, complexity classes are defined using those languages.

A language $L$ (or equivalently, a decision problem) belongs to a complexity class $C$ if there exists a Turing machine (or equivalently, an algorithm) accepting $L$ while respecting $C$'s constraints. For instance, P contains all languages which can be accepted by a Turing machine running within a time that is at most polynomial in the word length.

Another example: LogSpace requires that the Turing machine use only a memory space that is logarithmic in the word length. Typically, such a machine has one or two *counters*: balancing parentheses is a good example. `(()((()())))` should be accepted, but not `((()`. For the first word, of length 12, we can use $\lceil \log_2 13 \rceil = 4$ bits to count from 0 to 12. I insist on LogSpace because it will have a special property in Neural Random Access Machines (see my next article, if everything goes as planned). Those machines learn pointers to access a memory, but LogSpace problems should be solvable *within* the pointers.
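The counter strategy can be sketched directly (a hypothetical helper of my own, not code from the NRAM literature; the single bounded counter is the whole working memory):

```python
def balanced(word, num_bits):
    """Check balanced parentheses using a single counter of num_bits bits,
    i.e. O(log n) memory for a word of length n."""
    limit = 1 << num_bits   # the counter must stay below 2**num_bits
    count = 0
    for c in word:
        count += 1 if c == "(" else -1
        if count < 0 or count >= limit:
            return False    # closed too early, or counter overflow
    return count == 0       # every opening parenthesis was closed

balanced("(()((()())))", 4)   # True: the counter never exceeds 4 < 16
balanced("((()", 4)           # False: ends with count == 2
```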

If I can first build a framework in which MANNs learn to accept a language (instead of outputting a non-boolean value, as is presently the case), then all I will need to do is add the constraint of a specific complexity class on the MANN, so that it is forced to find a solution within that class.

Therefore the MANN type I rely upon will need to be configurable: I should be able to change the memory size, the maximal number of time steps, etc. I will have won if I manage to constrain a MANN like this while still achieving strong generalization.
