Though we can imagine the rules of language as a formal grammar, an important caveat – reiterated in many discussions of formal linguistics – is that this should not be taken as any indication that people mentally execute rewrite rules while speaking. A grammar serves merely as a formal description of the language that we can produce and recognise.
Grammatical induction
When you or I speak, we obviously aren’t consulting a mental list of all English sentences. As mentioned in the previous blogpost, the finite human mind cannot store the infinitely many sentences of a natural language. Instead, knowing a human language means knowing a set of rules for producing and recognising its sentences. Learning a language therefore means inferring those rules from witnessed samples of language. This is grammatical induction.
The most straightforward and general hypothesis about how humans induce a grammar is that the child has access to some algorithm that produces a language’s grammar from a sample of its sentences.
Under this hypothesis, a child need not have any prior knowledge about what kind of language they will learn. Nor would they need a mental faculty uniquely dedicated to language acquisition. Since a grammar is simply a set of formal instructions for some procedure, we would expect the grammar-inducing algorithm to serve not only for language but for various other classification tasks as well. If this hypothesis is correct,1 the problem of language acquisition reduces to the general problem of induction.
However, no algorithm can exist that faithfully induces a language’s grammar from a sample of its sentences, because no grammar can be unambiguously induced from a finite set of strings. This follows from the theorem that any finite set of strings can be generated by infinitely many different grammars, and can therefore belong to infinitely many different languages. The one-string sample {ab}, for instance, is consistent with the language containing only ab, with the language of n a’s followed by n b’s, with the language of all strings over {a, b}, and with infinitely many others. Enlarging the finite sample may shrink the space of candidate languages, but can never bring it down from infinite to finite.
It is therefore impossible to determine a unique, correct grammar from only a finite number of sentences. Obviously, humans witness only a finite amount of language in their lifetimes.
Language identification in the limit
In 1967, mathematician E.M. Gold proposed a solution to this dilemma in his paper Language Identification in the Limit. He described a procedure by which a learner can eventually come up with a correct grammar for some ‘target’ language after witnessing only finitely many strings.
The procedure occurs in discrete trials. In each, the learner is given a single string and is allowed to guess any grammar to describe the target language. No guess binds the learner in later trials, so they can change their mind as many times as they like.
The procedure lasts forever, but if the learner ever gets to a point after which they always guess the same grammar, and if that grammar exactly describes the target language, we say that they have successfully identified the language ‘in the limit’.
The learner can never know whether they have succeeded, because they can never be sure that they won’t have to change their mind in the future.
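To make the setup concrete, here is a minimal sketch of the trial procedure in Python. The names are illustrative, not Gold’s notation: a learner is any function from the sample seen so far to a guessed grammar, and the text is an (in principle endless) stream of strings.

```python
import itertools

def run_trials(learner, stream, n_trials):
    """Run finitely many of Gold's trials and record the learner's guesses."""
    sample, guesses = [], []
    for string in itertools.islice(stream, n_trials):
        sample.append(string)            # one string is presented per trial
        guesses.append(learner(sample))  # the learner may change its mind freely
    return guesses
```

Any finite run like this one shows only a prefix of the learner’s guess sequence; identification in the limit concerns whether that sequence eventually becomes constant and correct.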
Gold contrasted two situations:
In one, the learner is provided with sentences from the target language. This situation is called ‘text’ or ‘positive information presentation’.
In the other, the learner is provided with both sentences and non-sentences, and is told whether each string is a grammatical sentence or not, as if by a native informant. As such, this is called ‘informant’ or ‘complete information presentation’.
How far up the Chomsky hierarchy can we go before it becomes impossible to identify a language in the limit? And what sort of information does a learner need at each level?
Gold proved that with access just to sentences (text presentation), only finite cardinality languages are learnable. But with access to both sentences and non-sentences (informant presentation), all primitive recursive languages are learnable – including the natural languages.
According to Pinker, proving Gold’s theorems is straightforward, so let’s give it a shot.
The strategy for learning finite cardinality languages is trivial. The learner can simply guess that the language is the set of grammatical sentences that they have witnessed so far. When every sentence in the language has appeared (which is possible only in the case of finite languages), they will have identified the language.
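As a minimal sketch of this strategy (reusing the run_trials sketch from above, representing a finite language simply as the set of its sentences, and using an invented two-sentence target):

```python
def finite_language_learner(sample):
    """Guess that the target language is exactly the sentences seen so far."""
    return frozenset(sample)

# For a finite target, the guess stabilises once every sentence has appeared:
guesses = run_trials(finite_language_learner, iter(["hi", "bye", "hi", "bye"]), 4)
assert guesses[1:] == [frozenset({"hi", "bye"})] * 3  # settled after two trials
```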
However, if the target language is infinite (as are all languages outside the class of finite cardinality languages), then this strategy – guessing that the language is equal to the set of sentences seen so far – will fail, because the learner’s guess will have to change infinitely many times in order to accommodate the language’s infinitely many sentences. Language classes beyond the class of finite cardinality languages therefore require the learner to adopt a different strategy.
We’ve said that the primitive recursive languages can be enumerated using a ‘grammar-grammar’. Each of the enumerated grammars can be transformed into an algorithm for recognising sentences2, so the learner can test each grammar against the sample received so far. The learner can therefore test all of the primitive recursive languages, one by one – a maximally general strategy. Whenever a grammar disagrees with the evidence, the learner rejects it, and moves on to the next.
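Here is a sketch of that identification-by-enumeration strategy. The three-entry list is an invented toy stand-in for the full (infinite) enumeration of primitive recursive grammars; each entry pairs a name with a decision procedure, and each datum is a string labelled True (sentence) or False (non-sentence) – under text presentation, every label is simply True.

```python
# Toy stand-in for the enumeration of all primitive recursive grammars,
# over strings of a's and b's.
TOY_ENUMERATION = [
    ("only a's",     lambda s: set(s) <= {"a"}),
    ("even length",  lambda s: len(s) % 2 == 0),
    ("a's then b's", lambda s: s == "a" * s.count("a") + "b" * s.count("b")),
]

def enumeration_learner(labeled_sample):
    """Guess the first enumerated language consistent with all evidence so far."""
    for name, accepts in TOY_ENUMERATION:
        if all(accepts(s) == is_sentence for s, is_sentence in labeled_sample):
            return name  # the first grammar not yet contradicted
    return None          # cannot happen in Gold's full setting
```

On the sample [("ab", True)] this learner guesses ‘even length’; if ("b", True) later arrives, that guess is refuted and the learner moves on to ‘a’s then b’s’.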
The correct grammar has a definite position in the enumeration, so we might guess that the learner will eventually reach it and will never again need to change their mind.
Because the class of primitive recursive languages is the highest class of languages whose grammars and decision procedures can be enumerated (which is necessary for this procedure to work), it is the highest learnable class under these conditions.
Learning from a text
When learning from a text, suppose the learner at some point guesses an infinite language, while the target language is in fact a finite subset of that incorrect guess.3 Every sentence they subsequently witness will be consistent with their guess, so they will never be forced to change their mind – they will remain wrong forever.
Infinitely many grammars are consistent with any finite sample of sentences, and without additional information, the learner has no way of knowing whether their guess is too general.
Learning from an informant
But when learning from an informant, any incorrect grammar will eventually be rejected, either for failing to generate a sentence of the target language or for generating a non-sentence.
In particular, the learner can now reject over-general grammars, because the informant will eventually label some string that the guessed grammar generates as a non-sentence of the target language.
Therefore, only under informant presentation can the learner identify any primitive recursive language in the limit.
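A minimal illustration of the contrast, with an invented target language containing only the string "a" and an over-general hypothesis covering all strings:

```python
over_general = lambda s: True      # hypothesis: every string is a sentence

def consistent(accepts, data):
    """Does the hypothesis agree with every labelled datum seen so far?"""
    return all(accepts(s) == label for s, label in data)

text_data      = [("a", True), ("a", True)]    # text: positive examples only
informant_data = text_data + [("b", False)]    # informant adds a non-sentence

print(consistent(over_general, text_data))       # True  -> never rejected
print(consistent(over_general, informant_data))  # False -> finally rejected
```

Positive data alone can never contradict the over-general hypothesis; a single labelled non-sentence refutes it immediately.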
However, because the enumeration procedure considers all possible grammars, identifying most languages will take an extremely long time: an enormous number of incorrect grammars must be rejected along the way.
Changing the order of the enumeration can’t speed up the procedure across the board, since all enumerations are – on the whole – equivalent. For every language that appears earlier in some enumeration A than in another enumeration B, there is a language that appears earlier in B than in A. There is no ordering in which every language is guessed earlier than in every other ordering.
Pinker likened the situation to that in Jorge Luis Borges’s The Library of Babel, which features a vast library that houses books of all possible character combinations.
Yet however slow Gold’s enumeration procedure is, it’s the fastest we can hope for if we want to guarantee that the learner never prematurely excludes the correct grammar: enumerating all possible grammars is the only procedure that guarantees success for all languages.
The reality
What we haven’t yet asked is: Do children learn from a text or from an informant? Do they have access to information about what sentences are ungrammatical?
According to Pinker, children generally aren’t corrected when they speak ungrammatically, and take little notice when they are.
Nor do they seem to have access in any other way to ‘negative’ information. At the time of Pinker’s writing, no consistent difference had been found between how parents react to their children’s grammatical and ungrammatical speech.
It therefore seems that children are learning from a text, in which situation Gold’s learner – as we have shown – is unable to identify a human language in the limit.
Gold’s attempt at a model to meet the Learnability Condition – unhindered even by the Developmental, Cognitive, and Time Conditions – cannot do so without violating the Input Condition by requiring that the learner receive negative information.
Children demonstrably do learn human languages. But how, when the best general learner cannot?
There must be something about Gold’s model that obstructs the learning of human languages. Much subsequent research therefore aimed to determine the conditions under which such learning is possible.
This is what we’ll look at next.