In the second blogpost of this series, we’ll follow Pinker in his discussion of formal language models. This post will give a brief introduction to formal (a.k.a. mathematical) linguistics, which enables discussions about language learnability.
Languages as formal objects
If we want a formal model of language acquisition, we of course require a formal definition of language.
We start off with a ‘vocabulary’ – a finite set of symbols. Sometimes this vocabulary might be called an ‘alphabet’, but whether we liken its symbols to words or to letters is unimportant. These symbols can be strung together into what we call – oddly enough – ‘strings’. Each string is finite in length, but there are infinitely many possible strings. A language can be defined as a (potentially infinite) set of strings. Strings within the language are ‘sentences’ of that language; all other strings are ‘non-sentences’.
For instance, a (very simple) language might be the finite set of strings {ab, ba}, from the vocabulary {a, b}.
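To make this concrete, here is a quick sketch in Python – an illustration of my own, not any standard formalism – of a vocabulary, a language over it, and a check that the language’s sentences really are strings over that vocabulary:

```python
# An illustrative sketch (not any standard formalism): a vocabulary is a
# finite set of symbols, and a language is a set of strings over it.

vocabulary = {"a", "b"}   # the finite set of symbols
language = {"ab", "ba"}   # a (very simple) finite language

def is_string_over(vocab, s):
    """Check that every symbol of s is drawn from the vocabulary."""
    return all(symbol in vocab for symbol in s)

# every sentence of the language is a string over the vocabulary
assert all(is_string_over(vocabulary, sentence) for sentence in language)
```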
We have just said that a language can be either finite or infinite. Human languages – as it happens – are infinite; there are infinitely many possible English, Turkish, and French sentences. Just think about that for a moment. Isn’t it exciting that there are infinitely many sentences out there that have never been said? You or I could be the first to say some of them! And an infinite subset of those never-said sentences will not be said even once in the lifetime of the universe!
While we can completely describe a finite language simply by listing all of its sentences, we cannot write out such a list for an infinite language without infinite storage or memory. You and I have only finite memory, so how can we know an infinite language?
These infinite languages must each have some finite description, such as a grammar – a set of rules that can generate all of a language’s sentences (and no non-sentences).
A grammar has four parts
In mathematical linguistics, the term ‘grammar’ doesn’t mean the same thing as it does when we discuss language more generally.
A formal grammar has four parts:
- A terminal vocabulary, which we previously referred to simply as the ‘vocabulary’ of the language, and whose symbols appear in its sentences.
- An auxiliary vocabulary, another finite set of symbols. Unlike the terminal symbols, auxiliary symbols themselves do not appear in sentences per se. Instead, they get ‘rewritten’ into terminal symbols and other auxiliary symbols.
- A start symbol (S), which initiates the generation of every sentence1.
- A finite set of rewrite rules, which each replace one string of symbols with another string.
A grammar can be used to generate sentences through the iterative execution of rewrite rules, always beginning by rewriting the start symbol S. Each subsequent rewrite operation replaces one or more symbols in the resulting string. Sentence generation terminates when there are no auxiliary symbols left. The language described by a grammar is the set of all strings that can be generated thus.
Note that it is actually possible to describe a single language using many different grammars. For instance, it is possible to describe our very simple language {ab, ba} using a grammar consisting of:
- The terminal vocabulary {a, b}
- The auxiliary vocabulary {A, B}
- The rewrite rules S → aA, A → b, S → bB, and B → a
Alternatively, we could describe the same language using a grammar with the same terminal vocabulary, but an empty auxiliary vocabulary, and the rewrite rules S → ab and S → ba.
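To see the generation procedure in action, here is a small Python sketch of my own (Pinker gives no such code) that exhaustively applies rewrite rules starting from S and collects every string containing no auxiliary symbols. It confirms that both grammars above generate the same language:

```python
# A minimal sketch of grammar-based generation, assuming grammars are
# represented as lists of (left-hand side, right-hand side) string pairs,
# with uppercase letters as auxiliary symbols and lowercase as terminals.

def generate(rules, start="S"):
    """Exhaustively rewrite `start` and collect all fully terminal strings.

    Only suitable for grammars generating finite languages; an infinite
    language would make this search run forever.
    """
    sentences = set()
    frontier = [start]
    while frontier:
        current = frontier.pop()
        if not any(c.isupper() for c in current):  # no auxiliary symbols left
            sentences.add(current)
            continue
        for lhs, rhs in rules:
            # apply the rule at every position where its left-hand side occurs
            idx = current.find(lhs)
            while idx != -1:
                frontier.append(current[:idx] + rhs + current[idx + len(lhs):])
                idx = current.find(lhs, idx + 1)
    return sentences

# Our finite state grammar: S -> aA, A -> b, S -> bB, B -> a
finite_state_rules = [("S", "aA"), ("A", "b"), ("S", "bB"), ("B", "a")]

# Our finite cardinality grammar: S -> ab, S -> ba
finite_cardinality_rules = [("S", "ab"), ("S", "ba")]

assert generate(finite_state_rules) == generate(finite_cardinality_rules) == {"ab", "ba"}
```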
The Chomsky Hierarchy
We can classify languages according to how their sentences are produced or recognised. By grouping languages into classes, we can prove theorems about those classes.
Language classes fall into a hierarchy of sets – the Chomsky hierarchy. Each class in the hierarchy is a subset of the class above it. At the top of the hierarchy is the most encompassing class, that of the recursively enumerable languages2 – languages with a grammar or some other finite description.
The recursively enumerable languages all have a means of generating every one of their sentences, but they do not all have a means of deciding whether a given string is a sentence or a non-sentence. We can therefore divide the recursively enumerable languages into those with decision procedures – called ‘decidable’ or ‘recursive’ – and those without.
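The distinction is easiest to see in code. As a hedged illustration (the example language and function names are mine, not Pinker’s), take the decidable language of n a’s followed by n b’s: its decision procedure halts on every input. For a language that is merely recursively enumerable, the general recipe is weaker – enumerate the sentences one by one and answer ‘yes’ if the test string turns up, a search that never halts when the string is a non-sentence:

```python
# Illustrative sketch: the language { a^n b^n : n >= 1 } is decidable,
# so membership can be settled by a procedure that always halts.

def decides(s):
    """Decision procedure: halts on every input with a yes/no answer."""
    n = len(s) // 2
    return len(s) >= 2 and len(s) % 2 == 0 and s == "a" * n + "b" * n

def enumerate_sentences():
    """Enumerate the language's sentences: ab, aabb, aaabbb, ..."""
    n = 1
    while True:
        yield "a" * n + "b" * n
        n += 1

def semi_decides(s):
    """Only a semi-decision: loops forever if s is a non-sentence."""
    for sentence in enumerate_sentences():
        if sentence == s:
            return True
```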
Pinker tells us that there is no general way of determining whether an arbitrary recursively enumerable language is decidable – an impossibility closely related to the famous ‘halting problem’ of computability theory. However, using a grammar-grammar (a grammar whose sentences are themselves grammars), it is in theory possible to enumerate (list out) a subset of the decidable languages called the ‘primitive recursive languages’.
It isn’t possible to enumerate the set of all decidable languages, an observation intrinsically tied to why it is not always possible to say whether a given language is decidable.
We can further divide the primitive recursive languages by placing certain constraints on their grammars.
The rules of a context-sensitive grammar may each replace only a single auxiliary symbol, and may be conditioned on the context in which it is found (i.e. which symbols appear next to it).
The rules of a context-free grammar may each replace only a single auxiliary symbol, regardless of the context in which it appears. This constraint removes context-sensitivity; even so, the class of context-free grammars is a proper subset of the class of context-sensitive grammars, because a context-sensitive grammar is permitted to contain no context-dependent rules at all.
The rules of a finite state grammar can replace a single auxiliary symbol with at most one auxiliary symbol (often called a state) and a terminal symbol.
A grammar with no auxiliary symbols is called a finite cardinality grammar, and can only generate a finite language.
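To make these constraints concrete, here is one illustrative rule of each shape (the particular symbols are arbitrary examples of mine, with uppercase letters standing for auxiliary symbols and lowercase for terminals):

```python
# One example rewrite rule per grammar class (my own illustrative symbols;
# uppercase = auxiliary, lowercase = terminal).

example_rules = {
    "context-sensitive":  "aAb -> acb",  # A becomes c, but only between a and b
    "context-free":       "A -> aBb",    # a lone auxiliary rewritten, context ignored
    "finite state":       "A -> aB",     # one terminal plus at most one auxiliary 'state'
    "finite cardinality": "S -> ab",     # terminals only; no auxiliary symbols at all
}
```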
What we have just described – after Pinker’s example – is similar, but not identical, to what is most often meant nowadays by ‘the Chomsky hierarchy’, which considers certain language classes not yet mentioned.
Remember our earlier two grammars that both described the same language? If you look back at them, you can see that one was finite cardinality and the other was finite state. So, did they generate a finite cardinality language or a finite state language? Well, both. Remember, the hierarchy is nested. Any finite cardinality language is also a finite state language. Beware, however, that this doesn’t mean that all finite cardinality grammars are finite state. Rather, any finite cardinality grammar can be equivalently expressed as a finite state grammar.
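That equivalence is easy to show constructively. Here is a sketch (my own construction, using the same rule representation as before) that turns any finite cardinality grammar into a finite state grammar for the same language, by spelling each sentence out one terminal at a time through freshly invented ‘states’:

```python
# Sketch: convert a finite cardinality grammar (rules of the form S -> w,
# where w is a non-empty terminal string) into an equivalent finite state
# grammar, emitting one terminal at a time via fresh auxiliary 'states'.

def to_finite_state(fc_rules):
    fs_rules = []
    counter = 0
    for lhs, rhs in fc_rules:              # lhs is always "S" here
        state = lhs
        for terminal in rhs[:-1]:
            counter += 1
            fresh = f"Q{counter}"          # a fresh auxiliary 'state'
            fs_rules.append((state, terminal + fresh))
            state = fresh
        fs_rules.append((state, rhs[-1]))  # final rule emits the last terminal
    return fs_rules

print(to_finite_state([("S", "ab"), ("S", "ba")]))
# [('S', 'aQ1'), ('Q1', 'b'), ('S', 'bQ2'), ('Q2', 'a')]
```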
In the following table, each language class is a strict superset of all those below it.

| Language class |
| --- |
| Recursively enumerable |
| Decidable / Recursive |
| Primitive recursive |
| Context-sensitive |
| Context-free |
| Finite state |
| Finite cardinality |
Natural languages
Obviously, having gone to the effort of describing it, we would like to know where in this hierarchy natural languages fall.
We previously said that natural languages are not finite cardinality; there are, for example, infinitely many English sentences.
Nor are natural languages finite state: in his 1957 book Syntactic Structures, Chomsky showed that finite state grammars are unable to describe natural languages, because they cannot generate sentences with a variable number of embeddings, which natural languages indeed permit.
For instance, English phrases can be embedded in other English phrases any number of times:
Peter likes anchovies.
I heard that Peter likes anchovies.
When I heard that Peter likes anchovies, …
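The formal skeleton of this argument is usually the language of n a’s followed by n b’s, whose nested a…b dependencies mirror centre-embedding: a context-free rule like S → aSb generates it easily, but no finite state grammar can, because matching each a with its own b requires remembering unboundedly many unfinished dependencies. A hedged sketch of both patterns:

```python
# Sketch: the language { a^n b^n } abstracts nested embedding. The
# context-free rules S -> aSb and S -> ab generate it; no finite state
# grammar can, since each a must pair with its own b.

def a_n_b_n(n):
    """Apply S -> aSb (n - 1 times) and then S -> ab."""
    return "a" * n + "b" * n

# A linguistic analogue (my own toy example): each 'when ... ,' opened
# must later be closed by a matching main clause.
def nested_sentence(depth):
    sentence = "Peter likes anchovies"
    for _ in range(depth):
        sentence = f"when I heard that {sentence}, I was surprised"
    return sentence

print(nested_sentence(2))
# when I heard that when I heard that Peter likes anchovies, I was surprised, I was surprised
```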
It is possible, though not simple, to show that natural languages are not context-free, as was done by Maurice Gross in Mathematical Models in Linguistics (1972) and by Paul Postal in Boas and the Development of Phonology: Comments Based on Iroquoian (1964).
Of course, context-free languages are worth studying in their own right – they just don’t share many of the properties of natural language.
Pinker reports that it isn’t clear how much higher natural languages are in the hierarchy, but that in their 1973 paper On the Generative Power of Transformational Grammars Stanley Peters and Robert Ritchie make a strong case that context-sensitive grammars are sufficiently powerful to generate natural languages, as earlier conjectured by Chomsky in his 1965 book Aspects of the Theory of Syntax.
Most formal linguists, including Chomsky, describe natural languages using ‘transformational grammars’, which use context-free rewrite rules to generate bracketed ‘deep structures’. Sentences are produced through the subsequent modification of these deep structures by ‘transformations’ that permute, delete, or copy parts of them. These transformations can confer context-sensitivity on the transformational grammar. However, transformational grammars do not occupy a defined position in the Chomsky hierarchy.
So, it seems that the best general statement we can make – as did Pinker – is that human languages are context-sensitive.
Now that we have the basics of formal linguistics in hand, the next post will follow Pinker’s search for a model of language learnability, looking at the seminal work of E.M. Gold.