Writing System and Orthography
Introduction
An orthography is a set of conventions for writing a language, including norms of spelling, punctuation,
word boundaries, capitalization, hyphenation and emphasis.
The history of writing traces the development of writing systems. The use of writing prefigures various
social and psychological consequences associated with literacy and literary culture. Each historical
invention of writing emerged from systems of proto-writing that used ideographic and mnemonic symbols
but were not capable of fully recording spoken language.
A writing system comprises a set of symbols, called a script, as well as the rules by which the script
represents a particular language. The earliest writing was invented during the late 4th millennium BCE.
Throughout history, each writing system invented without prior knowledge of writing gradually evolved
from a system of proto-writing that included a small number of ideographs, which were not fully capable
of encoding spoken language, and lacked the ability to express a broad range of ideas.
Writing systems are generally classified according to how their symbols, called graphemes, relate
to units of language. Phonetic writing systems, which include alphabets and syllabaries, use graphemes
that correspond to sounds in the spoken language. Alphabets use graphemes
called letters that generally correspond to spoken phonemes, and are typically classified into three
categories. In general, pure alphabets use letters to represent both consonant and vowel sounds,
while abjads only have letters representing consonants, and abugidas use characters corresponding to
consonant–vowel pairs. Syllabaries use graphemes called syllabograms that represent
entire syllables or moras. By contrast, logographic (alternatively morphographic) writing systems use
graphemes that represent the units of meaning in a language, such as its words or morphemes. Alphabets
typically use fewer than 100 distinct symbols, while syllabaries and logographies may use hundreds or
thousands respectively.
A writing system also includes any punctuation used to aid readers and encode additional meaning,
including that which would be communicated in speech via qualities
of rhythm, tone, pitch, accent, inflection, or intonation.
Writing systems typically satisfy three criteria. Firstly, the writing must have some purpose or meaning to
it, and a point must be communicated by the text. Secondly, writing systems make use of specific
symbols which may be recorded on some writing medium. Thirdly, the symbols used in writing generally
correspond to elements of spoken language. In general, systems of symbolic communication like signage,
painting, maps, and mathematics are distinguished from writing systems, which require knowledge of an
associated spoken language to read a text. The norms of writing generally evolve more slowly than those
of speech; as a result, linguistic features are frequently preserved in the written form of a language after
they cease to appear in the corresponding spoken language.
Proto-Writing
During the Early Bronze Age (3300–2100 BCE), the first writing systems evolved from systems of proto-
writing, which used ideographic and mnemonic symbols to communicate information, but did not record
human language directly. Proto-writing is attested as early as the 7th millennium BCE, with well-known
examples including:
- The Jiahu symbols carved into tortoise shells, found in 24 Neolithic graves excavated at Jiahu in
  northern China and dated to the 7th millennium BCE. The majority of the signs uncovered were
  inscribed individually or in small groups on different shells. Most archaeologists do not consider the
  Jiahu symbols to be directly linked to the emergence of true writing.
- The Vinča symbols found on artifacts of the Vinča culture of central and southeastern Europe, dating
  to the 6th–5th millennia BCE.
- The Indus script attested in short inscriptions between 2600 and 2000 BCE.
Other examples of proto-writing include quipu, a system of knotted cords used as mnemonic devices
within the Inca Empire (15th century CE).
Orthography in phonetic writing systems is often concerned with matters of spelling, i.e. the
correspondence between written graphemes and the phonemes found in speech. Other elements that may
be considered part of orthography include hyphenation, capitalization, word boundaries, emphasis and
punctuation. Thus, orthography describes or defines the symbols used in writing, and the conventions that
regulate their use.
Most natural languages developed as oral languages, and writing systems have usually been crafted or
adapted as ways of representing the spoken language. The rules for doing this tend to
become standardized for a given language, leading to the development of an orthography that is generally
considered "correct". In linguistics, orthography often refers to any method of writing a language without
judgment as to right and wrong, with a scientific understanding that orthographic standardization exists
on a spectrum of strength of convention. The original sense of the word, though, implies a dichotomy of
correct and incorrect, and the word is still most often used to refer specifically to a
standardized prescriptive manner of writing. A distinction is made between emic and etic viewpoints,
with the emic approach taking account of perceptions of correctness among language users, and the etic
approach being purely descriptive, considering only the empirical qualities of any system as used.
Orthographic units, such as letters of an alphabet, are conceptualized as graphemes. These are a type
of abstraction, analogous to the phonemes of spoken languages; different physical forms of written
symbols are considered to represent the same grapheme if the differences between them are not
significant for meaning. Thus, a grapheme can be regarded as an abstraction of a collection of glyphs that
are all functionally equivalent. For example, in written English (or other languages using the Latin
alphabet), there are two different physical representations (glyphs) of the lowercase Latin
letter a: ⟨a⟩ and ⟨ɑ⟩. Since the substitution of either of them for the other cannot change the meaning of a
word, they are considered to be allographs of the same grapheme, which can be written ⟨a⟩.
The italic and boldface forms are also allographic.
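The relationship between glyphs and graphemes can be sketched as a simple lookup. The following is a minimal illustration, not a standard algorithm: the table ALLOGRAPHS and the helper names to_graphemes and same_spelling are hypothetical, and the table covers only the two glyphs of ⟨a⟩ mentioned above, as they behave in ordinary written English.

```python
# Hypothetical allograph table: each glyph maps to the grapheme it realizes.
# Only the two forms of lowercase a discussed in the text are listed.
ALLOGRAPHS = {
    "a": "a",       # double-storey a (U+0061)
    "\u0251": "a",  # single-storey ɑ (U+0251), treated here as a variant of <a>
}


def to_graphemes(text: str) -> str:
    """Replace each glyph with its grapheme; unlisted glyphs pass through."""
    return "".join(ALLOGRAPHS.get(ch, ch) for ch in text)


def same_spelling(left: str, right: str) -> bool:
    """Two glyph strings spell the same word if their graphemes match."""
    return to_graphemes(left) == to_graphemes(right)


print(same_spelling("back", "b\u0251ck"))  # allographs only: same word
print(same_spelling("back", "beck"))       # different grapheme: different word
```

Swapping one allograph for the other leaves the grapheme sequence unchanged, whereas swapping ⟨a⟩ for ⟨e⟩ does not.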
Graphemes or sequences of them are sometimes placed between angle brackets, as in ⟨b⟩ or ⟨back⟩. This
distinguishes them from phonemic transcription, which is placed between slashes (/b/, /bæk/), and
from phonetic transcription, which is placed between square brackets ([b], [bæk]).
Orthographies that use alphabets and syllabaries are based on the principle that written graphemes
correspond to units of sound of the spoken language: phonemes in the former case, and syllables in the
latter. In virtually all cases, this correspondence is not exact. Different languages' orthographies offer
different degrees of correspondence between spelling and pronunciation. English, French, Danish,
and Thai orthographies, for example, are highly irregular, whereas the orthographies of languages such
as Russian, German, Spanish, Finnish, Turkish, and Serbo-Croatian represent pronunciation much more
faithfully.
Logographic Systems
A logogram is a character that represents a morpheme within a language. As each character represents a
single unit of meaning, many different logograms are required to write all the words of a language. If the
logograms do not adequately represent all meanings and words of a language, written language can be
confusing or ambiguous to the reader.
Logograms are sometimes conflated with ideograms, symbols which graphically represent abstract ideas;
most linguists now reject this characterization. Chinese characters, for example, are often semantic–phonetic
compounds, which include a component related to the character's meaning, and a component that gives a
hint for its pronunciation.
Syllabaries
A syllabary is a set of written symbols that represent either syllables or moras, units of prosody that are
often but not always a syllable in length. The graphemes used in syllabaries are called syllabograms.
Syllabaries are best suited to languages with relatively simple syllable structure, since a different symbol
is needed for every syllable.
An orthography in which the correspondences between spelling and pronunciation are highly complex or
inconsistent is called a deep orthography (or less formally, the language is said to have irregular spelling).
An orthography with relatively simple and consistent correspondences is called shallow (and the language
has regular spelling).
Alphabets
An alphabet is a set of letters, each of which generally represents one of the segmental phonemes in a
spoken language. However, these correspondences are rarely uncomplicated, and spelling is often
mediated by other factors than just which sounds are used by a speaker. The word alphabet is derived
from alpha and beta, the names for the first two letters in the Greek alphabet. An abjad is an alphabet
whose letters only represent the consonantal sounds of a language. Abjads were the first alphabets to
develop historically; most have been used to write Semitic languages, and they originally
derive from the Proto-Sinaitic script. The morphology of Semitic languages is particularly suited to this
approach, as the denotation of vowels is generally redundant.
Orthography and Phonemes
In less formally precise terms, a language with a highly phonemic orthography may be described as
having regular spelling or phonetic spelling. Another terminology is that of deep and shallow
orthographies, in which the depth of an orthography is the degree to which it diverges from being truly
phonemic. The concept can also be applied to non-alphabetic writing systems like syllabaries.
Regular
A phoneme may be represented by a sequence of letters, called a multigraph, rather than by a single
letter (as in the case of the digraph ch in French and the trigraph sch in German). This retains
predictability only if the multigraph cannot be broken down into smaller units. Some languages use
diacritics to distinguish between a digraph and a sequence of individual letters, and others require
knowledge of the language to distinguish them; compare goatherd and loather in English.
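The goatherd and loather contrast can be made concrete with a naive segmenter. This is a rough sketch under stated assumptions: the function segment and the set ENGLISH_MULTIGRAPHS are hypothetical names, the multigraph set is a tiny illustrative subset, and greedy longest-match is just one simple strategy, not how any particular orthography is formally defined.

```python
def segment(word: str, multigraphs: set[str]) -> list[str]:
    """Split a word into letter units, preferring the longest multigraph."""
    longest = max((len(m) for m in multigraphs), default=1)
    units = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, falling back to a single letter.
        for size in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if size == 1 or chunk in multigraphs:
                units.append(chunk)
                i += size
                break
    return units


ENGLISH_MULTIGRAPHS = {"th", "oa"}  # illustrative subset, not a full inventory

print(segment("loather", ENGLISH_MULTIGRAPHS))   # ['l', 'oa', 'th', 'e', 'r']
print(segment("goatherd", ENGLISH_MULTIGRAPHS))  # ['g', 'oa', 'th', 'e', 'r', 'd']
print(segment("schön", {"sch", "ch"}))           # ['sch', 'ö', 'n']
```

The segmenter treats th as a digraph in both English words. That is right for loather but wrong for goatherd, where t and h belong to separate morphemes (goat + herd). The spelling alone does not mark the difference, so a reader needs knowledge of the language.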
Irregular
Sometimes, different letters correspond to the same phoneme (for instance u and ó in Polish are both
pronounced as the phoneme /u/). That is often for historical reasons (the Polish letters originally stood for
different phonemes, which later merged phonologically). That affects the predictability of spelling from
pronunciation but not necessarily vice versa. Another example is found in Modern Greek, whose
phoneme /i/ can be written in six different ways: ι, η, υ, ει, οι and υι.
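The asymmetry between reading and spelling can be illustrated by inverting a grapheme-to-phoneme table. The sketch below is a toy model: the tables POLISH and GREEK contain only the examples given above, and the function name spellings_by_phoneme is hypothetical.

```python
from collections import defaultdict

# Toy grapheme-to-phoneme tables built only from the examples in the text.
POLISH = {"u": "u", "ó": "u"}
GREEK = {g: "i" for g in ["ι", "η", "υ", "ει", "οι", "υι"]}


def spellings_by_phoneme(table: dict[str, str]) -> dict[str, list[str]]:
    """Invert a grapheme-to-phoneme table into phoneme-to-spellings."""
    inverse = defaultdict(list)
    for grapheme, phoneme in table.items():
        inverse[phoneme].append(grapheme)
    return dict(inverse)


print(POLISH["ó"])                        # reading: one pronunciation, 'u'
print(spellings_by_phoneme(POLISH)["u"])  # spelling: ['u', 'ó']
print(len(spellings_by_phoneme(GREEK)["i"]))  # 6 candidate spellings
```

In this toy model each grapheme has exactly one pronunciation, so reading is predictable, while a writer who hears /u/ or /i/ must choose among several spellings.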
Defective Orthography
An orthography based on a correspondence to phonemes may sometimes lack characters to represent all
the phonemic distinctions in the language. This is called a defective orthography. An example in English
is the lack of any indication of stress. Another is the digraph ⟨th⟩, which represents two different phonemes
(as in then and thin) and replaced the old letters ⟨ð⟩ and ⟨þ⟩. A more systematic example is that
of abjads like the Arabic and Hebrew alphabets, in which the short vowels are normally left unwritten and
must be inferred by the reader.
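The effect of leaving vowels unwritten can be approximated with an English analogy. This is only an analogy: real abjads such as Arabic and Hebrew have their own conventions for which vowels are written, and the names VOWELS, consonant_skeleton, and group_by_skeleton are hypothetical.

```python
VOWELS = set("aeiou")  # simplified: treats only these five letters as vowels


def consonant_skeleton(word: str) -> str:
    """Drop vowel letters, keeping only the consonant letters in order."""
    return "".join(ch for ch in word if ch.lower() not in VOWELS)


def group_by_skeleton(words: list[str]) -> dict[str, list[str]]:
    """Group words that become identical once their vowels are omitted."""
    groups: dict[str, list[str]] = {}
    for word in words:
        groups.setdefault(consonant_skeleton(word), []).append(word)
    return groups


print(group_by_skeleton(["bat", "bit", "bet", "dog"]))
# {'bt': ['bat', 'bit', 'bet'], 'dg': ['dog']}
```

Three distinct words collapse to the same written form bt, so the reader has to recover the intended word from context, which mirrors the inference the text describes.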