Spoken in China and by overseas Chinese, the Chinese language (中文) is a member of the Sino-Tibetan family of languages. Chinese is a tonal language related to Tibetan and Burmese, but genetically unrelated to other CJK and neighbouring languages such as Korean, Vietnamese, Thai, or Japanese. However, these languages were strongly influenced by Chinese in the course of history, both linguistically and extralinguistically. Korean and Japanese both have writing systems employing Chinese characters, which are called Hanja and Kanji respectively. Like those two languages, Vietnamese also contains many Chinese loanwords.

About one-fifth of the world speaks some form of Chinese as their native language. It is common for speakers of Chinese to be able to speak several varieties of the language. In southern China, a person will typically be able to speak the local dialect and Mandarin Chinese, and occasionally to speak or understand another dialect as well. In addition, most educated Chinese will be able to read Classical Chinese to some degree.

In the field of software and communications internationalization, CJK is a collective term for Chinese, Japanese, and Korean.

The notion of a "Chinese language" may seem at first to be a fiction. The term "Chinese" covers the classical written language known as "wen2 yan2" (文言, "literary language"), which was used by Confucius, as well as the modern standard known as "bai2 hua4" (白話 [白话], "vernacular"). It also includes many different spoken varieties, which may be mutually unintelligible. The spoken language of Beijing, for example, is very different from Cantonese, the conversational language of Hong Kong.

Nevertheless, there are good reasons for using a collective name. The most important is that Chinese speakers themselves consider the language to be a unified entity. Another is that the boundaries between the different varieties of Chinese are not sharp. For example, in writing an informal love letter, one may use informal "bai hua." In writing a newspaper article, the language used is different and begins to include aspects of "wen yan." In writing a ceremonial document, one would use even more "wen yan." The language used in the ceremonial document may be completely different from that of the love letter, but there is a socially accepted continuum between the two. Pure "wen yan", however, is rarely used.

There are similar continuums in the spoken language. A person living in Taiwan, for example, would commonly mix pronunciations, phrases, and words from Mandarin and Min-nan, and these mixtures would be considered socially appropriate under many circumstances. A person living in Hong Kong would use different combinations of Mandarin, colloquial Cantonese, and written Cantonese depending on the social situation.

Another distinctive aspect of the Chinese language is the complex relationship between the various spoken varieties and the various written varieties. Chinese is written using a logographic script in which one character represents one word element, or morpheme. A Chinese text written in "bai hua" is generally readable by most educated Chinese, but the relationship between written and spoken Chinese is complicated. For example, an educated person in Hong Kong would be able to write a text in formal written Cantonese that is readable by a Mandarin speaker. However, that formal written Cantonese, while similar to formal written Mandarin, would be very different from a word-for-word transcription of what the Cantonese speaker would say, and would also be different from written colloquial Cantonese. One might ask: "If formal written Cantonese is different from spoken Cantonese, then where does the Cantonese reader learn formal written Cantonese?" The answer is that the individual learns it in school, just as an English speaker learns how to write and speak "proper" English in school.

For more about the varieties of spoken Chinese, as well as Chinese grammar, see Chinese dialects.

For more about the written Chinese language, see Chinese written language.

For more about Chinese characters, see Chinese characters.


Computer processing of Chinese

The computerized processing of Chinese characters involves some special issues both in input and character encoding schemes.

Chinese encoding systems

  • Guobiao (國標 [国标], short for Chinese National Standard), which is used in Mainland China. All Guobiao standards are prefixed with GB; the latest version is GB18030, a one-, two-, or four-byte encoding.
  • Big5, which is used in Taiwan and Hong Kong, is a one- or two-byte encoding.
  • Unicode is not well accepted by the Chinese government, which mandates that software support the GB18030 encoding in order to be legally sold in China. Some say this is purely a political, protectionist move.
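
The variable byte widths listed above can be observed directly. The following is a minimal sketch, assuming Python's standard "gb18030", "big5", and "utf-8" codecs; the helper name byte_lengths and the sample characters are chosen here purely for illustration and do not come from any standard.

```python
# Sample characters (illustrative choices, not taken from any standard).
SAMPLES = {
    "ASCII letter": "A",
    "common Chinese character": "中",
    "character outside the Basic Multilingual Plane": "\U00020000",
}


def byte_lengths(ch):
    """Return {codec: encoded length in bytes, or None if the codec cannot encode ch}."""
    lengths = {}
    for codec in ("gb18030", "big5", "utf-8"):
        try:
            lengths[codec] = len(ch.encode(codec))
        except UnicodeEncodeError:
            # The character is not part of this encoding's repertoire.
            lengths[codec] = None
    return lengths


if __name__ == "__main__":
    for label, ch in SAMPLES.items():
        print(label, byte_lengths(ch))
```

With these samples, GB18030 should use one byte for the ASCII letter, two for the common character, and four for the rare one, while Big5 has no code at all for the last sample.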

Because Guobiao is used in Mainland China while Big5 is used in Taiwan and Hong Kong, Guobiao text is usually displayed using simplified characters and Big5 text using traditional characters. There is no mandated connection between the encoding system and the font used to display the characters, but in practice font and encoding are tied together. For example, one cannot map traditional Chinese glyphs onto the GB encoding without compromising the meaning of some characters. Some simplifications merged several characters with different meanings and usages into one simpler common form. Mapping many-to-one is easy, so simplified glyphs could be assigned to the Big5 encoding. Mapping one-to-many, as when assigning traditional glyphs to the GB encoding, is tricky, because whichever glyph is picked will be the wrong choice in some usages. Although simplified glyphs can technically be mapped onto the Big5 encoding, such a product would not find a profitable market and so is practically non-existent. Unlike Unicode, which assigns different codes to simplified and traditional characters, neither Big5 nor Guobiao supports both traditional and simplified characters simultaneously. GB18030 is an exception, because it was designed to cover every Unicode character.
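
The difference in character coverage can be checked with a short script. This is only a sketch: it assumes Python's standard codecs, it uses "gb2312" as a stand-in for the older Guobiao encoding (that name does not appear in the text above), and the helper names can_encode and coverage are hypothetical. The pair 发 (simplified) and 髮 (traditional, "hair") is just one example.

```python
def can_encode(ch, codec):
    """Return True if the given codec has a code for the character ch."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# One simplified character and one of its traditional counterparts.
SIMPLIFIED = "发"
TRADITIONAL = "髮"


def coverage():
    """Report, per codec, whether each of the two sample characters is encodable."""
    return {
        codec: (can_encode(SIMPLIFIED, codec), can_encode(TRADITIONAL, codec))
        for codec in ("gb2312", "big5", "gb18030", "utf-8")
    }


if __name__ == "__main__":
    for codec, (simp_ok, trad_ok) in coverage().items():
        print(f"{codec}: simplified={simp_ok} traditional={trad_ok}")
```

On a standard CPython installation, GB2312 is expected to accept only the simplified form and Big5 only the traditional form, while GB18030 and UTF-8 accept both.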

One interesting problem in Chinese data processing is the conversion between traditional and simplified Chinese. As stated in the paragraph above, the one-to-many and many-to-one conversions are tricky. The traditional-to-simplified (many-to-one) conversion is simple, but information is sometimes lost, so a round-trip conversion often does not restore the original text. The simplified-to-traditional (one-to-many) conversion often requires usage context or a list of common phrases to resolve ambiguities.
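
The two directions of conversion can be sketched as follows. This is a toy illustration, not a real converter: the tables hold only a handful of entries, the function names are hypothetical, and the longest-match phrase lookup is just one plausible way to use context. Production converters rely on far larger dictionaries.

```python
# Many-to-one: both 發 ("emit") and 髮 ("hair") simplify to 发.
TRAD_TO_SIMP = {"發": "发", "髮": "发", "頭": "头", "現": "现"}

# One-to-many: candidates per simplified character; the first is the fallback.
SIMP_TO_TRAD = {"发": ["發", "髮"], "头": ["頭"], "现": ["現"]}

# Common phrases used to resolve ambiguous characters by context.
SIMP_PHRASES = {"头发": "頭髮", "发现": "發現"}


def to_simplified(text):
    """Character-by-character lookup; no context is needed in this direction."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


def to_traditional(text):
    """Try the longest known phrase first, then fall back to single characters."""
    max_len = max(len(phrase) for phrase in SIMP_PHRASES)
    out = []
    i = 0
    while i < len(text):
        for n in range(max_len, 1, -1):
            chunk = text[i:i + n]
            if chunk in SIMP_PHRASES:
                out.append(SIMP_PHRASES[chunk])
                i += n
                break
        else:
            # No phrase matched: take the first candidate, which may be wrong.
            out.append(SIMP_TO_TRAD.get(text[i], [text[i]])[0])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(to_simplified("頭髮"), to_traditional("头发"), to_traditional("发现"))
    # A lone character has no context, so the round trip can lose information.
    print(to_traditional(to_simplified("髮")))
```

The phrase table resolves 发 correctly inside 头发 and 发现, but a lone 发 falls back to the first candidate, which is why a round trip starting from 髮 does not return the original character.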

History of Chinese

Deciphering the history of Chinese poses an interesting problem. How do you know the pronunciation of a language which is not written phonetically? Nevertheless, the writing system (especially the xiesheng characters), rhymes in poetry, and transcriptions of foreign names provide enough clues, and the effort devoted to solving this problem is a testimony to the ingenuity of linguists.

Old Chinese

Old Chinese, sometimes known as 'Archaic Chinese', is the language of the early and mid Zhou Dynasty (11th to 7th centuries B.C.), whose texts include inscriptions on bronze artifacts, the poetry of the 詩經 Shijing, the history of the 書經 Shujing, and portions of the 易經 Yijing (I Ching).

Work on reconstructing Old Chinese started with Qing dynasty philologists. The pioneer of Western study of Old Chinese is the Swedish linguist Bernhard Karlgren, whose work is based on the forms of the characters and the rhymes of the 'Shijing'.

Middle Chinese

Middle Chinese is the language of the Sui, Tang, and Song dynasties (7th through 10th centuries A.D.). It can be divided into an early period, reflected in the 切韻 'Qieyun' rhyme dictionary (A.D. 601), and a late period in the 10th century, reflected in the 廣韻 'Guangyun' rhyme dictionary. Bernhard Karlgren called this phase 'Ancient Chinese'.

Linguists are confident that they have a good reconstruction of what Middle Chinese sounded like. The evidence for the pronunciation of Middle Chinese comes from several sources: modern dialect variations, rhyming dictionaries, and foreign transcriptions.

Just as Proto-Indo-European can be reconstructed from modern Indo-European languages, so can Middle Chinese be reconstructed (very tentatively) from modern dialects. In addition, ancient Chinese philologists devoted a great amount of effort to summarizing the Chinese phonetic system through "rhyming tables", and these tables serve as a basis for the work of modern linguists. Finally, Chinese phonetic transcriptions of foreign words often provide clues. For example, "Dravida" was rendered by religious scribes as the series of characters 達羅毗荼, which are now read in Mandarin as /ta35 lwo35 phi35 thu35/. This suggests that Mandarin /wo/ is the modern reflex of an ancient /a/-like sound, and that the Mandarin tone /35/ is a reflex of ancient voiced consonants. Both of these can in fact be confirmed through comparison among modern Chinese dialects.

Old Mandarin

Old Mandarin refers to the language of the Yuan dynasty (14th century A.D.), as preserved in the 'Zhongyuan yinyun' rhyme book.

Modern Chinese

The transition from "wen yan" to "bai hua"

One by-product of the May Fourth Movement in the early 1900s was the popularization of vernacular literature.

The creation of a "national language"

Teaching Mandarin

Character simplification

The communist government tried to improve the literacy rate of its people by reducing and simplifying the Chinese character set in the 1950s. The number of brush strokes required to write many characters was reduced (e.g. 葉 maps to 叶; 萬 maps to 万). Sometimes several complicated characters were folded into one simpler character.

Its effect on the language is still controversial decades later. Some complain that the simplification jeopardises the study of ancient literature by creating a disconnect between everyday text and literary text, and fear that after a few more generations the general Chinese population will no longer be able to appreciate the complexity of ancient literature. Others praise the simplification because it has allowed more people with little education to read.

The Future of Chinese

In Mainland China and Singapore, Chinese is written using simplified characters. In Taiwan and Hong Kong, Chinese is written using traditional characters. However, since the transfer of Hong Kong to the PRC in 1997, more and more simplified Chinese text has shown up in daily usage in Hong Kong. Some schools have already switched their teaching material to simplified characters.

Some believe the use of traditional Chinese characters will diminish over time because of the relative difficulty of learning to write them. On the other hand, there are many culturally conservative people (some of them non-Chinese) who consider simplified characters a "bastardized" and "dumbed-down" version of Chinese writing. Given the political influence of mainland China, however, it seems unlikely that traditional characters will ever regain their dominance.



