Project Leader: Dah-An Ho
Executing organization: Institute of Linguistics, Academia Sinica
The Language Archives project is part of the Taiwan e-learning and National Digital Archives Program (TELDAP), which was launched in 2002 under the auspices of the National Science Council of Taiwan. The first phase of the project formally started in 2002 (a pilot study was carried out in 2001), and the second phase started in 2007. The project aims at recording and preserving language diversity and is composed of the Chinese Language Archives and the Formosan Language Archives. The former is further divided into four sub-projects: Southern Min and Hakka, A Socio-phonetic Study of Spoken Taiwan Mandarin, Tagged Corpus of Old Chinese, and Lexicon of Pre-Qin Inscriptions (including bone, bronze and bamboo media). The goal of the Formosan Language Archive is to preserve the endangered Formosan languages of Taiwan. Since 2002, the research results of the Language Archives project have been posted on the internet to allow browsing and information retrieval. The website has already become a crucial reference for research on Chinese and Formosan languages.
The Language Archives project has five components: Southern Min and Hakka; the Indigenous Austronesian Languages of Taiwan; Sociolinguistics of Spoken Taiwan Mandarin; Tagged Corpus of Old Chinese; and Lexicon of Pre-Qin Inscriptions (including bone, bronze, and bamboo media). In 2007, the Archives project entered its second phase, that of recording and preserving language diversity. In this phase, the focus is on international cooperation and archiving of the local languages of Taiwan.
The "Language Archive" has already accomplished the following. For the early modern Chinese corpus, Hong Lou Meng, Jin Ping Mei and Ping Yao Zhuan have been put online with part-of-speech tagging, and 770k words of Shui Hu Zhuan and 80k words of xiqu have been tagged with parts of speech. In addition, 6,500 interpretations of bronze inscriptions and 730 interpretations of bamboo manuscripts have been proofread. For the modern Chinese corpus, 800k phrases have been collected, 2.5 million words have been tagged with parts of speech, and 110k sentences have been structurally analyzed. For the modern Chinese speech corpus, 11 hours (6.78 GB) of topical conversations and 2 hours (1.3 GB) of news recitals have been digitally transcribed. The "Formosan Language Archives" has digitally processed eleven languages (Atayal, Amis, Bunun, Kanakanavu, Paiwan, Pazeh, Puyuma, Rukai, Saisiyat, Siraya, Tsou) and syntactically analyzed texts in six of these languages. Finally, 500 paronyms from English and Chinese sources have been integrated with GIS.
The Nanwang Puyuma used to be a male shamanic society which, at the turn of the 20th century, turned into a society with female shamans. More recently, it has seen a rise in mediums who have integrated Chinese rites. While female shamans are in charge of all the rites that concern the family, male shamans are responsible for rites that concern the village. The Formosan Language Archive features 31 Nanwang Puyuma rituals (male and female) that were collected in the 1980s and are divided into everyday/non-collective rites and exceptional/collective rites.