Inference Group


Chinese Dasher wiki

Chinese "Ruby" Corpus

I have found a Chinese corpus that gives pinyin and Chinese character strings together. I used this corpus to make our pinyin corpus, download/training/training_pinyin_CN.txt, and a "Ruby" corpus, download/training/training_chineseRuby_CN.txt. [Ruby is our name for text that mixes phonetic script with Chinese or Japanese characters; in Japanese, Ruby is called furigana.]

The original corpus is in /home/mackay/dasher/incoming/chinese/pinyin and /home/mackay/dasher/incoming/chinese/character.

My Perl program that creates the Ruby output is /home/mackay/dasher/incoming/chinese/pinyin/CONVERTP.p . The associated alphabet file is alphabet.chineseRuby.xml .

My Perl program that creates the pure pinyin output is /home/mackay/dasher/incoming/chinese/pinyin/CONVERT3.p . The associated alphabet file is alphabet.pinyin.xml .

On Fri 5/8/05 I fixed an error in my conversion program, with the help of Chunlin Ji. Here are his notes.

Rules to mark the tone for Pinyin:

  1. If there is more than one vowel and the first one is 'i', 'u' or 'ü', then the second vowel takes the mark.
  2. Otherwise, the first vowel takes the mark. (The vowels in Pinyin are 'a', 'e', 'i', 'o', 'u' and 'ü'.)

By the way, there are several small tricks in writing Pinyin. For example, Hanyu Pinyin simplifies the spelling of syllables with 'ü' by writing 'u' instead wherever no ambiguity can result, such as when 'ü' comes after 'j', 'q', 'x' or 'y'. This is merely a spelling convention; the 'u's here are still pronounced 'ü'.
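
The two tone-mark rules above can be sketched in a few lines of Python. This is only an illustration, not the Perl conversion program mentioned earlier: the function name mark_tone, the lowercase-only input, and the choice to leave any tone number outside 1-4 unmarked (neutral tone) are all assumptions made for this example.

```python
import unicodedata

U_UMLAUT = "\u00fc"            # the letter ü
VOWELS = "aeiou" + U_UMLAUT    # the six Pinyin vowels

# Combining diacritics for tones 1-4: macron, acute, caron, grave.
TONE_DIACRITICS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}


def mark_tone(syllable: str, tone: int) -> str:
    """Return a lowercase Pinyin syllable with the tone mark applied.

    Hypothetical helper: any tone outside 1-4 is treated as the
    neutral tone and the syllable is returned without a mark.
    """
    # Make sure ü is a single code point before we index into the string.
    syllable = unicodedata.normalize("NFC", syllable)
    if tone not in TONE_DIACRITICS:
        return syllable

    # Positions of all vowels in the syllable.
    positions = [i for i, ch in enumerate(syllable) if ch in VOWELS]
    if not positions:
        return syllable  # no vowel to carry the mark

    # Rule 1: more than one vowel and the first is i, u or ü
    #         -> the second vowel takes the mark.
    # Rule 2: otherwise the first vowel takes the mark.
    first = syllable[positions[0]]
    if len(positions) > 1 and first in "iu" + U_UMLAUT:
        target = positions[1]
    else:
        target = positions[0]

    # Attach the combining diacritic after the chosen vowel, then compose.
    marked = syllable[: target + 1] + TONE_DIACRITICS[tone] + syllable[target + 1 :]
    return unicodedata.normalize("NFC", marked)
```

For example, mark_tone("hao", 3) gives "hǎo" (rule 2), while mark_tone("xue", 2) gives "xué" (rule 1). Because of the spelling convention described above, the 'u' in "xue" stands for 'ü', but the mark placement is the same either way.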

For a detailed guide to the rules of Pinyin, please refer to the following webpages (in English):

  1. Combinations of initials and finals
  2. Where do the tone marks go?
  3. Basic Rules of Hanyu Pinyin Orthography

Software: Here are some free and popular input methods for Linux. I guess they may contain source code for converting Pinyin to Chinese characters.

  1. SCIM (input methods include Simplified/Traditional Chinese, Japanese, Korean and many European languages)
  2. Fcitx
  3. XCIN (widely used in Taiwan)
  4. Chinput
  5. XSIM

Would software that converts Chinese characters to Pinyin be useful for creating training data? If so, the following software may help. (The webpage is in Chinese.)

The bopomofo alphabet is here.

"The Inference Group" is supported by "the Gatsby Foundation"
and works in close collaboration with the "IBM Zurich Research Laboratory"
David MacKay
Site last updated Fri Oct 1 10:33:21 BST 2010