General idea
This Furigana package converts plain Japanese text into text with Furigana (readings) marked up.
In the converted text, the Hiragana sequence between a | and a > is the reading that is converted into the Kanji sequence following the >.
Let H(n) be a Hiragana character and K(n) a Kanji character. A word written K(1) K(2) H(4) and read H(1) H(2) H(3) H(4) is marked up as:
| H(1) H(2) H(3) H(4) > K(1) K(2) H(4)
That is, the Hiragana reading H(1) H(2) H(3) H(4) of the sequence K(1) K(2) H(4) is placed between the | and the >.
For example, a training text may contain:
メロスは激怒した。
The converted text would look like this:
|めろす>メロスは|げきど>激怒した。
Here, げきど is the Hiragana sequence that is converted into the Kanji sequence 激怒.
The Hiragana sequence した needs no conversion, so no |...> markup is added for it.
The markup is not limited to Kanji: めろす is likewise marked as the reading of the Katakana sequence メロス.
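The markup rule can be illustrated with a short Python sketch. It assumes the text has already been split into (surface, reading) pairs with Katakana readings, the kind of output a morphological analyzer such as ChaSen produces; the function names `to_hiragana`, `mark_up`, `strip_markup`, and `readings` are hypothetical, and this is not the logic of MkFurigana.pl. A word gets a |reading> prefix only when its Hiragana reading differs from its surface form, and removing every |...> prefix recovers the original text.

```python
import re

# Katakana ァ (U+30A1) .. ヶ (U+30F6) sit exactly 0x60 code points above
# the corresponding Hiragana ぁ (U+3041) .. ゖ (U+3096).
_KATA_START, _KATA_END, _OFFSET = 0x30A1, 0x30F6, 0x60


def to_hiragana(text: str) -> str:
    """Convert Katakana characters to Hiragana; leave everything else unchanged."""
    return "".join(
        chr(ord(c) - _OFFSET) if _KATA_START <= ord(c) <= _KATA_END else c
        for c in text
    )


def mark_up(morphemes: list[tuple[str, str]]) -> str:
    """Build Furigana markup from (surface, Katakana reading) pairs (assumed input format)."""
    out = []
    for surface, reading in morphemes:
        hira = to_hiragana(reading)
        if hira == surface:
            # Already plain Hiragana (or punctuation): no markup needed.
            out.append(surface)
        else:
            # Reading goes between | and >, followed by the surface form.
            out.append(f"|{hira}>{surface}")
    return "".join(out)


# One |reading> prefix; the reading itself contains neither | nor >.
_PREFIX = re.compile(r"\|([^|>]*)>")


def strip_markup(marked: str) -> str:
    """Remove every |reading> prefix, recovering the plain text."""
    return _PREFIX.sub("", marked)


def readings(marked: str) -> list[str]:
    """List the readings that appear between | and >."""
    return _PREFIX.findall(marked)


morphemes = [
    ("メロス", "メロス"),
    ("は", "ハ"),
    ("激怒", "ゲキド"),
    ("し", "シ"),
    ("た", "タ"),
    ("。", "。"),
]
marked = mark_up(morphemes)
print(marked)                # |めろす>メロスは|げきど>激怒した。
print(strip_markup(marked))  # メロスは激怒した。
print(readings(marked))      # ['めろす', 'げきど']
```

Note that the markup records where each reading starts but not where the marked sequence ends, so going from marked text back to (surface, reading) pairs would need extra information such as the original segmentation.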
Required software
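- Perl (the Makefile calls perl5.8.4 by default; see Options)
- ChaSen, a Japanese morphological analyzer
- make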
Instructions
- Download the Furigana package and extract the archive:
$ tar xvfz furigana.tar.gz
./Furigana/
./Furigana/e2u
./Furigana/u2e
./Furigana/MkFurigana.pl
./Furigana/Makefile
./Furigana/PreProcess.pl
./Furigana/README
- Provide a Shift-JIS encoded Japanese training text.
To obtain a free training text, try Aozora Bunko, a free online library of Japanese literature.
For example, to use "Hashire Merosu" by Osamu Dazai as the training text:
$ ls
1567_ruby_4948.zip
$ unzip 1567_ruby_4948.zip
Archive: 1567_ruby_4948.zip
inflating: hashire_merosu.txt
$ ls
1567_ruby_4948.zip hashire_merosu.txt
- Copy the training text into the Furigana directory, renaming it to "input.sjis":
$ cp hashire_merosu.txt ../Furigana/input.sjis
- Run "make" to build the training text, then "make clean" to delete the intermediate files. The resulting UTF-8 training text is "training.txt":
$ ls
Makefile MkFurigana.pl PreProcess.pl README e2u input.sjis u2e
$ make
perl5.8.4 PreProcess.pl input.sjis > text.euc
chasen text.euc > chasen.euc
perl5.8.4 MkFurigana.pl chasen.utf8 > training.txt
$ make clean
rm text.euc
rm chasen.euc
$ ls
Makefile PreProcess.pl e2u training.txt
MkFurigana.pl README input.sjis u2e
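The build reads Shift-JIS input and writes UTF-8 output. If a source text is stored in UTF-8 rather than Shift-JIS, a minimal Python sketch like the following could produce input.sjis before running make, and check afterwards that training.txt decodes as UTF-8. The helpers `write_sjis` and `check_utf8` are hypothetical. The default codec is Python's `shift_jis`; the `cp932` codec (Microsoft's Shift-JIS variant) may be needed if the text contains Windows-specific characters such as circled numbers, and whether PreProcess.pl accepts those is an assumption to verify.

```python
import tempfile
from pathlib import Path


def write_sjis(src: Path, dst: Path, encoding: str = "shift_jis") -> None:
    """Re-encode a UTF-8 text file as Shift-JIS (hypothetical helper).

    Raises UnicodeEncodeError if a character has no equivalent in the
    target encoding, instead of silently dropping it.
    """
    text = src.read_text(encoding="utf-8")
    dst.write_bytes(text.encode(encoding))


def check_utf8(path: Path) -> str:
    """Return the file's text; raises UnicodeDecodeError if it is not valid UTF-8."""
    return path.read_bytes().decode("utf-8")


# Demonstration in a throwaway directory (file names are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "source_utf8.txt"
    src.write_text("メロスは激怒した。", encoding="utf-8")
    dst = Path(tmp) / "input.sjis"
    write_sjis(src, dst)
    roundtrip = dst.read_bytes().decode("shift_jis")
    checked = check_utf8(src)

print(roundtrip)  # メロスは激怒した。
```

In a real run, the calls would be write_sjis(Path("source.txt"), Path("input.sjis")) in the Furigana directory before make, and check_utf8(Path("training.txt")) after it.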
Options
The following options can be set on the make command line:
- Input file name (default: input.sjis):
$ make INPUT=<filename>
- Output file name (default: training.txt):
$ make OUTPUT=<filename>
- Perl path:
The default is perl5.8.4.
$ make PERL=<perl-path>
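When the build is driven from a script, these options are passed to make as variable assignments on its command line. A minimal Python sketch, where `make_command` and `run_make` are hypothetical helpers, only the INPUT, OUTPUT, and PERL variables listed above are assumed, and /usr/bin/perl is an illustrative path:

```python
import subprocess
from typing import Optional


def make_command(input_file: Optional[str] = None,
                 output_file: Optional[str] = None,
                 perl: Optional[str] = None) -> list[str]:
    """Build the make argument list; unset options fall back to the Makefile defaults."""
    cmd = ["make"]
    for name, value in (("INPUT", input_file), ("OUTPUT", output_file), ("PERL", perl)):
        if value is not None:
            cmd.append(f"{name}={value}")
    return cmd


def run_make(cwd: str, **options: Optional[str]) -> None:
    """Run make in the Furigana directory; raises CalledProcessError on failure."""
    subprocess.run(make_command(**options), cwd=cwd, check=True)


print(make_command(input_file="hashire_merosu.sjis", perl="/usr/bin/perl"))
# ['make', 'INPUT=hashire_merosu.sjis', 'PERL=/usr/bin/perl']
```

Passing an argument list rather than a single shell string avoids quoting problems with file names that contain spaces.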
Other tools
These tools may be helpful when using the Furigana package.
Bug reports
Please send bug reports to:
Takashi Kaburagi kabruragi[AT]mrao.cam.ac.uk