Martijn van Veen
Me showing off Dasher
After obtaining a bachelor's degree in Electrical Engineering, with a focus on Telecommunications, I started a Master's study at the Eindhoven University of Technology in 2003, again in the Electrical Engineering department. Here I chose Information Theory as my major, which is part of the Signal Processing Group.
In 2006 I started my graduation thesis with a research project on language modeling. My supervisors, Dr. ir. F.M.J. Willems and Dr. ir. Tj.J. Tjalkens, developed a universal compression algorithm called Context-Tree Weighting (CTW) in the 1990s, together with Dr. ir. Yu. Shtarkov. Because statistical compression methods, which consist of a probabilistic model and an arithmetic encoder, are very similar to predictive language models, using CTW as a language model is a natural choice. CTW is also known as a very efficient compression algorithm, so we expected it to do a good job at language modeling as well.
There are several applications for language
models, such as speech recognition systems. One very interesting
application of a language model is in text entry methods. The model
calculates a probability for every possible input character and makes the
most likely characters easier to input. If the model is accurate, this can
greatly speed up text input. One such predictive text entry method is
Dasher. The subject of my graduation thesis thus became "make CTW suitable
for use as a language model in Dasher". Indeed, CTW proved to be a good
method to base a language model on, and we were able to outperform the
currently used language model based on Prediction by Partial Match (PPM).
The thesis I wrote on this subject can be downloaded on the left of this page.
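To illustrate the idea of a model that assigns a probability to every possible next character and gives likely characters more room, here is a minimal sketch in Python. It is not the CTW or PPM model discussed above: it assumes a simple order-1 frequency model with add-one smoothing, and the names `train`, `predict`, `layout` and `ALPHABET` are hypothetical, chosen only for this example.

```python
from collections import Counter, defaultdict

# Assumed toy alphabet: lowercase letters plus space.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def train(text):
    """Count how often each character follows each one-character context."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts

def predict(counts, context):
    """Return P(next char | last char of context), with add-one smoothing."""
    seen = counts[context[-1]] if context else Counter()
    total = sum(seen[c] for c in ALPHABET) + len(ALPHABET)
    return {c: (seen[c] + 1) / total for c in ALPHABET}

def layout(probs):
    """Give each character a slice of [0, 1) whose width is its probability."""
    intervals, low = {}, 0.0
    for c in ALPHABET:
        intervals[c] = (low, low + probs[c])
        low += probs[c]
    return intervals

counts = train("the theory of the thing is that the model learns the text")
probs = predict(counts, "t")
intervals = layout(probs)
best = max(probs, key=probs.get)  # most likely character after "t"
```

The same interval layout is what an arithmetic encoder would use to code the next symbol, which is why a good compression model and a good predictive text entry model are so closely related: the more probable a character, the larger its slice, and the easier it is to select.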
In February 2007 I graduated. Professor of Natural Philosophy David MacKay invited me to collaborate with the Dasher developers at the Inference Group of the Physics department, Cavendish Laboratory, in Cambridge during March and April. In that period I worked with Phil Cowans on Dasher. A summary of the results of that work can be downloaded on the left of this page. The main result is that the CTW language model will be included in the code of future Dasher versions.
Although I am currently working for ASM in the semiconductor industry, I am still interested in language modeling and source coding in general. If you have questions about the CTW language model, do not hesitate to contact me. I can be reached at mvveen at ieee.org