Newsflash: Woo-hoo! The project just got accepted by Embedded Linux Journal, so it's all systems go. If you'd like to help out or just find out more, head over to www.wordvault.org/Devices to have a look around and/or to sign up for the mailing list.

-Tristan, 2001-02-01.


WordVault Anywhere Project Proposal

1) Summary

The aim of the WordVault Anywhere project is to create a portable device with a natural language interface that will be able to provide translations, definitions and pronunciation examples for every single English and foreign-language word. But wait! Before you hit the delete key... this is the ultimate aim of the project and will probably not be achievable for three years or more. The first phase, outlined below, will be to create a functional prototype (controlled using commands in English only) of this system, using readily available hardware and software components.

2) Phase One Objectives

Phase one of the project corresponds to the timescale of the Embedded Linux Journal contest period, namely from February to July 2001 inclusive. The goal for this phase will be to produce a working prototype of the device that will be able to accept voice commands, look up words using pre-existing dictionaries, and provide a synthesized speech output of the results, including translations to other languages. Items missing from this prototype that should be made available in final versions could include a voice-driven program to configure voice recognition, and also a user-friendly system to add, remove and update dictionary files on the device. However, I would hope to have enough software and documentation available so that other moderately technical people would be able to construct a similar device themselves or to download and test the system on their own desktop Linux machines.

During the same time period, the WordVault.org sister project will be started in order to begin capturing pronunciation data for use in future phases (see section 5 - Sister Project, below).

3) Proposed Hardware

The WordVault Anywhere prototype device will be based on the Tri-M Engineering MZ104 PC/104 board plus two other components: a soundcard and a 2.5" disk drive. The final system will also require a rechargeable battery pack power supply. External connectors would be provided for i) a serial port for communication during development and debugging, ii) a speaker and microphone (or headset), and iii) a power-in socket for battery recharging. Final versions would probably involve having the speaker and microphone integrated into the PC/104 stack, plus the addition of a USB and/or parallel port to allow fast transfer of dictionary files between a desktop PC and the WordVault Anywhere device.

Thus, the final PC/104 prototype stack will be as follows:

    Input     ->    Stack     ->    Output

    Microphone      Soundcard       Speaker
    Serial port     MZ104 board     Serial port
    Power supply    Power pack
                    2.5" HDD
  

Examples of Soundblaster Pro compatible soundcards in the PC/104 form factor include the Adastra 104 [1] and the Microspace MSMM104 [2].

The rechargeable battery pack could be one of those listed in the Wearables HOWTO [3], or could be provided by Tri-M Systems [4] (hint, hint) or by other vendors listed on the PC/104 Consortium website [5].

Finally, as the initial space requirements will not be large (a few hundred megabytes should be ample), any old, second-hand 2.5" EIDE disk drive could be used as the storage medium, so long as it is able to operate on +5V DC alone.

4) Proposed Software

From the ground up: the WordVault Anywhere device will run a standard Linux kernel, plus have access to all standard text-mode Linux applications (disk space should not be a problem due to the inclusion of a standard EIDE hard drive in the stack). These will be useful for controlling and configuring the machine during development. However, the core functionality of the system will involve four components: speech recognition, word lookup, speech synthesis, and a control program to tie the preceding three together. Taking each one in turn:

4a) Speech Recognition

There are a variety of speech recognition programs available for Linux, as outlined in the Speech Recognition HOWTO [6]. The number of words recognizable by the system would be kept deliberately small at first (users would be required to spell each word, as outlined in Appendix II - Example Dialog, below). This should improve the accuracy of the recognition and help with such considerations as using the device in noisy environments or by non-native speakers. A suitable simple program that could be used immediately is CVoiceControl [7], although it may be necessary to turn to more powerful though less end-user-oriented projects such as ISIP [8] or Open Mind Speech [9] (formerly FreeSpeech). All three are open source, naturally.
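A minimal sketch of how the small-vocabulary, spell-each-word idea might be handled once the recognizer has produced text. It assumes (this is not specified above) that the recognizer emits one lowercase token per recognized sound; the COMMANDS set and the parse_utterance name are hypothetical.

```python
import string

# Hypothetical vocabulary: a handful of command words plus the letters a-z.
COMMANDS = {"define", "match", "translate", "help", "repeat", "end"}
LETTERS = set(string.ascii_lowercase)


def parse_utterance(tokens):
    """Split recognized tokens into (command, spelled_word).

    Anything outside the small vocabulary is rejected rather than guessed.
    """
    if not tokens:
        raise ValueError("empty utterance")
    command, *rest = [t.lower() for t in tokens]
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command}")
    for letter in rest:
        if letter not in LETTERS:
            raise ValueError(f"not a letter: {letter}")
    return command, "".join(rest)


print(parse_utterance(["define", "h", "u", "b", "r", "i", "s"]))
# ('define', 'hubris')
```

Rejecting unknown tokens outright keeps the behaviour predictable in noisy conditions, at the cost of asking the user to repeat themselves.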

4b) Dictionary system

While researching this proposal, I came across a clear winner in the field of dictionary systems, namely the DICT protocol [10]. This is a fully open system, documented in RFC 2229 [11], for which a variety of open-source client and server programs [12] are available. In addition to this client/server system, freely available dictionaries will have to be installed. These will include English-language repositories such as WordNet [13] and Webster's 1913 [14], as well as translating dictionaries available from the Internet Dictionary Project [15] and FreeDict [16]. Also, in future, audio pronunciation data should be available from the WordVault.org sister project using the MIME option of the DICT protocol (see section 5 - Sister Project, below).

Note that although the focus of the WordVault project will be to provide definitions, translations and pronunciation, it would be trivial to alter the system to access more specialized information, provided that it was available in DICT format. Examples of currently available references include The 1995 CIA World Factbook [17], the Jargon File [18], the Chemical Elements Database [19], and more [20].
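To give a feel for how simple the DICT protocol is on the wire, here is a rough sketch of building a DEFINE request and pulling the definition text out of a reply. The function names are hypothetical, and SAMPLE is a hand-written reply in the shape RFC 2229 describes (a 151 status line, the text, a lone dot, then 250), not a capture from a real server. No network access is attempted.

```python
import shlex


def define_command(word, database="*"):
    """Build a DEFINE request line; "*" asks the server to search every database."""
    return f'DEFINE {database} "{word}"\r\n'


def parse_definitions(response):
    """Return a list of (database, text) pairs, one per 151 block in the reply."""
    definitions = []
    database, body = None, None
    for line in response.splitlines():
        if body is None:
            if line.startswith("151 "):
                # Status line layout: 151 "word" database "description"
                database = shlex.split(line)[2]
                body = []
        elif line == ".":
            # A lone dot terminates the text body.
            definitions.append((database, "\n".join(body)))
            body = None
        else:
            # Undo dot-stuffing: a leading ".." stands for a literal ".".
            body.append(line[1:] if line.startswith("..") else line)
    return definitions


# Hand-written sample reply (assumption, for illustration only).
SAMPLE = (
    "150 1 definitions retrieved\r\n"
    '151 "hubris" wn "WordNet 1.6"\r\n'
    "hubris\r\n"
    "  n : overbearing pride or presumption\r\n"
    ".\r\n"
    "250 ok\r\n"
)

print(repr(define_command("hubris")))
print(parse_definitions(SAMPLE))
```

An existing DICT client already does all of this, so a sketch like this is mainly useful for seeing what the control program would have to pre-format before handing text to the speech synthesizer.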

4c) Speech synthesis

The comp.speech FAQ [21] lists a variety of potential speech synthesis programs. Open-source examples include rsynth [22], Festival [23] and others [24]. Festival also includes English, American and Spanish pronunciation. Any of the above can be used in conjunction with speechd [25] to enable speech output by simply redirecting standard output to /dev/speech. Further possibilities include semi-free and non-free software such as the multi-lingual MBROLA [26] project and IBM's ViaVoice Text-to-Speech SDK [26a], but I'd consider these a last resort due to potential licensing headaches.
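Because speech output is just a write to a device file, the calling code can stay tiny. The sketch below assumes speechd is running and has created /dev/speech as described above; the speak function and its device parameter are hypothetical names, and pointing device at an ordinary file is a convenient way to test without a synthesizer.

```python
def speak(text, device="/dev/speech"):
    """Write one line of text to the speech device, much as
    `echo "some text" > /dev/speech` would from the shell."""
    with open(device, "w") as out:
        out.write(text.rstrip("\n") + "\n")


# Example call (needs a running speechd, so it is left commented out):
# speak("Welcome to WordVault.")
```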

4d) Control program

You will have noticed that up to now I haven't mentioned developing any software of my own (I aspire to all three cardinal virtues of the Perl programmer [27] :) This is because I am aiming to do as little extra work as possible, and to avoid reinventing the wheel. However, to get the system to hang together, a command interface needs to be created, which relates to the previous three components as outlined in the diagram below. This program will accept text input and provide text output, and will most likely be written in Python (using the Cmd class for speed of development). The aim of the program will be to accept input to control a DICT client and to pre-format the text responses, as well as providing control and configuration of the system. The result will be a semi-intuitive interface to the dictionaries (see Appendix II - Example Dialog, below).
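A minimal sketch of what a Cmd-based control program could look like, written for a current Python 3 interpreter. The class name, the say helper and the FAKE_DICTIONARY stand-in are all hypothetical; a real version would pass the word to a DICT client instead of a local table.

```python
import cmd
import io

# Hypothetical stand-in for a DICT lookup.
FAKE_DICTIONARY = {"hubris": "Noun. Definition: overbearing pride or presumption."}


class WordVaultShell(cmd.Cmd):
    """Text-in, text-out command loop; each do_* method handles one command word."""

    prompt = ""

    def __init__(self, stdout=None):
        super().__init__(stdout=stdout)
        self.last_reply = ""

    def say(self, text):
        # Remember the reply so that "repeat" can replay it.
        self.last_reply = text
        self.stdout.write(text + "\n")

    def do_define(self, word):
        definition = FAKE_DICTIONARY.get(word)
        if definition:
            self.say(f"{word}. {definition}")
        else:
            self.say(f"No definition found for {word}.")

    def do_repeat(self, _arg):
        self.say(self.last_reply)

    def do_end(self, _arg):
        self.say("Goodbye.")
        return True  # a true return value tells cmdloop() to stop


out = io.StringIO()
shell = WordVaultShell(stdout=out)
shell.onecmd("define hubris")
shell.onecmd("repeat")
print(out.getvalue(), end="")
```

Since the Cmd class reads and writes plain text streams, the same script can be driven from a serial terminal during development and from the speech components later.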

Overall, the system will be composed of components as follows:


                                 text                    text
           Speech recognition   ----->  Control program  ----->  Speech synthesis
           (e.g. CVoiceControl)  input  (Python script   output  (e.g. Festival)
                                         controlling
                                         DICT client)
                                               |
                                               |
                                          DICT | protocol
                                               |
                                               |
                                          DICT server


This breaking down of the system into component parts aims to conform to the Unix philosophy [28] and means that the system will be able to benefit from advances in each of its constituents (e.g. if a better speech synthesis program is found, it can simply be dropped into place). It should also make the project quicker to develop and more resilient than if I took a monolithic approach (if everything goes horribly wrong, and e.g. the voice input or output is unacceptable, then they could be replaced with a keypad or LCD display respectively).
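The drop-in idea can be sketched by treating each stage as a plain callable that takes or returns text. Everything here is a hypothetical stand-in: typed_input plays the part of the speech recognizer (or a keypad), echo_control the control program, and a list's append method the LCD display.

```python
def run_once(listen, control, speak):
    """Pass one utterance through the three stages; any stage can be swapped out."""
    speak(control(listen()))


def typed_input():
    # Stand-in for the recognizer: returns the recognized command as text.
    return "define hubris"


def echo_control(command):
    # Stand-in for the control program: returns the text to be spoken.
    return f"You said: {command}"


shown_on_lcd = []
run_once(typed_input, echo_control, shown_on_lcd.append)
print(shown_on_lcd)
# ['You said: define hubris']
```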

Note that in order to install the software on the device, I would cheat by simply installing a trimmed-down Linux distro on the hard drive by plugging it into a laptop machine. I would then install the software mentioned above, disable unnecessary daemons etc., and enable the use of a serial login terminal. The bootable drive could then be plugged into the PC/104 board, and the device could then be controlled by using a null modem cable and a terminal emulation program such as minicom. Files could be transferred from a development machine to the target drive by using ZMODEM file transfer commands.

Future developments could involve optimizing the system to allow it to function better (e.g. trimming down the kernel to save memory) and potentially to take advantage of the available DiskOnChip storage and MachZ chip features (but, hey, that'll have to wait for more of Doug Stead's articles :)

5) Sister Project: WordVault.org

So far, all the above has just been about re-using existing sources of information, without giving anything back to the wider community. This is where the idea behind WordVault.org comes in: the aim of this website will be to provide a multi-lingual interface to the publicly-available DICT format resources mentioned above, but also to solicit example pronunciation from site visitors. These will initially be captured by uploading recorded samples, but eventually may be recorded in more user-friendly ways such as via a Java applet.

Once audio samples for words have been entered, they will be checked by moderators and then be made available to subsequent site visitors in a variety of formats [29], along with definitions of the words they are searching for. Finally, once a sufficient number of audio clips are available, they can be formatted and served by the DICT server on the WordVault Anywhere device, allowing it to provide example pronunciation for words, something that should be of great benefit to people who are trying to learn non-phonetic languages (such as English). The large size of these audio samples is the principal reason behind using a hard drive in the WordVault Anywhere device.

I have registered the domain wordvault.org, and (hopefully along with volunteers) will set up a web-based management system using the Zope [30] content management system. The Zope system should also provide easy management of development documentation for the phase one project outlined above, as well as providing downloadable dictionaries and running a DICT server itself to allow remote queries from other client programs.

Appendix I) Answers to Questions

Q1) What is the working title for your project?

A1) WordVault Anywhere

Q2) What need or desire will your embedded Linux project satisfy?

A2) The ability for people to easily access definitions and translations of words in their own language (initially only English). In future, to be able to hear pronunciation examples too.

This should be of particular interest to people who are trying to learn another language (especially when living in a foreign country) and who constantly require access to definitions, translations and pronunciation examples.

Q3) What are your qualifications for carrying out an embedded Linux project, including programming and hardware experience? You may include URLs of related work, either hobby or professional.

A3) I've been assembling my own PCs since the days of 286s, and I am ready for a new challenge. I have been using Linux as my primary operating system for over five years (and been an LJ subscriber for nearly three), can write simple shell scripts, and know my way around most commands and configuration files. I am happy with recompiling programs from C source code (including the Linux kernel, of course). I also have two years' commercial experience mucking about with various scripting languages (mainly web dev stuff: Perl, PHP, ColdFusion, ASP, Python, Zope).

I realize this is hardly hardcore embedded programming. However, more than my concrete experience, I believe that it is my enthusiasm and ability to learn quickly (I have a PhD if that helps :) that will make or break the project. Plus I certainly don't intend to do this alone. I'll be asking questions and promoting the project on mailing lists that deal with speech recognition [31], speech synthesis [32], the DICT protocol [33] and Zope [34]. And I plan to release early and often (I haven't read those Eric Raymond essays [35] for nothing :)

Q4) What additional hardware are you considering using? (you are not required to use it in your final project)

A4) A soundcard, a 2.5" hard drive and a battery.
For full details, see section 3 - Hardware, above.

Q5a) What software do you plan to develop?

A5a) A Python script to control and configure the system.

Q5b) What tools and libraries do you plan to use?

A5b) Speech recognition software (e.g. CVoiceControl [7]), speech synthesis software (e.g. Festival [23]) and DICT client/server software [12]. Full details are given in section 4 - Software, above.

Q6) Do you plan to use an embedded Linux distribution? If so, which one?

A6) Probably not. As the project uses a hard drive, storage space should not be a problem, so a standard Linux distro with a trimmed-down kernel should suffice for phase one.

Q7) What sources of information and support will you consult while carrying out your project?

A7) A fair few. Please see Appendix III - References, below.

Q8) Please include your contact information:

Q8a) Your full name as you would like it to appear in Linux Journal

A8a) Tristan Roddis

Q8b) Your shipping address and phone number (confidential)

A8b) Telephone: +52 9 516 4827

Address: Panoramica del Fortin 110, CP68000 Centro, Oaxaca, Mexico

PLEASE NOTE THAT THIS ADDRESS IS DUE TO CHANGE SHORTLY, AND THAT THE MEXICAN POSTAL SERVICE IS LOUSY - IF I AM A FINALIST PLEASE NOTIFY ME BY EMAIL/PHONE AND I WILL PROVIDE UPDATED ADDRESS DETAILS FOR SHIPPING.

Q8c) What to link your name to on the web site if you are a finalist

A8c) Please link to www.wordvault.org (which should be up and running by then) or www.roddis.org if you want to link to my personal site rather than the project site.

Development Considerations:

Q9) Will the project either satisfy a real need, or have aesthetic, entertainment, or scientific value?

A9) Real need: I hope so (as mentioned in answer A2 above). Furthermore, the whole project should become even more widely applicable and useful as it matures, e.g. as pronunciation samples get added to the mix, and as the quality and quantity of the freely-available dictionaries improve. Plus it's so tied in to other projects (see section 4 - Software, and section 5 - Sister Project, above) that even if the whole thing fails to have any value in its own right, it should at least provide code and/or information that can be easily reused elsewhere.

Q10) Does the person or team have skills and/or motivation appropriate to carrying out the project?

A10) Well, I'm keen, and I'm not afraid to use a soldering iron! :) (for a more serious answer, see answer A3 above)

Q11) Is the project different from other, previously constructed embedded Linux projects?

A11) As far as I know, the idea is unique. Despite extensive web searching I didn't find anything similar to the device I am proposing to build (although there are of course lots of related projects that I will be freely borrowing from, as detailed in Appendix III - References, below).

Appendix II) Example Dialog

Here follows an example of the sort of dialog between a user and the WordVault Anywhere device that should be achievable by the end of phase one. As mentioned above, the initial voice recognition will rely on a small number of command words, plus sounds for each letter of the alphabet and numbers, so that users can spell words. This is denoted by s-p-e-l-l-i-n-g below.

Note that all definitions and translations are real, as obtained from the WordNet 1.6 [13] and IDP Spanish [15] DICT databases (to check the sort of information that is available already, I recommend trying out a DICT client yourself, such as the GDict [36] program for GNOME).

[switch on]

WordVault Anywhere: Starting up... System initialized. Welcome to WordVault.

User: define h-u-b-r-i-s
WordVault Anywhere: hubris. Noun. Definition: overbearing pride or presumption.

User: define l-a-z-i-n-e-s-s
WordVault Anywhere: laziness. Noun. Definition 1: inactivity resulting from a dislike of work. Synonym: indolence. Definition 2: apathy and inactivity in the practice of virtue (personified as one of the deadly sins). Synonym: sloth, acedia.

User: synonyms
WordVault Anywhere: 1 indolence. 2 sloth. 3 acedia. Choose a number or say none to cancel.

User: 2
WordVault Anywhere: sloth. Noun. Definition 1: a disinclination to work or exert yourself. Synonym: slothfulness. Definition 2: any of several slow-moving arboreal mammals of South and Central America; they hang from branches back downward and feed on leaves and fruits. Synonym: tree sloth. Definition 3: apathy and inactivity in the practice of virtue (personified as one of the deadly sins). Synonyms: laziness, acedia.

User: match i-m-p-a-t
WordVault Anywhere: 6 matches found. 1 impatience. 2 impatiens capensis. 3 impatient. 4 impatient of. 5 impatiently. 6 with impatience. Choose a number or say none to cancel.

User: repeat
WordVault Anywhere: 6 matches found. 1 impatience. 2 impatiens capensis. 3 impatient. 4 impatient of. 5 impatiently. 6 with impatience. Choose a number or say none to cancel.

User: 1
WordVault Anywhere: impatience. Noun. Definition 1: a lack of patience; irritation with anything that causes delay. Synonym: restlessness. Definition 2: a restless desire for change and excitement. Definition 3: a dislike of anything that causes delay. Antonym: patience.

User: help
WordVault Anywhere: Help is available on the following commands. Say help plus the command name to get more details or say help introduction to hear an overview. Commands: antonyms, define, end, match, options, repeat, set, show, synonyms, translate....[etc].

User: help options
WordVault Anywhere: The options command allows you to hear your current preferences....[etc]

User: options
WordVault Anywhere: Options are set as follows: language is english. database is WordNet 1.6. translation is french. voice is american....[etc] Say help plus the name of the preference to hear more details. Say show plus the name of the preference to see available options. Say set plus the name of the preference and its new value to change an option.

User: show translation
WordVault Anywhere: Available translations are french, italian, romanian and spanish.

User: set translation spanish
WordVault Anywhere: Option translation has been set to spanish.

User: translate w-o-r-d
WordVault Anywhere: la palabra. l-a p-a-l-a-b-r-a.

User: translate v-a-u-l-t
WordVault Anywhere: bóveda, sótano, caja fuerte. Noun. b-ó-v-e-d-a, s-ó-t-a-n-o, c-a-j-a f-u-e-r-t-e.

User: repeat
WordVault Anywhere: bóveda, sótano, caja fuerte. Noun. b-ó-v-e-d-a, s-ó-t-a-n-o, c-a-j-a f-u-e-r-t-e.

User: end
WordVault Anywhere: Are you sure you want to shut down WordVault? Say yes to confirm or no to cancel.

User: yes
WordVault Anywhere: Powering down the system. Goodbye.

[switch off]
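Two of the output conventions used in the dialog above are easy to sketch: spelling a word back letter by letter, and numbering a list so that the user can pick an entry by saying its number. The function names are hypothetical, and the spelling helper assumes accented letters are stored as single (precomposed) characters.

```python
def spell_out(phrase):
    """Spell each word letter by letter: "la palabra" -> "l-a p-a-l-a-b-r-a"."""
    return " ".join("-".join(word) for word in phrase.split())


def format_choices(items):
    """Number the items and append the prompt used in the example dialog."""
    numbered = " ".join(f"{n} {item}." for n, item in enumerate(items, start=1))
    return f"{numbered} Choose a number or say none to cancel."


print(spell_out("la palabra"))
# l-a p-a-l-a-b-r-a
print(format_choices(["indolence", "sloth", "acedia"]))
# 1 indolence. 2 sloth. 3 acedia. Choose a number or say none to cancel.
```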



Appendix III) References


1 Adastra SND-104: www.emjembedded.com/products/enclosures/adastra.sound.html
2 Microspace MSMM104: www.adlogic-pc104.com/msmm104.html
3 Wearables HOWTO (power supplies): www.linuxdoc.org/HOWTO/Wearable-HOWTO-5.html
4 Tri-M Systems (power supplies): www.tri-m.com/products/power.html
5 PC/104 Consortium products page: www.pc104.org/products
6 Speech Recognition HOWTO: www.linuxdoc.org/HOWTO/Speech-Recognition-HOWTO/software.html
7 CVoiceControl: www.kiecza.de/daniel/linux/index.html
8 ISIP speech recognition: www.isip.msstate.edu/projects/speech/
9 Open Mind Speech: freespeech.sourceforge.net/
10 DICT: www.dict.org/
11 RFC 2229: ftp://ftp.isi.edu/in-notes/rfc2229.txt
12 DICT software: ftp://ftp.dict.org/pub/dict/
13 WordNet: www.cogsci.princeton.edu/~wn/
14 Webster 1913: humanities.uchicago.edu/forms_unrest/webster.form.html
15 Internet Dictionary Project: www.june29.com/IDP/index.html
16 FreeDict: www.freedict.de/
17 The 1995 CIA World Factbook: www.hri.org/docs/CIA/
18 Jargon File: www.tuxedo.org/~esr/jargon/
19 The Elements Database: ucsub.colorado.edu/~kominek/elements/
20 Other dictionaries: www.dict.org/links.html
21 Comp.speech FAQ (software): www-svr.eng.cam.ac.uk/comp.speech/Section5/Q5.5.html
22 Rsynth: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.gz
23 Festival Speech synthesis: www.cstr.ed.ac.uk/projects/festival/
24 Other synthesis software: ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/
25 Speechd: www.speechio.org/
26 MBROLA: tcts.fpms.ac.be/synthesis/mbrola.html
26a ViaVoice Text-to-Speech SDK: www-4.ibm.com/software/speech/enterprise/te_5.html
27 These are: Laziness, Impatience and Hubris: www.netropolis.org/hash/perl/virtue.html
28 Unix Philosophy: vip.hex.net/~cbbrowne/unix.html#UNIXPHILOSOPHY
29 Web audio formats: faculty-staff.ou.edu/B/Andrea.D.Beesley-1/soundvideo.html
30 Zope: www.zope.org
31 Speech recognition mailing list: leb.net/mailman/listinfo/ddlinux
32 Speech synthesis mailing lists: www.speechio.org/list.html or lists.SKYLIST.net/plaintalk
33 DICT mailing list: www.dict.org/links.html
34 Zope mailing lists: www.zope.org/Resources/MailingLists
35 The Cathedral and the Bazaar: www.tuxedo.org/~esr/writings/cathedral-bazaar/
36 GDict: gdict.dhs.org. (documentation at www.inkstain.net/fleck/gdict/ )

Miscellaneous sites:

Linux sound resources: sound.condorow.net/
Blind Linux: leb.net/blinux/
Emacspeak: www.cs.cornell.edu/home/raman/emacspeak/emacspeak.html
Human languages page: www.june29.com/HLP/
PC/104 FAQ: http://www.controlled.com/pc104faq/
Embedded Linux sites: www.linuxembedded.com, www.linuxdevices.com, www.alllinuxdevices.com