- Version Information
- Version: 0.3.2, updated: 2010-04-07
- Developed with Delphi 5 and Perl
- Download source code (Delphi, Perl) and phonem.exe (command line tool).
- Content
- For Delphi: TPhonem, a class derived from TStringList that converts text into a phoneme.
For Perl: a small Perl module, phonem.pm, with two subs, one to phonemize and one to initialize the phonemization table. See phonem.pm for comments.
A replacement table for German is implemented as the default. For other languages, you need to devise other tables. phonem.exe is a command line tool that reads text from standard input, transforms it to phonemes and writes the result to standard output.
- Licence
- uPhonem.pas, phonem.exe and phonem.pm are freeware that comes without any warranty or support, so you use them at your own risk. You may even use them in commercial products as long as you mention me as the original author somewhere. If you make modifications to the source code, you must maintain the original copyright notice and publish the result under the same licence. For the exact conditions, see LICENCE.txt, which is contained in the download package.
- I look forward to feedback via email.
- Stay tuned
- I set up an RSS news feed to keep you informed about changes to my website. Not all feed entries will contain update information for my programmes.
TPhonem (Delphi) / phonemize() (Perl)
TPhonem is a class that allows you to do phonetic searches with Delphi. It converts a given input text to a phoneme. The algorithm is quite simple; the parameters are the hard part. Basically, it just converts character sequences to other sequences. Michael Abmayer and I once implemented, in VBA, an algorithm that was described in principle in the German computer magazine c't (around 1990 or so).
Michael made some modifications to the replacement table to improve performance for German family names. This table is the default replacement table. You will not be happy if you want to phonemize languages other than German unless you have a suitable replacement table.
So I ask anyone who has replacement tables that fit certain languages or needs to send them to me so that I can offer them for download here.
Examples: The names Mayer, Meier, Maier and Mayr are all variants that are phonetically identical. TPhonem converts all of them (with the default replacement table) to MAYR.
Another example is Haydn vs. Heyden: both names are mapped to HAYDM.
Tweaking the replacement table is not forbidden but encouraged. Which table suits best depends very much on the words that you want to compare.
Loading alternate replacement tables works by simply using the LoadFromFile method in the Delphi class or loadPhonemTableFromFile() in Perl.
The Perl implementation just handles scalars (strings), hash references and array references. In array references, all elements of the array are replaced with their phonemes. In hash references, all values (not the keys) are replaced with their phonemes.
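The dispatch over strings, arrays and hashes described above can be sketched in Python. This is not the code from phonem.pm: the function name `phonemize_any` is hypothetical, the actual phonemization routine is passed in as a parameter, and the sketch returns new values where the Perl version replaces the contents of the references. Raising an error for other types is also an assumption.

```python
def phonemize_any(value, phonemize):
    """Apply `phonemize` to a string, to every list element, or to every dict value.

    `phonemize` is any callable that maps a string to its phoneme.
    """
    if isinstance(value, str):
        return phonemize(value)
    if isinstance(value, list):
        # All elements of the list are converted.
        return [phonemize(item) for item in value]
    if isinstance(value, dict):
        # Only the values are converted; the keys stay untouched.
        return {key: phonemize(val) for key, val in value.items()}
    # Assumption: anything else is rejected (the source does not say what happens).
    raise TypeError("unsupported type: " + type(value).__name__)


# Usage with str.upper as a stand-in for a real phonemization routine.
print(phonemize_any("Mayer", str.upper))
print(phonemize_any(["Meier", "Maier"], str.upper))
print(phonemize_any({"name": "Mayr"}, str.upper))
```

Passing the routine in as a parameter keeps the sketch independent of any particular replacement table.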
Technical details
About the algorithm:
- remove leading and trailing whitespace
- transform input string to upper case
- eliminate double, triple, … characters (ss, tt, and friends)
- apply each item of the translation table in the given order to the whole string as long as there are matches
Although I have optimized the implementation a little, I advise against using TPhonem to transcribe large texts at once.
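The four steps listed above can be sketched in Python as follows. This is an illustration, not the Delphi or Perl source. `DEMO_TABLE` is a hypothetical three-rule table made up for this example and is not the default German table. The reading of "as long as there are matches" as "repeat each rule until it no longer matches" is an assumption, as is the guard against rules whose replacement contains their own match.

```python
import re

# Hypothetical demo rules, NOT the default German table shipped with TPhonem.
DEMO_TABLE = [("EI", "AY"), ("AI", "AY"), ("YER", "YR")]


def phonemize(text, table=DEMO_TABLE):
    s = text.strip()                  # 1. remove leading and trailing whitespace
    s = s.upper()                     # 2. transform to upper case
    s = re.sub(r"(.)\1+", r"\1", s)   # 3. collapse runs of the same character
    for match, replace in table:      # 4. apply the rules in table order
        if not match:
            continue                  # an empty match would never terminate
        if match in replace:
            # Assumption: such a rule is applied once to avoid an endless loop.
            s = s.replace(match, replace)
            continue
        while match in s:
            s = s.replace(match, replace)
    return s


for name in ("Mayer", "Meier", "Maier", "Mayr"):
    print(name, "->", phonemize(name))  # each prints MAYR with DEMO_TABLE
print(phonemize("  Hoffmann "))         # prints HOFMAN
```

Because the rules run in order, an earlier rule can produce text that a later rule then matches: here "EI" becomes "AY" first, and the resulting "YER" is shortened afterwards.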
About the replacement table:
The table is nothing but a list of match=replace pairs. Any occurrence of match in the input string is replaced with replace, in the order of the table. This is where you can influence the results of TPhonem.
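As an illustration of the table format, here is a minimal Python sketch that reads match=replace pairs into an ordered list. It assumes one pair per line, split at the first "=", and it skips lines without "=" or with an empty match; the shipped table file may differ in these details. The names `parse_table` and `load_table` and the default file encoding are hypothetical.

```python
from pathlib import Path


def parse_table(text):
    """Turn lines of the form match=replace into an ordered list of pairs."""
    rules = []
    for line in text.splitlines():
        match, sep, replace = line.partition("=")  # split at the first "="
        if not sep or not match:
            continue  # no "=" on the line, or nothing to match
        rules.append((match, replace))
    return rules


def load_table(path, encoding="utf-8"):
    # Assumption: the encoding of a real table file is not stated in the source.
    return parse_table(Path(path).read_text(encoding=encoding))


print(parse_table("EI=AY\nAI=AY\nH="))
# prints [('EI', 'AY'), ('AI', 'AY'), ('H', '')]
```

A list of pairs is used instead of a dictionary so that the order of the table, which determines the result, is kept explicit.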
Changelog
- 0.3.2 — [ 2010-04-07 ]
- Added pre-compiled phonem.exe to the package (no other changes, thus no new version number)
- 0.3.2 — [ 2007-12-20 ]
- Included cleanup routine after LoadFromFile and LoadFromStream
- Yet another change in the replacement table
- Shipped the default list with the package as external file
- Added Perl implementation
- 0.3.1 — [ 2007-12-13 ]
- Submission by Dieter Dasberg: Flaw fix and new rules
- Minor change in SetInput
- 0.3 — [ 2004-01-18 ]
- Submissions by Peter Tiemann: Bug fix and new rules
- 0.2.3 — [ 2003-10-22 ]
- Introduced some promising but experimental replacement rules
- Some bug fixes (empty match, setlength)
- 0.2.1 — [ 2003-10-21 ]
- Tweaked the replacement list a little bit
- Better removal of duplicates (not only A-Z) (Thomas Bornhaupt)
- 0.2 — [ 2003-10-21 ]
- Performance improvement in PhonemReplace
- 0.1 — [ 2003-10-20 ]
- First implementation as TStringList descendant based on the VBA implementation that Michael and I created for the Niederösterreichischen Seniorenbund
Thanks
I want to thank Thomas Bornhaupt, who provided a much faster routine for eliminating duplicate characters, and Marian Aldenhövel for his valuable comments.
Peter Tiemann has submitted some rules and a bug fix.
Dieter Dasberg has also submitted new rules and an improvement suggestion for D7.
Created: 2003-10-20 — last modified: 2010-04-07 — last update of web site: 2010-06-27