Open Source Toolkit for Extraction of Cognates and False Friends (TECFF)
Today I released to the community, under the MIT license, the source code of the most interesting algorithms developed for my PhD thesis (implemented in C#):
- MMEDR – algorithm for measuring weighted orthographic similarity between Bulgarian and Russian words, taking into account some linguistically motivated Bulgarian-Russian correspondences (currently supports Bulgarian and Russian only)
- SemSim – algorithm for measuring semantic similarity between words by searching in Google and analyzing the returned text snippets (currently supports Bulgarian, Russian and English)
- CrossSim – algorithm for measuring cross-lingual semantic similarity by searching in Google and analyzing the returned text snippets (currently supports Bulgarian and Russian only)
- FFExtract – algorithm for extracting false friends from a parallel corpus by identifying candidates with the MMEDR algorithm and combining statistical and semantic evidence to distinguish between cognates and false friends (currently supports Bulgarian and Russian only)
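To give a rough idea of the two kinds of similarity mentioned above, here is a minimal sketch in Python (not the toolkit's C# code). It assumes that orthographic similarity can be approximated by a weighted edit distance normalized by word length, and that semantic similarity can be approximated by the cosine between word-count vectors built from text snippets. The substitution weights, the function names and the snippet handling are all hypothetical placeholders. The sketch makes no Google queries and does not reproduce the actual MMEDR or SemSim implementations.

```python
from collections import Counter
from math import sqrt

# Hypothetical substitution costs for letter pairs. These are NOT the
# weights used in TECFF; they are placeholders for illustration only.
SUBSTITUTION_COST = {
    ("ъ", "о"): 0.5,
    ("я", "е"): 0.5,
}


def substitution_cost(a, b):
    """Cost of replacing letter a with letter b (symmetric lookup)."""
    if a == b:
        return 0.0
    return SUBSTITUTION_COST.get((a, b), SUBSTITUTION_COST.get((b, a), 1.0))


def weighted_edit_distance(s, t):
    """Levenshtein-style distance with per-pair substitution costs."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1.0,                        # delete a
                curr[j - 1] + 1.0,                    # insert b
                prev[j - 1] + substitution_cost(a, b),  # substitute
            ))
        prev = curr
    return prev[-1]


def orthographic_similarity(s, t):
    """Distance normalized by the longer word, mapped so 1.0 means identical."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - weighted_edit_distance(s, t) / longest


def context_vector(word, snippets):
    """Count the words that co-occur with `word` in the given snippets."""
    counts = Counter()
    for snippet in snippets:
        counts.update(tok for tok in snippet.lower().split() if tok != word)
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

For example, `orthographic_similarity("сън", "сон")` charges only the placeholder cost of 0.5 for the ъ/о pair, and `cosine(context_vector(w1, snippets1), context_vector(w2, snippets2))` compares two words by the contexts in which they appear. In a real setting the snippets would come from a search engine and would need proper tokenization and stop-word filtering.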
The project is titled TECFF (Toolkit for Extraction of Cognates and False Friends) and is available for public download from http://code.google.com/p/cognates-and-false-friends-tools/.