September 30th, 2009

Open Source Toolkit for Extraction of Cognates and False Friends (TECFF)

Today I released to the community, under the MIT license, the source code (implemented in C#) of the most interesting algorithms designed for my PhD thesis.

The project is titled TECFF (Toolkit for Extraction of Cognates and False Friends) and is available for public download from http://code.google.com/p/cognates-and-false-friends-tools/.

Posted by nakov as blog at 12:54 AM EEST


September 22nd, 2009

Java2Days Conference: 8-9 October 2009 in Bulgaria

All Java and Java EE developers are invited to Java2Days, a conference on Java technologies that is unique for the Balkans and Eastern Europe. At the conference, held in Sofia, distinguished speakers will talk about Java, Java EE 6, JBoss, EJB 3.1, Spring Framework, JPA, OSGi, GWT, JSF, jBPM, Wicket, JRockit, cloud computing and other hot technologies.

For more information, visit the official conference web site: http://java2days.com/.

Posted by nakov as news, java, blog at 7:14 PM EEST


DevReach 2009: 12-13 October in Bulgaria

All .NET and Microsoft-oriented developers are invited to DevReach 2009, the premier conference on Microsoft technologies for the Balkans and Eastern Europe. This year the conference attracts distinguished speakers who will deliver talks about Silverlight, WPF, ASP.NET 4.0, ASP.NET MVC, AJAX, IIS 7, Visual Studio 2010, SharePoint, SQL Server 2008, business intelligence, data access and ORM, LINQ, RESTful applications, WCF, WWF, .NET service bus, Scrum and many others.

For more information, visit the official DevReach conference web site: http://www.devreach.com/.

Posted by nakov as .net, news, blog at 7:00 PM EEST


September 17th, 2009

RANLP’2009 Workshop: A Knowledge-Rich Approach to Measuring the Similarity between Bulgarian and Russian Words

Today I presented a scientific publication about measuring modified orthographic similarity between Bulgarian and Russian words at the Workshop “Multilingual Resources, Technologies and Evaluation for Central and Eastern European Languages”, held in conjunction with the scientific conference RANLP’2009. The paper is titled “A Knowledge-Rich Approach to Measuring the Similarity between Bulgarian and Russian Words” and is a small part of my PhD thesis.

Abstract

We propose a novel knowledge-rich approach to measuring the similarity between a pair of words. The algorithm is tailored to Bulgarian and Russian and takes into account the orthographic and the phonetic correspondences between the two Slavic languages: it combines lemmatization, hand-crafted transformation rules, and weighted Levenshtein distance. The experimental results show an 11-pt interpolated average precision of 90.58%, which represents a significant improvement over two classic rivaling approaches.
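To illustrate the weighted Levenshtein component mentioned in the abstract, here is a minimal Python sketch. It is not the code from the paper or from TECFF (which is implemented in C#): the function names, the substitution weights and the length normalization are assumptions made only for this example, and the lemmatization and transformation-rule steps are left out.

```python
def weighted_levenshtein(source, target, substitution_cost=None,
                         insert_cost=1.0, delete_cost=1.0):
    """Edit distance in which each letter substitution may have its own cost.

    substitution_cost maps (source_letter, target_letter) to a cost;
    pairs that are not listed cost 1.0, identical letters cost 0.0.
    """
    substitution_cost = substitution_cost or {}
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + delete_cost
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + insert_cost
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = source[i - 1], target[j - 1]
            sub = 0.0 if a == b else substitution_cost.get((a, b), 1.0)
            dist[i][j] = min(dist[i - 1][j] + delete_cost,      # delete a
                             dist[i][j - 1] + insert_cost,      # insert b
                             dist[i - 1][j - 1] + sub)          # substitute
    return dist[-1][-1]


def similarity(source, target, substitution_cost=None):
    """Turn the distance into a score in [0, 1] (assumed normalization)."""
    longest = max(len(source), len(target))
    if longest == 0:
        return 1.0
    return 1.0 - weighted_levenshtein(source, target, substitution_cost) / longest


# Hypothetical weights: treat two Bulgarian-to-Russian letter
# correspondences as cheaper than an arbitrary substitution.
weights = {("ъ", "е"): 0.5, ("и", "ы"): 0.5}

print(weighted_levenshtein("първи", "первый"))           # 3.0 (plain Levenshtein)
print(weighted_levenshtein("първи", "первый", weights))  # 2.0
```

With the made-up weights the pair is scored as closer than with the plain edit distance, which is the effect that letter-specific costs are meant to achieve.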

Download

Download the article: RANLP2009-Workshop-Nakov-Paskaleva-Nakov-MMEDR-Similarity-Bulgarian-Russian-Words.pdf

Download the presentation: RANLP-2009-Workshop-Nakov-Paskaleva-Nakov-MMEDR-Similarity-Bulgarian-Russian.ppt.

Posted by nakov as blog at 7:42 PM EEST


September 14th, 2009

RANLP 2009: Unsupervised Extraction of False Friends from Parallel Bi-Texts Using the Web as a Corpus

Today I presented at the prestigious scientific conference RANLP’2009 a research paper about new methods of extraction of false friends from parallel corpora, which is a major part of my PhD thesis. The article is named “Unsupervised Extraction of False Friends from Parallel Bi-Texts Using the Web as a Corpus” and was accepted after passing a thorough anonymous review by two distinguished scientists from the area of Natural Language Processing (NLP) and Information Retrieval (IR).

Abstract

False friends are pairs of words in two languages that are perceived as similar, but have different meanings, e.g., Gift in German means poison in English. In this paper, we present several unsupervised algorithms for acquiring such pairs from a sentence-aligned bi-text. First, we try different ways of exploiting simple statistics about monolingual word occurrences and cross-lingual word co-occurrences in the bi-text. Second, using methods from statistical machine translation, we induce word alignments in an unsupervised way, from which we estimate lexical translation probabilities, which we use to measure cross-lingual semantic similarity. Third, we experiment with a semantic similarity measure that uses the Web as a corpus to extract local contexts from text snippets returned by a search engine, and a bilingual glossary of known word translation pairs, used as “bridges”. Finally, all measures are combined and applied to the task of identifying likely false friends. The evaluation for Russian and Bulgarian shows a significant improvement over previously-known algorithms.
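The first idea in the abstract, statistics about word occurrences and cross-lingual co-occurrences in a sentence-aligned bi-text, can be illustrated with a small Python sketch. The Dice coefficient used here is a stand-in chosen for simplicity and is not necessarily one of the formulas from the paper; the function name and the toy German-English bi-text are invented for this example.

```python
def cooccurrence_dice(bitext, word_a, word_b):
    """Dice coefficient of word_a (language A) and word_b (language B).

    bitext is a list of aligned sentence pairs (sentence_a, sentence_b).
    The score is 2 * both / (count_a + count_b), where count_a and count_b
    are the numbers of pairs whose A-side contains word_a and whose B-side
    contains word_b, and both is the number of pairs where the two happen
    together.
    """
    count_a = count_b = count_both = 0
    for sentence_a, sentence_b in bitext:
        has_a = word_a in sentence_a.lower().split()
        has_b = word_b in sentence_b.lower().split()
        count_a += has_a
        count_b += has_b
        count_both += has_a and has_b
    if count_a + count_b == 0:
        return 0.0
    return 2.0 * count_both / (count_a + count_b)


# Toy sentence-aligned bi-text (German, English), invented for illustration.
bitext = [
    ("das gift ist stark", "the poison is strong"),
    ("er gab ihr ein geschenk", "he gave her a gift"),
    ("gift im wasser", "poison in the water"),
    ("ein geschenk fuer dich", "a gift for you"),
]

print(cooccurrence_dice(bitext, "gift", "gift"))      # 0.0
print(cooccurrence_dice(bitext, "gift", "poison"))    # 1.0
print(cooccurrence_dice(bitext, "geschenk", "gift"))  # 1.0
```

In this toy data the identically spelled pair never appears in the same aligned sentence pair, so it gets a score of zero, while the real translations score high. A pair that looks similar but rarely co-occurs in this way is a candidate false friend.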

Download

Download the article: RANLP2009-Nakov-Nakov-Paskaleva-Unsupervised-Extraction-of-False-Friends.pdf.

Download the presentation: Nakov-Unsupervised-Extraction-of-False-Friends.ppt.

Posted by nakov as news at 7:17 PM EEST


September 9th, 2009

About the ASP.NET Persistent Authentication Cookies Timeout

Most people using ASP.NET Forms Authentication use the built-in <asp:Login> control, which works fine. With a custom login form, however, we face the following problem: ASP.NET Forms Authentication uses the same cookie expiration timeout for persistent and non-persistent sessions. It is defined in Web.config in the timeout attribute of the <forms> tag and has a default value of 30 minutes. Thus, by default, if you log in without the “remember me” option, your maximal inactivity period will be 30 minutes. At the same time, if you log in with the “remember me” option, your cookie’s lifetime will also be 30 minutes, which is obviously incorrect. If you put a very large timeout in Web.config, e.g. 50 years, persistent login will work well, but non-persistent login will no longer be limited to 30 minutes or so.

The problem described above is a well-known and documented design flaw in the Microsoft ASP.NET Forms Authentication framework. The persistent and non-persistent timeouts obviously should be separately definable, but Microsoft failed to do this even after numerous discussions in community groups, forums, blogs, etc.

Note that if you use the <asp:Login> control and check “remember me”, the control itself will set the cookie timeout to 50 years, but if you use a custom (self-made) login form or a different Web application framework (not ASP.NET Web Forms), you will need to work around this well-documented bug. Typically I use the following code to work around this problem:

Posted by nakov as blog at 12:32 PM EEST
