Author: Svetlin Nakov
April 24, 2010
On 24 April 2010 my colleague Mihail Stoynov and I were invited to speak at a half-day seminar on Java 7. In three consecutive sessions we presented in depth what is coming with the new release of the Java platform, Java 7, which is expected at the end of 2010.
The event was part of the spring conference of the Bulgarian Oracle User Group (BGOUG) and of its new initiative to grow the Java community. BGOUG is an established, mature community organization that has been organizing conferences, seminars and Oracle community events for more than 10 years. It is strongly supported by Oracle Corp. but remains an independent entity. About 200 people attended the conference, which was held from 23 to 25 April 2010 in the city of Plovdiv.
Our Java 7 talk was attended by about 60 software engineering professionals. It was an exciting seminar that held the interest of all attendees (nobody left, even after the third hour). The new language changes and technologies were interesting to the audience, and some of the topics sparked discussion.
Java 7 – Technical Content
To be honest, my colleague Mihail prepared most of the presentation materials and demonstrations. I talked about dynamic languages on the JVM, the new invokedynamic bytecode instruction, method handles and dynamic invocation. An attractive demo showed how to dynamically invoke a method that is not known at compile time. I also prepared a talk about closures in Java, the lambda calculus, lambda expressions, lambda functions, first-class functions, functional programming, extension methods and parallel processing, all of which are expected to come with Java 7. It is really nice to see how dramatically everyday collection processing will change. For example, with a single line of code we will be able to extract the names of the persons in a list that match some predicate (an inline Boolean condition, e.g. firstName == "Peter") and arrange them in alphabetical order.
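The two ideas above can be sketched roughly as follows. This is not the seminar demo: it is a minimal illustration that assumes the method handle API and the lambda and stream syntax in the form in which they eventually shipped, which differs in detail from the early proposals. The Person class, the sample data and the method names are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class DynamicAndLambdaSketch {

    // Hypothetical Person type, used only for this illustration.
    static final class Person {
        final String firstName;
        final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Looks up a public, no-argument String method by a name that is only
    // known at run time and invokes it through a method handle.
    static String invokeByName(String target, String methodName) {
        try {
            MethodHandle handle = MethodHandles.lookup().findVirtual(
                    String.class, methodName, MethodType.methodType(String.class));
            return (String) handle.invoke(target);
        } catch (Throwable t) {
            throw new IllegalStateException("Cannot invoke " + methodName, t);
        }
    }

    // Filters by a predicate, extracts the last names and sorts them
    // alphabetically in a single pipeline expression.
    static List<String> sortedLastNames(List<Person> people, String firstName) {
        return people.stream()
                .filter(p -> p.firstName.equals(firstName))
                .map(p -> p.lastName)
                .sorted()
                .collect(Collectors.toList());
    }

    // Made-up sample data.
    static List<Person> samplePeople() {
        return Arrays.asList(
                new Person("Peter", "Petrov"),
                new Person("Maria", "Ivanova"),
                new Person("Peter", "Angelov"));
    }

    public static void main(String[] args) {
        System.out.println(invokeByName("java 7", "toUpperCase"));
        System.out.println(sortedLastNames(samplePeople(), "Peter"));
    }
}
```

Note that the predicate uses equals() rather than ==, since == on strings compares references and not contents.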
The rest of the talk focused on Java modularity and Project Jigsaw, the small language enhancements from Project Coin, JSR 203 (NIO.2), compressed 64-bit oops, the Garbage-First GC, upgraded class loaders, URLClassLoader.close(), Unicode 5.1, and the SCTP and SDP protocols.
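To give a feel for the Project Coin enhancements and the NIO.2 file API, here is a small sketch. It assumes the features in the form in which they eventually shipped (strings in switch, the diamond operator, try-with-resources, multi-catch, binary literals with underscores, and the java.nio.file.Files helper). Which of them were shown at the seminar is not stated here, and the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CoinAndNio2Sketch {

    // Binary literal with an underscore separator (Project Coin).
    static final int FLAGS = 0b1010_1010;

    // Strings in switch (Project Coin): maps a topic to its project name.
    static String projectFor(String topic) {
        switch (topic) {
            case "modularity":
                return "Jigsaw";
            case "small language enhancements":
                return "Coin";
            case "new file system API":
                return "NIO.2 (JSR 203)";
            default:
                return "unknown";
        }
    }

    // Writes the lines to a temporary file and reads them back using NIO.2.
    static List<String> roundTrip(List<String> lines) {
        try {
            Path file = Files.createTempFile("coin-demo", ".txt");
            Files.write(file, lines, StandardCharsets.UTF_8);

            List<String> result = new ArrayList<>(); // diamond operator
            // try-with-resources closes the reader automatically
            try (BufferedReader reader =
                         Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    result.add(line);
                }
            }
            Files.delete(file);
            return result;
        } catch (IOException | SecurityException e) { // multi-catch
            throw new IllegalStateException("File round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(projectFor("modularity"));
        System.out.println(roundTrip(Arrays.asList("first line", "second line")));
        System.out.println(FLAGS);
    }
}
```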
The demonstrations were based on an early access preview of JDK 7 (Java SE 1.7.0-ea-b89) and NetBeans 6.9 beta, which supports almost all of the features implemented in b89. All features implemented in b89 were demonstrated. The others were covered only on the slides, but the presenters made the concepts clear.
Download the Java 7 presentation: Java-7-New-Features-Stoynov-Nakov-BGOUG-Plovdiv-24-April-2010.pptx
Download the demonstration examples: Java-7-New-Features-BGOUG-Stoynov-Nakov.(NetBeans.6.9Beta.demos).zip
Tags: colleague, community, lambda, lambda calculus, lambda expressions, NetBeans, oracle community, oracle user group, programming extension, project
Author: Svetlin Nakov
April 12, 2010
Today Svetlin Nakov successfully defended his PhD thesis, titled “Automatic Extraction of False Friends from Parallel Bilingual Corpus”, and was awarded the scientific and educational degree “Doctor of Philosophy” (PhD) in Informatics, in the area of computational linguistics.
The thesis was defended, in accordance with Bulgarian law, before the Specialized Scientific Council in Informatics and Mathematical Modeling of the Higher Attestation Commission of the Bulgarian Academy of Sciences (BAS). Unlike in Western Europe and the USA, in Bulgaria a PhD degree is awarded by a national specialized scientific council consisting of about 20 distinguished scientists.
I started work on my PhD thesis in 2007, after I changed my research area to computational linguistics. Initially it was hard to find a research topic that had not already been well studied and that still had open questions I could approach. With the help of my research advisor, Prof. Paskaleva, and of external consultants, we found an interesting research topic: false friends. It was relatively easy to research and develop new algorithms for extracting false friends, especially for Bulgarian and Russian, because cognates and false friends in this particular pair of languages had never been studied by computational linguists. Additionally, the idea of using the Web as a corpus was just starting to become a popular approach in natural language processing and information retrieval.
False Friends – Definition
False friends are words in different languages that have similar spelling and are perceived as similar but have different meanings. For example, the Bulgarian word “стар”, which means “old” and is pronounced [star], and the English word “star” are false friends. They have exactly the same pronunciation but entirely different meanings.
Automatic Extraction of Cognates and False Friends from Parallel Bilingual Corpus – Abstract
The PhD thesis “Automatic Extraction of Cognates and False Friends from Parallel Bilingual Corpus” studies cognates and false friends between Bulgarian and Russian and proposes algorithms for their extraction. New methods for measuring orthographic and semantic similarity (monolingual and cross-lingual) are proposed, and their applications to various computational linguistics tasks are demonstrated, particularly synonym extraction, distinguishing between cognates and false friends, and improving word alignment. A two-step method for automatic extraction of false friends from bi-texts is proposed. In the first step, pairs of words with similar orthography are collected from the text. In the second step, these pairs are categorized as cognates or false friends by measuring the cross-lingual semantic similarity between them, using the Web as a corpus and applying statistical techniques that account for their occurrences and co-occurrences in the corresponding sentences of the bi-text.
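The two-step idea can be illustrated with a heavily simplified sketch. It is not the method from the thesis: plain normalized Levenshtein distance stands in for the orthographic similarity measure, a simple sentence-level overlap ratio stands in for the statistical co-occurrence techniques, and the Web-based semantic similarity is left out entirely. The thresholds, the toy bi-text and all names are assumptions made only for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FalseFriendsSketch {

    // Classic Levenshtein distance: insert, delete and substitute cost 1 each.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int subst = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                curr[j] = Math.min(subst, Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Step 1 (stand-in measure): orthographic similarity in [0, 1].
    static double orthographicSimilarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) editDistance(a, b) / longest;
    }

    static Set<String> words(String sentence) {
        return new HashSet<>(Arrays.asList(sentence.toLowerCase().split("\\s+")));
    }

    // Step 2 (stand-in measure): among aligned sentence pairs where at least
    // one of the two words occurs, the share of pairs where both occur.
    static double coOccurrenceScore(String bg, String ru, List<String[]> biText) {
        int either = 0;
        int both = 0;
        for (String[] pair : biText) {
            boolean inBg = words(pair[0]).contains(bg);
            boolean inRu = words(pair[1]).contains(ru);
            if (inBg || inRu) either++;
            if (inBg && inRu) both++;
        }
        return either == 0 ? 0.0 : (double) both / either;
    }

    // Arbitrary illustrative thresholds: 0.7 for spelling, 0.5 for co-occurrence.
    static String classify(String bg, String ru, List<String[]> biText) {
        if (orthographicSimilarity(bg, ru) < 0.7) {
            return "not a candidate";
        }
        return coOccurrenceScore(bg, ru, biText) >= 0.5 ? "cognates" : "false friends";
    }

    // Toy bi-text: {Bulgarian sentence, Russian translation}.
    static List<String[]> toyBiText() {
        return Arrays.asList(
                new String[] {"тази гора е стара", "этот лес старый"},
                new String[] {"висока планина", "высокая гора"},
                new String[] {"студена вода", "холодная вода"},
                new String[] {"вода и хляб", "вода и хлеб"});
    }

    public static void main(String[] args) {
        List<String[]> biText = toyBiText();
        System.out.println(classify("гора", "гора", biText)); // forest vs. mountain
        System.out.println(classify("вода", "вода", biText)); // water in both
        System.out.println(orthographicSimilarity("хляб", "хлеб"));
    }
}
```

In the toy data the Bulgarian “гора” (forest) never appears in a sentence whose Russian translation contains “гора” (mountain), so the pair is labeled as false friends, while “вода” (water) always co-occurs with its Russian counterpart and is labeled as a cognate pair.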
Scientific Research and Publications
During my work as a PhD student I managed to publish 7 scientific papers related to my PhD thesis (as author or co-author):
- Nakov P., Nakov S., Paskaleva E. “Improved Word Alignments Using the Web as a Corpus”, Proceedings of International Conference “Recent Advances in Natural Language Processing” (RANLP 2007), pages 400-405, Borovets, Bulgaria, 2007
- Nakov S., Nakov P., Paskaleva E. “Cognate or False Friend? Ask the Web!”, Proceedings of the 1st International Workshop on Acquisition and Management of Multilingual Lexicons, held in conjunction with RANLP 2007, pages 55–62, Borovets, Bulgaria, 2007
- Nakov S. “Automatic Acquisition of Synonyms Using the Web as a Corpus”. Proceedings of the 3rd Annual South-East European Doctoral Student Conference (DSC 2008), Volume 2, pages 216-229, Thessaloniki, Greece, 2008
- Nakov S. “Measuring Cross-Lingual Semantic Similarity by Searching in Google”. Proceedings of the 5th International Conference “The Language: A Phenomenon without Frontiers”, ISBN 978-954-9685-43-5, pages 238-242, Varna, Bulgaria, 2008
- Nakov S. “Automatic Identification of False Friends in Parallel Corpora: Statistical and Semantic Approach”, Serdica Journal of Computing, issue 3, pages 133-158, 2009
- Nakov S., Nakov P., Paskaleva E. “Unsupervised Extraction of False Friends from Parallel Bi-Texts Using the Web as a Corpus”, Proceedings of International Conference “Recent Advances in Natural Language Processing” (RANLP 2009), pages 292-298, Borovets, Bulgaria, 2009
- Nakov S., Paskaleva E., Nakov P. “A Knowledge-Rich Approach to Measuring the Similarity between Bulgarian and Russian Words”, Workshop on Multilingual Resources, Technologies and Evaluation for Central and Eastern European Languages, held in conjunction with RANLP 2009, Borovets, Bulgaria, 2009
Although most of my publications appeared in Bulgaria, they were published at prestigious conferences such as RANLP, which is ranked among the top 5 conferences in computational linguistics in the world. It is notable that most of the distinguished authors cited in my papers attend the RANLP conference.
In the beginning, writing high-quality scientific papers that stand a good chance of being accepted at distinguished conferences in computational linguistics was new to me, but I got solid help from my co-authors and my scientific advisor. Initially I believed that it is harder to invent a new concept, method, framework, theorem, formula or algorithm and to obtain valuable scientific results than to publish them as a paper. I found that this assumption is not exactly true: sometimes it takes more time and effort to publish the scientific results than to obtain them.
What I learned from my PhD is how to conduct scientific research: how to perform scientific experiments, how to evaluate the obtained results, how to draw well-founded conclusions and how to publish the results in a way that will make the reviewers happy. I learned how to write scientific papers with ease: how to structure their content, how to present the proposed ideas as a well-motivated extension of the most recently published scientific achievements (related work), how to present the experiments, how to describe the obtained results in a short but clear manner and how to cite related publications. It was a nice experience, and now when I read an article I can tell whether it is low-quality marketing text or well-motivated scientific work.
Graduating PhD Means Really Hard Work!
I started my PhD simply as a natural continuation of my higher education. I completed my bachelor’s and master’s degrees with excellent results and with ease, and I thought a PhD would also be easy, but it was different, entirely different.
When I started my PhD work I was the author of 4 books and had solid experience as a software engineer and trainer, so I believed I was a good writer, developer and engineer and would cope with the PhD challenge with ease. But conducting scientific research is different. It is not just an application of existing knowledge to solve a specific problem or to successfully deliver a software project. It is about inventing new concepts, methods and algorithms not previously known to anybody. It is about researching open problems, about inventing and experimenting with new methods for approaching them, and about finding new algorithms and formulating new concepts that cannot be found in any book or publication.
PhD Thesis == 5-10 Times * Master’s Thesis
I needed about a month of active work to write my Master’s Thesis. Most people invest a similar amount of time and effort in theirs. To prepare and defend a PhD Thesis I needed 5-10 times more effort, time and work. Publishing a valuable research paper can take a few weeks for inventing new ideas, trying them, conducting experiments and obtaining meaningful results, and a few more weeks to write the paper itself in a way that makes the reviewers happy. Publishing 7 papers means 5-6 months of active work, which I spread over 3 years, mostly on weekends. Writing the PhD Thesis itself takes an additional month of full-time work. Thus, earning a PhD degree takes 5-10 times more effort than writing a typical Master’s Thesis.
This was my experience. I am sure that some people graduate successfully with less effort, but I could not allow myself to do low-quality work. I am simply a person who works hard and to a high standard.
If I had known how much effort this PhD degree would require, I would probably not have started it.
If you are interested in my research area, please feel free to download the presentation of my PhD thesis: Nakov-PhD-Thesis-False-Friends-Presentation.ppt (PowerPoint presentation, in Bulgarian).
Also download the extended resume of my work (abstract): Nakov-PhD-Avtoreferat-False-Friends.pdf (PDF file, in Bulgarian).
Tags: Automatic, bulgarian academy of sciences, computational, computational linguistics, computational linguists, Extraction, Nakov, natural language processing, phd thesis, RANLP