[Evangelism] Help to improve text search for East-Asia languages

Takeshi Yamamoto tyam at mac.com
Thu Nov 6 04:33:18 UTC 2008


I am also posting this outside the Plone-AsiaPacific ML, for people
who use Plone with non-English/non-Latin languages.
Some languages need to be handled differently for better text searching.

Help to improve text search for East-Asia languages

Let me post the first initiative (in other words, a request for help)
for the Asia Pacific area.

Some of you may know that the Japanese Plone community is working on
improving the text search feature of Plone for East Asian languages.
For example, Japanese words are not separated by spaces, and the same
is true of Chinese and Korean.

Mr. Terada, CEO of CMSCOM, stepped up and worked on this in Google
Summer of Code this year as one of the Plone Foundation-supported
projects.  Unfortunately, the student gave up and the work was not
completed.  Terada-san decided to finish it and restarted it as his
company's project.  Since the feature is valuable for many people
(1.5 billion people live in the Kanji region), since it is open source,
and since we hope it can be built into out-of-the-box Plone, the
Japanese community is supporting this project.  We will hold a sprint
for it at World Plone Day 2008 Tokyo.
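
To illustrate why splitting on spaces fails and what a bigram approach
does instead, here is a minimal Python sketch.  It is not the
bigramsplitter code itself: the function names bigrams() and matches()
are made up for this example, and the real product may normalize and
index text quite differently.

```python
def bigrams(text):
    """Split text into overlapping two-character tokens.

    Illustrative only; not the actual bigramsplitter implementation.
    """
    tokens = []
    # Split on whitespace first so no bigram spans a space.
    for chunk in text.split():
        if len(chunk) < 2:
            # A lone character is kept as its own token.
            tokens.append(chunk)
        else:
            tokens.extend(chunk[i:i + 2] for i in range(len(chunk) - 1))
    return tokens


def matches(query, document):
    """Return True if every bigram of the query occurs in the document."""
    doc_tokens = set(bigrams(document))
    return all(token in doc_tokens for token in bigrams(query))


print("東京都庁".split())            # ['東京都庁'], one unsearchable blob
print(bigrams("東京都庁"))           # ['東京', '京都', '都庁']
print(matches("都庁", "東京都庁"))   # True
print(matches("大阪", "東京都庁"))   # False
```

Because the tokens overlap, a query can be found wherever it starts
inside a run of characters, without a dictionary for each language.
The cost is some false hits: in the example above, a search for 京都
(Kyoto) would also match text about 東京都 (Tokyo).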

The software is currently in BETA, and you can download and try it, or
just access the test site and play with it.  We appreciate any bug
reports or suggestions.  We do not have enough testers for
non-Japanese languages.

The languages we would like to cover with bigramsplitter are:

Japanese
Mandarin Chinese (Beijing)
Cantonese (Canton)
Taiwanese (Taiwan)
Korean (Korea)
Mongolian (Mongolia)
Thai (Thailand)
Vietnamese (Viet Nam)
Jawi (Malaysia)
Bahasa Indonesia (Indonesia)
Hebrew (Israel)
Arabic (Middle-East)
etc.

Languages that are not used in Asia but differ from English/Latin
languages are welcome too, of course.

The project site is here:
http://code.google.com/p/bigramsplitter/

You can download the code from here:
http://code.google.com/p/bigramsplitter/downloads/list

A test site is available here to play with:
http://c2search.cmscom.jp/

You may need an account to add text in your own language to be
searched.
You can request a login account here:
http://c2search.cmscom.jp/contact-info

Sorry that the test site is not well internationalized, but it is no
problem to write your request in English.

Thanks a lot in advance.
Takeshi Yamamoto / retsu



