This project was developed as part of a thesis.
- Python version >= 3.7
- MySQL
- PyCharm CE IDE
- MySQL connector
pip search mysql-connector | grep --color mysql-connector-python
- In case you need to update scikit-learn to the latest version
pip install -U scikit-learn
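
The prerequisites above can be verified before starting the server. The following is only a minimal sketch, not part of the project: the names `python_ok` and `missing_packages` are hypothetical, and it assumes the connector is importable as `mysql.connector` and scikit-learn as `sklearn`.

```python
import importlib
import sys

# Minimum interpreter version listed in this README.
REQUIRED_PYTHON = (3, 7)

# Import name -> pip package name (assumed mapping for the two libraries above).
REQUIRED_MODULES = {
    "mysql.connector": "mysql-connector-python",
    "sklearn": "scikit-learn",
}


def python_ok(version_info=sys.version_info, minimum=REQUIRED_PYTHON):
    """Return True if the interpreter is at least the required version."""
    return tuple(version_info[:2]) >= minimum


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip names of packages whose modules cannot be imported."""
    missing = []
    for module_name, pip_name in modules.items():
        try:
            importlib.import_module(module_name)
        except ImportError:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Missing packages:", missing_packages())
```

Any package reported as missing can then be installed with `pip install <name>`.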
- Running Python as a server
cd ROOT_DIRECTORY
python3 server.py
- Running the web application. Open the file demo/index.html with the Chrome browser
- Insert the link from the [Phnom Penh Post Khmer Website](https://www.postkhmer.com/)
Note: Copy the shortlink of the post from the category you want to insert
- Open the standalone application standalone-app/BI Narin.zip in order to perform Khmer word segmentation
Note: You can manually insert white space, hidden spaces, or empty spaces into the document, but this is not recommended.
- Unzip the file
- Set up the support library from standalone-app/BI Narin/Setup Packages/KWSAddinSetup(x64).msi
Note: Install the library that matches your current operating system.
- Run the standalone-app/BI Narin/Setup Packages/KWSAppSetup(x64).msi file
Note: Run the installer file that matches your current operating system.
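
Once a document has been segmented, the separators can be turned back into a word list in Python. This is a minimal sketch under an assumption: it treats the "hidden space" mentioned in the note above as the zero-width space (U+200B), which is commonly used to separate Khmer words. The standalone application's actual output format is not documented here, and `split_segmented` is a hypothetical helper.

```python
# Assumption: segmented words are separated by U+200B or by ordinary spaces.
ZERO_WIDTH_SPACE = "\u200b"


def split_segmented(text):
    """Split segmented text into words on zero-width and regular whitespace."""
    # str.split() does not treat U+200B as whitespace, so normalize it first.
    normalized = text.replace(ZERO_WIDTH_SPACE, " ")
    return normalized.split()
```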
- Running the crawler
python3 crawler.py
Note: If an error occurs, please re-run the crawler. It fetches more than 1000 posts, so some requests may be blocked by the website.
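
Instead of re-running the whole crawl by hand, individual requests could be retried with a growing delay. The sketch below is illustrative only and does not describe how crawler.py is actually written: `fetch_with_retry` and its parameters are hypothetical, and `fetch` stands for whatever function downloads a single post.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff when it raises."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as error:  # a real crawler should catch narrower errors
            last_error = error
            if attempt < attempts - 1:
                # Wait 1x, 2x, 4x ... base_delay before the next attempt.
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Waiting longer between attempts gives the website time to lift a temporary block before the next request is sent.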
- Go to the web application to train on the inserted data
- Edit file
clean_word_list.txt
- Add all the characters