Fast Word Segmentation of Noisy Text

5 years ago

https://towardsdatascience.com/fast-word-segmentation-for-noisy-text-2c2c41f9e8da

Faster word segmentation by using a triangular matrix instead of dynamic programming. The integrated spelling correction makes it possible to segment noisy input text. C# source code is available on GitHub.

To people in the West it may seem obvious that words are separated by spaces, but Chinese, Japanese, Korean (the CJK languages), Thai, and Javanese are written without spaces between words.
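One way to realize the triangular-matrix idea from the summary is sketched below. This is a minimal Python illustration under stated assumptions, not the article's C# code. It leaves out the integrated spelling correction, uses a hypothetical toy unigram dictionary (`WORD_COUNTS`), and scores unknown words with an assumed length-dependent penalty. For each start position `j` it tries every candidate word of up to `max_word_len` characters and keeps, for each prefix, only the segmentation with the highest summed log10 probability. Because at most `max_word_len` prefixes are in flight at once, they fit in a circular buffer of that size instead of a full table.

```python
from __future__ import annotations

import math

# Hypothetical toy unigram counts; a real segmenter would load a large
# word-frequency dictionary instead.
WORD_COUNTS = {
    "it": 500, "was": 400, "a": 900, "bright": 50, "cold": 60,
    "day": 120, "in": 700, "april": 30,
}
TOTAL = sum(WORD_COUNTS.values())


def log_prob(word: str) -> float:
    """Log10 probability of a word under the toy unigram model.

    Assumption: unknown words get a penalty that grows with their length,
    so long unknown chunks lose to splits into known words.
    """
    count = WORD_COUNTS.get(word)
    if count is not None:
        return math.log10(count / TOTAL)
    return math.log10(10.0 / (TOTAL * 10.0 ** len(word)))


def segment(text: str, max_word_len: int = 10) -> tuple[str, float]:
    """Insert spaces into `text`, returning (segmented text, log10 probability).

    The best segmentation of prefix text[:p] lives in slot (p - 1) % size,
    so memory is O(max_word_len) rather than O(len(text)).
    """
    if not text:
        return "", 0.0
    size = min(max_word_len, len(text))
    best: list[tuple[str, float] | None] = [None] * size
    circular = -1  # slot holding the best segmentation of text[:j]
    for j in range(len(text)):
        imax = min(len(text) - j, max_word_len)
        for i in range(1, imax + 1):
            part = text[j:j + i]
            lp = log_prob(part)
            dest = (i + circular) % size  # slot for prefix text[:j + i]
            if j == 0:
                best[dest] = (part, lp)
            else:
                prev_words, prev_lp = best[circular]
                cand = (prev_words + " " + part, prev_lp + lp)
                # When i == size, dest is the source slot itself and still holds
                # text[:j], which is stale for text[:j + i], so always overwrite.
                if i == size or best[dest] is None or cand[1] > best[dest][1]:
                    best[dest] = cand
        circular = (circular + 1) % size
    return best[circular]


if __name__ == "__main__":
    print(segment("itwasabrightcolddayinapril")[0])
```

With the toy dictionary, `segment("itwasabrightcolddayinapril")` returns the text `"it was a bright cold day in april"`. Each input position is visited once and tries at most `max_word_len` candidate words, so the work grows linearly with the text length for a fixed maximum word length.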