larryzhao - 5 months ago
Ruby Question

How to count words in a multi-language text using Ruby & JavaScript

What I want to achieve is to get the word count in a multi-language text.

For example, if I have a text that contains both English and Chinese:

The last Olympics was held in 北京

the count should be 8, because there are six English words and two Chinese characters, the same as the word count in Microsoft Word.

What's the best way to do that in Ruby and in JavaScript?


You could try this in JavaScript. It splits the text into English words and "symbols": any character that is not an English word character or one of the listed punctuation marks is treated as a symbol, and each symbol counts as one word. I might have forgotten some characters, and it may not work with other languages that have extra special characters, but give it a try. I'm using jQuery's $.trim function for brevity, but you could also use one of the approaches from "How do I trim a string in javascript?".


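// Requires jQuery for $.trim.
var str = 'The last Olympics 隶草 was held in 北京';
var words = '', symbols = '';
// Group 1 collects English word characters and whitespace; group 2 collects
// everything that is neither a word character nor listed punctuation.
str.replace(/([\w\s]*)([^\w;,.'"{}\[\]+_)(*&\^%$#@!~\/?]*)/g, function(a, b, c) {
    words += b;
    symbols += c;
});
words = $.trim(words).split(/\s+/);              // one entry per English word
symbols = symbols.replace(/\s+/g, '').split(''); // strip all whitespace, one entry per symbol

var total_words = words.length + symbols.length; // 6 + 4 = 10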

You may also want to try XRegExp. It's a JavaScript library that extends the built-in regular expressions, including support for matching Unicode categories and scripts.
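If the target engine supports ES2018 Unicode property escapes, a similar count can be sketched without a library and without listing punctuation by hand. This is only a minimal sketch under assumptions: countWords is a hypothetical helper name, only Han characters are counted one by one, and other scripts written without spaces (for example Japanese kana) are not handled specially, so the result may differ from Microsoft Word for such text.

```javascript
// Sketch only: assumes an ES2018+ engine (the `u` flag with \p{...} escapes).
// countWords is a hypothetical helper name, not part of any library.
function countWords(text) {
  // Each Han (Chinese) character counts as one word.
  var hanChars = text.match(/\p{Script=Han}/gu) || [];
  // Turn Han characters into separators so they do not glue neighbours together.
  var rest = text.replace(/\p{Script=Han}/gu, ' ');
  // Every remaining run of letters/digits counts as one word;
  // an inner apostrophe (as in "there's") stays inside the word.
  var otherWords = rest.match(/[\p{L}\p{N}]+(?:'[\p{L}\p{N}]+)*/gu) || [];
  return hanChars.length + otherWords.length;
}

console.log(countWords('The last Olympics was held in 北京')); // 8
```

Unlike the regex above, punctuation is simply skipped here because it is neither a letter, a digit, nor a Han character, so no explicit exclusion list is needed.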