deemeetree - 7 months ago
Javascript Question

Convert accented text into ASCII characters?

I would like to convert accented letters and text in other scripts into plain English ASCII in JavaScript, and I wonder what the possible options are. What I need is this:

éclair ~becomes~ eclair

bär ~becomes~ bar

привет ~becomes~ privet

こんにちは ~becomes~ konnichiva


As you can see, the idea is that text in any language gets converted into its plain English ASCII equivalent. The áčçéñtèd letters are converted into their plain equivalents, and letters in Cyrillic or Japanese script are converted into their transliterated equivalents.

Does anyone know an approach to do that in JavaScript?

Han
Answer

There are a number of Node modules that do similar things but are much lighter-weight than node-iconv and, in particular, are written entirely in JS, so they don't require you to compile any C or C++:

  1. node-unidecode appears to do mostly what you asked for:

    $ npm install unidecode
    ...
    unidecode@0.1.3 node_modules/unidecode
    $ node
    > var unidecode = require('unidecode');
    undefined
    > unidecode('éclair')
    'eclair'
    > unidecode('bär')
    'bar'
    > unidecode('привет')
    'priviet'
    > unidecode('こんにちは')
    'konnitiha'
    
  2. node-transliterator is even lighter-weight, but its behavior is even further from what you asked for (it returns an empty string for the Cyrillic and Japanese input):

    $ npm install transliterator
    ...
    transliterator@0.1.0 node_modules/transliterator
    $ node
    > var transliterator = require('transliterator');
    undefined
    > transliterator('éclair')
    'eclair'
    > transliterator('bär')
    'baer'
    > transliterator('привет')
    ''
    > transliterator('こんにちは')
    ''
    
  3. node-urlify is closer for Cyrillic, but much further from what you asked for with Japanese:

    $ npm install urlify
    ...
    urlify@0.3.5 node_modules/urlify
    $ node
    > var urlify = require('urlify').create({ spaces: ' ' });
    undefined
    > urlify('éclair')
    'eclair'
    > urlify('bär')
    'bar'
    > urlify('привет')
    'privet'
    > urlify('こんにちは')
    '_____'
    
  4. Finally, limax is more heavyweight. When I ran npm install limax it printed lots of C compiler warnings, but it still just worked, and its output is the closest to what you asked for:

    $ npm install limax
    ...
    limax@0.0.2 node_modules/limax
    ├── speakingurl@0.9.1
    ├── pinyin2@2.0.8
    ├── hepburn@0.5.2 (bulk-replace@0.0.1)
    └── cld@0.0.6
    $ node
    > var slug = require('limax')
    undefined
    > slug('éclair')
    'eclair'
    > slug('bär')
    'baer'
    > slug('привет')
    'privet'
    > slug('こんにちは')
    'konnichiha'
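
If pulling in a module is not an option, part of the job can be sketched with built-ins only. The snippet below is a minimal sketch, not what any of the modules above do internally: it maps Cyrillic letters through a small hand-written table and then strips diacritics with the standard String.prototype.normalize. The names CYRILLIC_TO_LATIN, stripDiacritics and toAsciiSketch are made up for illustration, and the table is an incomplete assumption about how you want Russian romanized. Japanese is left untouched, since kana and kanji need dedicated tables, which is where the modules earn their keep.

```javascript
// Hypothetical, incomplete Russian-to-Latin table (an assumption for this
// sketch, not a standard romanization scheme).
var CYRILLIC_TO_LATIN = {
  'а': 'a', 'б': 'b', 'в': 'v', 'г': 'g', 'д': 'd', 'е': 'e', 'ё': 'e',
  'ж': 'zh', 'з': 'z', 'и': 'i', 'й': 'i', 'к': 'k', 'л': 'l', 'м': 'm',
  'н': 'n', 'о': 'o', 'п': 'p', 'р': 'r', 'с': 's', 'т': 't', 'у': 'u',
  'ф': 'f', 'х': 'kh', 'ц': 'ts', 'ч': 'ch', 'ш': 'sh', 'щ': 'shch',
  'ъ': '', 'ы': 'y', 'ь': '', 'э': 'e', 'ю': 'iu', 'я': 'ia'
};

// NFD splits a letter such as 'é' into 'e' plus a combining mark (U+0301);
// the regex then removes every mark in the U+0300..U+036F block.
function stripDiacritics(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Map Cyrillic letters first, then strip diacritics from whatever is left.
// Characters with no table entry (for example Japanese) pass through as-is.
function toAsciiSketch(text) {
  var mappedText = text.replace(/[\u0400-\u04FF]/g, function (ch) {
    var lower = ch.toLowerCase();
    var mapped = CYRILLIC_TO_LATIN[lower];
    if (mapped === undefined) {
      return ch;
    }
    // Preserve a leading capital: 'Ж' becomes 'Zh'.
    return ch === lower
      ? mapped
      : mapped.charAt(0).toUpperCase() + mapped.slice(1);
  });
  return stripDiacritics(mappedText);
}

console.log(toAsciiSketch('éclair'));
console.log(toAsciiSketch('bär'));
console.log(toAsciiSketch('привет'));
```

Note that String.prototype.normalize is an ES2015 feature, so very old Node versions or browsers would need a polyfill. Unlike transliterator and limax above, this sketch turns 'ä' into 'a' rather than 'ae'.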