TimTheEnchanter - 4 months ago
Python Question

How to identify colors in a string with nltk in python?

The title says it all: I want to identify color names in a string using nltk, but all I can find is how to classify parts of speech. I know I could make a list of every color I want to support, but since I want to support all the color names available in CSS, that list would be quite long (some of them get strange, like teal and aquamarine). If there is a simpler way to do this than writing them all out, I'd greatly appreciate it. Thanks!


I forgot to mention in my original question that I need the color names spaced out as they would be in natural language (e.g. "deep sky blue" rather than "deepskyblue"), because I'm using them for speech recognition. Since that requirement wasn't in the question, I have accepted Tadhg McDonald-Jensen's answer, which answers my original question quite well. I have also posted my own answer, which supplies color names with spaces. Hope this helps!
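One way to get spaced-out names without typing each one by hand is to split every run-together CSS name into words. This is not the answer mentioned above (it isn't reproduced here); it is a hypothetical dynamic-programming word segmenter that assumes a vocabulary of component words. VOCAB below is a small hand-picked sample, not a complete list. When several splits are possible, the split with the fewest words wins, so "goldenrod" stays whole.

```python
from functools import lru_cache

# Hypothetical sample vocabulary of words that occur inside CSS color names.
# A real one would need every component word of every name you support.
VOCAB = {
    "deep", "sky", "blue", "light", "golden", "rod", "goldenrod",
    "yellow", "medium", "sea", "green", "aqua", "marine", "aquamarine",
}


def segment(name, vocab=VOCAB):
    """Split a run-together color name into words, or return None.

    Among all splits into vocabulary words, the one with the fewest words
    wins, so "goldenrod" stays whole when it is in the vocabulary.
    """

    @lru_cache(maxsize=None)
    def best(start):
        # Fewest-word split of name[start:] as a tuple, or None if impossible.
        if start == len(name):
            return ()
        result = None
        for end in range(start + 1, len(name) + 1):
            word = name[start:end]
            if word not in vocab:
                continue
            rest = best(end)
            if rest is not None and (result is None or len(rest) + 1 < len(result)):
                result = (word,) + rest
        return result

    words = best(0)
    return " ".join(words) if words is not None else None


if __name__ == "__main__":
    for css_name in ("deepskyblue", "lightgoldenrodyellow", "mediumseagreen"):
        print(css_name, "->", segment(css_name))
```

For example, segment("deepskyblue") returns "deep sky blue", and segment("lightgoldenrodyellow") returns "light goldenrod yellow". A name containing a word missing from the vocabulary (such as "teal" here) yields None, so the vocabulary must cover every word that appears in the names you care about.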


You can use the webcolors package to get all the CSS color names it recognizes; just check for membership in webcolors.CSS3_NAMES_TO_HEX:

>>> import webcolors
>>> "green" in webcolors.CSS3_NAMES_TO_HEX
True
>>> "deepskyblue" in webcolors.CSS3_NAMES_TO_HEX
True
>>> "aquamarine" in webcolors.CSS3_NAMES_TO_HEX
True
>>> len(webcolors.CSS3_NAMES_TO_HEX)
147

This means webcolors.CSS3_NAMES_TO_HEX.keys() gives you all the CSS3 color names: as a list in Python 2, or as a dict_keys view in Python 3.
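Building on that membership test, here is a minimal sketch that scans a string for color names. A simple regex stands in for a tokenizer such as nltk.word_tokenize, and runs of adjacent words are joined without spaces, so spoken forms like "deep sky blue" also match "deepskyblue". Loading the names is hedged: it uses CSS3_NAMES_TO_HEX as in this answer, then tries webcolors.names("css3"), which is an assumption about newer webcolors releases that may no longer expose the constant, and finally falls back to a small hypothetical sample set (SAMPLE_NAMES) when webcolors isn't installed.

```python
import re

try:
    import webcolors
except ImportError:  # webcolors is not installed
    webcolors = None

# Hypothetical fallback sample, used only when webcolors is unavailable.
SAMPLE_NAMES = {"green", "teal", "aquamarine", "deepskyblue", "lightgoldenrodyellow"}


def css3_color_names():
    """Return the CSS3 color names as a set of lowercase strings."""
    if webcolors is not None:
        if hasattr(webcolors, "CSS3_NAMES_TO_HEX"):
            # The mapping used in this answer (older webcolors releases).
            return set(webcolors.CSS3_NAMES_TO_HEX)
        if hasattr(webcolors, "names"):
            # Assumed API of newer releases that dropped the constant.
            return set(webcolors.names("css3"))
    return set(SAMPLE_NAMES)


def find_colors(text, names, max_words=4):
    """Return the color names found in text, in order of appearance.

    Runs of up to max_words adjacent words are joined without spaces, so
    "deep sky blue" matches the run-together name "deepskyblue".
    """
    words = re.findall(r"[a-z]+", text.lower())
    found = []
    i = 0
    while i < len(words):
        # Try the longest run first so "deep sky blue" wins over "blue".
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = "".join(words[i:i + n])
            if candidate in names:
                found.append(candidate)
                i += n
                break
        else:
            # No color name starts at this word; move on.
            i += 1
    return found


if __name__ == "__main__":
    names = css3_color_names()
    print(find_colors("The sky is deep sky blue and the sea is green", names))
```

Joining up to four adjacent words covers names like "light golden rod yellow"; raise max_words if your recognizer splits names further. Trying the longest run first is a deliberate choice: it prefers "deepskyblue" over the shorter "blue" inside it.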