Prince Prince - 2 months ago 9
Python Question

How to trim Devanagri text from a given string by the help of unicodes

I want to write a python code that can fetch out Devanagari text from the given string, but I don't know how to use Unicode for the same.

My input will be in this form

Translate 'अंक'
36 अ [V]
36 ं [n]
57 ं (क [N]
36 क [kV]

I want text written in Devanagari only not that numbers or English alphabets.

My output should be in this form

अंक अ ं ं(क क

I Have tried this code

import codecs

file ="C:/Users/prince/Desktop/hindi.txt",mode = "r", encoding = "utf-8")
file_dic ="C:/Users/prince/Desktop/dic.txt",mode = "w", encoding = "utf-8")
for i in range (0, 330):
u =
if (u[i] >= 0900) && (u[i]<= 097F):
file_dic.write(' ')


A regular expression will keep your Devanagari text together. Your example would print spaces between every character. Below also adds the Devanagari Extended range in Unicode as well:


import re

text = '''\
Translate 'अंक'  
36  अ       [V]  
36  ं       [n]  
57  ं  (क [N]  
36  क [kV]

print(' '.join(re.findall(r'[\u0900-\u097f\ua8e0-\ua8ff]+',text)))


अंक अ ं ं क क

Writing to the files in your example:

import re

with open("C:/Users/prince/Desktop/hindi.txt",mode = "r", encoding = "utf-8") as file:
    text =
with open("C:/Users/prince/Desktop/dic.txt",mode = "w", encoding = "utf-8") as file_dic:
    file_dic.write(' '.join(re.findall(r'[\u0900-\u097f\ua8e0-\ua8ff]+',text)))