imshashi17 - 6 days ago
Python Question

Concatenation of two unicode strings?

#!/usr/bin/python
# -*- coding: utf-8 -*-

import re
separators = [u"।", u",", u"."]
dat=open(r"C:\Users\User\Desktop\text4.txt",'r').read()
text=dat.decode("utf-8")
wros=text.split()
out=""
import string
space=" "
counter=0;
for word in wros:
    out=u" ".join(word)

writ=open("C:\\Users\\User\\Desktop\\text5.txt",'w')
writ.write(out.encode('utf-8'))
writ.close()


text4.txt contains
भारत का इतिहास काफी समृद्ध एवं विस्तृत है।


text5.txt outputs as
ह ै ।


desired output is
भारत का इतिहास काफी समृद्ध एवं विस्तृत है।


Please tell me what I am doing wrong. Help required! Thanks in advance.

Answer

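In your loop, `u" ".join(word)` puts a space between the characters of a single word and overwrites `out` on every pass, so only the last word survives. I don't know what you need to do with each `word`, but I would collect the words in a list and join them once at the end: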

text = open('text4.txt').read()

text = text.decode("utf-8")

# split one string into list of (old) words
words = text.split()

# list for new words
out = []

# modify words
for word in words:
    # here - do something with `word`
    out.append(word)

# concatenate all new words to one string 
result = u' '.join(out)

result = result.encode('utf-8')

writ = open('text5.txt', 'w')
writ.write(result)
writ.close()
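
The difference between the two joins can be checked without the original files. This is only a minimal sketch: the sample sentence is copied from the question, the temporary output path is a stand-in for text5.txt, and `io.open` is used (my choice, not something from the question) so the same lines run on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Sample sentence from the question, used in place of reading text4.txt.
text = u"भारत का इतिहास काफी समृद्ध एवं विस्तृत है।"
words = text.split()

# The question's loop: join() iterates over the characters of ONE word,
# and `out` is replaced on every pass.
out = u""
for word in words:
    out = u" ".join(word)
# `out` now holds only the last word, with a space between its characters.

# Joining the list of words rebuilds the whole sentence.
result = u" ".join(words)

# Assumption: a temporary file stands in for text5.txt.
# io.open encodes on write, so no explicit .encode('utf-8') is needed.
path = os.path.join(tempfile.mkdtemp(), "text5.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(result)
```

With `io.open(..., encoding="utf-8")` the file object takes and returns unicode text directly, which avoids the separate `decode`/`encode` steps.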