Ronnie - 9 months ago
Java Question

Java: Split a sentence using an unknown character?

I know many people have asked about splitting sentences, but my question is slightly different. My String data contains an unknown character (unknown to me; it looks like a tab character), and I am trying to use it as the delimiter for splitting.

The source text is as follows (try selecting the blank portions to see the effect):

The President Profile of the President
Swearing in of the President
Assets of the President
Speeches Speeches
Foreign Visits
Press Releases
Gallery Photo Gallery
Video Gallery
Rashtrapati Bhavan Panoramic View

I thought the blank portion might be a tab character, but I was wrong: I tried matching on tab, and it had no effect.

Then I opened the string in Notepad++ and turned on the option to show all characters. There I found this character; please see the image below.

[Image: the text in Notepad++ with all characters shown]

In the image above, one can clearly see an arrow symbol ("----->") in orange. Which character is this? Its width is not fixed, so how can I split the sentences on it?
Has anybody else faced this problem?
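
One way to identify an unknown character like this, without relying on how an editor renders it, is to print the Unicode code point of every character in the string. The following is a minimal sketch; the class and method names (`CodePointDump`, `describe`) are illustrative and not part of the original post.

```java
public class CodePointDump {

    /** Returns every code point of the text as "U+XXXX", separated by single spaces. */
    public static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // A real tab would show up as U+0009; anything else is a different character.
        System.out.println(describe("a\tb")); // prints "U+0061 U+0009 U+0062"
    }
}
```

Running this over one line of the problem text shows the exact code point sitting between the phrases, which can then be used directly as the delimiter.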

Answer Source

I found the answer by accident. The blank spaces (the arrows in the image above) are the &nbsp; HTML entity, i.e. the non-breaking space character (U+00A0), not a tab or an ordinary space. That is why I was unable to split the sentences. The output shown above came from the Tika parser, which I used to fetch an HTML page by URL and extract its text before breaking it into sentences.
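
For anyone hitting the same issue, here is a minimal sketch of splitting on that character, assuming the delimiter really is one or more consecutive U+00A0 characters. Note that the default `\s` in Java regular expressions does not match U+00A0, so it has to be named explicitly. The class and method names (`NbspSplitter`, `splitOnNbsp`) are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class NbspSplitter {

    // U+00A0 NO-BREAK SPACE (what &nbsp; decodes to), one or more in a row.
    private static final String NBSP_RUN = "\u00A0+";

    /** Splits text on runs of non-breaking spaces, trimming pieces and dropping empty ones. */
    public static List<String> splitOnNbsp(String text) {
        List<String> parts = new ArrayList<>();
        for (String piece : text.split(NBSP_RUN)) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String line = "The President\u00A0\u00A0\u00A0Profile of the President";
        // prints [The President, Profile of the President]
        System.out.println(splitOnNbsp(line));
    }
}
```

If the text may also mix ordinary whitespace into the gaps, a character class such as `[\u00A0\t]+` could be used instead; that is a variation on this sketch, not something tested against the original data.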
