TheDoctor TheDoctor - 1 year ago 208
Java Question

How can I remove punctuation from input text in Java?

I am trying to get a sentence using input from the user in Java, and i need to make it lowercase and remove all punctuation. Here is my code:

String[] words = instring.split("\\s+");
for (int i = 0; i < words.length; i++) {
words[i] = words[i].toLowerCase();
}
String[] wordsout = new String[50];
Arrays.fill(wordsout,"");
int e = 0;
for (int i = 0; i < words.length; i++) {
if (words[i] != "") {
wordsout[e] = words[e];
wordsout[e] = wordsout[e].replaceAll(" ", "");
e++;
}
}
return wordsout;


I cant seem to find any way to remove all non-letter characters. I have tried using regexes and iterators with no luck. Thanks for any help.

Answer Source

This first removes all non-letter characters, folds to lowercase, then splits the input, doing all the work in a single line:

String[] words = instring.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");

Spaces are initially left in the input so the split will still work.

By removing the rubbish characters before splitting, you avoid having to loop through the elements.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download