Ganesh Ramachandran - 1 month ago
Java Question

How to split a string into tokens (using a regular expression) and print each token in Java?

Problem Statement

Given a string, s, matching the regular expression [A-Za-z !,?._'@]+, split the string into tokens. We define a token to be one or more consecutive English alphabetic letters. Then print the number of tokens, followed by each token on a new line.

Input Format

A single string, s.
s is composed of English alphabetic letters, blank spaces, and any of the following characters: !,?._'@

Output Format

On the first line, print an integer, n, denoting the number of tokens in string s (they do not need to be unique). Next, print each of the n tokens on a new line, in the same order as they appear in input string s.


Sample Input

He is a very very good boy, isn't he?

Sample Output

10
He
is
a
very
very
good
boy
isn
t
he


My Code:

    import java.io.*;
    import java.util.*;
    import java.util.regex.*;

    public class Solution {

        public static void main(String[] args) {
            Scanner scan = new Scanner(System.in);
            String s = scan.nextLine();
            scan.close();
            String[] splitString = (s.replaceAll("^[\\W+\\s+]", "").split("[\\s!,?._'@]+"));
            System.out.println(splitString.length);
            for (String string : splitString) {
                System.out.println(string);
            }
        }
    }


This code works fine for the sample input, but it does not pass the following test case.


Test case:

Input:

YES leading spaces are valid, problemsetters are evillllll


Expected Output:

8
YES
leading
spaces
are
valid
problemsetters
are
evillllll


What change to the code would make it pass this test case?

Answer

The regex you use to trim non-word characters from the beginning of the string is not correct.

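The pattern ^[\\W+\\s+] matches exactly one character at the beginning of the string: a non-word character (\W), a literal + or a whitespace character (inside a character class, + is not a quantifier). Using replaceAll makes no sense here, because the ^ anchor means only that single character at the start of the string can ever be matched. So when the input begins with several spaces, only the first one is removed; split() then finds a delimiter at position 0 and returns an empty string as the first element, which makes the count one too high and prints a blank line. Also, \W already matches whitespace characters, so there is no need to include \s in the same character class as \W.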
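A small demonstration of the failure, assuming an input with three leading spaces (the exact number of leading spaces in the real test case is not shown here). The class and method names are hypothetical; the two regexes are copied from the question.

```java
import java.util.Arrays;

public class LeadingSpaceDemo {
    // The trimming regex from the question: a character class that matches exactly
    // one character (\W, a literal '+', or \s) anchored at the start of the input.
    static String trimWithOriginalRegex(String s) {
        return s.replaceAll("^[\\W+\\s+]", "");
    }

    // The splitting regex from the question.
    static String[] splitTokens(String s) {
        return s.split("[\\s!,?._'@]+");
    }

    public static void main(String[] args) {
        String input = "   YES leading"; // hypothetical input with three leading spaces
        String trimmed = trimWithOriginalRegex(input);
        // Only the first space is removed, so two spaces remain at the front.
        System.out.println("[" + trimmed + "]");                   // prints [  YES leading]
        // A non-empty delimiter match at index 0 yields a leading empty string.
        System.out.println(Arrays.toString(splitTokens(trimmed))); // prints [, YES, leading]
    }
}
```

The empty first element is what makes the printed count one too high.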

You can replace .replaceAll("^[\\W+\\s+]", "") with .replaceFirst("^\\W+", ""). This removes one or more non-word characters at the beginning of the string (see this regex demo).
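For reference, here is a sketch of the question's program with only that one change applied. The trimming and splitting are pulled into a helper, tokenize, which is a hypothetical name introduced so the logic can be checked in isolation.

```java
import java.util.Scanner;

public class Solution {
    // Hypothetical helper: strip ALL leading non-word characters with replaceFirst,
    // then split on the delimiter set from the problem statement (as in the question).
    static String[] tokenize(String s) {
        return s.replaceFirst("^\\W+", "").split("[\\s!,?._'@]+");
    }

    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        String s = scan.nextLine();
        scan.close();
        String[] tokens = tokenize(s);
        System.out.println(tokens.length);
        for (String token : tokens) {
            System.out.println(token);
        }
    }
}
```

Two edge cases this version still does not cover: \W does not match _, so a leading underscore still produces an empty first token; and if the input contains no letters at all, the trimmed string is empty and "".split(...) returns a one-element array containing "", so it prints 1 instead of 0.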

See this online Java demo yielding your expected output.
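A different technique, not part of the answer above: since a token is defined as one or more consecutive English letters, one can match the tokens directly with Pattern/Matcher.find() instead of trimming and splitting. This sketch assumes that definition exactly; TokenFinder and findTokens are hypothetical names. It needs no leading-trim step and prints 0 for input with no letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenFinder {
    // A token is one or more consecutive English letters, per the problem statement.
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z]+");

    // Hypothetical helper: collect every maximal run of letters, in input order.
    static List<String> findTokens(String s) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        // Guard against completely empty input, where nextLine() would throw.
        String s = scan.hasNextLine() ? scan.nextLine() : "";
        scan.close();
        List<String> tokens = findTokens(s);
        System.out.println(tokens.size());
        for (String token : tokens) {
            System.out.println(token);
        }
    }
}
```

Because the greedy [A-Za-z]+ consumes each run of letters in full, punctuation and whitespace anywhere in the string (including at the start) are simply skipped.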