orangespire - 4 months ago
Java Question

Scanning and adding data to arrays with unspecified delimiters

I have an assignment that requires me to work with the data below, which is stored in a txt file. There is no specified delimiter, which would have made it easier to split the fields into an array. I could use the Scanner class to read the text file and fill an array like this:

for (int rows = 0; rows < array.length; rows++) {
    array[rows][0] = fileIn.next();
    array[rows][1] = fileIn.next();
    // ...
}


and so on. However, the names are harder to handle because they contain a varying number of spaces and a varying number of name parts. I would like the whole name, such as "Allison, Mrs. Hudson J C (Bessie Waldo Daniels)", to be its own element. I'm not sure where to start, but I think one solution is to have the program look for the token "male" or "female", since it marks the end of the name and the start of the next element. Any help would be appreciated.

1 1 Allen, Miss. Elisabeth Walton female 29 211.3375
1 1 Allison, Master. Hudson Trevor male 0.9167 151.5500
1 0 Allison, Miss. Helen Loraine female 2 151.5500
1 0 Allison, Mr. Hudson Joshua Creighton male 30 151.5500
1 0 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female 25 151.5500
1 1 Anderson, Mr. Harry male 48 26.5500
1 1 Andrews, Miss. Kornelia Theodosia female 63 77.9583
1 0 Andrews, Mr. Thomas Jr male 39 0.0000
1 1 Appleton, Mrs. Edward Dale (Charlotte Lamson) female 53 51.4792
1 0 Artagaveytia, Mr. Ramon male 71 49.5042
1 0 Astor, Col. John Jacob male 47 227.5250
1 1 Astor, Mrs. John Jacob (Madeleine Talmadge Force) female 18 227.5250
1 1 Aubart, Mme. Leontine Pauline female 24 69.3000

Answer

This is a good fit for a regex. The following pattern matches your sample data:

([\d]) +([\d]) +(.+\S) +(female|male) +([\d.]+) +([\d.]+)

Here is the full example in Java, as run on repl.it:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Main {
    public static void main(String[] args) {
        String text = 
            "1   1   Allen, Miss. Elisabeth Walton   female  29  211.3375\n"+
            "1   1   Allison, Master. Hudson Trevor  male    0.9167  151.5500\n"+
            "1   0   Allison, Miss. Helen Loraine    female  2   151.5500\n"+
            "1   0   Allison, Mr. Hudson Joshua Creighton    male    30  151.5500\n"+
            "1   0   Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female  25  151.5500\n"+
            "1   1   Anderson, Mr. Harry male    48  26.5500\n"+
            "1   1   Andrews, Miss. Kornelia Theodosia   female  63  77.9583\n"+
            "1   0   Andrews, Mr. Thomas Jr  male    39  0.0000\n"+
            "1   1   Appleton, Mrs. Edward Dale (Charlotte Lamson)   female  53  51.4792\n"+
            "1   0   Artagaveytia, Mr. Ramon male    71  49.5042\n"+
            "1   0   Astor, Col. John Jacob  male    47  227.5250\n"+
            "1   1   Astor, Mrs. John Jacob (Madeleine Talmadge Force)   female  18  227.5250\n"+
            "1   1   Aubart, Mme. Leontine Pauline   female  24  69.3000\n";

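        // Split the input into lines, accepting both \n and \r\n line endings.
        String[] lines = text.split("\\r?\\n");

        // Groups 1-2: single digits, group 3: full name, group 4: sex, groups 5-6: numbers.
        String pattern = "([\\d]) +([\\d]) +(.+\\S) +(female|male) +([\\d.]+) +([\\d.]+)";
        Pattern r = Pattern.compile(pattern);

        for (String l : lines) {
            Matcher m = r.matcher(l);
            if (m.find()) {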
                System.out.println(" ------------------- New Text Line -------------------");
                System.out.println("Group 1: " + m.group(1) );
                System.out.println("Group 2: " + m.group(2) );
                System.out.println("Group 3: " + m.group(3) );
                System.out.println("Group 4: " + m.group(4) );
                System.out.println("Group 5: " + m.group(5) );
                System.out.println("Group 6: " + m.group(6) );
            } else {
                System.out.println("Line did not match");
            }   
        }
    }
}

This produces output like the following (first three records shown):

 ------------------- New Text Line -------------------
Group 1: 1
Group 2: 1
Group 3: Allen, Miss. Elisabeth Walton
Group 4: female
Group 5: 29
Group 6: 211.3375
 ------------------- New Text Line -------------------
Group 1: 1
Group 2: 1
Group 3: Allison, Master. Hudson Trevor
Group 4: male
Group 5: 0.9167
Group 6: 151.5500
 ------------------- New Text Line -------------------
Group 1: 1
Group 2: 0
Group 3: Allison, Miss. Helen Loraine
Group 4: female
Group 5: 2
Group 6: 151.5500
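
If you want the fields stored as elements rather than printed, a minimal sketch along the same lines might look like this. The class name PassengerParser and the method parse are made up for illustration, and I am assuming each record sits on its own line. It also swaps the literal spaces in the pattern for \s+ so that tabs or single spaces between columns are accepted, and uses matches() so the whole line has to fit the pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the six captured fields of each line into a String[].
class PassengerParser {
    // Same grouping as the pattern above, with \s+ between columns (an assumption
    // about the real file, which may use tabs or a varying number of spaces).
    private static final Pattern ROW = Pattern.compile(
        "(\\d)\\s+(\\d)\\s+(.+\\S)\\s+(female|male)\\s+([\\d.]+)\\s+([\\d.]+)");

    // Returns one six-element array per matching line; non-matching lines are skipped.
    public static List<String[]> parse(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ROW.matcher(line.trim());
            if (!m.matches()) {
                continue;
            }
            String[] row = new String[m.groupCount()];
            for (int i = 0; i < row.length; i++) {
                row[i] = m.group(i + 1); // capture groups are numbered from 1
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "1 1 Allen, Miss. Elisabeth Walton female 29 211.3375",
            "1 0 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female 25 151.5500");
        for (String[] row : parse(lines)) {
            // Index 2 holds the whole name as a single element.
            System.out.println(row[2] + " | " + row[3]);
        }
    }
}
```

To read from the actual file, you could pass in the result of Files.readAllLines(Paths.get(...)) instead of the hard-coded list. The name stays in one element because the pattern anchors on the "female"/"male" column, which is the idea you suggested in the question.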