dan luo - 4 months ago
Java Question

Fixing a StringTokenizer error in my Java code: parsing text documents for keyword frequency

I wrote this input reader in my Java parser class. It is supposed to count the frequency of 5 keywords in 5 text files that contain HTML source.

1) But first I get the following compile error:
Multiple markers at this line
- StringTokenizer cannot be resolved to a type
- StringTokenizer cannot be resolved to a type

I also get "Arrays cannot be resolved" on the line that calls Arrays.asList.

2) Once that error is fixed, how do I make my parser read all 5 documents at once?

Here is my main class:

import java.io.FileNotFoundException;
import java.io.IOException;

public class TfIdfMain {

    public static void main(String args[]) throws FileNotFoundException, IOException {
        DocumentParser dp = new DocumentParser();
        dp.parseFiles("C:\\Users\\dachen\\Documents");
        dp.getCosineMatrix();
    }
}


Here is my document parser class:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class DocumentParser {

    private void doSomething() {
        String text = "Professor, engineering, data, mining, research";
        StringTokenizer str = new StringTokenizer(text);
        String word[] = new String[10];
        String unique[] = new String[10];
        String x;
        int count = -1;
        // Split the text on whitespace and store each token
        while (str.hasMoreTokens()) {
            count++;
            x = str.nextToken();
            word[count] = x;
            System.out.println(count + ": " + word[count]);
        }

        System.out.println("---Frequency---");
        // Copy each word into unique[] the first time it is seen
        for (int i = 0; i < 7; i++) {

            if ((!Arrays.asList(unique).contains(word[i]))) {
                unique[i] = word[i];
            }

        }
    }
}

Answer

For a set of multiple files:

String[] files = {"foo.txt", "bar.txt", "baz.txt"};

for(String file : files) {
    DocumentParser dp = new DocumentParser();
    dp.parseFiles(file);
    dp.getCosineMatrix();
}

Basically, define the array of files, then iterate over it with a for loop, creating a new DocumentParser each time. If a DocumentParser can be reused with new files, just move its declaration outside the for loop.
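
As a rough sketch of the reuse variant, here is one object declared once and fed several files in turn. KeywordCounter, addText, addFile and getCounts are hypothetical names made up for this illustration; they are not the asker's DocumentParser, whose parseFiles and getCosineMatrix methods are not shown in the question. The keyword list is taken from the sample string in the question, and the delimiter set is an assumption about what should split words in HTML source. Note the java.util.StringTokenizer and java.util.Arrays imports: both classes live in java.util, and leaving those imports out is what produces "cannot be resolved" errors like the ones in the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical helper for illustration only; not the asker's DocumentParser.
public class KeywordCounter {

    // Keywords from the question's sample string, lowercased (assumption).
    static final List<String> KEYWORDS =
            Arrays.asList("professor", "engineering", "data", "mining", "research");

    // Assumed delimiters: whitespace plus punctuation common in HTML source.
    private static final String DELIMITERS = " \t\n\r\f,.;:!?<>/\"'()=";

    // Running totals across every text or file added so far.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    public KeywordCounter() {
        for (String keyword : KEYWORDS) {
            counts.put(keyword, 0);
        }
    }

    // Tokenizes the text and increments the total of each keyword it finds.
    public void addText(String text) {
        StringTokenizer tokens = new StringTokenizer(text, DELIMITERS);
        while (tokens.hasMoreTokens()) {
            String word = tokens.nextToken().toLowerCase(Locale.ROOT);
            if (counts.containsKey(word)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
    }

    // Reads one file as UTF-8 (assumed encoding) and adds it to the totals.
    public void addFile(Path file) throws IOException {
        addText(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }

    public Map<String, Integer> getCounts() {
        return counts;
    }

    // Usage: java KeywordCounter foo.txt bar.txt baz.txt
    public static void main(String[] args) throws IOException {
        KeywordCounter counter = new KeywordCounter(); // declared once, outside the loop
        for (String name : args) {
            counter.addFile(Paths.get(name));
        }
        System.out.println(counter.getCounts());
    }
}
```

Because the counter is created before the loop, the totals accumulate over all files. Creating a new KeywordCounter inside the loop instead would give separate counts per file, which matches the first variant described above.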
