Jin W - 4 months ago
Java Question

Runtime error in my parser; I suspect my input file path is wrong, but I'm unsure what I did wrong

This is a document parser and TF-IDF/cosine-similarity calculator, but I keep getting an error when I run it. I thought the path to the text file I'm reading was right, but the program still fails.

This is my main class:

import java.io.FileNotFoundException;
import java.io.IOException;

public class TfIdfMain {

    public static void main(String[] args) throws FileNotFoundException, IOException {
        DocumentParser dp = new DocumentParser();
        dp.parseFiles("C:/Users/dachen/Documents/doc1.txt"); // give the location of source file
        dp.tfIdfCalculator(); // calculates tf-idf
        dp.getCosineSimilarity(); // calculates cosine similarity
    }
}


My parser class:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class DocumentParser {

    // This variable will hold all terms of each document in an array.
    private List<String[]> termsDocsArray = new ArrayList<String[]>();
    private List<String> allTerms = new ArrayList<String>(); // to hold all terms
    private List<double[]> tfidfDocsVector = new ArrayList<double[]>();

    /**
     * Method to read files and store in array.
     */
    public void parseFiles(String filePath) throws FileNotFoundException, IOException {
        File[] allfiles = new File(filePath).listFiles();
        BufferedReader in = null;
        for (File f : allfiles) {
            if (f.getName().endsWith(".txt")) {
                in = new BufferedReader(new FileReader(f));
                StringBuilder sb = new StringBuilder();
                String s = null;
                while ((s = in.readLine()) != null) {
                    sb.append(s);
                }
                String[] tokenizedTerms = sb.toString().replaceAll("[\\W&&[^\\s]]", "").split("\\W+"); // to get individual terms
                for (String term : tokenizedTerms) {
                    if (!allTerms.contains(term)) { // avoid duplicate entry
                        allTerms.add(term);
                    }
                }
                termsDocsArray.add(tokenizedTerms);
            }
        }
    }

    /**
     * Method to create termVector according to its tfidf score.
     */
    public void tfIdfCalculator() {
        double tf; // term frequency
        double idf; // inverse document frequency
        double tfidf; // term frequency-inverse document frequency
        for (String[] docTermsArray : termsDocsArray) {
            double[] tfidfvectors = new double[allTerms.size()];
            int count = 0;
            for (String terms : allTerms) {
                tf = new TfIdf().tfCalculator(docTermsArray, terms);
                idf = new TfIdf().idfCalculator(termsDocsArray, terms);
                tfidf = tf * idf;
                tfidfvectors[count] = tfidf;
                count++;
            }
            tfidfDocsVector.add(tfidfvectors); // storing document vectors
        }
    }

    /**
     * Method to calculate cosine similarity between all the documents.
     */
    public void getCosineSimilarity() {
        for (int i = 0; i < tfidfDocsVector.size(); i++) {
            for (int j = 0; j < tfidfDocsVector.size(); j++) {
                System.out.println("between " + i + " and " + j + " = "
                        + new CosineSimilarity().cosineSimilarity(
                                tfidfDocsVector.get(i),
                                tfidfDocsVector.get(j)));
            }
        }
    }
}
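The question does not include the TfIdf and CosineSimilarity helper classes that DocumentParser calls. Below is a minimal sketch of what they might look like, assuming the common definitions tf = (occurrences of the term in a document) / (number of terms in that document), idf = ln(number of documents / number of documents containing the term), and cosine similarity = dot(a, b) / (|a| * |b|). Only the method names and parameter types come from the question; the bodies are assumptions. The classes are package-private so both fit in one file; normally each would be a public class in its own file.

```java
import java.util.List;

// Hypothetical sketch: the question calls TfIdf but does not show it.
// Assumes tf = count / document length and idf = ln(N / document frequency).
class TfIdf {

    /** Fraction of the terms in one document that equal termToCheck. */
    public double tfCalculator(String[] totalterms, String termToCheck) {
        if (totalterms.length == 0) {
            return 0.0; // empty document: avoid dividing by zero
        }
        double count = 0;
        for (String s : totalterms) {
            if (s.equals(termToCheck)) {
                count++;
            }
        }
        return count / totalterms.length;
    }

    /** ln(number of documents / number of documents containing termToCheck). */
    public double idfCalculator(List<String[]> allTerms, String termToCheck) {
        double docsWithTerm = 0;
        for (String[] doc : allTerms) {
            for (String s : doc) {
                if (s.equals(termToCheck)) {
                    docsWithTerm++;
                    break; // count each document at most once
                }
            }
        }
        return docsWithTerm == 0 ? 0.0 : Math.log(allTerms.size() / docsWithTerm);
    }
}

// Hypothetical sketch: the question calls CosineSimilarity but does not show it.
class CosineSimilarity {

    /** dot(a, b) / (|a| * |b|); returns 0 when either vector is all zeros. */
    public double cosineSimilarity(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // zero vector: similarity is undefined, report 0 instead of NaN
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Under these definitions a term that appears in every document gets idf = ln(1) = 0, so if the folder holds only one .txt file every tf-idf vector is all zeros. The sketch returns 0 for a zero vector rather than computing 0/0, which would give NaN.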


This is my error:

Exception in thread "main" java.lang.NullPointerException
at DocumentParser.parseFiles(DocumentParser.java:22)
at TfIdfMain.main(TfIdfMain.java:7)
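In the posted listing, line 22 of DocumentParser.java appears to be the for (File f : allfiles) loop. According to its Javadoc, File.listFiles() returns null when the path does not denote a directory (or if an I/O error occurs), and a for-each loop over a null array throws NullPointerException. The small reproduction below illustrates this; the class name ListFilesDemo and the temporary files are made up for the demonstration and are not part of the question's code.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative reproduction; ListFilesDemo is not part of the question's code.
class ListFilesDemo {

    /** Returns whatever File.listFiles() gives for the path, which may be null. */
    static File[] listOrNull(String path) {
        return new File(path).listFiles();
    }

    public static void main(String[] args) throws IOException {
        // A temporary folder with one .txt file stands in for C:/Users/dachen/Documents
        Path dir = Files.createTempDirectory("docs");
        Path file = Files.createFile(dir.resolve("doc1.txt"));

        File[] fromDir = listOrNull(dir.toString());
        File[] fromFile = listOrNull(file.toString());

        System.out.println("directory -> " + fromDir.length + " entries"); // prints: directory -> 1 entries
        System.out.println("file      -> " + fromFile);                    // prints: file      -> null
        // Iterating with for (File f : fromFile) would throw NullPointerException,
        // which is what happens inside parseFiles when it is given doc1.txt.
    }
}
```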


Is the path to my text file wrong?

Answer

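The separator is not the problem: Java on Windows accepts forward slashes, and backslashes also work as long as each one is escaped as \\ inside a string literal. The actual bug is that parseFiles expects a directory, not a file. It calls new File(filePath).listFiles(), which returns null when the path names a regular file, so the for (File f : allfiles) loop throws the NullPointerException. So instead of

dp.parseFiles("C:/Users/dachen/Documents/doc1.txt");

pass the directory that contains the .txt files:

dp.parseFiles("C:\\Users\\dachen\\Documents");

("C:/Users/dachen/Documents" works equally well.)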
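A somewhat more defensive parseFiles could look like the sketch below. It is one possible restructuring, not a drop-in replacement: the class name SafeDocumentParser is hypothetical and only the parsing step is shown. It rejects a non-directory path with a clear message instead of a NullPointerException, closes each reader with try-with-resources (the original never closes in), and appends a space after each line. That last change matters because BufferedReader.readLine() strips line terminators, so the original sb.append(s) glues the last word of one line onto the first word of the next. The tokenizing regexes are the ones from the question.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a more defensive parseFiles; only the parsing step is shown.
class SafeDocumentParser {

    final List<String[]> termsDocsArray = new ArrayList<>();
    final List<String> allTerms = new ArrayList<>(); // unique terms in first-seen order

    /** Reads every .txt file in dirPath; dirPath must be a directory, not a file. */
    void parseFiles(String dirPath) throws IOException {
        File[] allfiles = new File(dirPath).listFiles();
        if (allfiles == null) {
            // listFiles() returns null for a regular file, a missing path, or an I/O error
            throw new IllegalArgumentException("Not a readable directory: " + dirPath);
        }
        for (File f : allfiles) {
            if (!f.getName().endsWith(".txt")) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            // try-with-resources closes the reader even if readLine() throws
            try (BufferedReader in = new BufferedReader(new FileReader(f))) {
                String s;
                while ((s = in.readLine()) != null) {
                    sb.append(s).append(' '); // separator keeps words at line breaks apart
                }
            }
            // Same regexes as the question: drop punctuation, then split on non-word runs.
            // trim() avoids a leading empty token when the file starts with whitespace.
            String text = sb.toString().replaceAll("[\\W&&[^\\s]]", "").trim();
            String[] tokenizedTerms = text.isEmpty() ? new String[0] : text.split("\\W+");
            for (String term : tokenizedTerms) {
                if (!allTerms.contains(term)) { // avoid duplicate entries
                    allTerms.add(term);
                }
            }
            termsDocsArray.add(tokenizedTerms);
        }
    }
}
```

With this version, calling parseFiles("C:/Users/dachen/Documents/doc1.txt") fails immediately with an IllegalArgumentException that names the bad path, rather than a NullPointerException inside the loop.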