serendipity - 17 days ago
Java Question

Removing duplicate key-value pairs in a map with values being in a list

Below is my code to detect abbreviations and their long forms. It loops over each line of a document, then over each word of that line, and identifies acronym candidates. For each candidate it loops over every line of the document again to find an appropriate long form. My issue is that if an acronym occurs multiple times in the document, my output contains multiple instances of it. I want to print each acronym only once, with all of its possible long forms. Here's my code:

public static void main(String[] args) throws FileNotFoundException
{
    BufferedReader in = new BufferedReader(new FileReader("D:\\Workspace\\resource\\SampleSentences.txt"));
    String str = null;
    ArrayList<String> lines = new ArrayList<String>();
    String matchingLongForm;
    List<String> matchingLongForms = new ArrayList<String>();
    List<String> shortForm = new ArrayList<String>();
    Map<String, List<String>> abbreviationPairs = new HashMap<String, List<String>>();

    try
    {
        while ((str = in.readLine()) != null) {
            lines.add(str);
        }
    }
    catch (IOException e)
    {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    String[] linesArray = lines.toArray(new String[lines.size()]);

    // document wide search for abbreviation long form and identifying several appropriate matches
    for (String line : linesArray) {
        for (String word : (Tokenizer.getTokenizer().tokenize(line))) {
            if (isValidShortForm(word)) {
                for (int i = 0; i < linesArray.length; i++) {
                    matchingLongForm = extractBestLongForm(word, linesArray[i]);
                    //shortForm.add(word);
                    if (matchingLongForm != null && !(matchingLongForms.contains(matchingLongForm))) {
                        matchingLongForms.add(matchingLongForm);

                        //System.out.println(matchingLongForm);
                        abbreviationPairs.put(word, matchingLongForms);
                        //matchingLongForms.clear();
                    }
                }

                if (abbreviationPairs != null) {
                    //for(abbreviationPairs.)
                    System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairs);
                    abbreviationPairs.clear();
                    matchingLongForms.clear();
                    //System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairsNew);
                }
                else
                    continue;
            }
        }
    }
}


Here's the current output:

Abbreviation Pair: {GLBA=[Gramm Leach Bliley act]}
Abbreviation Pair: {NCUA=[National credit union administration]}
Abbreviation Pair: {FFIEC=[Federal Financial Institutions Examination Council]}
Abbreviation Pair: {CFR=[comments for the Report]}
Abbreviation Pair: {CFR=[comments for the Report]}
Abbreviation Pair: {CFR=[comments for the Report]}
Abbreviation Pair: {CFR=[comments for the Report]}
Abbreviation Pair: {OFAC=[Office of Foreign Assets Control]}

Answer

You want key-value pairs of an abbreviation and its long forms, so a Map is the right structure. A Map cannot contain duplicate keys; each key maps to at most one value (here, one list of long forms).
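
As a small illustration of that rule, the sketch below shows that a second `put` with the same key replaces the stored value rather than adding a second entry, and that appending to the list held under the key is what accumulates several long forms. The class name `MapKeyDemo` and the sample long forms are made up for this example and are not part of the question's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MapKeyDemo {

    // Builds a tiny map to show how repeated keys behave (sample data is hypothetical).
    static Map<String, List<String>> build() {
        Map<String, List<String>> pairs = new HashMap<>();

        pairs.put("CFR", new ArrayList<>(Arrays.asList("comments for the Report")));
        // Same key again: the earlier list is replaced, the map still has one entry.
        pairs.put("CFR", new ArrayList<>(Arrays.asList("Code of Federal Regulations")));

        // To keep several long forms, append to the list stored under the key.
        pairs.computeIfAbsent("CFR", k -> new ArrayList<>()).add("comments for the Report");
        return pairs;
    }

    public static void main(String[] args) {
        // prints {CFR=[Code of Federal Regulations, comments for the Report]}
        System.out.println(build());
    }
}
```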

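The problem is where you print, not the map itself. You print and clear the map inside the loop over the words, so every occurrence of an acronym produces its own output line.

Move the printing code outside the outermost loop, so that it runs once after the whole document has been processed. Note that once the map is no longer cleared for every word, each abbreviation also needs its own list: the question's code puts the same matchingLongForms instance under every key, so all keys would otherwise share one list. This is the block to move: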

if (abbreviationPairs != null){
     //for(abbreviationPairs.)
     System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairs);
     abbreviationPairs.clear();
     matchingLongForms.clear();
     //System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairsNew);
}
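
Putting this together, here is a minimal sketch of one way the collection step could look; it is not the asker's actual code. It assumes that `isShortForm` and `findLongForm` behave roughly like the question's `isValidShortForm` and `extractBestLongForm` (both are hypothetical toy stand-ins here, backed by a hard-coded lookup table), that splitting on non-word characters is an acceptable substitute for `Tokenizer`, and that the sample sentences are invented. Each abbreviation gets its own `LinkedHashSet`, so long forms are deduplicated and no list is shared between keys, and printing happens once after all loops have finished.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AbbreviationCollector {

    // Hypothetical lookup table standing in for the real long-form extraction logic.
    private static final Map<String, String> KNOWN = new HashMap<>();
    static {
        KNOWN.put("CFR", "comments for the Report");
        KNOWN.put("OFAC", "Office of Foreign Assets Control");
    }

    // Toy stand-in for isValidShortForm: two or more upper-case letters.
    static boolean isShortForm(String word) {
        return word.matches("[A-Z]{2,}");
    }

    // Toy stand-in for extractBestLongForm: returns the known long form if the line contains it.
    static String findLongForm(String abbreviation, String line) {
        String longForm = KNOWN.get(abbreviation);
        return (longForm != null && line.contains(longForm)) ? longForm : null;
    }

    // Scans the whole document and returns one entry per abbreviation.
    static Map<String, Set<String>> collect(List<String> lines) {
        Map<String, Set<String>> pairs = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : line.split("\\W+")) {   // stand-in for Tokenizer
                // Skip non-abbreviations and abbreviations that were already resolved.
                if (!isShortForm(word) || pairs.containsKey(word)) {
                    continue;
                }
                // A fresh set per abbreviation, so keys never share a collection.
                Set<String> longForms = new LinkedHashSet<>();
                for (String candidate : lines) {
                    String longForm = findLongForm(word, candidate);
                    if (longForm != null) {
                        longForms.add(longForm);
                    }
                }
                if (!longForms.isEmpty()) {
                    pairs.put(word, longForms);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Send comments for the Report (CFR) by Friday.",
                "The CFR is due soon.",
                "See Office of Foreign Assets Control (OFAC) and the CFR.");

        // Printing happens once, after the whole document has been scanned.
        for (Map.Entry<String, Set<String>> entry : collect(lines).entrySet()) {
            System.out.println("Abbreviation Pair:\t" + entry.getKey() + "=" + entry.getValue());
        }
    }
}
```

With the invented sample lines, CFR appears three times in the input but is printed once, followed by a single line for OFAC. Using a set instead of a list also makes the explicit `contains` check from the question unnecessary.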