Christian Christian - 2 months ago 11
Java Question

Optimization of deletion nodes in 2-d array

I have double dimensional array of dimensions 720x90. Let's denote rows by R and C as columns.
R1 = {C1,...,C90}

....

R90 = {C1,...C90}

Now, I want to see if any of the data in any of the rows appears anywhere else in any other rows. For instance, lets say the data in row 470 and column 67 is a duplicate of row 672 and column 34. In that case, I want to remove both row 470 and row 672 from the data set and continue checking. After I have checked all the rows, I want to print just the index of the rows that have survived. I have written a brute-force method of this. However, when I run this code, it never returns and I am not able to diagnose why. Also, is there a more efficient way to do this?

//check all the subsets of the interleaved data
public static int checkSubsets(String[][] subsets){
List subset = new ArrayList();
for(int i = 0; i< 720; i++){
for(int j = 0; j < 90; j++)
subset.add(subsets[i][j]);
}
Object duplicate;
Iterator itr = subset.iterator();
while(itr.hasNext()){
duplicate = itr.next();
while(itr.hasNext()){
subset.remove(duplicate);
itr=subset.iterator(); //to avoid concurrent modification
itr.next();
}
}
return subset.size();
}

Answer

The algorithm is almost there, but helpfull data-structures are missing.

To add a bit of spice I used Java 8 somewhat.

As you did one can collect the values to check for duplicates. However one needs to remember the first row of that value, as only there it is still unknown whether a duplicate exists.

public static int checkSubsets(String[][] subsets) {

    // The results.
    final Set<Integer> duplicateRows = new HashSet<>();

    // From the first occurrence of a duplicate value we do not know it yet,
    // so need to remember.
    final Map<String, Integer> firstRowOfValue = new HashMap<>();

    for (int i = 0; i < subsets.length; ++i) {
        for (int j = 0; j < subsets[i].length; ++j) {
            final String value = subsets[i][j];
            Integer oldRow = firstRowOfValue.putIfAbsent(value, i);
            if (oldRow != null) { // Duplicates
                duplicateRows.add(i);
                duplicateRows.add(oldRow);
                // oldRow might already be added if third duplicate or same row.
            }
        }
    }

    IntStream.rangeOf(0, subsets.length)
        .filter(i -> !duplicateRows.contains(i))
        .forEach(System.out::println);
    return subsets.length - duplicateRows.size();
}

The IntStream part would be in java 7:

for (int i = 0; i < subsets.length; ++i) {
    if (!duplicateRows.contains(i)) {
        System.out.println(i);
    }
}

With java 7 you can safely substitute here putIfAbsent with put.