yeppe - 5 months ago 10

Java Question

I came across below problem related to Matrix Manipulation.

*problem statement*

There is a NxN matrix,divided into N * N cells. Each cell has a predefined value. Which would be given as an input. Iteration has to happen K number of times which is also give in the test input. We have to make sure that we pick the optimum/min value of row/column at each iteration. Final output is the cumulative sum of optimum value saved at the end of each iteration.

**Steps 1.** Sum up the individual row and column and find the min sum of rows and columns, (it could be a row or a column, have to calculate the minimum)

**Step 2.** Store the sum found above separately

**Step 3.**

Increment elements of the min. sum row or column. by 1

Repeat steps 1,2,3 from 1 to Kth value

`add the sum at each iteration(specified in step2)`

output is the sum obtained on on the Kth iteration.

`2 4`

1 3

2 4

22

I was able to write a code (in java) and tested the same for some sample test cases. The output worked fine. The code works fine for sample data matrix of lower order, say, 2x2,4x4,even till 44x40 (that has less iteration). However, when the matrix size is increased to 100X100 (complex iteration), I see the expected output output values differ at 10s and hundreds place of the digit from the actual output and its random. Since I am not able to find a correct pattern of output vs input. Now, it is taking a toll on me to really debugging 500th loop to identify the issue. Is there any better way or approach to solve such problem related to huge matrix manipulation. Has anyone come across issues similar to this and solved it.

I am mainly interested in knowing the correct approach to solve given matrix problem. What Data structure to use in java. At present, I am using primitive DS and arrays int[] or long[] to solve this problem. Appreciate any help in this regard.

Answer

*Which data structure?*

What you need here is a data structure which allows you to *efficiently* query and update the minimum sum line. The most commonly used for this is a *heap* https://en.wikipedia.org/wiki/Heap_(data_structure).

For your purposes it's probably best to just implement the simplest kind, an array-based binary heap:

- See here: https://en.wikipedia.org/wiki/Binary_heap
- And here: http://courses.cs.washington.edu/courses/cse373/11wi/homework/5/BinaryHeap.java

..for implementation details.

** Procedure**:

- Initialize your heap to size
`M + N`

where`M, N`

are the number of rows and columns. - Before the loop, pre-compute the sum of each row and column, and add them as
*objects*to the heap. Also add two arrays`A, B`

which store the row and columon objects separately. - Now
*heapify*the heap array with respect to the line sum attribute. This ensures the heap follows the criterion of the binary heap structure (parent always > children). Read the sources to find out more about how to implement this (quite easy for a fixed array) - For each iteration, look at the
*first*element in the heap array. This is always the one with the smallest line sum. If this is a row object, then increment the sum attribute by`N`

(no. of columns), and increment each object in`B`

(list of columns) by 1. Do the same if it's a column. - After this, always
*heapify*before the next iteration.

At the end, just return the *first* element's attribute.

** Time complexity**:

The original naive solution (looping through all columns and rows every time) is .

Using a heap, the *heapify* operation at each step is (for a binary heap).

This means the total complexity is , *FAR* smaller. The `max`

term is to compensate for the fact that at each iteration it may be either rows *or* columns which are incremented.

As a side note, there are other heap structure types which have even better time complexity than the binary heap, e.g. binomial trees, Fibonacci heaps etc. These however are far more complicated, and have higher constant-factor overheads as a result. Thus for your project I feel they are not necessary, as many of them need *phenomenal* data set sizes to justify for the constant factor overhead.

Besides, they all support the same external operations as the binary heap, as *defined* by the Abstract Data Structure of Heap.

(heapify is an internal operation specific to the binary heap structure. Quite a few of the other ones are theoretically superior as they do this operation implicitly and "lazily")