Faran52 Faran52 - 14 days ago 6
C Question

Group Adjacent Substrings and find the count

Hi this was a question I was asked in Interview to write code in C/C++.

Given a string you are provided with the possible operations.
We can group the adjacent substrings, for e.g ABCABCBC can be compressed as
2ABC1BC or 1ABCA2BC

Among all the possible options, task is to find the resultant string with the minimum length.

In case if there are multiple solutions return the lexicographically smallest string.
So for the above example the solution would be 2ABC1BC

Another example would be
FLFLAFLAFLAF
Solution: 1FLF3LAF

I know it can be done using suffix tree, or permutation/recursion method but cannot get the logic to as to how. Please help ?

Answer

An algorithm that compresses input prefix and recursively calls itself on the rest of the input can do it (live demo):

#include <iostream>
#include <string>
#include <limits>
#include <cstring>
using namespace std;

// determine how many times the prefix of specific length is repeated in the input
int repetitions(string const & input, int prefix_length)
{
    int i = 1;
    while (
        (i + 1) * prefix_length <= input.size() && 
        memcmp(input.data(), input.data() + i * prefix_length, prefix_length) == 0
        )
    {
        ++i;
    }

    return i;
}

// recursively compress input and return compressed string
string compress(string const & input)
{
    // recursion termination
    if (input.empty())
        return string();

    string best;
    int best_length = numeric_limits<int>::max();

    // try various prefix lengths
    for (int prefix_length = 1; prefix_length <= input.length(); ++prefix_length)
    {
        const auto max_count = repetitions(input, prefix_length);

        // try various number of repetitions up to the maximum possible
        for (int count = 1; count <= max_count; ++count)
        {
            // the candidate is compressed prefix and the rest of input compressed recursively
            const auto current = 
                to_string(count) + input.substr(0, prefix_length) + // compressed prefix
                compress(input.substr(prefix_length * count)); // the rest of input compressed recursively

            // store candidate if it is better than the currently best
            if (current.length() < best_length)
            {
                best = current;
                best_length = current.length();
            }
        }
    }

    return best;
}

int main() 
{
    string input;
    cin >> input;
    cout << compress(input);
    return 0;
}

Note that this is just a simple to write solution. It is not very efficient and would suffer from stack overflow for long inputs.