Ms. Corlib - 6 months ago 45

C# Question

I'm trying to write an algorithm that finds the number of anagrammatical substrings of a string. For instance, the string

`"abba"`

(1)

`"a"`

`"a"`

(2)

`"b"`

`"b"`

(3)

`"ab"`

`"ba"`

(4)

`"abb"`

`"bba"`

A fact I'm trying to use to optimize is

If a string has no anagrammatical pairs of substrings of length k,

then it has no anagrammatical pairs of substrings of length k+1

Can you confirm whether that's true or not?

Because my algorithm

`static int NumAnagrammaticalPairs(string str)`

{

int count = 0; // count of anagrammatical pairs found

int n = str.Length / 2; // OPTIMIZATION: only need to look through the substrings of half the size or less

for(int k = 1; k <= n; ++k)

{

// get all substrings of length k

var subsk = GetSubstrings(str,k).ToList();

// count the number of anagrammatical pairs

var indices = Enumerable.Range(0, subsk.Count);

int anapairs = (from i in indices

from j in indices

where i < j && IsAnagrammaticalPair(subsk[i], subsk[j])

select 1).Count();

// OPTIMIZATION: if didn't find any anagrammatical pairs in the substrings of length k,

// there are no anagrammatical pairs in the substrings of length k+1, so we can exit

// the loop early

if(anapairs == 0)

break;

else

count += anapairs;

}

return count;

}

is getting results

Answer

That's not the case - `abcd`

and `cdab`

are anagrams of length 4 but you can't find length 3 anagram substrings. Concretely, `abcdab`

will not work, since it contains both `abcd`

and `cdab`

, but no 3 anagrams (from `abc`

, `bcd`

, `cda`

, `dab`

).