Ms. Corlib Ms. Corlib - 3 months ago 18
C# Question

Is this fact true about anagrammatical substrings?

I'm trying to write an algorithm that finds the number of anagrammatical substrings of a string. For instance, the string

"abba"
has 4:

(1)
"a"
,
"a"


(2)
"b"
,
"b"


(3)
"ab"
,
"ba"


(4)
"abb"
,
"bba"


A fact I'm trying to use to optimize is


If a string has no anagrammatical pairs of substrings of length k,
then it has no anagrammatical pairs of substrings of length k+1


Can you confirm whether that's true or not?

Because my algorithm

static int NumAnagrammaticalPairs(string str)
{
int count = 0; // count of anagrammatical pairs found
int n = str.Length / 2; // OPTIMIZATION: only need to look through the substrings of half the size or less
for(int k = 1; k <= n; ++k)
{
// get all substrings of length k
var subsk = GetSubstrings(str,k).ToList();

// count the number of anagrammatical pairs
var indices = Enumerable.Range(0, subsk.Count);
int anapairs = (from i in indices
from j in indices
where i < j && IsAnagrammaticalPair(subsk[i], subsk[j])
select 1).Count();

// OPTIMIZATION: if didn't find any anagrammatical pairs in the substrings of length k,
// there are no anagrammatical pairs in the substrings of length k+1, so we can exit
// the loop early
if(anapairs == 0)
break;
else
count += anapairs;
}
return count;
}


is getting results sliggggtttthhhhly off (usually off by 1) the actual results in the test cases.

Answer

That's not the case - abcd and cdab are anagrams of length 4 but you can't find length 3 anagram substrings. Concretely, abcdab will not work, since it contains both abcd and cdab, but no 3 anagrams (from abc, bcd, cda, dab).

Comments