DmitrievichR - 7 months ago 48

Swift Question

Hey i have a question about optimization palindromes count algorithm

Task: Find count of palindromes in string.

in my func i use "in the forehead" method its like O(n^2)

can you guys help make it in O(n) or O(nlogn)

`func isPalindrome(string: String) -> Bool {`

let str = (string.lowercased())

let strWithoutSpace = str.components(separatedBy: .whitespaces).joined(separator: "")

let strArray = Array(strWithoutSpace.characters)

var i = 0

var j = strArray.count-1

while i <= j {

if strArray[i] != strArray[j] { return false }

i+=1

j-=1

}

return true

}

func palindromsInString(string: String) -> Int {

var arrayOfChars = Array(string.characters)

var count = 0

for i in 0..<arrayOfChars.count-1 {

for x in i+1..<arrayOfChars.count {

if isPalindrome(string: String(arrayOfChars[i...x])) {

count+=1

}

}

}

return count

}

and yes in my instance one letter can't be a palindrome

Answer

I'm not familiar with Manacher's algorithm, but I've always enjoyed figuring out efficient algorithms, so I thought I'd have a stab at this.

Your algorithm for determining if a string is a palindrome looks like the kinds I've come up with, so I decided to just use your `isPalindrome`

function, though I changed it to be a function in an extension of `String`

instead, and I removed the whitespace removal logic, as I felt that needed to be in the invoking call rather than in the palindrome determining function itself.

```
extension String {
func isPalindrome() -> Bool {
if length < 2 { return false }
let str = lowercased()
let strArray = Array(str.characters)
var i = 0
var j = strArray.count-1
while i <= j {
if strArray[i] != strArray[j] { return false }
i+=1
j-=1
}
return true
}
}
```

After looking at your `palindromsInString`

solution, it looks like a standard brute force, but simple and readable solution.

My first thought for a different algorithm was also pretty brute force, but was a totally different approach, so I'm calling it the Naive solution.

The idea of the Naive solution is to create arrays of substrings of the original string and check if each substring is a palindrome or not. The way I determine the substrings is to start with the biggest substring possible (the original string) and then get the 2 substrings of length `string.length-1`

, and then `string.length-2`

and so on until lastly I get all the substrings of length 2 (I'm ignoring substrings of length 1 because you said a string of length 1 can't be a palindrome).

ie: substrings of "test" greater than length 1 would be:

["test"] ["tes", "est"] ["te", "es", "st"]

So you just loop through each of those arrays and check if any are palindromes, and increasing the count if they are:

```
extension String {
var length: Int { return characters.count }
func substringsOfLength(length: Int) -> [String] {
if self.length == 0 || length > self.length { return [] }
let differenceInLengths = self.length - length
var firstIndex = startIndex
var lastIndex = index(startIndex, offsetBy: length)
var substrings: [String] = []
for _ in 0..<differenceInLengths {
substrings.append(substring(with: Range(uncheckedBounds: (firstIndex, lastIndex))))
firstIndex = index(after: firstIndex)
lastIndex = index(after: lastIndex)
}
substrings.append(substring(with: Range(uncheckedBounds: (firstIndex, lastIndex))))
return substrings
}
}
extension String {
func containsAPalindromeNaive(ignoringWhitespace: Bool = true) -> Int {
let numChars = length
if numChars < 2 { return 0 }
let stringToCheck = (ignoringWhitespace ? self.components(separatedBy: .whitespaces).joined(separator: "") : self).lowercased()
var outerLoop = numChars
var count: Int = 0
while outerLoop > 0 {
let substrings = stringToCheck.substringsOfLength(length: outerLoop)
for substring in substrings {
if substring.isPalindrome() {
count += 1
}
}
outerLoop -= 1
}
return count
}
}
```

I knew full well that this algorithm would be slow, but I wanted to implement it as a second baseline for my real solution.

I call this solution the Smart solution. It is a multipass solution that takes advantage of the number of, and positions of, the characters within a string.

In the first pass, I generate what I call a character mapping. The character mapping is a dictionary that maps a `Character`

to an array of indices. So you go over the string and add each character's index to an array stored under it's character value as the key.

The idea is that palindromes can only exist within a string that is bookended by the same letter. Therefore, you only need to check the substrings within a string at the indices of a particular letter. In the word "tattoo", you have 3 distinct letters: "t", "a", "o". The character mappings would look like this:

```
t: [0,2,3]
a: [1]
o: [4,5]
```

I now know that palindromes can only exist in this word between (0,2), (2,3), and (4,5). So I only need to check 3 substrings. That's the idea. And once you've checked out all possible substrings bookended by a particular letter, you can ignore any other substrings you come across that start with that letter, because you've already checked them.

In the second pass, I go through each character in the string and if I haven't already checked that letter, I go through the pairs of indices in the character mapping and check the substrings to see if they are a palindrome. I then do 2 optimizations here. First, if the distance between the starting character index and the ending character index is less 3, it must be a palindrome (a distance of 1 means they are sequential, and a distance of 2 means there's a single letter in between them, thus also guaranteed to be a palindrome). Second, because I already know that the first and last characters are the same, I don't need to check the whole substring, just from the second letter up until the second to last letter. So if the substring would be "test", and I'm always guaranteeing myself that the substring is bookended by the same letter, I don't actually need to check "test", and I can instead just check "es". It's a smaller optimization, but a nice one nonetheless.

```
extension String {
func generateCharacterMapping() -> [Character : [Int]] {
var characterMapping: [Character : [Int]] = [:]
for (index, char) in characters.enumerated() {
if let indicesOfChar = characterMapping[char] {
characterMapping[char] = indicesOfChar + [index]
} else {
characterMapping[char] = [index]
}
}
return characterMapping
}
func containsAPalindromeSmart(ignoringWhitespace: Bool = true) -> Int {
let numChars = length
if numChars < 2 { return 0 }
let stringToCheck = (ignoringWhitespace ? self.components(separatedBy: .whitespaces).joined(separator: "") : self).lowercased()
let characterMapping = stringToCheck.generateCharacterMapping()
var count: Int = 0
var checkedChars: Set<Character> = Set()
for char in stringToCheck.characters {
if checkedChars.contains(char) == false {
if let characterIndices = characterMapping[char], characterIndices.count > 1 {
for i in 0..<characterIndices.count-1 {
let j = i+1
let startCharIndex = characterIndices[i]
let endCharIndex = characterIndices[j]
if endCharIndex - startCharIndex < 3 {
count += 1
} else {
let substring = stringToCheck.substring(with: Range(uncheckedBounds: (stringToCheck.index(stringToCheck.startIndex, offsetBy: startCharIndex+1), stringToCheck.index(stringToCheck.startIndex, offsetBy: endCharIndex))))
if substring.isPalindrome() {
count += 1
}
}
}
checkedChars.insert(char)
}
}
}
return count
}
}
```

I felt pretty good about this solution. But I had no clue just how fast it really was.

Using XCTest to measure performance, I ran each algorithm through some performance tests. Using this string as the base source: "There are multiple palindromes in here", which is 33 characters long when whitespace is removed and it is lowercased ("therearemultiplepalindromesinhere"), I also created copies that were this string times 2 ("therearemultiplepalindromesinheretherearemultiplepalindromesinhere"), times 4, times 8, and times 10. Since we're trying to determine the O() of the algorithm, scaling the number of letters up and measuring the scale factor is the way to go.

In order to get more accurate data, I ran each test through 10 iterations (I would've liked more iterations, but the original solution and my Naive solution both don't finish in any timely manner on the tests above times 4). Here's the timings data I collected (screenshot of the spreadsheet was easier than entering it again here):

Green is Author; Red is Naive Solution; Orange is Smart Solution

As you can see, both your original solution and my Naive Solution operate in quadratic time (your solution has a quadratic correlation coefficient of r=0.9997, and my Naive Solution has a coefficient of r=0.9999; so, very clearly quadratic!). My Naive solution seems to be under your solution, but they are both increasing quadratically, and are therefore O(n^2), as we already knew.

The interesting part about my Smart Solution is it seems linear! My small, 5-point data set through a regression calculator, and it has a correlation coefficient of r=0.9917! So if it isn't linear, it's so close that I wouldn't care.

I don't know if my algorithm is any easier to understand than the Manacher's algorithm, but I hope I explained it well. And the results are pretty promising, so I hope you find a good use out of it.