Rutger Huijsmans - 1 month ago
Swift Question

Remove repeating substring from string

I cannot think of a function to remove a repeating substring from my string. My string looks like this:

"<bold><bold>Rutger</bold> Roger</bold> rented a <bold>testitem zero dollars</bold> from <bold>Rutger</bold>."


If <bold> is followed by another <bold>, I want to remove the second <bold>. When removing that second <bold>, I also want to remove the first </bold> that follows.

So the output that I'm looking for should be this:

"<bold>Rutger Roger</bold> rented a <bold>testitem zero dollars</bold> from <bold>Rutger</bold>."


Anyone know how to achieve this in Swift (2.2)?

Answer

I wrote a solution using a regex, assuming that a tag is never nested inside itself more than once. In other words, it only cleans doubled tags, not deeper nesting. To clean deeper levels of nested repeating tags, you can reuse the same code with a recursive call:

import Foundation

class Cleaner {

    // tag names without angle brackets, e.g. "bold"
    var tags: [String] = []

    init(tags: [String]) {
        self.tags = tags
    }

    func cleanString(html:String) -> String {

        var res = html

        do {

            for tag in tags {

                let start = "<\(tag)>"
                let end = "</\(tag)>"

                let pattern = "\(start)(.*?)\(end)"

                let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)

                let matches = regex.matches(in: res, options: [], range: NSRange(location: 0, length: res.utf16.count))

                // UTF-16 units removed so far; match ranges refer to the string before any edits
                // (range(at:) is the Swift 4+ spelling; Swift 2.2 names these APIs differently)
                var diff = 0
                for match in matches {

                    let outerRange = NSRange(location: match.range(at: 0).location - diff, length: match.range(at: 0).length)
                    let innerRange = NSRange(location: match.range(at: 1).location - diff, length: match.range(at: 1).length)
                    let node = (res as NSString).substring(with: outerRange)
                    let content = (res as NSString).substring(with: innerRange)

                    // look for the starting tag in the content of the node (same case rule as the regex)
                    if content.range(of: start, options: .caseInsensitive) != nil {
                        // keep only the content: drops the outer opening tag and the first closing tag
                        res = (res as NSString).replacingCharacters(in: outerRange, with: content)

                        // for shifting future ranges
                        diff += (node.utf16.count - content.utf16.count)
                    }
                }
            }
        }
        catch {
            print("regex was bad!")
        }

        return res
    }
}

let cleaner = Cleaner(tags: ["bold"])
let html = "<bold><bold>Rutger</bold> Roger</bold> rented a <bold><bold>testitem</bold> zero dollars</bold> from <bold>Rutger</bold>."

let cleaned = cleaner.cleanString(html: html)
print(cleaned)
//<bold>Rutger Roger</bold> rented a <bold>testitem zero dollars</bold> from <bold>Rutger</bold>.
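
For tags nested more than two deep, the repeated cleaning mentioned above could look roughly like the sketch below. It assumes Swift 4 or later with Foundation. The names collapseOnce and collapseNested are hypothetical and not part of Cleaner; the single pass walks the matches back to front instead of tracking an offset, and a loop stands in for the recursive call.

```swift
import Foundation

// Hypothetical helper: one pass over `text`. For every lazy <tag>...</tag>
// match whose content still contains an opening tag, keep only the content,
// which drops the outer opening tag and the first closing tag.
func collapseOnce(_ text: String, tag: String) -> String {
    let open = "<\(tag)>"
    let close = "</\(tag)>"
    let pattern = NSRegularExpression.escapedPattern(for: open)
        + "(.*?)"
        + NSRegularExpression.escapedPattern(for: close)
    guard let regex = try? NSRegularExpression(pattern: pattern, options: [.caseInsensitive]) else {
        return text
    }

    let fullRange = NSRange(location: 0, length: text.utf16.count)
    let matches = regex.matches(in: text, options: [], range: fullRange)

    var result = text
    // Back to front: an edit never moves the ranges of matches that come before it.
    for match in matches.reversed() {
        guard let outer = Range(match.range(at: 0), in: result),
              let inner = Range(match.range(at: 1), in: result) else { continue }
        let content = String(result[inner])
        if content.range(of: open, options: .caseInsensitive) != nil {
            result.replaceSubrange(outer, with: content)
        }
    }
    return result
}

// Hypothetical helper: repeat the pass until nothing changes. Every change
// shortens the string, so the loop terminates.
func collapseNested(_ text: String, tag: String) -> String {
    var current = text
    while true {
        let next = collapseOnce(current, tag: tag)
        if next == current { return current }
        current = next
    }
}

let deep = "<bold><bold><bold>a</bold> b</bold> c</bold>"
print(collapseNested(deep, tag: "bold"))
//<bold>a b c</bold>
```

Like the class above, this only inspects the first closing tag after each opening tag, so it is meant for redundant nesting of the same tag, not for general HTML parsing.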