Andrea Andrea - 2 months ago 7
MySQL Question

MySQL replace .html links of given domain

In my database there are contenfields with lots of internal links. I have to change link structure from to, but replace statement should respect the domain:

This is whats expected to be replaced: -> -> -> ->

All other links should be untouched, here some examples, but could be any link on the web:

There can be different links in one contentfield:

<p>If you want to read more, click
<a href="">here</a>
or there <a href="">there</a>

This is doing the replace:

UPDATE tablename
SET contentfield = REPLACE(contentfield, '.html', '/')

My ideas (but don't know how to create a statement for them):

  • WHERE within the previous 100 chars is found ''

  • WHERE number of ".html" found = number of "" found

It does not have to be 100% matching all '' links, I am happy with 90%, but there should not be a wrong replacement in external links.


Please see the following SQL Fiddle, which I think should produce all the results you have asked for:

SQL Fiddle Demo


A pattern-replacing function is needed for this but unfortunately MySQL doesn't provide such a thing. So I wrote one (based on another that wasn't quite sufficient) and have posted it here. As mentioned in the referenced answer, this function has a limitation of not allowing substitution of capturing groups with backreferences. Hence it has been slightly adapted in the fiddle to take further parameters that allow it to perform a recursive replace in the found match for the replacement. (Note the use of allowed URL path characters in the regex as per this excellent answer).

Update SQL

The following SQL will update the table data using the function below:

SET url = reg_replace(
  22, -- Min match length = = 22
  0,  -- No max match length
  7,  -- Min sub-match length = /?.html = 7
  0   -- No max sub-match length

(This can be seen in action here.)

Function code

The UDF code used in the demo is also posted below. Note: The UDF delegates to a stored procedure since only stored procedures will allow recursion in MySQL.

CREATE PROCEDURE reg_replace_worker(subject VARCHAR(21845), pattern VARCHAR(21845),
  pattern2 VARCHAR(21845), replacement VARCHAR(21845), greedy BOOLEAN, minMatchLen INT,
  maxMatchLen INT, minSubMatchLen INT, maxSubMatchLen INT, OUT result VARCHAR(21845))
  DECLARE subStr, usePattern, useRepl VARCHAR(21845);
  DECLARE startPos, prevStartPos, startInc, len, lenInc INT;
  SET @@SESSION.max_sp_recursion_depth = 2;
  IF subject REGEXP pattern THEN
    SET result = '';
    -- Sanitize input parameter values
    SET minMatchLen = IF(minMatchLen < 1, 1, minMatchLen);
    SET maxMatchLen = IF(maxMatchLen < 1 OR maxMatchLen > CHAR_LENGTH(subject),
                         CHAR_LENGTH(subject), maxMatchLen);
    -- Set the pattern to use to match an entire string rather than part of a string
    SET usePattern = IF (LEFT(pattern, 1) = '^', pattern, CONCAT('^', pattern));
    SET usePattern = IF (RIGHT(pattern, 1) = '$', usePattern, CONCAT(usePattern, '$'));
    -- Set start position to 1 if pattern starts with ^ or doesn't end with $.
    IF LEFT(pattern, 1) = '^' OR RIGHT(pattern, 1) <> '$' THEN
      SET startPos = 1, startInc = 1;
    -- Otherwise (i.e. pattern ends with $ but doesn't start with ^): Set start position
    -- to the min or max match length from the end (depending on "greedy" flag).
    ELSEIF greedy THEN
      SET startPos = CHAR_LENGTH(subject) - maxMatchLen + 1, startInc = 1;
      SET startPos = CHAR_LENGTH(subject) - minMatchLen + 1, startInc = -1;
    END IF;
    WHILE startPos >= 1 AND startPos <= CHAR_LENGTH(subject)
      AND startPos + minMatchLen - 1 <= CHAR_LENGTH(subject)
      AND !(LEFT(pattern, 1) = '^' AND startPos <> 1)
      AND !(RIGHT(pattern, 1) = '$'
            AND startPos + maxMatchLen - 1 < CHAR_LENGTH(subject)) DO
      -- Set start length to maximum if matching greedily or pattern ends with $.
      -- Otherwise set starting length to the minimum match length.
      IF greedy OR RIGHT(pattern, 1) = '$' THEN
        SET len = LEAST(CHAR_LENGTH(subject) - startPos + 1, maxMatchLen), lenInc = -1;
        SET len = minMatchLen, lenInc = 1;
      END IF;
      SET prevStartPos = startPos;
      lenLoop: WHILE len >= 1 AND len <= maxMatchLen
                 AND startPos + len - 1 <= CHAR_LENGTH(subject)
                 AND !(RIGHT(pattern, 1) = '$' 
                       AND startPos + len - 1 <> CHAR_LENGTH(subject)) DO
        SET subStr = SUBSTRING(subject, startPos, len);
        IF subStr REGEXP usePattern THEN
          IF pattern2 IS NULL THEN
            SET useRepl = replacement;
            CALL reg_replace_worker(subStr, pattern2, NULL, replacement, greedy, 
                          minSubMatchLen, maxSubMatchLen, NULL, NULL, useRepl);
          END IF;
          SET result = IF(startInc = 1,
                          CONCAT(result, useRepl), CONCAT(useRepl, result));
          SET startPos = startPos + startInc * len;
          LEAVE lenLoop;
        END IF;
        SET len = len + lenInc;
      END WHILE;
      IF (startPos = prevStartPos) THEN
        SET result = IF(startInc = 1, CONCAT(result, SUBSTRING(subject, startPos, 1)),
                        CONCAT(SUBSTRING(subject, startPos, 1), result));
        SET startPos = startPos + startInc;
      END IF;
    IF startInc = 1 AND startPos <= CHAR_LENGTH(subject) THEN
      SET result = CONCAT(result, RIGHT(subject, CHAR_LENGTH(subject) + 1 - startPos));
    ELSEIF startInc = -1 AND startPos >= 1 THEN
      SET result = CONCAT(LEFT(subject, startPos), result);
    END IF;
    SET result = subject;

CREATE FUNCTION reg_replace(subject VARCHAR(21845), pattern VARCHAR(21845),
  pattern2 VARCHAR(21845), replacement VARCHAR(21845), greedy BOOLEAN, minMatchLen INT,
  maxMatchLen INT, minSubMatchLen INT, maxSubMatchLen INT)
  DECLARE result VARCHAR(21845);
  CALL reg_replace_worker(subject, pattern, pattern2, replacement, greedy, minMatchLen,
                          maxMatchLen, minSubMatchLen, maxSubMatchLen, result);
  RETURN result;