Arash Rabiee Arash Rabiee - 5 months ago 24
SQL Question

Delete garbage parts of strings in MySQL

One of my fields contains a strange section of text, but there is something in it that is important for me:

EX: my field is something like this:


\"'Way Out\"(How to be one) (1961) {Side Show (#1.12)}\"'t Schaep Met
De 5 Pooten\"


All the text is garbage, I just want 1961.

Also, my table has more than 200K rows.

The year is inside of the parentheses. The query which is shown below works at every position, but when I have something inside of parentheses before year there is a problem.
All of the text is garbage but the year is required. I already used:

UPDATE `myTable`
SET `myField` = (SUBSTRING_INDEX(`myField`, '(', -3));
UPDATE `myTable`
SET `myField` = (SUBSTRING_INDEX(`myField`, ')', 3));
UPDATE `myTable`
SET `myField` = (SUBSTRING_INDEX(`myField`, ')', 2));
UPDATE `myTable`
SET `myField` = (SUBSTRING_INDEX(`myField`, ')', 1));


Is there any way to make a query that saves just the value inside of any parentheses which contain 4 digits?
In the above ex the result should be: 1961

Answer

You may give it a try ( I guess it's a one time job hence leave the performance issue):

UPDATE myTable MT

CROSS JOIN 
(
    Select (th*1000+h*100+t*10+u+1) x from
    (select 0 th union select 1 union select 2 union select 3 union select 4 union
    select 5 union select 6 union select 7 union select 8 union select 9) A,
    (select 0 h union select 1 union select 2 union select 3 union select 4 union
    select 5 union select 6 union select 7 union select 8 union select 9) B,
    (select 0 t union select 1 union select 2 union select 3 union select 4 union
    select 5 union select 6 union select 7 union select 8 union select 9) C,
    (select 0 u union select 1 union select 2 union select 3 union select 4 union
    select 5 union select 6 union select 7 union select 8 union select 9) D
    WHERE (th*1000+h*100+t*10+u+1) >= 1800
    AND (th*1000+h*100+t*10+u+1) <= 3000
    order by x
) t
SET MT.myField = t.x
WHERE `myField` LIKE CONCAT('%',t.x,'%');

Note: Assuming your year string is within this range : from 1800 to 3000

Better put your minimum and maximum year in the where clause

WHERE (th*1000+h*100+t*10+u+1) >= MINIMUM_YEAR AND (th*1000+h*100+t*10+u+1) <= MAXIMUM_YEAR

Before running the above query:

SQL FIDDLE DEMO.

After running the above query:

SQL FIDDLE