luckydonald - 4 years ago
SQL Question

Get each <tag> in String - stackexchange database

Mockup code for my problem:

SELECT Id FROM Tags WHERE TagName IN '<osx><keyboard><security><screen-lock>'


The problem in detail



I am trying to get the tags used in 2011 from the apple.stackexchange data (this query).

As you can see, the tags in tag changes are stored as plain text in the Text field.

Example output with Stack Exchange tags:

<tag1><tag2><tag3>
<osx><keyboard><security><screen-lock>


How can I create a unique list of the tags, to look them up in the Tags table, instead of this hardcoded version:

SELECT * FROM Tags
WHERE TagName = 'osx'
OR TagName = 'keyboard'
OR TagName = 'security'


Here is an interactive example.

Stack Exchange uses T-SQL; my local copy runs on PostgreSQL, using Postgres.app version 9.4.5.0.

Answer

For the example, I've reduced the data to the only relevant column and called it tags.

Sample data

create table posthistory(tags text);
insert into posthistory values
  ('<lion><backup><time-machine>'),
  ('<spotlight><alfred><photo-booth>'),
  ('<lion><pdf><preview>'),
  ('<pdf>'),
  ('<asd>');

Query to get unique list of tags

SELECT DISTINCT
  unnest(
    regexp_split_to_array(
      trim('><' from tags), '><'
    )
  ) AS tag
FROM
  posthistory;

First, trim() strips any leading and trailing < and > characters from each row. Then regexp_split_to_array() splits the remaining string on >< into an array, and unnest() expands that array into a set of rows. Finally, DISTINCT eliminates duplicate values.
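To make the individual steps easier to follow, here is a small sketch that runs each function on its own against the literal string from the question. It uses no table at all, so it can be pasted straight into psql; the column aliases are only illustrative.

```sql
-- Step 1: strip the outer '<' and '>' so only the inner separators remain.
SELECT trim('><' from '<osx><keyboard><security><screen-lock>') AS trimmed;
-- osx><keyboard><security><screen-lock

-- Step 2: split on the '><' separator to get a text array.
SELECT regexp_split_to_array('osx><keyboard><security><screen-lock', '><') AS tag_array;
-- {osx,keyboard,security,screen-lock}

-- Step 3: expand the array into one row per tag.
SELECT unnest(ARRAY['osx', 'keyboard', 'security', 'screen-lock']) AS tag;
```

Since the separator here is a fixed string rather than a pattern, string_to_array(..., '><') should work in place of regexp_split_to_array() as well.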

Here is an SQLFiddle to preview how it works.
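The question also asks how to look the tags up in the Tags table rather than hardcoding them. A minimal sketch of that, assuming a Tags table with a TagName column as in the question and the simplified posthistory(tags) table from the sample data above (in the real dump the column is called Text, and the alias names here are only illustrative):

```sql
-- Return every row of Tags whose name appears in at least one tag string.
-- Assumes Tags(TagName) from the question and posthistory(tags) from the sample data.
SELECT t.*
FROM Tags AS t
WHERE t.TagName IN (
  SELECT unnest(
           regexp_split_to_array(
             trim('><' from ph.tags), '><'
           )
         )
  FROM posthistory AS ph
);
```

DISTINCT is not needed inside the subquery, because IN only checks for membership. To restrict the result to 2011, a date filter on the real PostHistory table would go in the subquery's WHERE clause.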
