Vygandas Vygandas - 1 month ago 7
MySQL Question

MySQL group by URL path pattern

How to group rows with URLs by path pattern? E.g. we have there addresses:

1 http://example.com
2 http://example.com/products
3 http://example.com/products/some-product
4 http://example.com/categories
5 http://example.com/categories/cat1
6 http://example.com/categories/cat2
7 http://example.com/categories/cat3
8 http://example.com/tags
9 http://example.com/tags/tag1
10 http://example.com/tags/tag2
11 http://example.com/tags/tag3
12 http://example.com/about


So results would be:

1 http://example.com
2 http://example.com/products
3 http://example.com/products/some-product
4 http://example.com/categories
5 http://example.com/categories/cat1
8 http://example.com/tags
9 http://example.com/tags/tag1
12 http://example.com/about


We know domain http://example.com. We need all distinct path types. Basically we want to know what different pages website have. So it's kind of http://example.com/ * / * / * ...

Answer Source

Try this

Rextester Sample

select * from tbl1 t1
where exists
(select 1
from tbl1 t2
  group by substring_index(concat(url,'@'),'/',4)
having t1.id=min(t2.id)
);

In MYSQL, there is no hard rule for not selecting columns which are not in group by. So you can go by this as well.

select *
from tbl1
group by 
    substring_index(concat(url,'@'),'/',4) 
order by id
;

Role of substring_index(concat(url,'@'),'/',4)

It will first add an extra character at the end of the url, say @. Then it would truncate the url till the 4th /. Without adding @ at the end, http://example.com/categories/cat1 and http://example.com/categories will be put in same group which you dont want.