carol1287 carol1287 - 3 months ago 14
C# Question

MySql Split Query Group Percentage

I am trying to do the following but I cannot manage to get it right yet :(.

I have these tables:

table1 -> tb1_id, tb1_name

Sample Data:
--------------
1 group1
2 group2
3 group3
4 group4
5 group5

table2 -> tb2_id, tb2_sector, tb2_tb3_id

Sample Data:
--------------
1 alpha 1
2 beta 2
3 gamma 2
4 delta 2
5 epsilon 4

table3 -> tb3_id, tb3_mid, tb3_section

Sample Data:
--------------
1 234 alpha,beta,gama,delta


This is the output that I am looking for:

Name Count %
------ ----- -----
group1 1 25%
group2 3 75%
group3 0 0%
group4 0 0%
group5 0 0%


Basically I need a split a column value delimited by a comma (tb3_section in table3) and then find the right group for each value (table2 gives me the group id to link with table1) and then do a total count by group and get the percentage (assuming total is 100%).

This is the query I tried so far:

I searched for split value samples and found one that does the split by creating a numbers table first:

create table numbers (
`n` INT(11) SIGNED
, PRIMARY KEY(`n`)
)

INSERT INTO numbers(n) SELECT @row := @row + 1 FROM
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t,
(SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t2,
(SELECT 0 UNION ALL SELECT 1) t8,
(SELECT @row:=0) ti;


Afterwards, I did this:

select tb3_section, count(1) from (
select
tb3_mid,
substring_index(
substring_index(tb3_section, ',', n),
',',
-1
) as tb3_section from table3
join numbers
on char_length(tb3_section)
- char_length(replace(tb3_section, ',', ''))
>= n - 1
) tb3_section_dashboard
group by 1


This doesn't give me the group count. Just does the split of tb3_section but doesn't give me the correct count and equivalent percentage. Any ideas will be much appreciate it thanks a lot.

LATEST UPDATE

First of all, I would like to thanks @eggyal for pointing me to the right direction and @Shadow for despise knowing that I was not taking the best approach, he came up with a quick fix to my problem. I managed to change the approach and removed the comma delimited values from table3. Instead now I add multiple rows for each new value (and added a constraint to avoid duplicates).

Now table3 looks like:

Sample Data:
--------------
1 234 alpha
2 234 beta
3 234 gama
4 234 delta
5 235 alpha


Here is the query I have taken from @shadow sample:

SELECT t1.tb1_name, COUNT(t3.tb3_section) AS no_per_group,
COUNT(t3.tb3_section) / t4.no_of_groups AS percentage
FROM t1 left
JOIN t2 ON t1.tb1_id=t2.tb2_tb3_id
INNER JOIN t3 ON t2.tb2_sector=t3.tb3_section>0
JOIN (SELECT COUNT(*) AS no_of_groups
FROM t3 INNER JOIN t2 ON t2.tb2_sector=t3.tb3_section>0) t4
GROUP BY t1.tb1_name


Instead of using find_in_set now I use = to match the exact value.
Now I get something like the following but the percentage looks odd and I miss a group that doesn't have a match:

Name no_per_group percentage
----- ------------- ----------
group1 2 0.1053
group3 3 0.1579
group4 3 0.1579
group5 3 0.1579


Although still I need something like:

Name Count %
------ ----- -----
group1 1 25%
group2 3 75%
group3 0 0%
group4 0 0%
group5 0 0%


Notice that if there is no match in a group, I still need to show that group.
Because I have thousands of records which are different from each other, I need to add another condition: where tb3_mid=234 . Likes this, the results are using to tb3_mid.

Answer

The best solution would be to redesign your table structure and move the data in the delimited values list to a separate table.

The quick solution is to utilise MySQL's find_in_set() function.

To get the total count of entries in the messages table (table3):

select count(*) as no_of_groups
from t3 inner join t2 on find_in_set(t2.tb2_sector,t3.tb3_section)>0

To get the counts per group, add a join to table1 and group by group name. To calculate the percentage, add the above query as a subquery:

select t1.tb1_name, count(t3.tb3_section) as no_per_group, count(t3.tb3_section) / t4.no_of_groups as percentage
from t1 left join t2 on t1.tb1_id=t2.tb2_tb3_id
inner join t3 on find_in_set(t2.tb2_sector,t3.tb3_section)>0
join (select count(*) as no_of_groups
      from t3 inner join t2 on find_in_set(t2.tb2_sector,t3.tb3_section)>0) t4  --no join condition makes a Cartesian join
group by t1.tb1_name
Comments