S. Stevens S. Stevens - 2 years ago 61
MySQL Question

mysql - Grouping together rows where Count() less than percentage of total

I'm working on a database which keeps track of student records for an after-school education company, including enrollment into classes and information on the students.

What I am trying to do is write a query that can return the number of enrolled students we have from each school, but also to group together schools that contribute below a certain percentage of the total (I want to display the information in a chart, but we have a lot of schools where only 1 students comes from that school, and I don't want the chart to have 50 bars or pie slices etc.)

So instead of

+-------------+------------+
| School Name | # Students |
+-------------+------------+
| School A | 52 |
| School B | 27 |
| School C | 15 |
| School D | 2 |
| School E | 1 |
| School F | 1 |
+-------------+------------+


I would want

+---------------+------------+
| School Name | # Students |
+---------------+------------+
| School A | 52 |
| School B | 27 |
| School C | 15 |
| Other Schools | 4 |
+---------------+------------+


Here is the simplified form of the query I have now, it works but it's somewhat redundant in querying the same information with multiple Selects. Is there anyway to reduce the redundancy?

SELECT @enrollmentSum := COUNT(StudentEnrollmentID) FROM StudentEnrollment;
SELECT SchoolName, COUNT(StudentEnrollmentID) ECount FROM Student
JOIN StudentEnrollment ON StudentEnrollment.StudentID = Student.StudentID
JOIN School ON Student.SchoolID = School.SchoolID
GROUP BY SchoolName
HAVING Ecount >= .025 * @enrollmentSum
UNION ALL
SELECT "Other Schools" as SchoolName, SUM(Ecount) as ECount FROM
(
SELECT SchoolName, COUNT(StudentEnrollmentID) ECount FROM Student
JOIN StudentEnrollment ON StudentEnrollment.StudentID = Student.StudentID
JOIN School ON Student.SchoolID = School.SchoolID
GROUP BY SchoolName
HAVING Ecount < .025 * @enrollmentSum
) t2
ORDER BY Ecount DESC


If needed, the basic structure of the relevant tables:

Student

+-----------+-------------+----------+
| StudentID | StudentName | SchoolID |
+-----------+-------------+----------+


School

+----------+------------+
| SchoolID | SchoolName |
+----------+------------+


StudentEnrollment

+---------------------+-----------+---------+
| StudentEnrollmentID | StudentID | ClassID |
+---------------------+-----------+---------+


Thank you for any help

Answer Source

Tips:

  • count(x) returns number of rows where "x IS NOT NULL" therefore count(primary key) = count(*) which is simpler to read

  • "JOIN School ON Student.SchoolID = School.SchoolID" can be rewritten as "JOIN School USING (SchoolID)" which is more readable and also gives you only one column "SchoolID" in the result set if you use something like "select *"

Now, the query...

SELECT SchoolName, sum(cnt) ECount FROM 
(SELECT IF(count(*)>=.025*@enrollmentSum, SchoolName, 'Others') AS SchoolName,
 COUNT(*) cnt FROM Student
 JOIN StudentEnrollment USING (StudentID)
 JOIN School USING (SchoolID)
 GROUP BY SchoolName) subq
GROUP BY SchoolName
ORDER BY Ecount DESC

Using the IF() will replace school name with 'others' for all schools below the threshold. Note this is computed after the GROUP BY, so you can actually use count(*) in the selected expressions. Then another GROUP BY groups the 'others' together.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download