liltitus27 liltitus27 - 2 months ago 7
SQL Question

Select Grouped Rows Having a certain Property

I'm trying to select only certain rows containing a certain property. Here's a sample of the data I'm working with:


src_id cand_source
------ -----------
201609-004d7bgNDFXuIrQPXwsXrOptt2PdTdeXsjV5RJ6_mEQ mcp
201609-004d7bgNDFXuIrQPXwsXrOptt2PdTdeXsjV5RJ6_mEQ mc2
201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ mc2
201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ mc2
201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ mc2
201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ mc2
201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ mc2
201609-00WmbmuIp3cwAcTNTbrgb9tTVR0AKNf-RvjXcHWPEEQ mc2
201609-01My_orS795Hmomry3-JiCiBVimarRzRGQ9Cnornp8Q mcp
201609-01My_orS795Hmomry3-JiCiBVimarRzRGQ9Cnornp8Q mcp
201609-01My_orS795Hmomry3-JiCiBVimarRzRGQ9Cnornp8Q mc2
201609-01My_orS795Hmomry3-JiCiBVimarRzRGQ9Cnornp8Q mcp
201609-01My_orS795Hmomry3-JiCiBVimarRzRGQ9Cnornp8Q mc2
201609-01noPFGBCqbH9jUB9MHNqPynjqW8cr24LJY917vSGTs mc2
201609-01noPFGBCqbH9jUB9MHNqPynjqW8cr24LJY917vSGTs mc2
201609-02ISoPEX0VVkQ0ogot49Q-e7K39Zyk2vdN1rB4Q-kl0 mc2
201609-02ISoPEX0VVkQ0ogot49Q-e7K39Zyk2vdN1rB4Q-kl0 mc2
201609-02LVZ8UqAaz7JCp3RAOTiIE7zH2mveiSQPBo6I6dHDc mc2
201609-02LVZ8UqAaz7JCp3RAOTiIE7zH2mveiSQPBo6I6dHDc mc2
201609-03dLH32kaKYVwIj4HiT1tZjCNgqgXiG-fvezX3S9QI4 mc2
201609-03dLH32kaKYVwIj4HiT1tZjCNgqgXiG-fvezX3S9QI4 mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU mc2
201609-0421Jatpsk9T8GOD1M_GvDrnyV4dA41IL5tDeuTxGwU mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4 mcp
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4 mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4 mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4 mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4 mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4 mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4 mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4 mc2
201609-04HzM6NBIx_6QN91xzF9_p0RGfAQcRMeEhVFEPFZ8p4 mc2
201609-04JzR3AMxsfQvAeq1MAgjCtMhcaqt2Z_WNmuUlYLrLM mc2
201609-04JzR3AMxsfQvAeq1MAgjCtMhcaqt2Z_WNmuUlYLrLM mcp


What I want to do is select only the
src_id
s having at least one
cand_source
equaling
mcp
. Here's what I've tried:

SELECT *
FROM schema.table
WHERE src_id IN (
SELECT src_id
FROM schema.table
WHERE batch_id = ?
GROUP BY src_id
HAVING count(cand_source = 'mcp') > 1
)
ORDER BY src_id,
match_score DESC


That, however, keeps giving me back clusters of
src_id
s that have no
cand_source
s equaling
mcp
.




It was pointed out that I'm simply overcomplicating matters. Here's the solution:

SELECT *
FROM schema.table
WHERE src_id IN (
SELECT DISTINCT src_id
FROM schema.table
WHERE batch_id = ?
AND cand_source = 'mcp'
)
ORDER BY src_id,
match_score DESC

Answer

If you simply want the src_id that has mcp then a straight guery with a WHERE statement is enough no need for conditional aggregation or anything.

SELECT DISTINCT src_id
FROM
    Table
WHERE
    cand_source = 'mcp'
    AND batch_id = ?

If you want all of the records for each src_id that has at least 1 cand_source you can then join that back to the table to receive all of the records.

SELECT t.*
FROM
    Table t
    INNER JOIN (
       SELECT DISTINCT src_id
       FROM
          Table
       WHERE
          cand_source = 'mcp'
          AND batch_id = ?
    ) d
    ON t.src_id = d.src_id
    AND t.batch_id = ?

Or you can use a Common Table Expression with the awesome window functions to do it.

WITH cte AS (
    SELECT *, COUNT(CASE WHEN cand_source = 'mcp' THEN cand_source END) OVER (PARTITION BY src_id) as McpCount
    FROM
       Table
    WHERE
       batch_id = ?

)

SELECT *
FROM
    cte
WHERE
    McpCount > 0;
Comments