Shlomi Schwartz Shlomi Schwartz - 2 months ago 6
SQL Question

Select partitions based on matches in other table

Having the following table (

conversations
):

id | record_id | is_response | text |
---+------------+---------------+----------------------+
1 | 1 | false | in text 1 |
2 | 1 | true | response text 3 |
3 | 1 | false | in text 2 |
4 | 1 | true | response text 2 |
5 | 1 | true | response text 3 |
6 | 2 | false | in text 1 |
7 | 2 | true | response text 1 |
8 | 2 | false | in text 2 |
9 | 2 | true | response text 3 |
10 | 2 | true | response text 4 |


And another help table (
responses
):

id | text |
---+----------------------+
1 | response text 1 |
2 | response text 2 |
3 | response text 4 |


I'm looking for an SQL query to output the following:

record_id | context
----------+-----------------------+---------------------
1 | in text 1 response text 3 in text 2 response text 2
----------+-----------------------+---------------------
2 | in text 1 response text 1
----------+-----------------------+---------------------
2 | in text 2 response text 3 response text 4


So each time
is_response
is
true
and the
text
is in the responses table, aggregate the conversation context up to this point, ignoring conversation part that does not end with a response in the pool.

In the example above living response text 3 in
record_id
1.

I've tried the following complex SQL but it breaks sometimes aggregating the text wrong:

with context as(
with answers as (

SELECT record_id, is_response, id as ans_id
, max(id)
OVER (PARTITION BY record_id ORDER BY id
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS previous_ans_id
FROM (select * from conversations where text in (select text from responses)) ans
),
lines as (
select answers.record_id, con.id, COALESCE(previous_ans_id || ',' || ans_id, '0') as block, con.text as text from answers, conversations con where con.engagement_id = answers.record_id and ((previous_ans_id is null and con.id <= ans_id) OR (con.id > previous_ans_id and con.id <= ans_id)) order by engagement_id, id asc
)

select record_id, block,replace(trim(both ' ' from string_agg(text, E' ')) ,' ',' ') ctx from lines group by record_id, block order by record_id,block
)

select * from context


I'm sure there is a better way.

Answer

Here's my take:

SELECT
    record_id,
    string_agg(text, ' ' ORDER BY id) AS context
FROM (
    SELECT
        *,
        coalesce(sum(incl::integer) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),0) AS grp
    FROM (
        SELECT *, is_response AND text IN (SELECT text FROM responses) as incl
        FROM conversations
         ) c
     ) c1
GROUP BY record_id, grp
HAVING bool_or(incl)
ORDER BY max(id);

This will scan the table conversations once, but I am not sure if it will perform better than your solution. The basic idea is to use a window function to count how maybe preceding rows within the same record, end the conversation. Then we can group by with that number and the record_id and discard incomplete conversations.