SQL - How to identify islands of 1-hour time periods in given data?

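The goal is to accept the first complaint received and reject every complaint received within 1 hour of it. The first complaint after that hour is accepted in turn and starts a new 1-hour window. For example, I have the data below.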

ComplaintID	DateTime
1	12/24/2019 1:07 PM
2	12/24/2019 1:20 PM
3	12/24/2019 1:40 PM
4	12/24/2019 2:00 PM
5	12/24/2019 2:10 PM
6	12/24/2019 2:12 PM
7	12/24/2019 2:50 PM
8	12/24/2019 2:55 PM
9	12/24/2019 3:00 PM
10	12/24/2019 3:08 PM
11	12/24/2019 4:00 PM
12	12/24/2019 4:50 PM
13	12/24/2019 7:00 PM
14	12/26/2019 7:01 PM

Desired output:

ComplaintID	DateTime	Status
1	12/24/2019 1:07 PM	Accept
2	12/24/2019 1:20 PM	Reject
3	12/24/2019 1:40 PM	Reject
4	12/24/2019 2:00 PM	Reject
5	12/24/2019 2:10 PM	Accept
6	12/24/2019 2:12 PM	Reject
7	12/24/2019 2:50 PM	Reject
8	12/24/2019 2:55 PM	Reject
9	12/24/2019 3:00 PM	Reject
10	12/24/2019 3:08 PM	Reject
11	12/24/2019 4:00 PM	Accept
12	12/24/2019 4:50 PM	Reject
13	12/24/2019 7:00 PM	Accept
14	12/26/2019 7:01 PM	Accept

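I know this would be easy in a programming language, but I need a SQL solution.

Edit:

Following @Gordon's suggestion, I implemented the recursive query below and it works! However, it seems inefficient on large data.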

with RECURSIVE t AS (
    select row_number as rn,ts,lag(ts,1) over (order by row_number) as baseline from main_table where row_number<3
  UNION ALL
    SELECT 
    rn+1 as rn,(select ts from main_table where row_number=rn+1) as ts,case when datediff('hour',baseline)>24 then ts else baseline end as baseline
     from (select * FROM t order by rn desc limit 1 )t where rn<=(select count(*)-1 from main_table)
),real_baseline as 
(
select rn,lead(baseline,1) over (order by rn) as real_baseline from t
)

select *,case when row_number() over (partition by real_baseline order by ts) =1 then 'Accept'
else 'Reject' end as status
from real_baseline

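Solution

Normally you could apply lead/lag, but not here: the range they would have to look across is unpredictable. Likewise, a recursive CTE does not seem feasible, because it would need a MIN function in the recursive part, which is not allowed. Since a function is acceptable, perhaps the best option is a function that returns a table. See the fiddle.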

create or replace function public.accept_reject_complaints()
 returns table( o_complaintid integer,o_datetime    timestamp,o_status      text
              )
 language plpgsql
AS $$                 
declare
    l_current_end_ts timestamp = '-infinity'::timestamp;

    c_complaint_list cursor for
                     select complaintid,datetime     
                       from complaints
                      order by datetime;
begin
    for complaint_rec in c_complaint_list 
    loop
       if complaint_rec.datetime  > l_current_end_ts then 
          o_status = 'Accept'; 
          l_current_end_ts = complaint_rec.datetime  + interval '1 hour';
       else 
          o_status = 'Reject'; 
       end if; 
   
       o_datetime = complaint_rec.datetime;
       o_complaintid = complaint_rec.complaintid;
       return next; 
    end loop ;

end ; 
$$;

Unfortunately, because it relies on a cursor loop, performance will become a problem on large data volumes.
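For cross-checking the output of any of the SQL variants on a small sample, the same single-pass rule can be written outside the database. The sketch below is a Python rendering of the loop in the function above and is not part of the original answer; the name `classify` and the hard-coded sample rows are illustrative assumptions.

```python
from datetime import datetime, timedelta

def classify(rows, window=timedelta(hours=1)):
    """Label (complaint_id, ts) pairs 'Accept' or 'Reject' in timestamp order.

    Mirrors the cursor loop: a complaint is accepted only when it arrives
    strictly after the end of the current window, and it then opens a new one.
    """
    window_end = datetime.min  # stands in for '-infinity'
    result = []
    for complaint_id, ts in sorted(rows, key=lambda r: r[1]):
        if ts > window_end:
            status = "Accept"
            window_end = ts + window
        else:
            status = "Reject"
        result.append((complaint_id, ts, status))
    return result

# Sample data from the question.
SAMPLE = [
    (1, datetime(2019, 12, 24, 13, 7)),
    (2, datetime(2019, 12, 24, 13, 20)),
    (3, datetime(2019, 12, 24, 13, 40)),
    (4, datetime(2019, 12, 24, 14, 0)),
    (5, datetime(2019, 12, 24, 14, 10)),
    (6, datetime(2019, 12, 24, 14, 12)),
    (7, datetime(2019, 12, 24, 14, 50)),
    (8, datetime(2019, 12, 24, 14, 55)),
    (9, datetime(2019, 12, 24, 15, 0)),
    (10, datetime(2019, 12, 24, 15, 8)),
    (11, datetime(2019, 12, 24, 16, 0)),
    (12, datetime(2019, 12, 24, 16, 50)),
    (13, datetime(2019, 12, 24, 19, 0)),
    (14, datetime(2019, 12, 26, 19, 1)),
]

accepted = [cid for cid, _, status in classify(SAMPLE) if status == "Accept"]
print(accepted)  # [1, 5, 11, 13, 14]
```

The accepted IDs match the desired output in the question, so the list can serve as a reference when comparing query results.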

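Here is a simple approach: truncate each DateTime to the hour, then within each hour mark the first (minimum) DateTime as Accept and the rest as Reject. Note that this groups by fixed clock hours rather than by a 1-hour window starting at each accepted complaint, so it does not reproduce the desired output above (for example, it accepts the 2:00 PM complaint and rejects the 2:10 PM one).

P.S. I have used complaint as the table name; change it as needed. Tested on PostgreSQL 8.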

SELECT ComplaintID, DateTime,
       CASE WHEN row_number() OVER (PARTITION BY hour ORDER BY DateTime) = 1
            THEN 'Accept' ELSE 'Reject' END AS Status
  FROM (SELECT ComplaintID, DateTime,  -- DateTime is needed by the outer query
               date_trunc('hour', DateTime) AS hour
          FROM complaint) A;
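The difference between clock-hour buckets and windows anchored at the accepted complaint can be checked on the question's data. The following is a minimal sketch in Python, not part of the original answer; both function names are hypothetical and the rows are copied from the question.

```python
from datetime import datetime, timedelta

# (ComplaintID, DateTime) rows from the question, in 24-hour notation.
ROWS = [(i + 1, datetime.strptime(s, "%m/%d/%Y %H:%M")) for i, s in enumerate([
    "12/24/2019 13:07", "12/24/2019 13:20", "12/24/2019 13:40",
    "12/24/2019 14:00", "12/24/2019 14:10", "12/24/2019 14:12",
    "12/24/2019 14:50", "12/24/2019 14:55", "12/24/2019 15:00",
    "12/24/2019 15:08", "12/24/2019 16:00", "12/24/2019 16:50",
    "12/24/2019 19:00", "12/26/2019 19:01",
])]

def accept_by_clock_hour(rows):
    """Accept the first complaint in each calendar hour (date_trunc style)."""
    seen, accepted = set(), []
    for cid, ts in sorted(rows, key=lambda r: r[1]):
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        if bucket not in seen:
            seen.add(bucket)
            accepted.append(cid)
    return accepted

def accept_by_anchored_window(rows):
    """Accept the first complaint after the previous accepted one's 1-hour window."""
    window_end, accepted = datetime.min, []
    for cid, ts in sorted(rows, key=lambda r: r[1]):
        if ts > window_end:
            accepted.append(cid)
            window_end = ts + timedelta(hours=1)
    return accepted

print(accept_by_clock_hour(ROWS))       # [1, 4, 9, 11, 13, 14]
print(accept_by_anchored_window(ROWS))  # [1, 5, 11, 13, 14]
```

Only the second list agrees with the desired output: complaints 4 and 9 open a new clock hour but still fall inside the window opened by an earlier accepted complaint.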

Taking advantage of the fact that the ComplaintID values are consecutive, the query is:

with recursive cte as (
  select 1 ComplaintID,min(DateTime) DateTime,min(DateTime) prev
    from main_table
  union all
  select t2.ComplaintID,t2.DateTime,case when t1.prev + interval '1 hour' < t2.DateTime
         then t2.DateTime else t1.prev end
    from cte t1 join main_table t2
    on t1.ComplaintID+1 = t2.ComplaintID
)
select ComplaintID,case when DateTime=prev
    then 'Accept' else 'Reject' end Status
  from cte
  order by ComplaintID

DB Fiddle

Extracting only the rows that are Accept, the query is:

with recursive cte as (
  (
    -- DateTime must be carried in the CTE: the recursive term reads t1.DateTime
    select ComplaintID,DateTime,'Accept' Status
      from main_table order by DateTime limit 1
  )
  union all
  (
    select t2.ComplaintID,t2.DateTime,'Accept'
      from cte t1 join main_table t2
      on t1.DateTime + interval '1 hour' < t2.DateTime
      order by t2.DateTime limit 1
  )
)
select t1.ComplaintID,t1.DateTime,coalesce(t2.Status,'Reject') Status
  from main_table t1 left join cte t2
  on t1.ComplaintID=t2.ComplaintID
  order by t1.ComplaintID

DB Fiddle
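The idea behind this last query is to jump directly from one Accept row to the next, so the number of recursion steps equals the number of accepted rows rather than the total number of rows. A rough Python sketch of that jump, assuming the timestamps are already sorted in ascending order; the name `accepted_timestamps` is hypothetical and the sample values are those from the question:

```python
from bisect import bisect_right
from datetime import datetime, timedelta

def accepted_timestamps(sorted_ts, window=timedelta(hours=1)):
    """Yield accepted timestamps by jumping from one Accept row to the next.

    sorted_ts must be in ascending order. Each step looks up the first
    timestamp strictly later than the current accepted one plus the window,
    which is the row the recursive term's ORDER BY ... LIMIT 1 picks.
    """
    i = 0
    while i < len(sorted_ts):
        yield sorted_ts[i]
        # bisect_right returns the index of the first element > the bound.
        i = bisect_right(sorted_ts, sorted_ts[i] + window)

day = datetime(2019, 12, 24)
times = [day.replace(hour=h, minute=m) for h, m in [
    (13, 7), (13, 20), (13, 40), (14, 0), (14, 10), (14, 12), (14, 50),
    (14, 55), (15, 0), (15, 8), (16, 0), (16, 50), (19, 0)]]
times.append(datetime(2019, 12, 26, 19, 1))

print([t.strftime("%m/%d %H:%M") for t in accepted_timestamps(times)])
# ['12/24 13:07', '12/24 14:10', '12/24 16:00', '12/24 19:00', '12/26 19:01']
```

Every timestamp not yielded corresponds to a Reject row, which is what the final left join with coalesce fills in.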
