How to join two DataFrames while limiting the rows taken from one of them
I have two DataFrames:
df1:
+---------+-------------------+
|id_device|tracking_time      |
+---------+-------------------+
|20       |2020-02-19 02:37:45|
|5        |2020-02-17 17:15:45|
+---------+-------------------+
df2:
+---------+-------------------+
|id_device|tracking_time      |
+---------+-------------------+
|20       |2019-02-19 02:41:45|
|20       |2020-01-17 17:15:45|
+---------+-------------------+
I want to get the following output:
+---------+-------------------+-------------------+
|id_device|tracking_time      |df2.tracking_time  |
+---------+-------------------+-------------------+
|20       |2020-02-19 02:37:45|2019-02-19 02:41:45|
|5        |2020-02-17 17:15:45|null               |
+---------+-------------------+-------------------+
I tried the following code:
df1.registerTempTable("data");
df2.createOrReplaceTempView("tdays");
Dataset<Row> d_f = sparkSession.sql("select a.*,b.* from data as a LEFT JOIN (select * from tdays ) as b on b.id_device == a.id_device and b.tracking_time < a.tracking_time ");
I get the following output:
+---------+-------------------+-----------+-------------------+
|id_device|tracking_time      |b.id_device|b.tracking_time    |
+---------+-------------------+-----------+-------------------+
|20       |2020-02-19 02:37:45|20         |2019-02-19 02:41:45|
|20       |2020-02-19 02:37:45|20         |2020-01-17 17:15:45|
|5        |2020-02-17 17:15:45|null       |null               |
+---------+-------------------+-----------+-------------------+
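What I want is a left join that keeps only one matching row from df2 per device, as if df2 were ordered by df2.tracking_time desc with limit 1.
I need your help.
Solution
Before joining, you can reduce df2 to the minimum tracking_time for each id_device: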
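import org.apache.spark.sql.functions.min

val df1 = ...
val df2 = ...
// Alias the aggregated column, not the Dataset; a column name containing a dot
// must be wrapped in backticks when it is referenced later.
val df2min = df2.groupBy("id_device").agg(min("tracking_time").as("df2.tracking_time"))
val result = df1.join(df2min, Seq("id_device"), "left")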
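df2min contains exactly one row per id_device, holding the earliest tracking_time found in df2 for that device. The left join therefore returns the expected result.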
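The approach above reduces df2 before the join, so it does not apply the `b.tracking_time < a.tracking_time` condition from the original query. The following is a minimal sketch of one way to keep that condition: join first, then aggregate per df1 row. It assumes a local SparkSession, that tracking_time is stored as a string in `yyyy-MM-dd HH:mm:ss` format (so string comparison matches chronological order), and that each (id_device, tracking_time) pair is unique in df1. The object name `EarlierTrackingJoin` and the output column name `df2_tracking_time` are illustrative and do not come from the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, min}

object EarlierTrackingJoin {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("earlier-tracking-join")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question.
    val df1 = Seq((20, "2020-02-19 02:37:45"), (5, "2020-02-17 17:15:45"))
      .toDF("id_device", "tracking_time")
    val df2 = Seq((20, "2019-02-19 02:41:45"), (20, "2020-01-17 17:15:45"))
      .toDF("id_device", "tracking_time")

    val a = df1.as("a")
    val b = df2.as("b")

    // Left join on the device id, keeping only df2 rows earlier than the df1 row.
    val joined = a.join(
      b,
      col("a.id_device") === col("b.id_device") &&
        col("b.tracking_time") < col("a.tracking_time"),
      "left"
    )

    // Collapse the matches to one row per df1 row.
    // min ignores nulls, so devices without a match keep a null value.
    val result = joined
      .groupBy(col("a.id_device"), col("a.tracking_time"))
      .agg(min(col("b.tracking_time")).as("df2_tracking_time"))

    result.show(false)
    spark.stop()
  }
}
```

Using `min` picks the earliest qualifying df2 row, which matches the expected output shown in the question. Replacing `min` with `max` (also in `org.apache.spark.sql.functions`) would instead pick the latest earlier row, which corresponds to the "order by tracking_time desc limit 1" wording.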