How to delete records from a DataFrame using PySpark
I would like to know how, in PySpark, to delete records from a DataFrame based on data taken from another DataFrame, as follows. pyspark:
from pyspark.sql import functions as sf
from pyspark.sql.functions import current_date

newdf = newdf.withColumn("dt_dia", current_date())
newdf = newdf.withColumn("dt_dia_menos_14_dias", sf.date_add(current_date(), -14))

# df1: rows created or changed within the last 14 days
df1 = newdf.where((newdf.dt_create > newdf.dt_dia_menos_14_dias) | (newdf.dt_change > newdf.dt_dia_menos_14_dias))

# df2: rows created or changed more than 14 days ago
df2 = newdf.where((newdf.dt_create <= newdf.dt_dia_menos_14_dias) | (newdf.dt_change <= newdf.dt_dia_menos_14_dias))
## here I wanted a line that removes the records found in df1, so that df2 ends up without those same records
From df1, the delete would match on the fields bank, account, and doc. The rule is: the row whose creation or change date is the most recent (2020-12-21 in the example) is the up-to-date one, and the older row should be replaced.
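The "most recent wins" rule above can be sketched in plain Python (a minimal illustration, not PySpark; the helper `effective_date` and the two sample rows, taken from the tables below, are assumptions for the sake of the example):

```python
from datetime import date

def effective_date(row):
    # A row's "last touched" date: dt_change if present, else dt_create.
    return row["dt_change"] or row["dt_create"]

# Two versions of the same (bank, account, doc) record, copied from the
# df1 and df2 tables in the question.
old = {"id": 1, "dt_create": date(2020, 12, 1), "dt_change": None,
       "bank": "001", "account": "001", "doc": "001", "name": "Michael"}
new = {"id": 1, "dt_create": date(2020, 12, 1), "dt_change": date(2020, 12, 21),
       "bank": "001", "account": "001", "doc": "001", "name": "Michael Jachason"}

# The row with the more recent effective date is the one to keep.
winner = max([old, new], key=effective_date)
```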
The next step would be to union the two DataFrames:
dfResultUNion = df1.union(df2)
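In PySpark, removing from one DataFrame the keys present in another is typically a left anti join, e.g. `df1.join(df2, on=["bank", "account", "doc"], how="left_anti")`, followed by the union. Below is a plain-Python sketch of that "anti-join then union" idea, using the rows from the tables in the question (simplified to the match key plus name). Note one caveat: with the key (bank, account, doc), Ben's row (002; 002; 003) would be dropped because it shares its key with Mary's row, so the expected UNION result shown below, which keeps both, suggests the match key may need refining (e.g. including id):

```python
# Plain-Python sketch: drop from the older snapshot (df1) every row whose
# (bank, account, doc) key also appears in the newer snapshot (df2),
# then concatenate. Data copied from the question's tables.
key_cols = ("bank", "account", "doc")

df1_rows = [  # older snapshot
    {"id": 1, "bank": "001", "account": "001", "doc": "001", "name": "Michael"},
    {"id": 2, "bank": "001", "account": "002", "doc": "002", "name": "Ismael"},
    {"id": 3, "bank": "002", "account": "002", "doc": "003", "name": "Ben"},
]
df2_rows = [  # newer snapshot
    {"id": 1, "bank": "001", "account": "001", "doc": "001", "name": "Michael Jachason"},
    {"id": 2, "bank": "001", "account": "002", "doc": "002", "name": "Ismael"},
    {"id": 9, "bank": "002", "account": "002", "doc": "003", "name": "Mary"},
]

def key(row):
    return tuple(row[c] for c in key_cols)

# Anti-join: keep only the old rows whose key does NOT appear in df2.
df2_keys = {key(r) for r in df2_rows}
df1_only = [r for r in df1_rows if key(r) not in df2_keys]

# Union: surviving old rows plus everything from the newer snapshot.
result = df1_only + df2_rows
```

Here all three df1 keys also occur in df2, so `df1_only` is empty and `result` is just the three df2 rows; that is the behavior the stated rule produces, even though it differs from the four-row UNION shown below.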
Can anyone help me?
df1
===
id; dt_create; dt_change; bank; account; doc; name
1; 2020-12-01; ; 001; 001; 001; Michael
2; 2020-12-02; ; 001; 002; 002; Ismael
3; 2020-12-02; ; 002; 002; 003; Ben
df2
===
id; dt_create; dt_change; bank; account; doc; name
1; 2020-12-01; 2020-12-21; 001; 001; 001; Michael Jachason
2; 2020-12-02; ; 001; 002; 002; Ismael
9; 2020-12-21; ; 002; 002; 003; Mary
Result of UNION of df1 and df2
id; dt_create; dt_change; bank; account; doc; name
1; 2020-12-01; 2020-12-21; 001; 001; 001; Michael Jachason
2; 2020-12-02; ; 001; 002; 002; Ismael
3; 2020-12-02; ; 002; 002; 003; Ben
9; 2020-12-21; ; 002; 002; 003; Mary