How to compute the average over the last 3 months, grouped by store, in PySpark
I have:
from pyspark.sql import functions as F
from pyspark.sql.window import Window
df = spark.createDataFrame(
    [(17, "2017-03-10T15:27:18+00:00", 'Store 1'),
     (13, "2017-04-15T12:27:18+00:00", 'Store 1'),
     (25, "2017-05-18T11:27:18+00:00", 'Store 1'),
     (18, "2017-05-19T11:27:18+00:00", 'Store 1'),
     (13, "2017-03-15T12:27:18+00:00", 'Store 2'),
     (25, "2017-05-18T11:27:18+00:00", 'Store 2'),
     (25, "2017-08-18T11:27:18+00:00", 'Store 2')],
    ["dollars", "timestampGMT", "Store"])
df = df.withColumn('timestampGMT',df.timestampGMT.cast('timestamp'))
dollars timestampGMT Store
17 2017-03-10 15:27:18 Store 1
13 2017-04-15 12:27:18 Store 1
25 2017-05-18 11:27:18 Store 1
18 2017-05-19 11:27:18 Store 1
13 2017-03-15 12:27:18 Store 2
25 2017-05-18 11:27:18 Store 2
25 2017-08-18 11:27:18 Store 2
I want to compute the average over the last 3 months (if the last 3 months exist, otherwise 0), grouped by store, ending up with:
dollars timestampGMT Store Last_3_months_Average
17 2017-03-10 15:27:18 Store 1 0
13 2017-04-15 12:27:18 Store 1 0
25 2017-05-18 11:27:18 Store 1 18.25
18 2017-05-19 11:27:18 Store 1 18.25
13 2017-03-15 12:27:18 Store 2 0
25 2017-05-18 11:27:18 Store 2 0
25 2017-08-18 11:27:18 Store 2 0
25 2017-08-19 11:27:18 Store 2 0
Not sure how to approach this. Should I group by month first?
Solution
Try this: