How to convert a SQL FILTER clause to PySpark
I have the following SQL:
freecourse_info_step_8 as (
-- How many questions answered correct in that
select *,count(question_number) FILTER (WHERE answered = true) over(partition by hacker_rank_id,freecourse_version,question_block,freecourse_users_id) as answered_correct_in_block
from freecourse_info_step_7
),
I converted it to PySpark as follows:
from pyspark.sql import Window, functions as f

column_list = ["hacker_rank_id", "freecourse_version", "question_block", "freecourse_users_id"]
window = Window.partitionBy([f.col(x) for x in column_list])
freecourse_info_step_8 = freecourse_info_step_7.withColumn('answered_correct_in_block', f.when(f.col('answered') == True, f.count('question_number').over(window)))
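I suspect this code does not behave the same way as the SQL. Am I right? How do I correctly convert this SQL to PySpark?
Running the query through PySpark's spark.sql() method did not work for me with the FILTER clause.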
Solution
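Yes, the two behave differently. In your version the count runs over every row in the partition, and the when only decides whether that total is shown or replaced with NULL on each row. The count function should be outside the when condition: wrap the when inside count, so that rows where answered is not true contribute NULL and are skipped by count.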