How to resolve PySpark UnsupportedOperationException: empty collection
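Below is the code I use to train a GBM model for regression with MLlib. My data has no categorical variables; all string columns have already been label-encoded to integer values.
It is almost the same as the documentation example, but it does not run and fails with the error in the title, UnsupportedOperationException: empty collection (see link).
Spark version: 2.5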
from pyspark.ml import Pipeline
from pyspark.ml.classification import GBTClassifier
from pyspark.ml.evaluation import RegressionEvaluator, BinaryClassificationEvaluator
from pyspark.ml.feature import VectorAssembler, VectorIndexer
from pyspark.ml.regression import GBTRegressor
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
data = data.na.fill(-666)
# Train/Test Split
(X_train,X_test) = data.randomSplit([0.7,0.3])
vectorAssembler = VectorAssembler(inputCols=features,outputCol="rawFeatures")
vectorIndexer = VectorIndexer(inputCol="rawFeatures",outputCol="features",maxCategories=3)
target_var = 'class'
gbt = GBTRegressor(labelCol=target_var)
paramGrid = (
    ParamGridBuilder()
    .addGrid(gbt.maxDepth, [6])
    .addGrid(gbt.maxIter, [10])
    .build()
)
# We define an evaluation metric.
evaluator = RegressionEvaluator(metricName="mae",labelCol=gbt.getLabelCol(),predictionCol=gbt.getPredictionCol())
# CV class
cv = CrossValidator(estimator=gbt,evaluator=evaluator,estimatorParamMaps=paramGrid)
# pipeline
pipeline = Pipeline(stages=[vectorAssembler,vectorIndexer,cv])
# trains the model
pipelineModel = pipeline.fit(X_train)
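
One way to narrow down where the exception comes from is to fit the stages one at a time and check that each stage receives at least one row. In Spark, the message "empty collection" is typically raised when an action such as reduce or first runs on an empty RDD, so an empty input (for example an empty split or fold on a small dataset) is a plausible suspect, though this is a guess and not a confirmed diagnosis of the code above. The sketch below is only a debugging aid: `require_rows` and `fit_stage_by_stage` are hypothetical helper names, not part of PySpark, and they rely only on `DataFrame.limit`, `DataFrame.count`, and the usual `fit` / `transform` methods of pipeline stages.

```python
def require_rows(df, name):
    """Return df unchanged, or raise a readable error if it has no rows."""
    # limit(1) avoids counting the whole dataset just to test for emptiness.
    if df.limit(1).count() == 0:
        raise ValueError(f"{name} has no rows")
    return df


def fit_stage_by_stage(stages, df):
    """Fit the stages in order, checking the input of each one for emptiness.

    Hypothetical stand-in for Pipeline.fit, meant for debugging only.
    """
    models = []
    current = df
    for stage in stages:
        stage_name = type(stage).__name__
        require_rows(current, f"input to {stage_name}")
        # Estimators expose fit(); plain transformers only have transform().
        model = stage.fit(current) if hasattr(stage, "fit") else stage
        # Pass the transformed data on to the next stage.
        current = model.transform(current)
        models.append(model)
    return models
```

With the variables from the question, this could be called as `fit_stage_by_stage([vectorAssembler, vectorIndexer, cv], X_train)`. If the error then appears inside one particular stage, that stage's input columns and row count are the first things to inspect.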