How to resolve a NullPointerException thrown by the H2O target encoder MOJO model
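I am building a target encoder model using the Python example given in the H2O documentation, and I am trying to use that model's MOJO to predict target encodings from Java. However, the MOJO prediction fails on categories that are present in the test data but not in the training data, with the following error: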
Exception in thread "main" java.lang.NullPointerException
at hex.genmodel.algos.targetencoder.TargetEncoderMojoModel.computeEncodings(TargetEncoderMojoModel.java:87)
at hex.genmodel.algos.targetencoder.TargetEncoderMojoModel.score0(TargetEncoderMojoModel.java:72)
at hex.genmodel.easy.EasyPredictModelWrapper.predict(EasyPredictModelWrapper.java:889)
at hex.genmodel.easy.EasyPredictModelWrapper.transformWithTargetEncoding(EasyPredictModelWrapper.java:618)
at main.main(main.java:26)
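After digging into the target encoder MOJO, I found that the categories that occur only in the test data are listed in domains.txt, so the target encoder does not treat them as missing categories. However, encoding_map.ini contains no target encodings for these categories, so the model throws a NullPointerException when it tries to look up the encoding for such a category in encoding_map.ini.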
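The mechanism described above can be mimicked in a few lines of plain Python. This is only a toy model and not H2O's implementation: the names `domain`, `encoding_map`, `prior_mean`, `encode_unguarded` and `encode_guarded` are hypothetical, and the (numerator, denominator) layout of the map is an assumption made for illustration. It shows how a level that is known to the domain but absent from the encoding map breaks an unguarded lookup, and how a guard with a fallback value avoids the failure.

```python
# Toy model of the failure mode; NOT H2O's actual code.
# Assumption: the domain lists every level of the column,
# while the encoding map only covers levels seen in training.
domain = ["Albany NY", "Auburn NY", "Austria-Hungary"]

# Hypothetical layout: level index -> (numerator, denominator)
encoding_map = {0: (3.0, 4.0)}

# Hypothetical fallback value (for example the global mean of the response)
prior_mean = 0.4


def encode_unguarded(value):
    """Look up the encoding without checking that the level has one."""
    idx = domain.index(value)
    # .get() returns None for a level that is missing from the map, and
    # unpacking None raises TypeError (the analogue of Java's NullPointerException).
    numerator, denominator = encoding_map.get(idx)
    return numerator / denominator


def encode_guarded(value):
    """Fall back to the prior mean when the level has no stored encoding."""
    idx = domain.index(value) if value in domain else None
    entry = encoding_map.get(idx)
    if entry is None:
        return prior_mean
    numerator, denominator = entry
    return numerator / denominator
```

In this sketch "Auburn NY" is in the domain, so it is not treated as missing, yet it has no entry in the map. `encode_unguarded("Auburn NY")` therefore fails, while `encode_guarded("Auburn NY")` returns the fallback value.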
Code to train the model:
import h2o
from h2o.estimators import H2OTargetEncoderEstimator

h2o.init()

# Load the Titanic data set and make the response categorical
titanic = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv")
titanic['survived'] = titanic['survived'].asfactor()
response = 'survived'

# Split the data roughly in half into training and test frames
train, test = titanic.split_frame(ratios=[.5], seed=1234)

# Columns to encode and encoder settings
encoded_columns = ["home.dest", "cabin", "embarked"]
blended_avg = True
inflection_point = 3
smoothing = 10
noise = 0.15  # defined here but not passed to the estimator below
data_leakage_handling = "k_fold"
fold_column = "kfold_column"
train[fold_column] = train.kfold_column(n_folds=5, seed=3456)

# Train the target encoder, then download the MOJO together with h2o-genmodel.jar
titanic_te = H2OTargetEncoderEstimator(fold_column=fold_column, data_leakage_handling=data_leakage_handling,
                                       blending=blended_avg, k=inflection_point, f=smoothing)
titanic_te.train(x=encoded_columns, y=response, training_frame=train)
titanic_te.download_mojo(get_genmodel_jar=True)
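To see which values would trigger the failure, it can help to list the categories that occur in the test data but not in the training data. The following is a minimal sketch in plain Python. The helper `unseen_levels` is hypothetical and works on ordinary lists of column values, so it assumes the values of a column such as "home.dest" have already been pulled out of the train and test frames by whatever means is convenient.

```python
def unseen_levels(train_values, test_values):
    """Return the sorted categories that occur in test_values but not in train_values.

    None entries stand for missing values and are ignored on both sides.
    """
    seen_in_training = {v for v in train_values if v is not None}
    seen_in_test = {v for v in test_values if v is not None}
    return sorted(seen_in_test - seen_in_training)


if __name__ == "__main__":
    # Made-up sample values, for illustration only
    train_home = ["Albany NY", "Auburn NY", None, "Albany NY"]
    test_home = ["Auburn NY", "Austria-Hungary", None, "Amenia ND"]
    print(unseen_levels(train_home, test_home))
```

Any category returned by this helper is a candidate for the NullPointerException when it is passed to the MOJO.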
Code to obtain the encodings:
import java.io.*;
import java.util.Arrays;
import hex.genmodel.easy.RowData;
import hex.genmodel.easy.EasyPredictModelWrapper;
import hex.genmodel.easy.prediction.*;
import hex.genmodel.MojoModel;
import hex.genmodel.algos.targetencoder.TargetEncoderMojoModel;

public class main {
    public static void main(String[] args) throws Exception {
        // Load the target encoder MOJO exported by the Python code
        EasyPredictModelWrapper model = new EasyPredictModelWrapper(MojoModel.load("TargetEncoder_model_python_1599838802418_2.zip"));
        // Values of home.dest to encode, one row per value
        String[] temp_home = { "?Havana Cuba","Aberdeen / Portland OR","Albany NY","Altdorf Switzerland","Amenia ND","Antwerp Belgium / Stanton OH","Asarum Sweden Brooklyn NY","Ascot Berkshire / Rochester NY","Auburn NY","Aughnacliff Co Longford Ireland New York NY","Australia Fingal ND","Austria Niagara Falls NY","Austria-Hungary","Austria-Hungary / Germantown Philadelphia PA"};
        for (int j = 0; j < temp_home.length; j++) {
            RowData row = new RowData();
            row.put("cabin", "D43");
            row.put("embarked", "C");
            row.put("home.dest", temp_home[j]);
            // Fails with the NullPointerException for categories absent from the training data
            TargetEncoderPrediction p = model.transformWithTargetEncoding(row);
            System.out.println(Arrays.toString(p.transformations));
        }
    }
}
Compile command: javac -cp h2o-genmodel.jar -J-Xmx2g main.java
Run command: java -cp .:h2o-genmodel.jar main