Updating a table in a SQL Server database with data from Hive using Spark


I have my master table in SQL Server, and I want to update a few columns in it based on a condition that matches 3 columns between my master table (in the SQL Server DB) and the target table (in Hive). Both tables have many columns, but I am only interested in the 6 columns below:

The 3 columns I want to update in the master table are

"INSPECTED_BY","INSPECTION_COMMENTS" and "SIGNED_BY"

The columns I want to use as the match condition are

"SERVICE_NUMBER","PART_ID" and "LOTID"

I tried the following code, but it gives me a NullPointerException:

import java.sql.{DriverManager, PreparedStatement}

val df = spark.table("location_of_my_table_in_hive")
df.show(false)
df.foreachPartition(partition => 
{
    val Connection = DriverManager.getConnection(SQLjdbcURL,SQLusername,SQLPassword)
    val batch_size = 100
    var psmt: PreparedStatement = null 

    partition.grouped(batch_size).foreach(batch => 
    {
        batch.foreach{row => 
            {
                val inspctbyIndex = row.fieldIndex("INSPECTED_BY")
                val inspctby = row.getString(inspctbyIndex)
        
                val inspcomIndex = row.fieldIndex("INSPECT_COMMENTS")
                val inspcom = row.getString(inspcomIndex)
        
                val signIndex = row.fieldIndex("SIGNED_BY")
                val signby = row.getString(signIndex)
        
                val sqlquery = "MERGE INTO SERVICE_LOG_TABLE as LOG" +
                    "USING (VALUES(?,?,?))" +
                    "AS ROW(inspctby,inspcom,signby)" +
                    "ON LOG.SERVICE_NUMBER = ROW.SERVICE_NUMBER and LOG.PART_ID = ROW.PART_ID and LOG.LOTID = ROW.LOTID" +
                    "WHEN MATCHED THEN UPDATE SET INSPECTED_BY = 'SMITH',INSPECT_COMMENTS = 'STANDARD_MET',SIGNED_BY = 'WILL'" +
                    "WHEN NOT MATCHED THEN INSERT VALUES(ROW.INSPECTED_BY,ROW.INSPECT_COMMENTS,ROW.SIGNED_BY)"
                var psmt: PreparedStatement = Connection.prepareStatement(sqlquery)
        
                psmt.setString(1,inspctby)
                psmt.setString(2,inspcom)
                psmt.setString(3,signby)
                psmt.addBatch()
            }   
        }
        psmt.executeBatch()
        Connection.commit()
        psmt.close()
    })
    Connection.close()
})

Here is the error:

ERROR scheduler.TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 9, lwtxa0gzpappr.corp.bankofamerica.com, executor 4): java.lang.NullPointerException
    at $anonfun$1$$anonfun$apply$1.apply(/location/service_log.scala:101)
    at $anonfun$1$$anonfun$apply$1.apply(/location/service_log.scala:74)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at $anonfun$1.apply(/location/service_log.scala:74)
    at $anonfun$1.apply(/location/service_log.scala:68)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2121)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2121)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1408)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

I searched the Internet but could not find out why this error is occurring. Any help would be greatly appreciated.

Solution

If you are running this on a Spark cluster, I think you may have to broadcast some objects. The executors are not able to get the value of the object, hence the NullPointerException.
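That said, there is a plainer explanation visible in the snippet itself: the `var psmt` declared inside `batch.foreach` shadows the outer `psmt`, so the outer reference is still null when `psmt.executeBatch()` runs after the inner loop, which is exactly where a NullPointerException would surface. Below is a minimal reworked sketch under a few assumptions: it runs in spark-shell (so `spark` is in scope), all six columns are strings as the original `getString` calls imply, the column is named INSPECT_COMMENTS as in the code rather than INSPECTION_COMMENTS as in the prose, and the JDBC settings are placeholders to fill in. It prepares the statement once per partition and binds the three match columns as parameters as well, since the original MERGE referenced ROW.SERVICE_NUMBER, ROW.PART_ID and ROW.LOTID without ever supplying them (the concatenated SQL fragments were also missing separating spaces).

import java.sql.{DriverManager, PreparedStatement}
import org.apache.spark.sql.Row

// Placeholder connection settings -- substitute your real values.
val SQLjdbcURL  = "jdbc:sqlserver://<host>:1433;databaseName=<db>"
val SQLusername = "<user>"
val SQLPassword = "<password>"

val df = spark.table("location_of_my_table_in_hive")

df.foreachPartition { partition: Iterator[Row] =>
  // One connection per partition; commit() requires autocommit off.
  val conn = DriverManager.getConnection(SQLjdbcURL, SQLusername, SQLPassword)
  conn.setAutoCommit(false)

  // Bind the three match columns as parameters too, so the ON clause can
  // see them. SQL Server's MERGE must be terminated with a semicolon.
  val sqlquery =
    """MERGE INTO SERVICE_LOG_TABLE AS LOG
      |USING (VALUES (?, ?, ?, ?, ?, ?))
      |  AS ROW(SERVICE_NUMBER, PART_ID, LOTID, INSPECTED_BY, INSPECT_COMMENTS, SIGNED_BY)
      |ON LOG.SERVICE_NUMBER = ROW.SERVICE_NUMBER
      |   AND LOG.PART_ID = ROW.PART_ID AND LOG.LOTID = ROW.LOTID
      |WHEN MATCHED THEN UPDATE SET
      |  INSPECTED_BY = ROW.INSPECTED_BY,
      |  INSPECT_COMMENTS = ROW.INSPECT_COMMENTS,
      |  SIGNED_BY = ROW.SIGNED_BY
      |WHEN NOT MATCHED THEN INSERT
      |  (SERVICE_NUMBER, PART_ID, LOTID, INSPECTED_BY, INSPECT_COMMENTS, SIGNED_BY)
      |  VALUES (ROW.SERVICE_NUMBER, ROW.PART_ID, ROW.LOTID,
      |          ROW.INSPECTED_BY, ROW.INSPECT_COMMENTS, ROW.SIGNED_BY);""".stripMargin

  // Prepare ONCE per partition. In the original, a second `var psmt` inside
  // batch.foreach shadowed this one, leaving it null at executeBatch().
  val psmt: PreparedStatement = conn.prepareStatement(sqlquery)
  try {
    partition.grouped(100).foreach { batch =>
      batch.foreach { row =>
        psmt.setString(1, row.getAs[String]("SERVICE_NUMBER"))
        psmt.setString(2, row.getAs[String]("PART_ID"))
        psmt.setString(3, row.getAs[String]("LOTID"))
        psmt.setString(4, row.getAs[String]("INSPECTED_BY"))
        psmt.setString(5, row.getAs[String]("INSPECT_COMMENTS"))
        psmt.setString(6, row.getAs[String]("SIGNED_BY"))
        psmt.addBatch()
      }
      psmt.executeBatch() // same statement the rows were added to
      conn.commit()
    }
  } finally {
    psmt.close()
    conn.close()
  }
}

The connection is deliberately created inside foreachPartition: java.sql.Connection is not serializable, so it cannot be shipped or broadcast from the driver to the executors; only plain values such as the URL and credential strings should be captured by the closure.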
