How to fix a Hive query failing with "Unable to fetch table test_table. Invalid method name: 'get_table_req'" (pyspark 3.0.0, Hive 1.1.0)
I am digging into a POC in a fairly new environment to bring up Spark and check Spark functionality, but running SQL queries from the pyspark shell fails, even though Hive itself is working, since we can still query the metadata.
Do you know what is going on here and how to fix it?
$ pyspark --driver-class-path /etc/spark2/conf:/etc/hive/conf
>>> from pyspark.sql import SparkSession
>>> from pyspark.sql import Row
>>> spark = SparkSession \
... .builder \
... .appName("sample_query_test") \
... .enableHiveSupport() \
... .getOrCreate()
>>> spark.sql("show tables in user_tables").show(5)
20/08/18 19:57:01 WARN conf.HiveConf: HiveConf of name hive.enforce.sorting does not exist
20/08/18 19:57:01 WARN conf.HiveConf: HiveConf of name hive.enforce.bucketing does not exist
20/08/18 19:57:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+-----------+--------------------+-----------+
| database| tableName|isTemporary|
+-----------+--------------------+-----------+
|user_tables| a_2019| false|
|user_tables|abcdefgjeufjdsahh...| false|
|user_tables|testtesttesttestt...| false|
|user_tables|newnewnewnewnenwn...| false|
|user_tables|blahblahblablahbl...| false|
+-----------+--------------------+-----------+
only showing top 5 rows
>>> spark.sql("select count(*) from user_tables.test_table where date_partition='2020-08-17'").show(5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.7/site-packages/pyspark/sql/session.py", line 646, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/opt/conda/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/opt/conda/lib/python3.7/site-packages/pyspark/sql/utils.py", line 137, in deco
    raise_from(converted)
  File "<string>", line 3, in raise_from
pyspark.sql.utils.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table test_spark_cedatatransfer. Invalid method name: 'get_table_req';
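For context, Spark 3.0.0 ships with a Hive 2.3.7 metastore client by default, and that client issues the get_table_req Thrift call, which a Hive 1.1.0 metastore does not implement; that mismatch is the most likely source of the error above. A quick way to confirm which client version Spark is using, run from the same shell (the shown output assumes Spark's documented default):
>>> spark.conf.get("spark.sql.hive.metastore.version")  # version of the Hive metastore client Spark uses
'2.3.7'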
Information on the cluster:
$ hive --version
Hive 1.1.0-cdh5.13.0
Subversion file:///data/jenkins/workspace/generic-package-ubuntu64-16-04/CDH5.13.0-Packaging-Hive-2017-10-04_10-50-44/hive-1.1.0+cdh5.13.0+1269-1.cdh5.13.0.p0.34~xenial -r Unknown
Compiled by jenkins on Wed Oct 4 11:46:53 PDT 2017
$ pyspark --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.0
      /_/
Using Scala version 2.12.10, OpenJDK 64-Bit Server VM, 1.8.0_252
Branch HEAD
Compiled by user ubuntu on 2020-06-06T11:32:25Z
Revision 3fdfce3120f307147244e5eaf46d61419a723d50
Url https://gitbox.apache.org/repos/asf/spark.git
$ hadoop version
Hadoop 2.6.0-cdh5.13.0
Subversion http://github.com/cloudera/hadoop -r 42e8860b182e55321bd5f5605264da4adc8882be
Compiled by jenkins on 2017-10-04T18:50Z
Compiled with protoc 2.5.0
From source with checksum 5e84c185f8a22158e2b0e4b8f85311
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.13.0.jar
Obviously, I added the Hive conf to make sure the same metastore is being used, and while performing simple operations, the insert overwrite failed!
Solution
I ran into the same issue trying to use Spark 3.0.1 with HDP 2.6. I solved it by removing all the hive*.jar files from the jars folder and then copying the hive*.jar files from Spark2 in the HDP distribution.
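For reference, a minimal sketch of that jar swap, assuming $SPARK_HOME points at the Spark 3 install; the HDP path below is hypothetical, so adjust it to your distribution:
$ cd $SPARK_HOME/jars
$ mkdir ../hive-jars-backup && mv hive*.jar ../hive-jars-backup/  # set aside the bundled Hive 2.3.x jars
$ cp /usr/hdp/current/spark2-client/jars/hive*.jar .              # bring in the Hive jars that match the cluster
An alternative that avoids touching the jars folder is to point Spark at a metastore client matching the cluster's Hive version through the documented spark.sql.hive.metastore.* options. A sketch, untested here against CDH 5.13:
$ pyspark --driver-class-path /etc/spark2/conf:/etc/hive/conf \
    --conf spark.sql.hive.metastore.version=1.1.0 \
    --conf spark.sql.hive.metastore.jars=maven
With spark.sql.hive.metastore.jars=maven, Spark downloads a compatible Hive client at runtime instead of using the bundled 2.3.7 one.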