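How to fix adding partitions to a table in the Glue Catalog
I have an external partitioned table defined in the Glue catalog, with its data stored in S3.
When I run MSCK REPAIR TABLE {table}, the partitions are added to the table as expected and I can query it in Athena.
But when I add a partition through the boto3 API instead, querying the table in Athena fails with the following error: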
HIVE_UNKNOWN_ERROR: Can not create a Path from an empty string
I am using the following code to add the partition:
partitions = {
    "Values": ["2020", "07", "01"],
    "StorageDescriptor": {
        "Columns": [
            {"Name": "_hoodie_commit_time", "Type": "string"},
            {"Name": "_hoodie_commit_seqno", "Type": "string"},
            {"Name": "_hoodie_record_key", "Type": "string"},
            {"Name": "_hoodie_partition_path", "Type": "string"},
            {"Name": "_hoodie_file_name", "Type": "string"},
            # ...some more columns
        ],
        "InputFormat": "org.apache.hudi.hadoop.HoodieParquetInputFormat",
        "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
        "Compressed": False,
        "NumberOfBuckets": -1,
        "SerdeInfo": {
            "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
            "Parameters": {"serialization.format": "1"},
        },
    },
}

# glue_conn is a boto3 Glue client; database_name and table_name are strings.
glue_conn.create_partition(
    DatabaseName=database_name,
    TableName=table_name,
    PartitionInput=partitions,
)
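The error text points at an empty path, and the PartitionInput above has no "Location" key in its StorageDescriptor, so one thing worth trying is to set it explicitly. The following is only a sketch under assumptions: the helper names build_partition_input and add_partition are hypothetical, and the partition directory is assumed to follow the Hive-style key=value layout (year=2020/month=07/day=01) under the table location. If the data is laid out differently, the suffix has to be adjusted.

```python
import copy


def build_partition_input(table_sd, values, partition_keys):
    """Build a Glue PartitionInput that reuses the table's StorageDescriptor.

    Assumption: partitions live under the table location in Hive-style
    key=value directories. Adjust the suffix if the layout differs.
    """
    sd = copy.deepcopy(table_sd)  # do not mutate the table's descriptor
    suffix = "/".join(f"{key}={value}" for key, value in zip(partition_keys, values))
    sd["Location"] = f"{table_sd['Location'].rstrip('/')}/{suffix}"
    return {"Values": list(values), "StorageDescriptor": sd}


def add_partition(glue_client, database_name, table_name, values):
    """Read the table definition from Glue and register one partition."""
    table = glue_client.get_table(DatabaseName=database_name, Name=table_name)["Table"]
    keys = [key["Name"] for key in table["PartitionKeys"]]
    partition_input = build_partition_input(table["StorageDescriptor"], values, keys)
    glue_client.create_partition(
        DatabaseName=database_name,
        TableName=table_name,
        PartitionInput=partition_input,
    )
    return partition_input
```

With a real client this would be called as add_partition(boto3.client("glue"), database_name, table_name, ["2020", "07", "01"]). Copying the table's StorageDescriptor also keeps the columns, formats, and SerDe settings in sync with the table instead of repeating them by hand.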
I also ran describe formatted test partition (year='2020',month='07',day='01'); for partitions added by each of the two methods above. Apart from the CreateTime field, the output is identical:
# col_name data_type comment
_hoodie_commit_time string
_hoodie_commit_seqno string
_hoodie_record_key string
_hoodie_partition_path string
_hoodie_file_name string
...some more columns
# Partition Information
# col_name data_type comment
year string
month string
day string
# Detailed Partition Information
Partition Value: [2020,07,01]
Database: bidb
Table: test
CreateTime: Tue Jan 12 12:08:44 UTC 2021
LastAccessTime: UNKNOWN
Location: s3://location
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hudi.hadoop.HoodieParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
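Since Athena reads partition metadata from the Glue catalog, it may also help to check what Glue itself has stored for the partition, rather than relying only on the describe output. A minimal sketch, assuming a boto3 Glue client is passed in; the helper name partition_location is hypothetical:

```python
def partition_location(glue_client, database_name, table_name, values):
    """Return the Location Glue has stored for one partition ('' if unset)."""
    response = glue_client.get_partition(
        DatabaseName=database_name,
        TableName=table_name,
        PartitionValues=list(values),
    )
    storage_descriptor = response["Partition"].get("StorageDescriptor", {})
    # A missing or empty Location would be consistent with the
    # "Can not create a Path from an empty string" error.
    return storage_descriptor.get("Location") or ""
```

Comparing the result for a partition created by MSCK REPAIR TABLE with one created through create_partition should show whether the two really carry the same location.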
Table definition:
hive> show create table static_portfolio;
OK
CREATE EXTERNAL TABLE `static_portfolio`(
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  ...some more columns)
PARTITIONED BY (
  `year` string,
  `month` string,
  `day` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
's3:/location'
TBLPROPERTIES (
  'bucketing_version'='2',
  'transient_lastDdlTime'='1610446119')