How to get the clustering information of a BigQuery table

I am trying to write generic code that automates load jobs for multiple tables.

One of the tables I want to automate load jobs for is clustered, and loading it fails with the following error: "Incompatible table partitioning specification. Expects partitioning specification interval(type:day) clustering(zipcode,address),but input partitioning specification is interval(type:day)"

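My question is: given the tableId, and assuming the table already exists, how do I get the table's clustering information so that I can set it with loadConfig.setClustering(clustering)?

Here is the code I use to load the table: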

    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    TableId tableId =
        TableId.of(
           FLAG_project.get(),FLAG_dataset_name.get(),FLAG_table_name.get());

    TimePartitioning partitioning = TimePartitioning.of(TimePartitioning.Type.DAY);
    
    CsvOptions csvOptions = CsvOptions.newBuilder().setAllowJaggedRows(true).build();

    LoadJobConfiguration loadConfig =
        LoadJobConfiguration.newBuilder(tableId,sourceUri,csvOptions)
            .setFormatOptions(FormatOptions.csv())
            .setTimePartitioning(partitioning)
            .setWriteDisposition(JobInfo.WriteDisposition.WRITE_APPEND)
            .setAutodetect(false)
            .setMaxBadRecords(1000)
            .setIgnoreUnknownValues(true)
            .build();

    Job loadJob = bigquery.create(JobInfo.of(loadConfig));
    loadJob = loadJob.waitFor();

Solution

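Both options, partitioning and clustering, apply when a table is created, for example when you load data and want the table to be created if it does not already exist.

If you are only loading data into a table that already exists, my advice is to leave out the partitioning and clustering specification entirely, since it is not actually needed.

Finally, if you do want the table's clustering information, here is an example of how to get it: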

    // Get the tableId
    TableId tableId = TableId.of(projectId, datasetName, tableName);

    // Get the table
    Table table = bigquery.getTable(tableId);

    // Get the definition
    StandardTableDefinition tableDefinition = table.getDefinition();

    // Get the clustering information
    Clustering clustering = tableDefinition.getClustering();

You can now use it when building your load configuration:

    .setClustering(clustering)

With the Python client library, you can do this:

    from google.cloud import bigquery

    client = bigquery.Client(project="PROJECT_ID")
    table = client.get_table("PROJECT_ID.DATASET.TABLE")
    print(table.clustering_fields)
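
Building on the snippet above, here is a minimal sketch of copying an existing table's layout onto a load job configuration, so that the load job's specification matches the table's. The helper names `copy_table_layout` and `load_csv_into_existing_table` are hypothetical, as are the `table_id` and `source_uri` parameters. The sketch assumes the `google-cloud-bigquery` package is installed and default credentials are available.

```python
def copy_table_layout(table, job_config):
    """Copy clustering and time partitioning from a table onto a load config.

    `table` is expected to behave like google.cloud.bigquery.Table and
    `job_config` like google.cloud.bigquery.LoadJobConfig; only the
    `clustering_fields` and `time_partitioning` attributes are used.
    """
    # clustering_fields is None for a table that is not clustered.
    if table.clustering_fields:
        job_config.clustering_fields = list(table.clustering_fields)
    # time_partitioning is None for a table that is not partitioned.
    if table.time_partitioning is not None:
        job_config.time_partitioning = table.time_partitioning
    return job_config


def load_csv_into_existing_table(table_id, source_uri):
    """Append CSV data from Cloud Storage to an existing table (hypothetical helper)."""
    # Imported lazily so copy_table_layout works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    table = client.get_table(table_id)
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    copy_table_layout(table, job_config)
    load_job = client.load_table_from_uri(source_uri, table_id, job_config=job_config)
    return load_job.result()  # blocks until the load job finishes
```

Calling `load_csv_into_existing_table("PROJECT_ID.DATASET.TABLE", "gs://BUCKET/FILE.csv")` would then append to the table; the bucket and file names here are placeholders.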

You can also try the following with the bq command-line tool:

    bq show --format=prettyjson projectName:datasetName.tableName | jq '.clustering.fields[]'

Example:

    bq show --format=prettyjson fh-bigquery:wikipedia_v3.pageviews_2017 | jq '.clustering.fields[]'

Output:

    "wiki"
    "title"
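
If jq is not available, the same fields can be extracted with Python's standard library. This is a minimal sketch; the function names are hypothetical, and it assumes the `bq` tool is installed and authenticated. It also assumes the `clustering` key is missing from the JSON when a table is not clustered, in which case it returns an empty list.

```python
import json
import subprocess


def clustering_fields_from_bq_json(text):
    """Return the clustering columns found in `bq show --format=prettyjson` output."""
    info = json.loads(text)
    # Fall back to an empty list when the table has no "clustering" entry.
    return info.get("clustering", {}).get("fields", [])


def show_clustering_fields(table_spec):
    """Run `bq show` for a table such as "projectName:datasetName.tableName"."""
    result = subprocess.run(
        ["bq", "show", "--format=prettyjson", table_spec],
        check=True,
        capture_output=True,
        text=True,
    )
    return clustering_fields_from_bq_json(result.stdout)
```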
