How to query a BigQuery authorized view from Apache Beam
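I am trying to query a view in BigQuery using Apache Beam.
The view is authorized to access all of the datasets it references. The Dataflow/GCE service account has access to the view but not to its underlying datasets (which should not be a problem).
When I try to run a job that queries the authorized view, I get the following error: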
java.lang.RuntimeException: java.io.IOException: Unable to get table: test_13249, aborting after 9 retries.
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.executeWithRetries(BigQueryServicesImpl.java:1004)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.getTable(BigQueryServicesImpl.java:491)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.getTable(BigQueryServicesImpl.java:477)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.getTable(BigQueryServicesImpl.java:471)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQueryHelper.executeQuery(BigQueryQueryHelper.java:109)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySourceDef.getTableReference(BigQueryQuerySourceDef.java:113)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource.getTableToExtract(BigQueryQuerySource.java:65)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.extractFiles(BigQuerySourceBase.java:110)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.split(BigQuerySourceBase.java:148)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.splitAndValidate(WorkerCustomSources.java:290)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplitTyped(WorkerCustomSources.java:212)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplitWithApiLimit(WorkerCustomSources.java:196)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplit(WorkerCustomSources.java:175)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSourceOperationExecutor.execute(WorkerCustomSourceOperationExecutor.java:78)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:417)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:386)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:311)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "Access Denied: Table my-gcp-project:bigquery_dataset_any.test_13249: User does not have bigquery.tables.get permission for table my-gcp-project:bigquery_dataset_any.test_13249.",
    "reason" : "accessDenied"
  } ],
  "status" : "PERMISSION_DENIED"
}
Solution
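Beam tries to be smart when working with BigQuery. This error occurs because Beam looks up the tables referenced by the query in order to infer the query location, and for an authorized view that lookup is not always possible: the referenced tables include the underlying tables, which the service account cannot read (hence the 403 on test_13249 in the trace above).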
To work around this, you can use the withQueryLocation method on BigQueryIO.readTableRows() or BigQueryIO.read(SerializableFunction). That way, Beam uses the query location you provide instead of inferring one.
For example:
BigQueryIO.readTableRows()
.fromQuery("SELECT * FROM my_authorized_view")
.withQueryLocation("US") // Whatever location is convenient for you
...
This should solve your problem.
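To make the reasoning concrete, here is a small toy model in plain Java of the behavior described above. It is not Beam's source code: the class QueryLocationSketch, its methods, and the in-memory maps standing in for BigQuery are all hypothetical, and the real lookup logic may differ between Beam versions. It only illustrates why inferring the location can hit a 403 on an underlying table, and why supplying the location avoids the lookup altogether.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: this is NOT Beam's code, and every name here is hypothetical.
class QueryLocationSketch {

    // Stand-in for a tables.get call: fails when the caller cannot read the table.
    static String getTableLocation(String table,
                                   Map<String, String> tableLocations,
                                   Set<String> readableTables) {
        if (!readableTables.contains(table)) {
            throw new SecurityException("403 Forbidden: no bigquery.tables.get on " + table);
        }
        return tableLocations.get(table);
    }

    // Returns the explicit location if one is given; otherwise infers it
    // from the first table the query references.
    static String resolveLocation(String explicitLocation,
                                  List<String> referencedTables,
                                  Map<String, String> tableLocations,
                                  Set<String> readableTables) {
        if (explicitLocation != null) {
            return explicitLocation; // nothing to infer, so no table lookup happens
        }
        if (referencedTables.isEmpty()) {
            return null; // nothing to infer from
        }
        return getTableLocation(referencedTables.get(0), tableLocations, readableTables);
    }

    public static void main(String[] args) {
        String underlying = "bigquery_dataset_any.test_13249";
        List<String> referenced = Collections.singletonList(underlying);
        Map<String, String> locations = Collections.singletonMap(underlying, "US");
        // The service account can read the view, but none of the underlying tables.
        Set<String> readable = Collections.emptySet();

        try {
            resolveLocation(null, referenced, locations, readable);
        } catch (SecurityException e) {
            System.out.println("Inferring the location failed: " + e.getMessage());
        }

        String location = resolveLocation("US", referenced, locations, readable);
        System.out.println("Explicit location used without any lookup: " + location);
    }
}
```

In this model the first call fails because the inference has to read a table the account cannot access, while the second call never touches the table. That is the same trade the withQueryLocation workaround makes: you state the location yourself, so it has to match where the data actually lives.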