How to show all tables in all databases in Databricks
The following can be used to show the tables in the current schema, or in a specified schema, respectively:
show tables;
show tables in my_schema;
This is documented here: https://docs.databricks.com/spark/latest/spark-sql/language-manual/show-tables.html
Is there a way to show all tables in all databases?
Is there a metadata table in Databricks/Spark (similar to the all_* or dba_* views in Oracle, or information_schema in MySQL)? Is there a way to run more specific queries against database objects in Databricks? Something like:
select * from i_dont_know_what where lower(table_name) like '%gold%' and schema = 'myschema';
Solution
Can't you use the Spark catalog API on Databricks? Try this:
val tuples: Seq[(String, String)] = spark.catalog.listDatabases().collect().flatMap(db =>
  spark.catalog.listTables(db.name).collect().map(t => (db.name, t.name))
).toSeq  // note: .toMap would keep only one table per database, since the key is the database name
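The pairing logic itself is independent of Spark: the flatMap just emits one (database, table) pair per table. A pure-Python sketch with hypothetical stand-in data (not the real catalog API) also shows why collecting the pairs into a Map keyed by database name would be lossy:

```python
# Hypothetical stand-in for the catalog listing (not the real Spark API)
catalog = {"default": ["events", "users"], "sales": ["gold_orders"]}

# Equivalent of the flatMap: one (database, table) pair per table
tuples = [(db, t) for db, tables in catalog.items() for t in tables]
print(tuples)  # [('default', 'events'), ('default', 'users'), ('sales', 'gold_orders')]

# A dict/Map keyed by database name keeps only one table per database
print(dict(tuples))  # {'default': 'users', 'sales': 'gold_orders'}
```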
You can list all table names in a database with the following code:
df = spark.sql("show tables in {}".format("<Your Database Name>"))
display(df)
(display is a Databricks notebook helper; outside Databricks, use df.show() instead.)
I ran into a similar problem and also wrote a short article about it: https://medium.com/helmes-people/how-to-view-all-databases-tables-and-columns-in-databricks-9683b12fee10
The output is a Spark SQL view containing the database name, table name and column name, across all databases, all tables and all columns. You can extend it to fetch more information. What I like about it is that it also lists nested columns (StructType).
PySpark code:
from pyspark.sql.types import StructType

# get field name from schema (recursive for getting nested values)
def get_schema_field_name(field, parent=None):
    if type(field.dataType) == StructType:
        if parent is None:
            prt = field.name
        else:
            prt = parent + "." + field.name  # using dot notation
        res = []
        for i in field.dataType.fields:
            res.append(get_schema_field_name(i, prt))
        return res
    else:
        if parent is None:
            res = field.name
        else:
            res = parent + "." + field.name
        return res

# flatten list, from https://stackoverflow.com/a/12472564/4920394
def flatten(S):
    if S == []:
        return S
    if isinstance(S[0], list):
        return flatten(S[0]) + flatten(S[1:])
    return S[:1] + flatten(S[1:])

# list of databases
db_list = [x[0] for x in spark.sql("SHOW DATABASES").rdd.collect()]
for i in db_list:
    spark.sql("SHOW TABLES IN {}".format(i)).createOrReplaceTempView(str(i) + "TablesList")

# create a query for fetching all tables from all databases
union_string = "SELECT database, tableName FROM "
for idx, item in enumerate(db_list):
    if idx == 0:
        union_string += str(item) + "TablesList WHERE isTemporary = 'false'"
    else:
        union_string += " UNION ALL SELECT database, tableName FROM {} WHERE isTemporary = 'false'".format(str(item) + "TablesList")
spark.sql(union_string).createOrReplaceTempView("allTables")

# full list = schema, table, column
full_list = []
for i in spark.sql("SELECT * FROM allTables").collect():
    table_name = i[0] + "." + i[1]
    table_schema = spark.sql("SELECT * FROM {}".format(table_name))
    column_list = []
    for j in table_schema.schema:
        column_list.append(get_schema_field_name(j))
    column_list = flatten(column_list)
    for k in column_list:
        full_list.append([i[0], i[1], k])
spark.createDataFrame(full_list, schema=['database', 'tableName', 'columnName']).createOrReplaceTempView("allColumns")
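To sanity-check the two helpers above without a Spark cluster, here is a small stand-alone run. It uses minimal stand-in classes that mimic just the attributes the recursion touches (field.name, field.dataType, dataType.fields), since the helpers themselves never call anything Spark-specific; the schema and column names below are made up for illustration:

```python
from collections import namedtuple

# Stand-ins mimicking only what the helpers use; in the real code
# StructType comes from pyspark.sql.types
class StructType:
    def __init__(self, fields):
        self.fields = fields

Field = namedtuple("Field", ["name", "dataType"])

# helpers copied from the answer above (pyspark import replaced by the stand-in)
def get_schema_field_name(field, parent=None):
    if type(field.dataType) == StructType:
        if parent is None:
            prt = field.name
        else:
            prt = parent + "." + field.name  # dot notation for nested fields
        res = []
        for i in field.dataType.fields:
            res.append(get_schema_field_name(i, prt))
        return res
    else:
        if parent is None:
            res = field.name
        else:
            res = parent + "." + field.name
        return res

def flatten(S):
    if S == []:
        return S
    if isinstance(S[0], list):
        return flatten(S[0]) + flatten(S[1:])
    return S[:1] + flatten(S[1:])

# made-up schema: one flat column plus a doubly nested struct
schema = StructType([
    Field("id", "long"),
    Field("address", StructType([
        Field("city", "string"),
        Field("geo", StructType([Field("lat", "double")])),
    ])),
])

cols = flatten([get_schema_field_name(f) for f in schema.fields])
print(cols)  # ['id', 'address.city', 'address.geo.lat']
```

Nested fields come out in dot notation, which is exactly what makes the resulting allColumns view searchable with a plain LIKE filter.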