How can I get a progress bar in pd.to_gbq?
When pushing data to BigQuery, I upload it like this:
from google.oauth2 import service_account
from tqdm import tqdm
tqdm.pandas()
credentials = service_account.Credentials.from_service_account_file('credentials.json')
all_data.to_gbq(destination_table=table_name, project_id=project_id, credentials=credentials, progress_bar=True, if_exists='replace')
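However, this does not give me a progress bar. Instead I get a timer that appears at some arbitrary point and stays frozen at that value while the data is being pushed (currently showing: 1it [02:12,132.71s/it]). Once all the data has been pushed, it changes to show the total time taken (e.g. 1it [17:02,1022.74s/it]). During the upload I have no way of knowing how much has completed or how long remains.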
When I try to push it in chunks instead:
appended_rows = 0
for chunk in pd.read_csv('users.csv', chunksize=10000):
    chunk.to_gbq(destination_table=table_name, project_id=project_id,
                 table_schema=[{'name': col, 'type': type(all_data[col][0])} for col in all_data.columns],
                 credentials=service_account.Credentials.from_service_account_file('credentials.json'),
                 if_exists='append')
    appended_rows += len(chunk)
    print(f'Uploaded {appended_rows*100/len(all_data)}%')
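Although I am confident about the data types and I am passing the schema explicitly (I also tried without passing a schema), it still raises the error below, whereas pushing the whole DataFrame at once works fine: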
InvalidSchema                             Traceback (most recent call last)
<ipython-input-36-b336ea672493> in <module>
      9 'credentials.json'),
     10 table_schema = [{'name':col,
---> 11 if_exists = 'append')
     12 appended_rows += len(chunk)
     13 print(f'Uploaded {appended_rows*100/len(all_data)}%')
~\AppData\Roaming\Python\Python36\site-packages\pandas\core\frame.py in to_gbq(self,destination_table,project_id,chunksize,reauth,if_exists,auth_local_webserver,table_schema,location,progress_bar,credentials)
   1652 location=location,
   1653 progress_bar=progress_bar,
-> 1654 credentials=credentials,
   1655 )
   1656
~\AppData\Roaming\Python\Python36\site-packages\pandas\io\gbq.py in to_gbq(dataframe,credentials,verbose,private_key)
    226 credentials=credentials,
    227 verbose=verbose,
--> 228 private_key=private_key,
    229 )
~\AppData\Roaming\Python\Python36\site-packages\pandas_gbq\gbq.py in to_gbq(dataframe,private_key)
   1176 ):
   1177 raise InvalidSchema(
-> 1178 "Please verify that the structure and "
   1179 "data types in the DataFrame match the "
   1180 "schema of the destination table."
InvalidSchema: Please verify that the structure and data types in the DataFrame match the schema of the destination table.
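One possible contributor to the InvalidSchema error is that type(all_data[col][0]) yields Python type objects such as <class 'str'>, while BigQuery schema fields expect type names such as STRING or INTEGER. The sketch below shows one way to build table_schema from column dtypes instead. The name build_table_schema and the kind-to-type mapping are assumptions made for illustration, not part of pandas or pandas-gbq, and the mapping may need adjusting for a real table.

```python
# Assumed mapping from NumPy dtype "kind" codes to BigQuery column type names.
# This is an illustrative choice; adjust it to match the destination table.
KIND_TO_BQ_TYPE = {
    "i": "INTEGER",    # signed integers
    "u": "INTEGER",    # unsigned integers
    "f": "FLOAT",      # floating point
    "b": "BOOLEAN",    # booleans
    "M": "TIMESTAMP",  # datetime64
}


def build_table_schema(dtypes):
    """Return a list of {'name': ..., 'type': ...} dicts.

    `dtypes` is an iterable of (column name, dtype) pairs, for example
    `all_data.dtypes.items()`. Any dtype whose kind is not in the mapping
    (such as object columns holding strings) falls back to STRING.
    """
    return [
        {
            "name": str(name),
            "type": KIND_TO_BQ_TYPE.get(getattr(dtype, "kind", "O"), "STRING"),
        }
        for name, dtype in dtypes
    ]
```

With the DataFrame from the question this would be called as build_table_schema(all_data.dtypes.items()). Building the schema from the full DataFrame rather than from each chunk keeps it identical for every upload, which matters because pd.read_csv infers dtypes separately for each chunk and a column may come out as a different type in different chunks.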
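So how can I get a progress bar? The end goal is simply to know how far the upload has progressed; whether the data is pushed in chunks or all at once does not matter.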
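A reading of "1it" that fits the output above is that the built-in bar counts upload chunks and the whole DataFrame went up as a single chunk, so there was only one step to report. Depending on the pandas-gbq version, passing chunksize to to_gbq together with progress_bar=True may make that bar advance once per chunk. As a version-independent alternative, here is a minimal sketch that slices the DataFrame already in memory and reports progress in rows. The names upload_in_chunks and upload_chunk are hypothetical, and the function falls back to printed percentages when tqdm is not installed.

```python
from typing import Callable

try:  # tqdm is optional; without it, progress is printed as a percentage
    from tqdm import tqdm
except ImportError:
    tqdm = None


def upload_in_chunks(frame, upload_chunk: Callable, chunk_rows: int = 10_000) -> int:
    """Send `frame` slice by slice and report progress in rows.

    `frame` is anything that supports len() and .iloc[start:stop], such as a
    pandas DataFrame. `upload_chunk` is a hypothetical callable that sends one
    slice to the destination. Returns the number of rows sent.
    """
    total = len(frame)
    bar = tqdm(total=total, unit="rows") if tqdm is not None else None
    sent = 0
    for start in range(0, total, chunk_rows):
        piece = frame.iloc[start:start + chunk_rows]
        upload_chunk(piece)
        sent += len(piece)
        if bar is not None:
            bar.update(len(piece))
        else:
            print(f"Uploaded {sent * 100 / total:.1f}%")
    if bar is not None:
        bar.close()
    return sent


# Possible usage with the names from the question (not executed here):
# upload_in_chunks(
#     all_data,
#     lambda piece: piece.to_gbq(destination_table=table_name,
#                                project_id=project_id,
#                                credentials=credentials,
#                                if_exists='append'),
# )
```

Slicing all_data instead of re-reading users.csv in chunks keeps the dtypes of every piece identical to those of the full DataFrame. Because every call appends, the destination table would need to be emptied or recreated beforehand to get the effect of if_exists='replace'.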