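How do I merge several CSV files into one DataFrame?
I have several CSV files with stock quotes that all share exactly the same structure (the timeframe is one day):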
date,open,high,low,close
2001-10-15 00:00:00 UTC,56.11,59.8,55.0,57.9
2001-10-22 00:00:00 UTC,57.9,63.63,56.88,62.18
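I want to merge them all into one DataFrame, keeping only the close-price column for each stock. The problem is that the files have different history depths (they start on different dates in different years), so I want to align them all by date in a single DataFrame. I am trying to run the following code, but the resulting df contains nonsense: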
files = ['FB','MSFT','GM','IBM']
stock_d = {}
for file in files: #reading all files into one dictionary:
    stock_d[file] = pd.read_csv(file + '.csv',parse_dates=['date'])
date_column = pd.Series() #the column with all dates from all CSV
for stock in stock_d:
    date_column = date_column.append(stock_d[stock]['date'])
date_column = date_column.drop_duplicates().sort_values(ignore_index=True) #keeping only unique values,then sorting by date
df = pd.DataFrame(date_column,columns=['date']) #creating final DataFrame
for stock in stock_d:
    stock_df = stock_d[stock] #this is one of CSV files,for example FB.csv
    df[stock] = [stock_df.iloc[stock_df.index[stock_df['date'] == date]]['close'] for date in date_column] #for each date in date_column adding close price to resulting DF,or should be None if date not found
print(df.tail()) #something strange here - Series objects in every column
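The idea is to first extract all the dates from every file, and then distribute the close prices across the columns and dates. But I am obviously doing something wrong. Can you help me?
Solution
If I understand you correctly, what you are looking for is a pivot operation: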
import pandas as pd

files = ['FB','MSFT','GM','IBM']
df = [] # this is a list,not a dictionary
for file in files:
    # You only care about date and closing price
    # so only keep those 2 columns to save memory
    tmp = pd.read_csv(file + '.csv',parse_dates=['date'],usecols=['date','close']).assign(symbol=file)
    df.append(tmp)
# A single `concat` is faster than sequential `append`s
# values='close' keeps the result columns as plain symbols instead of a ('close', symbol) MultiIndex
df = pd.concat(df).pivot(index='date',columns='symbol',values='close')
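To see how the pivot aligns files with different history depths, here is a minimal, self-contained sketch. It is only an illustration under stated assumptions: the symbols `AAA` and `BBB`, the `BBB` prices, and the helper `load_closes` are made up, and in-memory strings stand in for the real CSV files.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the CSV files on disk.
# The symbols AAA/BBB and the BBB prices are invented for illustration.
SAMPLE_CSVS = {
    "AAA": (
        "date,open,high,low,close\n"
        "2001-10-15 00:00:00 UTC,56.11,59.8,55.0,57.9\n"
        "2001-10-22 00:00:00 UTC,57.9,63.63,56.88,62.18\n"
    ),
    "BBB": (
        "date,open,high,low,close\n"
        "2001-10-22 00:00:00 UTC,10.0,11.0,9.5,10.5\n"
        "2001-10-29 00:00:00 UTC,10.5,12.0,10.0,11.75\n"
    ),
}


def load_closes(sources):
    """Build a date-indexed DataFrame with one close-price column per symbol."""
    frames = []
    for symbol, text in sources.items():
        # Read only the two columns that are needed.
        frame = pd.read_csv(io.StringIO(text), usecols=["date", "close"])
        # Parse the timestamps explicitly as timezone-aware UTC values.
        frame["date"] = pd.to_datetime(frame["date"], utc=True)
        frames.append(frame.assign(symbol=symbol))
    # Stack everything into one long table, then pivot symbols into columns.
    long_df = pd.concat(frames, ignore_index=True)
    return long_df.pivot(index="date", columns="symbol", values="close")


closes = load_closes(SAMPLE_CSVS)
print(closes)
```

Dates that are missing from a given file show up as NaN in that symbol's column, which is the by-date alignment the question asks for. Note that `pivot` raises an error if the same date appears twice for one symbol, so duplicated rows would need to be dropped first.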