Python CSV: how to find the latest record that matches a condition
I have a CSV with the following sample data:
id bb_id cc_id datetime
-------------------------
1 11 44 2019-06-09
2 33 55 2020-06-09
3 22 66 2020-06-09
4 11 44 2019-06-09
5 11 44 2020-02-22
Let's say the condition is bb_id == 11 and cc_id == 44.
I want the latest matching record, i.e.:
11 44 2020-02-22
How can I get this from the CSV?
What I did:
import csv

with open('sample.csv') as csv_file:
    for indx, data in enumerate(csv.DictReader(csv_file)):
        # check if the conditional data is in the file?
        # (csv.DictReader returns every value as a string, hence '11' and '44')
        if data['bb_id'] == '11' and data['cc_id'] == '44':
            # sort the data by date? or should I store all the relevant data
            # beforehand in a data structure like a list and then sort it?
            # could I avoid that? I need to do this interactively multiple times
            pass
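Solution
Put all the matching records into a list, then use the max() function with the date as the key.
import csv

selected_rows = []
with open('sample.csv') as csv_file:
    for data in csv.DictReader(csv_file):
        # csv.DictReader yields every value as a string, so compare against '11' and '44'
        if data['bb_id'] == '11' and data['cc_id'] == '44':
            selected_rows.append(data)

# ISO dates (YYYY-MM-DD) compare correctly as strings;
# default=None avoids a ValueError when nothing matched
latest = max(selected_rows, key=lambda x: x['datetime'], default=None)
print(latest)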
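The question also asks whether the intermediate list can be avoided. One possible sketch is to pass a generator expression to max(), so matching rows are consumed as they are read. The function name latest_match and the comma-separated layout of SAMPLE are assumptions made for illustration; the thread only shows the data as a whitespace-aligned table.

```python
import csv
import io


def latest_match(csv_file, bb_id, cc_id):
    """Return the newest row matching both ids, or None if nothing matches."""
    reader = csv.DictReader(csv_file)
    # Generator expression: rows are filtered lazily, no list is built.
    matches = (
        row for row in reader
        if row['bb_id'] == str(bb_id) and row['cc_id'] == str(cc_id)
    )
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return max(matches, key=lambda row: row['datetime'], default=None)


# Sample data from the question, assumed to be comma-separated on disk.
SAMPLE = """id,bb_id,cc_id,datetime
1,11,44,2019-06-09
2,33,55,2020-06-09
3,22,66,2020-06-09
4,11,44,2019-06-09
5,11,44,2020-02-22
"""

latest = latest_match(io.StringIO(SAMPLE), 11, 44)
print(latest['id'], latest['datetime'])  # prints: 5 2020-02-22
```

With a real file, the same function can be called as latest_match(open('sample.csv', newline=''), 11, 44), ideally inside a with block so the file is closed afterwards.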
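If you really want to do this in plain Python, something like the following is simple:
import csv

with open('sample.csv') as csv_file:
    list_of_dates = []
    for data in csv.DictReader(csv_file):
        # values read from the CSV are strings
        if data['bb_id'] == '11' and data['cc_id'] == '44':
            list_of_dates.append(data['datetime'])

# list.sort() sorts in place and returns None, so do not assign its result
list_of_dates.sort()
print(list_of_dates[-1])  # you already know the values for bb and cc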
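Also try this variant, which keeps the whole row instead of only the date:
import csv

def sort_func(e):
    return e['datetime']

with open('sample.csv') as csv_file:
    list_of_rows = []
    for data in csv.DictReader(csv_file):
        # values read from the CSV are strings
        if data['bb_id'] == '11' and data['cc_id'] == '44':
            list_of_rows.append(data)

# sort in place by date; the last element is then the latest record
list_of_rows.sort(key=sort_func)
print(list_of_rows[-1])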
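Because the question mentions running this lookup interactively many times, another option is to read the file once and keep only the newest row per (bb_id, cc_id) pair in a dictionary. The following is a minimal sketch under that assumption; build_latest_index and SAMPLE are hypothetical names, and the dates are assumed to be in YYYY-MM-DD form so that date.fromisoformat can parse them.

```python
import csv
import io
from datetime import date


def build_latest_index(csv_file):
    """Map each (bb_id, cc_id) pair to its newest row in one pass over the file."""
    latest = {}
    for row in csv.DictReader(csv_file):
        key = (int(row['bb_id']), int(row['cc_id']))
        row_date = date.fromisoformat(row['datetime'])
        current = latest.get(key)
        # Keep the row only if it is strictly newer than the one stored so far.
        if current is None or row_date > current[0]:
            latest[key] = (row_date, row)
    # Drop the parsed dates and return just the rows.
    return {key: row for key, (_, row) in latest.items()}


# Sample data from the question, assumed to be comma-separated on disk.
SAMPLE = """id,bb_id,cc_id,datetime
1,11,44,2019-06-09
2,33,55,2020-06-09
3,22,66,2020-06-09
4,11,44,2019-06-09
5,11,44,2020-02-22
"""

index = build_latest_index(io.StringIO(SAMPLE))
print(index[(11, 44)]['datetime'])  # prints: 2020-02-22
```

After the index is built, each further query is a single dictionary lookup such as index.get((33, 55)), which returns None for a pair that never appears in the file. The trade-off is that the index has to be rebuilt if the CSV changes.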
The simplest way I know:
import pandas as pd
import pandasql as ps

# 'sample.csv' is the file name used in the question; replace it with your own path
sample_df = pd.read_csv('sample.csv')
result = ps.sqldf("""select *
                     from sample_df
                     where bb_id = 11
                       and cc_id = 44
                     order by datetime desc
                     limit 1""", locals())
print(result)
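If pandas and pandasql are not installed, roughly the same SQL query can be run with the standard library's sqlite3 module by loading the rows into an in-memory table. This is only a sketch: the function name latest_match_sql, the table name sample, and the comma-separated SAMPLE text are assumptions for illustration and are not part of the original answer.

```python
import csv
import io
import sqlite3


def latest_match_sql(csv_file, bb_id, cc_id):
    """Load the CSV into an in-memory SQLite table and query the newest match."""
    conn = sqlite3.connect(':memory:')
    conn.execute(
        'CREATE TABLE sample (id INTEGER, bb_id INTEGER, cc_id INTEGER, "datetime" TEXT)'
    )
    # Convert the id columns to int; the date stays as ISO text.
    rows = (
        (int(r['id']), int(r['bb_id']), int(r['cc_id']), r['datetime'])
        for r in csv.DictReader(csv_file)
    )
    conn.executemany('INSERT INTO sample VALUES (?, ?, ?, ?)', rows)
    # ISO dates order correctly as text, so ORDER BY ... DESC puts the newest first.
    result = conn.execute(
        'SELECT id, bb_id, cc_id, "datetime" FROM sample '
        'WHERE bb_id = ? AND cc_id = ? '
        'ORDER BY "datetime" DESC LIMIT 1',
        (bb_id, cc_id),
    ).fetchone()
    conn.close()
    return result


# Sample data from the question, assumed to be comma-separated on disk.
SAMPLE = """id,bb_id,cc_id,datetime
1,11,44,2019-06-09
2,33,55,2020-06-09
3,22,66,2020-06-09
4,11,44,2019-06-09
5,11,44,2020-02-22
"""

print(latest_match_sql(io.StringIO(SAMPLE), 11, 44))  # prints: (5, 11, 44, '2020-02-22')
```

The function returns None when no row matches. For repeated queries it would make sense to keep the connection open instead of reloading the file on every call.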