如何解决如何从文件名中提取元素并将其移动到不同的列?
我有一个文件名,我将其转换为列表。该列表包含以下元素:
list = ['15253_Variation.JPG','15253_Variation_Tainted.JPG','15253_Variation_O2_Saxophone.PNG','15253_Variation_O2_Saxophone.jpg','15253_Variation_O2_Saxophone_reference.png','15253_Variation_Side1.JPG','15253_Variation_Side2.JPG']
我的目标是从此列表中提取元素并填写一个数据框,该数据框应如下所示:
链接到包含上述图像的Google表格:https://docs.google.com/spreadsheets/d/1kuX3M4RFCNWtNoE7Hm1ejxWMwF-Cs4p8SsjA3JzdidA/edit?usp=sharing
我完成的操作如下代码:
Obj = pd.DataFrame(data = list,index = None,columns = ['file'])
new_list = []
for i in Obj['file']:
new_list.append(i.split('_'))
但是,这个人不会留下任何空白,因此无法满足我的需求。
非常感谢您。
解决方法
根据评论数。这很痛苦,因为文件名中的标记不是完全固定的格式。很多条件逻辑
- 已经定义了两个附加列表
mi
工具和oxygen
,无论它是什么。 - 第一遍是建立
dict
这是熊猫的标准格式 - 然后在具有基本数据帧之后通过条件逻辑进行工作
# don't name it list - it override python list()!
l = ['15253_Variation.JPG','15253_Variation_Tainted.JPG','15253_Variation_O2_Saxophone.PNG','15253_Variation_O2_Saxophone.jpg','15253_Variation_O2_Saxophone_reference.png','15253_Variation_Side1.JPG','15253_Variation_Side2.JPG']
issues = ["Tainted","Perfect"]
mi = ["Saxophone"]
oxygen = ["O2"]
# first pass using dict/list comprehensions
df = pd.DataFrame({"filename":{i:f.split(".")[0] for i,f in enumerate(l)},"Number":{i:f.split("_")[0] for i,"Name":{i:f.split("_")[1].split(".")[0] for i,"Location2":{},"Issues":{},"Oxygen":{},"Location":{},"Musical Instrument":{},"Ref":{},"Extension":{i:f.split(".")[1] for i,f in enumerate(l)}})
df = df.assign(**{
# list of tokens for checking fixed lists against
"Tokens":lambda dfa: dfa.apply(lambda s: s["filename"].split("_")[2:],axis=1),"Issues":lambda dfa: dfa["Tokens"].apply(lambda s: s[np.where(np.isin(s,issues))[0][0]]
if np.isin(s,issues).any() else np.nan),"Musical Instrument":lambda dfa: dfa["Tokens"].apply(lambda s: s[np.where(np.isin(s,mi))[0][0]]
if np.isin(s,mi).any() else np.nan),"Oxygen":lambda dfa: dfa["Tokens"].apply(lambda s: s[np.where(np.isin(s,oxygen))[0][0]]
if np.isin(s,oxygen).any() else np.nan),}).assign(**{
# let's do tokens again minus ones already placed
"Tokens":lambda dfa: dfa.apply(lambda s: [t for t in s["filename"].split("_")[2:]
if not(t==s["Issues"]
or t==s["Musical Instrument"]
or t==s["Oxygen"])],"Location2":lambda dfa: dfa.apply(lambda s: s["Tokens"][0] if len(s["Tokens"])>0
and "Side" in s["Tokens"][0] else np.nan,"Ref":lambda dfa: dfa.apply(lambda s: s["Tokens"][0] if len(s["Tokens"])>0
and "Side" not in s["Tokens"][0] else np.nan,axis=1)
}).drop(columns=["Tokens","filename"])
print(df.to_string(index=False))
输出
Number Name Location2 Issues Oxygen Location Musical Instrument Ref Extension
15253 Variation NaN NaN NaN NaN NaN NaN JPG
15253 Variation NaN Tainted NaN NaN NaN NaN JPG
15253 Variation NaN NaN O2 NaN Saxophone NaN PNG
15253 Variation NaN NaN O2 NaN Saxophone NaN jpg
15253 Variation NaN NaN O2 NaN Saxophone reference png
15253 Variation Side1 NaN NaN NaN NaN NaN JPG
15253 Variation Side2 NaN NaN NaN NaN NaN JPG
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。