如何解决熊猫复制列元素并基于相关列表应用于另一列
这是一个棘手的问题,很久以来我一直在head头。我有以下数据框。
PID label Drug Value
123 ABC Big 1
123 ABC Sul 2
132 ABC DPP 3
132 ABC sglt 4
143 ABC insu 1
143 ABC Sul 2
154 ABC Big 1
零件数据框仅供参考,我还有另一段代码,将提供相关零件的列表以及每个商店的主要零件。
#对于商店A->电视:['remote','antenna','peaker'];商店B->单元格:['显示','触摸板'] 我期望的数据帧是:
dct = {'Store': ('A','A','B','C','C'),'code_num':('INC101','INC102','INC103','INC104','INC105','INC106','INC201','INC202','INC203','INC301','INC302','INC303'),'days':('4','18','9','15','3','6','10','5','1','8','5'),'products': ('remote','antenna','remote,antenna','TV','display','display,touchpad','speaker','Cell','antenna')
}
df = pd.DataFrame(dct)
pts = {'Primary': ('TV','Cell'),'Related' :('remote','touchpad')
}
parts = pd.DataFrame(pts)
print(df)
Store code_num days products
0 A INC101 4 remote
1 A INC102 18 antenna
2 A INC103 9 remote,antenna
3 A INC104 15 TV
4 A INC105 3 display
5 A INC106 6 TV
6 B INC201 10 display,touchpad
7 B INC202 5 speaker
8 B INC203 3 Cell
9 C INC301 1 display
10 C INC302 8 speaker
11 C INC303 5 antenna
我有适合一次性执行整个df的代码。但是由于其他业务规则,这将是一片数据。含义2和3将被省略,因此,.iloc值对于某些记录可能有所不同。因此,如果您在 如果需要更多信息,请告诉我。 我知道它非常复杂,实际上是一个脑筋急转弯。
解决方法
复制了方案:
您的输入:
dct = {'Store': ('A','A','B','C','C'),'code_num':('INC101','INC102','INC103','INC104','INC105','INC106','INC201','INC202','INC203','INC301','INC302','INC303'),'days':('4','18','9','15','3','6','10','5','1','8','5'),'products': ('remote','antenna','remote,antenna','TV','display','display,touchpad','speaker','Cell','antenna')
}
df = pd.DataFrame(dct)
pts = {'Primary': ('TV','Cell'),'Related' :('remote','touchpad')
}
parts = pd.DataFrame(pts)
store = {'A':'TV','B':'Cell'}
解决方案:
将df部分转换为Dictionary:
parts_df_dict = dict(zip(parts['Related'],parts['Primary']))
拆分逗号分隔的子产品,并使其分隔行:
new_df = pd.DataFrame(df.products.str.split(',').tolist(),index=df.code_num).stack()
new_df = new_df.reset_index([0,'code_num'])
new_df.columns = ['code_num','Prod_seperated']
new_df = new_df.merge(df,on='code_num',how='left')
创建引用列的逻辑:
store_prod = {}
for k,v in store.items():
store_prod[k] = k+'_'+v
new_df['prod_store'] = new_df['Store'].map(store_prod)
new_df['p_store'] = new_df['Store'].map(store)
new_df['main_ind'] = ' '
new_df.loc[(new_df['prod_store']==new_df['Store']+'_'+new_df['Prod_seperated'])&(new_df['days'].astype('int')<10),'main_ind']=new_df['code_num']
refer_dic = new_df.groupby('Store')['main_ind'].max().to_dict()
new_df['prod_subproducts'] = new_df['Prod_seperated'].map(parts_df_dict)
new_df['refer'] = np.where((new_df['p_store']==new_df['prod_subproducts'])&(new_df['days'].astype('int')<=10),new_df['Store'].map(refer_dic),np.nan)
new_df['refer'].fillna(new_df['main_ind'],inplace=True)
new_df.drop(['Prod_seperated','prod_store','p_store','main_ind','prod_subproducts'],axis=1,inplace=True)
new_df.drop_duplicates(inplace=True)
new_df或必需的输出:
如果您有任何疑问,请告诉我。