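How to append a dictionary to a dictionary in a for loop?
I am trying to create a dictionary in which the value for each key is a nested dictionary with two entries.
I have two lists of patient barcodes (normal tissue and disease tissue) that correspond to columns of values in a dataframe. My goal is to match the patients that appear in both lists and then, for each such patient, add their normal-tissue and disease-tissue values to a dictionary. The dictionary key will be the patient barcode, and the dictionary value will be another dictionary of the form normal tissue: values pulled from the dataframe, disease tissue: values pulled from the dataframe. So, starting from:
In [3]: df = pd.DataFrame({
   ...:     'Patient1_Normal': ['nan', 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, 'nan'],
   ...:     'Patient1_Disease': [0.12, 0.06, 0.19, 0.34, 'nan', 'nan', 0.73, 0.91],
   ...:     'Patient2_Disease': ['nan', 'nan', 'nan', 1.0, 0.24, 0.67, 0.97, 0.98],
   ...:     'Patient3_Normal': [0.21, 0.25, 0.63, 0.92, 0.3, 0.56, 0.78, 0.9],
   ...:     'Patient3_Disease': [0.11, 0.45, 'nan', 0.45, 0.22, 0.89, 0.17, 0.12],
   ...:     'Patient4_Normal': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91],
   ...:     'Patient4_Disease': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79],
   ...:     'Patient5_Disease': [0.34, 0.27, 'nan', 0.16, 0.32, 0.27, 0.55, 0.51]})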
In [4]: df
Out[4]:
   Patient1_Normal  Patient1_Disease  Patient2_Disease  Patient3_Normal  Patient3_Disease  Patient4_Normal  Patient4_Disease  Patient5_Disease
0              nan              0.12               nan             0.21              0.11              nan               nan              0.34
1             0.01              0.06               nan             0.25              0.45             0.35               nan              0.27
2              0.1              0.19               nan             0.63               nan              nan              0.56               nan
3             0.16              0.34                 1             0.92              0.45             0.22              0.72              0.16
4             0.88               nan              0.24             0.30              0.22             0.45               nan              0.32
5             0.83               nan              0.67             0.56              0.89             0.66              0.97              0.27
6             0.82              0.73              0.97             0.78              0.17             0.21              0.91              0.55
7              nan              0.91              0.98             0.90              0.12             0.91              0.79              0.51
This is what I have so far:
D_col = [col for col in df if '_Disease' in col]
N_col = [col for col in df if '_Normal' in col]
paired_patients = {}
psi_sets = {}
psi_sets['d'] = []
psi_sets['n'] = []
for patient in N_col:
    patient_id = patient[0:8]
    n_id = patient
    d_id = [i for i in D_col if patient_id in i]
    if len(d_id) > 0:
        psi_sets['n'] = df[n_id].to_list()
        for d in d_id:
            psi_sets['d'] = df[d].to_list()
        paired_patients[patient_id] = psi_sets
However, the values in my paired_patients dictionary are being overwritten rather than appended, so the output of paired_patients looks like this:
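{'Patient1': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79], 'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]},
 'Patient3': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79], 'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]},
 'Patient4': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79], 'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]}}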
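How can I fix the last part of my code so that the paired_patients dictionary values are added correctly for each patient, and the paired_patients dictionary looks like this: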
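{'Patient1': {'d': [0.12, 0.06, 0.19, 0.34, 'nan', 'nan', 0.73, 0.91], 'n': ['nan', 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, 'nan']},
 'Patient3': {'d': [0.11, 0.45, 'nan', 0.45, 0.22, 0.89, 0.17, 0.12], 'n': [0.21, 0.25, 0.63, 0.92, 0.3, 0.56, 0.78, 0.9]},
 'Patient4': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79], 'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]}}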
Solution
D_col = [col for col in df if '_Disease' in col]
N_col = [col for col in df if '_Normal' in col]
paired_patients = {}
for patient in N_col:
    # Create a new dict on every iteration so patients do not share one object
    psi_sets = {}
    patient_id = patient[0:8]
    n_id = patient
    d_id = [i for i in D_col if patient_id in i]
    if len(d_id) > 0:
        psi_sets['n'] = df[n_id].to_list()
        for d in d_id:
            psi_sets['d'] = df[d].to_list()
        paired_patients[patient_id] = psi_sets
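The reason this change matters is that assigning a dict to a key stores a reference to that dict, not a copy. The following is a minimal sketch with made-up values; the names `shared`, `buggy`, `fresh` and `fixed` are illustrative and do not appear in the original code.

```python
# Buggy pattern: one dict object is created once and reused for every key.
shared = {}
buggy = {}
for key, value in [("Patient1", 1), ("Patient3", 3)]:
    shared["n"] = value
    buggy[key] = shared          # every key refers to the same object

# Fixed pattern: a new dict is created on each iteration.
fixed = {}
for key, value in [("Patient1", 1), ("Patient3", 3)]:
    fresh = {}
    fresh["n"] = value
    fixed[key] = fresh           # each key gets its own object

print(buggy)  # {'Patient1': {'n': 3}, 'Patient3': {'n': 3}}
print(fixed)  # {'Patient1': {'n': 1}, 'Patient3': {'n': 3}}
print(buggy["Patient1"] is buggy["Patient3"])  # True
```

In the buggy variant the last patient processed determines what every key shows, which matches the overwritten output described in the question.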
Alternatively, you can use df.melt, pd.concat, Series.str.split, df.replace, df.groupby and df.xs, and finally df.to_dict.
See the following:
>>> df2 = (pd.concat([
...     df.melt().variable.str.split('_', expand=True),
...     df.melt().drop(columns='variable')
... ], axis=1)
...     .replace({'Normal': 'n', 'Disease': 'd'})
...     .groupby([0, 1]).agg(list))
>>> paired_patients = {k: v for k, v in
...                    df2.groupby(level=0)
...                       .apply(lambda df: df.xs(df.name).value.to_dict())
...                       .to_dict().items()
...                    if not ({'d', 'n'} ^ v.keys())}
>>> paired_patients
{'Patient1': {'d': [0.12, 0.06, 0.19, 0.34, 'nan', 'nan', 0.73, 0.91], 'n': ['nan', 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, 'nan']},
 'Patient3': {'d': [0.11, 0.45, 'nan', 0.45, 0.22, 0.89, 0.17, 0.12], 'n': [0.21, 0.25, 0.63, 0.92, 0.3, 0.56, 0.78, 0.9]},
 'Patient4': {'d': ['nan', 'nan', 0.56, 0.72, 'nan', 0.97, 0.91, 0.79], 'n': ['nan', 0.35, 'nan', 0.22, 0.45, 0.66, 0.21, 0.91]}}
EXPLANATION:
>>> df.melt()
            variable value
0    Patient1_Normal   NaN
1    Patient1_Normal  0.01
2    Patient1_Normal  0.10
..               ...   ...
62  Patient5_Disease  0.55
63  Patient5_Disease  0.51
>>> df.melt().variable.str.split('_',expand=True)
           0        1
0   Patient1   Normal
1   Patient1   Normal
2   Patient1   Normal
..       ...      ...
62  Patient5  Disease
63  Patient5  Disease
[64 rows x 2 columns]
# then concat these two, drop the 'variable' column, and replace 'Normal' and
# 'Disease' with 'n' and 'd'
>>> pd.concat([
...     df.melt().variable.str.split('_', expand=True),
...     df.melt().drop(columns='variable')
... ], axis=1).replace({'Normal': 'n', 'Disease': 'd'})
           0  1  value
0   Patient1  n    NaN
1   Patient1  n   0.01
2   Patient1  n   0.10
..       ... ..    ...
62  Patient5  d   0.55
63  Patient5  d   0.51
[64 rows x 3 columns]
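The reshaping steps above can be tried in isolation on a very small frame. This is only a sketch: the frame `small` and the names `long`, `parts` and `tidy` are hypothetical and chosen for illustration.

```python
import pandas as pd

# Hypothetical two-row frame with one patient, used only for illustration.
small = pd.DataFrame({"P1_Normal": [0.1, 0.2], "P1_Disease": [0.3, 0.4]})

# Wide to long: one row per (column name, value) pair.
long = small.melt()                                   # columns: 'variable', 'value'

# Split 'P1_Normal' into patient ('P1') and tissue ('Normal') columns.
parts = long["variable"].str.split("_", expand=True)  # columns: 0 and 1

# Put the split columns next to the values and shorten the tissue labels.
tidy = pd.concat([parts, long.drop(columns="variable")], axis=1)
tidy = tidy.replace({"Normal": "n", "Disease": "d"})

print(tidy)
```

The result has the integer-named columns 0 (patient) and 1 (tissue) plus `value`, which is the shape the later groupby step expects.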
# then groupby column [0,1] and aggregate into list:
>>> df2 = _.groupby([0,1]).agg(list)
>>> df2
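            value
0        1
Patient1 d  [0.12, 0.06, 0.19, 0.34, nan, nan, 0.73, 0.91]
         n  [nan, 0.01, 0.1, 0.16, 0.88, 0.83, 0.82, nan]
Patient2 d  [nan, nan, nan, 1.0, 0.24, 0.67, 0.97, 0.98]
Patient3 d  [0.11, 0.45, nan, 0.45, 0.22, 0.89, 0.17, 0.12]
         n  [0.21, 0.25, 0.63, 0.92, 0.3, 0.56, 0.78, 0.9]
Patient4 d  [nan, nan, 0.56, 0.72, nan, 0.97, 0.91, 0.79]
         n  [nan, 0.35, nan, 0.22, 0.45, 0.66, 0.21, 0.91]
Patient5 d  [0.34, 0.27, nan, 0.16, 0.32, 0.27, 0.55, 0.51]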
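To see how grouped lists like these map onto a nested dictionary, one option is to walk the grouped result with a plain loop. This is a sketch on a hypothetical frame `tidy` with made-up values, and it is a simpler stand-in for illustration rather than the answer's own `xs`-based method.

```python
import pandas as pd

# Hypothetical long-format data: column 0 is the patient, column 1 the tissue.
tidy = pd.DataFrame({
    0: ["P1", "P1", "P1", "P1", "P2", "P2"],
    1: ["n", "n", "d", "d", "d", "d"],
    "value": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})

# One list of values per (patient, tissue) pair; the index is a MultiIndex.
grouped = tidy.groupby([0, 1])["value"].agg(list)

# Turn the (patient, tissue) -> list mapping into patient -> {tissue: list}.
nested = {}
for (patient, tissue), values in grouped.items():
    nested.setdefault(patient, {})[tissue] = values

print(nested)
# {'P1': {'d': [0.3, 0.4], 'n': [0.1, 0.2]}, 'P2': {'d': [0.5, 0.6]}}
```

Patients with only one tissue type, such as `P2` here, still appear at this stage and need to be filtered out afterwards.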
# Now group by level=0 and convert each group into a dict, keeping only the
# patients for which both 'n' and 'd' are present as keys (for this data the
# test is equivalent to the symmetric-difference check on the dict_keys
# object, not ({'d','n'} ^ v.keys()), used earlier)
>>> paired_patients = {k: v for k, v in
...                    df2.groupby(level=0)
...                       .apply(lambda df: df.xs(df.name).value.to_dict())
...                       .to_dict().items()
...                    if ('n' in v) and ('d' in v)}
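The two filtering styles used in this answer can be compared on a small, made-up dictionary. In this sketch the names `candidates`, `required`, `by_membership` and `by_symdiff` are hypothetical; it relies on the fact that `dict.keys()` returns a set-like view that supports the `^` operator.

```python
# Hypothetical per-patient dicts: P1 has both tissues, P2 only disease.
candidates = {
    "P1": {"d": [0.3, 0.4], "n": [0.1, 0.2]},
    "P2": {"d": [0.5, 0.6]},
}
required = {"d", "n"}

# Style 1: explicit membership tests.
by_membership = {k: v for k, v in candidates.items()
                 if ("n" in v) and ("d" in v)}

# Style 2: symmetric difference is empty only when the keys are exactly
# {'d', 'n'}, so an empty result means the patient is paired.
by_symdiff = {k: v for k, v in candidates.items()
              if not (required ^ v.keys())}

print(by_membership)  # {'P1': {'d': [0.3, 0.4], 'n': [0.1, 0.2]}}
print(by_symdiff)     # {'P1': {'d': [0.3, 0.4], 'n': [0.1, 0.2]}}
```

The two styles differ if a dict carries extra keys: the membership test still accepts it, while the symmetric-difference test rejects anything whose keys are not exactly 'd' and 'n'.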