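How to normalize multiple columns of list/tuple data
I have a DataFrame with several columns of tuple data. I am trying to normalize the data inside the tuple in every row of each column. Here is an example with lists, but the same idea should apply to tuples: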
import numpy as np
import pandas as pd
from sklearn import preprocessing
df = pd.DataFrame(np.random.randn(5,10),columns=['a','b','c','d','e','f','g','h','i','j'])
df['arr1'] = df[['a','e']].values.tolist()
df['arr2'] = df[['f','j']].values.tolist()
If I only wanted to normalize the list in each row for a couple of columns, I would do this:
df['arr1'] = [preprocessing.scale(row) for row in df['arr1']]
df['arr2'] = [preprocessing.scale(row) for row in df['arr2']]
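However, my original dataset has about 100 such columns, so I obviously don't want to normalize each one by hand. How can I loop over all of the columns?
Solution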
You can iterate over the DataFrame's columns and process each one like this:
for col in df.columns:
    df[col] = [preprocessing.scale(row) for row in df[col]]
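Of course, this only works if you want to process all of the columns in the DataFrame, which means every column has to hold lists or tuples (in the example above, the scalar columns a through j would have to be excluded). If you only need a subset, you can either build a list of column names first or drop the other columns.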
# Here's an example where you manually specify the columns
cols_to_process = ["arr1","arr2"]
for col in cols_to_process:
    df[col] = [preprocessing.scale(row) for row in df[col]]
# Here's an example where you drop the unwanted columns first
# (every column that remains must hold lists or tuples)
cols_to_drop = ["a","b","c","d","e","f","g","h","i","j"]
df = df.drop(columns=cols_to_drop)
for col in df.columns:
    df[col] = [preprocessing.scale(row) for row in df[col]]
# Or, if you didn't want to actually drop the columns
# from the original DataFrame you could do it like this:
cols_to_drop = ["a","b","c","d","e","f","g","h","i","j"]
for col in df.drop(columns=cols_to_drop):
    df[col] = [preprocessing.scale(row) for row in df[col]]
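With around 100 columns, typing out either list by hand can get tedious. One possible alternative, shown here only as a sketch, is to detect the list/tuple columns automatically. The helper names list_like_columns and scale_list_columns are made up for this illustration and are not part of pandas or scikit-learn, and the check assumes every cell in a target column is a plain Python list or tuple.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing


def list_like_columns(df):
    """Hypothetical helper: names of columns whose every cell is a list or tuple."""
    return [
        col for col in df.columns
        if df[col].map(lambda v: isinstance(v, (list, tuple))).all()
    ]


def scale_list_columns(df):
    """Hypothetical helper: return a copy with each list/tuple cell scaled on its own."""
    out = df.copy()
    for col in list_like_columns(out):
        # preprocessing.scale standardizes each row's values to zero mean, unit variance
        out[col] = [preprocessing.scale(row) for row in out[col]]
    return out


# Same setup as in the question, with a seeded generator for repeatability
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 10)), columns=list("abcdefghij"))
df["arr1"] = df[["a", "e"]].values.tolist()
df["arr2"] = df[["f", "j"]].values.tolist()

scaled = scale_list_columns(df)
print(list_like_columns(df))  # ['arr1', 'arr2']
```

Returning a copy leaves the original DataFrame untouched, and the scalar columns a through j are skipped rather than dropped.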