pandas DataFrame (Python): check whether a string appears in another column, ignoring upper/lower case

My DataFrame has the same requirement as the one in "pandas dataframe check if column contains string that exists in another column":

Name       Description
Am         Owner of Am
BQ         Employee at bq  
JW         Employee somewhere

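I want to check whether the Name also appears in the Description. If it does, keep the row; if not, drop it. In this case, the third row (JW, Employee somewhere) should be dropped.

I am using: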

df[df.apply(lambda x: x['Name'] in x['Description'],axis = 1)]

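In this case it also drops the BQ row, because "bq" is lowercase in the description. Is there a way to keep the same syntax but ignore case?

Solution

Call .lower() on both sides to make the comparison case-insensitive: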

df[df.apply(lambda x: x['Name'].lower() in x['Description'].lower(),axis=1)]
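
For reference, here is a minimal self-contained sketch that rebuilds the sample table from the question and compares the original case-sensitive filter with the .lower() version. The variable names `case_sensitive` and `case_insensitive` are only for illustration.

```python
import pandas as pd

# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    "Name": ["Am", "BQ", "JW"],
    "Description": ["Owner of Am", "Employee at bq", "Employee somewhere"],
})

# Case-sensitive check: the BQ row is lost because "BQ" is not in "Employee at bq"
case_sensitive = df[df.apply(lambda x: x["Name"] in x["Description"], axis=1)]

# Case-insensitive check: lowercase both sides before the substring test
case_insensitive = df[
    df.apply(lambda x: x["Name"].lower() in x["Description"].lower(), axis=1)
]

print(case_sensitive)
print(case_insensitive)
```

The first result keeps only the Am row, while the second keeps both Am and BQ.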

Note that this treats "am" as a match inside "amy". You may want to use word boundaries to prevent that:

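>>> import re
>>> def name_in_description(x):
...     # re.escape guards against regex metacharacters in Name
...     return bool(re.search(rf"(?i)\b{re.escape(x['Name'])}\b", x["Description"]))
...
>>> df[df.apply(name_in_description, axis=1)]
  Name     Description
0   Am     Owner of Am
1   BQ  Employee at bq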

Alternatively, split() avoids problems with regex special characters in the names:

df[df.apply(lambda x: x["Name"].lower() in x["Description"].lower().split(),axis=1)]
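
The three strategies discussed so far (plain substring, whitespace split, and a word-boundary regex) behave differently on edge cases. The sketch below compares them on a few hypothetical rows that are not part of the original question; the helper names and the use of `re.escape` are assumptions made for illustration.

```python
import re
import pandas as pd

# Hypothetical rows chosen to expose the edge cases of each approach
df = pd.DataFrame({
    "Name": ["Am", "Am", "C++"],
    "Description": ["Friend of Amy", "Owner of Am.", "Writes c++ daily"],
})

def substring_match(row):
    # Plain substring test: also matches "am" inside "amy"
    return row["Name"].lower() in row["Description"].lower()

def split_match(row):
    # Whole-token test: punctuation stays attached to tokens ("am." != "am")
    return row["Name"].lower() in row["Description"].lower().split()

def regex_match(row):
    # Word-boundary test: \b needs a word character next to it,
    # so a name ending in "+" followed by a space does not match
    pattern = rf"\b{re.escape(row['Name'])}\b"
    return bool(re.search(pattern, row["Description"], flags=re.IGNORECASE))

results = pd.DataFrame({
    "substring": df.apply(substring_match, axis=1),
    "split": df.apply(split_match, axis=1),
    "regex": df.apply(regex_match, axis=1),
})
print(results)
```

On this data the substring test accepts all three rows, split accepts only the "C++" row, and the regex accepts only the "Owner of Am." row, so the right choice depends on what the names and descriptions actually look like.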

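Another suggestion is to split the description on spaces so that only whole words match. Note that this version is still case-sensitive, so on its own it drops the BQ row: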

df[df.apply(lambda x: x['Name'] in x['Description'].split(' '),axis = 1)]

You can also combine lower, split, and isin:

msk = df.Description.str.lower().str.split(expand=True).isin(df.Name.str.lower()).any(axis=1)
df[msk]

Output:

  Name     Description
0   Am     Owner of Am
1   BQ  Employee at bq

Details
First, convert the strings to lowercase with str.lower:

print(df.Description.str.lower())
0           owner of am
1        employee at bq
2    employee somewhere
Name: Description, dtype: object

Then split the strings and expand the resulting lists into columns:

print(df.Description.str.lower().str.split(expand=True))
          0          1     2
0     owner         of    am
1  employee         at    bq
2  employee  somewhere  None

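Then use isin with the lowercased df.Name to check the values. When DataFrame.isin is given a Series, pandas aligns it on the index, so the words in each row are compared with that row's own name: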

print(df.Description.str.lower().str.split(expand=True).isin(df.Name.str.lower()))
       0      1      2
0  False  False   True
1  False  False   True
2  False  False  False

Finally, apply any along axis 1 (across the columns of each row) to see whether at least one word matches:

print(df.Description.str.lower().str.split(expand=True).isin(df.Name.str.lower()).any(axis=1))
0     True
1     True
2    False
dtype: bool
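
The steps above can be put together into one runnable sketch. It rebuilds the sample table from the question; the intermediate names `names`, `words`, and `filtered` are only for illustration. The axis is passed by keyword (`axis=1`) because recent pandas versions no longer accept it positionally.

```python
import pandas as pd

# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    "Name": ["Am", "BQ", "JW"],
    "Description": ["Owner of Am", "Employee at bq", "Employee somewhere"],
})

# Step 1: lowercase both columns
names = df["Name"].str.lower()

# Step 2: split each description into one word per column
words = df["Description"].str.lower().str.split(expand=True)

# Step 3: mark the cells that equal the name, then reduce across each row
msk = words.isin(names).any(axis=1)

filtered = df[msk]
print(filtered)
```

Like the split-based apply version, this matches whole whitespace-separated words only, so punctuation attached to a word would prevent a match.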
