How to find the same job title when it is written in different ways?
I want to find people whose resumes contain a job title such as Market Research Coordinator, but they may write it in different ways, for example:
Marketing Research Coordinator
Market Researching Coordinator
Markets Research Coordinator
Market Researches Coordinator
Markets Researchers Coordinator
Market Researcher Coordinators
Marketing Researcher Coordinators
...
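If I compare the titles with ==, I won't get good results, and stemming and lemmatization also have a hard time recognizing these differences.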
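Another option is to use a similarity measure between the two strings (which is discussed in this question), but that would be very time-consuming and may not be a good approach. With this method, choosing the threshold is also a problem.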
Does anyone have a clever idea?
Solutions
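I don't accept that stemming and lemmatization don't work! You can tokenize your input and then take the stem of each token. Make sure that "marketing" gives you "market"; this depends on the correct language being selected in your stemming package, so check that. You should also make sure to apply stemming to both sides of the comparison in your if statement!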
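The "tokenize first, then stem both sides" idea can be sketched with the standard library only. The `crude_stem` function below is a hypothetical toy suffix stripper standing in for a real stemmer such as NLTK's Porter stemmer; it is not the Porter algorithm, and its suffix list is an assumption tuned only to the titles in this question.

```python
import re
from typing import Tuple

# Hypothetical toy suffix rules, longest first; NOT the Porter algorithm.
SUFFIXES = ("ing", "ers", "es", "er", "s")


def crude_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize_title(title: str) -> Tuple[str, ...]:
    """Tokenize into lowercase words, then stem every token."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return tuple(crude_stem(t) for t in tokens)


target = normalize_title("Market Research Coordinator")
candidates = [
    "Marketing Research Coordinator",
    "Markets Researchers Coordinator",
    "Marketing Researcher Coordinators",
    "Senior Advertising Analyst",
]
for title in candidates:
    # The same normalization is applied to BOTH sides before comparing with ==.
    print(title, "->", normalize_title(title) == target)
```

With these rules all the variants from the question reduce to `("market", "research", "coordinator")`, while an unrelated title does not. A real stemmer will behave differently on other words, so test it on your own data.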
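If there are spelling mistakes or small differences, you can use a Levenshtein package and accept inputs whose similarity is greater than a threshold T.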
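A minimal pure-Python sketch of the Levenshtein approach, assuming you do not want to depend on a particular package: `levenshtein` computes the edit distance with the classic dynamic-programming recurrence, and `similarity` normalizes it to [0, 1] by dividing by the longer string's length (one common normalization, chosen here as an assumption). The threshold `T = 0.85` is a hypothetical value you would have to tune.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity: 1.0 means identical strings."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


T = 0.85  # hypothetical threshold; tune it on real resumes
target = "Market Research Coordinator"
for title in ["Markets Research Coordinator", "Marketting Research Coordinator",
              "Senior Advertising Analyst"]:
    score = similarity(target, title)
    print(f"{title}: {score:.3f} accepted={score > T}")
```

On its own this is sensitive to suffixes: "Marketing Researcher Coordinators" is 6 edits away from the target and scores about 0.818, below this T, which is why the answer suggests combining it with tokenization and stemming.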
Example:
from nltk.stem.porter import PorterStemmer

p_stemmer = PorterStemmer()
print("the stem of marketing:", p_stemmer.stem('Marketing'))
print("the stem of marketing research:", p_stemmer.stem('Marketing Research'))
The result will be:
the stem of marketing: market   (correct)
the stem of marketing research: marketing research   (not what we want)
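As you can see, if the input is not tokenized first, the stemmer treats the whole phrase as a single word and does not work as expected.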
I suggest combining all of these (tokenization, stemming, and Levenshtein distance).
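One way to combine the three steps, sketched with the standard library only: tokenize, stem each token, then compare the normalized titles with a Levenshtein-based similarity so that leftover spelling mistakes are tolerated. `crude_stem` is a hypothetical toy suffix stripper standing in for a real stemmer (not the Porter algorithm), and the default threshold of 0.85 is an assumed value.

```python
import re

SUFFIXES = ("ing", "ers", "es", "er", "s")  # hypothetical toy rules, longest first


def crude_stem(token):
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(title):
    # Steps 1 and 2: tokenize into lowercase words, stem each, rejoin with spaces.
    return " ".join(crude_stem(t) for t in re.findall(r"[a-z]+", title.lower()))


def levenshtein(a, b):
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def matches(title, target, threshold=0.85):
    # Step 3: Levenshtein similarity on the normalized forms, accepted above threshold.
    a, b = normalize(title), normalize(target)
    longest = max(len(a), len(b)) or 1
    return 1 - levenshtein(a, b) / longest > threshold


target = "Market Research Coordinator"
for title in ["Marketing Researcher Coordinators", "Marketting Reserch Coordinator",
              "Marketing Researcher Executive", "Senior Advertising Analyst"]:
    print(title, "->", matches(title, target))
```

Stemming removes the suffix differences, so "Marketing Researcher Coordinators" becomes identical to the target, and the Levenshtein step then absorbs the remaining typos in "Marketting Reserch Coordinator" (2 edits after normalization).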
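Alternatively, you can use the Python package textdistance to compute the normalized similarity between strings and keep only the titles whose similarity is above a certain threshold.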
import textdistance

main_job = 'Marketing Research Coordinator'
other_jobs = ['Market Researching Coordinator','Markets Research Coordinator','Market Researches Coordinator','Marketing Research Coordinator','Markets Researchers Coordinator','Market Researcher Coordinators','Marketing Researcher Coordinators','Marketing Researcher Executive','Senior Advertising Analyst']

for job in other_jobs:
    similarity = textdistance.jaccard.normalized_similarity(main_job, job)
    print(f'Similarity "{main_job}" & "{job}": {similarity:.3f}')

Output:
Similarity "Marketing Research Coordinator" & "Market Researching Coordinator": 1.000
Similarity "Marketing Research Coordinator" & "Markets Research Coordinator": 0.871
Similarity "Marketing Research Coordinator" & "Market Researches Coordinator": 0.844
Similarity "Marketing Research Coordinator" & "Marketing Research Coordinator": 1.000
Similarity "Marketing Research Coordinator" & "Markets Researchers Coordinator": 0.794
Similarity "Marketing Research Coordinator" & "Market Researcher Coordinators": 0.818
Similarity "Marketing Research Coordinator" & "Marketing Researcher Coordinators": 0.909
Similarity "Marketing Research Coordinator" & "Marketing Researcher Executive": 0.579
Similarity "Marketing Research Coordinator" & "Senior Advertising Analyst": 0.436
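Look at the last two examples: the titles that are not variants of the target ("Marketing Researcher Executive" at 0.579 and "Senior Advertising Analyst" at 0.436) score clearly lower than all the real variants (0.794 and above), so in this example a threshold between those values separates them.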
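The first pair scores 1.000 even though the words are split differently. With its defaults (`qval=1`), textdistance's Jaccard appears to compare the titles as multisets of characters, which matches the numbers above (for example 27/31 ≈ 0.871 for "Markets Research Coordinator"), so word order and word boundaries are ignored. The sketch below is an illustrative reimplementation of that measure with `collections.Counter`, not textdistance's own code, plus a word-level variant for comparison.

```python
from collections import Counter


def char_multiset_jaccard(a: str, b: str) -> float:
    """|A & B| / |A | B| over character counts (order-insensitive, case-sensitive)."""
    ca, cb = Counter(a), Counter(b)
    union = sum((ca | cb).values())  # | takes the max count per character
    return sum((ca & cb).values()) / union if union else 1.0  # & takes the min


def word_jaccard(a: str, b: str) -> float:
    """Same idea over sets of lowercase words instead of characters."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0


main_job = "Marketing Research Coordinator"
for job in ["Market Researching Coordinator", "Markets Research Coordinator"]:
    print(job, round(char_multiset_jaccard(main_job, job), 3),
          round(word_jaccard(main_job, job), 3))
```

"Marketing Research" and "Market Researching" contain exactly the same characters, so the character-level score is 1.0, while the word-level score is only 0.2 because no word except "coordinator" matches exactly. Which behaviour is preferable depends on your data.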
Alternatively, use the following regular expression pattern to check whether a job title matches:
import re

pattern = r'Market(\w*?) Research(\w*?) Coordinator'

print('Enter job title')
job_title = input()
if re.search(pattern, job_title):
    print('Job title matching with Market Research Coordinator')
else:
    print('Job title not matching with Market Research Coordinator')
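A variant of the same pattern, sketched under the assumption that you want to scan many titles at once and that case and spacing may vary: compiling it with `re.IGNORECASE` and allowing any whitespace between the words with `\s+` makes it accept lowercase input and doubled spaces, and the leading `\b` keeps it from matching inside a longer word such as "Supermarket". The sample titles are taken from the question.

```python
import re

# Each word may carry any suffix (\w*), words may be separated by any whitespace,
# and matching ignores case.
TITLE_PATTERN = re.compile(r"\bMarket\w*\s+Research\w*\s+Coordinator\w*\b", re.IGNORECASE)

titles = [
    "Marketing Research Coordinator",
    "markets  researchers coordinator",
    "Market Researcher Coordinators",
    "Marketing Researcher Executive",
    "Senior Advertising Analyst",
]
matching = [t for t in titles if TITLE_PATTERN.search(t)]
print(matching)
```

Like the original pattern, this only handles suffix variations of the three fixed words; typos such as "Marketting" still match here, but a typo inside the word stem (for example "Reserch") does not.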