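How do I write a generic/flexible regular expression in Python?
I am learning regular expressions. As you know, a person may or may not have a middle name. I want to write a flexible regular expression that I can compile once and reuse later, but I have not managed to. Any suggestions or help would be much appreciated. Below is my regular expression for names without a middle name.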
import re
p = re.compile(r"\W+\s+(?P<firstname>\w+)\s+(?P<lastname>\w+)")
name = "John Drell"
m = p.search(name)
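I have no problem with names that lack a middle name. However, I cannot write a correct, flexible pattern for names that may or may not have one. Here is one of my test attempts.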
import re
p = re.compile(r"\W+\s+(?P<firstname>\w+)\s+(?:P<middlename>[A-Z]*)(?P<lastname>\w+)")
name = "John M. Drell"
m = p.search(name)
This script only accepts names that have a middle name; otherwise I get the error: 'NoneType' object has no attribute 'groups'.
I would be very grateful if you could correct me.
Solution
Use split():
names = ["John M. Drell", "John Drell"]
for name in names:
    # Starred unpacking collects zero or more middle names into a list
    firstname, *middlenames, lastname = name.split()
    print(f'First name: {firstname}, Middle name(s): {" ".join(middlenames)}, Last name: {lastname}')
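The unpacking above raises a ValueError when the input contains fewer than two words. A minimal sketch that wraps the same split() idea in a function and reports that case explicitly; the name `split_name` and the choice to return None for a missing middle name are assumptions made for illustration, not part of the original answer:

```python
def split_name(full_name):
    """Split a full name on whitespace into (first, middle, last).

    Hypothetical helper: `middle` is None when there is no middle name,
    and several middle names are joined with single spaces.
    """
    parts = full_name.split()
    if len(parts) < 2:
        # Starred unpacking needs at least a first and a last name
        raise ValueError(f"expected at least two words, got {full_name!r}")
    first, *middle, last = parts
    return first, " ".join(middle) or None, last


for name in ["John M. Drell", "John Drell", "John Maynard M. Drell"]:
    print(split_name(name))
```

Returning a tuple rather than printing keeps the parsing step reusable, which seems closer to the "compile once, use later" goal stated in the question.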
With a regular expression, learn to use an optional group, and \S to match any non-whitespace character:
^(?P<firstname>\S+)(?:\s+(?P<middlename>\S+(?: +\S+)*))?\s+(?P<lastname>\S+)$
Explanation
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (?P<firstname>           group and capture to "firstname":
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of "firstname"
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (?P<middlename>          group and capture to "middlename":
--------------------------------------------------------------------------------
      \S+                      non-whitespace (all but \n, \r, \t, \f,
                               and " ") (1 or more times (matching
                               the most amount possible))
--------------------------------------------------------------------------------
      (?:                      group, but do not capture (0 or more
                               times (matching the most amount
                               possible)):
--------------------------------------------------------------------------------
         +                       a literal space ' ' (1 or more times
                                 (matching the most amount possible))
--------------------------------------------------------------------------------
        \S+                      non-whitespace (all but \n, \r, \t,
                                 \f, and " ") (1 or more times
                                 (matching the most amount possible))
--------------------------------------------------------------------------------
      )*                       end of grouping
--------------------------------------------------------------------------------
    )                        end of "middlename"
--------------------------------------------------------------------------------
  )?                       end of grouping
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?P<lastname>            group and capture to "lastname":
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of "lastname"
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
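
To tie the pattern back to the question, here is a minimal sketch of compiling it once and reusing it. Two points from the question are worth noting: named groups are written `(?P<name>...)`, so `(?:P<middlename>...)` is a non-capturing group that tries to match the literal text "P<middlename>"; and search() returns None when nothing matches, which is where the 'NoneType' object has no attribute 'groups' error comes from. The sketch therefore checks for None before reading the groups. The names `NAME_RE` and `parse_name` are hypothetical, chosen only for this example.

```python
import re

# The pattern from the answer, split across lines for readability
NAME_RE = re.compile(
    r"^(?P<firstname>\S+)"
    r"(?:\s+(?P<middlename>\S+(?: +\S+)*))?"  # optional middle name(s)
    r"\s+(?P<lastname>\S+)$"
)


def parse_name(full_name):
    """Return a dict of name parts, or None if the pattern does not match."""
    m = NAME_RE.search(full_name)
    if m is None:
        # No match: avoid calling .groups()/.groupdict() on None
        return None
    # An optional group that did not participate is reported as None
    return m.groupdict()


for name in ["John M. Drell", "John Drell", "John"]:
    print(name, "->", parse_name(name))
```

With this approach a name without a middle name still matches, and its `middlename` entry is simply None.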