How to replace multiple substrings of a string?

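I would like to use the .replace function to replace multiple strings. I currently have

string.replace("condition1", "")

but would like to have something like

string.replace("condition1", "").replace("condition2", "text")

although that does not feel like good syntax. What is the proper way to do this? Something like how in grep/regex you can use \1 and \2 to replace fields with certain search strings.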

Solutions

Here is a short example that should do the trick with regular expressions:

import re

rep = {"condition1": "", "condition2": "text"}  # define desired replacements here

# use these three lines to do the replacement (text is the input string)
rep = dict((re.escape(k), v) for k, v in rep.items())
# Python 3 renamed dict.iteritems to dict.items; on Python 2 use rep.iteritems()
pattern = re.compile("|".join(rep.keys()))
text = pattern.sub(lambda m: rep[re.escape(m.group(0))], text)

For example:

>>> pattern.sub(lambda m: rep[re.escape(m.group(0))], "(condition1) and --condition2--")
'() and --text--'

You could just make a nice little looping function.

def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text

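where text is the complete string and dic is a dictionary; each definition is a string that will replace a match to the term.

Note: in Python 3, iteritems() has been replaced with items().

Careful: Python dictionaries do not have a reliable iteration order (versions before 3.7 do not guarantee one). This solution only solves your problem if:

- the order of replacements is irrelevant
- it is acceptable for a replacement to change the results of previous replacements

For instance: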

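d = {"cat": "dog", "dog": "pig"}
mySentence = "This is my cat and this is my dog."
mySentence = replace_all(mySentence, d)  # strings are immutable, so keep the returned value
print(mySentence)

Possible output #1: "This is my pig and this is my pig."

Possible output #2: "This is my dog and this is my pig."

One possible fix is to use an OrderedDict.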

from collections import OrderedDict

def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text

od = OrderedDict([("cat", "dog"), ("dog", "pig")])
mySentence = "This is my cat and this is my dog."
mySentence = replace_all(mySentence, od)  # keep the returned value
print(mySentence)

Output:

This is my pig and this is my pig.

Careful #2: this is inefficient if your text string is too big or there are many pairs in the dictionary.

Why not a solution like this?

s = "The quick brown fox jumps over the lazy dog"
for r in (("brown", "red"), ("lazy", "quick")):
    s = s.replace(*r)

# output will be: The quick red fox jumps over the quick dog
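Sequential replacement, as in the loop-based answers above, depends on the order of the pairs whenever one replacement produces text that a later pair matches. A minimal sketch of that effect, where replace_in_order is a hypothetical helper name used only for illustration:

```python
def replace_in_order(text, pairs):
    """Apply each (old, new) pair in turn; later pairs see earlier results."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text

sentence = "This is my cat and this is my dog."

# "cat" becomes "dog" first, and then every "dog" becomes "pig".
first = replace_in_order(sentence, [("cat", "dog"), ("dog", "pig")])
print(first)   # This is my pig and this is my pig.

# With the pairs reversed, the original "cat" only becomes "dog".
second = replace_in_order(sentence, [("dog", "pig"), ("cat", "dog")])
print(second)  # This is my dog and this is my pig.
```

Passing a list of pairs rather than a dictionary makes the order explicit, so it does not depend on the Python version.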

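Here is a variant of the first solution using reduce, in case you like being functional. :)

from functools import reduce  # reduce is a builtin only on Python 2

repls = {'hello': 'goodbye', 'world': 'earth'}
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls.items(), s)

martineau's even better version:

repls = ('hello', 'goodbye'), ('world', 'earth')
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls, s)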

This is just a more concise recap of F.J's and MiniQuark's great answers. All you need to achieve multiple simultaneous string replacements is the following function:

import re

def multiple_replace(string, rep_dict):
    # longest keys first, so a key that is a prefix of another key cannot shadow it
    pattern = re.compile("|".join([re.escape(k) for k in sorted(rep_dict, key=len, reverse=True)]), flags=re.DOTALL)
    return pattern.sub(lambda x: rep_dict[x.group(0)], string)

Usage:

>>> multiple_replace("Do you like cafe? No, I prefer tea.", {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'

If you wish, you can make your own dedicated replacement functions starting from this simpler one.

I built upon F.J.'s excellent answer:

import re

def multiple_replacer(*key_values):
    replace_dict = dict(key_values)
    replacement_function = lambda match: replace_dict[match.group(0)]
    pattern = re.compile("|".join([re.escape(k) for k, v in key_values]), re.M)
    return lambda string: pattern.sub(replacement_function,string)

def multiple_replace(string,*key_values):
    return multiple_replacer(*key_values)(string)

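One-shot usage:

>>> replacements = (u"café", u"tea"), (u"tea", u"café"), (u"like", u"love")
>>> print(multiple_replace(u"Do you like café? No, I prefer tea.", *replacements))
Do you love tea? No, I prefer café.

Note that since the replacement is done in just one pass, "café" changes to "tea", but it does not change back to "café".

If you need to do the same replacement many times, you can easily create a replacement function: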

>>> my_escaper = multiple_replacer(('"', '\\"'), ('\t', '\\t'))
>>> many_many_strings = (u'This text will be escaped by "my_escaper"',
...                      u'Does this work?\tYes it does',
...                      u'And can we span\nmultiple lines?\t"Yes\twe\tcan!"')
>>> for line in many_many_strings:
...     print(my_escaper(line))
...
This text will be escaped by \"my_escaper\"
Does this work?\tYes it does
And can we span
multiple lines?\t\"Yes\twe\tcan!\"

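Improvements:

- turned the code into a function
- added multiline support
- fixed a bug in escaping
- easy to create a function for a specific multiple replacement

Enjoy! :-)

I would like to propose the usage of string templates. Just place the strings to be replaced in a dictionary and all is set! Example from docs.python.org: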

>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
>>> d = dict(who='tim')
>>> Template('Give $who $100').substitute(d)
Traceback (most recent call last):
[...]
ValueError: Invalid placeholder in string: line 1, col 10
>>> Template('$who likes $what').substitute(d)
Traceback (most recent call last):
[...]
KeyError: 'what'
>>> Template('$who likes $what').safe_substitute(d)
'tim likes $what'

In my case, I needed a simple replacement of unique keys with names, so I came up with this:

>>> a = 'This is a test string.'
>>> b = {'i': 'I', 's': 'S'}
>>> for x, y in b.items():
...     a = a.replace(x, y)
...
>>> a
'ThIS IS a teSt StrIng.'
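When every key is a single character, as in the example above, the built-in str.translate is another option. This is only a sketch of an alternative, not part of the original answer, and it does not apply to multi-character keys:

```python
# Single-character keys only: build a translation table once, then
# apply every replacement in one pass over the string.
table = str.maketrans({"i": "I", "s": "S"})

result = "This is a test string.".translate(table)
print(result)  # ThIS IS a teSt StrIng.
```

Because the string is scanned once, a replacement can never be re-replaced by a later pair.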

Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), we can apply the replacements within a list comprehension:

# text = "The quick brown fox jumps over the lazy dog"
# replacements = [("brown", "red"), ("lazy", "quick")]
[text := text.replace(a, b) for a, b in replacements]
# text = 'The quick red fox jumps over the quick dog'

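Here is my $0.02. It is based on Andrew Clark's answer, just a little bit clearer, and it also covers the case where a string to replace is a substring of another string to replace (the longer string wins).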

import re

def multireplace(string, replacements):
    """
    Given a string and a replacement map, it returns the replaced string.

    :param str string: string to execute replacements on
    :param dict replacements: replacement dictionary {value to find: value to replace}
    :rtype: str

    """
    # Place longer ones first to keep shorter substrings from matching
    # where the longer ones should take place
    # For instance given the replacements {'ab': 'AB', 'abc': 'ABC'} against
    # the string 'hey abc', it should produce 'hey ABC' and not 'hey ABc'
    substrs = sorted(replacements, key=len, reverse=True)

    # Create a big OR regex that matches any of the substrings to replace
    regexp = re.compile('|'.join(map(re.escape, substrs)))

    # For each match, look up the new string in the replacements
    return regexp.sub(lambda match: replacements[match.group(0)], string)
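The effect of putting longer keys first can be checked with the example from the comments. The sketch below uses a hypothetical variant, multireplace_demo, whose longest_first switch is an assumption added purely for illustration so that both behaviours can be compared side by side:

```python
import re

def multireplace_demo(string, replacements, longest_first=True):
    """Single-pass replacement; optionally try longer keys first."""
    if longest_first:
        keys = sorted(replacements, key=len, reverse=True)
    else:
        keys = list(replacements)  # keep the dictionary's own order
    regexp = re.compile("|".join(map(re.escape, keys)))
    return regexp.sub(lambda m: replacements[m.group(0)], string)

reps = {"ab": "AB", "abc": "ABC"}

print(multireplace_demo("hey abc", reps))                       # hey ABC
print(multireplace_demo("hey abc", reps, longest_first=False))  # hey ABc
```

Regex alternation takes the first alternative that matches at a position, which is why "ab" wins over "abc" unless the longer key is listed first.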

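The function is also available as a gist; feel free to modify it if you have any suggestions.

I needed a solution where the strings to be replaced can be regular expressions, for example to help normalize a long text by replacing multiple whitespace characters with a single one. Building on answers from others, including MiniQuark and mmj, this is what I came up with: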

import re

def multiple_replace(string, reps, re_flags=0):
    """ Transforms string, replacing keys from reps with values.
    reps: dictionary, or list of key-value pairs (to enforce ordering;
          earlier items have higher priority).
          Keys are used as regular expressions.
    re_flags: interpretation of regular expressions, such as re.DOTALL
    """
    if isinstance(reps, dict):
        reps = list(reps.items())  # dict views are not indexable in Python 3
    pattern = re.compile("|".join("(?P<_%d>%s)" % (i, re_str[0])
                                  for i, re_str in enumerate(reps)), re_flags)
    return pattern.sub(lambda x: reps[int(x.lastgroup[1:])][1], string)

It works for the examples given in other answers, for example:

>>> multiple_replace("(condition1) and --condition2--",
...                  {"condition1": "", "condition2": "text"})
'() and --text--'

>>> multiple_replace('hello, world', {'hello': 'goodbye', 'world': 'earth'})
'goodbye, earth'

>>> multiple_replace("Do you like cafe? No, I prefer tea.",
...                  {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'

The main thing for me is that you can use regular expressions as well, for example to replace whole words only, or to normalize white space:

>>> s = "I don't want to change this name:\n  Philip II of Spain"
>>> re_str_dict = {r'\bI\b': 'You', r'[\n\t ]+': ' '}
>>> multiple_replace(s, re_str_dict)
"You don't want to change this name: Philip II of Spain"

If you want to use the dictionary keys as normal strings, you can escape them before calling multiple_replace, for example with this function:

def escape_keys(d):
    """ transform dictionary d by applying re.escape to the keys """
    return dict((re.escape(k), v) for k, v in d.items())

>>> multiple_replace(s, escape_keys(re_str_dict))
"I don't want to change this name:\n  Philip II of Spain"

The following function can help in finding erroneous regular expressions among your dictionary keys (since the error message from multiple_replace is not very telling):

def check_re_list(re_list):
    """ Checks if each regular expression in list is well-formed. """
    for i, e in enumerate(re_list):
        try:
            re.compile(e)
        except (TypeError, re.error):
            print("Invalid regular expression string "
                  "at position {}: '{}'".format(i, e))

>>> check_re_list(re_str_dict.keys())

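Note that it does not chain the replacements; it performs them simultaneously. This makes it more efficient without constraining what it can do. To mimic the effect of chaining, you may just need to add more string-replacement pairs and ensure the expected ordering of the pairs:

>>> multiple_replace("button", {"but": "mut", "mutton": "lamb"})
'mutton'
>>> multiple_replace("button", [("button", "lamb"),
...                             ("but", "mut"), ("mutton", "lamb")])
'lamb'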
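The difference between chained and simultaneous replacement shows up most clearly when two keys swap places. A minimal, self-contained sketch, using plain strings as keys (hence re.escape) and variable names chosen only for illustration:

```python
import re

text = "Do you like cafe? No, I prefer tea."
pairs = [("cafe", "tea"), ("tea", "cafe")]

# Chained: the second replacement also rewrites the output of the first.
chained = text
for old, new in pairs:
    chained = chained.replace(old, new)

# Simultaneous: one regex pass, each match is looked up exactly once.
mapping = dict(pairs)
pattern = re.compile("|".join(re.escape(k) for k in mapping))
single_pass = pattern.sub(lambda m: mapping[m.group(0)], text)

print(chained)      # Do you like cafe? No, I prefer cafe.
print(single_pass)  # Do you like tea? No, I prefer cafe.
```

Only the single-pass version actually swaps the two words; the chained version turns everything into "cafe".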

Here is a sample which is more efficient on long strings with many small replacements.

import re

source = "Here is foo, it does moo!"

replacements = {
    'is': 'was',  # replace 'is' with 'was'
    'does': 'did',
    '!': '?'
}

def replace(source, replacements):
    finder = re.compile("|".join(re.escape(k) for k in replacements.keys()))  # matches every string we want replaced
    result = []
    pos = 0
    while True:
        match = finder.search(source, pos)
        if match:
            # cut off the part up until match
            result.append(source[pos : match.start()])
            # cut off the matched part and replace it in place
            result.append(replacements[source[match.start() : match.end()]])
            pos = match.end()
        else:
            # the rest after the last match
            result.append(source[pos:])
            break
    return "".join(result)

print(replace(source, replacements))

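The point is to avoid many concatenations of long strings. We chop the source string into fragments, replacing some of the fragments as we build the list, and then join the whole thing back into a string.

You really should not do it this way, but I just find it way too cool: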

>>> replacements = {'cond1': 'text1', 'cond2': 'text2'}
>>> cmd = 'answer = s'
>>> for k, v in replacements.items():
...     cmd += ".replace(%r, %r)" % (k, v)  # %r quotes the strings in the generated code
...
>>> exec(cmd)

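Now, answer is the result of all the replacements in turn.

Again, this is very hacky and is not something that you should be using regularly. But it is nice to know that you can do something like this if you ever need to.

I don't know about speed, but this is my workaday quick fix:

from functools import reduce  # needed on Python 3

reduce(lambda a, b: a.replace(*b),
       [('o', 'W'), ('t', 'X')],  # iterable of pairs: (oldval, newval)
       'tomato'  # the string in which to replace values
       )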

... but I like the #1 regex answer above. Note: if one new value is a substring of another one, then the operation is not commutative.

You can use the pandas library and its replace function, which supports both exact matches and regex replacements. For example:

import pandas as pd

df = pd.DataFrame({'text': ['Billy is going to visit Rome in November', 'I was born in 10/10/2010', 'I will be there at 20:00']})

to_replace = ['Billy', 'Rome', 'January|February|March|April|May|June|July|August|September|October|November|December', r'\d{2}:\d{2}', r'\d{2}/\d{2}/\d{4}']
replace_with = ['name', 'city', 'month', 'time', 'date']

print(df.text.replace(to_replace, replace_with, regex=True))

The modified text is:

0    name is going to visit city in month
1                      I was born in date
2                 I will be there at time

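Note that the replacements on the text are done in the order in which they appear in the lists.

I was struggling with this problem as well. With many substitutions, regular expressions struggle and are about four times slower than looping string.replace (in my experiment conditions).

You should absolutely try the Flashtext library (it has a blog post and a GitHub repository). In my case it was a bit over two orders of magnitude faster, from 1.8 s to 0.015 s per document (regular expressions took 7.7 s).

It is easy to find usage examples at those links, but this is a working example: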

from flashtext import KeywordProcessor

# my_dict maps each keyword to its replacement; string is the text to process
processor = KeywordProcessor(case_sensitive=False)
for k, v in my_dict.items():
    processor.add_keyword(k, v)
new_string = processor.replace_keywords(string)

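Note that Flashtext makes the substitutions in a single pass (to avoid a --> b and b --> c translating 'a' into 'c'). Flashtext also looks for whole words (so 'is' will not match 'this'). It works fine if your target is several words (replacing 'This is' by 'Hello').

Starting from Andrew's valuable answer, I developed a script that loads the dictionary from a file and processes all the files in the opened folder to do the replacements. The script loads the mappings from an external file in which you can set the separator. I am a beginner, but I found this script very useful when doing multiple substitutions in multiple files. It loaded a dictionary with more than 1000 entries in seconds. It is not elegant, but it worked for me.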

import glob
import re

mapfile = input("Enter map file name with extension eg. codifica.txt: ")
sep = input("Enter map file column separator eg. |: ")
mask = input("Enter search mask with extension eg. 2010*txt for all files to be processed: ")
suff = input("Enter suffix with extension eg. _NEW.txt for newly generated files: ")

rep = {}  # creation of empty dictionary

with open(mapfile) as temprep:  # load the definitions into the dictionary, using the prompted separator
    for line in temprep:
        (key, val) = line.strip('\n').split(sep)
        rep[key] = val

# keys are deliberately not escaped, so the mapping can use re special characters:
# rep = dict((re.escape(k), v) for k, v in rep.items())
pattern = re.compile("|".join(rep.keys()))

for filename in glob.iglob(mask):  # loop over all the files matching the prompted mask

    with open(filename, "r") as textfile:  # load each file into the variable text
        text = textfile.read()

    # start replacement
    text = pattern.sub(lambda m: rep[m.group(0)], text)

    # write the output file using the prompted suffix
    with open(filename[:-4] + suff, "w") as target:
        target.write(text)

This is my solution to the problem. I used it in a chatbot to replace the different words at once.

def mass_replace(text, dct):
    new_string = ""
    old_string = text
    while len(old_string) > 0:
        s = ""
        sk = ""
        for k in dct.keys():
            if old_string.startswith(k):
                s = dct[k]
                sk = k
        if sk:  # test the matched key, so empty replacements also work
            new_string += s
            old_string = old_string[len(sk):]
        else:
            new_string += old_string[0]
            old_string = old_string[1:]
    return new_string

print(mass_replace("The dog hunts the cat", {"dog": "cat", "cat": "dog"}))

This prints: The cat hunts the dog

Another example: input list

error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']

The desired output would be

words = ['how', 'much', 'is', 'the', 'fish', 'no', 'really']

Code:

[n[0][0] if len(n[0]) else n[1] for n in [[[w.replace(e, "") for e in error_list if e in w], w] for w in words]]

I feel this question needs a single-line recursive lambda function answer for completeness, just because. So there:

>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.popitem()), d)

Usage:

>>> mrep('abcabc', {'a': '1', 'c': '2'})
'1b21b2'

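Notes:

- This consumes the input dict.
- Python dicts preserve key order as of 3.6; the corresponding caveats in other answers no longer apply.

For backward compatibility, one could resort to a tuple-based version:

>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.pop()), d)
>>> mrep('abcabc', [('a', '1'), ('c', '2')])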
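The remark that the dictionary version consumes its input can be checked directly. In the sketch below, passing a copy (dict(reps)) is one possible way to keep the caller's mapping intact; the variable names are illustrative only:

```python
mrep = lambda s, d: s if not d else mrep(s.replace(*d.popitem()), d)

reps = {"a": "1", "c": "2"}

# Pass a copy: the original mapping survives the call.
print(mrep("abcabc", dict(reps)))  # 1b21b2
print(reps)                        # {'a': '1', 'c': '2'}

# Pass the mapping itself: popitem() empties it as the recursion proceeds.
print(mrep("abcabc", reps))        # 1b21b2
print(reps)                        # {}
```

Note that popitem() removes the most recently inserted pair first, so the pairs are applied in reverse insertion order.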

Note: as with all recursive functions in Python, too large a recursion depth (i.e. too large a replacement dictionary) will result in an error.

Or just for a fast hack:

for line in to_read:
    read_buffer = line
    stripped_buffer1 = read_buffer.replace("term1", " ")
    stripped_buffer2 = stripped_buffer1.replace("term2", " ")
    write_to_file = to_write.write(stripped_buffer2)

Here is another way of doing it with a dictionary:

listA = "The cat jumped over the house".split()
modify = {word: word for number, word in enumerate(listA)}
modify["cat"], modify["jumped"] = "dog", "walked"
print(" ".join(modify[x] for x in listA))

I suggest the code should be:

z = "My name is Ahmed, and I like coding "
print(z.replace(" Ahmed", " Dauda").replace(" like", " Love"))

It will print out all the changes as requested.
