How can I split a string in Python that has multiple spaces in between?

Question

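I had a hard time explaining this in the title, so allow me to do it here.

I'm building a search interface for a utility I'm developing, one with Google-like filters.

It works fine when the query contains only one filter, but the problem appears when there are two or more.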

So, say I have a query like intitle:foo bar inbody: boo far

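The first filter reaches the second part of the loop and is interpreted correctly as {intitle:foo bar}, but the next one is printed in the first part of the loop as foo bar inbody, followed by its value boo far

What should happen is that each filter is recognized and isolated into its own pair (e.g. {intitle:foo bar} {inbody: boo far})

Below is the code that causes the problem.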

def ParseFilters(query):
    filterVals = []

    if ":" in query:
        query = query.split(":")

        for part in query:
            # This is the first part of the loop
            print(part)
            if part in filters:
                # This is the second part of the loop
                listIndex = query.index(part)
                filtering = query[listIndex + 1]

                for f in filters:
                    filtering = filtering.strip(f).lstrip()

                pair = {
                    part: filtering
                }
                
                print(pair)

                filterVals.append(pair)
    return filterVals

The filters table is:

filters = [
    "intitle",
    "inbody"
]
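
For reference, here is a small illustrative check (not part of the original code) of what the two string calls used above return for the sample query. The variable names parts and stripped are only for this illustration.

```python
query = "intitle:foo bar inbody: boo far"

# Splitting on ":" cuts only at the colons, so the second filter name
# stays attached to the end of the first filter's value.
parts = query.split(":")
print(parts)  # ['intitle', 'foo bar inbody', ' boo far']

# str.strip(chars) removes any of the given characters from both ends,
# not the substring as a whole, so it can eat into a real value.
stripped = "boo far".strip("inbody")
print(repr(stripped))  # ' far'
```

So the middle element mixes a value with the next filter's name, and strip is not a reliable way to separate them again.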

Answer 1

If I understand your requirements correctly, I would write it like this:

from collections import defaultdict

filters = [
    "intitle",
    "inbody"
]

query = 'intitle:foo bar inbody: boo far '

result = defaultdict(list)
current_filter = None
for elem in query.split():
    left, _, right = elem.partition(':')
    if left in filters:
        current_filter = left
        if right:
            result[current_filter].append(right)
    else:
        result[current_filter].append(left)

print(result)

Output:

defaultdict(<class 'list'>, {'intitle': ['foo', 'bar'], 'inbody': ['boo', 'far']})

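In my opinion this is slightly more declarative and will be easier to make more robust later. You can experiment with it until it meets your requirements. I recommend looking at str.partition, which is very useful for a lot of things like this. defaultdict works like a regular dictionary, except that it creates a default value (here an empty list) the first time a missing key is accessed.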
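
A minimal sketch of how the two building blocks behave on their own, in case they are unfamiliar. The names with_sep, without_sep, grouped and joined are made up for this illustration.

```python
from collections import defaultdict

# str.partition splits at the first occurrence of the separator and
# always returns a 3-tuple: (before, separator, after).
with_sep = "intitle:foo".partition(":")
without_sep = "bar".partition(":")
print(with_sep)     # ('intitle', ':', 'foo')
print(without_sep)  # ('bar', '', '')

# defaultdict(list) creates an empty list the first time a key is used,
# so append works without checking whether the key already exists.
grouped = defaultdict(list)
grouped["intitle"].append("foo")
grouped["intitle"].append("bar")

# Join the collected words if a single string per filter is preferred.
joined = {name: " ".join(words) for name, words in grouped.items()}
print(joined)  # {'intitle': 'foo bar'}
```

Because partition never raises when the separator is missing, the loop can treat filter tokens and plain words with the same call.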

Answer 2

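That's because when you do query.split(":"), your program has no way of knowing that inbody is a filter rather than part of the intitle value. A better approach is to use a regular expression to find all the filters and all the values, store them in separate lists (i.e. query_filters and query_values), and then build a dict: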

import re


filter_table = ["intitle", "inbody"]
query = "intitle:foo bar inbody: boo far"

# Create a regular expression to match filters
filters_re = re.compile(r"\s*[a-zA-Z]+\:\s*")

# Find all filters
query_filters = filters_re.findall(query)
# Find all values by splitting query at the values matched by filters_re
query_values = filters_re.split(query)

# Cleaning up the strings
query_filters = map(lambda x: x.strip().replace(":", ""), query_filters)
query_values = map(lambda x: x.strip(), filter(None, query_values))

# Make pairs
filter_pairs = zip(query_filters, query_values)

# Remove filters that are not in filter_table
filter_pairs = filter(lambda x: x[0] in filter_table, filter_pairs)

filter_dict = dict(filter_pairs)

print(filter_dict)

Or, if you like one-liners:

import re

filter_table = ["intitle", "inbody"]
query = "intitle:foo bar inbody: boo far"

filter_dict = dict(filter(lambda x: x[0] in filter_table, zip(re.findall(r"[a-zA-Z]+(?=\:)", query), map(lambda x: x.strip(), filter(None, re.split(r"\s*[a-zA-Z]+\:\s*", query))))))

print(filter_dict)
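
One caveat with the snippets above: the pattern matches any word followed by a colon, and pairing filters with values through zip goes out of step if the query starts with free text before the first filter. A possible variation, sketched under the assumption that only names from filter_table should count as filters, builds the pattern from the table itself. The function name parse_filters is hypothetical.

```python
import re

filter_table = ["intitle", "inbody"]


def parse_filters(query, names=filter_table):
    """Return {filter_name: value} for each known filter found in query."""
    # Alternation of the known names only, e.g. intitle|inbody
    names_re = "|".join(re.escape(name) for name in names)
    # A filter starts at a word boundary; its value runs lazily up to the
    # next known filter or the end of the string, with outer spaces dropped.
    pattern = re.compile(
        rf"\b({names_re}):\s*(.*?)\s*(?=\b(?:{names_re}):|$)"
    )
    # If a filter is repeated, the last occurrence wins.
    return {name: value for name, value in pattern.findall(query)}


print(parse_filters("intitle:foo bar inbody: boo far"))
# {'intitle': 'foo bar', 'inbody': 'boo far'}

# Leading free text is ignored, and a colon inside a value is kept.
print(parse_filters("python tips intitle:foo: bar"))
# {'intitle': 'foo: bar'}
```

This assumes single-line queries; text that appears before the first filter is simply skipped and would need separate handling if it matters.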


python

search

python-3.x