Count spaces in text (treat consecutive spaces as one)
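How can you count the number of spaces or newlines in a text so that consecutive spaces count as only one?
For example, this is very close to what I want: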
string = "This is an example text.\n But would be good if it worked."
counter = 0
for i in string:
    if i == ' ' or i == '\n':
        counter += 1
print(counter)
However, instead of 15, it should return 11.
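Just store the last character you found, setting it to i on every pass through the loop. Then, in your inner if, don't increment the counter if the last character found was also a whitespace character.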
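A minimal sketch of the "remember the last character" idea described above. The function name count_space_runs and the SPACE_CHARS set are illustrative choices, not part of the original answer:

```python
SPACE_CHARS = {" ", "\n"}  # the characters the question treats as "spaces"


def count_space_runs(text):
    """Count runs of spaces/newlines, counting consecutive ones only once."""
    counter = 0
    last = None  # last character seen; None before the first iteration
    for ch in text:
        # Count whitespace only when the character before it was not whitespace
        if ch in SPACE_CHARS and last not in SPACE_CHARS:
            counter += 1
        last = ch  # remember this character for the next iteration
    return counter


string = "This is an example text.\n But would be good if it worked."
print(count_space_runs(string))  # 11
```

A set is used instead of the string " \n" on purpose: with a string, the initial empty value "" would count as "in" it, because the empty string is a substring of every string.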
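You can iterate over numbers and use them as indices:
counter = 0
for i in range(1, len(string)):
    # Count a whitespace character only if the one before it is not whitespace
    if string[i] in ' \n' and string[i-1] not in ' \n':
        counter += 1
# The loop starts at index 1, so check the first character separately
if string[0] in ' \n':
    counter += 1
print(counter)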
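Note the special case for the first character: the loop starts at the second character because, for i = 0, string[i-1] would wrap around to string[-1] (the last character) instead of raising an error, so the first character is checked on its own. (string[0] does raise IndexError if the string is empty.)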
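The same check can be written without explicit indices by pairing each character with its predecessor. A sketch, using the hypothetical helper name count_runs_pairwise; itertools.chain supplies None as the "previous" character for position 0, which sidesteps the string[-1] wrap-around:

```python
from itertools import chain


def count_runs_pairwise(text, space_chars=" \n"):
    """Count runs of characters from space_chars, each run counted once."""
    # Pairs are (None, text[0]), (text[0], text[1]), ...; zip stops at len(text)
    pairs = zip(chain([None], text), text)
    return sum(
        1
        for prev, ch in pairs
        # A run starts at a space char whose predecessor is absent or not a space char
        if ch in space_chars and (prev is None or prev not in space_chars)
    )


string = "This is an example text.\n But would be good if it worked."
print(count_runs_pairwise(string))  # 11
```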
You can do it like this:
string = "This is an example text.\n But would be good if it worked."
counter = 0
# A boolean flag indicating whether the previous character was a space
previous = False
for i in string:
    if i == ' ' or i == '\n':
        # The current character is a space
        previous = True  # Setup for the next iteration
    else:
        # The current character is not a space, check if the previous one was
        if previous:
            counter += 1
        previous = False
# A run of spaces at the very end is never followed by a non-space, so count it here
if previous:
    counter += 1
print(counter)
re to the rescue.
>>> import re
>>> string = "This is an example text.\n But would be good if it worked."
>>> spaces = sum(1 for match in re.finditer(r'\s+', string))
>>> spaces
11
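This keeps memory use low, because re.finditer yields the matches one at a time instead of building a list. An alternative that does build a temporary list is:
>>> len(re.findall(r'\s+', string))
11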
If you only want to consider space characters and newlines (and not, for example, tabs), use the regular expression '(\n| )+' instead of '\s+'.
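A small comparison on a made-up string containing a tab shows the difference. Here the character class [ \n]+ is used; it matches the same runs as '(\n| )+' (with re.findall, a capturing group changes what each list element contains, but not how many elements there are):

```python
import re

text = "tab\there, spaces  and\nnewline"

# \s+ matches runs of any whitespace, tabs included
all_ws_runs = len(re.findall(r"\s+", text))
# [ \n]+ matches only runs of spaces and newlines, so the tab is ignored
space_nl_runs = len(re.findall(r"[ \n]+", text))

print(all_ws_runs, space_nl_runs)  # 4 3
```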
Assuming you are allowed to use Python regular expressions:
import re
print(len(re.findall(r"[ \n]+", string)))
Quick and easy!
Update: alternatively, use [\s] instead of [ \n] to match any whitespace character.
The default str.split() treats runs of consecutive whitespace as one. So simply split the string, take the length of the resulting list, and subtract one:
len(string.split()) - 1
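One caveat: str.split() with no arguments also discards leading and trailing whitespace, so len(string.split()) - 1 only counts the runs between words, and it gives -1 for an empty or all-whitespace string. A quick check on a made-up string:

```python
import re

text = "  leading and trailing  "

# split() ignores the outer whitespace, so only the 2 inner runs are counted
via_split = len(text.split()) - 1
# findall counts every whitespace run, including the leading and trailing ones
via_regex = len(re.findall(r"\s+", text))

print(via_split, via_regex)  # 2 4
print(len("".split()) - 1)   # -1
```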
You can use enumerate and check whether the next character is not also whitespace, so that consecutive whitespace counts only once:
string = "This is an example text.\n But would be good if it worked."
print(sum(ch.isspace() and not string[i:i+1].isspace() for i, ch in enumerate(string, 1)))
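The slice is what makes this safe at the end of the string: string[i:i+1] returns '' past the last character instead of raising IndexError, and ''.isspace() is False, so a trailing whitespace run is still counted. The same expression wrapped in a function (the name count_runs_enumerate is just for illustration):

```python
def count_runs_enumerate(text):
    # With enumerate(text, 1), i is the index of the character *after* ch
    return sum(
        # Slicing past the end yields "" rather than raising IndexError
        ch.isspace() and not text[i:i + 1].isspace()
        for i, ch in enumerate(text, 1)
    )


print(count_runs_enumerate("example text.\n But "))  # 3
print("abc"[3:4] == "")  # True
```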
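You can also use iter with a generator function, keeping track of the last character and comparing:
def con(s):
    it = iter(s)
    prev = next(it, None)
    if prev is None:
        return  # empty string: nothing to count
    for ele in it:
        # True where a whitespace character is followed by a non-whitespace one
        yield prev.isspace() and not ele.isspace()
        prev = ele
    # The last character ends a run if it is whitespace
    yield prev.isspace()
print(sum(con(string)))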
An itertools version:
from itertools import tee, zip_longest
string = "This is an example text.\n But would be good if it worked. "
a, b = tee(string)
next(b, None)  # advance b by one so it yields the following character
print(sum(x.isspace() and not y.isspace() for x, y in zip_longest(a, b, fillvalue="")))
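Try:
def word_count(my_string):
    # Returns (runs of ' ' after index 0) + 1, i.e. the number of words for text
    # without leading/trailing spaces. Only ' ' is checked, not '\n'.
    word_count = 1
    for i in range(1, len(my_string)):
        # Count a space only if it starts a new run of spaces
        if my_string[i] == " ":
            if not my_string[i - 1] == " ":
                word_count += 1
    return word_count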
You can use the groupby() function to find groups of consecutive spaces:
from collections import Counter
from itertools import groupby
s = 'This is an example text.\n But would be good if it worked.'
# Map '\n' to ' ' so that newlines and spaces fall into the same group
c = Counter(k for k, _ in groupby(s, key=lambda x: ' ' if x == '\n' else x))
print(c[' '])
# 11
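A variation on the same idea uses str.isspace as the grouping key, so that tabs and other whitespace also form groups. This is a sketch; count_runs_groupby is a hypothetical name:

```python
from itertools import groupby


def count_runs_groupby(text):
    # groupby emits one (key, group) pair per maximal run of equal keys, so
    # every run of whitespace shows up exactly once with key True
    return sum(1 for is_ws, _ in groupby(text, key=str.isspace) if is_ws)


s = 'This is an example text.\n But would be good if it worked.'
print(count_runs_groupby(s))  # 11
```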