For loop through stdin using previous item
I would like to compare a line to the previous one without storing anything in memory (no dictionaries).
Sample data:
a 2
file 1
file 2
file 4
for 1
has 1
is 2
lines 1
small 1
small 2
test 1
test 2
this 1
this 2
two 1
Pseudocode:
for line in sys.stdin:
    word, count = line.split()
    if word == previous_word:
        print(word, count1+count2)
I know I would use enumerate on an array, or dict.iteritems, but I can't do that with sys.stdin.
Desired output:
a 2
file 7
for 1
has 1
is 2
lines 1
small 3
test 3
this 3
two 1
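As an aside on the enumerate point: sys.stdin is an ordinary iterator over lines, so enumerate(sys.stdin) and other iterator tools work on it directly. A minimal sketch that pairs each line with the one before it, assuming io.StringIO as a stand-in for sys.stdin; the pairwise helper is the tee-based recipe from the itertools documentation (Python 3.10+ also ships it as itertools.pairwise), and compare_with_previous is a hypothetical name.

```python
import io
from itertools import tee

def pairwise(iterable):
    """Return an iterator of (previous, current) pairs, like itertools.pairwise."""
    previous, current = tee(iterable)
    next(current, None)  # advance the second iterator by one item
    return zip(previous, current)

def compare_with_previous(lines):
    """Yield (previous_word, word, same) for every line after the first."""
    for prev_line, line in pairwise(lines):
        prev_word = prev_line.split()[0]
        word = line.split()[0]
        yield prev_word, word, prev_word == word

# io.StringIO stands in for sys.stdin: both yield one line per iteration.
fake_stdin = io.StringIO("file 1\nfile 2\nfile 4\nfor 1\n")
for prev_word, word, same in compare_with_previous(fake_stdin):
    print(prev_word, word, same)
```

Pairing only compares two lines at a time, so summing a run of three (file 1, file 2, file 4) still needs a running total, which is what the answers below keep.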
To sum the counts of all preceding lines that share the same word, you need to maintain some state.
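This kind of job is usually a good fit for awk. Consider this command (leave off the trailing file argument to read from stdin instead):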
awk '{a[$1] += $2} p && p != $1 {print p, a[p]; delete a[p]} {p = $1}
END { print p, a[p] }' file
a 2
file 7
for 1
has 1
is 2
lines 1
small 3
test 3
this 3
two 1
By using delete, this solution does not keep the whole file in memory. State is only held while processing lines that share the same first word.
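For comparison, Python's itertools.groupby applies the same idea as the awk command: it reads its input lazily and keeps only the current key while it walks a run of equal keys. This is a swapped-in technique, not part of the answer above; the function name sum_runs is hypothetical, and like the awk version it assumes identical words are already adjacent, as they are in the sample data.

```python
import io
from itertools import groupby

def sum_runs(lines):
    """Yield (word, total) for each run of consecutive lines with the same first word."""
    # Generator expression: lines are split one at a time, never stored as a list.
    pairs = (line.split() for line in lines if line.strip())
    # groupby only groups *adjacent* items with equal keys, so input must be grouped.
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(int(count) for _, count in group)

# io.StringIO stands in for sys.stdin; pass sys.stdin directly in a real script.
sample = io.StringIO("a 2\nfile 1\nfile 2\nfile 4\nfor 1\n")
for word, total in sum_runs(sample):
    print(word, total)
```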
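The basic logic is to keep track of the previous word. If the current word matches it, add to the running count. If not, print the previous word and its count, then start over. There is some special-case code to handle the first and last iterations.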
# Sample lines standing in for sys.stdin
stdin_data = [
    "a 2",
    "file 1",
    "file 2",
    "file 4",
    "for 1",
    "has 1",
    "is 2",
    "lines 1",
    "small 1",
    "small 2",
    "test 1",
    "test 2",
    "this 1",
    "this 2",
    "two 1",
]
previous_word = ""
word_ct = 0

for line in stdin_data:
    word, count = line.split()
    if word == previous_word:
        word_ct += int(count)
    else:
        if previous_word != "":
            print(previous_word, word_ct)
        previous_word = word
        word_ct = int(count)

# Print the final word and count
print(previous_word, word_ct)
Output:
a 2
file 7
for 1
has 1
is 2
lines 1
small 3
test 3
this 3
two 1
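The separate print after the loop can be folded into the loop itself by appending a sentinel that never matches a real word. A minimal sketch of that variant, assuming None as the sentinel (a hypothetical choice) and a hypothetical generator name sum_consecutive; the first-line check still remains, so only the last-iteration special case goes away.

```python
from itertools import chain

def sum_consecutive(lines):
    """Yield (word, total) for each run of consecutive lines with the same word."""
    previous_word, word_ct = None, 0
    # A trailing None sentinel never equals a real word, so the final run is
    # flushed inside the loop instead of by a separate print after it.
    for line in chain(lines, [None]):
        word, count = (None, "0") if line is None else line.split()
        if word == previous_word:
            word_ct += int(count)
            continue
        if previous_word is not None:  # the first line has nothing to flush yet
            yield previous_word, word_ct
        previous_word, word_ct = word, int(count)

stdin_data = ["a 2", "file 1", "file 2", "file 4", "for 1"]
for word, total in sum_consecutive(stdin_data):
    print(word, total)
```

Because it is a generator over any iterable of lines, the same function can be called as sum_consecutive(sys.stdin) in a script.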
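Your code is almost there. While it's commendable not to want to store everything in memory, you will have to store the accumulated state of the previous line(s):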
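import sys

prev_word, prev_count = '', 0
for line in sys.stdin:
    word, count = line.split()
    count = int(count)
    if word == prev_word:
        # Same word as the previous line: keep accumulating
        prev_count += count
    else:
        # New word: emit the finished group (nothing to emit on the first line)
        if prev_word:
            print(prev_word, prev_count)
        prev_word, prev_count = word, count

# The loop never flushes the last group, so print it here
if prev_word:
    print(prev_word, prev_count)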
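All of the answers above only work because equal words sit next to each other, as they do in the sorted sample data. A minimal sketch that checks that assumption while streaming, still keeping only the previous word, assuming the input is sorted in code-point order (which LC_ALL=C sort produces; a locale-aware sort may order some lines differently). The name sum_sorted_runs is hypothetical; it also shows that enumerate works on any iterable of lines, sys.stdin included.

```python
def sum_sorted_runs(lines):
    """Sum counts for runs of equal words, raising ValueError on unsorted input."""
    prev_word, prev_count = None, 0
    for line_no, line in enumerate(lines, start=1):
        word, count = line.split()
        count = int(count)
        # Comparing with the previous word alone is enough to detect disorder.
        if prev_word is not None and word < prev_word:
            raise ValueError(f"line {line_no}: {word!r} sorts before {prev_word!r}")
        if word == prev_word:
            prev_count += count
        else:
            if prev_word is not None:
                yield prev_word, prev_count
            prev_word, prev_count = word, count
    if prev_word is not None:
        yield prev_word, prev_count

print(list(sum_sorted_runs(["small 1", "small 2", "test 1"])))
```

Since it is a generator, runs before the out-of-order line are still yielded before the ValueError is raised.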