Count number of occurrences of strings in one column based on value in other column (Python)
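Apologies in advance for this very basic question. I know there are posts about this everywhere, but even with all the help on other pages I can't seem to solve it.
First of all, I'm a Python beginner, so sorry for the messy code. I just want to count how many times a given string occurs in column 2 while the value in column 1 stays the same. When that value changes, the count should start over. It sounds simple, but I'm confused by Python reading my text file as strings (which gives me trouble with strip, split and so on). I can't seem to get this code working. Could someone please help this noob in distress!
Input: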
6 ABMV
6 ABMV
6 FOOD
6 FOOD
6 IDLE
10 IDLE
10 ABMV
10 IDLE
Code:
#! /usr/bin/env python
from collections import Counter
outfile = open ("counts_outfile.txt", "w")
with open("test_counts.txt", "r") as infile:
    lines = infile.readlines()
    for i, item in enumerate(lines):
        lines[i] = item.rstrip().split('\t')
    last_chimp = lines[0][0]
    behavior = lines[0][1]
    nr_ABMV = 0
    nr_FOOD = 0
    nr_IDLE = 0
    for lines in infile:
        chimp = lines[0][0]
        behavior = lines[0][1]
        if chimp == last_chimp:
            if behavior == "ABMV":
                nr_ABMV += 1
            elif behavior == "FOOD":
                nr_FOOD += 1
            elif behavior == "IDLE":
                nr_IDLE += 1
            else:
                continue
        else:
            outline = "chimp_header %s\t%s\t%s\t%s" % (last_chimp, nr_ABMV, nr_FOOD, nr_IDLE)
            outfile.write(outline)
            last_chimp == lines[0][0]
            nr_ABMV = 0
            nr_FOOD = 0
            nr_IDLE = 0
outfile.close()
Thanks in advance. You would be helping me, and apparently a lot of chimps, a great deal!!
Regards,
Here is an example that stays very close to your code:
outfile = open ("counts_outfile.txt", "w")
outfile.write("chimp_header {:>4} {:4} {:4} {:4}\r\n".format('chimp', 'ABMV', 'FOOD', 'IDLE'))
with open("test_counts.txt", "r") as infile:
    lines = [ line.strip() for line in infile if line.strip() ]
last_chimp = lines[0].split()[0]
behavior = { "ABMV":0, "FOOD":0, "IDLE":0 }
for line in lines :
    line_split = line.strip().split()
    chimp = line_split[0]
    if chimp != last_chimp :
        outfile.write("chimp_header {:>4} {:4} {:4} {:4}\r\n".format(last_chimp, behavior["ABMV"], behavior["FOOD"], behavior["IDLE"]))
        last_chimp = chimp
        behavior = { "ABMV":0, "FOOD":0, "IDLE":0 }
    behavior[line_split[1]] += 1
outfile.write("chimp_header {:>4} {:4} {:4} {:4}\r\n".format(last_chimp, behavior["ABMV"], behavior["FOOD"], behavior["IDLE"]))
outfile.close()
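As a side note on why the loop in the question produces nothing: `readlines()` already consumes the file, so the later `for lines in infile:` has nothing left to iterate over, and `last_chimp == lines[0][0]` is a comparison rather than an assignment. The small demo below illustrates both points. It is only a sketch, and `io.StringIO` is my stand-in for the opened `test_counts.txt` so that it runs without any file on disk.

```python
import io

# Assumption: io.StringIO stands in for the opened test_counts.txt.
infile = io.StringIO("6\tABMV\n6\tFOOD\n10\tIDLE\n")

lines = infile.readlines()   # reads every line and leaves the handle at end of file
leftover = list(infile)      # iterating the same handle again yields nothing
print(len(lines), leftover)

last_chimp = "6"
last_chimp == "10"           # comparison: the result is discarded, last_chimp is unchanged
unchanged = last_chimp
last_chimp = "10"            # assignment: this is what actually updates the variable
print(unchanged, last_chimp)
```

So the second loop should walk over the list that was already read (as the example above does), and the update of `last_chimp` needs a single `=`.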
Here is another example, using Counter and a dictionary:
from collections import Counter
with open("test_counts.txt", "r") as infile:
    lines = [ tuple(line.strip().split()) for line in infile if line.strip() ]
chimps = { line[0] : { "ABMV":0, "FOOD":0, "IDLE":0 } for line in lines }
for k, v in Counter(lines).items() :
    chimps[k[0]][k[1]] = v
with open("counts_outfile.txt", "w") as outfile :
    outfile.write("chimp_header {:>4} {:4} {:4} {:4}\r\n".format('chimp', 'ABMV', 'FOOD', 'IDLE'))
    for chimp in chimps :
        outfile.write("chimp_header {:>4} {:4} {:4} {:4}\r\n".format(chimp, chimps[chimp]["ABMV"], chimps[chimp]["FOOD"], chimps[chimp]["IDLE"]))
Both examples produce the same result:
chimp_header chimp ABMV FOOD IDLE
chimp_header    6    2    2    1
chimp_header   10    1    0    2
I hope this gives you some ideas.
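If you want the "start over whenever column 1 changes" behaviour spelled out directly, `itertools.groupby` is one more option, since it groups consecutive rows that share a key. This is only a sketch and not part of the examples above: `count_runs`, `BEHAVIORS` and the hard-coded `rows` list (copied from the sample input in the question) are names I made up for illustration.

```python
from collections import Counter
from itertools import groupby

BEHAVIORS = ("ABMV", "FOOD", "IDLE")  # hypothetical constant: the columns to report

def count_runs(rows):
    """Yield (chimp, Counter of behaviors) for each consecutive run of one chimp."""
    for chimp, group in groupby(rows, key=lambda row: row[0]):
        yield chimp, Counter(behavior for _, behavior in group)

# Sample rows copied from the question; in practice they would come from
# splitting each non-empty line of test_counts.txt.
rows = [
    ("6", "ABMV"), ("6", "ABMV"), ("6", "FOOD"), ("6", "FOOD"), ("6", "IDLE"),
    ("10", "IDLE"), ("10", "ABMV"), ("10", "IDLE"),
]

for chimp, counts in count_runs(rows):
    # A Counter returns 0 for a behavior that never occurred in this run.
    print(chimp, *(counts[b] for b in BEHAVIORS))
```

Note that `groupby` only merges adjacent rows, so a chimp that shows up again later in the file starts a new group. That matches the restart-on-change requirement, but it differs from the Counter and dictionary example, which adds up all rows for a chimp regardless of position.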