Counting the number of repeated entries in a file

Question

When I read the file, it gives me output like this:

CW  0.000000  0.003822  0.006380  0.005100  0.016987  0.307042
CW  0.007136  0.019635  0.329683  0.315180  0.302634  0.007076
CW  0.015666  0.299244  0.290860  0.292623  0.325943  0.005236
CS  0.022060  0.288761  0.311449  0.289165  0.289937  0.317213
CS  0.019635  0.040511  0.301167  0.011418  0.295902  0.017166
CS  0.020990  0.345277  0.352370  0.034237  0.020962  0.015749

I want to count the total number of CW and CS entries in the file. The output should look like this:

3 #For CW 
3 #For CS

I tried the following code:

with open("file", 'r') as rf:
    v = rf.read().split('\n')

i = []
for e in v[1::47]:  # (only the names)
    r = (e[:12])
    s = (r[:2])
    q = sum(c != ' ' for c in s)
    print(q)

But it gave me this output

I even tried importing Counter, but it gave me output like this:

C 1
W 1
C 1
W 1
C 1
S 1

Please suggest a way to get the expected output. Any help would be appreciated.

Answer 1

"I want to count the total number of CW and CS in the file."

Try this:

di = {}
with open("file", "r") as f:
    for l in f:
        fields = l.split()
        if not fields:  # skip blank lines
            continue
        key = fields[0]  # first column, e.g. "CW" or "CS"
        di[key] = di.get(key, 0) + 1


for k, v in di.items():
    print("Line: %s and Count: %d" % (k, v))

Output:

Line: CW and Count: 3
Line: CS and Count: 3

Answer 2

Do use Counter:

from collections import Counter
with open("xyz.txt") as f:
    c = Counter(line.split()[0] for line in f)
    for k,n in c.items():
        print(k, n)

Input file:

CW  0.000000  0.003822  0.006380  0.005100  0.016987  0.307042 1
CW  0.007136  0.019635  0.329683  0.315180  0.302634  0.007076 1
CW  0.015666  0.299244  0.290860  0.292623  0.325943  0.005236 1
CS  0.022060  0.288761  0.311449  0.289165  0.289937  0.317213 1
CS  0.019635  0.040511  0.301167  0.011418  0.295902  0.017166 1
CS  0.020990  0.345277  0.352370  0.034237  0.020962  0.015749 1

produces:

CW 3
CS 3
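
To get the exact format the question asks for, and to tolerate blank lines (where `line.split()[0]` would raise an IndexError), the Counter approach can be wrapped in a small function. This is only a minimal sketch: the name `count_labels` and the shortened sample rows are my own, chosen for illustration.

```python
import os
import tempfile
from collections import Counter


def count_labels(path):
    """Return a Counter of first-column labels (e.g. CW, CS) in the file."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields:  # skip blank lines instead of failing on fields[0]
                counts[fields[0]] += 1
    return counts


# Demo on a throwaway file; the rows are shortened, hypothetical sample data.
sample = "CW 0.000000 0.003822\nCW 0.007136 0.019635\n\nCS 0.022060 0.288761\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

counts = count_labels(tmp.name)
os.remove(tmp.name)  # clean up the temporary file

for label, n in counts.items():
    print(f"{n} #For {label}")
```

Returning the Counter rather than printing inside the function keeps the counts available for further use, for example `counts["CW"]`.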

Answer 3

Python 3.8.1. I hope this helps. I tried to write a working example with explanations so you can see what is going on.

# Global variables
file = "lista.txt"
countDictionary = {}

# Logic: read the file and count items
def ReadFile(fileName):
    # try is optional; it is used to catch errors and handle them
    # except is only needed because try is used
    try:
        # Open the file in read mode
        with open(fileName, mode="r") as f:
            # Read the first line
            line = f.readline()
            # For every line in this file
            while line:
                # Strip surrounding whitespace (e.g. \n, \r) and keep the first two characters
                # We will call it item (I assume that CW and CS are some data)
                item = line.strip()[:2]

                # Counting logic
                # A dictionary entry has two parts; I call them data and info
                # Data is the key (name/nickname/id) of the information
                # Info is the value (the information) for this data
                # First check whether data is new; if so, set info to the integer 1
                if item not in countDictionary:
                    countDictionary[item] = 1
                # If it is not new, update the count
                else:
                    info = countDictionary[item]    # get the current count
                    countDictionary[item] = info+1  # increase the count by one

                # Go to the next line by reading into line again
                # Without this the loop would run forever on the first line
                line = f.readline()

        # This is optional too. The with block already closes the file automatically
        # But I like to be sure
        f.close()

    # In case the file does not exist
    except FileNotFoundError:
        print(f"ERROR >> File \"{fileName}\" does not exist!")

# Execute the function
ReadFile(file)

# Test the dictionary counts
for k,j in countDictionary.items():
    print(k, ">>", j)

Console output:

========================= RESTART: D:\Python\Whosebug\help.py =========================
CW >> 3
CS >> 3
>>>

File lista.txt:

CW  0.000000  0.003822  0.006380  0.005100  0.016987  0.307042 1
CW  0.007136  0.019635  0.329683  0.315180  0.302634  0.007076 1
CW  0.015666  0.299244  0.290860  0.292623  0.325943  0.005236 1
CS  0.022060  0.288761  0.311449  0.289165  0.289937  0.317213 1
CS  0.019635  0.040511  0.301167  0.011418  0.295902  0.017166 1
CS  0.020990  0.345277  0.352370  0.034237  0.020962  0.015749 1
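
The "check whether the key is new, then set or increment" step in the function above is what `collections.defaultdict(int)` does automatically. Below is a compact sketch of the same idea. The function name `count_prefixes` and its `width` parameter are my own, and unlike ReadFile it returns the dictionary instead of filling a global and it skips empty lines.

```python
from collections import defaultdict


def count_prefixes(lines, width=2):
    """Count lines by their first `width` characters after stripping whitespace."""
    counts = defaultdict(int)  # a missing key starts at 0
    for line in lines:
        item = line.strip()[:width]
        if item:  # ignore empty lines
            counts[item] += 1
    return dict(counts)


# Hypothetical shortened rows; an open file object can be passed the same way.
rows = [
    "CW  0.000000  0.003822\n",
    "CW  0.007136  0.019635\n",
    "CS  0.022060  0.288761\n",
]
result = count_prefixes(rows)
for k, j in result.items():
    print(k, ">>", j)
```

Because the function accepts any iterable of strings, it can be tested with a plain list and later called as `count_prefixes(f)` inside a `with open(...) as f:` block.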

Answer 4

You can try the code below.

>>> text = '''CW  0.000000  0.003822  0.006380  0.005100  0.016987  0.307042
... CW  0.007136  0.019635  0.329683  0.315180  0.302634  0.007076
... CW  0.015666  0.299244  0.290860  0.292623  0.325943  0.005236
... CS  0.022060  0.288761  0.311449  0.289165  0.289937  0.317213
... CS  0.019635  0.040511  0.301167  0.011418  0.295902  0.017166
... CS  0.020990  0.345277  0.352370  0.034237  0.020962  0.015749'''
>>> items = [line.split()[0] for line in text.splitlines()]
>>> val = set([line.split()[0] for line in text.splitlines()])
>>> for item in val:
...     print(f'{items.count(item)} #For {item}')
...
3 #For CW
3 #For CS
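
One thing to keep in mind: `items.count(item)` rescans the whole list once per distinct label, and a set does not guarantee the order in which the labels are printed. A single pass with Counter avoids both issues. A rough sketch, with the `text` sample shortened for illustration:

```python
from collections import Counter

# Shortened version of the sample data, two columns per row
text = """CW  0.000000  0.003822
CW  0.007136  0.019635
CW  0.015666  0.299244
CS  0.022060  0.288761
CS  0.019635  0.040511
CS  0.020990  0.345277"""

# One pass over the lines; blank lines are skipped
tally = Counter(line.split()[0] for line in text.splitlines() if line.strip())

# most_common() sorts by count, highest first; ties keep first-seen order
for item, n in tally.most_common():
    print(f"{n} #For {item}")
```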
