Find occurrences in a list of strings

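I want to create a function, without external libraries, that finds the words in a list of strings, counts their occurrences only if the word has more than 3 characters, and then prints them in order.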

With this word list:

word_list = ['THE', 'ZEN', 'OF', 'PYTHON', 'BY', 'TIM', 'PETERS', 'BEAUTIFUL', 'IS', 'BETTER', 'THAN', 'UGLY', 'EXPLICIT', 'IS', 'BETTER', 'THAN', 'IMPLICIT', 'SIMPLE', 'IS', 'BETTER', 'THAN', 'COMPLEX', 'COMPLEX', 'IS', 'BETTER', 'THAN', 'COMPLICATED', 'FLAT', 'IS', 'BETTER', 'THAN', 'NESTED', 'SPARSE', 'IS', 'BETTER', 'THAN', 'DENSE', 'READABILITY', 'COUNTS', 'SPECIAL', 'CASES', 'ARENT', 'SPECIAL', 'ENOUGH', 'TO', 'BREAK', 'THE', 'RULES', 'ALTHOUGH', 'PRACTICALITY', 'BEATS', 'PURITY', 'ERRORS', 'SHOULD', 'NEVER', 'PASS', 'SILENTLY', 'UNLESS', 'EXPLICITLY', 'SILENCED', 'IN', 'THE', 'FACE', 'OF', 'AMBIGUITY', 'REFUSE', 'THE', 'TEMPTATION', 'TO', 'GUESS', 'THERE', 'SHOULD', 'BE', 'ONE', 'AND', 'PREFERABLY', 'ONLY', 'ONE', 'OBVIOUS', 'WAY', 'TO', 'DO', 'IT', 'ALTHOUGH', 'THAT', 'WAY', 'MAY', 'NOT', 'BE', 'OBVIOUS', 'AT', 'FIRST', 'UNLESS', 'YOURE', 'DUTCH', 'NOW', 'IS', 'BETTER', 'THAN', 'NEVER', 'ALTHOUGH', 'NEVER', 'IS', 'OFTEN', 'BETTER', 'THAN', 'RIGHT', 'NOW', 'IF', 'THE', 'IMPLEMENTATION', 'IS', 'HARD', 'TO', 'EXPLAIN', 'ITS', 'A', 'BAD', 'IDEA', 'IF', 'THE', 'IMPLEMENTATION', 'IS', 'EASY', 'TO', 'EXPLAIN', 'IT', 'MAY', 'BE', 'A', 'GOOD', 'IDEA', 'NAMESPACES', 'ARE', 'ONE', 'HONKING', 'GREAT', 'IDEA', '', 'LETS', 'DO', 'MORE', 'OF', 'THOSE']

Desired output:

Words with more than 3 letters

1 BETTER shows up 8 times
2 THAN shows up 8 times
.
.
.

You can use:

# Keep only words with strictly more than 3 characters
more_than_3 = [word for word in word_list if len(word) > 3]
more_than_3.count("BETTER")

Output:

8

Or:

from collections import Counter
more_than_3 = [word for word in word_list if len(word) > 3]

Counter(more_than_3)

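Output (note that it contains short words such as 'IS' and 'A', so it corresponds to Counter(word_list), the unfiltered list):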

Counter({'IS': 10, 'BETTER': 8, 'THAN': 8, 'THE': 6, 'TO': 5, 'OF': 3, 'ALTHOUGH': 3, 'NEVER': 3, 'BE': 3, 'ONE': 3, 'IDEA': 3, 'COMPLEX': 2, 'SPECIAL': 2, 'SHOULD': 2, 'UNLESS': 2, 'OBVIOUS': 2, 'WAY': 2, 'DO': 2, 'IT': 2, 'MAY': 2, 'NOW': 2, 'IF': 2, 'IMPLEMENTATION': 2, 'EXPLAIN': 2, 'A': 2, 'ZEN': 1, 'PYTHON': 1, 'BY': 1, 'TIM': 1, 'PETERS': 1, 'BEAUTIFUL': 1, 'UGLY': 1, 'EXPLICIT': 1, 'IMPLICIT': 1, 'SIMPLE': 1, 'COMPLICATED': 1, 'FLAT': 1, 'NESTED': 1, 'SPARSE': 1, 'DENSE': 1, 'READABILITY': 1, 'COUNTS': 1, 'CASES': 1, 'ARENT': 1, 'ENOUGH': 1, 'BREAK': 1, 'RULES': 1, 'PRACTICALITY': 1, 'BEATS': 1, 'PURITY': 1, 'ERRORS': 1, 'PASS': 1, 'SILENTLY': 1, 'EXPLICITLY': 1, 'SILENCED': 1, 'IN': 1, 'FACE': 1, 'AMBIGUITY': 1, 'REFUSE': 1, 'TEMPTATION': 1, 'GUESS': 1, 'THERE': 1, 'AND': 1, 'PREFERABLY': 1, 'ONLY': 1, 'THAT': 1, 'NOT': 1, 'AT': 1, 'FIRST': 1, 'YOURE': 1, 'DUTCH': 1, 'OFTEN': 1, 'RIGHT': 1, 'HARD': 1, 'ITS': 1, 'BAD': 1, 'EASY': 1, 'GOOD': 1, 'NAMESPACES': 1, 'ARE': 1, 'HONKING': 1, 'GREAT': 1, '': 1, 'LETS': 1, 'MORE': 1, 'THOSE': 1})
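To get from the Counter to the ranked listing the question asks for, most_common() can be combined with enumerate(). This is only a minimal sketch: sample_words is a hypothetical short list standing in for the full word_list, and the print format is copied from the desired output in the question.

```python
from collections import Counter

# Hypothetical short sample; substitute the full word_list from the question.
sample_words = ['BETTER', 'THAN', 'IS', 'BETTER', 'THAN', 'UGLY', 'BETTER', 'THE', '']

# Keep only words with strictly more than 3 characters.
long_words = [word for word in sample_words if len(word) > 3]

# most_common() returns (word, count) pairs sorted by descending count.
ranked = Counter(long_words).most_common()

print('Words with more than 3 letters')
for rank, (word, count) in enumerate(ranked, start=1):
    print(rank, word, 'shows up', count, 'times')
```

Note that collections is part of the standard library, so this adds no external dependency, but it may not qualify if "no libraries" is meant to exclude imports altogether.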

A simple approach using built-in Python functions:

keys = set(word_list)

values = [word_list.count(key) for key in keys]

for k, v in zip(keys, values):
    print('item', k, 'has count', v)

Output (a set is unordered, so the line order may vary between runs):

item EASY has count 1
item IS has count 10
item DENSE has count 1
item EXPLICITLY has count 1
item FIRST has count 1
item THE has count 6
item DUTCH has count 1
item ONE has count 3
item BEAUTIFUL has count 1
item TO has count 5
item LETS has count 1
item BREAK has count 1
item READABILITY has count 1
item THAT has count 1
item GREAT has count 1
item IF has count 2
item NOW has count 2
item GOOD has count 1
item ALTHOUGH has count 3
item WAY has count 2
item MORE has count 1
item NESTED has count 1
item SPARSE has count 1
item AND has count 1
item ERRORS has count 1
item ZEN has count 1
item BY has count 1
item SILENCED has count 1
item ITS has count 1
item BETTER has count 8
item OBVIOUS has count 2
item ONLY has count 1
item THOSE has count 1
item ARENT has count 1
item REFUSE has count 1
item EXPLICIT has count 1
item BAD has count 1
item COMPLEX has count 2
item SILENTLY has count 1
item BE has count 3
item COMPLICATED has count 1
item PETERS has count 1
item SHOULD has count 2
item PREFERABLY has count 1
item UNLESS has count 2
item RULES has count 1
item NAMESPACES has count 1
item THERE has count 1
item OF has count 3
item EXPLAIN has count 2
item IMPLEMENTATION has count 2
item HARD has count 1
item IN has count 1
item COUNTS has count 1
item NOT has count 1
item A has count 2
item YOURE has count 1
item PURITY has count 1
item NEVER has count 3
item IMPLICIT has count 1
item DO has count 2
item ARE has count 1
item BEATS has count 1
item HONKING has count 1
item AMBIGUITY has count 1
item PRACTICALITY has count 1
item RIGHT has count 1
item ENOUGH has count 1
item MAY has count 2
item UGLY has count 1
item SIMPLE has count 1
item TIM has count 1
item IT has count 2
item CASES has count 1
item FLAT has count 1
item FACE has count 1
item THAN has count 8
item AT has count 1
item TEMPTATION has count 1
item PYTHON has count 1
item SPECIAL has count 2
item PASS has count 1
item IDEA has count 3
item OFTEN has count 1
item GUESS has count 1
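Because the listing above comes out in arbitrary set order, it may help to sort the pairs before printing. The sketch below assumes a tie-breaking rule the question does not specify (alphabetical order among words with equal counts) and uses a hypothetical sample_words list in place of word_list.

```python
# Hypothetical short sample; substitute the full word_list from the question.
sample_words = ['IS', 'BETTER', 'THAN', 'IS', 'BETTER', 'IS', 'UGLY']

keys = set(sample_words)
pairs = [(key, sample_words.count(key)) for key in keys]

# Sort by descending count, then alphabetically to break ties (an assumption).
ordered = sorted(pairs, key=lambda pair: (-pair[1], pair[0]))

for key, count in ordered:
    print('item', key, 'has count', count)
```

Calling list.count() once per unique word rescans the list each time, which is fine for a list of this size but grows quadratically in the worst case.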

It seems you want to find a way to count the unique values in a list. Although you could use a combination of the set() function (which holds the unique values) and len(), as explained in this answer, you can also use:

#print(len(set(word_list)))
#86

# Map each word to its count; repeated words simply overwrite the same key
counts_unique_values = {word: word_list.count(word) for word in word_list}
print(counts_unique_values)

Output:

{'THE': 6, 'ZEN': 1, 'OF': 3, 'PYTHON': 1, 'BY': 1, 'TIM': 1, 'PETERS': 1, 'BEAUTIFUL': 1, 'IS': 10, 'BETTER': 8, 'THAN': 8, 'UGLY': 1, 'EXPLICIT': 1, 'IMPLICIT': 1, 'SIMPLE': 1, 'COMPLEX': 2, 'COMPLICATED': 1, 'FLAT': 1, 'NESTED': 1, 'SPARSE': 1, 'DENSE': 1, 'READABILITY': 1, 'COUNTS': 1, 'SPECIAL': 2, 'CASES': 1, 'ARENT': 1, 'ENOUGH': 1, 'TO': 5, 'BREAK': 1, 'RULES': 1, 'ALTHOUGH': 3, 'PRACTICALITY': 1, 'BEATS': 1, 'PURITY': 1, 'ERRORS': 1, 'SHOULD': 2, 'NEVER': 3, 'PASS': 1, 'SILENTLY': 1, 'UNLESS': 2, 'EXPLICITLY': 1, 'SILENCED': 1, 'IN': 1, 'FACE': 1, 'AMBIGUITY': 1, 'REFUSE': 1, 'TEMPTATION': 1, 'GUESS': 1, 'THERE': 1, 'BE': 3, 'ONE': 3, 'AND': 1, 'PREFERABLY': 1, 'ONLY': 1, 'OBVIOUS': 2, 'WAY': 2, 'DO': 2, 'IT': 2, 'THAT': 1, 'MAY': 2, 'NOT': 1, 'AT': 1, 'FIRST': 1, 'YOURE': 1, 'DUTCH': 1, 'NOW': 2, 'OFTEN': 1, 'RIGHT': 1, 'IF': 2, 'IMPLEMENTATION': 2, 'HARD': 1, 'EXPLAIN': 2, 'ITS': 1, 'A': 2, 'BAD': 1, 'IDEA': 3, 'EASY': 1, 'GOOD': 1, 'NAMESPACES': 1, 'ARE': 1, 'HONKING': 1, 'GREAT': 1, '': 1, 'LETS': 1, 'MORE': 1, 'THOSE': 1}

Option 2: You can convert the list to a pandas DataFrame by calling pd.DataFrame() directly, then use value_counts() to count the distinct values in the column:

import pandas as pd
#df = pd.DataFrame(word_list)
df = pd.DataFrame({'Text': word_list})
df.value_counts()

Output:

Text    
IS          10
BETTER       8
THAN         8
THE          6
TO           5
            ..
ITS          1
YOURE        1
IN           1
IMPLICIT     1
             1
Length: 86, dtype: int64

To cover this part of the question:

...and counts their occurrence only if the word has more than 3 characters, then prints them in order

you can use something like the following:

for word in word_list:
    if len(word) > 3:
        print(word)  # placeholder: handle each word longer than 3 characters here
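Putting the pieces together, one possible way to meet the original requirement with no imports at all is to count with a plain dict and then sort by count. This is a sketch under stated assumptions, not a definitive implementation: the function name count_long_words, its min_length parameter, and the sample_words list are all hypothetical, and words with equal counts are left in order of first appearance.

```python
def count_long_words(words, min_length=4):
    """Return (word, count) pairs for words with at least min_length
    characters, most frequent first; ties keep first-appearance order."""
    counts = {}
    for word in words:
        if len(word) >= min_length:  # min_length=4 means "more than 3 characters"
            counts[word] = counts.get(word, 0) + 1
    # sorted() is stable, so equal counts stay in insertion order.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical short sample; substitute the full word_list from the question.
sample_words = ['THE', 'ZEN', 'BETTER', 'THAN', 'UGLY', 'BETTER', 'THAN', 'BETTER']
result = count_long_words(sample_words)

print('Words with more than 3 letters')
for rank, (word, count) in enumerate(result, start=1):
    print(rank, word, 'shows up', count, 'times')
```

Counting in a single pass with dict.get() avoids calling list.count() for every word, which keeps the work proportional to the length of the list.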