How would I go about iterating through each row in a column and keeping a running tally of every substring that comes up? Python

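Basically, what I want to do is go row by row through the "External_Name" column and keep a count of every unique substring (word) that appears in the strings, somewhat like .value_counts().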

External_Name                               Specialty
ABMC Hyperbaric Medicine and Wound Care     Hyperbaric/Wound Care
ABMC Kaukauna Laboratory Services           Laboratory
AHCM Sinai Bariatric Surgery Clinic         General Surgery
...                                         ...
n                                           n

For example, after running through the first three rows of "External_Name", the output would look something like this:

Output Count
ABMC 2
Hyperbaric 1
Medicine 1
and 1
Wound 1
Care 1

And so on. Any help would be greatly appreciated!
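For reference, the literal row-by-row running tally described above can be sketched with the standard library's `collections.Counter`. This is a minimal sketch, assuming words are separated by whitespace; the `external_names` list is a hypothetical stand-in holding only the three sample rows (with pandas you would iterate over `df["External_Name"]` instead).

```python
from collections import Counter

# Hypothetical stand-in for the "External_Name" column; with pandas you could
# iterate over df["External_Name"] directly instead of this list.
external_names = [
    "ABMC Hyperbaric Medicine and Wound Care",
    "ABMC Kaukauna Laboratory Services",
    "AHCM Sinai Bariatric Surgery Clinic",
]

tally = Counter()
for name in external_names:
    # str.split() with no argument splits on any run of whitespace
    tally.update(name.split())
    # tally now holds the running counts for every row seen so far

# most_common() lists words from most to least frequent
for word, count in tally.most_common():
    print(word, count)
```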

You can use str.split() to split on whitespace, then explode to turn the resulting lists of words into separate rows, and count the values with value_counts.
>>> df.External_Name.str.split().explode().value_counts()
ABMC          2
Hyperbaric    1
Medicine      1
and           1
Wound         1
Care          1
Kaukauna      1
Laboratory    1
Services      1
AHCM          1
Sinai         1
Bariatric     1
Surgery       1
Clinic        1
Name: External_Name, dtype: int64
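A self-contained version of the one-liner above, assuming pandas 0.25 or later (the release that added `Series.explode`). The DataFrame is hypothetical and rebuilds only the three sample rows from the question:

```python
import pandas as pd

# Hypothetical DataFrame containing only the three sample rows from the question
df = pd.DataFrame(
    {
        "External_Name": [
            "ABMC Hyperbaric Medicine and Wound Care",
            "ABMC Kaukauna Laboratory Services",
            "AHCM Sinai Bariatric Surgery Clinic",
        ],
        "Specialty": ["Hyperbaric/Wound Care", "Laboratory", "General Surgery"],
    }
)

counts = (
    df["External_Name"]
    .str.split()      # each row becomes a list of words
    .explode()        # one word per row; the original index is repeated
    .value_counts()   # occurrences of each distinct word, most frequent first
)
print(counts)
```

The split is case-sensitive and purely whitespace-based, so "Care" and "care" (or "Care," with trailing punctuation) would be counted separately; applying `.str.lower()` before splitting is one option if that matters. On pandas 2.x the printed footer reads `Name: count` instead of `Name: External_Name`, but the counts are the same.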