How would I go about iterating through each row in a column and keeping a running tally of every substring that comes up? Python
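Basically, what I want to do is go through the "External_Name" column row by row and count how many times each unique substring appears across the strings, somewhat like .value_counts().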
External_Name | Specialty |
---|---|
ABMC Hyperbaric Medicine and Wound Care | Hyperbaric/Wound Care |
ABMC Kaukauna Laboratory Services | Laboratory |
AHCM Sinai Bariatric Surgery Clinic | General Surgery |
........... | ........... |
n | n |
For example, after running through the first three rows of "External_Name", the output would look something like
Output | Count |
---|---|
ABMC | 2 |
Hyperbaric | 1 |
Medicine | 1 |
and | 1 |
Wound | 1 |
Care | 1 |
and so on. Any help would be greatly appreciated!
You can use str.split() to split on whitespace, then explode to spread the resulting lists of words into separate rows, and value_counts to count the values.
>>> df.External_Name.str.split().explode().value_counts()
ABMC 2
Hyperbaric 1
Medicine 1
and 1
Wound 1
Care 1
Kaukauna 1
Laboratory 1
Services 1
AHCM 1
Sinai 1
Bariatric 1
Surgery 1
Clinic 1
Name: External_Name, dtype: int64
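For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the three sample rows from the question and reshapes the counts into the Output / Count layout the question asks for. The variable names `counts` and `tally` are just illustrative, and the DataFrame holds only the rows shown in the question. Note that in pandas 2.0 and later the resulting Series is named `count` rather than `External_Name`, so the last line of the printed output may differ from the one above.

```python
import pandas as pd

# Sample data copied from the first three rows of the question.
df = pd.DataFrame({
    "External_Name": [
        "ABMC Hyperbaric Medicine and Wound Care",
        "ABMC Kaukauna Laboratory Services",
        "AHCM Sinai Bariatric Surgery Clinic",
    ],
    "Specialty": ["Hyperbaric/Wound Care", "Laboratory", "General Surgery"],
})

# Split each name on whitespace, give every word its own row, then count.
counts = df["External_Name"].str.split().explode().value_counts()

# Reshape into a two-column table with the headers used in the question.
tally = counts.rename_axis("Output").reset_index(name="Count")

print(tally)
```

str.split() with no arguments splits on any run of whitespace, so repeated spaces in a name should not produce empty tokens. The count is case-sensitive, so "Care" and "care" would be tallied separately unless you lowercase the column first with .str.lower().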