How can I count comma-separated values in my DataFrame?
I am trying to work out how to use value_counts to count how many times each individual text value is listed in a column.
Sample data:
import pandas as pd

d = {'Title': ['Crash Landing on You', 'Memories of the Alhambra', 'The Heirs', 'While You Were Sleeping',
               'Something in the Rain', 'Uncontrollably Fond'],
     'Cast': ['Hyun Bin,Son Ye Jin,Seo Ji Hye', 'Hyun Bin,Park Shin Hye,Park Hoon', 'Lee Min Ho,Park Shin Hye,Kim Woo Bin',
              'Bae Suzy,Lee Jong Suk,Jung Hae In', 'Son Ye Jin,Jung Hae In,Jang So Yeon', 'Kim Woo Bin,Bae Suzy,Im Joo Hwan']}
df = pd.DataFrame(d)
   Title                     Cast
0  Crash Landing on You      Hyun Bin,Son Ye Jin,Seo Ji Hye
1  Memories of the Alhambra  Hyun Bin,Park Shin Hye,Park Hoon
2  The Heirs                 Lee Min Ho,Park Shin Hye,Kim Woo Bin
3  While You Were Sleeping   Bae Suzy,Lee Jong Suk,Jung Hae In
4  Something in the Rain     Son Ye Jin,Jung Hae In,Jang So Yeon
5  Uncontrollably Fond       Kim Woo Bin,Bae Suzy,Im Joo Hwan
When I split the text and count the values, each whole list is counted as a single value:
df['Cast'] = df['Cast'].str.split(',')
df['Cast'].value_counts()
[Hyun Bin, Son Ye Jin, Seo Ji Hye] 1
[Hyun Bin, Park Shin Hye, Park Hoon] 1
[Lee Min Ho, Park Shin Hye, Kim Woo Bin] 1
[Bae Suzy, Lee Jong Suk, Jung Hae In] 1
[Son Ye Jin, Jung Hae In, Jang So Yeon] 1
[Kim Woo Bin, Bae Suzy, Im Joo Hwan] 1
Name: Cast, dtype: int64
How do I get the number of times each individual name appears in the 'Cast' column? For example:
[Park Shin Hye] 2
[Hyun Bin] 2
[Bae Suzy] 2
etc
You should use the .explode method to "unpack" each list into separate rows. After that, .value_counts works the way you expected in your original code:
import pandas as pd
d = {'Title': ['Crash Landing on You', 'Memories of the Alhambra', 'The Heirs', 'While You Were Sleeping',
'Something in the Rain', 'Uncontrollably Fond'],
'Cast' : ['Hyun Bin,Son Ye Jin,Seo Ji Hye', 'Hyun Bin,Park Shin Hye,Park Hoon', 'Lee Min Ho,Park Shin Hye,Kim Woo Bin',
'Bae Suzy,Lee Jong Suk,Jung Hae In', 'Son Ye Jin,Jung Hae In,Jang So Yeon', 'Kim Woo Bin,Bae Suzy,Im Joo Hwan']}
df = pd.DataFrame(d)
# Series.explode takes no column name (only DataFrame.explode does)
df['Cast'].str.split(',').explode().value_counts()
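If the real data has stray spaces around the commas, the same name would be counted under several spellings. The sketch below assumes such messy input (the sample data in the question is clean, and `df_messy` and `counts` are names made up for this illustration) and strips each name after exploding:

```python
import pandas as pd

# Hypothetical data with inconsistent spacing around the commas.
df_messy = pd.DataFrame({
    'Title': ['Crash Landing on You', 'Memories of the Alhambra'],
    'Cast': ['Hyun Bin, Son Ye Jin ,Seo Ji Hye',
             'Hyun Bin,Park Shin Hye, Park Hoon'],
})

counts = (
    df_messy['Cast']
    .str.split(',')   # each cell becomes a list of names
    .explode()        # one row per name
    .str.strip()      # drop leading/trailing whitespace on each name
    .value_counts()   # count occurrences of each cleaned name
)
print(counts)
```

Without the `.str.strip()` step, ' Son Ye Jin ' and 'Son Ye Jin' would end up as separate entries.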
You may be looking for the str.count() method.
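As a rough illustration of that suggestion: str.count reports, for each row, how many times a pattern occurs in the string, so it fits when you only care about one name at a time. The sketch below assumes you want the total for a single name; `name`, `per_row`, and `total` are names chosen for this example. Because the pattern is treated as a regular expression, the name is escaped first.

```python
import re

import pandas as pd

df = pd.DataFrame({'Cast': [
    'Hyun Bin,Son Ye Jin,Seo Ji Hye',
    'Hyun Bin,Park Shin Hye,Park Hoon',
    'Lee Min Ho,Park Shin Hye,Kim Woo Bin',
    'Bae Suzy,Lee Jong Suk,Jung Hae In',
    'Son Ye Jin,Jung Hae In,Jang So Yeon',
    'Kim Woo Bin,Bae Suzy,Im Joo Hwan',
]})

name = 'Park Shin Hye'
# Number of matches of the (escaped) name in each row's string.
per_row = df['Cast'].str.count(re.escape(name))
# Sum over all rows to get the total for this one name.
total = int(per_row.sum())
print(total)
```

Note that this is a substring match, so a name that is contained in a longer name would be counted too; for counts of every name at once, the explode approach is simpler.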
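I don't have much experience either, but as far as I know, one way to do this is to call explode() (see the docs) after str.split(',') and then call value_counts() on the resulting column.
I suspect this is far from the optimal approach, but it works :) This is my first answer in the community, and I'm very open to any suggestions!
The full code: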
df['Cast'] = df['Cast'].str.split(',')
df = df.explode('Cast')
df['Cast'].value_counts()
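The code above overwrites the 'Cast' column and reshapes df. If you would rather leave df untouched, one possible alternative (a sketch, not something from the answers above) is str.get_dummies, which builds one 0/1 indicator column per name; summing those columns gives the number of rows each name appears in. The variable name `counts` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Cast': [
    'Hyun Bin,Son Ye Jin,Seo Ji Hye',
    'Hyun Bin,Park Shin Hye,Park Hoon',
    'Lee Min Ho,Park Shin Hye,Kim Woo Bin',
    'Bae Suzy,Lee Jong Suk,Jung Hae In',
    'Son Ye Jin,Jung Hae In,Jang So Yeon',
    'Kim Woo Bin,Bae Suzy,Im Joo Hwan',
]})

counts = (
    df['Cast']
    .str.get_dummies(sep=',')      # one indicator column per distinct name
    .sum()                         # rows in which each name appears
    .sort_values(ascending=False)  # most frequent names first
)
print(counts)
```

One difference to keep in mind: this counts a name at most once per row, whereas explode followed by value_counts counts every occurrence. For this sample data the two give the same numbers.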