Daily Mentions of a Word
I have the following df, containing daily articles from different sources:
print(df)
Date content
2018-11-01 Apple Inc. AAPL 1.54% reported its fourth cons...
2018-11-01 U.S. stocks climbed Thursday, Apple is a real ...
2018-11-02 GONE are the days when smartphone manufacturer...
2018-11-03 To historians of technology, the story of the ...
2018-11-03 Apple Inc. AAPL 1.54% reported its fourth cons...
2018-11-03 Apple is turning to traditional broadcasting t...
(...)
I want to count the total number of daily mentions of the word "Apple", aggregated by date. How can I create "final_df"?
print(final_df)
2018-11-01 2
2018-11-02 0
2018-11-03 2
(...)
You can GroupBy the different dates, use str.count to count the occurrences of "Apple" in each article, and aggregate with sum to get the total count for each group:
# Select the column first and wrap the chain in parentheses so it can span lines
final_df = (df.groupby('Date')['content']
              .apply(lambda s: s.str.count('Apple').sum())
              .reset_index(name='counts'))
Date counts
0 2018-11-01 2
1 2018-11-02 0
2 2018-11-03 2
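One caveat with the plain pattern 'Apple': str.count treats its argument as a regular expression that matches substrings, so it also counts words such as "Applebee's" and misses a lower-case "apple". A minimal sketch, assuming whole-word, case-insensitive matching is what you want (the sample rows are hypothetical, not the question's data), adds \b word boundaries and re.IGNORECASE:

```python
import re

import pandas as pd

# Hypothetical sample data, not the question's real articles
df = pd.DataFrame({
    "Date": ["2018-11-01", "2018-11-01", "2018-11-02"],
    "content": [
        "Apple Inc. reported results; apple fans cheered.",
        "U.S. stocks climbed Thursday.",
        "Lunch at Applebee's was busy.",
    ],
})

# \b restricts matches to whole words; IGNORECASE also matches "apple"
pattern = r"\bapple\b"
final_df = (
    df["content"]
    .str.count(pattern, flags=re.IGNORECASE)
    .groupby(df["Date"])
    .sum()
    .reset_index(name="counts")
)
print(final_df)
```

With the plain 'Apple' pattern the same data would give 1 for each date, because the first article's lower-case "apple" is missed and "Applebee's" is counted.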
Alternatively, use str.count to build a new Series of per-article counts, then group it by the df['Date'] column and aggregate with sum:
df1 = df['content'].str.count('Apple').groupby(df['Date']).sum().reset_index(name='count')
print(df1)
Date count
0 2018-11-01 2
1 2018-11-02 0
2 2018-11-03 2
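Both answers above only report dates that have at least one article; a day with no rows in df at all simply does not appear in the result. If you need every calendar day, one option is to reindex the per-day totals over a full date range and fill the gaps with 0. This is a sketch that assumes Date can be parsed with pd.to_datetime; the sample rows are hypothetical:

```python
import pandas as pd

# Hypothetical sample: no articles at all on 2018-11-02
df = pd.DataFrame({
    "Date": ["2018-11-01", "2018-11-01", "2018-11-03"],
    "content": [
        "Apple Inc. released new products.",
        "Shares of Apple rose on Thursday.",
        "Analysts discussed Apple earnings.",
    ],
})
df["Date"] = pd.to_datetime(df["Date"])

# Total occurrences per date (only dates present in df)
daily = df["content"].str.count("Apple").groupby(df["Date"]).sum()

# Reindex over every calendar day between the first and last date;
# days without any article get a count of 0
all_days = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
final_df = (
    daily.reindex(all_days, fill_value=0)
    .rename_axis("Date")
    .reset_index(name="count")
)
print(final_df)
```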
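An alternative uses str.contains with groupby. Because str.contains returns True/False for each article, aggregate with sum (True counts as 1) to get the number of articles that mention "Apple" on each date. Using count here would be a mistake: count returns the number of non-null values per group, i.e. every article on that date whether or not it mentions "Apple", which yields 2, 1, 3 for this data. Also note that, unlike str.count, this approach counts each article at most once even if it mentions "Apple" several times.
>>> df
Date content
0 2018-11-01 Apple Inc. AAPL 1.54% reported its fourth cons
1 2018-11-01 U.S. stocks climbed Thursday, Apple is a real
2 2018-11-02 GONE are the days when smartphone manufacturer
3 2018-11-03 To historians of technology, the story of the
4 2018-11-03 Apple Inc. AAPL 1.54% reported its fourth cons
5 2018-11-03 Apple is turning to traditional broadcasting t
Solution:
df.content.str.contains("Apple").groupby(df['Date']).sum().reset_index(name="count")
Date count
0 2018-11-01 2
1 2018-11-02 0
2 2018-11-03 2
# To treat missing content (NaN) as "no mention", pass na=False:
# df["content"].str.contains('Apple', case=True, na=False).groupby(df['Date']).sum()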
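The difference between counting occurrences, counting articles that mention the word, and counting all articles is easiest to see on a tiny hypothetical frame where one article mentions "Apple" twice. In this sketch, str.count(...).sum() gives total occurrences, str.contains(...).sum() gives the number of mentioning articles, and str.contains(...).count() just gives the number of rows per date:

```python
import pandas as pd

# Hypothetical sample data; the first article mentions "Apple" twice
df = pd.DataFrame({
    "Date": ["2018-11-01", "2018-11-01", "2018-11-02"],
    "content": [
        "Apple beat estimates and Apple shares rose.",
        "Oil prices fell on Thursday.",
        "Smartphone makers face slowing demand.",
    ],
})

by_date = df["Date"]
mentions = df["content"].str.contains("Apple", na=False)

summary = pd.DataFrame({
    # total occurrences of the word per date
    "occurrences": df["content"].str.count("Apple").groupby(by_date).sum(),
    # number of articles that mention the word at least once
    "articles_mentioning": mentions.groupby(by_date).sum(),
    # number of non-null rows per date, regardless of content
    "all_articles": mentions.groupby(by_date).count(),
})
print(summary)
```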