Count occurrences in column based on another column (date)
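I'm trying to count how many times each "Type" occurs in the month it falls in.
The data is daily, so to group by month I tried using .resample(),
but the problem is that it concatenates all the strings into one long string, and then I can't count the occurrences with str.count(),
because it returns the wrong values (it finds too many matches, since it isn't looking for an exact pattern).
I think it has to be done in more than one step...
I've tried lots of things... I've even heard there is something called a pivot table?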
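For reference, resample itself can give exact counts if it is applied to a boolean column produced by an exact equality test rather than to the strings. A minimal sketch on the sample data from this question, assuming the dates are the index and month-start bins (the "MS" alias). Note that resample also emits months with no rows (April 2020 here) as 0.

```python
import pandas as pd

# Sample data from the question, with the dates as a DatetimeIndex
df = pd.DataFrame(
    {"Type": ["Cat", "Cat", "Bird", "Dog", "Cat", "Cat", "Bird", "Cat"]},
    index=pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01",
                          "2020-02-01", "2020-03-01", "2020-03-01", "2020-05-02"]),
)

# eq() compares whole values (exact match), so "Cat" never matches inside a longer string.
# Summing the True/False values per calendar month ("MS" = month-start bins) counts them.
cat_per_month = df["Type"].eq("Cat").resample("MS").sum()
print(cat_per_month)
```

This only handles one type at a time; a loop over the unique types, or one of the reshaping approaches, gives the full table.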
Sample data:
Type | Date |
---|---|
Cat | 2020-01-01 |
Cat | 2020-01-01 |
Bird | 2020-01-01 |
Dog | 2020-01-01 |
Cat | 2020-02-01 |
Cat | 2020-03-01 |
Bird | 2020-03-01 |
Cat | 2020-05-02 |
...all months across several years...
Converted into the following format (the column headers can also be numeric):
 | January 2020 | February 2020 |
---|---|---|
Cat | 4 | 1 |
Bird | 1 | 0 |
Dog | 1 | 0 |
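Since the question mentions pivot tables: this reshape is essentially a pivot table, i.e. group by both the type and the month, count the rows, then move the month level into the columns. A minimal sketch on the sample data, assuming year-month periods (to_period("M")) as the month key so that the same month in different years stays separate; pd.crosstab is another built-in option for the same kind of count table.

```python
import pandas as pd

# Sample data from the question, with the dates as a DatetimeIndex
df = pd.DataFrame(
    {"Type": ["Cat", "Cat", "Bird", "Dog", "Cat", "Cat", "Bird", "Cat"]},
    index=pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01",
                          "2020-02-01", "2020-03-01", "2020-03-01", "2020-05-02"]),
)

# Year-month of every row; keeps e.g. January 2020 and January 2021 apart
month = df.index.to_period("M")

# Count rows per (Type, month), then pivot the month level into columns;
# combinations that never occur become 0 instead of NaN
counts = df.groupby([df["Type"], month]).size().unstack(fill_value=0)

# Turn the Period column labels into strings such as "January 2020"
counts.columns = [p.strftime("%B %Y") for p in counts.columns]
print(counts)
```

Only months that actually appear in the data become columns, and the types are sorted alphabetically because groupby sorts its keys.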
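As far as I know, pandas has no standard function or typical approach that gives you exactly the result you want. Below is a code snippet that produces it.
If you don't mind using an extra package, there are packages that make this kind of binary encoding quicker/easier (e.g. category_encoders).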
```python
import pandas as pd

# your data in dictionary format
d = {
    "Type": ["Cat", "Cat", "Bird", "Dog", "Cat", "Cat", "Bird", "Cat"],
    "Date": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01",
             "2020-02-01", "2020-03-01", "2020-03-01", "2020-05-02"],
}

# create a dataframe with the dates as index (its single column is named 0)
df = pd.DataFrame(data=d["Type"], index=pd.to_datetime(d["Date"]))

# year-month of each row, so e.g. January 2020 and January 2021 stay separate
months = df.index.to_period("M")
all_months = months.unique().sort_values()  # sorted unique months, used as the columns

animals = list(df[0].unique())  # a list containing all unique animals
ndf = pd.DataFrame(0, index=animals, columns=all_months)  # animals x months, filled with 0

for animal in animals:
    ndf.loc[animal] = (
        (df[0] == animal)      # bool Series: True where the row is this animal
        .groupby(months)       # group by year-month
        .sum()                 # summing booleans counts the matches
        .reindex(all_months)   # same order as the columns of ndf
        .values                # array of values to insert into this row
    )

# convert the period column labels to strings such as "January 2020"
ndf.columns = ndf.columns.strftime("%B %Y")
```
Result:
 | January 2020 | February 2020 | March 2020 | May 2020 |
---|---|---|---|---|
Cat | 2 | 1 | 1 | 1 |
Bird | 1 | 0 | 1 | 0 |
Dog | 1 | 0 | 0 | 0 |
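The binary encoding mentioned above does not strictly need an extra package: pandas' own get_dummies produces one True/False column per type, and those columns can be resampled and summed per month. A minimal sketch on the same sample data, assuming month-start bins ("MS"). Unlike the loop above, resample also emits months with no rows (April 2020 here) as all-zero columns, and the types come out in alphabetical order.

```python
import pandas as pd

# Sample data from the question, with the dates as a DatetimeIndex
df = pd.DataFrame(
    {"Type": ["Cat", "Cat", "Bird", "Dog", "Cat", "Cat", "Bird", "Cat"]},
    index=pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01",
                          "2020-02-01", "2020-03-01", "2020-03-01", "2020-05-02"]),
)

# One indicator column per distinct Type (Bird, Cat, Dog), True where the row has that type
dummies = pd.get_dummies(df["Type"])

# Sum each indicator per calendar month, then transpose so types are rows
counts = dummies.resample("MS").sum().T

# Label the month columns like "January 2020"
counts.columns = counts.columns.strftime("%B %Y")
print(counts)
```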