Pandas - Counting rows in a df to discover the survival rate each day
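Hello everyone!
I have a dfA (Table A) that contains the number of days certain products were available (days_survived). I need to count the total number of products available on each day (Table B). That is, I need to count the rows in dfA to find the survival rate for each of the first 5 days (df2).
Table A: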
+-------+--------------+
| id | days_survived|
+-------+--------------+
| 1 | 1 |
| 2 | 3 |
| 3 | 10 |
| 4 | 40 |
| 5 | 4 |
| 6 | 9 |
+-------+--------------+
Table B (expected result for the first 5 days):
+-------+----------------+
| day | #count_survived|
+-------+----------------+
| 1 | 6 |
| 2 | 5 |
| 3 | 5 |
| 4 | 4 |
| 5 | 3 |
+-------+----------------+
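This result means that 6 products in total were available on the first day, then only 5 on the second and third days, only 4 on the fourth day, and finally only 3 on the fifth day.
Code: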
# create df
import pandas as pd
d = {'id': [1,2,3,4,5,6], 'days_survived': [1,3,10,40,4,9]}
dfA = pd.DataFrame(data=d)
Can anyone help me? :)
Use a list comprehension that flattens and filters, then count:
comp = [y for x in dfA['days_survived'] for y in range(1, x + 1) if y < 6]
s = pd.Series(comp).value_counts().rename_axis('day').reset_index(name='#count_survived')
print (s)
day #count_survived
0 1 6
1 3 5
2 2 5
3 4 4
4 5 3
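Note that value_counts orders the rows by count, so the days in the output above are not in ascending order. A minimal sketch of an alternative that counts, for each day, the rows with days_survived greater than or equal to that day, and keeps the days sorted. It assumes NumPy is available; the names days, survived and table_b are illustrative only and not from the original answer.

```python
import numpy as np
import pandas as pd

dfA = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6],
                    'days_survived': [1, 3, 10, 40, 4, 9]})

# Days of interest: 1 through 5 (assumed window, taken from Table B)
days = np.arange(1, 6)
survived = dfA['days_survived'].to_numpy()

# Compare every product (rows) against every day (columns):
# entry [i, j] is True when product i was still available on day j
alive = survived[:, None] >= days

# Summing each column gives the number of products available on that day
table_b = pd.DataFrame({'day': days, '#count_survived': alive.sum(axis=0)})
print(table_b)
```

This builds a products-by-days boolean matrix, so it suits a small window of days; for a very large number of rows and days the list comprehension may use less memory.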
Another solution with Counter:
from collections import Counter
comp = [y for x in dfA['days_survived'] for y in range(1, x + 1) if y < 6]
d = Counter(comp)
df = pd.DataFrame({'day':list(d.keys()), '#count_survived':list(d.values())})
This uses collections: build a list of all the days on which each item was present, then count the occurrences of each day in that list.
import pandas as pd
import numpy as np
from collections import Counter
df = pd.DataFrame(data={'id': [1,2,3,4,5,6], 'days_survived': [1,3,10,40,4,9]})
# Create a new column holding, for each row, the list of all days on which the item was present
df['Days'] = df.apply(lambda a : list(np.arange(1,a.days_survived+1)),axis=1)
# Apply Counter to the flattened list of all elements in the 'Days' column
cnt= Counter([item for items in list(df['Days']) for item in items])
# Convert the cnt Counter object to a DataFrame
df_new = pd.DataFrame(data= {'Days':list(cnt.keys()),'count':list(cnt.values())})
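The code above counts every day up to the longest survival time (40 here), while Table B only asks for the first 5 days. A minimal sketch of restricting the result to that window, assuming the same Days and count column names; first_five is a hypothetical name used only for illustration.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame(data={'id': [1, 2, 3, 4, 5, 6],
                        'days_survived': [1, 3, 10, 40, 4, 9]})

# Count, for every day, how many items were present on that day
cnt = Counter(day for n in df['days_survived'] for day in range(1, n + 1))
df_new = pd.DataFrame(data={'Days': list(cnt.keys()), 'count': list(cnt.values())})

# Keep only the first 5 days and make sure they are in ascending order
first_five = df_new[df_new['Days'] <= 5].sort_values('Days').reset_index(drop=True)
print(first_five)
```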
Hope this helps.