Transform a Series into a DataFrame (pandas/Python) where the columns are the levels of the Series
I am using pandas and ran a groupby:
group = df_crimes_query.groupby(["CrimeDateTime", "WeaponFactor"]).size()
group.head(20)
CrimeDateTime  WeaponFactor
2016-01-01     FIREARM          11
               HANDS            26
               KNIFE             3
               OTHER            11
               UNDEFINED       102
2016-01-02     FIREARM          10
               HANDS            21
               KNIFE             8
               OTHER             6
               UNDEFINED        68
2016-01-03     FIREARM          12
               HANDS            13
               KNIFE             6
               OTHER             5
               UNDEFINED        73
2016-01-04     FIREARM          11
               HANDS            10
               KNIFE             1
               OTHER             3
               UNDEFINED        84
dtype: int64
Its type is Series:
type(group)
pandas.core.series.Series
I want a DataFrame like this:
CrimeDateTime FIREARM HANDS KNIFE OTHER UNDEFINED
2016-01-01 11 26 3 11 102
2016-01-02 10 21 8 6 68
2016-01-03 12 13 6 5 73
2016-01-04 11 10 1 3 84
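I want to use this DataFrame to plot five time series, one for each weapon type (FIREARM, HANDS, etc.). I have tried and searched online, but without success.
The code is on my GitHub (in the section named "Tests"): https://github.com/rmmariano/CAP386_intro_data_science/blob/master/projeto/crimes_baltimore/crimes_baltimore.ipynb
I had other test code, but I removed it for clarity.
Does anyone know how to do this?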
You can get the desired result with:
df_crimes_query.groupby(["CrimeDateTime", "WeaponFactor"]).size().unstack().reset_index()
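To see what `unstack` does here, a minimal sketch on a small made-up frame may help. The rows below are illustrative, not the Baltimore data, and `df_toy` is a hypothetical name. The `fill_value=0` argument is an addition, on the assumption that a date/weapon pair that never occurs should read 0 rather than NaN:

```python
import pandas as pd

# Hypothetical toy data: one row per recorded incident.
df_toy = pd.DataFrame({
    "CrimeDateTime": ["2016-01-01", "2016-01-01", "2016-01-01", "2016-01-02"],
    "WeaponFactor": ["FIREARM", "HANDS", "HANDS", "KNIFE"],
})

# size() counts rows per (date, weapon) pair and returns a Series with a
# two-level index; unstack() moves the inner level (WeaponFactor) into columns.
counts = df_toy.groupby(["CrimeDateTime", "WeaponFactor"]).size()
wide = counts.unstack(fill_value=0)

# reset_index() turns CrimeDateTime back into an ordinary column.
wide_flat = wide.reset_index()
print(wide_flat)
```

For plotting, it is probably easier to skip `reset_index()` and keep the dates in the index: `wide.plot()` then draws one line per weapon column (this needs matplotlib installed).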
You can also use pivot_table instead of groupby, i.e.
df.pivot_table(index='CrimeDateTime',columns='WeaponFactor',values='count')
This assumes you have a DataFrame like the following, based on the code in the notebook:
CrimeDateTime WeaponFactor count
0 2016-01-01 FIREARM 11
1 2016-01-01 HANDS 26
2 2016-01-01 KNIFE 3
3 2016-01-01 OTHER 11
4 2016-01-01 UNDEFINED 102
5 2016-01-02 FIREARM 10
6 2016-01-02 HANDS 21
7 2016-01-02 KNIFE 8
8 2016-01-02 OTHER 6
9 2016-01-02 UNDEFINED 68
10 2016-01-03 FIREARM 12
11 2016-01-03 HANDS 13
12 2016-01-03 KNIFE 6
13 2016-01-03 OTHER 5
14 2016-01-03 UNDEFINED 73
15 2016-01-04 FIREARM 11
16 2016-01-04 HANDS 10
17 2016-01-04 KNIFE 1
18 2016-01-04 OTHER 3
19 2016-01-04 UNDEFINED 84
Output:
df.pivot_table(index='CrimeDateTime',columns='WeaponFactor',values='count')
WeaponFactor FIREARM HANDS KNIFE OTHER UNDEFINED
CrimeDateTime
2016-01-01 11 26 3 11 102
2016-01-02 10 21 8 6 68
2016-01-03 12 13 6 5 73
2016-01-04 11 10 1 3 84
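As a runnable sketch of the pivot_table route, the snippet below rebuilds the first two dates of the long table above (`df_long` and `flat` are hypothetical names). Note that `pivot_table` aggregates with the mean by default; passing `aggfunc="sum"` is an assumption about intent, and it only makes a difference if a date/weapon pair appears on more than one row:

```python
import pandas as pd

# Long-format counts, copied from the first two dates of the table above.
df_long = pd.DataFrame({
    "CrimeDateTime": ["2016-01-01"] * 5 + ["2016-01-02"] * 5,
    "WeaponFactor": ["FIREARM", "HANDS", "KNIFE", "OTHER", "UNDEFINED"] * 2,
    "count": [11, 26, 3, 11, 102, 10, 21, 8, 6, 68],
})

# One row per date, one column per weapon type.
wide = df_long.pivot_table(index="CrimeDateTime", columns="WeaponFactor",
                           values="count", aggfunc="sum", fill_value=0)

# Flatten to the layout asked for in the question: CrimeDateTime becomes a
# regular column and the "WeaponFactor" label on the columns axis is dropped.
flat = wide.reset_index()
flat.columns.name = None
print(flat)
```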
Option 1
Simple but slow
pd.crosstab(df.CrimeDateTime, df.WeaponFactor)
WeaponFactor FIREARM HANDS KNIFE OTHER UNDEFINED
CrimeDateTime
2016-01-01 11 26 3 11 102
2016-01-02 10 21 8 6 68
2016-01-03 12 13 6 5 73
2016-01-04 11 10 1 3 84
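Option 2
Faster and cooler!
# dtype=int keeps the indicator columns numeric (recent pandas defaults to bool),
# so the dot product yields counts
pd.get_dummies(df.CrimeDateTime, dtype=int).T.dot(pd.get_dummies(df.WeaponFactor, dtype=int))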
FIREARM HANDS KNIFE OTHER UNDEFINED
2016-01-01 11 26 3 11 102
2016-01-02 10 21 8 6 68
2016-01-03 12 13 6 5 73
2016-01-04 11 10 1 3 84
Option 3
Next-level pandas kung fu!
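import numpy as np
import pandas as pd
# integer codes (i, j) and unique labels (r, c) for dates and weapon types
i, r = pd.factorize(df.CrimeDateTime.values)
j, c = pd.factorize(df.WeaponFactor.values)
n, m = r.size, c.size
# each (date, weapon) pair maps to one flat cell index; count and reshape to n x m
b = np.bincount(j + i * m, minlength=n * m).reshape(n, m)
pd.DataFrame(b, r, c)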
FIREARM HANDS KNIFE OTHER UNDEFINED
2016-01-01 11 26 3 11 102
2016-01-02 10 21 8 6 68
2016-01-03 12 13 6 5 73
2016-01-04 11 10 1 3 84
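A quick way to convince yourself that the three options agree is to run them on the same raw data, with one row per incident. The sketch below uses a small made-up frame (`df_raw` is a hypothetical name, not the notebook's data) and `.to_numpy()` in place of `.values`. Since `factorize` orders labels by first appearance rather than alphabetically, the bincount result is sorted before comparing:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: one row per incident (not yet counted).
df_raw = pd.DataFrame({
    "CrimeDateTime": ["2016-01-01", "2016-01-01", "2016-01-02",
                      "2016-01-02", "2016-01-02"],
    "WeaponFactor": ["HANDS", "FIREARM", "HANDS", "HANDS", "KNIFE"],
})

# Option 1: crosstab counts co-occurrences directly.
by_crosstab = pd.crosstab(df_raw.CrimeDateTime, df_raw.WeaponFactor)

# Option 2: indicator matrices; the dot product sums the co-occurrences.
by_dummies = (pd.get_dummies(df_raw.CrimeDateTime, dtype=int)
              .T.dot(pd.get_dummies(df_raw.WeaponFactor, dtype=int)))

# Option 3: integer codes plus bincount over a flattened n x m grid.
i, r = pd.factorize(df_raw.CrimeDateTime.to_numpy())
j, c = pd.factorize(df_raw.WeaponFactor.to_numpy())
n, m = r.size, c.size
b = np.bincount(j + i * m, minlength=n * m).reshape(n, m)
by_bincount = pd.DataFrame(b, r, c).sort_index().sort_index(axis=1)

# All three tables should hold the same counts in the same positions.
same = (np.array_equal(by_crosstab.values, by_dummies.values)
        and np.array_equal(by_crosstab.values, by_bincount.values))
print(same)
```

Which option is fastest depends on the data size and pandas version, so it is worth timing them on your own data before choosing.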