Duplicated Nested list from Python Pandas Dataframe
This is my dataframe, for example:
requesttime checkinperiod
0 2016-10-16T14:53:58.000Z 8
1 2016-10-16T22:53:22.000Z 8
2 2016-10-18T14:52:22.000Z 8
3 2016-10-18T06:53:08.000Z 8
4 2016-10-16T06:53:37.000Z 8
5 2016-10-15T22:53:14.000Z 8
6 2016-10-19T22:51:51.000Z 8
7 2016-10-22T10:16:57.000Z 12
8 2016-10-20T10:54:37.000Z 12
9 2016-10-20T06:51:42.000Z 12
10 2016-10-10T22:44:17.000Z 24
11 2016-10-13T22:47:26.000Z 8
12 2016-10-14T14:53:27.000Z 8
13 2016-10-14T22:53:58.000Z 8
14 2016-10-15T06:53:28.000Z 8
15 2016-10-14T06:53:58.000Z 8
16 2016-10-10T16:38:28.000Z 24
17 2016-10-17T06:53:50.000Z 8
18 2016-10-17T14:53:12.000Z 8
19 2016-10-19T14:51:53.000Z 8
20 2016-10-17T22:53:44.000Z 8
21 2016-10-15T14:53:50.000Z 8
22 2016-10-18T22:52:39.000Z 8
23 2016-10-12T22:27:51.000Z 24
24 2016-10-11T23:05:57.000Z 24
25 2016-10-19T06:52:53.000Z 8
26 2016-10-21T10:09:09.000Z 12
27 2016-10-21T22:17:15.000Z 12
28 2016-10-22T22:16:53.000Z 12
29 2016-10-20T23:02:13.000Z 12
Expected output:
{
8 : [
[2016-10-16T14:53:58.000Z, 2016-10-16T22:53:22.000Z, 2016-10-18T14:52:22.000Z, 2016-10-16T06:53:37.000Z, 2016-10-15T22:53:14.000Z, 2016-10-19T22:51:51.000Z],
[2016-10-13T22:47:26.000Z, 2016-10-13T22:47:26.000Z, 2016-10-14T22:53:58.000Z, 2016-10-15T06:53:28.000Z, 2016-10-14T06:53:58.000Z],
[2016-10-17T06:53:50.000Z, 2016-10-17T14:53:12.000Z, 2016-10-19T14:51:53.000Z, 2016-10-17T22:53:44.000Z, 2016-10-15T14:53:50.000Z, 2016-10-18T22:52:39.000Z],
[2016-10-19T06:52:53.000Z]
],
12: [
[2016-10-22T10:16:57.000Z, 2016-10-20T10:54:37.000Z, 2016-10-20T06:51:42.000Z],
[2016-10-21T10:09:09.000Z, 2016-10-21T22:17:15.000Z, 2016-10-22T22:16:53.000Z, 2016-10-20T23:02:13.000Z]
],
24: [
[2016-10-10T22:44:17.000Z],
[2016-10-10T16:38:28.000Z],
[2016-10-12T22:27:51.000Z, 2016-10-11T23:05:57.000Z]
]
}
Thanks,
Summit
Use a regular expression to filter the data and set the dictionary keys; try text 2 regex.
import pandas as pd
# make sample data
col = 'checkinperiod'
df = pd.DataFrame(
    [['a', 8], ['b', 8], ['c', 8], ['c', 12], ['d', 8], ['e', 12], ['f', 12]],
    columns=['requesttime', col])
print(df)
requesttime checkinperiod
0 a 8
1 b 8
2 c 8
3 c 12
4 d 8
5 e 12
6 f 12
# shift the column one row down and compare each row with the previous one;
# every change starts a new group, so the cumulative sum numbers the consecutive runs
df['group'] = (df[col].shift(1) != df[col]).astype(int).cumsum()
print(df)
requesttime checkinperiod group
0 a 8 1
1 b 8 1
2 c 8 1
3 c 12 2
4 d 8 3
5 e 12 4
6 f 12 4
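To see why the cumulative sum ends up labelling each consecutive run, it may help to look at the intermediate values. The following is a minimal sketch on a plain Series holding the same checkinperiod values as the sample data; the names `periods`, `changed` and `run_id` are illustrative and not part of the answer.

```python
import pandas as pd

# Same checkinperiod values as the sample data above
periods = pd.Series([8, 8, 8, 12, 8, 12, 12])

# True wherever a value differs from the one before it.
# The first row is compared against NaN, so it is always True.
changed = periods.shift(1) != periods

# Each True starts a new run, so the running total acts as a run id.
run_id = changed.cumsum()

print(changed.tolist())
print(run_id.tolist())
```

The run ids printed here should match the `group` column shown above.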
# group by period and run, and collect each run's request times into a list
df_grouped = pd.DataFrame(df.groupby([col, 'group']).apply(
    lambda g: list(g['requesttime'])))
df_grouped = df_grouped.reset_index().drop('group', axis=1)
print(df_grouped)
checkinperiod 0
0 8 [a, b, c]
1 8 [d]
2 12 [c]
3 12 [e, f]
# collect the per-run lists of each period into one nested list
result = df_grouped.groupby(col).apply(lambda g: list(g[0])).to_dict()
print(result)
{8: [['a', 'b', 'c'], ['d']], 12: [['c'], ['e', 'f']]}
Inspired by [1]
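The same idea can be applied to the timestamps from the question. The sketch below is one possible compact variant, not the answer's exact code: it wraps the steps in a hypothetical helper `runs_by_key` (a name made up here), keeps the run labels in a separate Series instead of adding a column, and builds the dictionary in a plain loop. The sample frame is only rows 6 to 11 of the question's data.

```python
import pandas as pd

def runs_by_key(df, key, value):
    """Map each key to a list of its consecutive runs of values."""
    # Label consecutive runs of equal key values: 1, 2, 3, ...
    run_id = (df[key] != df[key].shift()).cumsum().rename("run")
    result = {}
    # sort=False keeps the runs in the order they appear in the frame
    for (k, _), chunk in df.groupby([key, run_id], sort=False):
        result.setdefault(k, []).append(chunk[value].tolist())
    return result

# Rows 6 to 11 of the question's dataframe (a subset, for brevity)
df = pd.DataFrame({
    "requesttime": [
        "2016-10-19T22:51:51.000Z",
        "2016-10-22T10:16:57.000Z",
        "2016-10-20T10:54:37.000Z",
        "2016-10-20T06:51:42.000Z",
        "2016-10-10T22:44:17.000Z",
        "2016-10-13T22:47:26.000Z",
    ],
    "checkinperiod": [8, 12, 12, 12, 24, 8],
})

result = runs_by_key(df, "checkinperiod", "requesttime")
print(result)
```

Because the runs are taken in row order, the two separate stretches of period 8 stay in separate inner lists rather than being merged.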