How to select row information in a dataframe by ID
I am new to Python. I have a large dataframe like this:
ID x y
0 1 x1 y1
1 0 x2 y2
2 0 x3 y3
3 2 x4 y4
4 1 x5 y5
5 2 x6 y6
I want to collect the (x, y) pairs that lie between each ID 1 and the following ID 2, in a dataframe like this:
coordinates
0 (x1,y1), (x2,y2), (x3,y3), (x4,y4)
1 (x5,y5), (x6,y6)
I have already tried a double iteration, but the computation takes too long. How can I get this result?
One idea is to create groups that start at each value of 1 and aggregate them with a custom lambda function that builds the tuples:
df['new'] = (df['ID'] == 1).cumsum()
print (df)
ID x y new
0 1 x1 y1 1
1 0 x2 y2 1
2 0 x3 y3 1
3 2 x4 y4 1
4 1 x5 y5 2
5 2 x6 y6 2
df1 = (df.groupby('new')[['x','y']]
.apply(lambda x: list(map(tuple, x.values.tolist())))
.reset_index(name='coordinates'))
print (df1)
new coordinates
0 1 [(x1, y1), (x2, y2), (x3, y3), (x4, y4)]
1 2 [(x5, y5), (x6, y6)]
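To make the grouping key easier to follow, here is a minimal self-contained sketch. It rebuilds the sample frame with placeholder strings (the real x and y values are not shown in the question, so these are assumptions), and the names `group_id` and `pairs` are introduced only for illustration.

```python
import pandas as pd

# Sample frame with placeholder strings standing in for the real coordinates
df = pd.DataFrame({
    "ID": [1, 0, 0, 2, 1, 2],
    "x": ["x1", "x2", "x3", "x4", "x5", "x6"],
    "y": ["y1", "y2", "y3", "y4", "y5", "y6"],
})

# eq(1) is True on every row that opens a block; cumsum turns those
# flags into a running block number: 1, 1, 1, 1, 2, 2
group_id = df["ID"].eq(1).cumsum()

# Pair x and y row by row, then collect the pairs of each block into a list
pairs = pd.Series(list(zip(df["x"], df["y"])), index=df.index)
coordinates = pairs.groupby(group_id).agg(list)

print(group_id.tolist())
print(coordinates)
```

Because the key is computed once on the whole column, no Python-level loop over the rows is needed, which is what should help with the slow double iteration.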
A similar solution without the new column:
df1 = (df.groupby((df['ID'].rename('new') == 1).cumsum())[['x','y']]
.apply(lambda x: list(map(tuple, x.values.tolist())))
.reset_index(name='coordinates'))
print (df1)
new coordinates
0 1 [(x1, y1), (x2, y2), (x3, y3), (x4, y4)]
1 2 [(x5, y5), (x6, y6)]
EDIT: if there can be extra rows after a 2 and before the next 1, filter them out first:
print (df)
ID x y
0 1 x1 y1
1 0 x2 y2
2 0 x3 y3
3 2 x4 y4
4 0 x7 y7
4 0 x8 y8
4 1 x5 y5
5 2 x6 y6
g = df['ID'].eq(1).cumsum()
s = df['ID'].shift().eq(2).cumsum()
df = df[s.groupby(g).transform('min').eq(s)]
print (df)
ID x y
0 1 x1 y1
1 0 x2 y2
2 0 x3 y3
3 2 x4 y4
4 1 x5 y5
5 2 x6 y6
df1 = (df.groupby((df['ID'].rename('new') == 1).cumsum())[['x','y']]
.apply(lambda x: list(map(tuple, x.values.tolist())))
.reset_index(name='coordinates'))
print (df1)
new coordinates
0 1 [(x1, y1), (x2, y2), (x3, y3), (x4, y4)]
1 2 [(x5, y5), (x6, y6)]
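The filtering step in the edit is fairly dense, so here is a hedged walk-through of how the mask seems to work. The sketch assumes the frame starts with an ID of 1 and uses a default integer index; the names `block`, `closed` and `keep` are illustrative and not part of the answer.

```python
import pandas as pd

# Sample frame with two stray rows (x7, x8) after the first 2
df = pd.DataFrame({
    "ID": [1, 0, 0, 2, 0, 0, 1, 2],
    "x": ["x1", "x2", "x3", "x4", "x7", "x8", "x5", "x6"],
    "y": ["y1", "y2", "y3", "y4", "y7", "y8", "y5", "y6"],
})

# Block number: increases at every row where ID == 1
block = df["ID"].eq(1).cumsum()          # 1 1 1 1 1 1 2 2

# Number of 2s seen strictly before the current row (hence the shift)
closed = df["ID"].shift().eq(2).cumsum() # 0 0 0 0 1 1 1 1

# Within a block, rows up to and including the first 2 still carry the
# block's minimum count; rows after that 2 have a larger count and are dropped
keep = closed.groupby(block).transform("min").eq(closed)
filtered = df[keep]

print(filtered)
```

After this step only the rows from each 1 through its first following 2 remain, so the grouping from the earlier solutions can be applied unchanged.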
You can apply tuple across axis 1, groupby your "groups" using cumsum with eq(1), and use list aggregation:
(df[['x', 'y']].apply(tuple, axis=1)
.groupby(df['ID'].eq(1).cumsum()).agg(list))
[out]
ID
1 [(x1, y1), (x2, y2), (x3, y3), (x4, y4)]
2 [(x5, y5), (x6, y6)]
dtype: object
Or, if the expected output is a comma-separated string of coordinates, you can apply the join function:
(df[['x', 'y']].apply(tuple, axis=1).astype(str)
.groupby(df['ID'].eq(1).cumsum()).apply(', '.join))
[out]
ID
1 ('x1', 'y1'), ('x2', 'y2'), ('x3', 'y3'), ('x4', 'y4')
2 ('x5', 'y5'), ('x6', 'y6')
dtype: object
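Converting tuples with astype(str) keeps the quotes from the tuple repr. If the goal is the exact layout shown in the question, such as (x1,y1), (x2,y2), one possible variation is to build the strings directly. This is only a sketch with placeholder data, and `group_id` and `formatted` are names made up for the example.

```python
import pandas as pd

# Sample frame with placeholder strings standing in for the real coordinates
df = pd.DataFrame({
    "ID": [1, 0, 0, 2, 1, 2],
    "x": ["x1", "x2", "x3", "x4", "x5", "x6"],
    "y": ["y1", "y2", "y3", "y4", "y5", "y6"],
})

# Same grouping key as above: a new block starts at every ID == 1
group_id = df["ID"].eq(1).cumsum()

# Build "(x,y)" per row with vectorised string concatenation
formatted = "(" + df["x"].astype(str) + "," + df["y"].astype(str) + ")"

# Join the strings of each block and return a one-column frame indexed 0, 1, ...
coordinates = (formatted.groupby(group_id)
                        .agg(", ".join)
                        .reset_index(drop=True)
                        .to_frame("coordinates"))

print(coordinates)
```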