Assign running count to a new column in groups of 3 (pandas)
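I'm trying to assign a new column in a pandas df based on the values in 2 other columns.
In the df below, for each individual value in Location (Home, Away, etc.), I want to assign an incrementing integer to each group of 3 unique values in the corresponding Day column. The count should keep running from one group to the next instead of restarting at 1 for each Location.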
import pandas as pd
import numpy as np
d = ({
'Time' : ['7:00:00','8:00:00','9:00:00','11:00:00','12:00:00','1:00:00','2:00:00','3:00:00'],
'Day' : ['Mon','Tues','Wed','Thurs','Fri','Thurs','Fri','Sat'],
'Location' : ['Home','Home','Home','Away','Away','Home','Home','Home'],
})
df = pd.DataFrame(data=d)
#Assign values from Home
mask = df['Location'] == 'Home'
df1 = df[mask].drop_duplicates('Day')
d = dict(zip(df1['Day'], np.arange(len(df1)) // 3 + 1))
df.loc[mask, 'Assign'] = df.loc[mask, 'Day'].map(d)
#Assign values from Away
mask = df['Location'] == 'Away'
df1 = df[mask].drop_duplicates('Day')
d = dict(zip(df1['Day'], np.arange(len(df1)) // 3 + 1))
df.loc[mask, 'Assign'] = df.loc[mask, 'Day'].map(d)
Output:
Time Day Location Assign
0 7:00:00 Mon Home 1.0
1 8:00:00 Tues Home 1.0
2 9:00:00 Wed Home 1.0
3 11:00:00 Thurs Away 1.0
4 12:00:00 Fri Away 1.0
5 1:00:00 Thurs Home 2.0
6 2:00:00 Fri Home 2.0
7 3:00:00 Sat Home 2.0
Expected output:
Time Day Location Assign
0 7:00:00 Mon Home 1.0
1 8:00:00 Tues Home 1.0
2 9:00:00 Wed Home 1.0
3 11:00:00 Thurs Away 2.0
4 12:00:00 Fri Away 2.0
5 1:00:00 Thurs Home 3.0
6 2:00:00 Fri Home 3.0
7 3:00:00 Sat Home 3.0
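Put differently, within one Location the k-th distinct Day (counting from 0 in order of first appearance) belongs to block k // 3. A minimal sketch of that rule on the Home rows of the sample data, using pd.factorize to get the 0-based positions:

```python
import pandas as pd

# Day values of the 'Home' rows from the sample DataFrame, in row order
home_days = pd.Series(['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat'])

# factorize gives each distinct value a 0-based code in order of first appearance;
# integer division by 3 puts every 3 distinct days into the same block
codes, uniques = pd.factorize(home_days)
blocks = codes // 3 + 1
print(blocks)  # [1 1 1 2 2 2]
```

The per-location code in the question computes exactly these blocks, which is why Home and Away both start at 1. The expected output additionally numbers each (Location, block) pair across the whole frame in order of first appearance.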
I think you need GroupBy.apply with a custom function, and then convert the values to numbers with factorize:
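def f(x):
    # keep one row per unique Day within this Location
    x1 = x.drop_duplicates('Day')
    # every 3 consecutive unique days share a block number: 1, 1, 1, 2, 2, 2, ...
    d = dict(zip(x1['Day'], np.arange(len(x1)) // 3 + 1))
    x['new'] = x['Day'].map(d)
    return x

df = df.groupby('Location', sort=False, group_keys=False).apply(f)
# turn each (block, Location) pair into a running integer in order of first appearance
df['new'] = pd.factorize(df['new'].astype(str) + df['Location'])[0] + 1
print (df)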
Time Day Location new
0 7:00:00 Mon Home 1
1 8:00:00 Tues Home 1
2 9:00:00 Wed Home 1
3 11:00:00 Thurs Away 2
4 12:00:00 Fri Away 2
5 1:00:00 Thurs Home 3
6 2:00:00 Fri Home 3
7 3:00:00 Sat Home 3
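The factorize step is what turns the per-location block numbers into one running count: each block/Location combination gets a new id the first time it appears. A small sketch of that step in isolation; the two input Series below are assumed values, copied by hand from what f() produces for the sample data:

```python
import pandas as pd

# Per-location block number and Location for each row (assumed output of f())
new = pd.Series([1, 1, 1, 1, 1, 2, 2, 2])
location = pd.Series(['Home', 'Home', 'Home', 'Away', 'Away', 'Home', 'Home', 'Home'])

# Combine into one key per row, e.g. '1Home', '1Away', '2Home'
key = new.astype(str) + location

# factorize numbers distinct keys 0, 1, 2, ... in order of first appearance
running = pd.factorize(key)[0] + 1
print(running)  # [1 1 1 2 2 3 3 3]
```

Joining the two parts without a separator works for this data; if a Location value could start with a digit, block 1 + '1Home' and block 11 + 'Home' would both give '11Home', so a separator such as `new.astype(str) + '_' + location` would be safer.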
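Another similar solution, using unique instead of drop_duplicates (the output below is from a longer sample DataFrame with only Day and Location columns):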
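def f(x):
    # unique Day values of this Location, in order of first appearance
    u = x['Day'].unique()
    d = dict(zip(u, np.arange(len(u)) // 3 + 1))
    x['new'] = x['Day'].map(d)
    return x

# group_keys=False keeps the original index instead of adding Location as an extra level
df = df.groupby('Location', sort=False, group_keys=False).apply(f)
s = df['new'].astype(str) + df['Location']
df['new'] = pd.factorize(s)[0] + 1
print (df)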
Day Location new
0 Mon Home 1
1 Tues Home 1
2 Wed Away 2
3 Wed Home 1
4 Thurs Away 2
5 Thurs Home 3
6 Fri Home 3
7 Mon Home 1
8 Sat Home 3
9 Fri Away 2
10 Sun Home 4
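If you would rather avoid a Python-level apply, a vectorized alternative (not taken from the answers above) is to compute the per-location block with transform and then number the (Location, block) pairs with GroupBy.ngroup; with sort=False the groups are numbered in order of first appearance. A sketch on the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Time': ['7:00:00', '8:00:00', '9:00:00', '11:00:00', '12:00:00', '1:00:00', '2:00:00', '3:00:00'],
    'Day': ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Thurs', 'Fri', 'Sat'],
    'Location': ['Home', 'Home', 'Home', 'Away', 'Away', 'Home', 'Home', 'Home'],
})

# Per Location: 0-based code of each Day in order of first appearance,
# integer-divided by 3 so every 3 unique days share a block number
block = df.groupby('Location')['Day'].transform(lambda s: pd.factorize(s)[0] // 3)

# Number each (Location, block) pair in order of first appearance in the frame;
# sort=False keeps that order, so the count runs on across locations
df['Assign'] = df.groupby([df['Location'], block], sort=False).ngroup() + 1
print(df)
```

This gives 1, 1, 1, 2, 2, 3, 3, 3 for the sample (as integers rather than floats), and on the longer 11-row sample it reproduces the new column shown above.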