Python shift() from same column like in Excel with dates
I want to create the 'target_start' column in Python:
id | start | end | diff | target_start |
---|---|---|---|---|
12220 | 1999-11-22 | 2008-08-31 | 3515 | 1999-11-22 |
12220 | 2018-04-16 | 2019-09-15 | 1 | 2018-04-16 |
12220 | 2019-09-16 | 2019-11-30 | 1 | 2018-04-16 |
12220 | 2019-12-01 | 2020-03-31 | 1 | 2018-04-16 |
12220 | 2020-04-01 | 2020-06-30 | -711 | 2018-04-16 |
11132 | 2018-07-20 | 2019-09-15 | 1 | 2018-07-20 |
11132 | 2019-09-16 | 2021-01-01 | -44197 | 2018-07-20 |
This is easy to solve in Excel, but I don't know how to do it in Python. I can set the first target row like this:

    df.loc[df.index == 0, 'target_start'] = df['start']

I tried this code, but it didn't work:

    import pandas as pd

    df = pd.read_excel('./Shift.xlsx')

    # if id != id.shift(1) then target_start = start
    df.loc[df['id'] != df['id'].shift(1), 'target_start'] = df['start']
    # elif: diff != 1 then target_start = start
    df.loc[df['diff'].shift(1) != 1, 'target_start'] = df['start']
    # else: target_start = target_start.shift(1)
    df.loc[(df.index != 0) & (df['id'] == df['id'].shift(1)) & (df['diff'].shift(1) == 1), 'target_start'] = df['target_start'].shift(1)
    print(df)

The result is:
id | start | end | diff | target_start |
---|---|---|---|---|
12220 | 1999-11-22 | 2008-08-31 | 3515 | 1999-11-22 |
12220 | 2018-04-16 | 2019-09-15 | 1 | 2018-04-16 |
12220 | 2019-09-16 | 2019-11-30 | 1 | 2018-04-16 |
12220 | 2019-12-01 | 2020-03-31 | 1 | NaT |
12220 | 2020-04-01 | 2020-06-30 | -711 | NaT |
11132 | 2018-07-20 | 2019-09-15 | 1 | 2018-07-20 |
11132 | 2019-09-16 | 2021-01-01 | -44197 | 2018-07-20 |
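Does anyone know how to fix this? Thanks in advance!

Here is how I would implement your Excel formula (the one you highlighted):

    # Parse the date columns (this assumes df already has a target_start column)
    df.start = pd.to_datetime(df.start)
    df.end = pd.to_datetime(df.end)
    df.target_start = pd.to_datetime(df.target_start)
    df["id_shift"] = df.id.shift()

    # Column positions: 0 = id, 1 = start, 3 = diff
    target_start = [df.iloc[0, 1]]  # the first row keeps its own start
    for i in range(1, df.shape[0]):
        print(i)
        if df.iloc[i, 0] != df.iloc[i - 1, 0]:
            # id changed: begin a new block
            target_start.append(df.iloc[i, 1])
        else:
            if df.iloc[i, 3] == 1:
                target_start.append(df.iloc[i, 1])
            else:
                # carry the previous target_start forward
                target_start.append(target_start[i - 1])

    df["target_start"] = target_start
    del df["id_shift"]

It produces the following result: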

          id       start         end    diff  target_start
    0  12220  1999-11-22  2008-08-31    3515    1999-11-22
    1  12220  2018-04-16  2019-09-15       1    2018-04-16
    2  12220  2019-09-16  2019-11-30       1    2019-09-16
    3  12220  2019-12-01  2020-03-31       1    2019-12-01
    4  12220  2020-04-01  2020-06-30    -711    2019-12-01
    5  11132  2018-07-20  2019-09-15       1    2018-07-20
    6  11132  2019-09-16  2021-01-01  -44197    2018-07-20

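Thanks @quest! That's great :)

I just had to fix one thing first: the check has to look at the previous row's diff, and the row takes its own start when that diff is not 1:

    else:
        if df.iloc[i - 1, 3] != 1:
            target_start.append(df.iloc[i, 1])

So the complete working code is:

    # Parse the date columns (this assumes df already has a target_start column)
    df.start = pd.to_datetime(df.start)
    df.end = pd.to_datetime(df.end)
    df.target_start = pd.to_datetime(df.target_start)
    df["id_shift"] = df.id.shift()

    # Column positions: 0 = id, 1 = start, 3 = diff
    target_start = [df.iloc[0, 1]]  # the first row keeps its own start
    for i in range(1, df.shape[0]):
        # print(i)
        if df.iloc[i, 0] != df.iloc[i - 1, 0]:
            # id changed: begin a new block
            target_start.append(df.iloc[i, 1])
        else:
            if df.iloc[i - 1, 3] != 1:
                # previous row's diff is not 1: begin a new block
                target_start.append(df.iloc[i, 1])
            else:
                # previous row's diff is 1: carry target_start forward
                target_start.append(target_start[i - 1])

    df["target_start"] = target_start
    del df["id_shift"]
    df.head(7)

Thanks again! You were a great help.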
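
For larger frames, the same rule can probably be written without an explicit loop. The sketch below is not from the thread: it rebuilds the sample rows from the question by hand instead of reading `Shift.xlsx`, and the name `new_block` is made up for illustration. The idea is to mark every row that starts a new block (the id changed, or the previous row's diff is not 1), keep `start` only on those rows, and forward-fill the rest.

```python
import pandas as pd

# Sample rows copied from the question (assumption: Shift.xlsx holds the same data)
df = pd.DataFrame({
    "id": [12220, 12220, 12220, 12220, 12220, 11132, 11132],
    "start": pd.to_datetime([
        "1999-11-22", "2018-04-16", "2019-09-16", "2019-12-01",
        "2020-04-01", "2018-07-20", "2019-09-16",
    ]),
    "end": pd.to_datetime([
        "2008-08-31", "2019-09-15", "2019-11-30", "2020-03-31",
        "2020-06-30", "2019-09-15", "2021-01-01",
    ]),
    "diff": [3515, 1, 1, 1, -711, 1, -44197],
})

# A row starts a new block when its id differs from the previous row's id,
# or when the previous row's diff is not 1. The first row compares against
# NaN, so both conditions are True for it.
new_block = df["id"].ne(df["id"].shift()) | df["diff"].shift().ne(1)

# Keep start only on block-opening rows (NaT elsewhere), then carry it forward.
df["target_start"] = df["start"].where(new_block).ffill()

print(df)
```

Because every id change opens a new block, the forward fill never carries a value across two different ids, so no `groupby` is needed here.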