创建滞后值在 Pandas 中的 t-1 列
Creating t-1 column with lag values in Pandas
我有一个数据框
{'Year': {0: 2017, 1: 2018, 2: 2019, 3: 2017, 4: 2018, 5: 2019, 6: 2017, 7: 2018, 8: 2019}, 'Total_articles_per_year': {0: 2, 1: 1, 2: 0, 3: 4, 4: 7, 5: 3, 6: 2, 7: 3, 8: 1}, 'Total_articles_2017': {0: 2, 1: 2, 2: 2, 3: 4, 4: 4, 5: 4, 6: 2, 7: 2, 8: 2}, 'Total_articles_2018': {0: 1, 1: 1, 2: 1, 3: 7, 4: 7, 5: 7, 6: 3, 7: 3, 8: 3}, 'Total_articles_2019 ': {0: 0, 1: 0, 2: 0, 3: 3, 4: 3, 5: 3, 6: 1, 7: 1, 8: 1}}
我想创建一个新列t-1
。
此列的条件为:
- 如果df[
Year'] == 2017
,df[t-1] = NaN
- if df[
Year'] == 2018
, for value if df[t-1]
从 Total_articles_per_year2017
取值
- 如果 df[
Year'] == 2019
,对于值 if df[t-1]
从 Total_articles_per_year2018
取值
预期输出:
{'Year': {0: 2017, 1: 2018, 2: 2019, 3: 2017, 4: 2018, 5: 2019, 6: 2017, 7: 2018, 8: 2019}, 'Total_articles_per_year': {0: 2, 1: 1, 2: 0, 3: 4, 4: 7, 5: 3, 6: 2, 7: 3, 8: 1}, 'Total_articles_2017': {0: 2, 1: 2, 2: 2, 3: 4, 4: 4, 5: 4, 6: 2, 7: 2, 8: 2}, 'Total_articles_2018': {0: 1, 1: 1, 2: 1, 3: 7, 4: 7, 5: 7, 6: 3, 7: 3, 8: 3}, 'Total_articles_2019 ': {0: 0, 1: 0, 2: 0, 3: 3, 4: 3, 5: 3, 6: 1, 7: 1, 8: 1}, 't-1': {0: nan, 1: 2.0, 2: 1.0, 3: nan, 4: 4.0, 5: 7.0, 6: nan, 7: 2.0, 8: 3.0}}
您可以通过 DataFrame.melt
重塑值,因此可以比较 Year
减去前一年的 1
与添加 Total_articles_
:
melt = df.melt(['Year'], ignore_index=False)
mask = 'Total_articles_' + melt['Year'].sub(1).astype(str) == melt['variable']
df['t-1'] = melt.loc[mask, 'value']
print (df)
Year Total_articles_per_year Total_articles_2017 Total_articles_2018 \
0 2017 2 2 1
1 2018 1 2 1
2 2019 0 2 1
3 2017 4 4 7
4 2018 7 4 7
5 2019 3 4 7
6 2017 2 2 3
7 2018 3 2 3
8 2019 1 2 3
Total_articles_2019 t-1
0 0 NaN
1 0 2.0
2 0 1.0
3 3 NaN
4 3 4.0
5 3 7.0
6 1 NaN
7 1 2.0
8 1 3.0
我有一个数据框
{'Year': {0: 2017, 1: 2018, 2: 2019, 3: 2017, 4: 2018, 5: 2019, 6: 2017, 7: 2018, 8: 2019}, 'Total_articles_per_year': {0: 2, 1: 1, 2: 0, 3: 4, 4: 7, 5: 3, 6: 2, 7: 3, 8: 1}, 'Total_articles_2017': {0: 2, 1: 2, 2: 2, 3: 4, 4: 4, 5: 4, 6: 2, 7: 2, 8: 2}, 'Total_articles_2018': {0: 1, 1: 1, 2: 1, 3: 7, 4: 7, 5: 7, 6: 3, 7: 3, 8: 3}, 'Total_articles_2019 ': {0: 0, 1: 0, 2: 0, 3: 3, 4: 3, 5: 3, 6: 1, 7: 1, 8: 1}}
我想创建一个新列t-1
。
此列的条件为:
- 如果df[
Year'] == 2017
,df[t-1] = NaN
- if df[
Year'] == 2018
, for value ifdf[t-1]
从Total_articles_per_year2017
取值
- 如果 df[
Year'] == 2019
,对于值if df[t-1]
从Total_articles_per_year2018
取值
预期输出:
{'Year': {0: 2017, 1: 2018, 2: 2019, 3: 2017, 4: 2018, 5: 2019, 6: 2017, 7: 2018, 8: 2019}, 'Total_articles_per_year': {0: 2, 1: 1, 2: 0, 3: 4, 4: 7, 5: 3, 6: 2, 7: 3, 8: 1}, 'Total_articles_2017': {0: 2, 1: 2, 2: 2, 3: 4, 4: 4, 5: 4, 6: 2, 7: 2, 8: 2}, 'Total_articles_2018': {0: 1, 1: 1, 2: 1, 3: 7, 4: 7, 5: 7, 6: 3, 7: 3, 8: 3}, 'Total_articles_2019 ': {0: 0, 1: 0, 2: 0, 3: 3, 4: 3, 5: 3, 6: 1, 7: 1, 8: 1}, 't-1': {0: nan, 1: 2.0, 2: 1.0, 3: nan, 4: 4.0, 5: 7.0, 6: nan, 7: 2.0, 8: 3.0}}
您可以通过 DataFrame.melt
重塑值,因此可以比较 Year
减去前一年的 1
与添加 Total_articles_
:
melt = df.melt(['Year'], ignore_index=False)
mask = 'Total_articles_' + melt['Year'].sub(1).astype(str) == melt['variable']
df['t-1'] = melt.loc[mask, 'value']
print (df)
Year Total_articles_per_year Total_articles_2017 Total_articles_2018 \
0 2017 2 2 1
1 2018 1 2 1
2 2019 0 2 1
3 2017 4 4 7
4 2018 7 4 7
5 2019 3 4 7
6 2017 2 2 3
7 2018 3 2 3
8 2019 1 2 3
Total_articles_2019 t-1
0 0 NaN
1 0 2.0
2 0 1.0
3 3 NaN
4 3 4.0
5 3 7.0
6 1 NaN
7 1 2.0
8 1 3.0