How can I add rows for all dates between two columns?
import pandas as pd
mydata = [{'ID' : '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016'},
{'ID' : '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016'}]
mydata2 = [{'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '10/10/2016'},
{'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '11/10/2016'},
{'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '12/10/2016'},
{'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '13/10/2016'},
{'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '14/10/2016'},
{'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '15/10/2016'},
{'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '10/10/2016'},
{'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '11/10/2016'},
{'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '12/10/2016'},
{'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '13/10/2016'},
{'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '14/10/2016'},
{'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '15/10/2016'},
{'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '16/10/2016'},
{'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '17/10/2016'},
{'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '18/10/2016'},]
df = pd.DataFrame(mydata)
df2 = pd.DataFrame(mydata2)
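I couldn't find an answer on how to turn 'df' into 'df2'. Maybe I'm not phrasing it right.
I want to take all the dates between the dates in the two columns 'Entry Date' and 'Exit Date', make a row for each one, and enter the corresponding date on each row in a new column, 'Date'.
Any help would be greatly appreciated.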
You can use melt for reshaping, then set_index, and drop the column variable:
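# convert columns to datetime; the sample dates are day-first (e.g. 15/10/2016),
# so the format is given explicitly
df['Entry Date'] = pd.to_datetime(df['Entry Date'], format='%d/%m/%Y')
df['Exit Date'] = pd.to_datetime(df['Exit Date'], format='%d/%m/%Y')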
df2 = pd.melt(df, id_vars='ID', value_name='Date')
df2.Date = pd.to_datetime(df2.Date)
df2.set_index('Date', inplace=True)
df2.drop('variable', axis=1, inplace=True)
print (df2)
ID
Date
2016-10-10 10
2016-10-10 20
2016-10-15 10
2016-10-18 20
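To make the reshaping step easier to follow, here is a small self-contained sketch of what melt returns before the variable column is dropped. The name long_df is only for illustration, and the explicit day-first format is my assumption about how the sample dates should be read.

```python
import pandas as pd

# Sample data from the question (dates are day-first strings).
df = pd.DataFrame({
    "ID": ["10", "20"],
    "Entry Date": ["10/10/2016", "10/10/2016"],
    "Exit Date": ["15/10/2016", "18/10/2016"],
})
for col in ["Entry Date", "Exit Date"]:
    df[col] = pd.to_datetime(df[col], format="%d/%m/%Y")

# melt stacks the two date columns into one 'Date' column;
# the 'variable' column records which original column each value came from.
long_df = pd.melt(df, id_vars="ID", value_name="Date")
print(long_df)
```

Each ID ends up with two rows, one for its entry date and one for its exit date, which is why the later steps only need to fill in the days between them.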
Then use groupby with resample and ffill to fill the missing values:
df3 = df2.groupby('ID').resample('D').ffill().reset_index(level=0, drop=True).reset_index()
print (df3)
Date ID
0 2016-10-10 10
1 2016-10-11 10
2 2016-10-12 10
3 2016-10-13 10
4 2016-10-14 10
5 2016-10-15 10
6 2016-10-10 20
7 2016-10-11 20
8 2016-10-12 20
9 2016-10-13 20
10 2016-10-14 20
11 2016-10-15 20
12 2016-10-16 20
13 2016-10-17 20
14 2016-10-18 20
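If it is unclear what resample('D') followed by ffill does inside each group, this minimal sketch applies the same idea to a single ID. The Series s is a hand-built stand-in for one group, not part of the original answer.

```python
import pandas as pd

# One group's worth of data: ID '10' observed on its entry and exit dates.
s = pd.Series(
    ["10", "10"],
    index=pd.to_datetime(["2016-10-10", "2016-10-15"]),
    name="ID",
)

# Upsample to daily frequency; the new in-between days start out empty
# and ffill copies the last seen value forward into them.
daily = s.resample("D").ffill()
print(daily)
```

The groupby in the answer simply repeats this for every ID, so each ID gets its own run of consecutive days.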
Finally, merge with the original DataFrame:
print (pd.merge(df, df3))
Entry Date Exit Date ID Date
0 2016-10-10 2016-10-15 10 2016-10-10
1 2016-10-10 2016-10-15 10 2016-10-11
2 2016-10-10 2016-10-15 10 2016-10-12
3 2016-10-10 2016-10-15 10 2016-10-13
4 2016-10-10 2016-10-15 10 2016-10-14
5 2016-10-10 2016-10-15 10 2016-10-15
6 2016-10-10 2016-10-18 20 2016-10-10
7 2016-10-10 2016-10-18 20 2016-10-11
8 2016-10-10 2016-10-18 20 2016-10-12
9 2016-10-10 2016-10-18 20 2016-10-13
10 2016-10-10 2016-10-18 20 2016-10-14
11 2016-10-10 2016-10-18 20 2016-10-15
12 2016-10-10 2016-10-18 20 2016-10-16
13 2016-10-10 2016-10-18 20 2016-10-17
14 2016-10-10 2016-10-18 20 2016-10-18
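When no key is given, pd.merge joins on the columns the two frames have in common, which here is just ID. A tiny sketch with the key spelled out, using cut-down stand-ins for df and df3 (the values below are made up for brevity):

```python
import pandas as pd

# Cut-down stand-ins for the original frame and the expanded dates.
df = pd.DataFrame({
    "ID": ["10", "20"],
    "Exit Date": pd.to_datetime(["2016-10-15", "2016-10-18"]),
})
df3 = pd.DataFrame({
    "Date": pd.to_datetime(["2016-10-10", "2016-10-11", "2016-10-10"]),
    "ID": ["10", "10", "20"],
})

# Naming the key explicitly guards against accidental joins on other shared columns.
merged = pd.merge(df, df3, on="ID", how="left")
print(merged)
```

Every row of df is repeated once per matching date in df3, which is what turns one row per ID into one row per day.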
With newer versions of pandas (1.1 or later), you can use the explode function to generate the dates:
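# the sample dates are day-first (e.g. 15/10/2016), so give the format explicitly
df['Entry Date'] = pd.to_datetime(df['Entry Date'], format='%d/%m/%Y')
df['Exit Date'] = pd.to_datetime(df['Exit Date'], format='%d/%m/%Y')
# build one date range per row, then expand each range into its own rows
(df.assign(Date = [pd.date_range(start, end)
                   for start, end
                   in zip(df['Entry Date'], df['Exit Date'])]
          )
   .explode('Date', ignore_index = True)
)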
ID Entry Date Exit Date Date
0 10 2016-10-10 2016-10-15 2016-10-10
1 10 2016-10-10 2016-10-15 2016-10-11
2 10 2016-10-10 2016-10-15 2016-10-12
3 10 2016-10-10 2016-10-15 2016-10-13
4 10 2016-10-10 2016-10-15 2016-10-14
5 10 2016-10-10 2016-10-15 2016-10-15
6 20 2016-10-10 2016-10-18 2016-10-10
7 20 2016-10-10 2016-10-18 2016-10-11
8 20 2016-10-10 2016-10-18 2016-10-12
9 20 2016-10-10 2016-10-18 2016-10-13
10 20 2016-10-10 2016-10-18 2016-10-14
11 20 2016-10-10 2016-10-18 2016-10-15
12 20 2016-10-10 2016-10-18 2016-10-16
13 20 2016-10-10 2016-10-18 2016-10-17
14 20 2016-10-10 2016-10-18 2016-10-18
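For reuse, the explode approach can be wrapped in a small helper. This is only a sketch: the function name expand_dates and its parameters are hypothetical, and the day-first format is an assumption based on the sample data in the question.

```python
import pandas as pd

def expand_dates(frame, start_col="Entry Date", end_col="Exit Date", fmt="%d/%m/%Y"):
    """Return one row per calendar day between start_col and end_col (inclusive)."""
    out = frame.copy()
    out[start_col] = pd.to_datetime(out[start_col], format=fmt)
    out[end_col] = pd.to_datetime(out[end_col], format=fmt)
    # One DatetimeIndex per row, covering that row's start..end range.
    out["Date"] = [
        pd.date_range(start, end, freq="D")
        for start, end in zip(out[start_col], out[end_col])
    ]
    # explode turns each element of the range into its own row.
    out = out.explode("Date", ignore_index=True)
    # explode leaves an object column, so convert it back to datetime64.
    out["Date"] = pd.to_datetime(out["Date"])
    return out

df = pd.DataFrame([
    {"ID": "10", "Entry Date": "10/10/2016", "Exit Date": "15/10/2016"},
    {"ID": "20", "Entry Date": "10/10/2016", "Exit Date": "18/10/2016"},
])
result = expand_dates(df)
print(result)
```

If the dates are needed back in the question's string form, result["Date"].dt.strftime("%d/%m/%Y") should give the same values as the 'Date' column of the desired df2.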