Pandas: Fill missing dates in Pandas dataframe

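How can I fill the 'Date' column so that, whenever a date appears, that date is carried down to the rows below it until a new date appears, from which point the new date is used instead?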

Reproducible example:

Input:


                Date                                           Headline
0   Mar-20-21 04:03AM  Apple CEO Cook, executives on tentative list o...
1             03:43AM  Apple CEO Cook, execs on tentative list of wit...
2   Mar-19-21 10:19PM  Dow Jones Futures: Why This Market Rally Is So...
3             06:13PM  Zuckerberg: Apples Privacy Move Could Spur Mor...
4             05:45PM  Apple (AAPL) Dips More Than Broader Markets: W...
5             04:17PM  Facebook Stock Jumps As Zuckerberg Changes Tun...
6             04:03PM  Best Dow Jones Stocks To Buy And Watch In Marc...
7             01:02PM  The Nasdaq's on the Rise Friday, and These 2 S...

Desired output:


                 Date                                           Headline
0   Mar-20-21 04:03AM  Apple CEO Cook, executives on tentative list o...
1   Mar-20-21 03:43AM  Apple CEO Cook, execs on tentative list of wit...
2   Mar-19-21 10:19PM  Dow Jones Futures: Why This Market Rally Is So...
3   Mar-19-21 06:13PM  Zuckerberg: Apples Privacy Move Could Spur Mor...
4   Mar-19-21 05:45PM  Apple (AAPL) Dips More Than Broader Markets: W...
5   Mar-19-21 04:17PM  Facebook Stock Jumps As Zuckerberg Changes Tun...
6   Mar-19-21 04:03PM  Best Dow Jones Stocks To Buy And Watch In Marc...
7   Mar-19-21 01:02PM  The Nasdaq's on the Rise Friday, and These 2 S...

Attempt:

df['Time'] = [x[-7:] for x in df['Date']]
df['Date'] = [x[:-7] for x in df['Date']]
# Some code that fills the date
# Then convert to datetime
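One possible way to finish the attempt above, as a sketch rather than the asker's final code: it keeps the "slice off the last 7 characters" idea (which assumes every time looks like `04:03AM`) but uses the vectorized `.str` accessor instead of list comprehensions. Only the Date column from the question is reproduced; the Headline column is omitted.

```python
import numpy as np
import pandas as pd

# Date column from the question (Headline omitted for brevity)
df = pd.DataFrame({'Date': ['Mar-20-21 04:03AM', '03:43AM', 'Mar-19-21 10:19PM',
                            '06:13PM', '05:45PM', '04:17PM', '04:03PM', '01:02PM']})

# Slice off the 7-character time (e.g. "04:03AM") and keep whatever precedes it
df['Time'] = df['Date'].str[-7:]
df['Date'] = df['Date'].str[:-7].str.strip()

# Time-only rows now have an empty date; turn those into NaN so ffill() can fill them
df['Date'] = df['Date'].replace('', np.nan).ffill()

# Recombine and parse; the format string matches e.g. "Mar-20-21 04:03AM"
df['Date'] = pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%b-%d-%y %I:%M%p')
df = df.drop(columns='Time')
print(df)
```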

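Before using ffill(), you need to split the column into two, so that each row keeps its own time and only the date part gets filled. ffill() only fills missing values, so you need to replace the empty (blank or whitespace-only) date strings with np.nan first. Then join the two columns back together and wrap that in pd.to_datetime to get the correct dtype.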

Finally, you can drop the Time column.
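A small illustration of why the NaN replacement is needed: `ffill()` only propagates over missing values, so empty strings are left untouched until they are turned into NaN. The two-date Series here is a made-up minimal input.

```python
import numpy as np
import pandas as pd

s = pd.Series(['Mar-20-21', '', 'Mar-19-21', ''])

# ffill() only fills missing values (NaN/None), so empty strings stay empty
print(s.ffill().tolist())  # ['Mar-20-21', '', 'Mar-19-21', '']

# After turning blank or whitespace-only strings into NaN, ffill() carries the dates down
filled = s.replace(r'^\s*$', np.nan, regex=True).ffill()
print(filled.tolist())  # ['Mar-20-21', 'Mar-20-21', 'Mar-19-21', 'Mar-19-21']
```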

# Imports
import numpy as np
import pandas as pd

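# Split the column on the first space only (time-only rows yield an empty date part)
df[['Date','Time']] = df['Date'].str.split(' ', n=1, expand=True)

# Replace blank/whitespace-only dates with NaN and use ffill()
df['Date'] = df['Date'].replace(r'^\s*$', np.nan, regex=True).ffill()

# Put the columns back and convert to datetime (format matches e.g. "Mar-20-21 04:03AM")
df['Date'] = pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%b-%d-%y %I:%M%p')

# Drop the time column
df = df.drop(columns='Time')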

This gives you:

df
                 Date                                           Headline
0 2021-03-20 04:03:00  Apple CEO Cook, executives on tentative list o...
1 2021-03-20 03:43:00  Apple CEO Cook, execs on tentative list of wit...
2 2021-03-19 22:19:00  Dow Jones Futures: Why This Market Rally Is So...
3 2021-03-19 18:13:00  Zuckerberg: Apples Privacy Move Could Spur Mor...
4 2021-03-19 17:45:00  Apple (AAPL) Dips More Than Broader Markets: W...
5 2021-03-19 16:17:00  Facebook Stock Jumps As Zuckerberg Changes Tun...
6 2021-03-19 16:03:00  Best Dow Jones Stocks To Buy And Watch In Marc...
7 2021-03-19 13:02:00  The Nasdaq's on the Rise Friday, and These 2 S...
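The split above relies on time-only cells having the time preceded by a space; if a cell is just `03:43AM`, splitting on `' '` leaves the time in Date and None in Time. A possible regex-based variant, sketched under the assumption that every cell is an optional `Mon-DD-YY` date followed by an `HH:MMAM/PM` time, does not depend on that. The group names `Day` and `Time` are arbitrary choices for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['Mar-20-21 04:03AM', '03:43AM', 'Mar-19-21 10:19PM', '06:13PM',
             '05:45PM', '04:17PM', '04:03PM', '01:02PM'],
})

# Optional "Mon-DD-YY " prefix, then a required "HH:MMAM/PM" time; surrounding whitespace ignored
pattern = r'^\s*(?:(?P<Day>[A-Za-z]{3}-\d{2}-\d{2})\s+)?(?P<Time>\d{1,2}:\d{2}[AP]M)\s*$'
parts = df['Date'].str.extract(pattern)

# Rows without a date prefix get NaN in "Day"; carry the last seen date down
day = parts['Day'].ffill()

df['Date'] = pd.to_datetime(day + ' ' + parts['Time'], format='%b-%d-%y %I:%M%p')
print(df)
```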

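Edit: If you want 'Date' to appear exactly as in your desired output, i.e. in the 'Mar-20-21' format, don't wrap it in pd.to_datetime(); just join the columns and leave the result as object (string) dtype:

df['Date'] = df['Date'] + ' ' + df['Time']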

df
                Date                                           Headline
0  Mar-20-21 04:03AM  Apple CEO Cook, executives on tentative list o...
1  Mar-20-21 03:43AM  Apple CEO Cook, execs on tentative list of wit...
2  Mar-19-21 10:19PM  Dow Jones Futures: Why This Market Rally Is So...
3  Mar-19-21 06:13PM  Zuckerberg: Apples Privacy Move Could Spur Mor...
4  Mar-19-21 05:45PM  Apple (AAPL) Dips More Than Broader Markets: W...
5  Mar-19-21 04:17PM  Facebook Stock Jumps As Zuckerberg Changes Tun...
6  Mar-19-21 04:03PM  Best Dow Jones Stocks To Buy And Watch In Marc...
7  Mar-19-21 01:02PM  The Nasdaq's on the Rise Friday, and These 2 S...
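Another option, sketched here as an alternative rather than part of the answer above: keep the parsed datetime column (useful for sorting and time arithmetic) and only format it back into the `Mar-20-21 04:03AM` style when displaying it, via `Series.dt.strftime`. Note that `%b` and `%p` are locale-dependent; the output shown assumes an English/C locale, and the two timestamps are a made-up minimal input.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2021-03-20 04:03', '2021-03-19 22:19']))

# Format datetimes back to strings like "Mar-20-21 04:03AM" for display only;
# the result is object dtype, so keep the datetime column for calculations
display = dates.dt.strftime('%b-%d-%y %I:%M%p')
print(display.tolist())  # ['Mar-20-21 04:03AM', 'Mar-19-21 10:19PM']
```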