Formatting times written as integers and floats

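I'm working on some flight data. It is meant to be an exploratory analysis that uses some statistical methods, such as binning. I've been trying to format the departure and arrival times. Here is my code so far: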

  #Calling Libraries
  import os               # File management
  import pandas as pd     # Data frame manipulation
  import numpy as np      # Data frame operations
  import datetime as dt   # Date operations
  import seaborn as sns   # Data Viz  

  #Reading the file:
  flight_df=pd.read_csv(r'C:\Users\pc\Desktop\Work\flights.csv')

  #Checking the DataFrame:
  flight_df.head()

  flight_df.info()
  <class 'pandas.core.frame.DataFrame'>
   RangeIndex: 2500 entries, 0 to 2499
   Data columns (total 38 columns):
   #   Column               Non-Null Count  Dtype  
   ---  ------               --------------  -----  
   0   O_AIRPORT_IATA_CODE  2500 non-null   object 
   1   O_AIRPORT            2288 non-null   object 
   2   O_CITY               2288 non-null   object 
   3   O_STATE              2288 non-null   object 
   4   O_COUNTRY            2288 non-null   object 
   5   O_LATITUDE           2287 non-null   float64
   6   O_LONGITUDE          2287 non-null   float64
   7   D_AIRPORT_IATA_CODE  2500 non-null   object 
   8   D_AIRPORT            2288 non-null   object 
   9   D_CITY               2288 non-null   object 
   10  D_STATE              2288 non-null   object 
   11  D_COUNTRY            2288 non-null   object 
   12  D_LATITUDE           2288 non-null   float64
   13  D_LONGITUDE          2288 non-null   float64
   14  SCHEDULED_DEPARTURE  2500 non-null   int64  
   15  DEPARTURE_TIME       2467 non-null   float64
   16  DEPARTURE_DELAY      2467 non-null   float64
   17  TAXI_OUT             2467 non-null   float64
   18  WHEELS_OFF           2467 non-null   float64
   19  SCHEDULED_TIME       2500 non-null   int64  
   20  ELAPSED_TIME         2464 non-null   float64
   21  AIR_TIME             2464 non-null   float64
   22  DISTANCE             2500 non-null   int64  
   23  WHEELS_ON            2467 non-null   float64
   24  TAXI_IN              2467 non-null   float64
   25  SCHEDULED_ARRIVAL    2500 non-null   int64  
   26  ARRIVAL_TIME         2467 non-null   float64
   27  ARRIVAL_DELAY        2464 non-null   float64
   28  DIVERTED             2500 non-null   int64  
   29  CANCELLED            2500 non-null   int64  
   30  CANCELLATION_REASON  33 non-null     object 
   31  AIR_SYSTEM_DELAY     386 non-null    float64
   32  SECURITY_DELAY       386 non-null    float64
   33  AIRLINE_DELAY        386 non-null    float64
   34  LATE_AIRCRAFT_DELAY  386 non-null    float64
   35  WEATHER_DELAY        386 non-null    float64
   36  DATE                 2500 non-null   object 
   37  AIRLINE_NAME         2500 non-null   object 
   dtypes: float64(19), int64(6), object(13)
   memory usage: 742.3+ KB

  # Dropping redundant columns.
  # drop() returns a new DataFrame; with inplace=True it returns None, so inplace is omitted here.
  newdf = flight_df.drop(['O_COUNTRY','O_LATITUDE','O_LONGITUDE','D_COUNTRY','D_LATITUDE','D_LONGITUDE','SCHEDULED_DEPARTURE','DIVERTED','CANCELLED','CANCELLATION_REASON','TAXI_OUT','TAXI_IN','WHEELS_OFF','WHEELS_ON','SCHEDULED_ARRIVAL'], axis=1)

I need to change the format of the departure and arrival times so that, instead of being displayed like this:

12    1746.0
14    1849.0
19    1514.0
20    1555.0
22    2017.0
Name: DEPARTURE_TIME, dtype: float64

they look like this:

   12    17:46
   14    18:49
   19    15:14
   20    15:55
   22    20:17

I need this in order to do further binning and analysis.

Thanks!

You can get the desired format by using pd.to_datetime to parse the values to the datetime data type and then formatting them as strings:

import pandas as pd

df = pd.DataFrame({'DEPARTURE_TIME': [1746.0, 1849.0, 1514.0, 1555.0, 2017.0]})

df['DEPARTURE_TIME'] = pd.to_datetime(df['DEPARTURE_TIME'], format="%H%M").dt.strftime("%H:%M")

df['DEPARTURE_TIME']
0    17:46
1    18:49
2    15:14
3    15:55
4    20:17
Name: DEPARTURE_TIME, dtype: object
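
The real DEPARTURE_TIME column also has missing values (2467 non-null out of 2500), and it may contain early times stored as short numbers such as 5.0 for 00:05 (that second point is an assumption about the data, not something shown in the question). Below is a minimal sketch of a more defensive conversion that pads the digits itself instead of parsing the floats against "%H%M", followed by one possible way to bin the hour. The helper name hhmm_to_string, the new column names, and the four period labels are all made up for illustration.

```python
import numpy as np
import pandas as pd


def hhmm_to_string(col: pd.Series) -> pd.Series:
    """Turn HHMM numbers such as 1746.0 into 'HH:MM' strings; NaN stays missing."""
    # Nullable integers drop the trailing ".0" while keeping missing values.
    as_int = col.astype("Int64")
    # Left-pad with zeros so that 5 (00:05) becomes "0005".
    padded = as_int.astype("string").str.zfill(4)
    return padded.str[:2] + ":" + padded.str[2:]


# Small made-up sample, including an early time and a missing value.
df = pd.DataFrame({"DEPARTURE_TIME": [1746.0, 5.0, np.nan, 2017.0]})
df["DEPARTURE_HHMM"] = hhmm_to_string(df["DEPARTURE_TIME"])

# For binning, the numeric hour is easier to work with than the string.
df["DEPARTURE_HOUR"] = df["DEPARTURE_TIME"] // 100

# Hypothetical bins: [0, 6), [6, 12), [12, 18), [18, 24).
df["DEPARTURE_PERIOD"] = pd.cut(
    df["DEPARTURE_HOUR"],
    bins=[0, 6, 12, 18, 24],
    right=False,
    labels=["night", "morning", "afternoon", "evening"],
)

print(df)
```

Note that with these bins a value of 2400, if the data uses it for midnight, falls outside the last interval and ends up as NaN, so it would need to be mapped to 0 first.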