Formatting times stored as integers and floats
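I'm working on some flight data. It is supposed to be an explanatory analysis that uses some statistical methods, such as binning. I have been trying to format the departure and arrival times.
Here is my code so far: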
#Calling Libraries
import os # File management
import pandas as pd # Data frame manipulation
import numpy as np # Data frame operations
import datetime as dt # Date operations
import seaborn as sns # Data Viz
#Reading the file:
flight_df=pd.read_csv(r'C:\Users\pc\Desktop\Work\flights.csv')
#Checking the DataFrame:
flight_df.head()
flight_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2500 entries, 0 to 2499
Data columns (total 38 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 O_AIRPORT_IATA_CODE 2500 non-null object
1 O_AIRPORT 2288 non-null object
2 O_CITY 2288 non-null object
3 O_STATE 2288 non-null object
4 O_COUNTRY 2288 non-null object
5 O_LATITUDE 2287 non-null float64
6 O_LONGITUDE 2287 non-null float64
7 D_AIRPORT_IATA_CODE 2500 non-null object
8 D_AIRPORT 2288 non-null object
9 D_CITY 2288 non-null object
10 D_STATE 2288 non-null object
11 D_COUNTRY 2288 non-null object
12 D_LATITUDE 2288 non-null float64
13 D_LONGITUDE 2288 non-null float64
14 SCHEDULED_DEPARTURE 2500 non-null int64
15 DEPARTURE_TIME 2467 non-null float64
16 DEPARTURE_DELAY 2467 non-null float64
17 TAXI_OUT 2467 non-null float64
18 WHEELS_OFF 2467 non-null float64
19 SCHEDULED_TIME 2500 non-null int64
20 ELAPSED_TIME 2464 non-null float64
21 AIR_TIME 2464 non-null float64
22 DISTANCE 2500 non-null int64
23 WHEELS_ON 2467 non-null float64
24 TAXI_IN 2467 non-null float64
25 SCHEDULED_ARRIVAL 2500 non-null int64
26 ARRIVAL_TIME 2467 non-null float64
27 ARRIVAL_DELAY 2464 non-null float64
28 DIVERTED 2500 non-null int64
29 CANCELLED 2500 non-null int64
30 CANCELLATION_REASON 33 non-null object
31 AIR_SYSTEM_DELAY 386 non-null float64
32 SECURITY_DELAY 386 non-null float64
33 AIRLINE_DELAY 386 non-null float64
34 LATE_AIRCRAFT_DELAY 386 non-null float64
35 WEATHER_DELAY 386 non-null float64
36 DATE 2500 non-null object
37 AIRLINE_NAME 2500 non-null object
dtypes: float64(19), int64(6), object(13)
memory usage: 742.3+ KB
# dropping redundant columns
# drop() returns a new DataFrame here; with inplace=True it would return None
newdf = flight_df.drop(columns=[
    'O_COUNTRY', 'O_LATITUDE', 'O_LONGITUDE',
    'D_COUNTRY', 'D_LATITUDE', 'D_LONGITUDE',
    'SCHEDULED_DEPARTURE', 'DIVERTED', 'CANCELLED', 'CANCELLATION_REASON',
    'TAXI_OUT', 'TAXI_IN', 'WHEELS_OFF', 'WHEELS_ON', 'SCHEDULED_ARRIVAL',
])
I need to change the format of the departure and arrival times so that, instead of looking like this:
12 1746.0
14 1849.0
19 1514.0
20 1555.0
22 2017.0
Name: DEPARTURE_TIME, dtype: float64
they look like this:
12 17:46
14 18:49
19 15:14
20 15:55
22 20:17
I need this for further binning and analysis.
Thanks!
You can get the desired format by using pd.to_datetime to parse to the datetime data type, then formatting to a string:
import pandas as pd
df = pd.DataFrame({'DEPARTURE_TIME': [1746.0, 1849.0, 1514.0, 1555.0, 2017.0]})
df['DEPARTURE_TIME'] = pd.to_datetime(df['DEPARTURE_TIME'], format="%H%M").dt.strftime("%H:%M")
df['DEPARTURE_TIME']
0 17:46
1 18:49
2 15:14
3 15:55
4 20:17
Name: DEPARTURE_TIME, dtype: object
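The real DEPARTURE_TIME column has missing values (2467 non-null out of 2500), and it may also hold times before 10:00 that are stored with fewer than four digits. Below is a minimal sketch of a plain-Python alternative that splits the number into hours and minutes, followed by a rough binning by hour with pd.cut. The sample values, the helper name format_hhmm, the new column names, the bin edges and labels, and the choice to treat 2400 as midnight are all my own assumptions for illustration, not something taken from your data.

```python
import numpy as np
import pandas as pd

# Made-up sample: a missing value, a time before 10:00, and 2400.
df = pd.DataFrame(
    {"DEPARTURE_TIME": [1746.0, 5.0, np.nan, 2400.0, 930.0, 2017.0]}
)


def format_hhmm(value):
    """Turn an HHMM number such as 1746.0 into '17:46'; missing stays missing."""
    if pd.isna(value):
        return None
    hours, minutes = divmod(int(value), 100)
    hours %= 24  # assumption: 2400 is treated as midnight (00:00)
    return f"{hours:02d}:{minutes:02d}"


# Hypothetical new column holding the formatted strings.
df["DEPARTURE_STR"] = df["DEPARTURE_TIME"].map(format_hhmm)

# Hour of day as a number; NaN propagates through the arithmetic.
hours = (df["DEPARTURE_TIME"] // 100) % 24

# Assumed bins: [0, 6), [6, 12), [12, 18), [18, 24).
df["DEPARTURE_PERIOD"] = pd.cut(
    hours,
    bins=[0, 6, 12, 18, 24],
    right=False,
    labels=["night", "morning", "afternoon", "evening"],
)

print(df)
```

For binning, the numeric hour is usually easier to work with than the "HH:MM" string, so it may be worth keeping the original float column next to the formatted one. The same steps should apply to ARRIVAL_TIME.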