int - String type error while converting datetime to Unix time epoch
I am trying to convert a datetime to a Unix time epoch, but I get the error below.
Input:
userid,datetime,latitude,longitude
156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346
187,2014-02-01 00:00:01.148457+01,41.9285433333333,12.4690366666667
297,2014-02-01 00:00:01.220066+01,41.8910686119733,12.4927045625339
89,2014-02-01 00:00:01.470854+01,41.7931766914244,12.4321219603157
79,2014-02-01 00:00:01.631136+01,41.90027472,12.46274618
191,2014-02-01 00:00:02.048546+01,41.8523047579646,12.5774065771898
343,2014-02-01 00:00:02.647839+01,41.8921718255185,12.4696996165151
341,2014-02-01 00:00:02.709888+01,41.9102125627332,12.4770004336041
260,2014-02-01 00:00:03.458195+01,41.8658208551143,12.4655221109313
Program:
import pandas as pd
import numpy as np
import io
df = pd.read_csv('input.csv',
#header=None, #no header in csv
header=['userid','datetime','latitude','longitude'], #set custom column names
parse_dates=['datetime']) #parse the datetime column
df['datetime'] = df['datetime'].astype(np.int64) // 10**9
#df['e'] = df['e'].astype(np.int64) // 10**9
df.to_csv('output.csv', header=True, index=False)
The program above works fine in Python 2.7, but after upgrading to Python 3.x (Anaconda) I can't get the result.
Error:
File "pandas\parser.pyx", line 519, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:5907)
TypeError: Can't convert 'int' object to str implicitly
EDIT: input file here
The header parameter of pd.read_csv expects an integer or a list of integers (row numbers), not a list of strings.
import pandas as pd
import numpy as np
from io import StringIO
file="""
userid,datetime,latitude,longitude
156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346
187,2014-02-01 00:00:01.148457+01,41.9285433333333,12.4690366666667
297,2014-02-01 00:00:01.220066+01,41.8910686119733,12.4927045625339
89,2014-02-01 00:00:01.470854+01,41.7931766914244,12.4321219603157
79,2014-02-01 00:00:01.631136+01,41.90027472,12.46274618
191,2014-02-01 00:00:02.048546+01,41.8523047579646,12.5774065771898
343,2014-02-01 00:00:02.647839+01,41.8921718255185,12.4696996165151
341,2014-02-01 00:00:02.709888+01,41.9102125627332,12.4770004336041
260,2014-02-01 00:00:03.458195+01,41.8658208551143,12.4655221109313"""
Let's try this read_csv statement:
df = pd.read_csv(StringIO(file),parse_dates=['datetime'])
df['datetime'] = df['datetime'].astype(np.int64) // 10**9
print(df.head())
Output:
userid datetime latitude longitude
0 156 1391209200 41.883672 12.487778
1 187 1391209201 41.928543 12.469037
2 297 1391209201 41.891069 12.492705
3 89 1391209201 41.793177 12.432122
4 79 1391209201 41.900275 12.462746
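As a sanity check of the numbers above, here is a sketch using only Python's standard library (not part of the original answer). The first timestamp carries a +01 offset, so the corresponding UTC instant is one hour earlier, and counting whole seconds since 1970-01-01 UTC gives the 1391209200 shown in the output:

```python
from datetime import datetime, timedelta, timezone

# First row of the input: 2014-02-01 00:00:00.739166 at UTC+01:00
ts = datetime(2014, 2, 1, 0, 0, 0, 739166,
              tzinfo=timezone(timedelta(hours=1)))

# The same instant expressed in UTC is one hour earlier
print(ts.astimezone(timezone.utc))  # 2014-01-31 23:00:00.739166+00:00

# Unix time: whole seconds elapsed since 1970-01-01 00:00:00 UTC.
# timedelta // timedelta floors, much like `// 10**9` on nanoseconds.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
seconds = (ts - epoch) // timedelta(seconds=1)
print(seconds)  # 1391209200
```

This also explains why the converted values look "one hour off" if you read them as local +01 time: Unix time is always relative to UTC.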
If the CSV has no header row, the names parameter is required, and parse_dates=[1] tries to parse the second column as datetime:
import pandas as pd
import numpy as np
from io import StringIO
temp=u"""156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346
187,2014-02-01 00:00:01.148457+01,41.9285433333333,12.4690366666667
297,2014-02-01 00:00:01.220066+01,41.8910686119733,12.4927045625339
89,2014-02-01 00:00:01.470854+01,41.7931766914244,12.4321219603157
79,2014-02-01 00:00:01.631136+01,41.90027472,12.46274618
191,2014-02-01 00:00:02.048546+01,41.8523047579646,12.5774065771898
343,2014-02-01 00:00:02.647839+01,41.8921718255185,12.4696996165151
341,2014-02-01 00:00:02.709888+01,41.9102125627332,12.4770004336041
260,2014-02-01 00:00:03.458195+01,41.8658208551143,12.4655221109313"""
#after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp),
parse_dates=[1],
names=['userid','datetime','latitude','longitude'])
#print (df)
#check the dtype: datetime64[ns] means parsing worked
print (df['datetime'].dtypes)
datetime64[ns]
df['datetime'] = df['datetime'].astype(np.int64) // 10**9
print (df)
userid datetime latitude longitude
0 156 1391209200 41.883672 12.487778
1 187 1391209201 41.928543 12.469037
2 297 1391209201 41.891069 12.492705
3 89 1391209201 41.793177 12.432122
4 79 1391209201 41.900275 12.462746
5 191 1391209202 41.852305 12.577407
6 343 1391209202 41.892172 12.469700
7 341 1391209202 41.910213 12.477000
8 260 1391209203 41.865821 12.465522
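If `astype(np.int64)` gives trouble on your pandas version (for example once the +01 offsets are parsed into a timezone-aware column), an alternative described in the pandas time series user guide is to subtract the epoch and floor-divide by one second. A minimal sketch, assuming the first three rows stand in for the whole file and that `utc=True` is acceptable for normalising the offsets:

```python
import pandas as pd
from io import StringIO

csv = """156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346
187,2014-02-01 00:00:01.148457+01,41.9285433333333,12.4690366666667
297,2014-02-01 00:00:01.220066+01,41.8910686119733,12.4927045625339"""

df = pd.read_csv(StringIO(csv), header=None,
                 names=['userid', 'datetime', 'latitude', 'longitude'])

# utc=True converts every +01 offset to a single UTC-aware dtype
df['datetime'] = pd.to_datetime(df['datetime'], utc=True)

# Elapsed time since the epoch, floored to whole seconds (int64 result)
epoch = pd.Timestamp('1970-01-01', tz='UTC')
df['datetime'] = (df['datetime'] - epoch) // pd.Timedelta('1s')
print(df['datetime'].tolist())  # [1391209200, 1391209201, 1391209201]
```

The result matches the `astype(np.int64) // 10**9` output above, but it does not depend on how pandas stores the timestamps internally.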
Another possible problem is bad data, as in the second row of my example:
import pandas as pd
import numpy as np
from io import StringIO
temp=u"""156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346
187,1014-02-01 00:00:01.148457+01,41.9285433333333,12.4690366666667
297,2014-02-01 00:00:01.220066+01,41.8910686119733,12.4927045625339
89,2014-02-01 00:00:01.470854+01,41.7931766914244,12.4321219603157
79,2014-02-01 00:00:01.631136+01,41.90027472,12.46274618
191,2014-02-01 00:00:02.048546+01,41.8523047579646,12.5774065771898
343,2014-02-01 00:00:02.647839+01,41.8921718255185,12.4696996165151
341,2014-02-01 00:00:02.709888+01,41.9102125627332,12.4770004336041
260,2014-02-01 00:00:03.458195+01,41.8658208551143,12.4655221109313"""
#after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp),
parse_dates=[1],
names=['userid','datetime','latitude','longitude'])
#print (df)
#check the dtype: parsing failed, so the column stays object
print (df['datetime'].dtypes)
object
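Before deciding what to do with bad values, it can help to list the rows responsible. A sketch, assuming three of the rows above; the 2000-01-01 cutoff is a hypothetical plausibility check, added because whether a year like 1014 fails to parse at all depends on the datetime resolution your pandas version uses:

```python
import pandas as pd
from io import StringIO

temp = """156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346
187,1014-02-01 00:00:01.148457+01,41.9285433333333,12.4690366666667
297,2014-02-01 00:00:01.220066+01,41.8910686119733,12.4927045625339"""

df = pd.read_csv(StringIO(temp), header=None,
                 names=['userid', 'datetime', 'latitude', 'longitude'])

# Unparseable or out-of-range values become NaT instead of raising
parsed = pd.to_datetime(df['datetime'], errors='coerce', utc=True)

# Hypothetical cutoff: treat anything before 2000 as suspicious too
# (NaT compares as False, so it is caught only by isna()).
cutoff = pd.Timestamp('2000-01-01', tz='UTC')
suspicious = parsed.isna() | (parsed < cutoff)
print(df.loc[suspicious, ['userid', 'datetime']])
```

Here only the row with userid 187 is flagged, which is the row with the typo in the year.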
Use to_datetime with errors='coerce', which replaces bad data with NaT, and then replace NaT with some value, e.g. 0 (1970-01-01 00:00:00.000000), using fillna:
df['datetime'] = pd.to_datetime(df['datetime'], errors='coerce').fillna(0)
print (df)
userid datetime latitude longitude
0 156 2014-01-31 23:00:00.739166 41.883672 12.487778
1 187 1970-01-01 00:00:00.000000 41.928543 12.469037
2 297 2014-01-31 23:00:01.220066 41.891069 12.492705
3 89 2014-01-31 23:00:01.470854 41.793177 12.432122
4 79 2014-01-31 23:00:01.631136 41.900275 12.462746
5 191 2014-01-31 23:00:02.048546 41.852305 12.577407
6 343 2014-01-31 23:00:02.647839 41.892172 12.469700
7 341 2014-01-31 23:00:02.709888 41.910213 12.477000
8 260 2014-01-31 23:00:03.458195 41.865821 12.465522
df['datetime'] = df['datetime'].astype(np.int64) // 10**9
print (df)
userid datetime latitude longitude
0 156 1391209200 41.883672 12.487778
1 187 0 41.928543 12.469037
2 297 1391209201 41.891069 12.492705
3 89 1391209201 41.793177 12.432122
4 79 1391209201 41.900275 12.462746
5 191 1391209202 41.852305 12.577407
6 343 1391209202 41.892172 12.469700
7 341 1391209202 41.910213 12.477000
8 260 1391209203 41.865821 12.465522
EDIT:
If the file also has a header row and you need to replace its column names, pass header=0 to read_csv in addition to names.
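A minimal sketch of that combination, assuming the first two data rows of the question's file (with its header line). The new column names are hypothetical, chosen only to make the renaming visible:

```python
import pandas as pd
from io import StringIO

csv = """userid,datetime,latitude,longitude
156,2014-02-01 00:00:00.739166+01,41.8836718276551,12.4877775603346
187,2014-02-01 00:00:01.148457+01,41.9285433333333,12.4690366666667"""

# header=0: row 0 holds the old labels and is not read as data;
# names=... supplies the replacement labels (hypothetical here).
df = pd.read_csv(StringIO(csv), header=0,
                 names=['user', 'ts', 'lat', 'lon'])
print(df.columns.tolist())  # ['user', 'ts', 'lat', 'lon']
print(len(df))  # 2
```

Without header=0, pandas would treat the original header line as a data row once names is given, so 'userid' would end up as a value in the first column.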