Using pandas to perform regression, error: cannot concatenate 'str' and 'float' objects
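I have been writing code based on this answer to find out on which days the wind speed increases during the morning.
Here is a sample of my data: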
hd,Station Number,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Local time,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Local standard time,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Universal coordinated time,Precipitation since last (AWS) observation in mm,Quality of precipitation since last (AWS) observation value,Air Temperature in degrees Celsius,Quality of air temperature,Air temperature (1-minute maximum) in degrees Celsius,Quality of air temperature (1-minute maximum),Air temperature (1-minute minimum) in degrees Celsius,Quality of air temperature (1-minute minimum),Wet bulb temperature in degrees Celsius,Quality of Wet bulb temperature,Wet bulb temperature (1 minute maximum) in degrees Celsius,Quality of wet bulb temperature (1 minute maximum),Wet bulb temperature (1 minute minimum) in degrees Celsius,Quality of wet bulb temperature (1 minute minimum),Dew point temperature in degrees Celsius,Quality of dew point temperature,Dew point temperature (1-minute maximum) in degrees Celsius,Quality of Dew point Temperature (1-minute maximum),Dew point temperature (1 minute minimum) in degrees Celsius,Quality of Dew point Temperature (1 minute minimum),Relative humidity in percentage %,Quality of relative humidity,Relative humidity (1 minute maximum) in percentage %,Quality of relative humidity (1 minute maximum),Relative humidity (1 minute minimum) in percentage %,Quality of Relative humidity (1 minute minimum),Wind (1 minute) speed in km/h,Wind (1 minute) speed quality,Minimum wind speed (over 1 minute) in km/h,Minimum wind speed (over 1 minute) quality,Wind (1 minute) direction in degrees true,Wind (1 minute) direction quality,Standard deviation of wind (1 minute),Standard deviation of wind (1 minute) direction quality,Maximum wind gust (over 1 minute) in km/h,Maximum wind gust (over 1 minute) quality,Visibility (automatic - one minute data) in km,Quality of visibility (automatic - one minute data),Mean sea level pressure in hPa,Quality of mean sea level pressure,Station level pressure in hPa,Quality of station level pressure,QNH pressure in hPa,Quality of QNH pressure,#
hd, 40842,2000,03,20,10,50,2000,03,20,10,50,2000,03,20,00,50, ,N, 25.7,N, 25.7,N, 25.6,N, 21.5,N, 21.5,N, 21.4,N, 19.2,N, 19.2,N, 19.0,N, 67,N, 68,N, 66,N, 13,N, 9,N,100,N, 4,N, 15,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,51,2000,03,20,10,51,2000,03,20,00,51, 0.0,N, 25.6,N, 25.8,N, 25.6,N, 21.5,N, 21.6,N, 21.5,N, 19.2,N, 19.4,N, 19.2,N, 68,N, 68,N, 66,N, 11,N, 9,N,107,N, 11,N, 13,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,52,2000,03,20,10,52,2000,03,20,00,52, 0.0,N, 25.8,N, 25.8,N, 25.6,N, 21.7,N, 21.7,N, 21.5,N, 19.5,N, 19.5,N, 19.2,N, 68,N, 69,N, 66,N, 11,N, 9,N, 83,N, 13,N, 13,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,53,2000,03,20,10,53,2000,03,20,00,53, 0.0,N, 25.8,N, 25.9,N, 25.8,N, 21.6,N, 21.8,N, 21.6,N, 19.3,N, 19.6,N, 19.3,N, 67,N, 68,N, 66,N, 9,N, 8,N, 87,N, 14,N, 11,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,54,2000,03,20,10,54,2000,03,20,00,54, 0.0,N, 25.8,N, 25.8,N, 25.8,N, 21.6,N, 21.6,N, 21.6,N, 19.3,N, 19.3,N, 19.2,N, 67,N, 67,N, 67,N, 8,N, 4,N, 98,N, 23,N, 9,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,55,2000,03,20,10,55,2000,03,20,00,55, 0.0,N, 25.7,N, 25.8,N, 25.7,N, 21.5,N, 21.6,N, 21.5,N, 19.2,N, 19.3,N, 19.2,N, 67,N, 68,N, 66,N, 8,N, 4,N, 68,N, 15,N, 9,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,56,2000,03,20,10,56,2000,03,20,00,56, 0.0,N, 25.9,N, 25.9,N, 25.7,N, 21.7,N, 21.7,N, 21.5,N, 19.4,N, 19.4,N, 19.2,N, 67,N, 68,N, 66,N, 8,N, 5,N, 69,N, 16,N, 9,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,57,2000,03,20,10,57,2000,03,20,00,57, 0.0,N, 26.0,N, 26.0,N, 25.9,N, 21.8,N, 21.8,N, 21.7,N, 19.5,N, 19.5,N, 19.4,N, 67,N, 68,N, 66,N, 9,N, 5,N, 72,N, 10,N, 11,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,58,2000,03,20,10,58,2000,03,20,00,58, 0.0,N, 26.0,N, 26.1,N, 26.0,N, 21.7,N, 21.8,N, 21.7,N, 19.4,N, 19.5,N, 19.3,N, 66,N, 67,N, 66,N, 8,N, 5,N, 69,N, 13,N, 11,N, ,N,1018.6,N,1017.5,N,1018.6,N,#
Here is the code I tried:
import glob
import pandas as pd
import numpy as np
from datetime import datetime

for file in glob.glob('X:/brisbaneweatherdata/*.txt'):
    df = pd.read_csv(file)
    col = 'Wind (1 minute) speed in km/h'
    mask = pd.notnull(df[col])
    df = df.loc[mask]
    for date, group in df.groupby(['Year Month Day Hours Minutes in YYYY', 'MM', 'DD']):
        morning_data = group[group.HH24.between(9, 12)]
        gradient, intercept = np.polyfit(morning_data.HH24, morning_data['Wind (1 minute) speed in km/h'], 1)
        wind_direction = np.average(morning_data['Wind (1 minute) direction in degrees true'])
        if gradient > 0:
            print("{0:%d, %b %Y} , {1:.2f}, {2:.2f}".format(datetime(*date), gradient, wind_direction))
However, this produces:
runfile('X:/python/linearregression.py', wdir='X:/python')
X:/python/linearregression.py:1: DtypeWarning: Columns (17,25,27,29,31,33,35,37,55,57,59) have mixed types. Specify dtype option on import or set low_memory=False.
import glob
Traceback (most recent call last):
File "<ipython-input-19-ace8af14da2c>", line 1, in <module>
runfile('X:/python/linearregression.py', wdir='X:/python')
File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
execfile(filename, namespace)
File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "X:/python/linearregression.py", line 10, in <module>
gradient, intercept = np.polyfit(morning_data.HH24, morning_data['Wind (1 minute) speed in km/h'], 1)
File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\lib\polynomial.py", line 550, in polyfit
y = NX.asarray(y) + 0.0
TypeError: cannot concatenate 'str' and 'float' objects
If I try to convert the year values to integers or floats, for example with int('Year Month Day Hours Minutes in YYYY') or int('MM'), I get the error ValueError: invalid literal for int() with base 10: 'Year Month Day Hours Minutes in YYYY'
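The ValueError above arises because int() is handed the column label itself rather than the values stored in that column. A minimal sketch of the difference, assuming a tiny hypothetical frame that reuses the 'MM' column name from the data:

```python
import pandas as pd

# Hypothetical two-row frame; only the column name comes from the question.
df = pd.DataFrame({'MM': ['03', '04']})

# int('MM') tries to parse the label text "MM" as a number, so it fails.
label_error = None
try:
    int('MM')
except ValueError as exc:
    label_error = exc
print(label_error)

# Converting the column's *values* works, because each entry is a numeric string.
months = pd.to_numeric(df['MM'])
print(months.tolist())
```

The conversion has to be applied to df['MM'] (the Series), not to the string 'MM'.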
With Unutbu's help the TypeError has been resolved, but the code now produces the following error:
runfile('X:/python/linearregression.py', wdir='X:/python')
X:/python/linearregression.py:1: DtypeWarning: Columns (17,25,27,29,31,33,35,37,55,57,59) have mixed types. Specify dtype option on import or set low_memory=False.
import glob
C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
warnings.warn(msg, RankWarning)
Traceback (most recent call last):
File "<ipython-input-24-ace8af14da2c>", line 1, in <module>
runfile('X:/python/linearregression.py', wdir='X:/python')
File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
execfile(filename, namespace)
File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "X:/python/linearregression.py", line 17, in <module>
wind_direction = np.average(morning_data['Wind (1 minute) direction in degrees true'])
File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\lib\function_base.py", line 570, in average
avg = a.mean(axis)
File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\core\_methods.py", line 72, in _mean
ret = ret / rcount
TypeError: unsupported operand type(s) for /: 'str' and 'int'
I changed .between('9', '12') to .between(9, 12), made the np.average calculation use only morning_data['Wind (1 minute) direction in degrees true'], and added string formatting to the final print statement:
from datetime import datetime

for date, group in df.groupby(['Year Month Day Hours Minutes in YYYY', 'MM', 'DD']):
    morning_data = group[group.HH24.between(9, 12)]
    gradient, intercept = np.polyfit(morning_data.HH24, morning_data['Wind (1 minute) speed in km/h'], 1)
    wind_direction = np.average(morning_data['Wind (1 minute) direction in degrees true'])
    if gradient > 0:
        print("{0:%d, %b %Y} , {1:.2f}, {2:.2f}".format(datetime(*date), gradient, wind_direction))
This ends up running fine (at least without errors) and produces:
20, Mar 2000 , 0.47, 83.67
This is the DataFrame I get after copying your example:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 62 columns):
hd 9 non-null object
Station Number 9 non-null int64
Year Month Day Hours Minutes in YYYY 9 non-null int64
MM 9 non-null int64
DD 9 non-null int64
HH24 9 non-null int64
MI format in Local time 9 non-null int64
Year Month Day Hours Minutes in YYYY.1 9 non-null int64
MM.1 9 non-null int64
DD.1 9 non-null int64
HH24.1 9 non-null int64
MI format in Local standard time 9 non-null int64
Year Month Day Hours Minutes in YYYY.2 9 non-null int64
MM.2 9 non-null int64
DD.2 9 non-null int64
HH24.2 9 non-null int64
MI format in Universal coordinated time 9 non-null int64
Precipitation since last (AWS) observation in mm 9 non-null object
Quality of precipitation since last (AWS) observation value 9 non-null object
Air Temperature in degrees Celsius 9 non-null float64
Quality of air temperature 9 non-null object
Air temperature (1-minute maximum) in degrees Celsius 9 non-null float64
Quality of air temperature (1-minute maximum) 9 non-null object
Air temperature (1-minute minimum) in degrees Celsius 9 non-null float64
Quality of air temperature (1-minute minimum) 9 non-null object
Wet bulb temperature in degrees Celsius 9 non-null float64
Quality of Wet bulb temperature 9 non-null object
Wet bulb temperature (1 minute maximum) in degrees Celsius 9 non-null float64
Quality of wet bulb temperature (1 minute maximum) 9 non-null object
Wet bulb temperature (1 minute minimum) in degrees Celsius 9 non-null float64
Quality of wet bulb temperature (1 minute minimum) 9 non-null object
Dew point temperature in degrees Celsius 9 non-null float64
Quality of dew point temperature 9 non-null object
Dew point temperature (1-minute maximum) in degrees Celsius 9 non-null float64
Quality of Dew point Temperature (1-minute maximum) 9 non-null object
Dew point temperature (1 minute minimum) in degrees Celsius 9 non-null float64
Quality of Dew point Temperature (1 minute minimum) 9 non-null object
Relative humidity in percentage % 9 non-null int64
Quality of relative humidity 9 non-null object
Relative humidity (1 minute maximum) in percentage % 9 non-null int64
Quality of relative humidity (1 minute maximum) 9 non-null object
Relative humidity (1 minute minimum) in percentage % 9 non-null int64
Quality of Relative humidity (1 minute minimum) 9 non-null object
Wind (1 minute) speed in km/h 9 non-null int64
Wind (1 minute) speed quality 9 non-null object
Minimum wind speed (over 1 minute) in km/h 9 non-null int64
Minimum wind speed (over 1 minute) quality 9 non-null object
Wind (1 minute) direction in degrees true 9 non-null int64
Wind (1 minute) direction quality 9 non-null object
Standard deviation of wind (1 minute) 9 non-null int64
Standard deviation of wind (1 minute) direction quality 9 non-null object
Maximum wind gust (over 1 minute) in km/h 9 non-null int64
Maximum wind gust (over 1 minute) quality 9 non-null object
Visibility (automatic - one minute data) in km 9 non-null object
Quality of visibility (automatic - one minute data) 9 non-null object
Mean sea level pressure in hPa 9 non-null float64
Quality of mean sea level pressure 9 non-null object
Station level pressure in hPa 9 non-null float64
Quality of station level pressure 9 non-null object
QNH pressure in hPa 9 non-null float64
Quality of QNH pressure 9 non-null object
# 9 non-null object
dtypes: float64(12), int64(24), object(26)
memory usage: 4.4+ KB
The error message
File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\lib\polynomial.py", line 550, in polyfit
y = NX.asarray(y) + 0.0
TypeError: cannot concatenate 'str' and 'float' objects
can be reproduced if y is a Series that contains strings:
In [14]: np.asarray(pd.Series(['',1.0])) + 0.0
TypeError: cannot concatenate 'str' and 'float' objects
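The message quoted above is the Python 2 wording. As a rough check, assuming a Python 3 interpreter with current NumPy and pandas, the same operation should still raise a TypeError (the message text may differ), and coercing the Series to numeric first avoids it:

```python
import numpy as np
import pandas as pd

# Hypothetical Series mixing an empty string with a float, as in the reproduction.
y = pd.Series(['', 1.0])

raised = False
try:
    np.asarray(y) + 0.0          # the same operation np.polyfit performs on y
except TypeError as exc:
    raised = True
    print(type(exc).__name__, exc)

# After coercion the empty string becomes NaN and the addition succeeds.
clean = pd.to_numeric(y, errors='coerce')
total = np.asarray(clean) + 0.0
print(total)
```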
Now, if you peek at line 550 inside polynomial.py, you will see that y is the second argument passed to np.polyfit. This strongly suggests that morning_data['Wind (1 minute) speed in km/h'] is a Series that contains strings.
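The sample data you posted shows no strings in that column, but somewhere in the CSV you will probably find one.
Now how do we find that string? One way is to convert the Series to numeric values (coercing invalid strings to NaN):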
col = 'Wind (1 minute) speed in km/h'
tmp = pd.to_numeric(morning_data[col], errors='coerce')
Then look for the NaNs:
mask = pd.isnull(tmp)
print(morning_data.loc[mask, col])
This will show every value in the 'Wind (1 minute) speed in km/h' column that cannot be converted to a number.
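To make the coerce-then-mask steps concrete, here is a small self-contained sketch. The four values are made up for illustration (two valid readings, a blank field, and a stray flag); only the column name comes from the question.

```python
import pandas as pd

col = 'Wind (1 minute) speed in km/h'

# Hypothetical data standing in for morning_data.
morning_data = pd.DataFrame({col: ['13', ' ', '11', 'N']})

# Anything that cannot be parsed as a number becomes NaN.
tmp = pd.to_numeric(morning_data[col], errors='coerce')

# Use the NaN positions to pull out the original, unparsed values.
mask = pd.isnull(tmp)
bad_values = morning_data.loc[mask, col]
print(bad_values)
```

Printing the original values rather than tmp matters here: it shows what the offending strings actually are, together with their row labels.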
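Then you can decide how to handle these problematic rows. If there are only a few of them, you could edit them by hand. Or you could look at how the CSV was generated and fix the error at the source. Or, if you want to discard these rows, you could use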
import glob
import pandas as pd
import numpy as np
from datetime import datetime

for file in glob.glob('X:/brisbaneweatherdata/*.txt'):
    df = pd.read_csv(file)
    # Coerce both wind columns to numbers and drop rows that fail to parse.
    for col in ['Wind (1 minute) speed in km/h',
                'Wind (1 minute) direction in degrees true']:
        df[col] = pd.to_numeric(df[col], errors='coerce')
        mask = pd.notnull(df[col])
        df = df.loc[mask]
    for date, group in df.groupby(['Year Month Day Hours Minutes in YYYY', 'MM', 'DD']):
        morning_data = group[group.HH24.between(9, 12)]
        # Skip days with no morning observations.
        if len(morning_data) == 0: continue
        gradient, intercept = np.polyfit(morning_data['HH24'], morning_data['Wind (1 minute) speed in km/h'], 1)
        wind_direction = np.average(morning_data['Wind (1 minute) direction in degrees true'])
        if gradient > 0:
            print("{0:%d, %b %Y} , {1:.2f}, {2:.2f}".format(datetime(*date), gradient, wind_direction))
Then the rest of the code should have a chance of working.
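For readers without the weather files, the same clean-then-fit flow can be tried on a few made-up rows. This is only a sketch: the readings are hypothetical, and the guard on the number of distinct hours is an addition of mine, not part of the code above. It is motivated by the RankWarning in the traceback, since a straight line cannot be fitted when every x value is identical.

```python
import numpy as np
import pandas as pd

year_col = 'Year Month Day Hours Minutes in YYYY'
speed_col = 'Wind (1 minute) speed in km/h'

# Hypothetical readings for a single morning; one speed is a blank string.
df = pd.DataFrame({
    year_col: [2000] * 5,
    'MM': [3] * 5,
    'DD': [20] * 5,
    'HH24': [9, 10, 10, 11, 12],
    speed_col: ['8', '10', ' ', '12', '14'],
})

# Coerce to numbers, then drop the rows that could not be parsed.
df[speed_col] = pd.to_numeric(df[speed_col], errors='coerce')
df = df.loc[pd.notnull(df[speed_col])]

gradients = []
for date, group in df.groupby([year_col, 'MM', 'DD']):
    morning = group[group['HH24'].between(9, 12)]
    # Assumed guard: need at least two distinct hours to fit a line.
    if morning['HH24'].nunique() < 2:
        continue
    gradient, intercept = np.polyfit(morning['HH24'], morning[speed_col], 1)
    gradients.append(gradient)

print(gradients)
```

A positive gradient here means the wind speed rose over the morning hours for that day.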