Adding a new column to dask dataframe throws ValueError: Length of values does not match length of index
I understand that the traceback

ValueError: Length of values does not match length of index

comes from the dataframe, raised during ddf.assign(new_col=ts_col) or the equivalent ddf['ts_col'] = ts_col. The problem is that I cannot see how the lengths differ - explained with code:
import pandas as pd
from dask import dataframe as dd
# Read data
ddf = dd.read_csv(csv_path)
ddf = ddf.persist()
# Convert unix timestamps (seconds) to pandas timestamps
ts_col = pd.to_datetime(ddf.ts_unixtime_sec_prec, unit='s', errors='coerce')
ts_col.fillna()
# Check data
> ts_col[0:2]
< DatetimeIndex(['2019-05-23 09:09:56', '2019-05-23 09:09:56'], dtype='datetime64[ns]', freq=None)
# Checking length
> len(ddf.index)
< 11227296
> len(ts_col)
< 11227296
# Try to assign it to the dataframe.
> ddf['ts_col'] = ts_col
< ValueError: Length of values does not match length of index <<< Error
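For completeness, the assign spelling fails the same way, and the only other difference I could think of was the types involved (a minimal sketch, reusing the ddf and ts_col built above; the comments reflect what I expect to see, not verified output):

# ddf is a dask DataFrame; ts_col prints as a pandas DatetimeIndex (see the repr above)
print(type(ddf))
print(type(ts_col))
# The assign spelling raises the same ValueError as the item assignment
ddf = ddf.assign(ts_col=ts_col)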
Using a lambda / map function:
df['ts'] = df['ts'].map(lambda x: pd.to_datetime(x, errors='coerce'))
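For reference, this is the per-partition version I would expect to stay lazy (a sketch only; I am assuming dd.to_datetime forwards the unit= and errors= keywords to pandas the way map_partitions would):

import pandas as pd
from dask import dataframe as dd

ddf = dd.read_csv(csv_path)
# Convert inside each partition instead of materialising the whole column first
ddf['ts_col'] = dd.to_datetime(ddf.ts_unixtime_sec_prec, unit='s', errors='coerce')
# Alternatively, via map_partitions with an explicit meta (my guess at the spec):
# ddf['ts_col'] = ddf.ts_unixtime_sec_prec.map_partitions(
#     pd.to_datetime, unit='s', errors='coerce', meta=('ts_col', 'datetime64[ns]'))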