Errors in converting numeric data frame to integer in pandas -- "only integer scalar arrays can be converted to a scalar index"
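I have a large dataset, and I am trying to convert 'object' columns that contain only numeric data to the 'integer' data type in python/pandas. Every piece of code I try gives me the following error: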
CODE SNIPPET (see below for options I have tried)
PATH/frame.py in __setitem__(self, key, value)
3482 self._setitem_frame(key, value)
3483 elif isinstance(key, (Series, np.ndarray, list, Index)):
-->3484 self._setitem_array(key, value)
3485 else:
PATH/frame.py in _setitem_array(self, key, value)
3507 raise ValueError("Columns must be same length as key")
3508 for k1, k2 in zip(key, value.columns):
-->3509 self[k1] = value[k2]
3510 else:
3511 indexer = self.loc._convert_to_indexer(key, axis=1)
PATH/frame.py in __setitem__(self, key, value)
3485 else:
3486 #set column
-->3487 self._set_item(key, value)
3488
3489 def _setitem_slice(self, key, value):
PATH/frame.py in _set_item(self, key, value)
3562
3563 self._ensure_valid_index(value)
-->3564 value = self._sanitize_column(key, value)
3565 NDFrame._set_item(self, key, value)
PATH/frame.py in _sanitize_column(self, key, value, broadcast)
3778 if broadcast and key in self.columns and value.ndim == 1:
3780 if not self.columns.is_unique or isinstance(self.columns, MultiIndex):
-->3781 existing_piece = self[key]
3782 if isinstance(existing_piece, DataFrame):
3783 value = np.tile(value, (len(existing_piece.columns), 1))
PATH/frame.py in __getitem__(self, key)
2971 if self.columns.nlevels > 1:
2972 return self._getitem_multilevel(key)
-->2973 return self._get_item_cache(key)
2974
2975 # Do we have a slicer (on rows)?
PATH/generic.py in _get_item_cache(self, item)
3268 res = cache.get(item)
3269 if res is None:
-->3270 values = self._data.get(item)
3271 res = self._box_item_values(item, values)
3272 cache[item] = res
PATH/managers.py in get(self, item)
958 raise ValueError("cannot label index with a null key")
959
-->960 return self.iget(loc)
961 else:
962
PATH/managers.py in iget(self, i)
975 Otherwise return as a ndarray
976 """
-->977 block = self.blocks[self._blknos[i]]
978 values = block.iget(self._blklocs[i])
979 if values.ndim != 1:
TypeError: only integer scalar arrays can be converted to a scalar index
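The traceback passes through the `self.columns.is_unique` check in `_sanitize_column`, so one thing worth ruling out is a repeated column label. This is only a guess read off the traceback, not something confirmed in this thread. A minimal sketch of the check, using a made-up frame (the column names and values here are hypothetical):

```python
import pandas as pd

# Hypothetical frame with a repeated column label, for example the result
# of a SQL join that selects the same column name from two tables.
df = pd.DataFrame(
    [["207", "01", "x"], ["710", "02", "y"]],
    columns=["column1", "column1", "column2"],
)

# A falsy value here means at least one label occurs more than once.
print(df.columns.is_unique)

# List the labels that are repeated.
duplicated_labels = df.columns[df.columns.duplicated()].tolist()
print(duplicated_labels)

# Selecting a repeated label gives back a DataFrame rather than a Series.
print(type(df["column1"]).__name__)
```

If the list of repeated labels is non-empty, renaming or dropping the extra columns before converting is one option to try.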
Everything I have tried returned the error above:
df[["column1", "column 2", "column 3", "column 4"]] = df[["column 1", "column 2", "column 3", "column 4"]].apply(pd.to_numeric, errors='raise')
and
df[["column1", "column 2", "column 3", "column 4"]] = df[["column 1", "column 2", "column 3", "column 4"]].apply(pd.to_numeric, errors='raise')
WHERE, df = the name of the data frame in python; column 1, etc. = the names of the columns in python
I have also tried:
df["column1"] = df["column1"].astype(str).astype(int)
and
df["column1"] = pd.to_numeric(df["column1"], errors = 'coerce')
This also returned the same error.
Additional attempts after the first post:
I have also tried:
def convert_numbers(val):
    """
    Convert number string to integer
    """
    new_val = val
    return int(new_val)

df["column1"].apply(convert_numbers)
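This again returned the same error.
I double-checked the data types. df.dtypes shows the columns I am trying to change as 'object', no matter what I do. I double-checked the code, and the columns in question have no missing/null values. I also checked the format, and the columns are entirely numeric. One column is formatted as three digits (e.g. 207, 710, 115), another as two digits (01, 02, 03), and the last as five digits (00001, 00002, 00003)....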
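The checks described above (dtypes, missing values, digits only) can be sketched as follows. The frame is a made-up stand-in for the real data, and the column names are placeholders:

```python
import pandas as pd

# Hypothetical stand-in for the real data: digit strings stored as object.
df = pd.DataFrame(
    {
        "column1": ["207", "710", "115"],
        "column2": ["01", "02", "03"],
        "column3": ["00001", "00002", "00003"],
    },
    dtype=object,
)

# Data type of each column.
print(df.dtypes)

# Number of missing/null values in each column.
null_counts = df.isna().sum()
print(null_counts)

# True for a column only if every value consists solely of digits.
all_digits = df.apply(lambda s: s.astype(str).str.isdigit().all())
print(all_digits)

# The Python types actually stored in one column; a database driver may
# hand back something other than str.
value_types = df["column1"].map(type).unique()
print(value_types)
```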
Any help would be greatly appreciated. If I find the answer, I will post it here.
Try this:
for col in ["column1", "column 2", "column 3", "column 4"]:
    # df[col].reshape((1,-1))
    df[col] = [int(n) for n in df[col]]
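One side effect to keep in mind with any integer conversion of the columns described in the question: values such as 01 or 00001 lose their leading zeros. A small sketch with made-up values, including one way to pad them back if the fixed width matters (the width of 5 is an assumption taken from the five-digit column):

```python
import pandas as pd

# Hypothetical five-digit codes stored as strings.
codes = pd.Series(["00001", "00002", "00003"], dtype=object)

# Converting to integers drops the leading zeros.
as_int = codes.astype(int)
print(as_int.tolist())

# Padding back to a fixed width restores the original text form.
restored = as_int.astype(str).str.zfill(5)
print(restored.tolist())
```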
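I found the answer. The problem may be that I am using an Oracle database connection, but I am not sure. If anyone has a simpler way to do this in Python, I would still love to hear more comments, but this is how I did it: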
# coerce stores all non-convertible values as NA and ignore keeps the original values, so the column may have mixed data types.
df['column names'] = df[['column names']].apply(pd.to_numeric, errors = 'coerce').fillna(df)
Note that using coerce on non-numeric items may remove their data and switch them to NA. :) This worked, though!
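To see what coerce does before relying on it, the following sketch uses a made-up column with one non-numeric entry. It lists the values that would be lost and shows that the result is a float column unless it is cast to a nullable integer type afterwards:

```python
import pandas as pd

# Hypothetical column with one value that is not a number.
raw = pd.Series(["207", "710", "n/a"], dtype=object)

# Non-convertible values become NaN, so the result is a float column.
coerced = pd.to_numeric(raw, errors="coerce")
print(coerced.dtype)

# Values that were present in the input but lost by the conversion.
lost = raw[coerced.isna() & raw.notna()]
print(lost.tolist())

# A nullable integer dtype keeps whole numbers alongside missing values.
as_nullable_int = coerced.astype("Int64")
print(as_nullable_int)
```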