KeyError: "None of [['', '']] are in the [columns]" (Pandas Dataframe)
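I am trying to write a function that takes a dataframe in which some columns are of the same type and some are not. An example of the columns is: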
['id', 't_dur0', 't_dur1', 't_dur2', 't_dance0', 't_dance1', 't_dance2', 't_energy0',
't_energy1', 't_energy2']
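I am trying to produce two new dataframes, one with the columns that have no duplicates and one with only the duplicated columns, using the following code: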
# Function that takes in a dataframe and returns new dataframes with all the sub-dataframes
def sub_dataframes(dataframe):
    copy = dataframe.copy() # To avoid SettingWithCopyWarning
    # Iterate through all the columns of the df
    for (col_name, col_data) in copy.iteritems():
        temp = str(col_name)
        rest = copy.iloc[:, 1:]
        new_df = [[]]
        # If it's not a duplicate, we just add it to the new df
        if len(temp) < 6:
            new_df[temp] = copy[col_data]
        # If the length of the column name is greater than or equal to 6, we know it's a duplicate
        if len(temp) >= 6:
            stripped = temp.rstrip(temp[2:])
            # Second for-loop to check the next column
            for (col_name2, col_data2) in rest.iteritems():
                temp2 = str(col_name2)
                rest2 = rest.iloc[:, 1:]
                only_dups = [[]]
                if len(temp2) >= 6:
                    stripped2 = temp2.rstrip(temp2[2:])
                    # Compare the two column names (without the integer 0,1, or 2)
                    if stripped[:-1] == stripped2[:-1]:
                        # Create new df of the two columns
                        only_dups[stripped] = col_data
                        only_dups[stripped2] = col_data2
                        # Third for-loop to check the remaining columns
                        for (col_name3, col_data3) in rest2.iteritems():
                            temp3 = str(col_name3)
                            if len(temp3) >= 6:
                                stripped3 = temp3.rstrip(temp3[2:])
                                # Compare the two column names (without the integer 0,1, or 2)
                                if stripped2[:-1] == stripped3[:-1]:
                                    only_dups[stripped3] = col_data3
    print("Original:\n{}\nWithout duplicates:\n{}\nDuplicates:\n{}".format(copy, new_df, only_dups))

sub_dataframes(df)
When I run this code, I get this error:
KeyError: "None of [Int64Index([ 22352, 106534, 23608, 8655, 49670, 101988, 9136,
141284,\n 28564, 14262,\n ...\n 76690, 150965,
143106, 142370, 68004, 33980, 110832, 14491,\n 123511, 6207],\n
dtype='int64', length=2833)] are in the [columns]"
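I tried looking at other questions on Stack Overflow to see if I could fix the problem, but so far all I have learned is that I can't add columns the way I am doing now, with new_df[temp] = copy[col_data] or only_dups[stripped] = col_data, and I can't seem to figure out how to create the new columns correctly. How can I add new columns based on the variables I have now? Is it possible, or do I have to rewrite the code so that it doesn't have so many for loops?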
Edit
An example of the output I want is:
Original:
id t_dur0 t_dur1 t_dur2 ...
0 22352 292720 293760.0 292733.0
1 106534 213760 181000.0 245973.0
2 23608 157124 130446.0 152450.0
3 8655 127896 176351.0 166968.0
4 49670 210320 226253.0 211880.0
... ... ... ... ...
Without duplicates:
id
0 22352
1 106534
2 23608
3 8655
4 49670
... ..
Duplicates:
t_dur0 t_dur1 t_dur2
0 292720 293760.0 292733.0
1 213760 181000.0 245973.0
2 157124 130446.0 152450.0
3 127896 176351.0 166968.0
4 210320 226253.0 211880.0
... ... ... ...
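A note on where the KeyError likely comes from: copy[col_data] passes the values of the column (the ids) as column labels, so pandas looks for columns named 22352, 106534 and so on and finds none. Separately, new_df = [[]] is a plain Python list rather than a DataFrame, and it is reset on every iteration. Below is a minimal sketch that keeps the loop structure but fixes both points. The sample frame, the strip_digits helper, and the use of Counter are illustrative assumptions, not part of the original code. It uses items(), since iteritems() was removed in pandas 2.0.

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical sample data shaped like the first rows shown in the question.
df = pd.DataFrame({
    "id": [22352, 106534, 23608],
    "t_dur0": [292720, 213760, 157124],
    "t_dur1": [293760.0, 181000.0, 130446.0],
    "t_dur2": [292733.0, 245973.0, 152450.0],
})

# Reproduce the error: indexing with a column's values treats them as labels.
try:
    df[df["id"]]
    raised = False
except KeyError:
    raised = True


def strip_digits(name):
    """Hypothetical helper: drop trailing digits, e.g. 't_dur0' -> 't_dur'."""
    return re.sub(r"\d+$", "", str(name))


# Count how many columns share each digit-free prefix.
counts = Counter(strip_digits(c) for c in df.columns)

# Create the target frames once, outside the loop, as real DataFrames.
new_df = pd.DataFrame(index=df.index)
only_dups = pd.DataFrame(index=df.index)

for col_name, col_data in df.items():
    if counts[strip_digits(col_name)] == 1:
        new_df[col_name] = col_data      # assign the Series itself
    else:
        only_dups[col_name] = col_data
```

Counting prefixes up front avoids the nested loops and the reliance on column-name length to decide what counts as a duplicate.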
IIUC:
def sub_dataframes(dataframe):
    # extract common prefix -> remove trailing digits
    cols = dataframe.columns.str.replace(r'\d*$', '', regex=True) \
                    .to_series().value_counts()
    # split columns
    unq_cols = cols[cols == 1].index
    dup_cols = dataframe.columns[~dataframe.columns.isin(unq_cols)]
    return (dataframe[unq_cols], dataframe[dup_cols])

df1, df2 = sub_dataframes(df)
Output:
>>> df1
id
0 22352
1 106534
2 23608
3 8655
4 49670
>>> df2
t_dur0 t_dur1 t_dur2
0 292720 293760.0 292733.0
1 213760 181000.0 245973.0
2 157124 130446.0 152450.0
3 127896 176351.0 166968.0
4 210320 226253.0 211880.0
You can remove the digits and check whether the column names become duplicated:
mask = df.columns.str.replace(r'\d+', '', regex=True).duplicated(keep=False)
# duplicated columns
df1 = df.loc[:, mask]
# unique columns
df2 = df.loc[:, ~mask]
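One thing to watch with the mask approach: r'\d+' removes digits anywhere in the name, while r'\d+$' removes only trailing ones. The sketch below compares the two on a small made-up frame. The columns x1_a and x2_a are hypothetical and only there to show the difference, so which pattern is right depends on how your real columns are named.

```python
import pandas as pd

# Hypothetical frame: 'x1_a' and 'x2_a' contain digits that are not a trailing index.
df = pd.DataFrame(
    [[1, 10, 11, 5, 6]],
    columns=["id", "t_dur0", "t_dur1", "x1_a", "x2_a"],
)

# Strip digits anywhere in the name, then flag names that collide.
mask_any = df.columns.str.replace(r"\d+", "", regex=True).duplicated(keep=False)

# Strip only trailing digits, then flag names that collide.
mask_trailing = df.columns.str.replace(r"\d+$", "", regex=True).duplicated(keep=False)

dups_any = df.loc[:, mask_any]
dups_trailing = df.loc[:, mask_trailing]
uniques_trailing = df.loc[:, ~mask_trailing]
```

With the unanchored pattern, x1_a and x2_a both collapse to x_a and are treated as duplicates. With the anchored pattern they stay distinct.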