How to compare a string in a DataFrame column name with the value in a cell to create a new DataFrame based on a multi-value dictionary?
I have a df that looks like this:
id.1.value.1 id.2.value.2 id.1.question id.2.value.2
TRUE FALSE TRUE TRUE
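I want to create logic that scans the column names of the df, extracts the last number only from the column names that contain value, and then applies the following logic to those columns:
If the value in a value column is TRUE, look up that last number in the multi-value dictionary and use the second element of the matching tuple as a column name in the new DataFrame. The first element of the tuple becomes the cell value.
Example: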
my_dict = {1: ('a', 'category'), 2: ('b', 'category'),
           3: ('c', 'category'), 4: ('d', 'category'),
           5: ('e', 'subcategory'), 6: ('f', 'subcategory'),
           7: ('g', 'subcategory'), 8: ('h', 'subcategory'),
           9: ('i', 'subcategory'), 10: ('j', 'subcategory'),
           11: ('k', 'subcategory'), 12: ('l', 'subcategory'),
           13: ('m', 'subcategory'), 14: ('n', 'subcategory'),
           15: ('o', 'subcategory'), 16: ('p', 'subcategory'),
           17: ('q', 'subcategory'), 18: ('r', 'subcategory'),
           19: ('s', 'subcategory'), 20: ('t', 'subcategory'),
           21: ('u', 'subcategory'), 22: ('v', 'subcategory'),
           23: ('w', 'subcategory'), 24: ('x', 'subcategory')}
If my current df looks like this:
id.1.value.1 id.2.value.2 id.1.question id.6.value.6
TRUE FALSE TRUE TRUE
then the new df should look like this:
category subcategory
a f
Where df is:
id.1.value.1 id.2.value.2 id.1.question id.6.value.6
0 True False True True
Use:
import pandas as pd

# Columns that are True in the first row and contain 'value'; keep the number after the last dot
i = df.loc[:, df.columns[df.iloc[0]]].filter(like='value').columns.str.split('.').str[-1].astype(int).tolist()
my_dict = {1: ('a', 'category'), 2: ('b', 'category'),
           3: ('c', 'category'), 4: ('d', 'category'),
           5: ('e', 'subcategory'), 6: ('f', 'subcategory'),
           7: ('g', 'subcategory'), 8: ('h', 'subcategory'),
           9: ('i', 'subcategory'), 10: ('j', 'subcategory'),
           11: ('k', 'subcategory'), 12: ('l', 'subcategory'),
           13: ('m', 'subcategory'), 14: ('n', 'subcategory'),
           15: ('o', 'subcategory'), 16: ('p', 'subcategory'),
           17: ('q', 'subcategory'), 18: ('r', 'subcategory'),
           19: ('s', 'subcategory'), 20: ('t', 'subcategory'),
           21: ('u', 'subcategory'), 22: ('v', 'subcategory'),
           23: ('w', 'subcategory'), 24: ('x', 'subcategory')}
df1 = pd.DataFrame.from_dict(my_dict, orient='index')
df_out = df1.loc[i].set_index(1).T
print(df_out)
Output:
1 category subcategory
0 a f
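The one-liner above packs several steps into a single chain. As a rough sketch, the same idea can be unrolled step by step. The names true_cols, value_cols, keys and lookup, the shortened dictionary, and the column labels "label" and "kind" are illustrative assumptions, not part of the original answer:

```python
import pandas as pd

# Sample frame from the question; the dictionary is shortened for brevity.
df = pd.DataFrame(
    [[True, False, True, True]],
    columns=["id.1.value.1", "id.2.value.2", "id.1.question", "id.6.value.6"],
)
my_dict = {1: ("a", "category"), 2: ("b", "category"), 6: ("f", "subcategory")}

# Step 1: keep only the columns whose first-row value is True.
true_cols = df.columns[df.iloc[0].astype(bool).to_numpy()]

# Step 2: of those, keep the columns whose name contains "value".
value_cols = [c for c in true_cols if "value" in c]

# Step 3: the dictionary key is the number after the last dot.
keys = [int(c.rsplit(".", 1)[-1]) for c in value_cols]

# Step 4: turn the dictionary into a lookup table, select the matching rows,
# and pivot so that the second tuple element becomes the column name.
lookup = pd.DataFrame.from_dict(my_dict, orient="index", columns=["label", "kind"])
df_out = lookup.loc[keys].set_index("kind").T.reset_index(drop=True)
df_out.columns.name = None

print(df_out)
```

Splitting on the last dot rather than taking the final character keeps two-digit keys such as 11 or 24 intact.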
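import pandas as pd

collected = {}
for name in df.columns:
    # Only look at 'value' columns whose first-row entry is True
    if 'value' in name and df[name].iloc[0]:
        # Number after the last dot (also works for two-digit keys)
        last_number = int(name.split('.')[-1])
        key, value = my_dict[last_number]
        # Gather every key that maps to the same column name
        collected.setdefault(value, []).append(key)
# One row: a single match stays a scalar, several matches stay a list
new_df = pd.DataFrame({k: [v[0] if len(v) == 1 else v] for k, v in collected.items()})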
new_df = pd.DataFrame()
# Iterate over the column names
for col in df.columns:
    if "value" in col:
        try:
            # The key is the number after the last dot in the column name
            value = int(col.rpartition('.')[-1])
            # Only use columns whose first-row value is True
            if df.loc[0, col]:
                new_df[my_dict[value][1]] = [my_dict[value][0]]
        except (ValueError, KeyError) as e:
            # The name does not end in a number, or the number is not in my_dict
            print(e)
IIUC (here df1 is the input DataFrame from the question):
ans = [my_dict[int(x.rsplit('.', 1)[-1])] for x in df1.where(df1.loc[:, ['value' in x for x in df1.columns]]).dropna(axis=1)]
pd.DataFrame.from_dict({v: k for k, v in dict(ans).items()}, orient='index').T
Output:
category subcategory
0 a f
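All of the answers above read only the first row of df. If the frame has several rows, one possible extension is to build one output row per input row. The following is a minimal sketch under that assumption; the function name rows_to_categories and the choice to join multiple labels with ", " are hypothetical and not taken from the thread:

```python
import pandas as pd


def rows_to_categories(df, mapping):
    """Build one output row per input row from the True 'value' columns."""
    value_cols = [c for c in df.columns if "value" in c]
    records = []
    for _, row in df[value_cols].iterrows():
        record = {}
        for col, flag in row.items():
            if bool(flag):
                # Key is the number after the last dot in the column name.
                label, kind = mapping[int(col.rsplit(".", 1)[-1])]
                record.setdefault(kind, []).append(label)
        # Assumption: several labels of the same kind are joined into one string.
        records.append({kind: ", ".join(labels) for kind, labels in record.items()})
    return pd.DataFrame(records, index=df.index)


df = pd.DataFrame(
    [[True, False, True, True], [False, True, True, False]],
    columns=["id.1.value.1", "id.2.value.2", "id.1.question", "id.6.value.6"],
)
my_dict = {1: ("a", "category"), 2: ("b", "category"), 6: ("f", "subcategory")}

result = rows_to_categories(df, my_dict)
print(result)
```

Rows with no True flag for a given kind end up with a missing value in that column, which may or may not be what is wanted.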