apyori apriori broken itemset output
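I am trying to run association rules with the apyori module.
My "items" are various surgical procedures (rows = patient cases), as you can see in the DataFrame sample below.
Apyori fails to pick up the correct labels and seems to be chopping the labels into individual letters. I have never seen this behavior before. Unless I am missing something, my dataset is formatted correctly for apyori. No more than 2 procedures occur in any one case.
Here is a sample of what I get: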
RelationRecord(items=frozenset({'v', '_'}), support=0.10309278350515463, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'v', '_'}), confidence=0.10309278350515463, lift=1.0), OrderedStatistic(items_base=frozenset({'_'}), items_add=frozenset({'v'}), confidence=0.10638297872340426, lift=1.0319148936170213), OrderedStatistic(items_base=frozenset({'v'}), items_add=frozenset({'_'}), confidence=1.0, lift=1.0319148936170213)]) Support: 0.10309278350515463 Confidence: frozenset({'v', '_'}) Lift:
0.10309278350515463
The frozenset is broken...
Here is my input dataframe.head():
sm-to-sm_bowel_anastom small_bowel_incision_nec sm_bowel_exteriorization \
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
incisional_hernia_repair colonoscopy anal_anastomosis \
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
c.a.t._scan_of_abdomen open_sigmoidectomy_nec small_bowel_suture_nec \
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
lap_pt_ex_lrg_intest_nec ... abdperneal_res_rectm_nos \
0 0 ... 0
1 0 ... 0
2 0 ... 0
3 0 ... 0
4 0 ... 0
5 0 ... 0
6 0 ... 0
7 0 ... 0
8 0 ... 0
9 0 ... 0
ureteral_catheterization cv_cath_plcmt_w_guidance \
0 0 0
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
clos_large_bowel_biopsy lap_right_hemicolectomy continent_ileostomy \
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 1
insert_endotracheal_tube mult_seg_sm_bowel_excis \
0 0 0
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
small-to-large_bowel_nec opn_lft_hemicolectmy_nec
0 1 1
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 1 0
7 0 0
8 0 0
9 0 0
[10 rows x 97 columns]
This is how I run the rules:
from apyori import apriori as ap
rulez = ap(ohe_df, min_support = 0.1, min_length = 2,use_colnames=True)
I only ever have 2 procedures performed together, so I don't expect combinations of more than 2 items.
What happened to the frozenset??
Thanks
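apyori expects a list of transactions, i.e. a list of lists where each inner list holds the items that occur together, rather than a one-hot DataFrame. I made up some data to illustrate.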
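For context on why single characters show up: iterating over a pandas DataFrame yields its column labels, not its rows, and iterating over a string yields its characters. The sketch below imitates that with plain Python, using a handful of the column names from the question; the `support` helper is a hypothetical stand-in for illustration, not apyori's code. The support of 0.10309... reported in the question equals 10/97, which would be consistent with 10 of the 97 column names containing both 'v' and '_'.

```python
# A few of the column labels from the question's DataFrame (not all 97).
columns = [
    "sm-to-sm_bowel_anastom",
    "colonoscopy",
    "cv_cath_plcmt_w_guidance",
    "open_sigmoidectomy_nec",
]

# Iterating over a DataFrame yields its column labels, so a miner that
# loops over its input sees one "transaction" per column name.
transactions = list(columns)

# Each transaction is iterated again to collect its items; iterating a
# string yields single characters.
item_sets = [frozenset(t) for t in transactions]


def support(itemset, item_sets):
    """Fraction of transactions that contain every element of itemset."""
    hits = sum(1 for s in item_sets if itemset <= s)
    return hits / len(item_sets)


print(sorted(item_sets[1]))  # the characters of 'colonoscopy'
print(support(frozenset({"v", "_"}), item_sets))  # 0.25: only one name has both
```

The conversion that follows avoids this by building one list of labels per row.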
import pandas as pd

# Replace each 1 with the name of the column it sits in
df = df.replace(1, pd.Series(df.columns, df.columns))
# Collect the non-zero values of each row into an array of lists
ops = df.apply(lambda x: [v for v in x.values if v != 0], axis=1).values
The ops variable now looks right:
array([list(['small_bowel_incision_nec', 'colonoscopy']),
list(['sm_bowel_exteriorization', 'colonoscopy']),
list(['sm-to-sm_bowel_anastom', 'small_bowel_suture_nec']),
list(['small_bowel_incision_nec', 'colonoscopy']),
list(['anal_anastomosis', 'open_sigmoidectomy_nec']),
list(['colonoscopy', 'c.a.t._scan_of_abdomen']),
list(['sm-to-sm_bowel_anastom', 'open_sigmoidectomy_nec']),
list(['c.a.t._scan_of_abdomen', 'small_bowel_suture_nec']),
list(['incisional_hernia_repair', 'small_bowel_suture_nec']),
list(['small_bowel_incision_nec', 'colonoscopy'])], dtype=object)
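The same row-wise conversion can be sketched without pandas, which makes the target shape explicit. `one_hot_to_transactions` is a hypothetical helper name, and the three rows are made up; with a real DataFrame the rows could come from something like `df.itertuples(index=False)`.

```python
def one_hot_to_transactions(columns, rows):
    """Turn 0/1 rows into lists of the column names flagged in each row."""
    return [
        [name for name, flag in zip(columns, row) if flag]
        for row in rows
    ]


# Made-up example data: three cases, three procedure columns.
columns = ["colonoscopy", "small_bowel_incision_nec", "anal_anastomosis"]
rows = [
    (1, 1, 0),
    (0, 0, 0),
    (1, 0, 1),
]

transactions = one_hot_to_transactions(columns, rows)
print(transactions)
```

Rows with no flagged procedure become empty transactions. Whether to keep them is a modelling choice: they still count toward the total number of transactions, so they lower every support value.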
# Run apriori and materialize the generator as a list
rulez = list(ap(ops, min_support=0.1, min_length=2, use_colnames=True))
Sample output
[RelationRecord(items=frozenset({'anal_anastomosis'}), support=0.1, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'anal_anastomosis'}), confidence=0.1, lift=1.0)]),
RelationRecord(items=frozenset({'c.a.t._scan_of_abdomen'}), support=0.2, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'c.a.t._scan_of_abdomen'}), confidence=0.2, lift=1.0)]),
RelationRecord(items=frozenset({'colonoscopy'}), support=0.5, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'colonoscopy'}), confidence=0.5, lift=1.0)]),...]
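Note that the output above still contains single-item sets even though `min_length=2` was passed, which suggests that keyword is not applied. As far as I know, the keywords apyori reads are `min_support`, `min_confidence`, `min_lift` and `max_length`, so treat that as an assumption to check against your installed version. A minimal sketch of filtering down to pairs afterwards follows. The namedtuples are stand-ins that mirror the field names visible in the output, `pair_rules` is a hypothetical helper, and the numbers are illustrative.

```python
from collections import namedtuple

# Stand-ins mirroring the field names shown in the output above; with
# apyori installed, the real records would come from list(apriori(...)).
OrderedStatistic = namedtuple(
    "OrderedStatistic", "items_base items_add confidence lift"
)
RelationRecord = namedtuple(
    "RelationRecord", "items support ordered_statistics"
)

# Illustrative records, not real apyori output.
records = [
    RelationRecord(
        frozenset({"colonoscopy"}),
        0.5,
        [OrderedStatistic(frozenset(), frozenset({"colonoscopy"}), 0.5, 1.0)],
    ),
    RelationRecord(
        frozenset({"colonoscopy", "small_bowel_incision_nec"}),
        0.3,
        [
            OrderedStatistic(
                frozenset({"small_bowel_incision_nec"}),
                frozenset({"colonoscopy"}),
                1.0,
                2.0,
            )
        ],
    ),
]


def pair_rules(records, min_items=2):
    """Flatten records into (base, add, support, confidence, lift) rows."""
    rows = []
    for record in records:
        if len(record.items) < min_items:
            continue  # drop itemsets that are too small
        for stat in record.ordered_statistics:
            if not stat.items_base:
                continue  # skip entries with an empty antecedent
            rows.append(
                (
                    sorted(stat.items_base),
                    sorted(stat.items_add),
                    record.support,
                    stat.confidence,
                    stat.lift,
                )
            )
    return rows


rules = pair_rules(records)
for rule in rules:
    print(rule)
```

Reading the fields by name also avoids mixing up support, confidence and lift when printing each record.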