Using reduce to calculate the Gini index for a node
I am trying to apply the formula:
It is not clear to me why this does not work:
def gini_node(node):
    count = sum(node)
    gini = functools.reduce(lambda p,c: p + (1 - (c/count)**2), node)
    print(count, gini)
    print(1 - (node[0]/count)**2, 1 - (node[1]/count)**2)
    return gini
Evaluating gini([[175, 330], [220, 120]])
prints:
505 175.57298304087834
0.8799137339476522 0.5729830408783452
340 220.87543252595157
0.5813148788927336 0.8754325259515571
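Note that, for the example input, the second print statement prints the numbers I want to sum. The return value (the second value in the first print statement) should be a number between 0 and 1.
What is wrong with my reduce?
The complete function I want to write is: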
import functools

def gini_node(node):
    count = sum(node)
    gini = functools.reduce(lambda p,c: p + (1 - (c/count)**2), node)
    print(count, gini)
    print(1 - (node[0]/count)**2, 1 - (node[1]/count)**2)
    return gini

def gini(groups):
    counts = [ sum(node) for node in groups ]
    count = sum(counts)
    proportions = [ n/count for n in counts ]
    return sum([ gini_node(node) * proportion for node, proportion in zip(groups, proportions)])

# test
print(gini([[175, 330], [220, 120]]))
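reduce always calls the function it is given with exactly 2 arguments: the accumulated result so far and the next item from the container. It then repeats the same operation with the new result and the following item until the list is exhausted. When no initializer is passed, the first item itself is used, unmodified, as the initial accumulated result.
https://docs.python.org/3/library/functools.html#functools.reduce
Your call is: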
gini = functools.reduce(lambda p,c: p + (1 - (c/count)**2), node)
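For the first node, (175, 330), this lambda receives 175 as p and 330 as c, and returns 175 + (1 - (330/505)**2), which is about 175.57. The first element is added as a raw count instead of being transformed. What we want instead is: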
gini = functools.reduce(lambda p,c: (1 - (p/count)**2) + (1 - (c/count)**2), node)
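To make the pairing visible, here is a minimal sketch that wraps the question's original lambda in a named function and records every (p, c) pair that reduce passes in. The helper name `traced_step` and the `calls` list are hypothetical, introduced only for illustration.

```python
import functools

node = [175, 330]
count = sum(node)

calls = []  # every (p, c) pair that reduce hands to the step function


def traced_step(p, c):
    # Same body as the lambda in the question, plus logging of the arguments.
    calls.append((p, c))
    return p + (1 - (c / count) ** 2)


# No initializer is passed, so the first element (175) becomes p as-is.
result = functools.reduce(traced_step, node)

print(calls)   # the step function runs once, with p=175 and c=330
print(result)  # the raw count 175 is carried through untransformed
```

With a 2-element node the step function runs only once, which is why rewriting the lambda to transform both p and c happens to work here but would not extend to longer nodes.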
I added some print statements; let's look at their output.
import functools

def gini_node(node):
    count = sum(node)
    gini = functools.reduce(lambda p,c: (1 - (p/count)**2) + (1 - (c/count)**2), node)
    print(count, gini)
    print(1 - (node[0]/count)**2, 1 - (node[1]/count)**2)
    return gini

def gini(groups):
    counts = [ sum(node) for node in groups ]
    count = sum(counts)
    proportions = [ n/count for n in counts ]
    print(count, counts, proportions)  # This
    gini_indexes = [ gini_node(node) * proportion for node, proportion in zip(groups, proportions)]
    print(gini_indexes)  # And this
    return sum(gini_indexes)

# test
print(gini([[175, 330], [220, 120]]))
rahul@RNA-HP:~$ python3 so.py
845 [505, 340] [0.5976331360946746, 0.40236686390532544]
505 1.4528967748259973 # the second number here is the sum of the 2 numbers below
0.8799137339476522 0.5729830408783452
340 1.4567474048442905 # same for this one
0.5813148788927336 0.8754325259515571
# The first number in this list is 1.45289677... * 0.597633...,
# i.e. each node's sum multiplied by its proportion.
[0.868299255961099, 0.5861468847894187]
# What reaches the final print statement is the sum of the list above, i.e. the proportion-weighted values of the nodes added together
1.4544461407505178
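A simpler fix, which also works when a node has more than 2 elements:
gini = sum([(1 - (p/count)**2) for p in node])
For a 2-element node this returns the same value as the reduce() call defined above.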
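If you would still rather use reduce, one option is to pass an initializer (the optional third argument of functools.reduce), so that p starts at 0 and every element, including the first, goes through the same transformation. The sketch below assumes that approach; the names `gini_node_reduce`, `gini_node_sum`, and `gini_impurity` are illustrative and not from the original post. `gini_impurity` is the textbook form 1 - sum(p_i**2), included only as an assumption about the formula the question may have intended, since that form does stay between 0 and 1.

```python
import functools


def gini_node_reduce(node):
    count = sum(node)
    # The initializer 0.0 means p starts at 0, so the first element is
    # transformed like every other one, for any number of elements.
    return functools.reduce(lambda p, c: p + (1 - (c / count) ** 2), node, 0.0)


def gini_node_sum(node):
    # Equivalent computation with sum() and a generator expression.
    count = sum(node)
    return sum(1 - (c / count) ** 2 for c in node)


def gini_impurity(node):
    # Assumption: textbook Gini impurity, 1 - sum(p_i ** 2).
    count = sum(node)
    return 1 - sum((c / count) ** 2 for c in node)


for node in ([175, 330], [220, 120], [10, 20, 30]):
    print(gini_node_reduce(node), gini_node_sum(node), gini_impurity(node))
```

The first two functions agree for nodes of any length, while the sum of (1 - p_i**2) terms exceeds 1 as soon as there are two or more classes; only the third form is bounded by 1.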