Slow iteration with pandas

Question

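I am using the code below to generate every chord with 6 or fewer elements, where each element can be any of 12 notes. So the number of generated chords should be (12 * 12 * 12 * 12 * 12 * 12) + (12 * 12 * 12 * 12 * 12) + (12 * 12 * 12 * 12) + (12 * 12 * 12) + (12 * 12) + (12) = 3,257,436. Is that right?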
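As a quick sanity check of that arithmetic, here is a minimal sketch. It assumes ordered chords with repeated notes allowed, which is what `itertools.product` generates; the names `NOTES`, `expected_total` and `brute_force_small` are only illustrative. The brute-force part is limited to sizes 1 to 3 so that it finishes instantly.

```python
from itertools import product

NOTES = range(1, 13)  # the 12 notes, numbered 1..12 as in the script

# Closed-form count: 12^1 + 12^2 + ... + 12^6
expected_total = sum(12 ** size for size in range(1, 7))

# Brute-force cross-check on the small sizes only (1, 2 and 3 notes)
brute_force_small = sum(
    1 for size in range(1, 4) for _ in product(NOTES, repeat=size)
)

print(expected_total)
print(brute_force_small == 12 + 12 ** 2 + 12 ** 3)
```

Note that `range(7)` in the script also includes size 0, for which `itertools.product` yields a single empty tuple, so the loop visits one extra (empty) chord on top of this total.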

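I estimate it would take about 30 hours to finish on my laptop, and that assumes the processing speed does not change over time... I set up a free virtual machine on Google Cloud (8 vCPUs, 8 GB of RAM) and ran the script there, but it has already been running for almost 4 hours.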

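So I am wondering whether there is a way to speed this up. I cannot use a virtual machine with 16 vCPUs, and I do not know what I could change to improve my script.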

def calculando_todos_acordes_e_diferencas():
    import pandas as pd
    import itertools                          
    anagrama=[]
    for i in range(1,13):
        anagrama.append(i)

    tst=[[[0],[0]]]
    df=pd.DataFrame(tst, columns=["notas","diferencas"])
    count_name=-1

    for qntd_notas in range(7):
        for i in itertools.product((anagrama), repeat=qntd_notas) :
            diferencas=[]
            count=-1
            for primeiro in i :
                count=count+1
                if i.index(primeiro) != len(i)-1 :
                    for segundo in i[count+1:]:
                        diferenca= segundo - primeiro
                        if diferenca < 0 :
                            diferenca=diferenca* -1
                        diferencas.append(diferenca)

          #  if len(df.index) == 100000 :
           #     count_name=count_name+1
            #    df=df.append({"notas":list(i),"diferencas":diferencas},ignore_index=True)
             #   df.to_csv("acordes e diferencas pt %s.csv" %(count_name), index=False)
              #  df=pd.DataFrame(tst, columns=["notas","diferencas"])

            df=df.append({"notas":list(i),"diferencas":diferencas},ignore_index=True)
    
    df.to_csv("acordes e diferencas TOTAL2.csv", index=False)

calculando_todos_acordes_e_diferencas()

Answer 1

If I understand correctly, you want all combinations of notes for group sizes of 1 to 6. That does not give 3.2 million possibilities, only 2,509.
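The 2,509 figure can be checked with binomial coefficients. This is a small sketch, assuming "combination" means an unordered selection of distinct notes; `counts_by_size` and `total_combinations` are illustrative names.

```python
from math import comb

# Number of unordered, repetition-free selections of r notes out of 12
counts_by_size = {r: comb(12, r) for r in range(1, 7)}
total_combinations = sum(counts_by_size.values())

print(counts_by_size)
print(total_combinations)
```

The difference from 3.2 million comes entirely from ignoring note order and disallowing repeated notes.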

What you are looking for is a powerset. itertools does this very quickly, and the documentation has a recipe for it, which I have adapted to your needs here:

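from itertools import chain, combinations

def powerset(iterable, maximum=6):
    """Return an iterator over every combination of 1 to `maximum` items."""
    s = list(iterable)
    # A falsy maximum (None or 0) means "no limit": go up to the full set.
    if not maximum:
        maximum = len(s)
    return chain.from_iterable(combinations(s, r) for r in range(1, maximum + 1))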

Then use:

chords = list(powerset(range(12), maximum=6))

And voilà... the running time is about 200 µs instead of 30 hours ;)
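To reproduce a timing like this on your own machine, one option is `timeit`. This is only a sketch: `RUNS` is an arbitrary repeat count chosen for illustration, and the number you get depends on your hardware and Python version.

```python
import timeit
from itertools import chain, combinations

def powerset(iterable, maximum=6):
    s = list(iterable)
    if not maximum:
        maximum = len(s)
    return chain.from_iterable(combinations(s, r) for r in range(1, maximum + 1))

RUNS = 100  # arbitrary repeat count, only to average out noise

# Time the full generation, including materialising the list
total_seconds = timeit.timeit(
    lambda: list(powerset(range(12), maximum=6)), number=RUNS
)
per_run_us = total_seconds / RUNS * 1e6
print(f"about {per_run_us:.0f} us per run")

chords = list(powerset(range(12), maximum=6))
print(len(chords))
```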

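If you really want permutations, replace combinations with permutations in the code above. There are then 773,664 chords to generate, so the running time is on the order of 100 ms.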
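On the pandas side, the slow part of the original script is most likely the `df.append` call inside the loop: it returned a new DataFrame on every call, so the work grows with the size of the frame, and the method was removed in pandas 2.0. A minimal sketch of the usual alternative follows, assuming the goal is still a CSV with `notas` and `diferencas` columns. The helper names `pairwise_differences`, `build_columns` and `save_chords` are hypothetical, not part of pandas or of the original script.

```python
from itertools import chain, combinations

def powerset(iterable, maximum=6):
    s = list(iterable)
    if not maximum:
        maximum = len(s)
    return chain.from_iterable(combinations(s, r) for r in range(1, maximum + 1))

def pairwise_differences(chord):
    """Absolute difference for every pair of notes, in positional order."""
    return [abs(b - a) for a, b in combinations(chord, 2)]

def build_columns(chords):
    """Collect rows in plain Python lists instead of growing a DataFrame."""
    notas, diferencas = [], []
    for chord in chords:
        notas.append(list(chord))
        diferencas.append(pairwise_differences(chord))
    return {"notas": notas, "diferencas": diferencas}

def save_chords(columns, path):
    """Build the DataFrame once at the end and write it out."""
    import pandas as pd  # imported here so the helpers above run without pandas
    pd.DataFrame(columns).to_csv(path, index=False)

# Notes numbered 1..12, as in the original script
columns = build_columns(powerset(range(1, 13), maximum=6))
print(len(columns["notas"]))
```

Calling `save_chords(columns, "acordes e diferencas TOTAL2.csv")` would then write the file in one step. The same `build_columns` function accepts any iterable of chords, so `itertools.product` or `permutations` can be swapped in if order or repeated notes matter.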

hardware

iterator

python-3.x

pandas