如何用马尔可夫链构建文本?

How to build text with Markov Chain?

目前我已经从文本文件中成功解析了这个结构:

Chain={'damn':[{'you':0.2}, {'it':0.4}, {'fool!':0.4}]}

现在我正在努力根据状态(链中的当前键)构建输出文本。为什么?因为我的单词概率是浮点数格式,我不知道如何将它们转换为百分比。我的第一个想法是这样的:

def buildText(self, Chain):
    self.state = outText = choice(list(Chain.keys()))
    for i in range(self.iterations):
        step = uniform(0,1)
        self.getState(Chain)
        outText = outText + ' ' + self.state + ' '
    return outText

def getState(self, Chain):
    step = uniform(0,1.1)
    print('Current step is: ', step, ' And state is: ', self.state)
    for d in Chain[self.state]:
        for key in d:
            if d[key] < step:
                print('New state--', key)
                self.state = key
                return
            else:
                continue

但是这个函数会生成重复文本,因为正如我提到的,我不知道如何根据我的概率格式正确编写随机函数代码。任何建议将不胜感激! Github Link

处的完整代码

Python >= 3.6

random 现在包括 random.choices ,它将采用权重

import random, collections

# convert Chain to a sensible format instead of a list of single element dictionaries
accumulator = {}
for dic in chain['damn']:
    accumulator.update(dic)
values = list(d.keys())
weights = list(d.values())

# Get one value
print(random.choices(values, weights=weights)[0])

# Test distribution
print(collections.Counter(random.choices(values, weights=weights)[0] for i in range(100)))

# prints Counter({'fool!': 41, 'it': 39, 'you': 20})    

Python <3.6

See the recipe in the Python docs for creating cumulative distributions(这就是人们在评论中描述的)

如果这些值形成其键出现的相对概率,这里的技巧。您必须将 a random.uniform(0, tot) 作为步长值,其中 tot 是概率之和(此处为 1)。然后你将它与第一概率进行比较。如果它更小,你选择这个,否则你从步长值中减去概率并用下一个概率迭代。如果你想超级安全,你可以把最后一种可能性设为 catch all 以免受舍入错误的影响(在 SO 上搜索 broken floating point arithmetics...)

代码可以是:

def buildText(self, Chain):
    self.state = outText = choice(list(Chain.keys()))
    for i in range(self.iterations):
        self.getState(Chain)
        outText = outText + ' ' + self.state + ' '
    return outText
def getState(self, Chain):
    states = Chain[self.state]
    if len(states) == 1:                # beware of corner case: one single option
        k = list(states[0].keys())[0]
        self.state = k
        return
    tot = sum(list(d.values())[0] for d in states)
    step = uniform(0, tot)
    # print(step, '/', tot)                  # uncomment to control what happens
    for s in states[:-1]:                    # test up to last item in list
        k, v = list(s.items())[0]
        if step <= v:                        # less we choose this one
            self.state = k
            return
        step -= v                            # else decrease the step
    self.state = list(states[-1].keys())[0]  # last option is a "catch all"