遗传算法/人工智能;基本上,我走在正确的轨道上吗?

Genetic Algorithm / AI; basically, am I on the right track?

我知道 Python 不是编写任何这种性质的软件的最佳主意。我的推理是在决策制定中将这种类型的算法用于 Raspberry Pi 3(仍然不确定将如何进行),以及我将使用的库和 API(Adafruit 电机 HAT,Google 服务、OpenCV、各种传感器等)都非常适合在 Python 中导入,更不用说我在这种环境中更适合 rPi 了。我已经诅咒它是面向对象的,例如 Java 或 C++ 对我来说更有意义,但我宁愿处理它的低效问题,并专注于 rPi 的更大集成图景。

我不会在这里解释代码,因为它在整个脚本的注释部分中有很好的记录。我的问题如上所述;基本上可以将其视为遗传算法吗?如果不是,它必须是一个基本的 AI 或遗传密码吗?我在解决此类问题方面是否走在正确的轨道上?我知道通常有加权变量和函数可以提升 "survival of the fittest",但我认为可以根据需要弹出。

我已经阅读了很多关于这个主题的论坛和文章。我不想复制别人的我几乎看不懂的代码并开始将其用作我的更大项目的基础;我想确切地知道它是如何工作的,这样我就不会对为什么一路上有些事情没有奏效感到困惑。所以,我只是试图理解它是如何工作的基本概念,并写下我是如何解释它的。请记住,为此我想留在 Python。我知道 rPi 有多种 C++、Java 等环境,但如前所述,我使用的大多数硬件组件只有 Python API 用于实现。如果我错了,请在算法层面进行解释,而不仅仅是代码块(同样,我真的很想了解这个过程)。另外,请不要挑剔代码约定,除非它与我的问题相关,每个人都有自己的风格,这只是目前的草图。在这里,感谢阅读!

# Created by X3r0, 7/3/2016
# Basic genetic algorithm utilizing a two dimensional array system.
# the 'DNA' is the larger array, and the 'gene' is a smaller array as an element
# of the DNA. There exists no weighted algorithms, or statistical tracking to
# make the program more efficient yet; it is straightforwardly random and solves
# its problem randomly. At this stage, only the base element is iterated over.

# Basic Idea:
# 1) User inputs constraints onto array
# 2) Gene population is created at random given user constraints
# 3) DNA is created with randomized genes ( will never randomize after )
#   a) Target DNA is created with loop control variable as data (basically just for some target structure)
# 4) CheckDNA() starts with base gene from DNA, and will recurse until gene matches the target gene
#   a) Randomly select two genes from DNA
#   b) Create a candidate gene by splicing both parent genes together
#   c) Check candidate gene against the target gene
#   d) If there exists a match in gene elements, a child gene is created and inserted into DNA
#   e) If the child gene in DNA is not equal to target gene, recurse until it is

import random

DNAsize = 32
geneSize = 5
geneDiversity = 9
geneSplit = 4
numRecursions = 0

DNA = []
targetDNA = []

def init():
    global DNAsize, geneSize, geneDiversity, geneSplit, DNA

    print("This is a very basic form of genetic software. Input variable constraints below. "
          "Good starting points are: DNA strand size (array size): 32, gene size (sub array size: 5, gene diversity (randomized 0 - x): 5"
          "gene split (where to split gene array for splicing): 2")

    DNAsize = int(input('Enter DNA strand size: '))
    geneSize = int(input('Enter gene size: '))
    geneDiversity = int(input('Enter gene diversity: '))
    geneSplit = int(input('Enter gene split: '))

    # initializes the gene population, and kicks off
    # checkDNA recursion
    initPop()
    checkDNA(DNA[0])

def initPop():

    # builds an array of smaller arrays
    # given DNAsize
    for x in range(DNAsize):
        buildDNA()
        # builds the goal array with a recurring
        # numerical pattern, in this case just the loop
        # control variable
        buildTargetDNA(x)

def buildDNA():
    newGene = []

    # builds a smaller array (gene) using a given geneSize
    # and randomized with vaules 0 - [given geneDiversity]
    for x in range(geneSize):
        newGene.append(random.randint(0,geneDiversity))
    # append the built array to the larger array
    DNA.append(newGene)

def buildTargetDNA(x):

    # builds the target array, iterating with x as a loop
    # control from the call in init()
    newGene = []
    for y in range(geneSize):
            newGene.append(x)
    targetDNA.append(newGene)

def checkDNA(childGene):

    global numRecursions
    numRecursions = numRecursions+1

    gene = DNA[0]
    targetGene = targetDNA[0]
    parentGeneA = DNA[random.randint(0,DNAsize-1)]          # randomly selects an array (gene) from larger array (DNA)
    parentGeneB = DNA[random.randint(0,DNAsize-1)]
    pos = random.randint(geneSplit-1,geneSplit+1)           # randomly selects a position to split gene for splicing
    candidateGene = parentGeneA[:pos] + parentGeneB[pos:]   # spliced gene given split from parentA and parentB

    print("DNA Splice Position: " + str(pos))
    print("Element A: " + str(parentGeneA))
    print("Element B: " + str(parentGeneB))
    print("Candidate Element: " + str(candidateGene))
    print("Target DNA: " + str(targetDNA))
    print("Old DNA:    " + str(DNA))

    # iterates over the candidate gene and compares each element to the target gene
    # if the candidate gene element hits a target gene element, the resulting child
    # gene is created
    for x in range(geneSize):
        #if candidateGene[x] != targetGene[x]:
            #print("false ")
        if candidateGene[x] == targetGene[x]:
            #print("true ")
            childGene.pop(x)
            childGene.insert(x, candidateGene[x])

    # if the child gene isn't quite equal to the target, and recursion hasn't reached
    # a max (apparently 900), the child gene is inserted into the DNA. Recursion occurs
    # until the child gene equals the target gene, or max recursuion depth is exceeded
    if childGene != targetGene and numRecursions < 900:
        DNA.pop(0)
        DNA.insert(0, childGene)
        print("New DNA:   " + str(DNA))
        print(numRecursions)
        checkDNA(childGene)

init()
print("Final DNA:  " + str(DNA))
print("Number of generations (recursions): " + str(numRecursions))

I'm working with evolutionary computation right now so I hope my answer will be helpful for you, personally, I work with java, mostly because is one of my main languages, and for the portability, because I tested in linux, windows and mac. In my case I work with permutation encoding, but if you are still learning how GA works, I strongly recommend binary encoding. This is what I called my InitialPopulation. I try to describe my program's workflow:

1-. Set my main variables

This are PopulationSize, IndividualSize, MutationRate, CrossoverRate. Also you need to create an objective function and decide the crossover method you use. For this example lets say that my PopulationSize is equals to 50, the IndividualSize is 4, MutationRate is 0.04%, CrossoverRate is 90% and the crossover method will be roulette wheel. My objective function only what to check if my Individuals are capable to represent the number 15 in binary, so the best individual must be 1111.

2-. Initialize my Population

For this I create 50 individuals (50 is given by my PopulationSize) with random genes.

3-. Loop starts

For each Individuals in Population you need to:

  • Evaluate fitness according to the objective function. If an Individual is represented by the next characters: 00100 this mean that his fitness is 1. As you can see this is a simple fitness function. You can create your own while you are learning, like fitness = 1/numberOfOnes. Also you need to assign the sum of all the fitness to a variable called populationFitness, this will be useful in the next step.
  • Select the best individuals. For this task there's a lot of methods you can use, but we will use the roulette wheel method as we say before. In this method, You assign a value to every individual inside your population. This value is given by the next formula: (fitness/populationFitness) * 100. So, if your population fitness is 10, and a certain individual fitness is 3, this mean that this individual has a 30% chance to be selected to make a crossover with another individual. Also, if another individual have a 4 in his fitness, his value will be 40%.
  • Apply crossover. Once you have the "best" individuals of your population, you need to create a new population. This new population is formed by others individuals of the previous population. For each individual you create a random number from 0 to 1. If this numbers is in the range of 0.9 (since our crossoverRate = 90%), this individual can reproduce, so you select another individual. Each new individual has this 2 parents who inherit his genes.例如: Lets say that parentA = 1001 and parentB = 0111. We need to create a new individual with this genes. There's a lot of methods to do this, uniform crossover, single point crossover, two point crossover, etc. We will use the single point crossover. In this method we choose a random point between the first gene and the last gene. Then, we create a new individual according to the first genes of parentA and the last genes of parentB. In a visual form:

parentA = 1001 parentB = 0111 crossoverPoint = 2 newIndividual = 1011

As you can see, the new individual share his parents genes.

  • Once you have a new population with new individuals, you apply the mutation. In this case, for each individual in the new population generate a random number between 0 and 1. If this number is in the range of 0.04 (since our mutationRate = 0.04), you apply the mutation in a random gene. In binary encoding the mutation is just change the 1's for 0's or viceversa. In a visual form:

individual = 1011 randomPoint = 3 mutatedIndividual = 1010

  • Get the best individual

  • If this individual has reached the solution stop. Else, repeat the loop

  • End

As you can see, my english is not very good, but I hope you understand the basic idea of a genetic algorithm. If you are truly interested in learning this, you can check the following links:

http://www.obitko.com/tutorials/genetic-algorithms/ This link explains in a clearer way the basics of a genetic algorithm

http://natureofcode.com/book/chapter-9-the-evolution-of-code/ This book also explain what a GA is, but also provide some code in Processing, basically java. But I think you can understand.

Also I would recommend the following books:

An Introduction to Genetic Algorithms - Melanie Mitchell

Evolutionary algorithms in theory and practice - Thomas Bäck

Introduction to genetic algorithms - S. N. Sivanandam

If you have no money, you can easily find all this books in PDF. Also, you can always search for articles in scholar.google.com Almost all are free to download.

为了给 Alberto 的出色回答补充一点,您需要在解决方案的发展过程中注意两个问题。

第一个是过拟合。这基本上意味着您的解决方案足够复杂 "learn" 所有样本,但它不适用于训练集之外。为避免这种情况,您需要确保训练集中的 "amount" 信息量远远大于解决方案中可以容纳的信息量。

第二个问题是Plateaus。在某些情况下,您会得出某些平庸的解决方案,但这些解决方案仍然足以 "outcompete" 任何新兴解决方案,因此您的进度停滞不前(一种看待这种情况的方法是,如果您看到您的健康状况 "stuck"一定的,小于最佳数量)。解决这个问题的一种方法是灭绝:你可以跟踪你的最佳解决方案的改进率,如果在过去的 N 代中改进为 0,你就可以对你的人口进行 Nuke。 (也就是说,删除你的种群和最优个体列表并重新开始)。随机性将使解决方案最终超越高原。

另一件要记住的事情是,默认的随机 class 在随机性方面非常糟糕。通过简单地使用诸如 Mesernne Twister 随机生成器或硬件熵生成器之类的东西,我的解决方案得到了显着改善。

希望对您有所帮助。祝你好运。