What is a better way to design the built-in stats function of a sentence generator?

Context: a random sentence generator

  1. The function generateSentence() generates random sentences and returns them as strings (works fine).

  2. The function calculateStats() outputs the number of unique strings the above function can theoretically generate (it also works fine in this mockup, so be sure to read the disclaimer; I don't want to waste your time).

  3. The function generateStructure() and the word lists in Dictionnary.lists keep growing as time passes.

A quick mockup of the main generator function:

function generateSentence() {
  var words = [];
  var structure = generateStructure();

  structure.forEach(function(element) {
    words.push(Dictionnary.getElement(element));
  });

  var fullText = words.join(" ");
  fullText = fullText.substring(0, 1).toUpperCase() + fullText.substring(1);
  fullText += ".";
  return fullText;
}

var Dictionnary = {
  getElement: function(listCode) {
    return randomPick(Dictionnary.lists[listCode]);
  },
  lists: {
    _location: ["here,", "at my cousin's,", "in Antarctica,"],
    _subject: ["some guy", "the teacher", "Godzilla"],
    _vTransitive: ["is eating", "is scolding", "is seeing"],
    _vIntransitive: ["is working", "is sitting", "is yawning"],
    _adverb: ["slowly", "very carefully", "with a passion"],
    _object: ["this chair", "an egg", "the statue of Liberty"],
  }
}

// returns an array of strings symbolizing types of sentence elements
// example : ["_location", "_subject", "_vIntransitive"]
function generateStructure() {
  var str = [];

  if (dice(6) > 5) {// the structure can begin with a location or not
    str.push("_location");
  }

  str.push("_subject");// the subject is mandatory

  // the verb can be of either type
  var verbType = randomPick(["_vTransitive", "_vIntransitive"]);
  str.push(verbType);

  if (dice(6) > 5) {// adverb is optional
    str.push("_adverb");
  }

  // the structure needs an object if the verb is transitive
  if (verbType == "_vTransitive") {
    str.push("_object");
  }

  return str;
}

// off-topic warning! don't mind the implementation here,
// just know it's a random pick in the array
function randomPick(sourceArray) {
  return sourceArray[dice(sourceArray.length) - 1];
}

// Same as above, not the point, just know it's a die roll (random integer from 1 to max)
function dice(max) {
  if (max < 1) { return 0; }
  return Math.round((Math.random() * max) + .5);
}

At some point I wanted to know how many different unique strings it could output, so I wrote something like this (again, very simplified):

function calculateStats() {// the "broken leg" function I'm trying to improve/replace
  var total = 0;
  // lines below : +1 to account for 'no location' or 'no adverb'
  var nbOfLocations = Dictionnary.lists._location.length + 1;
  var nbOfAdverbs = Dictionnary.lists._adverb.length + 1;

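  // the subject is mandatory, so its list size is a factor in both products
  var nbOfTransitiveSentences =
    nbOfLocations *
    Dictionnary.lists._subject.length *
    Dictionnary.lists._vTransitive.length *
    nbOfAdverbs *
    Dictionnary.lists._object.length;
  var nbOfIntransitiveSentences =
    nbOfLocations *
    Dictionnary.lists._subject.length *
    Dictionnary.lists._vIntransitive.length *
    nbOfAdverbs;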

  total = nbOfTransitiveSentences + nbOfIntransitiveSentences;
  return total;
}

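(Side note: don't worry about namespace pollution, type-checking of input parameters, or that sort of thing; assume everything lives in a bubble for the sake of example clarity.)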

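Important disclaimer: this is not about fixing the code I posted. It is a mockup and it works as is. The real question is: "As the complexity of the possible structures grows in the future, along with the size and diversity of the lists, what would be a better strategy for calculating stats on these kinds of random constructions than my clumsy calculateStats() function, which is hard to maintain, likely to handle astronomically big numbers*, and prone to errors?"

* In the real tool, there are 351 120 unique structures at the moment, and as for sentences... the total has been above 10 to the power of 80 for a while now.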

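Since your sentence structure varies a lot (it already does in this small example, and I can't imagine how much it varies in the actual code), I would do something like this: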

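First, I would need some way to store all the possible sentence structures that exist for a given Dictionary... Maybe I would create a Language object that holds the dictionary as a property and to which I could add the possible sentence structures (this part could probably be optimized by finding a more programmatic way to generate all possible sentence structures, such as a rule engine). What do I mean by sentence structure? Well, following your example, I would call the following sentence structures: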

[ 'location', 'subject', 'transitive-verb', 'adverb', 'object' ] <- Transitive sentence
[ 'location', 'subject', 'intransitive-verb', 'adverb' ] <- Intransitive sentence

You could probably find a way to generate these structures... or hard-code them.
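As one possible way to generate them instead of hard-coding, here is a minimal sketch. It assumes each structure family can be written as a template in which a plain string is a mandatory element and `{ optional: ... }` marks an element that may be absent. The names `templates`, `expandTemplate` and `allStructures` are my own invention, not part of your code, and `flatMap` requires an ES2019 environment.

```javascript
// Hypothetical templates mirroring generateStructure() from the question:
// a string is a mandatory element, { optional: code } may be left out.
const templates = [
  [{ optional: "_location" }, "_subject", "_vTransitive", { optional: "_adverb" }, "_object"],
  [{ optional: "_location" }, "_subject", "_vIntransitive", { optional: "_adverb" }]
];

// Expands one template into every concrete structure it allows.
function expandTemplate(template) {
  // Start from a single empty structure and extend it slot by slot.
  return template.reduce((partials, slot) => {
    if (typeof slot === "string") {
      // Mandatory element: append it to every partial structure.
      return partials.map(partial => partial.concat(slot));
    }
    // Optional element: keep each partial both without and with it.
    return partials.flatMap(partial => [partial, partial.concat(slot.optional)]);
  }, [[]]);
}

const allStructures = templates.flatMap(expandTemplate);
console.log(allStructures.length); // 8 (4 per template)
```

Each optional slot doubles the number of structures, so this stays practical only while the number of structures is manageable.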

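But... why do I think this would improve the way you calculate the stats? Because by using map/reduce operations you minimize the hard-coding of each sentence count and make it more extensible.

So... how?

Imagine that we can access our structures in the global scope, through an object, or in the dictionary itself: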

// Somewhere in the code. The names must match the keys of Dictionnary.lists.
// Optional elements are not modelled here: every listed element is treated as present.
const structures = [
  [ '_location', '_subject', '_vTransitive', '_adverb', '_object' ],
  [ '_location', '_subject', '_vIntransitive', '_adverb' ]
];
// ...
// In this example I just pass it as an argument
function calculateStats(structures) {
  const numberOfCombinations = structures.reduce((total, structure) => {
      // The number of sentences a structure can produce is the product of its list lengths
      const numberOfSentences = structure.reduce((acc, wordType) => {
          // For each word type, we access the list and get its length (I am not doing safety checks on wordType)
          return acc * Dictionnary.lists[wordType].length;
      }, 1); // Initial accumulator (1, because we multiply)

      return total + numberOfSentences;
  }, 0); // Initial accumulator
  return numberOfCombinations;
}

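So instead of hard-coding every possible combination, we take advantage of being able to iterate over the different structures. You basically only need to add structures, and your calculateStats function should not grow.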

If you need more complex calculations, you will need to change the function used in the reducer.
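For instance, since your real totals are above 10 to the power of 80, one such change could be to accumulate BigInt values, because a plain Number stops representing integers exactly above Number.MAX_SAFE_INTEGER (2^53 - 1). This is only a sketch under my own assumptions: it needs an environment with BigInt support (ES2020), `countSentences` and the standalone `lists` object are hypothetical stand-ins, and it assumes two different structures never produce the same sentence.

```javascript
// Hypothetical stand-in for Dictionnary.lists, copied from the question's mockup
const lists = {
  _location: ["here,", "at my cousin's,", "in Antarctica,"],
  _subject: ["some guy", "the teacher", "Godzilla"],
  _vTransitive: ["is eating", "is scolding", "is seeing"],
  _vIntransitive: ["is working", "is sitting", "is yawning"],
  _adverb: ["slowly", "very carefully", "with a passion"],
  _object: ["this chair", "an egg", "the statue of Liberty"]
};

// Same reduce-based idea, but every intermediate value is a BigInt,
// so the result stays exact however large it gets.
function countSentences(structures, lists) {
  return structures.reduce((total, structure) => {
    const perStructure = structure.reduce(
      (acc, wordType) => acc * BigInt(lists[wordType].length),
      1n // multiplicative identity
    );
    return total + perStructure;
  }, 0n);
}

const total = countSentences(
  [
    ["_subject", "_vTransitive", "_object"], // 3 * 3 * 3 = 27
    ["_subject", "_vIntransitive"]           // 3 * 3 = 9
  ],
  lists
);
console.log(total.toString()); // "36"
```

BigInt values cannot be mixed with Numbers in arithmetic, which is why each list length is converted before multiplying.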

I know very little about grammar or syntactic analysis, so if you know more you may find ways to make this simpler or to do "smarter calculations".
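One direction such a calculation could take, sketched here under my own assumptions, is to count directly from a description of the rules without listing every structure: a sequence multiplies the counts of its parts, a choice between alternatives adds them, and an optional element contributes one extra possibility. The node shapes (`seq`, `alt`, `opt`), the `grammar` object and `countNode` are hypothetical, and the sketch assumes different paths never produce the same sentence.

```javascript
// Hypothetical stand-in for Dictionnary.lists, copied from the question's mockup
const lists = {
  _location: ["here,", "at my cousin's,", "in Antarctica,"],
  _subject: ["some guy", "the teacher", "Godzilla"],
  _vTransitive: ["is eating", "is scolding", "is seeing"],
  _vIntransitive: ["is working", "is sitting", "is yawning"],
  _adverb: ["slowly", "very carefully", "with a passion"],
  _object: ["this chair", "an egg", "the statue of Liberty"]
};

// Hypothetical rule description mirroring generateStructure()
const grammar = {
  seq: [
    { opt: "_location" },
    "_subject",
    { alt: [
      { seq: ["_vTransitive", { opt: "_adverb" }, "_object"] },
      { seq: ["_vIntransitive", { opt: "_adverb" }] }
    ] }
  ]
};

// Counts the sentences a node can produce, using BigInt to stay exact.
function countNode(node, lists) {
  if (typeof node === "string") return BigInt(lists[node].length); // a word list
  if (node.seq) return node.seq.reduce((acc, child) => acc * countNode(child, lists), 1n);
  if (node.alt) return node.alt.reduce((acc, child) => acc + countNode(child, lists), 0n);
  if (node.opt) return 1n + countNode(node.opt, lists); // absent, or any one entry
  throw new Error("Unknown grammar node");
}

console.log(countNode(grammar, lists).toString()); // "576" = 4 * 3 * (3*4*3 + 3*4)
```

The work here grows with the size of the rule description rather than with the number of structures, which may matter once there are hundreds of thousands of them.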

I took the liberty of writing it in ES6-ish style. If reduce is a strange animal to you, you can read more here or use the lodash / ramda / whatever version ^^