Scalable Trie implementation in PHP
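Following this tutorial I met the Trie data structure. Since I've recently been programming in PHP, I tried to solve the lecture's problem with it. I was able to get correct answers, but only for smaller inputs (input #10 is a 2.82 MB file). Obviously, my algorithm doesn't scale well. It also exceeds PHP's default 128 MB memory limit.
My algorithm
A root node is stored in the Trie. Every node has a "children" member. I use a standard PHP array to store the children. A child's key represents a character (currently I create a new node for every character, lowercase a-z, mapped to 0-25), and the child's value is a reference to another node.
The "weight" member that every node has is there because of the problem.
I would like to optimize my code (or even rewrite it from scratch using a different approach) so that it can pass the tests for big inputs.
I'm interested in a solution that makes this data structure work with large inputs in PHP, if possible.
My code
The TrieNode class stores the tree hierarchy.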
class TrieNode {
    // weight is needed for the given problem
    public $weight;
    /* TrieNode children,
     * e.g. [0 => (TrieNode object1), 2 => (TrieNode object2)]
     * where 0 stands for 'a', 2 for 'c'
     * and TrieNode objects are references to other TrieNodes.
     */
    private $children;

    function __construct($weight, $children) {
        $this->weight = $weight;
        $this->children = $children;
    }

    /** map lower case english letters to 0-25 */
    static function getAsciiValue($char) {
        // ord() already returns an int, so no intval() is needed
        return ord($char) - ord('a');
    }

    function addChild($char, $node) {
        if (!isset($this->children)) {
            $this->children = [];
        }
        $this->children[self::getAsciiValue($char)] = $node;
    }

    function isChild($char) {
        return isset($this->children[self::getAsciiValue($char)]);
    }

    function getChild($char) {
        return $this->children[self::getAsciiValue($char)];
    }

    function isLeaf() {
        return empty($this->children);
    }
}
The Trie class stores the root TrieNode. It can insert and query nodes.
class Trie {
    /* root TrieNode stores the first characters */
    private $root;

    function __construct() {
        $this->root = new TrieNode(-1, []);
    }

    function insert($string, $weight) {
        $currentNode = $this->root;
        $l = strlen($string);
        for ($i = 0; $i < $l; $i++) {
            $char = $string[$i];
            if (!$currentNode->isChild($char)) {
                $n = new TrieNode($weight, null);
                $currentNode->addChild($char, $n);
            }
            $currentNode->weight = max($weight, $currentNode->weight);
            $currentNode = $currentNode->getChild($char);
        }
        // Also update the last node: it may already exist from an earlier
        // insert with a smaller weight, and the loop above never touches it.
        $currentNode->weight = max($weight, $currentNode->weight);
    }

    function getNode($string) {
        $currentNode = $this->root;
        $l = strlen($string);
        for ($i = 0; $i < $l; $i++) {
            $char = $string[$i];
            if ($currentNode->isLeaf() || !$currentNode->isChild($char)) {
                return null;
            }
            $currentNode = $currentNode->getChild($char);
        }
        return $currentNode;
    }

    function getWeight($string) {
        $node = $this->getNode($string);
        return is_null($node) ? -1 : $node->weight;
    }
}
Test code. It parses the input and calls the Trie object.
// MAIN / TEST
/*
In case the problem page is down:
e.g.
INPUT
2 1
hackerearth 10
hackerrank 9
hacker
OUTPUT
10
where 2 is the number of inserts, 1 is the number of queries
"string number" is the string to insert and its "weight"
"hacker" is the string to query
10 is the maximum weight for the queried string (hacker -> 10)
*/
$trie = new Trie();
$handle = fopen('test.txt', 'r');
//$handle = STDIN; // <- this is for the online judge
list($n, $q) = fscanf($handle, "%d %d");
for ($i = 0; $i < $n; $i++) { // insert data
    list($s, $weight) = fscanf($handle, "%s %d");
    $trie->insert($s, $weight);
}
for ($i = 0; $i < $q; $i++) { // query data
    $query = trim(strval(fgets($handle)));
    echo $trie->getWeight($query) . PHP_EOL;
}
fclose($handle);
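Failed
After some tweaks and modifications, I have been able to get this to work for all the test cases except one, which times out.
Here is the complete code; it runs successfully for every test case except test case 10.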
class TrieNode {
    // weight is needed for the given problem
    public $weight;
    /* TrieNode children,
     * e.g. [0 => (TrieNode object1), 2 => (TrieNode object2)]
     * where 0 stands for 'a', 2 for 'c'
     * and TrieNode objects are references to other TrieNodes.
     */
    private $children;

    function __construct($weight, $children) {
        $this->weight = $weight;
        $this->children = $children;
    }

    /** map lower case english letters to 0-25 */
    static function getAsciiValue($char) {
        return ord($char) - ord('a');
    }

    function addChild($char, $node) {
        if (!isset($this->children)) {
            $this->children = [];
        }
        $this->children[self::getAsciiValue($char)] = $node;
    }

    function isChild($char) {
        return isset($this->children[self::getAsciiValue($char)]);
    }

    function getChild($char) {
        return $this->children[self::getAsciiValue($char)];
    }

    function isLeaf() {
        return empty($this->children);
    }
}

class Trie {
    /* root TrieNode stores the first characters */
    private $root;

    function __construct() {
        $this->root = new TrieNode(-1, []);
    }

    function insert($string, $weight) {
        $currentNode = $this->root;
        $l = strlen($string);
        for ($i = 0; $i < $l; $i++) {
            $char = $string[$i];
            if (!$currentNode->isChild($char)) {
                $n = new TrieNode($weight, null);
                $currentNode->addChild($char, $n);
            }
            $currentNode->weight = max($weight, $currentNode->weight);
            $currentNode = $currentNode->getChild($char);
        }
        // Also update the last node, which may already exist with a smaller weight.
        $currentNode->weight = max($weight, $currentNode->weight);
    }

    function getNode($string) {
        $currentNode = $this->root;
        if (empty($currentNode) || !isset($currentNode)) {
            return null;
        }
        $l = strlen($string);
        for ($i = 0; $i < $l; $i++) {
            $char = $string[$i];
            if (empty($currentNode) || $currentNode->isLeaf() || !$currentNode->isChild($char)) {
                return null;
            }
            $currentNode = $currentNode->getChild($char);
            if (empty($currentNode)) {
                return null;
            }
        }
        return $currentNode;
    }

    function getWeight($string) {
        $node = $this->getNode($string);
        return is_null($node) ? -1 : $node->weight;
    }
}

$trie = new Trie();
//$handle = fopen('test.txt', 'r');
$handle = STDIN; // <- this is for the online judge
list($n, $q) = fscanf($handle, "%d %d");
for ($i = 0; $i < $n; $i++) { // insert data
    list($s, $weight) = fscanf($handle, "%s %d");
    $trie->insert($s, $weight);
}
for ($i = 0; $i < $q; $i++) { // query data
    $query = trim(strval(fgets($handle)));
    echo $trie->getWeight($query) . PHP_EOL;
}
fclose($handle);
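I will try a few more changes to reduce the number of computation cycles this program takes.
Below is the code with the following optimizations:
Removed all unnecessary condition checks, for example:
- No need to check whether a node is a leaf: if the node has no child for the given character, it doesn't matter whether it is a leaf or not.
- No need to check whether {children} is initialized every time a child node is added. This check is gone because {children} is now initialized to an empty array in the constructor itself.
Removed the {getAsciiValue} function and used a simple associative array for the lookup instead. Also, converting a {char} to its ascii value has been moved from TrieNode to the Trie class, so it no longer has to be converted multiple times.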
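As a possible variation on the lookup table described above, the whole string could be converted to 0-25 indices in a single call instead of one lookup per character. This is only a sketch: `toIndices` is a hypothetical helper name that does not appear in the original code, and it assumes the input contains only lowercase a-z. Whether it is actually faster than the associative array would need to be measured.

```php
<?php
// Hypothetical helper (not part of the original code): converts a
// lowercase a-z string into a list of 0-25 indices in one pass.
function toIndices(string $s): array
{
    if ($s === '') {
        return [];
    }
    $indices = [];
    // unpack('C*', ...) returns every byte as an unsigned integer.
    foreach (unpack('C*', $s) as $byte) {
        $indices[] = $byte - 97; // 97 is ord('a')
    }
    return $indices;
}

echo implode(',', toIndices('hacker')), PHP_EOL; // prints 7,0,2,10,4,17
```

The resulting integers can be used directly as keys into {children}, so neither TrieNode nor Trie has to convert a character again.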
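After these optimizations I arrived at a minimal solution, but it still does not pass test #10. After reading about arrays in PHP, I learned that PHP does not implement arrays the way compiled languages usually do; every PHP array is an ordered hash map. Lookups are still constant time on average, but each access involves more work, and each element carries more overhead, than a slot in a plain contiguous array.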
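I also tried SplFixedArray, but it did not help: it is an object, so every instance has an instantiation cost. It might help if a single large array were used to store the whole Trie. You could try implementing a solution that stores the entire Trie in one SplFixedArray and check whether it passes test #10.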
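To check the overhead claims above on a particular PHP build, a rough measurement along these lines may help. It is a sketch under assumptions: `bytesPerItem` is a hypothetical helper, the item count is arbitrary, and the figures it prints depend on the PHP version, so none are quoted here.

```php
<?php
// Hypothetical helper: average bytes allocated per item created by $make.
// The result also includes the slot used to keep each item alive.
function bytesPerItem(callable $make, int $count): int
{
    $before = memory_get_usage();
    $keep = [];
    for ($i = 0; $i < $count; $i++) {
        $keep[] = $make();
    }
    return intdiv(memory_get_usage() - $before, $count);
}

$count = 50000; // arbitrary sample size

// One small PHP array per node, as in the TrieNode approach.
$smallArray = bytesPerItem(function () {
    $a = [];
    $a[0] = 1; // written at runtime so a real array is allocated
    return $a;
}, $count);

// One SplFixedArray(26) object per node.
$fixed26 = bytesPerItem(function () {
    return new SplFixedArray(26);
}, $count);

// One big SplFixedArray that reserves 26 slots for every node.
$before = memory_get_usage();
$flat = new SplFixedArray(26 * $count);
$flatPerNode = intdiv(memory_get_usage() - $before, $count);

printf("array per node:         %d bytes\n", $smallArray);
printf("SplFixedArray per node: %d bytes\n", $fixed26);
printf("flat slots per node:    %d bytes\n", $flatPerNode);
```

Running it under the judge's PHP version would show which layout is cheapest per node there before committing to a rewrite.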
<?php
class TrieNode {
    // weight is needed for the given problem
    public $weight;
    /* TrieNode children,
     * e.g. [0 => (TrieNode object1), 2 => (TrieNode object2)]
     * where 0 stands for 'a', 2 for 'c'
     * and TrieNode objects are references to other TrieNodes.
     */
    private $children;

    function __construct($weight) {
        $this->weight = $weight;
        $this->children = [];
    }

    function addChild($char, $node) {
        $this->children[$char] = $node;
    }

    function isChild($char) {
        return isset($this->children[$char]);
    }

    function getChild($char) {
        return $this->children[$char];
    }
}

class Trie {
    /* root TrieNode stores the first characters */
    private $root;

    function __construct() {
        $this->root = new TrieNode(-1);
    }

    /* lookup table: lower case english letters to 0-25 */
    static $asciiValues = array(
        "a" => 0, "b" => 1, "c" => 2, "d" => 3, "e" => 4, "f" => 5,
        "g" => 6, "h" => 7, "i" => 8, "j" => 9, "k" => 10, "l" => 11,
        "m" => 12, "n" => 13, "o" => 14, "p" => 15, "q" => 16, "r" => 17,
        "s" => 18, "t" => 19, "u" => 20, "v" => 21, "w" => 22, "x" => 23,
        "y" => 24, "z" => 25
    );

    function insert($string, $weight) {
        $currentNode = $this->root;
        $l = strlen($string);
        for ($i = 0; $i < $l; $i++) {
            $char = self::$asciiValues[$string[$i]];
            $currentNode->weight = max($weight, $currentNode->weight);
            if ($currentNode->isChild($char)) {
                $childNode = $currentNode->getChild($char);
            } else {
                $childNode = new TrieNode($weight);
                $currentNode->addChild($char, $childNode);
            }
            $currentNode = $childNode;
        }
        // Also update the last node, which may already exist with a smaller weight.
        $currentNode->weight = max($weight, $currentNode->weight);
    }

    function getNodeWeight($string) {
        $currentNode = $this->root;
        $l = strlen($string);
        for ($i = 0; $i < $l; $i++) {
            $char = self::$asciiValues[$string[$i]];
            if (!$currentNode->isChild($char)) {
                return -1;
            }
            $currentNode = $currentNode->getChild($char);
        }
        return $currentNode->weight;
    }
}

$trie = new Trie();
//$handle = fopen('test.txt', 'r');
$handle = STDIN; // <- this is for the online judge
list($n, $q) = fscanf($handle, "%d %d");
for ($i = 0; $i < $n; $i++) { // insert data
    list($s, $weight) = fscanf($handle, "%s %d");
    $trie->insert($s, $weight);
}
for ($i = 0; $i < $q; $i++) { // query data
    $query = trim(strval(fgets($handle)));
    echo $trie->getNodeWeight($query) . PHP_EOL;
}
fclose($handle);
?>
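The suggestion above about storing the entire Trie in a single SplFixedArray could be sketched roughly as follows. This is an untested idea with respect to test #10, not a confirmed solution: `FlatTrie`, `maxWeight` and the other names are hypothetical, inserted strings are assumed to contain only lowercase a-z, and every node reserves 26 slots whether they are used or not, so memory still grows with the node count.

```php
<?php
// Sketch only: nodes are integer ids, and the child of node $n for
// letter $c lives in slot $n * 26 + $c of one shared SplFixedArray.
class FlatTrie
{
    private $next;      // child ids; null means "no child"
    private $weight;    // maximum weight seen through each node
    private $count = 1; // nodes in use; node 0 is the root
    private $capacity;

    public function __construct(int $capacity = 1024)
    {
        $this->capacity = max(1, $capacity);
        $this->next = new SplFixedArray($this->capacity * 26);
        $this->weight = new SplFixedArray($this->capacity);
        $this->weight[0] = -1;
    }

    // Doubles the storage; slots added by setSize() start out as null.
    private function grow(): void
    {
        $this->capacity *= 2;
        $this->next->setSize($this->capacity * 26);
        $this->weight->setSize($this->capacity);
    }

    public function insert(string $s, int $w): void
    {
        $node = 0;
        $len = strlen($s);
        for ($i = 0; $i < $len; $i++) {
            if ($this->weight[$node] < $w) {
                $this->weight[$node] = $w;
            }
            $slot = $node * 26 + ord($s[$i]) - 97; // assumes a-z
            $child = $this->next[$slot];
            if ($child === null) {
                if ($this->count === $this->capacity) {
                    $this->grow();
                }
                $child = $this->count++;
                $this->next[$slot] = $child;
                $this->weight[$child] = $w;
            }
            $node = $child;
        }
        if ($this->weight[$node] < $w) {
            $this->weight[$node] = $w; // final node may already exist
        }
    }

    // Returns the maximum weight among strings with this prefix, or -1.
    public function maxWeight(string $prefix): int
    {
        $node = 0;
        $len = strlen($prefix);
        for ($i = 0; $i < $len; $i++) {
            $c = ord($prefix[$i]) - 97;
            if ($c < 0 || $c > 25) {
                return -1;
            }
            $child = $this->next[$node * 26 + $c];
            if ($child === null) {
                return -1;
            }
            $node = $child;
        }
        return $this->weight[$node];
    }
}

$flatTrie = new FlatTrie();
$flatTrie->insert('hackerearth', 10);
$flatTrie->insert('hackerrank', 9);
echo $flatTrie->maxWeight('hacker'), PHP_EOL; // prints 10
```

The design trades one object and one hash map per node for two shared fixed-size arrays, which removes the per-node instantiation cost mentioned above at the price of reserving all 26 child slots for every node.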