查找列表中每个单词的所有字谜的最有效方法
most efficient way to find all the anagrams of each word in a list
我一直在尝试创建一个程序,该程序可以为文本文件中的每个单词(包含大约 ~370k 个由“\n”分隔的单词)找到所有字谜(在列表中)。
我已经在python中编写了代码。我花了大约一个小时才到达 运行。只是想知道是否有更有效的方法。
我的代码
from tqdm.auto import tqdm
ls = open("words.txt","r").readlines()
ls = [i[:-1] for i in ls]
ls = [[i,''.join(sorted(i))] for i in ls]
ln = set([len(i[1]) for i in tqdm(ls)])
df = {}
for l in tqdm(ln):
df[l] = [i for i in ls if len(i[0]) == l]
full = {}
for m in tqdm(ls):
if full.get(m[0]) == None:
temp = []
for i in df[len(m[0])]:
if i[1] == m[1] and i[0] != m[0]:
temp.append(i[0])
for i in temp:
full[i] = temp
如果有更有效的方法用其他语言(Rust、C、C++、Java ...)编写此代码,如果您也可以 post :)
使用按字符字母顺序排序的单词作为搜索关键字是前进的方向。也许您已经在您的代码中使用这一行来执行此操作(我几乎从不使用 python):
[[i,''.join(sorted(i))] for i in ls]
无论如何,这是我的 C++ 解决您的问题。
现场演示:https://onlinegdb.com/_gauHBd_3
#include <algorithm> // for sorting
#include <string>
#include <unordered_map> // for storing words/anagrams
#include <iostream>
#include <fstream>
#include <set>
// create a class that will hold all words
class dictionary_t
{
public:
// load a text file with one word per line
void load(const std::string& filename)
{
std::ifstream file{ filename };
std::string word;
while (file >> word)
{
add_anagram(word);
}
}
auto& find_anagrams(const std::string& word)
{
const auto key = get_key(word);
// intentionally allow an empty entry to be made if word has no anagrams yet
// for readability easier error handling (not for space/time efficiency)
auto& anagrams = m_anagrams[key];
return anagrams;
}
// show all anagrams for a word
void show_anagrams(const std::string& word)
{
std::cout << "anagrams for word '" << word << "' are : ";
auto anagrams = find_anagrams(word);
for (const auto& anagram : anagrams)
{
if (anagram != word)
{
std::cout << anagram << " ";
}
}
std::cout << "\n";
}
private:
// this function is key to the whole idea
// two words are anagrams if they sort their letters
// to the same order. e.g. beast and betas both sort (alphabetically) to abest
std::string get_key(const std::string& word)
{
std::string key{ word };
// all anagrams sort to the same order of characters.
std::sort(key.begin(), key.end());
return key;
}
void add_anagram(const std::string& word)
{
// find the vector of anagrams for this word
auto& anagrams = find_anagrams(word);
// then add word to it (I use a set so all words will be unique even
// if input file contains duplicates)
anagrams.insert(word);
}
std::unordered_map<std::string, std::set<std::string>> m_anagrams;
};
int main()
{
dictionary_t dictionary;
dictionary.load("words.txt");
dictionary.show_anagrams("beast");
dictionary.show_anagrams("tacos");
return 0;
}
我一直在尝试创建一个程序,该程序可以为文本文件中的每个单词(包含大约 ~370k 个由“\n”分隔的单词)找到所有字谜(在列表中)。
我已经在python中编写了代码。我花了大约一个小时才到达 运行。只是想知道是否有更有效的方法。
我的代码
from tqdm.auto import tqdm
ls = open("words.txt","r").readlines()
ls = [i[:-1] for i in ls]
ls = [[i,''.join(sorted(i))] for i in ls]
ln = set([len(i[1]) for i in tqdm(ls)])
df = {}
for l in tqdm(ln):
df[l] = [i for i in ls if len(i[0]) == l]
full = {}
for m in tqdm(ls):
if full.get(m[0]) == None:
temp = []
for i in df[len(m[0])]:
if i[1] == m[1] and i[0] != m[0]:
temp.append(i[0])
for i in temp:
full[i] = temp
如果有更有效的方法用其他语言(Rust、C、C++、Java ...)编写此代码,如果您也可以 post :)
使用按字符字母顺序排序的单词作为搜索关键字是前进的方向。也许您已经在您的代码中使用这一行来执行此操作(我几乎从不使用 python):
[[i,''.join(sorted(i))] for i in ls]
无论如何,这是我的 C++ 解决您的问题。 现场演示:https://onlinegdb.com/_gauHBd_3
#include <algorithm> // for sorting
#include <string>
#include <unordered_map> // for storing words/anagrams
#include <iostream>
#include <fstream>
#include <set>
// create a class that will hold all words
class dictionary_t
{
public:
// load a text file with one word per line
void load(const std::string& filename)
{
std::ifstream file{ filename };
std::string word;
while (file >> word)
{
add_anagram(word);
}
}
auto& find_anagrams(const std::string& word)
{
const auto key = get_key(word);
// intentionally allow an empty entry to be made if word has no anagrams yet
// for readability easier error handling (not for space/time efficiency)
auto& anagrams = m_anagrams[key];
return anagrams;
}
// show all anagrams for a word
void show_anagrams(const std::string& word)
{
std::cout << "anagrams for word '" << word << "' are : ";
auto anagrams = find_anagrams(word);
for (const auto& anagram : anagrams)
{
if (anagram != word)
{
std::cout << anagram << " ";
}
}
std::cout << "\n";
}
private:
// this function is key to the whole idea
// two words are anagrams if they sort their letters
// to the same order. e.g. beast and betas both sort (alphabetically) to abest
std::string get_key(const std::string& word)
{
std::string key{ word };
// all anagrams sort to the same order of characters.
std::sort(key.begin(), key.end());
return key;
}
void add_anagram(const std::string& word)
{
// find the vector of anagrams for this word
auto& anagrams = find_anagrams(word);
// then add word to it (I use a set so all words will be unique even
// if input file contains duplicates)
anagrams.insert(word);
}
std::unordered_map<std::string, std::set<std::string>> m_anagrams;
};
int main()
{
dictionary_t dictionary;
dictionary.load("words.txt");
dictionary.show_anagrams("beast");
dictionary.show_anagrams("tacos");
return 0;
}