从 2-3-4 树中搜索带有电话号码的词
Search for words with telephone numbers from 2-3-4 tree
我有一个单词字典,放在 2-3-4 树 中。单词有不同的长度。使用 telephone 键盘,我需要找到 所有可能的词 可以响应特定的 phone 号码。给定键盘:
例如,数字 26678837 可能是单词“COMPUTER”,但也可能是另一个单词。考虑到我所有的单词都在 2-3-4 树中,为了从给定的 phone 数字中找到所有可能的单词,最好的算法或搜索方法是什么?
怎么样
在 youtube 上我看了一个非常有趣的 google 采访,其中编码员使用了这个算法。
这里的关键是获取所有不同的字符组合,这些字符组合可以从与该数字从左到右的序列中提供的数字相关的字符派生。显然,提供的数字越大,组合就越多。例如,您在 post 中提供的 26678837 的数量可以生成 11,664 个独特的字符组合(字符串单词)。
2 6 6 7 8 8 3 7
abc mno mno pqrs tuv tuv def pqrs
这 11,664 个生成的字符串单词中的每一个都需要通过各种字典传递,字典本身 可能 包含比方说 170,000(或更多)个单词并检查 (不区分大小写)平等。这不会是一个过快的过程,因为这相当于近 20 亿次平等检查。在我 10 岁的桌面 Windows 盒子上,这大约需要 1 分 20 秒。
在任何情况下,下面提供的方法都采用提供的字符串编号,生成不同的字符组合(字符词)并将每个组合与提供的词典(单词)文本文件中的单词进行比较。阅读方法附带的文档:
/**
* Converts the supplied string telephone number (or any string of integer
* numbers) to possible words acquired from within the supplied dictionary
* text file.<br><br>
*
* Specific digits relate to specific alpha characters as they do in a telephone
* keypad, for example:<pre>
*
* 1 2 3 4 5 6 7 8 9 0
* ABC DEF GHI JKL MNO PQRS TUV WXYZ
* </pre>
*
* Digits 1 and 0 do not relate to any alpha characters. By looking at the
* above scale you can see how a number like "26678837" can generate 11,664
* unique character combinations (string words). Each one of these combinations
* needs to be check against the supplied dictionary (or word) text file to see
* if matches a specific dictionary word. If it does then we store that word
* and carry on to the next character combination. If the dictionary (word)
* text file contains 170,000 words and there are 11,664 different character
* combinations then a simple calculation (170,000 x 11,664) determines that
* there will need to be 1,982,880,000 (almost 2 billion) inspections done for
* equality. Because of this, many (sometimes thousands) of Executor Service
* threads are used to speed up the processing task. At the end of the day
* however, task speed completely relies on the speed of the Hard Disk Drive
* (HDD) and too many threads can actually degrade performance. It's a tough
* balance. Modify things as you see fit.
*
*
*
* @param dictionaryFilePath (String) The path and file name of the dictionary
* file to gather possible words from. This dictionary (or word) file must
* contain a single word on each line of the file. There can not be more than
* one word on any given file line. Blank lines within the file are ignored.
* Letter case is always ignored during processing however any words found and
* returned will be in Upper Letter Case.<br><br>
*
* If you don't currently have a dictionary or word text file available then
* you can easily acquire one from the Internet.<br>
*
* @param telephoneNumber (String) The telephone number (or any number) to
* convert to a dictionary word. Any non-digit characters within the supplied
* string are automatically removed so that only digits remain. The longer
* the number string, the longer it takes to process this number to a word.
* There is no guarantee that the number you supply will produce a word from
* within the supplied dictionary (word) file.<br>
*
* @param showProcess (boolean) If boolean 'true' is supplied then the processing
* is displayed within the console window. This however slows things down
* considerably but does give you an idea what is going on. Because larger
* string numbers can take some time to complete processing, a modal 'Please
* Wait' dialog with a progress bar is displayed in the middle of your screen.
* When processing has completed this dialog window is automatically closed.<br>
*
* @return (String[] Array) A String[] Array of the word or words from the
* supplied dictionary (word) file which the supplied number correlates to.
*/
public static String[] telephoneNumberToDictionaryWords(final String dictionaryFilePath,
final String telephoneNumber, final boolean showProcess) {
long timeStart = System.currentTimeMillis();
if (dictionaryFilePath == null || dictionaryFilePath.isEmpty()) {
System.err.println("The argument for dictionaryFilePath can not be null or empty!");
return null;
}
else if (telephoneNumber == null || telephoneNumber.trim().isEmpty()) {
System.err.println("The argument for telephoneNumber can not be null or empty!");
return null;
}
else if (!new File(dictionaryFilePath).exists()) {
System.err.println("File Not Found! ("
+ new File(dictionaryFilePath).getAbsolutePath() + ")");
return null;
}
String digits = String.valueOf(telephoneNumber).replaceAll("[^\d]", "");
if (digits.isEmpty()) {
return null;
}
List<String> foundList = new ArrayList<>();
// ================== The 'Please Wait' Dialog Window ==================
// Depending on the number supplied, this process can take a while to
// complete therefore put a popup 'Please Wait...' window into play
// here.
final javax.swing.JFrame iframe = new javax.swing.JFrame();
iframe.setAlwaysOnTop(true);
iframe.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
iframe.setLocationRelativeTo(null);
final JDialog workingDialog = new JDialog(iframe);
workingDialog.setCursor(java.awt.Cursor.getPredefinedCursor(java.awt.Cursor.WAIT_CURSOR));
final JPanel p1 = new JPanel(new BorderLayout());
javax.swing.border.Border lineBorder = javax.swing.BorderFactory.createLineBorder(Color.red, 4);
p1.setBorder(lineBorder);
p1.setBackground(Color.BLACK);
p1.setPreferredSize(new java.awt.Dimension(255, 150));
final JLabel label = new JLabel();
label.setHorizontalAlignment(JLabel.CENTER);
label.setVerticalAlignment(JLabel.CENTER);
label.setText("<html><font size='5',color=#14e5eb><b><center>Please Wait..."
+ "</center></b></font><br><center>This can take a little "
+ "while.</center></html>");
label.setForeground(Color.WHITE);
p1.add(label, BorderLayout.CENTER);
final JLabel label2 = new JLabel();
label2.setHorizontalAlignment(JLabel.CENTER);
label2.setVerticalAlignment(JLabel.CENTER);
label2.setText("Hello World");
label2.setForeground(Color.WHITE);
p1.add(label2, BorderLayout.BEFORE_FIRST_LINE);
final javax.swing.JProgressBar progressBar = new javax.swing.JProgressBar();
progressBar.setPreferredSize( new java.awt.Dimension (190, 25));
progressBar.setMaximumSize(new java.awt.Dimension(190,25));
progressBar.setStringPainted(true);
p1.add(progressBar, BorderLayout.SOUTH);
workingDialog.setUndecorated(true);
workingDialog.getContentPane().add(p1);
workingDialog.pack();
workingDialog.setLocationRelativeTo(iframe);
workingDialog.setDefaultCloseOperation(JDialog.DISPOSE_ON_CLOSE);
workingDialog.setModal(true);
// =====================================================================
SwingWorker<String, Void> worker = new SwingWorker<String, Void>() {
@Override
protected String doInBackground() throws InterruptedException {
// Get the number of words contained within the supplied dictionary file.
long numberOfDictionaryWords = 0L;
// This will auto-close the stream.
try (java.util.stream.Stream<String> linesStream
= java.nio.file.Files.lines(java.nio.file.Paths.get(dictionaryFilePath))) {
numberOfDictionaryWords = linesStream.count();
}
catch (IOException ex) {
System.err.println(ex);
}
if (showProcess) {
System.out.println("The number of words in dictionary: --> "
+ numberOfDictionaryWords);
}
// Map the alpha characters related to telephone digits
HashMap<Character, String> map = new HashMap<>();
map.put('0', "");
map.put('1', "");
map.put('2', "abc");
map.put('3', "def");
map.put('4', "ghi");
map.put('5', "jkl");
map.put('6', "mno");
map.put('7', "pqrs");
map.put('8', "tuv");
map.put('9', "wxyz");
// Get all the character combinations available for the
// telelphone number (or just number) digits provided.
List<String> phoneCharactersList = new ArrayList<>();
phoneCharactersList.add("");
for (int i = 0; i < digits.length(); i++) {
if (digits.charAt(i) == '0' || digits.charAt(i) == '1') {
continue;
}
ArrayList<String> tempList = new ArrayList<>();
String option = map.get(digits.charAt(i));
for (int j = 0; j < phoneCharactersList.size(); j++) {
for (int p = 0; p < option.length(); p++) {
tempList.add(new StringBuilder(phoneCharactersList.get(j))
.append(option.charAt(p)).toString());
}
}
phoneCharactersList.clear();
phoneCharactersList.addAll(tempList);
}
if (showProcess) {
System.out.println("The number of generated words to process: --> "
+ phoneCharactersList.size());
}
progressBar.setMinimum(0);
progressBar.setMaximum(phoneCharactersList.size());
/* Iterate through all the character combinations now contained
within the 'phoneCharactersList' and compare them to the words
contained within the dictionary file. Store found words into
the foundList List Interface object.
Declare concurrent executor service threads (one for each possible
word to check for in dictionary file). */
java.util.concurrent.ExecutorService executor =
java.util.concurrent.Executors.newFixedThreadPool(phoneCharactersList.size());
for (int i = 0; i < phoneCharactersList.size(); i++) {
String wordToFind = phoneCharactersList.get(i);
label2.setText("<html><center>Processing word: <font color=yellow>"
+ wordToFind + "</font></center></html>");
if (showProcess) {
System.out.println((i + 1) + ") Processing the possible word: " + wordToFind);
}
Runnable threadWorker = new Runnable() {
@Override
public void run() {
try (java.io.BufferedReader br = new java.io.BufferedReader(new java.io.FileReader(dictionaryFilePath))) {
String line;
while ((line = br.readLine()) != null) {
line = line.trim();
if (line.isEmpty() || (line.length() != wordToFind.length())) {
continue;
}
if (line.equalsIgnoreCase(wordToFind)) {
foundList.add(wordToFind.toUpperCase());
break;
}
}
}
catch (Exception ex) {
System.err.println(ex);
}
}
};
executor.execute(threadWorker);
progressBar.setValue(i+1);
}
// Shut down Executor Service threads (when they are done)
executor.shutdown();
// Hold rest of code processing until are executors are terminated.
while (!executor.isTerminated()) {
}
return null;
}
@Override
protected void done() {
workingDialog.dispose();
iframe.dispose();
workingDialog.setCursor(java.awt.Cursor.getPredefinedCursor(java.awt.Cursor.DEFAULT_CURSOR));
}
};
worker.execute();
workingDialog.setVisible(true);
try {
worker.get();
}
catch (Exception e1) {
e1.printStackTrace();
}
java.util.Collections.sort(foundList);
long timeEnd = System.currentTimeMillis();
if (showProcess) {
long seconds = (timeEnd - timeStart) / 1000;
String timeSpan = new StringBuffer("").append(String.valueOf(seconds))
.append(" seconds").toString();
if (seconds > 60) {
timeSpan = new StringBuffer("").append(String.valueOf(seconds/60))
.append(" minutes and ").append(String.valueOf(seconds%60))
.append(" seconds").toString();
}
System.out.println(new StringBuffer("It took ").append(timeSpan)
.append(" to process the number: ").append(telephoneNumber)
.toString());
}
return foundList.toArray(new String[foundList.size()]);
}
根据您的需要修改代码,以便使用您的 2-3-4 树、B 树或任何您喜欢的东西。要使用此方法,您可以这样做:
String phoneNumber = "26678837";
String[] words = telephoneNumberToDictionaryWords("dictionary.txt", phoneNumber, false);
if (words == null || words.length == 0) {
System.out.println("There are no dictionary words available for "
+ "the Phone Number: " + phoneNumber);
}
else {
for (String str : words) {
System.out.println(str);
}
}
我有一个单词字典,放在 2-3-4 树 中。单词有不同的长度。使用 telephone 键盘,我需要找到 所有可能的词 可以响应特定的 phone 号码。给定键盘:
例如,数字 26678837 可能是单词“COMPUTER”,但也可能是另一个单词。考虑到我所有的单词都在 2-3-4 树中,为了从给定的 phone 数字中找到所有可能的单词,最好的算法或搜索方法是什么?
在 youtube 上我看了一个非常有趣的 google 采访,其中编码员使用了这个算法。
这里的关键是获取所有不同的字符组合,这些字符组合可以从与该数字从左到右的序列中提供的数字相关的字符派生。显然,提供的数字越大,组合就越多。例如,您在 post 中提供的 26678837 的数量可以生成 11,664 个独特的字符组合(字符串单词)。
2 6 6 7 8 8 3 7
abc mno mno pqrs tuv tuv def pqrs
这 11,664 个生成的字符串单词中的每一个都需要通过各种字典传递,字典本身 可能 包含比方说 170,000(或更多)个单词并检查 (不区分大小写)平等。这不会是一个过快的过程,因为这相当于近 20 亿次平等检查。在我 10 岁的桌面 Windows 盒子上,这大约需要 1 分 20 秒。
在任何情况下,下面提供的方法都采用提供的字符串编号,生成不同的字符组合(字符词)并将每个组合与提供的词典(单词)文本文件中的单词进行比较。阅读方法附带的文档:
/**
* Converts the supplied string telephone number (or any string of integer
* numbers) to possible words acquired from within the supplied dictionary
* text file.<br><br>
*
* Specific digits relate to specific alpha characters as they do in a telephone
* keypad, for example:<pre>
*
* 1 2 3 4 5 6 7 8 9 0
* ABC DEF GHI JKL MNO PQRS TUV WXYZ
* </pre>
*
* Digits 1 and 0 do not relate to any alpha characters. By looking at the
* above scale you can see how a number like "26678837" can generate 11,664
* unique character combinations (string words). Each one of these combinations
* needs to be check against the supplied dictionary (or word) text file to see
* if matches a specific dictionary word. If it does then we store that word
* and carry on to the next character combination. If the dictionary (word)
* text file contains 170,000 words and there are 11,664 different character
* combinations then a simple calculation (170,000 x 11,664) determines that
* there will need to be 1,982,880,000 (almost 2 billion) inspections done for
* equality. Because of this, many (sometimes thousands) of Executor Service
* threads are used to speed up the processing task. At the end of the day
* however, task speed completely relies on the speed of the Hard Disk Drive
* (HDD) and too many threads can actually degrade performance. It's a tough
* balance. Modify things as you see fit.
*
*
*
* @param dictionaryFilePath (String) The path and file name of the dictionary
* file to gather possible words from. This dictionary (or word) file must
* contain a single word on each line of the file. There can not be more than
* one word on any given file line. Blank lines within the file are ignored.
* Letter case is always ignored during processing however any words found and
* returned will be in Upper Letter Case.<br><br>
*
* If you don't currently have a dictionary or word text file available then
* you can easily acquire one from the Internet.<br>
*
* @param telephoneNumber (String) The telephone number (or any number) to
* convert to a dictionary word. Any non-digit characters within the supplied
* string are automatically removed so that only digits remain. The longer
* the number string, the longer it takes to process this number to a word.
* There is no guarantee that the number you supply will produce a word from
* within the supplied dictionary (word) file.<br>
*
* @param showProcess (boolean) If boolean 'true' is supplied then the processing
* is displayed within the console window. This however slows things down
* considerably but does give you an idea what is going on. Because larger
* string numbers can take some time to complete processing, a modal 'Please
* Wait' dialog with a progress bar is displayed in the middle of your screen.
* When processing has completed this dialog window is automatically closed.<br>
*
* @return (String[] Array) A String[] Array of the word or words from the
* supplied dictionary (word) file which the supplied number correlates to.
*/
public static String[] telephoneNumberToDictionaryWords(final String dictionaryFilePath,
final String telephoneNumber, final boolean showProcess) {
long timeStart = System.currentTimeMillis();
if (dictionaryFilePath == null || dictionaryFilePath.isEmpty()) {
System.err.println("The argument for dictionaryFilePath can not be null or empty!");
return null;
}
else if (telephoneNumber == null || telephoneNumber.trim().isEmpty()) {
System.err.println("The argument for telephoneNumber can not be null or empty!");
return null;
}
else if (!new File(dictionaryFilePath).exists()) {
System.err.println("File Not Found! ("
+ new File(dictionaryFilePath).getAbsolutePath() + ")");
return null;
}
String digits = String.valueOf(telephoneNumber).replaceAll("[^\d]", "");
if (digits.isEmpty()) {
return null;
}
List<String> foundList = new ArrayList<>();
// ================== The 'Please Wait' Dialog Window ==================
// Depending on the number supplied, this process can take a while to
// complete therefore put a popup 'Please Wait...' window into play
// here.
final javax.swing.JFrame iframe = new javax.swing.JFrame();
iframe.setAlwaysOnTop(true);
iframe.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
iframe.setLocationRelativeTo(null);
final JDialog workingDialog = new JDialog(iframe);
workingDialog.setCursor(java.awt.Cursor.getPredefinedCursor(java.awt.Cursor.WAIT_CURSOR));
final JPanel p1 = new JPanel(new BorderLayout());
javax.swing.border.Border lineBorder = javax.swing.BorderFactory.createLineBorder(Color.red, 4);
p1.setBorder(lineBorder);
p1.setBackground(Color.BLACK);
p1.setPreferredSize(new java.awt.Dimension(255, 150));
final JLabel label = new JLabel();
label.setHorizontalAlignment(JLabel.CENTER);
label.setVerticalAlignment(JLabel.CENTER);
label.setText("<html><font size='5',color=#14e5eb><b><center>Please Wait..."
+ "</center></b></font><br><center>This can take a little "
+ "while.</center></html>");
label.setForeground(Color.WHITE);
p1.add(label, BorderLayout.CENTER);
final JLabel label2 = new JLabel();
label2.setHorizontalAlignment(JLabel.CENTER);
label2.setVerticalAlignment(JLabel.CENTER);
label2.setText("Hello World");
label2.setForeground(Color.WHITE);
p1.add(label2, BorderLayout.BEFORE_FIRST_LINE);
final javax.swing.JProgressBar progressBar = new javax.swing.JProgressBar();
progressBar.setPreferredSize( new java.awt.Dimension (190, 25));
progressBar.setMaximumSize(new java.awt.Dimension(190,25));
progressBar.setStringPainted(true);
p1.add(progressBar, BorderLayout.SOUTH);
workingDialog.setUndecorated(true);
workingDialog.getContentPane().add(p1);
workingDialog.pack();
workingDialog.setLocationRelativeTo(iframe);
workingDialog.setDefaultCloseOperation(JDialog.DISPOSE_ON_CLOSE);
workingDialog.setModal(true);
// =====================================================================
SwingWorker<String, Void> worker = new SwingWorker<String, Void>() {
@Override
protected String doInBackground() throws InterruptedException {
// Get the number of words contained within the supplied dictionary file.
long numberOfDictionaryWords = 0L;
// This will auto-close the stream.
try (java.util.stream.Stream<String> linesStream
= java.nio.file.Files.lines(java.nio.file.Paths.get(dictionaryFilePath))) {
numberOfDictionaryWords = linesStream.count();
}
catch (IOException ex) {
System.err.println(ex);
}
if (showProcess) {
System.out.println("The number of words in dictionary: --> "
+ numberOfDictionaryWords);
}
// Map the alpha characters related to telephone digits
HashMap<Character, String> map = new HashMap<>();
map.put('0', "");
map.put('1', "");
map.put('2', "abc");
map.put('3', "def");
map.put('4', "ghi");
map.put('5', "jkl");
map.put('6', "mno");
map.put('7', "pqrs");
map.put('8', "tuv");
map.put('9', "wxyz");
// Get all the character combinations available for the
// telelphone number (or just number) digits provided.
List<String> phoneCharactersList = new ArrayList<>();
phoneCharactersList.add("");
for (int i = 0; i < digits.length(); i++) {
if (digits.charAt(i) == '0' || digits.charAt(i) == '1') {
continue;
}
ArrayList<String> tempList = new ArrayList<>();
String option = map.get(digits.charAt(i));
for (int j = 0; j < phoneCharactersList.size(); j++) {
for (int p = 0; p < option.length(); p++) {
tempList.add(new StringBuilder(phoneCharactersList.get(j))
.append(option.charAt(p)).toString());
}
}
phoneCharactersList.clear();
phoneCharactersList.addAll(tempList);
}
if (showProcess) {
System.out.println("The number of generated words to process: --> "
+ phoneCharactersList.size());
}
progressBar.setMinimum(0);
progressBar.setMaximum(phoneCharactersList.size());
/* Iterate through all the character combinations now contained
within the 'phoneCharactersList' and compare them to the words
contained within the dictionary file. Store found words into
the foundList List Interface object.
Declare concurrent executor service threads (one for each possible
word to check for in dictionary file). */
java.util.concurrent.ExecutorService executor =
java.util.concurrent.Executors.newFixedThreadPool(phoneCharactersList.size());
for (int i = 0; i < phoneCharactersList.size(); i++) {
String wordToFind = phoneCharactersList.get(i);
label2.setText("<html><center>Processing word: <font color=yellow>"
+ wordToFind + "</font></center></html>");
if (showProcess) {
System.out.println((i + 1) + ") Processing the possible word: " + wordToFind);
}
Runnable threadWorker = new Runnable() {
@Override
public void run() {
try (java.io.BufferedReader br = new java.io.BufferedReader(new java.io.FileReader(dictionaryFilePath))) {
String line;
while ((line = br.readLine()) != null) {
line = line.trim();
if (line.isEmpty() || (line.length() != wordToFind.length())) {
continue;
}
if (line.equalsIgnoreCase(wordToFind)) {
foundList.add(wordToFind.toUpperCase());
break;
}
}
}
catch (Exception ex) {
System.err.println(ex);
}
}
};
executor.execute(threadWorker);
progressBar.setValue(i+1);
}
// Shut down Executor Service threads (when they are done)
executor.shutdown();
// Hold rest of code processing until are executors are terminated.
while (!executor.isTerminated()) {
}
return null;
}
@Override
protected void done() {
workingDialog.dispose();
iframe.dispose();
workingDialog.setCursor(java.awt.Cursor.getPredefinedCursor(java.awt.Cursor.DEFAULT_CURSOR));
}
};
worker.execute();
workingDialog.setVisible(true);
try {
worker.get();
}
catch (Exception e1) {
e1.printStackTrace();
}
java.util.Collections.sort(foundList);
long timeEnd = System.currentTimeMillis();
if (showProcess) {
long seconds = (timeEnd - timeStart) / 1000;
String timeSpan = new StringBuffer("").append(String.valueOf(seconds))
.append(" seconds").toString();
if (seconds > 60) {
timeSpan = new StringBuffer("").append(String.valueOf(seconds/60))
.append(" minutes and ").append(String.valueOf(seconds%60))
.append(" seconds").toString();
}
System.out.println(new StringBuffer("It took ").append(timeSpan)
.append(" to process the number: ").append(telephoneNumber)
.toString());
}
return foundList.toArray(new String[foundList.size()]);
}
根据您的需要修改代码,以便使用您的 2-3-4 树、B 树或任何您喜欢的东西。要使用此方法,您可以这样做:
String phoneNumber = "26678837";
String[] words = telephoneNumberToDictionaryWords("dictionary.txt", phoneNumber, false);
if (words == null || words.length == 0) {
System.out.println("There are no dictionary words available for "
+ "the Phone Number: " + phoneNumber);
}
else {
for (String str : words) {
System.out.println(str);
}
}