Android custom keyboard suggestions

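I'm building a custom keyboard for Android that supports, at a minimum, autocomplete suggestions. To do that, I store every word the user types (outside password fields) in a Room database table with a simple model: the word and its frequency. To show suggestions, I use a Trie populated with words from that table. My query orders the table by frequency and limits the result to 5K words (I don't want to overpopulate the Trie; these 5K words can be considered the user's favourites, the ones they use often and need suggestions for). My actual problem is the ORDER BY clause: this is a rapidly growing data set, and sorting, say, 0.1M words to get 5K seems like overkill. How can I rework this approach to make the whole suggestion logic more efficient?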

If you haven't already done so, index the frequency column with @ColumnInfo(index = true).

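Another possibility is to add a table that holds just the top 5k, backed by another table (a support table) that has a single row with columns for: the highest frequency (not really needed), the lowest frequency currently in the 5k, and a third column for the number of rows currently held. After adding or updating a word, you can then determine whether the new/updated word should be added to the 5k table (perhaps with a fourth column holding the primary key of the lowest-frequency word, to facilitate efficient deletion).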

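So

  1. If the number currently held is less than 5k, insert into or update the 5k table and increment the number currently held in the support table.
  2. Otherwise, if the word's frequency is lower than the lowest frequency held, skip it.
  3. If the word already exists in the 5k table, update it.
  4. Otherwise, delete the lowest, insert the replacement, and then update the lowest in the support table accordingly.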
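
The steps above can be sketched in plain Java, with in-memory collections standing in for the two tables. This is only a minimal illustration under assumptions: the class TopWordsSketch and its methods are hypothetical, the frequency is taken to be a simple use count, and a real implementation would run the equivalent logic against Room inside a transaction.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

public class TopWordsSketch {
    /** A (word, frequency) pair ordered by frequency, then word, so first() is the lowest. */
    static final class Entry implements Comparable<Entry> {
        final String word;
        final long frequency;
        Entry(String word, long frequency) { this.word = word; this.frequency = frequency; }
        @Override public int compareTo(Entry o) {
            int c = Long.compare(frequency, o.frequency);
            return c != 0 ? c : word.compareTo(o.word);
        }
    }

    private final int capacity;
    private final Map<String, Long> allWords = new HashMap<>(); // stands in for the core table
    private final Map<String, Long> inSubset = new HashMap<>(); // word -> frequency held in the subset
    private final TreeSet<Entry> subset = new TreeSet<>();      // stands in for the 5k table

    TopWordsSketch(int capacity) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be at least 1");
        this.capacity = capacity;
    }

    /** Records one use of a word; returns true if the word is in the subset afterwards. */
    boolean recordUse(String word) {
        long frequency = allWords.merge(word, 1L, Long::sum);
        Long previous = inSubset.get(word);
        if (previous != null) {
            // Already held: drop the stale entry so it can be re-added with the new frequency.
            subset.remove(new Entry(word, previous));
        } else if (subset.size() >= capacity) {
            // Full: skip words that do not beat the current lowest frequency.
            if (frequency <= subset.first().frequency) return false;
            // Otherwise evict the lowest to make room.
            Entry lowest = subset.pollFirst();
            inSubset.remove(lowest.word);
        }
        subset.add(new Entry(word, frequency));
        inSubset.put(word, frequency);
        return true;
    }

    long lowestFrequency() { return subset.isEmpty() ? 0 : subset.first().frequency; }
    int size() { return subset.size(); }
    boolean contains(String word) { return inSubset.containsKey(word); }

    public static void main(String[] args) {
        TopWordsSketch top = new TopWordsSketch(2);
        for (String w : new String[] {"the", "the", "cat", "cat", "sat", "sat", "sat"}) {
            top.recordUse(w);
        }
        System.out.println(top.contains("sat") + " " + top.size() + " " + top.lowestFrequency());
    }
}
```

No step sorts the full word list; each call only touches the word being recorded and, at most, the current lowest entry of the subset.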

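Note that the 5k table would probably only need to store the rowid as a pointer/reference/map to the core table.

  • rowid is a column that virtually all tables in Room will have (virtual tables are an exception, as are tables with the WITHOUT ROWID attribute, but Room does not, as far as I am aware, facilitate WITHOUT ROWID tables).

  • Access via the rowid can be up to twice as fast as via other indexes. I would suggest using @PrimaryKey Long id=null; (Java) or @PrimaryKey var id: Long?=null (Kotlin) rather than @PrimaryKey(autoGenerate = true).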

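    • autoGenerate = true equates to SQLite's AUTOINCREMENT, about which the SQLite documentation says "The AUTOINCREMENT keyword imposes extra CPU, memory, disk space, and disk I/O overhead and should be avoided if not strictly needed. It is usually not needed."
    • See https://www.sqlite.org/rowidtable.html, and also https://sqlite.org/autoinc.html
    • Curiously/funnily, the support table mentioned above is not far off what coding AUTOINCREMENT does.
    • A table (sqlite_sequence), with one row per table that has AUTOINCREMENT, is used to store the table name and the highest allocated rowid.
      • Without AUTOINCREMENT, but with <column_name> INTEGER PRIMARY KEY and no value or null supplied for the primary key column, SQLite generates a value that is 1 greater than max(rowid).
      • With AUTOINCREMENT/autoGenerate = true, the generated value is 1 greater than the larger of max(rowid) and the value stored for the table in the sqlite_sequence table (hence the overhead).
      • Of course, these overheads will very likely be insignificant compared with sorting 100,000 rows.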
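
As a rough model of the two allocation rules just described, the following plain Java sketch mimics how the next rowid is chosen with and without AUTOINCREMENT. It is not SQLite code: RowidSketch is a hypothetical class, the sequence field merely stands in for the table's row in sqlite_sequence, and edge cases such as explicitly supplied rowids or reaching the largest possible rowid are ignored.

```java
import java.util.TreeSet;

public class RowidSketch {
    private final TreeSet<Long> rowids = new TreeSet<>(); // rowids currently in the table
    private final boolean autoincrement;
    private long sequence = 0; // assumption: models the value kept in sqlite_sequence

    RowidSketch(boolean autoincrement) { this.autoincrement = autoincrement; }

    /** Inserts a row without supplying a rowid and returns the value that was generated. */
    long insert() {
        long maxRowid = rowids.isEmpty() ? 0 : rowids.last();
        // AUTOINCREMENT also consults the stored sequence value, which is the extra work.
        long base = autoincrement ? Math.max(maxRowid, sequence) : maxRowid;
        long id = base + 1;
        rowids.add(id);
        if (autoincrement) {
            sequence = id; // the extra write that plain INTEGER PRIMARY KEY avoids
        }
        return id;
    }

    void delete(long id) { rowids.remove(id); }

    public static void main(String[] args) {
        for (boolean auto : new boolean[] {false, true}) {
            RowidSketch table = new RowidSketch(auto);
            table.insert();
            table.insert();
            long last = table.insert();
            table.delete(last); // remove the row with the highest rowid
            System.out.println("autoincrement=" + auto + " next rowid=" + table.insert());
        }
    }
}
```

With the plain INTEGER PRIMARY KEY rule, deleting the row with the highest rowid lets that value be handed out again, whereas the AUTOINCREMENT rule moves on to the next value; that guarantee is what the extra bookkeeping buys.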

Demo

The following is a demo, albeit one that just uses a basic Word table as the source.

First, the tables (@Entity annotated classes), starting with Word :-

@Entity (
        indices = {@Index(value = {"word"},unique = true)}
)
class Word {
    @PrimaryKey Long wordId=null;
    @NonNull
    String word;
    @ColumnInfo(index = true)
    long frequency;

    Word(){}

    @Ignore
    Word(String word, long frequency) {
        this.word = word;
        this.frequency = frequency;
    }
}

WordSubset, i.e. the table of the 5000 most frequently used words; it simply holds a reference/map/link to the underlying/actual word :-

@Entity(
        foreignKeys = {
                @ForeignKey(
                        entity = Word.class,
                        parentColumns = {"wordId"},
                        childColumns = {"wordIdMap"},
                        onDelete = ForeignKey.CASCADE,
                        onUpdate = ForeignKey.CASCADE
                )
        }
)
class WordSubset {
    public static final long SUBSET_MAX_SIZE = 5000;
    @PrimaryKey
    long wordIdMap;

    WordSubset(){};

    @Ignore
    WordSubset(long wordIdMap) {
        this.wordIdMap = wordIdMap;
    }
}
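  • Note the constant SUBSET_MAX_SIZE, hard coded just the once, so a single simple change is all that is needed to adjust it (lowering it after rows have been added may cause issues).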

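WordSubsetSupport, which will be a single-row table containing the highest and lowest frequencies (the highest isn't actually needed), the number of rows in the WordSubset table and a reference/map to the word with the lowest frequency :-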

@Entity(
        foreignKeys = {
                @ForeignKey(
                        entity = Word.class,
                        parentColumns = {"wordId"},
                        childColumns = {"lowestWordIdMap"}
                )
        }
)
class WordSubsetSupport {
    @PrimaryKey
    Long wordSubsetSupportId=null;
    long highestFrequency;
    long lowestFrequency;
    long countOfRowsInSubsetTable;
    @ColumnInfo(index = true)
    long lowestWordIdMap;

    WordSubsetSupport(){}
    @Ignore
    WordSubsetSupport(long highestFrequency, long lowestFrequency, long countOfRowsInSubsetTable, long lowestWordIdMap) {
        this.highestFrequency = highestFrequency;
        this.lowestFrequency = lowestFrequency;
        this.countOfRowsInSubsetTable = countOfRowsInSubsetTable;
        this.lowestWordIdMap = lowestWordIdMap;
        this.wordSubsetSupportId = 1L;
    }
}

For access, an abstract class (rather than an interface, because in Java an abstract class allows methods/functions with a body; Kotlin interfaces allow these), CombinedDao :-

@Dao
abstract class CombinedDao {
    @Insert(onConflict = OnConflictStrategy.IGNORE)
    abstract long insert(Word word);
    @Insert(onConflict = OnConflictStrategy.IGNORE)
    abstract long insert(WordSubset wordSubset);
    @Insert(onConflict = OnConflictStrategy.IGNORE)
    abstract long insert(WordSubsetSupport wordSubsetSupport);

    @Query("SELECT * FROM wordsubsetsupport LIMIT 1")
    abstract WordSubsetSupport getWordSubsetSupport();
    @Query("SELECT count() FROM wordsubsetsupport")
    abstract long getWordSubsetSupportCount();
    @Query("SELECT countOfRowsInSubsetTable FROM wordsubsetsupport")
    abstract long getCountOfRowsInSubsetTable();
    @Query("UPDATE wordsubsetsupport SET countOfRowsInSubsetTable=:updatedCount")
    abstract void updateCountOfRowsInSubsetTable(long updatedCount);
    @Query("UPDATE wordsubsetsupport " +
            "SET countOfRowsInSubsetTable = (SELECT count(*) FROM wordsubset), " +
            "lowestWordIdMap = (SELECT word.wordId FROM wordsubset JOIN word ON wordsubset.wordIdMap = word.wordId ORDER BY frequency ASC LIMIT 1)," +
            "lowestFrequency = (SELECT coalesce(min(frequency),0) FROM wordsubset JOIN word ON wordsubset.wordIdMap = word.wordId)," +
            "highestFrequency = (SELECT coalesce(max(frequency),0) FROM wordsubset JOIN word ON wordsubset.wordIdMap = word.wordId)")
    abstract void autoUpdateWordSupportTable();
    @Query("DELETE FROM wordsubset WHERE wordIdMap= (SELECT wordsubset.wordIdMap FROM wordsubset JOIN word ON wordsubset.wordIdMap = word.wordId ORDER BY frequency ASC LIMIT 1)")
    abstract void deleteLowestFrequency();

    @Transaction
    @Query("")
    int addWord(Word word) {
        /* Try to add the word, setting the wordId value according to the result.
            The result will be the wordId generated (1 or greater) or, if the word already exists, -1
        */
        word.wordId = insert(word);
        /* If the word was added and not rejected as a duplicate, then it may need to be added to the WordSubset table */
        if (word.wordId > 0) {
            /* Are there any rows in the support table? if not then add the very first entry/row */
            if (getWordSubsetSupportCount() < 1) {
                /* Need to add the word to the subset */
                insert(new WordSubset(word.wordId));
                /* Can now add the first (and only) row to the support table */
                insert(new WordSubsetSupport(word.frequency,word.frequency,1,word.wordId));
                autoUpdateWordSupportTable();
                return 1;
            }
            /* If there are less than the maximum number of rows in the subset table then
                1) insert the new subset row, and
                2) update the support table accordingly
            */
            if (getCountOfRowsInSubsetTable() < WordSubset.SUBSET_MAX_SIZE) {
                insert(new WordSubset(word.wordId));
                autoUpdateWordSupportTable();
                return 2;
            }
            /*
                Last case is that the subset table is at the maximum number of rows and
                the frequency of the added word is greater than the lowest frequency in the
                subset, so
                1) the row with the lowest frequency is removed from the subset table and
                2) the added word is added to the subset
                3) the support table is updated accordingly
            */
            if (getCountOfRowsInSubsetTable() >= WordSubset.SUBSET_MAX_SIZE) {
                WordSubsetSupport currentWordSubsetSupport = getWordSubsetSupport();
                if (word.frequency > currentWordSubsetSupport.lowestFrequency) {
                    deleteLowestFrequency();
                    insert(new WordSubset(word.wordId));
                    autoUpdateWordSupportTable();
                    return 3;
                }
            }
            return 4; /* indicates word added but does not qualify for addition to the subset */
        }
        return -1;
    }
}
  • The addWord method/function is the only method that is used, as it automatically maintains the WordSubset and WordSubsetSupport tables.

TheDatabase is a pretty standard @Database annotated class, other than that, for convenience and brevity of the demo, it allows use of the main thread :-

@Database( entities = {Word.class,WordSubset.class,WordSubsetSupport.class}, version = TheDatabase.DATABASE_VERSION, exportSchema = false)
abstract class TheDatabase extends RoomDatabase {
    abstract CombinedDao getCombinedDao();

    private static volatile TheDatabase instance = null;
    public static TheDatabase getInstance(Context context) {
        if (instance == null) {
            instance = Room.databaseBuilder(context,TheDatabase.class,DATABASE_NAME)
                    .addCallback(cb)
                    .allowMainThreadQueries()
                    .build();
        }
        return instance;
    }

    private static Callback cb = new Callback() {
        @Override
        public void onCreate(@NonNull SupportSQLiteDatabase db) {
            super.onCreate(db);
        }

        @Override
        public void onOpen(@NonNull SupportSQLiteDatabase db) {
            super.onOpen(db);
        }
    };



    public static final String DATABASE_NAME = "the_database.db";
    public static final int DATABASE_VERSION = 1;
}

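Finally, activity code that randomly generates and adds 10,000 words (or thereabouts, as some may be duplicates), each with a frequency that is also randomly generated (between 0 and 9999) :-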

public class MainActivity extends AppCompatActivity {
    
    TheDatabase db;
    CombinedDao dao;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        
        db = TheDatabase.getInstance(this);
        dao = db.getCombinedDao();
        
        for (int i=0; i < 10000; i++) {
            Word currentWord = generateRandomWord();
            Log.d("ADDINGWORD","Adding word " + currentWord.word + " frequency is " + currentWord.frequency);
            /* Add the word that was just logged rather than generating a different one */
            dao.addWord(currentWord);
        }
    }
    public static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz";
    private Word generateRandomWord() {
        Random r = new Random();
        /* nextInt(bound) is never negative, unlike abs(nextInt()) which can be for Integer.MIN_VALUE */
        int wordLength = r.nextInt(24) + 1;
        int frequency = r.nextInt(10000);
        StringBuilder sb = new StringBuilder();
        for (int i=0; i < wordLength; i++) {
            int letter = r.nextInt(ALPHABET.length());
            sb.append(ALPHABET.charAt(letter));
        }
        return new Word(sb.toString(),frequency);
    }
}

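Obviously the results will differ per run, and the demo is only really designed to be run once (although it could be run more times).

After running it, using App Inspection:

The support table (in this case) is :-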

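  • So, as countOfRowsInSubsetTable is 5000, the subset table has been filled to its capacity/limit.
  • The highest frequency encountered is 9999 (as could be expected).
  • The lowest frequency in the subset is 4690, and that is for the word with a wordId of 7412.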

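The subset table on its own means little, as it just contains a map to the actual word. So it is more informative to use a query to view what it contains.

e.g.

The query shows that the word with a wordId of 7412 is the word with the lowest frequency, 4690 (as expected according to the support table).

Going to the last page shows :-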