Best way to design App Engine Datastore and text search modelling

Question

We have a Java application running on Google App Engine. It has a kind called Contact. Below is a sample schema:

class Contact {
  long id;
  String firstName;
  String lastName;
  // ...
}

The above is the existing model. To support some requirements, we store this object both in Datastore and in text search.

Now we want to integrate each contact with its page view data.

Each contact can have several thousand page view records, and some contacts have millions.

Below is a sample PageVisit object. [Note: we do not have this object yet; it is only here to show what information a page visit carries.]

class PageVisit {
  long id;
  String url;
  String refUrl;
  int country;
  String city;
  // ...
}

We need to query on both a contact's core properties and their page visit data.

For example:

select * from Contact where firstName = 'abc' and url = 'cccccc.com';
select * from Contact where firstName = 'abc' or url = 'cccccc.com';

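To write queries like these, both the contact's core properties and the pages they visited would have to be available on the Contact entity itself. But a contact can have a huge number of page views, so this would exceed the maximum entity size limit.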
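To put rough numbers on the size problem: Datastore caps a single entity at about 1 MiB. The sketch below is a back-of-the-envelope check only, assuming (hypothetically) that each embedded URL costs about 40 bytes and ignoring per-property and index overhead; `EntitySizeEstimate` and `maxVisitsPerEntity` are illustrative names.

```java
// Back-of-the-envelope check: how many visit URLs could fit inside one Contact entity?
// Assumptions (hypothetical): the entity limit is treated as ~1 MiB and each embedded
// URL costs ~40 bytes. Real entities also pay per-property and index overhead,
// so the true capacity is lower than this estimate.
public class EntitySizeEstimate {
    // Approximate Datastore entity size limit (about 1 MiB).
    static final long ENTITY_LIMIT_BYTES = 1_048_576L;

    // Upper bound on how many URLs of the given size could be embedded in one entity.
    static long maxVisitsPerEntity(long bytesPerUrl) {
        return ENTITY_LIMIT_BYTES / bytesPerUrl;
    }

    public static void main(String[] args) {
        long bytesPerUrl = 40;
        System.out.println("At most ~" + maxVisitsPerEntity(bytesPerUrl) + " visits fit in one entity");
        System.out.println("1,000,000 visits need ~" + (1_000_000L * bytesPerUrl) + " bytes");
    }
}
```

Under that assumption only about 26,000 URLs fit, while a contact with a million visits would need around 40 MB, roughly 38 times the limit.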

So how should the Contact model be designed for this situation, in both Datastore and text search?

Thanks

Answer 1

Cloud Datastore does not support joins, so you need to handle this in your client code somehow.

Two possible ways to handle it:

Denormalize the Contact properties you need to search on into PageVisit:

class PageVisit {
  long id;
  String firstName; // Denormalized from Contact
  String url;
  String refUrl;
  int country;
  String city;
  // ...
}

This requires you to create a composite index:

indexes:
- kind: PageVisit
  ancestor: no
  properties:
  - name: firstName
  - name: url
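A minimal, in-memory sketch of the denormalization approach, assuming Java 16+ records, with a plain list standing in for the PageVisit kind. The hypothetical `recordVisit` copies `firstName` from the contact when a visit is written, so the combined condition becomes a single query on PageVisit. It also assumes each visit keeps a `contactId` link back to its contact, which the schema above does not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// In-memory sketch of the denormalized model. A List stands in for the
// PageVisit kind; all record and method names are hypothetical.
public class DenormalizedVisits {
    record Contact(long id, String firstName, String lastName) {}

    // contactId is an assumed link back to the Contact;
    // firstName is copied from the Contact when the visit is written.
    record PageVisit(long id, long contactId, String firstName, String url) {}

    static final List<PageVisit> visits = new ArrayList<>();

    // Write path: copy the searchable Contact property onto the visit.
    static void recordVisit(long visitId, Contact contact, String url) {
        visits.add(new PageVisit(visitId, contact.id(), contact.firstName(), url));
    }

    // Read path, one query against PageVisit (in Datastore, the composite
    // index on firstName + url would serve it):
    // select * from PageVisit where firstName = ? and url = ?
    static List<Long> findContactIds(String firstName, String url) {
        return visits.stream()
                .filter(v -> v.firstName().equals(firstName) && v.url().equals(url))
                .map(PageVisit::contactId)
                .distinct()
                .collect(Collectors.toList());
    }

    // The price of denormalizing: renaming a contact rewrites every visit it owns.
    static void renameContact(long contactId, String newFirstName) {
        visits.replaceAll(v -> v.contactId() == contactId
                ? new PageVisit(v.id(), v.contactId(), newFirstName, v.url())
                : v);
    }

    public static void main(String[] args) {
        Contact abc = new Contact(1, "abc", "Smith");
        Contact xyz = new Contact(2, "xyz", "Jones");
        recordVisit(100, abc, "cccccc.com");
        recordVisit(101, abc, "other.com");
        recordVisit(102, xyz, "cccccc.com");
        System.out.println(findContactIds("abc", "cccccc.com"));
    }
}
```

The trade-off sits on the write side: as `renameContact` shows, changing a denormalized property means rewriting every PageVisit for that contact, which can be millions of entities.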

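Or run multiple queries (this assumes each PageVisit stores the ID of its contact in a contactId property):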

select id from Contact where firstName = 'abc'

select * from PageVisit where contactId={id} and url = 'cccccc.com';
select * from PageVisit where contactId={id} or url = 'cccccc.com';

This requires you to create a composite index:

indexes:
- kind: PageVisit
  ancestor: no
  properties:
  - name: contactId
  - name: url
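The multi-query approach amounts to a join done in client code. Below is an in-memory sketch (Java 16+, hypothetical names, lists standing in for Datastore queries): the AND case runs the per-contact PageVisit lookup for each ID returned by the first query, while the OR case is the union of contacts matching the name and contacts with any visit to the URL.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// In-memory sketch of a client-side join. Lists stand in for Datastore
// queries; each PageVisit is assumed to carry its contact's id.
public class ClientSideJoin {
    record Contact(long id, String firstName) {}
    record PageVisit(long id, long contactId, String url) {}

    // Query 1: select id from Contact where firstName = ?
    static Set<Long> contactIdsByName(List<Contact> contacts, String firstName) {
        return contacts.stream()
                .filter(c -> c.firstName().equals(firstName))
                .map(Contact::id)
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // firstName = ? AND url = ?: run query 2 once per contact found by query 1.
    static Set<Long> matchNameAndUrl(List<Contact> contacts, List<PageVisit> visits,
                                     String firstName, String url) {
        Set<Long> result = new LinkedHashSet<>();
        for (long id : contactIdsByName(contacts, firstName)) {
            // select * from PageVisit where contactId = id and url = ?
            boolean visited = visits.stream()
                    .anyMatch(v -> v.contactId() == id && v.url().equals(url));
            if (visited) {
                result.add(id);
            }
        }
        return result;
    }

    // firstName = ? OR url = ?: union of name matches and contacts that visited url.
    static Set<Long> matchNameOrUrl(List<Contact> contacts, List<PageVisit> visits,
                                    String firstName, String url) {
        Set<Long> result = contactIdsByName(contacts, firstName);
        visits.stream()
                .filter(v -> v.url().equals(url))
                .forEach(v -> result.add(v.contactId()));
        return result;
    }

    public static void main(String[] args) {
        List<Contact> contacts = List.of(
                new Contact(1, "abc"), new Contact(2, "xyz"), new Contact(3, "abc"));
        List<PageVisit> visits = List.of(
                new PageVisit(10, 1, "cccccc.com"), new PageVisit(11, 2, "cccccc.com"));
        System.out.println("AND: " + matchNameAndUrl(contacts, visits, "abc", "cccccc.com"));
        System.out.println("OR:  " + matchNameOrUrl(contacts, visits, "abc", "cccccc.com"));
    }
}
```

Nothing has to be kept in sync with this design, but the AND case costs one extra query per matching contact, so it slows down as more contacts share the same first name.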

Finally, depending on the size of your site, it may be worth looking at Cloud Bigtable for the PageView data. It is a better solution for high-write, OLAP-style workloads.

Tags: google-app-engine, gql, google-cloud-datastore, google-cloud-platform