How to take the shortest distance per person (with multiple addresses) to an origin point and sort on that value
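I have People documents in my Elasticsearch index, and each person has multiple addresses, each with an associated lat/long point.
I'd like to geo-sort all the people by proximity to a specific origin location, but having multiple locations per person complicates matters. What has been decided is the following objective: take the shortest distance from each person to the origin point and use that as the sort number.
Here is a rough example of my people index, in 'pseudo-JSON', showing a couple of person documents, each with multiple addresses: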
person {
  name: John Smith
  addresses [
    { lat: 43.5234, lon: 32.5432, 1 Main St. }
    { lat: 44.983, lon: 37.3432, 2 Queen St. W. }
    { ... more addresses ... }
  ]
}
person {
  name: Jane Doe
  addresses [
    ... she has a bunch of addresses too ...
  ]
}
... many more people docs each having multiple addresses like above ...
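Currently I'm using an Elasticsearch script field with an inline Groovy script, like the one below. The Groovy script calculates the distance in meters from the origin to each address, puts all of those distances into an array per person, and picks the minimum from each person's array as the sort value.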
string groovyShortestDistanceMetersSortScript = string.Format(
    "[doc['geo1'].distance({0}, {1}), doc['geo2'].distance({0}, {1})].min()",
    origin.Latitude,
    origin.Longitude);
var shortestMetersSort = new SortDescriptor<Person>()
    .Script(sd => sd
        .Type("number")
        .Script(script => script
            .Inline(groovyShortestDistanceMetersSortScript)
        )
        .Order(SortOrder.Ascending)
    );
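For comparison, the same shortest-distance ordering can be sketched client-side in Python. This is an illustration only, not a replacement for sorting inside Elasticsearch: it assumes a spherical Earth with a mean radius of 6,371,008.8 m and the haversine formula, so its distances will differ slightly from the values Elasticsearch computes. The helper names are hypothetical, and the sample data reuses the landmark coordinates from the answer below. Unlike the script above, which hard-codes the geo1 and geo2 fields, it takes the minimum over however many addresses a person has.

```python
import math

# Assumption: spherical Earth with this mean radius; Elasticsearch may use a slightly different model.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding pushing the argument just above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def shortest_distance_m(person, origin):
    """Distance from origin to the person's closest address (the script's .min())."""
    return min(haversine_m(origin[0], origin[1], lat, lon) for lat, lon in person["addresses"])


def sort_by_shortest_distance(people, origin):
    """Return (name, meters) pairs ordered nearest person first."""
    ranked = [(p["name"], shortest_distance_m(p, origin)) for p in people]
    return sorted(ranked, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical sample data using the landmark coordinates from the answer.
    origin = (51.507313, -0.074308)  # Tower of London
    people = [
        {"name": "Jane Doe", "addresses": [(48.858257, 2.294511), (-25.383333, 131.083333)]},
        {"name": "John Smith", "addresses": [(51.501476, -0.140634), (40.748817, -73.985428)]},
    ]
    for name, meters in sort_by_shortest_distance(people, origin):
        print(f"{name}: {meters:.0f} m")
```

Doing this client-side means fetching every candidate document, which is exactly the work a server-side sort avoids; the sketch is only meant to pin down what the sort value represents.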
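Although this works, I wonder whether using a script field at query time is more expensive or too complex. Is there a better way to achieve the desired sort order, either by indexing the data differently and/or by using aggregations, perhaps even doing away with the script field entirely?
Any thoughts and guidance are appreciated. I'm sure others have run into the same (or a similar) requirement and found a different or better solution.
I'm using the NEST API in this code sample, but I'm happy to accept answers in Elasticsearch JSON format, since I can port them to NEST API code.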
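When sorting by distance from a specified origin, and the field being sorted on contains a collection of values (in this case, of type geo_point), we can specify how those values are reduced to a single sort value using the sort mode (the "mode" option in the request). Here we can specify a mode of "min" so that the location closest to the origin is used as the sort value. Here's an example: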
// Namespaces used: System, System.Collections.Generic, System.Linq,
// Elasticsearch.Net (SingleNodeConnectionPool) and Nest
public class Person
{
    public string Name { get; set; }
    public IList<Address> Addresses { get; set; }
}
public class Address
{
    public string Name { get; set; }
    public GeoLocation Location { get; set; }
}
void Main()
{
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    var indexName = "people";
    var connectionSettings = new ConnectionSettings(pool)
        .InferMappingFor<Person>(m => m.IndexName(indexName));
    var client = new ElasticClient(connectionSettings);
    if (client.IndexExists(indexName).Exists)
        client.DeleteIndex(indexName);
    client.CreateIndex(indexName, c => c
        .Settings(s => s
            .NumberOfShards(1)
            .NumberOfReplicas(0)
        )
        .Mappings(m => m
            .Map<Person>(mm => mm
                .AutoMap()
                .Properties(p => p
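                    // Map the Addresses collection as nested Address objects;
                    // AutoMap() maps Address.Location (a GeoLocation) as a geo_point
                    .Nested<Address>(n => n
                        .Name(nn => nn.Addresses)
                        .AutoMap()
                    )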
                )
            )
        )
    );
    var people = new[] {
        new Person {
            Name = "John Smith",
            Addresses = new List<Address>
            {
                new Address
                {
                    Name = "Buckingham Palace",
                    Location = new GeoLocation(51.501476, -0.140634)
                },
                new Address
                {
                    Name = "Empire State Building",
                    Location = new GeoLocation(40.748817, -73.985428)
                }
            }
        },
        new Person {
            Name = "Jane Doe",
            Addresses = new List<Address>
            {
                new Address
                {
                    Name = "Eiffel Tower",
                    Location = new GeoLocation(48.858257, 2.294511)
                },
                new Address
                {
                    Name = "Uluru",
                    Location = new GeoLocation(-25.383333, 131.083333)
                }
            }
        }
    };
    client.IndexMany(people);
    // call refresh for testing (avoid in production)
    client.Refresh("people");
    var towerOfLondon = new GeoLocation(51.507313, -0.074308);
    client.Search<Person>(s => s
        .MatchAll()
        .Sort(so => so
            .GeoDistance(g => g
                .Field(f => f.Addresses.First().Location)
                .PinTo(towerOfLondon)
                .Ascending()
                .Unit(DistanceUnit.Meters)
                // Take the minimum address location distance from
                // our target location, The Tower of London
                .Mode(SortMode.Min)
            )
        )
    );
}
This creates the following search request:
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_geo_distance": {
        "addresses.location": [
          {
            "lat": 51.507313,
            "lon": -0.074308
          }
        ],
        "order": "asc",
        "mode": "min",
        "unit": "m"
      }
    }
  ]
}
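If you call the REST API directly instead of going through NEST, the same request body can be built as a plain dictionary and serialized with json.dumps. A minimal Python sketch; the function name and its default arguments are hypothetical, and actually sending the body to the people index's _search endpoint is left to whichever HTTP client you use.

```python
import json


def geo_distance_sort_body(lat, lon, field="addresses.location", unit="m", mode="min"):
    """Build a match_all search body sorted by distance from (lat, lon).

    With mode="min", each document's sort value is the distance of its
    closest geo_point in `field`, i.e. the per-person shortest distance.
    """
    return {
        "query": {"match_all": {}},
        "sort": [
            {
                "_geo_distance": {
                    field: [{"lat": lat, "lon": lon}],
                    "order": "asc",
                    "mode": mode,
                    "unit": unit,
                }
            }
        ],
    }


if __name__ == "__main__":
    body = geo_distance_sort_body(51.507313, -0.074308)  # Tower of London
    print(json.dumps(body, indent=2))
```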
which returns
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : null,
    "hits" : [ {
      "_index" : "people",
      "_type" : "person",
      "_id" : "AVcxBKuPlWTRBymPa4yT",
      "_score" : null,
      "_source" : {
        "name" : "John Smith",
        "addresses" : [ {
          "name" : "Buckingham Palace",
          "location" : {
            "lat" : 51.501476,
            "lon" : -0.140634
          }
        }, {
          "name" : "Empire State Building",
          "location" : {
            "lat" : 40.748817,
            "lon" : -73.985428
          }
        } ]
      },
      "sort" : [ 4632.035195223564 ]
    }, {
      "_index" : "people",
      "_type" : "person",
      "_id" : "AVcxBKuPlWTRBymPa4yU",
      "_score" : null,
      "_source" : {
        "name" : "Jane Doe",
        "addresses" : [ {
          "name" : "Eiffel Tower",
          "location" : {
            "lat" : 48.858257,
            "lon" : 2.294511
          }
        }, {
          "name" : "Uluru",
          "location" : {
            "lat" : -25.383333,
            "lon" : 131.083333
          }
        } ]
      },
      "sort" : [ 339100.6843074794 ]
    } ]
  }
}
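The value returned in the sort array for each hit is the minimum distance, in the specified sort unit (meters in our example), between the specified point (the Tower of London) and that person's addresses.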
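If you work with the raw JSON response, each person's shortest distance can be read straight from the first element of the hit's sort array. A minimal Python sketch, assuming the response has already been parsed into a dictionary; the helper name is hypothetical, and the sample response keeps only the fields it uses from the output above.

```python
def shortest_distances(response):
    """Return (name, sort value) pairs in hit order; the sort value is the minimum distance."""
    return [(hit["_source"]["name"], hit["sort"][0]) for hit in response["hits"]["hits"]]


if __name__ == "__main__":
    # Trimmed copy of the search response shown above.
    response = {
        "hits": {
            "hits": [
                {"_source": {"name": "John Smith"}, "sort": [4632.035195223564]},
                {"_source": {"name": "Jane Doe"}, "sort": [339100.6843074794]},
            ]
        }
    }
    for name, meters in shortest_distances(response):
        print(f"{name}: {meters / 1000:.2f} km")
```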
As per the guidelines in the Sorting By Distance documentation, it often makes more sense to score by distance, which can be achieved by using a function_score query with a decay function.
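To make "scoring by distance" concrete, the sketch below evaluates the gauss decay curve as described in the function_score documentation: a distance d scores exp(-max(0, d - offset)^2 / (2 * sigma^2)), where sigma^2 = -scale^2 / (2 * ln(decay)), so that the score is exactly decay at offset + scale. It only computes the score in Python, and the scale value used is an illustrative assumption rather than a recommendation. For fields that hold several points per document, the decay functions accept a multi_value_mode option that selects which distance is used; if addresses are mapped as nested, the scoring would presumably have to run inside a nested query. Check both points against the documentation for your Elasticsearch version.

```python
import math


def gauss_decay(distance, scale, offset=0.0, decay=0.5):
    """Gaussian decay score for a distance, following the function_score gauss formula.

    sigma^2 is chosen so that the score equals `decay` at distance offset + scale;
    distances within `offset` of the origin score 1.0.
    """
    if scale <= 0 or not 0 < decay < 1:
        raise ValueError("scale must be > 0 and decay must be in (0, 1)")
    sigma_sq = -scale ** 2 / (2 * math.log(decay))
    excess = max(0.0, abs(distance) - offset)
    return math.exp(-excess ** 2 / (2 * sigma_sq))


if __name__ == "__main__":
    # Illustrative parameters: score halves at 2 km from the origin, no offset.
    for meters in (0, 1000, 2000, 4632, 339100):
        print(meters, round(gauss_decay(meters, scale=2000), 4))
```

Unlike a sort, which orders strictly by distance, the decay score can be combined with relevance scores from other query clauses.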