Elasticsearch dsl - large unique list of single column in python

Question

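I have a large set of Windows event logs, and I'm trying to get the unique list of users from a single column for a single event ID. The code below runs, but it takes a very long time. How would you do this with the Python elasticsearch_dsl and elasticsearch-py libraries?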

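    from elasticsearch import Elasticsearch
    from elasticsearch_dsl import Search

    es = Elasticsearch([localhostmines], timeout=30)
    s = Search(using=es, index="logindex-*").filter('term', EventID="4624")

    # Stream every matching document and collect the user names client-side
    users = set()
    for hit in s.scan():
        users.add(hit.TargetUserName)

    print(users)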

The TargetUserName column contains the user names as strings, and the EventID column contains the Windows event ID as a string.

Answer 1

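You need to use a terms aggregation, which will do exactly what you want: Elasticsearch groups the matching documents by TargetUserName on the server, so you don't have to scan every hit yourself.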

    s = Search(using=es, index="logindex-*").filter('term', EventID="4624")
    s.aggs.bucket('per_user', 'terms', field='TargetUserName')

    response = s.execute()
    for user in response.aggregations.per_user.buckets:
        print(user.key, user.doc_count)
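
One caveat: a terms aggregation returns only the top 10 buckets by default. You can raise that with the `size` parameter, but if the user list is really large, paging through a composite aggregation may be a better fit. Below is a minimal sketch of that idea, not a drop-in solution. The helper `iter_unique_values`, the aggregation name `uniq`, and the `run_query` callable are names I made up for illustration; `run_query` is assumed to take a request body as a dict and return the raw search response as a dict. It also assumes the field is mapped so that it can be aggregated on (e.g. a keyword field).

```python
def iter_unique_values(run_query, field, query, page_size=1000):
    """Yield each distinct value of `field` by paging a composite aggregation.

    `run_query` is a hypothetical callable: request body (dict) in,
    raw search response (dict) out.
    """
    after = None
    while True:
        composite = {
            "size": page_size,
            "sources": [{"value": {"terms": {"field": field}}}],
        }
        if after is not None:
            # Resume right after the last bucket of the previous page
            composite["after"] = after

        body = {
            "size": 0,  # only the aggregation buckets are needed, not the hits
            "query": query,
            "aggs": {"uniq": {"composite": composite}},
        }
        agg = run_query(body)["aggregations"]["uniq"]

        buckets = agg["buckets"]
        if not buckets:
            return
        for bucket in buckets:
            yield bucket["key"]["value"]

        # Stop when the response carries no key to continue from
        after = agg.get("after_key")
        if after is None:
            return
```

For the question above, `query` would be something like `{"bool": {"filter": [{"term": {"EventID": "4624"}}]}}` and `field` would be `"TargetUserName"`. With elasticsearch-py, `run_query` could be along the lines of `lambda body: es.search(index="logindex-*", body=body)`, though the exact keyword arguments depend on your client version.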
