Elasticsearch Ruby Activerecord 持久性模型 URL 术语搜索

Elasticsearch Ruby Activerecord Persistence Model URL term search

我正在尝试使用弹性搜索字词查询对包含 URL 的字段进行搜索。我使用 elasticsearch-rails ActiveRecord 持久性模式。这就是我尝试做的方式。

total_views = UserAction.search :query=> {
        :filtered=> {
            :filter=> {
                :term=> { action_path:"http://0.0.0.0:3000/tshirt/test" } 
            }
        }
    }  

如果没有“/”或“:”字符,它会起作用。例如,当 action_path 只是 'tshirt' 时。不分析其他字段,如果字段中没有“/”、“:”等字符,它们将起作用。 所以显然弹性搜索试图分析它,但问题是它们不应该被分析,因为映射已经存在。

这是我的用户操作 class

class UserAction
  include Elasticsearch::Persistence::Model  
  extend Calculations
  include Styles

  attribute :user_id, Integer
    attribute :user_referrer, String, mapping: { index: 'not_analyzed' } 
    attribute :user_ip, String, mapping: { index: 'not_analyzed' } 
    attribute :user_country, String, mapping: { index: 'not_analyzed' }
    attribute :user_city, String, mapping: { index: 'not_analyzed' }
    attribute :user_device, String, mapping: { index: 'not_analyzed' }
  attribute :user_agent, String, mapping: { index: 'not_analyzed' }
    attribute :user_platform
  attribute :user_visitid, Integer
    attribute :action_type, String, mapping: { index: 'not_analyzed' } 
    attribute :action_css, String, mapping: { index: 'not_analyzed' }
  attribute :action_text, String, mapping: { index: 'not_analyzed' }
  attribute :action_path, String, mapping: { index: 'not_analyzed' } 
  attribute :share_url, String, mapping: { index: 'not_analyzed' } 
  attribute :tag 
  attribute :date 

我还尝试使用“mapping do..”添加索引,然后 "create_index!" 但结果是一样的。因为存在映射,所以它确实创建了映射。

这是我的 gem 文件

   gem "elasticsearch-model", git: "git://github.com/elasticsearch/elasticsearch-rails.git", require: "elasticsearch/model"
          gem "elasticsearch-persistence", git: "git://github.com/elasticsearch/elasticsearch-rails.git", require: "elasticsearch/persistence/model"
          gem "elasticsearch-rails"

当我进行搜索时,我还看到那些未分析的字段。

       :reload_on_failure=>false,
         :randomize_hosts=>false,
         :transport_options=>{}},
       @protocol="http",
       @reload_after=10000,
       @resurrect_after=60,
       @serializer=
        #<Elasticsearch::Transport::Transport::Serializer::MultiJson:0x007fc4bf9e0e18
         @transport=#<Elasticsearch::Transport::Transport::HTTP::Faraday:0x007fc4bf9b35a8 ...>>,
       @sniffer=
        #<Elasticsearch::Transport::Transport::Sniffer:0x007fc4bf9e0dc8
         @timeout=1,
         @transport=#<Elasticsearch::Transport::Transport::HTTP::Faraday:0x007fc4bf9b35a8 ...>>,
       @tracer=nil>>,
   @document_type="user_action",
   @index_name="useraction",
   @klass=UserAction,
   @mapping=
    #<Elasticsearch::Model::Indexing::Mappings:0x007fc4bfab18d8
     @mapping=
      {:created_at=>{:type=>"date"},
       :updated_at=>{:type=>"date"},
       :user_id=>{:type=>"integer"},
       :user_referrer=>{:type=>"string"},
       :user_ip=>{:type=>"string"},
       :user_country=>{:type=>"string", :index=>"not_analyzed"},
       :user_city=>{:type=>"string", :index=>"not_analyzed"},
       :user_device=>{:type=>"string", :index=>"not_analyzed"},
       :user_agent=>{:type=>"string", :index=>"not_analyzed"},
       :user_platform=>{:type=>"string"},
       :user_visitid=>{:type=>"integer"},
       :action_type=>{:type=>"string", :index=>"not_analyzed"},
       :action_css=>{:type=>"string", :index=>"not_analyzed"},
       :action_text=>{:type=>"string", :index=>"not_analyzed"},
       :action_path=>{:type=>"string", :index=>"not_analyzed"}},
     @options={},
     @type="user_action">,
   @options={:host=>UserAction}>,
 @response={"took"=>1, "timed_out"=>false, "_shards"=>{"total"=>4, "successful"=>4, "failed"=>0}, "hits"=>{"total"=>0, "max_score"=>nil, "hits"=>[]}}>
(END) 

初始化文件除了 elastichq 连接外什么都没有 url。

数据在 elastichq 中,所以我应该得到结果但无法得到任何结果。

    user_action 1   AUzH9xKDueQ8OtBQuyQC    http://example.org/api/analytics/track
user_actions    user_action 1   AUzIAUsvueQ8OtBQuyQg    http://0.0.0.0:3000/tshirt/funnel_test2
user_actions    user_action 1   AUzH7ay5ueQ8OtBQuyP2    http://example.org/api/analytics/track
user_actions    user_action 1   AUzH-HAdueQ8OtBQuyQU    http://0.0.0.0:3000/tshirt/test
user_actions    user_action 1   AUzIJbCGueQ8OtBQuyQ4    http://example.org/api/analytics/track
user_actions    user_action 1   AUzIJbCjueQ8OtBQuyQ5    http://example.org/api/analytics/track

Curl 来自 Elastichq 的结果

curl -XGET "https://YYYYY:XXXXX@xxxx.qbox.io/user_actions/_mapping"
{
  "user_actions": {
    "mappings": {
      "user_action": {
        "properties": {
          "action_css": { "type": "string" },
          "action_path": { "type": "string" },
          "action_text": { "type": "string" },
          "action_type": { "type": "string" },
          "created_at": { "format": "dateOptionalTime", "type": "date" },
          "date": { "type": "string" },
          "share_url": { "type": "string" },
          "tag": { "type": "string" },
          "updated_at": { "format": "dateOptionalTime", "type": "date" },
          "user_agent": { "type": "string" },
          "user_city": { "type": "string" },
          "user_country": { "type": "string" },
          "user_device": { "type": "string" },
          "user_id": { "type": "long" },
          "user_ip": { "type": "string" },
          "user_referrer": { "type": "string" },
          "user_visitid": { "type": "long" }
        }
      }
    }
  }
}

任何人都可以帮助我获得 url 术语搜索工作吗?

从末尾的 elasticsearch curl 看来,您的字段已被分析(没有 not_analyzed 标志)。也许尝试用你想要的映射重建你的索引。

根据经验,如果你想搜索某些东西,你不应该离开它 not_analyzed

在这种情况下,您绝对应该尝试 Keyword Analyzer,将相关字段映射到 keyword

只要您搜索完整的字符串,即 "http://0.0.0.0:3000/tshirt/test",使用 Keyword Analyzer 很有可能会成功。

尝试原始查询:

total_views = UserAction.search :query=> {
    :filtered=> {
        :filter=> {
            :term=> { "action_path.raw" => "http://0.0.0.0:3000/tshirt/test" } 
        }
    }
}  

我做了我不想做的事。 使用以下 post 请求手动创建索引及其映射,因此 elasticsearch-rails 不会创建错误。现在一切正常

curl -XPOST https://xxxxxx.qbox.io/user_actions -d '{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "user_action" : {
            "_source" : { "enabled" : false },
            "properties" : {
                "action_path" : { "type" : "string", "index" : "not_analyzed" }
            }
        }
    }
}'