SOLR performance

I use SolrJ + Solr in my project. The problem is that I'm running into an unclear bottleneck in Solr/Jetty.

I connected jvisualvm to the JVM instance that Solr runs in and found that 77% of the time is spent in the method "org.eclipse.jetty.io.ByteArrayBuffer.readFrom()". Here is the stack trace of one of the threads:

"qtp64700533-36718" - Thread t@36718
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at org.eclipse.jetty.io.ByteArrayBuffer.readFrom(ByteArrayBuffer.java:391)
    at org.eclipse.jetty.io.bio.StreamEndPoint.fill(StreamEndPoint.java:141)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.fill(SocketConnector.java:227)
    at org.eclipse.jetty.http.HttpParser.fill(HttpParser.java:1040)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:280)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:745)

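So the time spent on I/O might look normal, but:

  1. the application that sends the queries runs on the local machine (so I/O time should not be large, and the thread state "RUNNABLE" in the stack trace above looks suspicious)
  2. query response times can reach 5-10 seconds
  3. the load average on the machine (CentOS) is about 10

Any help/advice appreciated, thanks!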

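UPDATE:
Indeed, guys, I forgot to give you the other details. Here they are:

Hardware: i3770, 32 GB RAM; iotop shows reads of 50-600 KB/s and writes of 200-1000 KB/s (almost all of it from the SOLR process)
OS: CentOS 6.6
Java: OpenJDK 64-Bit Server VM (1.7.0_71 24.65-b04)
Solr: 4.9.0 (started with -Xmx=24000, but I think the SOLR cores should be split into separate JVM SOLR instances to minimize GC time)
SolrJ: 4.10.3; documents are added/updated/removed from Java code with commitWithin=10000 ms.

About the schema: I store data (ads + objects) for 5 countries in SOLR: UA, RU, PL, BY, KZ. So each country has 2 cores, e.g. for Ukraine: ua_ads and ua_objects (10 cores in total). The schemas are almost identical across countries; see Ukraine's below.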

"ua_ads" schema (the name should be changed from "example" :))

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
  <fieldType name="int"       class="solr.TrieIntField"   precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="long"      class="solr.TrieLongField"  precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="boolean"   class="solr.BoolField"      sortMissingLast="true"/>
  <fieldType name="tdate"     class="solr.TrieDateField"  precisionStep="6" positionIncrementGap="0"/>
  <fieldType name="string"    class="solr.StrField"       sortMissingLast="true" />
  <fieldType name="text_ru"   class="solr.TextField"      positionIncrementGap="100"/>

  <field name="_version_" type="long" indexed="true" stored="true"/>

  <uniqueKey>adId</uniqueKey>

  <field name="adId"          type="long"     indexed="true"    stored="true"   required="true"/>
  <field name="objectId"      type="long"     indexed="true"    stored="true"   required="false"/>
  <field name="url"           type="string"   indexed="false"   stored="true"   required="true"/>
  <field name="regionId"      type="int"      indexed="false"   stored="true"   required="true"/>
  <field name="sourceId"      type="int"      indexed="false"   stored="true"   required="true"/>
  <field name="type"          type="int"      indexed="false"   stored="true"   required="true"/>
  <field name="title"         type="text_ru"  indexed="false"   stored="true"   required="true"/>
  <field name="address"       type="text_ru"  indexed="false"   stored="true"   required="true"/>
  <field name="text"          type="text_ru"  indexed="false"   stored="true"   required="true"/>
  <field name="dateFound"     type="tdate"    indexed="true"    stored="true"   required="true"/>
  <!-- should be a string field (not int) to avoid cutting zero at beginning of phone number -->
  <field name="phoneNumbers"  type="string"   indexed="true"    stored="true"   required="true"   multiValued="true"/>
  <field name="priceLocal"    type="long"     indexed="false"   stored="true"   required="false"/>
  <field name="priceUsd"      type="long"     indexed="false"   stored="true"   required="false"/>
  <field name="currency"      type="int"      indexed="false"   stored="true"   required="false"/>

  <field name="roomsCount"    type="int"      indexed="false"   stored="true"   required="false"/>
  <field name="area"          type="int"      indexed="false"   stored="true"   required="false"/>
  <field name="imagesCount"   type="int"      indexed="true"    stored="true"   required="true"/>
</schema>

"ua_objects" schema

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">

  <fieldType name="int"     class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="long"    class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="float"   class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
  <fieldType name="tdate"   class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
  <fieldType name="string"  class="solr.StrField" sortMissingLast="true" />
  <fieldtype name="binary"  class="solr.BinaryField"/>

  <fieldType name="addr_ru" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- no stemming for address; dots must be followed by a space: "г. Киев" -->
      <!-- char filters always come first (preprocessing) -->
      <charFilter class="solr.MappingCharFilterFactory" mapping="lang/chars_replacement.txt" />
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- drop everything except digits, Latin/Cyrillic letters and "-" (the "-" survives for house numbers like "9-а") -->
      <filter class="solr.PatternReplaceFilterFactory" pattern="[^0-9abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюяіїє\-]" replacement="" replace="all"/>
      <!-- join a house number and its letter by dropping the "-" ("9-а" => "9а") -->
      <filter class="solr.PatternReplaceFilterFactory" pattern="(\d{1,3})[\- ]([абвгдеёжзийклмнопрстуфхцчшщ])" replacement="$1$2" replace="all"/>
      <filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="lang/cities_ukr2rus.txt"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="ї" replacement="и" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="і" replacement="и" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="й" replacement="и" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="ё" replacement="е" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="є" replacement="е" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="э" replacement="е" replace="all"/>
      <!-- min="1" keeps one-character tokens such as house numbers and letters: "Хрещатик, 3" -->
      <filter class="solr.LengthFilterFactory" min="1" max="64"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt,lang/stopwords_addr.txt" format="snowball"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_ru" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- dots must be followed by a space: "г. Киев" -->
      <!-- char filters always come first (preprocessing) -->
      <charFilter class="solr.MappingCharFilterFactory" mapping="lang/chars_replacement.txt" />
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="[^0-9abcdefghijklmnopqrstuvwxyzабвгдеёжзийклмнопрстуфхцчшщъыьэюяіїє\-]" replacement="" replace="all"/>
      <!-- join a house number and its letter by dropping the "-" ("9-а" => "9а") -->
      <filter class="solr.PatternReplaceFilterFactory" pattern="(\d{1,3})[\- ]([абвгдеёжзийклмнопрстуфхцчшщ])" replacement="$1$2" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="ї" replacement="и" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="і" replacement="и" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="й" replacement="и" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="ё" replacement="е" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="є" replacement="е" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="э" replacement="е" replace="all"/>
      <filter class="solr.LengthFilterFactory" min="1" max="64"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ru.txt" format="snowball"/>
      <filter class="solr.SynonymFilterFactory" ignoreCase="true" synonyms="lang/synonyms.txt"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Russian"/>
    </analyzer>
  </fieldType>

  <field name="_version_" type="long" indexed="true" stored="true"/>

  <uniqueKey>objectId</uniqueKey>

  <field name="objectId"      type="long"     indexed="true"    stored="true"   required="true"/>
  <field name="url"           type="string"   indexed="false"   stored="true"   required="true"/>
  <field name="regionId"      type="int"      indexed="true"    stored="true"   required="true"/>
  <field name="sourceId"      type="int"      indexed="false"   stored="true"   required="true"/>
  <field name="type"          type="int"      indexed="true"    stored="true"   required="true"/>
  <field name="address"       type="addr_ru"  indexed="true"    stored="true"   required="true"/>
  <field name="title"         type="text_ru"  indexed="true"    stored="true"   required="true"/>
  <field name="text"          type="text_ru"  indexed="true"    stored="true"   required="true"/>
  <field name="dateFound"     type="tdate"    indexed="true"    stored="true"   required="true"/>
  <!-- should be a string field (not int) to avoid cutting zero at beginning of phone number -->
  <field name="phoneNumbers"  type="string"   indexed="true"    stored="true"   required="true"   multiValued="true"/>
  <field name="ownerDetected" type="boolean"  indexed="true"    stored="true"   required="true"/>
  <field name="priceUsd"      type="long"     indexed="true"    stored="true"   required="false"/>
  <field name="priceLocal"    type="long"     indexed="false"   stored="true"   required="false"/>
  <field name="currency"      type="int"      indexed="false"   stored="true"   required="false"/>
  <field name="roomsCount"    type="int"      indexed="true"    stored="true"   required="false"/>
  <field name="area"          type="int"      indexed="true"    stored="true"   required="false"/>

  <field name="dateUpdated"   type="tdate"    indexed="true"    stored="true"   required="true"/>
  <field name="dateClosed"    type="tdate"    indexed="true"    stored="true"   required="false"/>
  <field name="m2priceRel"    type="float"    indexed="true"    stored="true"   required="false"/>
  <field name="ceddData"      type="binary"   indexed="false"   stored="true"   required="false"  multiValued="true"/>
  <field name="imagesCount"   type="int"      indexed="true"    stored="true"   required="true"/>
  <field name="uniqAdTexts"   type="string"   indexed="false"   stored="true"   required="true"   multiValued="true"/>
</schema>

Largest indexes:
ru_ads: 2.99 GB
ru_objects: 3.25 GB
ua_ads: 5.45 GB
ua_objects: 2.36 GB
The other cores' indexes are relatively small.

Queries that take too long ("too long" as seen from the client) look like this (taken from the SOLR log; "????" are just non-English letters):

400723188 [qtp64700533-40547] INFO  org.apache.solr.core.SolrCore  ? [ua-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(??????\+????????\+???????\+????????)+AND+type:3+AND+regionId:2+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[2+TO+2])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[40+TO+60])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[23500+TO+70500])+AND+dateUpdated:[2014-12-09T10:23:07Z+TO+2015-01-28T10:23:07Z]+AND+-objectId:(27824841)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=18 status=0 QTime=287

401989528 [qtp64700533-40830] INFO  org.apache.solr.core.SolrCore  ? [ru-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(?????????????\+??????)+AND+type:4+AND+regionId:162+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[1+TO+1])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[40+TO+58])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[9+TO+27])+AND+dateUpdated:[2014-12-09T10:44:08Z+TO+2015-01-28T10:44:08Z]+AND+-objectId:(26415616)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=820 status=0 QTime=5755

400832723 [qtp64700533-40322] INFO  org.apache.solr.core.SolrCore  ? [ru-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(????????\+???????)+AND+type:4+AND+regionId:102+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[1+TO+1])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[31+TO+45])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[115+TO+343])+AND+dateUpdated:[2014-12-09T10:24:57Z+TO+2015-01-28T10:24:57Z]+AND+-objectId:(26415342)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=9 status=0 QTime=372

402069370 [qtp64700533-40832] INFO  org.apache.solr.core.SolrCore  ? [ru-objects] webapp=/solr path=/select params={mm=1&fl=*&start=0&q=(????????\+?????????\+??\+????????)+AND+type:3+AND+regionId:135+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[1+TO+1])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[28+TO+40])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[9529+TO+28585])+AND+dateUpdated:[2014-10-30T10:45:33Z+TO+2015-01-28T10:45:33Z]+AND+-objectId:(26415855)&qf=address^20+title^2+text&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=14075 status=0 QTime=544

401805198 [qtp64700533-40233] INFO  org.apache.solr.core.SolrCore  ? [ua-objects] webapp=/solr path=/select params={mm=2&fl=*&start=0&q=(??????\+??\+??????\+?????\+??????????)+AND+type:3+AND+regionId:16+AND+((*:*+AND+-roomsCount:[*+TO+*])+OR+roomsCount:[3+TO+3])+AND+((*:*+AND+-area:[*+TO+*])+OR+area:[93+TO+95])+AND+((*:*+AND+-priceUsd:[*+TO+*])+OR+priceUsd:[284050+TO+313950])+AND+dateUpdated:[2015-01-08T10:41:09Z+TO+2015-01-28T10:41:09Z]+AND+-objectId:(27826334)&qf=address^20+title^2&wt=javabin&version=2&defType=edismax&rows=2147483647} hits=6 status=0 QTime=462

Here is the latest profiling screenshot from jvisualvm:

Part of the "top" command output, with delay = 10 s:

ANSWER:

You pass rows=2147483647 in every query. Here is what that parameter means (from the reference guide):

You can use the rows parameter to paginate results from a query. The parameter specifies the maximum number of documents from the complete result set that Solr should return to the client at one time.

The default value is 10. That is, by default, Solr returns 10 documents at a time in response to a query.

So you are effectively telling Solr to send every match found for the query in a single response. That is the reason for your poor performance.

Does Google send you all 500,000,000 hits it finds when you search for "java"? No. Why not? Performance. Every IR application I know of gives you a small page with the first results so that search stays fast.

That is also the reason for your high I/O: Solr fetches the records from disk and writes them into the response. It is I/O, nothing more.
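2147483647 is Java's Integer.MAX_VALUE, which suggests the client code passes that constant as rows. The sketch below shows the small-page approach described above: ask for one bounded page with start and rows. It assumes plain HTTP GET requests against the /select handler; the base URL is a hypothetical example and the core name is taken from the log lines. With SolrJ, the same two parameters are set via SolrQuery.setStart and SolrQuery.setRows. pageStarts is a hypothetical helper that only counts the round trips needed to walk every hit page by page; for extracting every match, the streaming export feature discussed next is the better fit.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Sketch: request one bounded page (start/rows) instead of rows=2147483647.
final class SolrPaging {

    // Builds a /select URL for a single page of results.
    static String selectUrl(String baseUrl, String core, String q, int start, int rows) {
        if (start < 0 || rows <= 0) {
            throw new IllegalArgumentException("start must be >= 0 and rows > 0");
        }
        return baseUrl + "/" + core + "/select?q=" + enc(q)
                + "&start=" + start + "&rows=" + rows;
    }

    // Hypothetical helper: start offsets needed to walk numFound hits, rows hits per request.
    static List<Long> pageStarts(long numFound, int rows) {
        if (rows <= 0) {
            throw new IllegalArgumentException("rows must be > 0");
        }
        List<Long> starts = new ArrayList<Long>();
        for (long start = 0; start < numFound; start += rows) {
            starts.add(start);
        }
        return starts;
    }

    // URL-encodes a parameter value (API available on Java 7).
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical host; core name as it appears in the Solr log.
        System.out.println(selectUrl("http://localhost:8983/solr", "ua-objects",
                "type:3 AND regionId:2", 0, 20));
        // 820 hits (the slowest logged query) at 100 per page -> 9 requests.
        System.out.println(pageStarts(820, 100).size());
    }
}
```

A result page that shows 20 ads only ever needs rows=20; the next page is start=20, rows=20, and so on.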

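Since you are using this for analytics and want to extract everything that matches, you should look at the new streaming export feature. Unfortunately, it is only available starting with Solr 4.10.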
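A minimal sketch of what an export request looks like, assuming Solr 4.10 or later with an /export request handler configured. Per the Solr reference guide's "Exporting Result Sets" section, /export requires both sort and fl, and every field named in them must have docValues enabled; none of the fields in the schemas above set docValues, so the schema would have to change (and the data be reindexed) first. The base URL, the sort field and the field list here are illustrative choices, not taken from the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

// Sketch: build a request URL for Solr's streaming export handler (Solr 4.10+).
// /export needs both sort and fl; those fields must have docValues enabled.
final class SolrExport {

    static String exportUrl(String baseUrl, String core, String q, String sort, List<String> fl) {
        if (sort == null || sort.isEmpty() || fl == null || fl.isEmpty()) {
            throw new IllegalArgumentException("/export requires both sort and fl");
        }
        // Join the field list with commas (no String.join on Java 7).
        StringBuilder fields = new StringBuilder();
        for (String f : fl) {
            if (fields.length() > 0) {
                fields.append(',');
            }
            fields.append(f);
        }
        return baseUrl + "/" + core + "/export?q=" + enc(q)
                + "&sort=" + enc(sort) + "&fl=" + enc(fields.toString());
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical host and field choice; core name as it appears in the Solr log.
        System.out.println(exportUrl("http://localhost:8983/solr", "ru-objects",
                "type:4 AND regionId:162", "objectId asc",
                Arrays.asList("objectId", "priceUsd")));
    }
}
```

Unlike /select, there is no rows parameter to tune here: the handler streams the whole sorted result set, which is what a full extraction needs.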

You could also move to SSDs, which gives Solr performance a nice boost.

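Finally, check your cache levels. If you don't update often and some caches are full, you can increase their default sizes. If you do update often, the caches are invalidated on every commit anyway (with commitWithin=10000 ms and a steady stream of updates, that can be as often as every 10 seconds).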
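A small, hypothetical helper that turns this rule of thumb into code. It assumes you read lookups, hits, evictions and the current size of each cache (filterCache, queryResultCache, documentCache) from the Solr admin UI's statistics page, and the configured size from solrconfig.xml. The class and method names, the frequentCommits flag and the numbers in main are made up for illustration.

```java
// Hypothetical helper: applies the "grow full caches only if commits are rare"
// rule of thumb to numbers read from Solr's cache statistics.
final class CacheCheck {

    // Fraction of lookups answered from the cache (0.0 when there were no lookups).
    static double hitRatio(long lookups, long hits) {
        return lookups == 0 ? 0.0 : (double) hits / lookups;
    }

    // "Full": the cache holds as many entries as configured, or has already evicted some.
    static boolean isFull(int currentSize, int configuredSize, long evictions) {
        return currentSize >= configuredSize || evictions > 0;
    }

    // Growing a full cache only pays off when commits are rare: each commit opens a
    // new searcher and the old caches are discarded (apart from autowarmed entries).
    static boolean worthGrowing(int currentSize, int configuredSize, long evictions,
                                boolean frequentCommits) {
        return isFull(currentSize, configuredSize, evictions) && !frequentCommits;
    }

    public static void main(String[] args) {
        // Made-up numbers, for illustration only.
        System.out.println(hitRatio(1000, 250));                // 0.25
        System.out.println(worthGrowing(512, 512, 40, false));  // true
        System.out.println(worthGrowing(512, 512, 40, true));   // false
    }
}
```

Whether commits count as frequent is left to the caller; with the question's commitWithin=10000 ms and continuous indexing, they most likely do.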