在 https 上使用 solr 的 Nutch
Nutch with solr on https
早上好,
我来找你是因为 Nutch (1.14)
和 Solr (7.2)
有问题
所以在我安装 SSL 之前一切正常。
在 http 中使用 Solr,抓取完成后我执行此命令
bin/nutch index -Dsolr.server.url=http://127.0.0.1:8983/solr/CORENAME crawltest/crawldb/ -linkdb crawltest/linkdb/ crawltest/segments/* -filter -normalize -deleteGone
而且效果很好
但是一旦开启了SSL,solr服务器在HTTPS下,就无法将数据发送给solr了。
我在 nutch 站点中添加了以下属性
<name>solr.auth</name>
<value>true</value>
<property>
<name>solr.auth.username</name>
<value>xxxx</value>
<property>
<name>solr.auth.password</name>
<value>xxxx</value>
property>
<name>solr.server.type</name>
<value>https</value>
property>
<name>solr.server.url</name>
<value>https://127.0.0.1:8983/solr/CORENAME</value>
但是当我执行前面的命令时,我得到了这种类型的错误
Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://127.0.0.1:8983/solr/CORENAME
&
caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
&
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
您是否成功将数据发送到 HTTPS solr?
谢谢
编辑
按照 SSL 程序 https://lucene.apache.org/solr/guide/7_0/enabling-ssl.html
修复此错误
最后执行这个
keytool -import -file /path/to/solr/solr-ssl.pem -alias solr_cert -keystore /path/to/java-cacert
(jre/lib/security/cacerts)
默认密码是 changeit
进步了一点,在cacerts中导入证书后,我就没有再出现这个错误了。
仍然在相同的上下文中,在 solr 服务器上激活 SSL 和身份验证之后。我使用 Nutch 来抓取 url 并将数据发送到 solr。
由于 SSL 的实施,我无法再向 SOLR 发送数据。
当我执行this bin/nutch index -Dsolr.server.url=https://localhost:8983/solr/CORE -Dsolr.auth=true -Dsolr.auth.username='solr' -Dsolr.auth.password='xxxx' crawltest/crawldb/ -linkdb crawltest/linkdb/ crawltest/segments/* -filter -normalize -deleteGone
我有以下两个错误:
java.lang.Exception: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://localhost:8983/solr/CORE: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 401 Unauthorized</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /solr/CORE/update. Reason:
<pre> Unauthorized</pre></p>
</body>
</html>
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://localhost:8983/solr/CORE: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 401 Unauthorized</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /solr/CORE/update. Reason:
<pre> Unauthorized</pre></p>
</body>
</html>
编辑:
第一个错误是由于身份验证错误。
填写正确的值后,我有一个新的错误,我不明白。你有什么想法吗?
2018-06-20 09:47:18,116 INFO regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
2018-06-20 09:47:19,151 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2018-06-20 09:47:19,194 INFO solr.SolrMappingReader - source: content dest: content
2018-06-20 09:47:19,194 INFO solr.SolrMappingReader - source: title dest: title
2018-06-20 09:47:19,194 INFO solr.SolrMappingReader - source: host dest: host
2018-06-20 09:47:19,194 INFO solr.SolrMappingReader - source: segment dest: segment
2018-06-20 09:47:19,194 INFO solr.SolrMappingReader - source: boost dest: boost
2018-06-20 09:47:19,195 INFO solr.SolrMappingReader - source: digest dest: digest
2018-06-20 09:47:19,195 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2018-06-20 09:47:19,525 INFO solr.SolrIndexWriter - Indexing 250/250 documents
2018-06-20 09:47:19,525 INFO solr.SolrIndexWriter - Deleting 0 documents
2018-06-20 09:47:19,808 INFO solr.SolrIndexWriter - Indexing 250/250 documents
2018-06-20 09:47:19,809 INFO solr.SolrIndexWriter - Deleting 0 documents
2018-06-20 09:47:19,951 WARN mapred.LocalJobRunner - job_local146539832_0001
java.lang.Exception: java.io.IOException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.io.IOException
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.makeIOException(SolrIndexWriter.java:234)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:213)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:174)
at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:87)
at org.apache.nutch.indexer.IndexerOutputFormat.write(IndexerOutputFormat.java:50)
at org.apache.nutch.indexer.IndexerOutputFormat.write(IndexerOutputFormat.java:41)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:493)
at org.apache.hadoop.mapred.ReduceTask.collect(ReduceTask.java:422)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:369)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:57)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://localhost:8983/solr/ESRF-EXTERNAL
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:589)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:240)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:229)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:210)
... 16 more
Caused by: java.net.SocketException: Broken pipe (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:886)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:857)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:181)
at org.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:115)
at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:146)
at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:96)
at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:112)
at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:237)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:122)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:481)
... 20 more
2018-06-20 09:47:20,873 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:873)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239)
堆栈跟踪
name cpuTime / userTime
process reaper (37)
java.util.concurrent.SynchronousQueue$TransferStack@24197386
1.8587ms
0.0000ms
process reaper (36)
java.util.concurrent.SynchronousQueue$TransferStack@24197386
1.2672ms
0.0000ms
Scheduler-201556483 (31) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6202c0c1
1.1534ms
0.0000ms
searcherExecutor-7-thread-1 (30) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@663d5859
63.2030ms
50.0000ms
DestroyJavaVM (27)
1164.4748ms
1040.0000ms
Thread-12 (25)
java.lang.Object@233fcafa
0.1211ms
0.0000ms
Connection evictor (23)
0.9319ms
0.0000ms
Connection evictor (22)
2.0995ms
0.0000ms
org.eclipse.jetty.server.session.HashSessionManager@1a052a00Timer (21) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6626f9cf
4.2127ms
0.0000ms
qtp2012232625-20 (20) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@12d62902
56.7955ms
50.0000ms
qtp2012232625-19 (19)
47.6864ms
40.0000ms
qtp2012232625-18 (18) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@12d62902
79.3320ms
70.0000ms
qtp2012232625-17 (17) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@12d62902
100.9593ms
90.0000ms
qtp2012232625-16-acceptor-0@2d033cc4-ServerConnector@23c4c714{SSL,[ssl, http/1.1]}{0.0.0.0:8983} (16)
4.5898ms
0.0000ms
qtp2012232625-15 (15) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@12d62902
73.3096ms
60.0000ms
qtp2012232625-14 (14) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@12d62902
18.7950ms
10.0000ms
qtp2012232625-13 (13)
79.7804ms
70.0000ms
qtp2012232625-12 (12)
70.2385ms
60.0000ms
qtp2012232625-11 (11) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@12d62902
22.1012ms
10.0000ms
ShutdownMonitor (10)
0.3055ms
0.0000ms
Signal Dispatcher (5)
0.0873ms
0.0000ms
Finalizer (3)
java.lang.ref.ReferenceQueue$Lock@1e254491
8.2575ms
0.0000ms
Reference Handler (2)
java.lang.ref.Reference$Lock@431035b5
6.3846ms
0.0000ms
EDIT2
为了测试我禁用了身份验证以查看问题是否不是来自 https。无需身份验证即可使用!
我试图更改文件并将其包含在 jetty-https.xml 而不是 jetty.xml.
我有 2 个帐户这样配置
<security-constraint>
<web-resource-collection>
<web-resource-name>Solr authenticated application</web-resource-name>
<url-pattern>/</url-pattern>
</web-resource-collection>
<auth-constraint>
<role-name>admin</role-name>
</auth-constraint>
</security-constraint>
<login-config>
<auth-method>BASIC</auth-method>
<realm-name>Test Realm</realm-name>
</login-config>
security.json
{
"authentication":{
"blockUnknown": true,
"class":"solr.BasicAuthPlugin",
"credentials":{"solr":"xxxx"}
},
"authorization":{
"class":"solr.RuleBasedAuthorizationPlugin",
"permissions":[{"name":"security-edit",
"role":"admin"}],
"user-role":{"solr":"admin"}
}}
当我执行以下命令时
bin/nutch index -Dsolr.server.url=https://localhost:8983/solr/MYCORE -Dsolr.auth=true -Dsolr.auth.username='admin' -Dsolr.auth.password='xxxx' crawltest/crawldb/ -linkdb crawltest/linkdb/ crawltest/segments/* -filter -normalize -deleteGone
我收到这个错误
java.lang.Exception: java.io.IOException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.io.IOException
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.makeIOException(SolrIndexWriter.java:234)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:213)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:174)
at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:87)
at org.apache.nutch.indexer.IndexerOutputFormat.write(IndexerOutputFormat.java:50)
at org.apache.nutch.indexer.IndexerOutputFormat.write(IndexerOutputFormat.java:41)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:493)
at org.apache.hadoop.mapred.ReduceTask.collect(ReduceTask.java:422)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:369)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:57)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://localhost:8983/solr/MYCORE
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:589)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:240)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:229)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:210)
... 16 more
Caused by: java.net.SocketException: Broken pipe (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:886)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:857)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:181)
at org.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:115)
at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:146)
at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:96)
at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:112)
at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:237)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:122)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:481)
... 20 more
2018-06-25 09:38:41,870 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:873)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239)
当我执行此操作时
bin/nutch index -Dsolr.server.url=https://localhost:8983/solr/MYCORE -Dsolr.auth=true -Dsolr.auth.username='solr' -Dsolr.auth.password='xxxxx' crawltest/crawldb/ -linkdb crawltest/linkdb/ crawltest/segments/* -filter -normalize -deleteGone
我收到这个错误
java.lang.Exception: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://localhost:8983/solr/MYCORE: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 401 Unauthorized</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /solr/MYCORE/update. Reason:
<pre> Unauthorized</pre></p>
</body>
</html>
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://localhost:8983/solr/MYCORE: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 401 Unauthorized</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /solr/MYCORE/update. Reason:
<pre> Unauthorized</pre></p>
</body>
</html>
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:544)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:240)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:229)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:210)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:174)
at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:87)
at org.apache.nutch.indexer.IndexerOutputFormat.write(IndexerOutputFormat.java:50)
at org.apache.nutch.indexer.IndexerOutputFormat.write(IndexerOutputFormat.java:41)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:493)
at org.apache.hadoop.mapred.ReduceTask.collect(ReduceTask.java:422)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:369)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:57)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-06-25 09:45:20,106 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:873)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239)
或者现在这个:
java.lang.Exception: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://127.0.0.1:8983/solr/MYCORE: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 503 </title>
</head>
<body>
<h2>HTTP ERROR: 503</h2>
<p>Problem accessing /solr/MYCORE/update. Reason:
<pre> Service Unavailable</pre></p>
<hr />
</body>
</html>
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://127.0.0.1:8983/solr/MYCORE: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 503 </title>
</head>
<body>
<h2>HTTP ERROR: 503</h2>
<p>Problem accessing /solr/MYCORE/update. Reason:
<pre> Service Unavailable</pre></p>
<hr />
</body>
</html>
Solr 日志:
2018-06-25 14:18:44.352 INFO (main) [ ] o.e.j.s.Server jetty-9.3.20.v20170531
2018-06-25 14:18:44.597 WARN (main) [ ] o.e.j.w.WebAppContext Failed startup of context o.e.j.w.WebAppContext@5891e32e{/solr,file:///app/solr-7.2.1/server/solr-webapp/webapp/,UNAVAILABLE}{/app/solr-7.2.1/server/solr-webapp/webapp}
java.lang.IllegalStateException: No LoginService for org.eclipse.jetty.security.authentication.BasicAuthenticator@64c87930 in org.eclipse.jetty.security.ConstraintSecurityHandler@400cff1a
at org.eclipse.jetty.security.authentication.LoginAuthenticator.setConfiguration(LoginAuthenticator.java:76)
at org.eclipse.jetty.security.SecurityHandler.doStart(SecurityHandler.java:354)
at org.eclipse.jetty.security.ConstraintSecurityHandler.doStart(ConstraintSecurityHandler.java:448)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:105)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
at org.eclipse.jetty.server.handler.ScopedHandler.doStart(ScopedHandler.java:120)
at org.eclipse.jetty.server.session.SessionHandler.doStart(SessionHandler.java:116)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:105)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
at org.eclipse.jetty.server.handler.ScopedHandler.doStart(ScopedHandler.java:120)
at org.eclipse.jetty.server.handler.ContextHandler.startContext(ContextHandler.java:809)
at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:345)
at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1406)
at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1368)
at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:778)
at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:262)
at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:522)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:41)
at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:188)
at org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:499)
at org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:147)
at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:180)
at org.eclipse.jetty.deploy.providers.WebAppProvider.fileAdded(WebAppProvider.java:458)
at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:64)
at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:610)
at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:529)
at org.eclipse.jetty.util.Scanner.scan(Scanner.java:392)
at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:313)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:150)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:561)
at org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:236)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at org.eclipse.jetty.server.Server.start(Server.java:422)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:113)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
at org.eclipse.jetty.server.Server.doStart(Server.java:389)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.xml.XmlConfiguration.run(XmlConfiguration.java:1520)
at java.security.AccessController.doPrivileged(Native Method)
at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1442)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jetty.start.Main.invokeMain(Main.java:215)
at org.eclipse.jetty.start.Main.start(Main.java:458)
at org.eclipse.jetty.start.Main.main(Main.java:76)
2018-06-25 14:18:44.745 INFO (main) [ ] o.e.j.s.Server Started @799ms
我通过在 /var/solr/data/ (SOLR_HOME)
上移动 security.json 解决了 "No LoginService"
编辑 3:
现在,当我想将 nutch 数据发送到 solr 时,我只会收到一条错误消息 "No allow"。我也无法再连接到管理界面,我得到了同样的错误。我认为它来自 security.json 文件
{
"authentication":{
"class":"solr.BasicAuthPlugin",
"credentials":{"solr":"xxxxxx"}
},
"authorization":{
"class":"solr.RuleBasedAuthorizationPlugin"
"permissions":[{"name":"security-edit","role":"adminRole"},{"name":"collection-admin-edit","role":"adminRole"},{"name":"update","role":"adminRole"},{"name":"all","role":"adminRole"},{"name":"core-admin-edit","role":"adminRole"},{"name":"read","role":"adminRole"},{"name":"config-edit","role":"adminRole"},{"name":"core-admin-read","role":"adminRole"},{"name":"core-admin-read","role":"adminRole"}]
"user-role":{"solr":"adminRole"}
}}
我做错了什么?谢谢
我添加了一个新的答案,因为之前的答案太长了
已解决 API 身份验证但不使用 NUTCH:
因此,为了通过 API 进行身份验证,我删除了 jetty-https.xml
、webdefault.xml
中的配置,并删除了 realm.properties
文件以及 [=18] 中的基本身份验证选项=]
我只在 SOLR HOME
中处理 security.json
文件
其实最大的问题是我没有使用加密密码来测试连接,没有它就无法连接。另一方面,我仍然有 nutch 的问题,这是不允许的。
这是 security.json
文件
`{
"authentication":{
"class":"solr.BasicAuthPlugin",
"credentials":{
"solr":"hzMjhfgN4b9X8KR0QgLB2Um3cUzqDzJygtEBL/O7g5E= CkP7HyXjYvqKNF3F4hBjnVvKGQOkLc/ta4FaNIkqgII="
}
},
"authorization":{
"class":"solr.RuleBasedAuthorizationPlugin",
"permissions":[
{
"name":"security-edit",
"role":"adminRole"
},
{
"name":"collection-admin-edit",
"role":"adminRole"
},
{
"name":"update",
"role":"adminRole"
},
{
"name":"config-edit",
"role":"adminRole"
},
{
"name":"core-admin-edit",
"role":"adminRole"
},
{
"name":"core-admin-read",
"role":"adminRole"
{
"name":"schema-edit",
"role":"adminRole"
},
{
"name":"all",
"role":"adminRole"
}
],
"user-role":{
"solr":"adminRole"
}
}
}
`
加密后的密码表示值"test"
为了测试文件代码,我建议这样 http://json.parser.online.fr/
为了能够使用 nutch 更新 solr,我可能错过了什么?
已解决
添加数据导入的更新角色路径
{
"name":"update",
"path":"/dataimport",
"role":"adminRole"
},
但是现在我可以将 nutch 索引到 solr 但是我在爬行时遇到了一个新错误...
`Thu Jun 28 09:21:03 CEST 2018 : Iteration 2 of 5
Generating a new segment
/app/nutch-external/bin/nutch generate -D mapreduce.job.reduces=2 -D mapred.child.java.opts=-Xmx1000m -D mapreduce.reduce.speculative=false -D mapreduce.map.speculative=false -D mapreduce.map.output.compress=true crawl//crawldb crawl//segments -topN 50000 -numFetchers 1 -noFilter
Generator: starting at 2018-06-28 09:21:04
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: false
Generator: normalizing: true
Generator: topN: 50000
Generator: 0 records selected for fetching, exiting ...
Generate returned 1 (no new segments created)
Escaping loop: no more URLs to fetch now
`
在第一次迭代期间我遇到了这个错误
`Authorization challenge processed
No form element found with 'id' = adminRole, trying 'name'.
No form element found with 'id' = adminRole, trying 'name'.
No form element found with 'name' = adminRole
No form element found with 'name' = adminRole
Supported authentication schemes in the order of preference: [ntlm, digest, basic]
Supported authentication schemes in the order of preference: [ntlm, digest, basic]
Challenge for ntlm authentication scheme not available
Challenge for ntlm authentication scheme not available
Challenge for digest authentication scheme not available
basic authentication scheme selected
Using authentication scheme: basic
Authorization challenge processed
No form element found with 'id' = adminRole, trying 'name'.
No form element found with 'name' = adminRole
Failed to get protocol output
java.lang.RuntimeException: java.lang.IllegalArgumentException: No form exists: adminRole
at org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:506)
at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:183)
at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:276)
at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:342)
Caused by: java.lang.IllegalArgumentException: No form exists: adminRole
at org.apache.nutch.protocol.httpclient.HttpFormAuthentication.getLoginFormParams(HttpFormAuthentication.java:219)
at org.apache.nutch.protocol.httpclient.HttpFormAuthentication.login(HttpFormAuthentication.java:95)
at org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:504)
... 3 more
Challenge for digest authentication scheme not available
basic authentication scheme selected
Using authentication scheme: basic
Authorization challenge processed
No form element found with 'id' = adminRole, trying 'name'.
`
它仍然是 security.json
文件。你有什么想法吗?谢谢
已解决
我通过在 /nutch/conf/
中配置 httpclient-auth.xml
解决了这个问题
<auth-configuration>
<credentials username="solr" password="xxxxx">
<authscope host="localhost" port="8983"/>
</credentials>
</auth-configuration>
感谢您的帮助
早上好,
我来找你是因为 Nutch (1.14)
和 Solr (7.2)
所以在我安装 SSL 之前一切正常。
在 http 中使用 Solr,抓取完成后我执行此命令
bin/nutch index -Dsolr.server.url=http://127.0.0.1:8983/solr/CORENAME crawltest/crawldb/ -linkdb crawltest/linkdb/ crawltest/segments/* -filter -normalize -deleteGone
而且效果很好
但是一旦开启了SSL,solr服务器在HTTPS下,就无法将数据发送给solr了。 我在 nutch 站点中添加了以下属性
<name>solr.auth</name>
<value>true</value>
<property>
<name>solr.auth.username</name>
<value>xxxx</value>
<property>
<name>solr.auth.password</name>
<value>xxxx</value>
property>
<name>solr.server.type</name>
<value>https</value>
property>
<name>solr.server.url</name>
<value>https://127.0.0.1:8983/solr/CORENAME</value>
但是当我执行前面的命令时,我得到了这种类型的错误
Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://127.0.0.1:8983/solr/CORENAME
&
caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
&
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
您是否成功将数据发送到 HTTPS solr? 谢谢
编辑 按照 SSL 程序 https://lucene.apache.org/solr/guide/7_0/enabling-ssl.html
修复此错误最后执行这个
keytool -import -file /path/to/solr/solr-ssl.pem -alias solr_cert -keystore /path/to/java-cacert
(jre/lib/security/cacerts)
默认密码是 changeit
进步了一点,在cacerts中导入证书后,我就没有再出现这个错误了。
仍然在相同的上下文中,在 solr 服务器上激活 SSL 和身份验证之后。我使用 Nutch 来抓取 url 并将数据发送到 solr。 由于 SSL 的实施,我无法再向 SOLR 发送数据。
当我执行this bin/nutch index -Dsolr.server.url=https://localhost:8983/solr/CORE -Dsolr.auth=true -Dsolr.auth.username='solr' -Dsolr.auth.password='xxxx' crawltest/crawldb/ -linkdb crawltest/linkdb/ crawltest/segments/* -filter -normalize -deleteGone
我有以下两个错误:
java.lang.Exception: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://localhost:8983/solr/CORE: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 401 Unauthorized</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /solr/CORE/update. Reason:
<pre> Unauthorized</pre></p>
</body>
</html>
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://localhost:8983/solr/CORE: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 401 Unauthorized</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /solr/CORE/update. Reason:
<pre> Unauthorized</pre></p>
</body>
</html>
编辑: 第一个错误是由于身份验证错误。 填写正确的值后,我有一个新的错误,我不明白。你有什么想法吗?
2018-06-20 09:47:18,116 INFO regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
2018-06-20 09:47:19,151 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2018-06-20 09:47:19,194 INFO solr.SolrMappingReader - source: content dest: content
2018-06-20 09:47:19,194 INFO solr.SolrMappingReader - source: title dest: title
2018-06-20 09:47:19,194 INFO solr.SolrMappingReader - source: host dest: host
2018-06-20 09:47:19,194 INFO solr.SolrMappingReader - source: segment dest: segment
2018-06-20 09:47:19,194 INFO solr.SolrMappingReader - source: boost dest: boost
2018-06-20 09:47:19,195 INFO solr.SolrMappingReader - source: digest dest: digest
2018-06-20 09:47:19,195 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2018-06-20 09:47:19,525 INFO solr.SolrIndexWriter - Indexing 250/250 documents
2018-06-20 09:47:19,525 INFO solr.SolrIndexWriter - Deleting 0 documents
2018-06-20 09:47:19,808 INFO solr.SolrIndexWriter - Indexing 250/250 documents
2018-06-20 09:47:19,809 INFO solr.SolrIndexWriter - Deleting 0 documents
2018-06-20 09:47:19,951 WARN mapred.LocalJobRunner - job_local146539832_0001
java.lang.Exception: java.io.IOException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.io.IOException
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.makeIOException(SolrIndexWriter.java:234)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:213)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:174)
at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:87)
at org.apache.nutch.indexer.IndexerOutputFormat.write(IndexerOutputFormat.java:50)
at org.apache.nutch.indexer.IndexerOutputFormat.write(IndexerOutputFormat.java:41)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:493)
at org.apache.hadoop.mapred.ReduceTask.collect(ReduceTask.java:422)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:369)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:57)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://localhost:8983/solr/ESRF-EXTERNAL
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:589)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:240)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:229)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:210)
... 16 more
Caused by: java.net.SocketException: Broken pipe (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:886)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:857)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:181)
at org.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:115)
at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:146)
at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:96)
at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:112)
at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:237)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:122)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:481)
... 20 more
2018-06-20 09:47:20,873 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:873)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239)
堆栈跟踪
name cpuTime / userTime
process reaper (37)
java.util.concurrent.SynchronousQueue$TransferStack@24197386
1.8587ms
0.0000ms
process reaper (36)
java.util.concurrent.SynchronousQueue$TransferStack@24197386
1.2672ms
0.0000ms
Scheduler-201556483 (31) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6202c0c1
1.1534ms
0.0000ms
searcherExecutor-7-thread-1 (30) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@663d5859
63.2030ms
50.0000ms
DestroyJavaVM (27)
1164.4748ms
1040.0000ms
Thread-12 (25)
java.lang.Object@233fcafa
0.1211ms
0.0000ms
Connection evictor (23)
0.9319ms
0.0000ms
Connection evictor (22)
2.0995ms
0.0000ms
org.eclipse.jetty.server.session.HashSessionManager@1a052a00Timer (21) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6626f9cf
4.2127ms
0.0000ms
qtp2012232625-20 (20) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@12d62902
56.7955ms
50.0000ms
qtp2012232625-19 (19)
47.6864ms
40.0000ms
qtp2012232625-18 (18) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@12d62902
79.3320ms
70.0000ms
qtp2012232625-17 (17) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@12d62902
100.9593ms
90.0000ms
qtp2012232625-16-acceptor-0@2d033cc4-ServerConnector@23c4c714{SSL,[ssl, http/1.1]}{0.0.0.0:8983} (16)
4.5898ms
0.0000ms
qtp2012232625-15 (15) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@12d62902
73.3096ms
60.0000ms
qtp2012232625-14 (14) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@12d62902
18.7950ms
10.0000ms
qtp2012232625-13 (13)
79.7804ms
70.0000ms
qtp2012232625-12 (12)
70.2385ms
60.0000ms
qtp2012232625-11 (11) java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@12d62902
22.1012ms
10.0000ms
ShutdownMonitor (10)
0.3055ms
0.0000ms
Signal Dispatcher (5)
0.0873ms
0.0000ms
Finalizer (3)
java.lang.ref.ReferenceQueue$Lock@1e254491
8.2575ms
0.0000ms
Reference Handler (2)
java.lang.ref.Reference$Lock@431035b5
6.3846ms
0.0000ms
EDIT2 为了测试我禁用了身份验证以查看问题是否不是来自 https。无需身份验证即可使用! 我试图更改文件并将其包含在 jetty-https.xml 而不是 jetty.xml.
我有 2 个帐户这样配置
<security-constraint>
<web-resource-collection>
<web-resource-name>Solr authenticated application</web-resource-name>
<url-pattern>/</url-pattern>
</web-resource-collection>
<auth-constraint>
<role-name>admin</role-name>
</auth-constraint>
</security-constraint>
<login-config>
<auth-method>BASIC</auth-method>
<realm-name>Test Realm</realm-name>
</login-config>
security.json
{
"authentication":{
"blockUnknown": true,
"class":"solr.BasicAuthPlugin",
"credentials":{"solr":"xxxx"}
},
"authorization":{
"class":"solr.RuleBasedAuthorizationPlugin",
"permissions":[{"name":"security-edit",
"role":"admin"}],
"user-role":{"solr":"admin"}
}}
当我执行以下命令时
bin/nutch index -Dsolr.server.url=https://localhost:8983/solr/MYCORE -Dsolr.auth=true -Dsolr.auth.username='admin' -Dsolr.auth.password='xxxx' crawltest/crawldb/ -linkdb crawltest/linkdb/ crawltest/segments/* -filter -normalize -deleteGone
我收到这个错误
java.lang.Exception: java.io.IOException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.io.IOException
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.makeIOException(SolrIndexWriter.java:234)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:213)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:174)
at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:87)
at org.apache.nutch.indexer.IndexerOutputFormat.write(IndexerOutputFormat.java:50)
at org.apache.nutch.indexer.IndexerOutputFormat.write(IndexerOutputFormat.java:41)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:493)
at org.apache.hadoop.mapred.ReduceTask.collect(ReduceTask.java:422)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:369)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:57)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://localhost:8983/solr/MYCORE
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:589)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:240)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:229)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:210)
... 16 more
Caused by: java.net.SocketException: Broken pipe (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:886)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:857)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:181)
at org.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:115)
at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:146)
at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:96)
at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:112)
at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:237)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:122)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:481)
... 20 more
2018-06-25 09:38:41,870 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:873)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239)
当我执行此操作时
bin/nutch index -Dsolr.server.url=https://localhost:8983/solr/MYCORE -Dsolr.auth=true -Dsolr.auth.username='solr' -Dsolr.auth.password='xxxxx' crawltest/crawldb/ -linkdb crawltest/linkdb/ crawltest/segments/* -filter -normalize -deleteGone
我收到这个错误
java.lang.Exception: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://localhost:8983/solr/MYCORE: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 401 Unauthorized</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /solr/MYCORE/update. Reason:
<pre> Unauthorized</pre></p>
</body>
</html>
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://localhost:8983/solr/MYCORE: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 401 Unauthorized</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /solr/MYCORE/update. Reason:
<pre> Unauthorized</pre></p>
</body>
</html>
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:544)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:240)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:229)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:210)
at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:174)
at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:87)
at org.apache.nutch.indexer.IndexerOutputFormat.write(IndexerOutputFormat.java:50)
at org.apache.nutch.indexer.IndexerOutputFormat.write(IndexerOutputFormat.java:41)
at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:493)
at org.apache.hadoop.mapred.ReduceTask.collect(ReduceTask.java:422)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:369)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:57)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-06-25 09:45:20,106 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:873)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239)
或者现在这个:
java.lang.Exception: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://127.0.0.1:8983/solr/MYCORE: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 503 </title>
</head>
<body>
<h2>HTTP ERROR: 503</h2>
<p>Problem accessing /solr/MYCORE/update. Reason:
<pre> Service Unavailable</pre></p>
<hr />
</body>
</html>
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://127.0.0.1:8983/solr/MYCORE: Expected mime type application/octet-stream but got text/html. <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 503 </title>
</head>
<body>
<h2>HTTP ERROR: 503</h2>
<p>Problem accessing /solr/MYCORE/update. Reason:
<pre> Service Unavailable</pre></p>
<hr />
</body>
</html>
Solr 日志:
2018-06-25 14:18:44.352 INFO (main) [ ] o.e.j.s.Server jetty-9.3.20.v20170531
2018-06-25 14:18:44.597 WARN (main) [ ] o.e.j.w.WebAppContext Failed startup of context o.e.j.w.WebAppContext@5891e32e{/solr,file:///app/solr-7.2.1/server/solr-webapp/webapp/,UNAVAILABLE}{/app/solr-7.2.1/server/solr-webapp/webapp}
java.lang.IllegalStateException: No LoginService for org.eclipse.jetty.security.authentication.BasicAuthenticator@64c87930 in org.eclipse.jetty.security.ConstraintSecurityHandler@400cff1a
at org.eclipse.jetty.security.authentication.LoginAuthenticator.setConfiguration(LoginAuthenticator.java:76)
at org.eclipse.jetty.security.SecurityHandler.doStart(SecurityHandler.java:354)
at org.eclipse.jetty.security.ConstraintSecurityHandler.doStart(ConstraintSecurityHandler.java:448)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:105)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
at org.eclipse.jetty.server.handler.ScopedHandler.doStart(ScopedHandler.java:120)
at org.eclipse.jetty.server.session.SessionHandler.doStart(SessionHandler.java:116)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:105)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
at org.eclipse.jetty.server.handler.ScopedHandler.doStart(ScopedHandler.java:120)
at org.eclipse.jetty.server.handler.ContextHandler.startContext(ContextHandler.java:809)
at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:345)
at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1406)
at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1368)
at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:778)
at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:262)
at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:522)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:41)
at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:188)
at org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:499)
at org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:147)
at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:180)
at org.eclipse.jetty.deploy.providers.WebAppProvider.fileAdded(WebAppProvider.java:458)
at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:64)
at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:610)
at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:529)
at org.eclipse.jetty.util.Scanner.scan(Scanner.java:392)
at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:313)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:150)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:561)
at org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:236)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at org.eclipse.jetty.server.Server.start(Server.java:422)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:113)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
at org.eclipse.jetty.server.Server.doStart(Server.java:389)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.xml.XmlConfiguration.run(XmlConfiguration.java:1520)
at java.security.AccessController.doPrivileged(Native Method)
at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1442)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jetty.start.Main.invokeMain(Main.java:215)
at org.eclipse.jetty.start.Main.start(Main.java:458)
at org.eclipse.jetty.start.Main.main(Main.java:76)
2018-06-25 14:18:44.745 INFO (main) [ ] o.e.j.s.Server Started @799ms
我通过在 /var/solr/data/ (SOLR_HOME)
上移动 security.json 解决了 "No LoginService"编辑 3: 现在,当我想将 nutch 数据发送到 solr 时,我只会收到一条错误消息 "No allow"。我也无法再连接到管理界面,我得到了同样的错误。我认为它来自 security.json 文件
{
"authentication":{
"class":"solr.BasicAuthPlugin",
"credentials":{"solr":"xxxxxx"}
},
"authorization":{
"class":"solr.RuleBasedAuthorizationPlugin"
"permissions":[{"name":"security-edit","role":"adminRole"},{"name":"collection-admin-edit","role":"adminRole"},{"name":"update","role":"adminRole"},{"name":"all","role":"adminRole"},{"name":"core-admin-edit","role":"adminRole"},{"name":"read","role":"adminRole"},{"name":"config-edit","role":"adminRole"},{"name":"core-admin-read","role":"adminRole"},{"name":"core-admin-read","role":"adminRole"}]
"user-role":{"solr":"adminRole"}
}}
我做错了什么?谢谢
我添加了一个新的答案,因为之前的答案太长了
已解决 API 身份验证但不使用 NUTCH:
因此,为了通过 API 进行身份验证,我删除了 jetty-https.xml
、webdefault.xml
中的配置,并删除了 realm.properties
文件以及 [=18] 中的基本身份验证选项=]
我只在 SOLR HOME
security.json
文件
其实最大的问题是我没有使用加密密码来测试连接,没有它就无法连接。另一方面,我仍然有 nutch 的问题,这是不允许的。
这是 security.json
文件
`{
"authentication":{
"class":"solr.BasicAuthPlugin",
"credentials":{
"solr":"hzMjhfgN4b9X8KR0QgLB2Um3cUzqDzJygtEBL/O7g5E= CkP7HyXjYvqKNF3F4hBjnVvKGQOkLc/ta4FaNIkqgII="
}
},
"authorization":{
"class":"solr.RuleBasedAuthorizationPlugin",
"permissions":[
{
"name":"security-edit",
"role":"adminRole"
},
{
"name":"collection-admin-edit",
"role":"adminRole"
},
{
"name":"update",
"role":"adminRole"
},
{
"name":"config-edit",
"role":"adminRole"
},
{
"name":"core-admin-edit",
"role":"adminRole"
},
{
"name":"core-admin-read",
"role":"adminRole"
{
"name":"schema-edit",
"role":"adminRole"
},
{
"name":"all",
"role":"adminRole"
}
],
"user-role":{
"solr":"adminRole"
}
}
}
`
加密后的密码表示值"test"
为了测试文件代码,我建议这样 http://json.parser.online.fr/
为了能够使用 nutch 更新 solr,我可能错过了什么?
已解决 添加数据导入的更新角色路径
{
"name":"update",
"path":"/dataimport",
"role":"adminRole"
},
但是现在我可以将 nutch 索引到 solr 但是我在爬行时遇到了一个新错误...
`Thu Jun 28 09:21:03 CEST 2018 : Iteration 2 of 5
Generating a new segment
/app/nutch-external/bin/nutch generate -D mapreduce.job.reduces=2 -D mapred.child.java.opts=-Xmx1000m -D mapreduce.reduce.speculative=false -D mapreduce.map.speculative=false -D mapreduce.map.output.compress=true crawl//crawldb crawl//segments -topN 50000 -numFetchers 1 -noFilter
Generator: starting at 2018-06-28 09:21:04
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: false
Generator: normalizing: true
Generator: topN: 50000
Generator: 0 records selected for fetching, exiting ...
Generate returned 1 (no new segments created)
Escaping loop: no more URLs to fetch now
`
在第一次迭代期间我遇到了这个错误
`Authorization challenge processed
No form element found with 'id' = adminRole, trying 'name'.
No form element found with 'id' = adminRole, trying 'name'.
No form element found with 'name' = adminRole
No form element found with 'name' = adminRole
Supported authentication schemes in the order of preference: [ntlm, digest, basic]
Supported authentication schemes in the order of preference: [ntlm, digest, basic]
Challenge for ntlm authentication scheme not available
Challenge for ntlm authentication scheme not available
Challenge for digest authentication scheme not available
basic authentication scheme selected
Using authentication scheme: basic
Authorization challenge processed
No form element found with 'id' = adminRole, trying 'name'.
No form element found with 'name' = adminRole
Failed to get protocol output
java.lang.RuntimeException: java.lang.IllegalArgumentException: No form exists: adminRole
at org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:506)
at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:183)
at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:276)
at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:342)
Caused by: java.lang.IllegalArgumentException: No form exists: adminRole
at org.apache.nutch.protocol.httpclient.HttpFormAuthentication.getLoginFormParams(HttpFormAuthentication.java:219)
at org.apache.nutch.protocol.httpclient.HttpFormAuthentication.login(HttpFormAuthentication.java:95)
at org.apache.nutch.protocol.httpclient.Http.resolveCredentials(Http.java:504)
... 3 more
Challenge for digest authentication scheme not available
basic authentication scheme selected
Using authentication scheme: basic
Authorization challenge processed
No form element found with 'id' = adminRole, trying 'name'.
`
它仍然是 security.json
文件。你有什么想法吗?谢谢
已解决
我通过在 /nutch/conf/
httpclient-auth.xml
解决了这个问题
<auth-configuration>
<credentials username="solr" password="xxxxx">
<authscope host="localhost" port="8983"/>
</credentials>
</auth-configuration>
感谢您的帮助