CloudSearch deleteByQuery

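The official Solr Java API has a deleteByQuery operation that deletes every document matching a query. The AWS CloudSearch SDK doesn't seem to have matching functionality. Am I just not seeing the deleteByQuery equivalent, or is this something we need to roll ourselves?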

Something like this:

SearchRequest searchRequest = new SearchRequest();
searchRequest.setQuery(queryString);
searchRequest.setReturn("id,version");
SearchResult searchResult = awsCloudSearch.search(searchRequest);
JSONArray docs = new JSONArray();
for (Hit hit : searchResult.getHits().getHit()) {
    JSONObject doc = new JSONObject();
    doc.put("id", hit.getId());
    // is version necessary?
    doc.put("version", hit.getFields().get("version").get(0));
    doc.put("type", "delete");
    docs.put(doc);
}
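UploadDocumentsRequest uploadDocumentsRequest = new UploadDocumentsRequest();
byte[] payload = docs.toString().getBytes(StandardCharsets.UTF_8);
uploadDocumentsRequest.setDocuments(new ByteArrayInputStream(payload));
// The upload also needs the batch format and its size in bytes.
uploadDocumentsRequest.setContentType("application/json");
uploadDocumentsRequest.setContentLength((long) payload.length);
UploadDocumentsResult uploadResult = awsCloudSearch.uploadDocuments(uploadDocumentsRequest);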

Is this reasonable? Is there a simpler way?

You're right, CloudSearch has no equivalent to deleteByQuery. Your approach looks like the next best thing.

And no, version is not necessary: it was removed with the CloudSearch 2013-01-01 API (aka v2).

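CloudSearch does not provide delete-by-query. It handles deletion slightly differently: you build JSON objects that contain only the ID of the document to delete, with the operation type set to delete. These JSON objects can be batched together, but the batch must be smaller than 5 MB.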
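
To make the batching concrete, here is a minimal Java sketch of how a list of IDs could be turned into delete batches that stay under the size limit. It makes no AWS SDK calls. The class and method names (DeleteBatcher, deleteOp, buildBatches) are hypothetical, and reading "5 MB" as 5 * 1024 * 1024 bytes is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any AWS SDK.
class DeleteBatcher {
    // Assumption: the 5 MB batch limit is read as 5 * 1024 * 1024 bytes.
    static final int MAX_BATCH_BYTES = 5 * 1024 * 1024;

    /** Escapes quotes, backslashes, and control characters for a JSON string. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds one delete operation: only "type" and "id", no version. */
    static String deleteOp(String id) {
        return "{\"type\":\"delete\",\"id\":\"" + jsonEscape(id) + "\"}";
    }

    /**
     * Splits the ids into JSON array batches of at most maxBytes each
     * (measured as UTF-8). A single operation larger than maxBytes would
     * still be emitted as its own oversized batch.
     */
    static List<String> buildBatches(List<String> ids, int maxBytes) {
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder("[");
        int currentBytes = 1; // the opening '['
        for (String id : ids) {
            String op = deleteOp(id);
            int opBytes = op.getBytes(StandardCharsets.UTF_8).length;
            boolean empty = current.length() == 1;
            // Size if this op is added: optional comma, the op, closing ']'.
            int needed = currentBytes + (empty ? 0 : 1) + opBytes + 1;
            if (!empty && needed > maxBytes) {
                // Close the full batch and start a new one.
                batches.add(current.append(']').toString());
                current = new StringBuilder("[");
                currentBytes = 1;
                empty = true;
            }
            if (!empty) {
                current.append(',');
                currentBytes++;
            }
            current.append(op);
            currentBytes += opBytes;
        }
        if (current.length() > 1) {
            batches.add(current.append(']').toString());
        }
        return batches;
    }
}
```

Each string returned by buildBatches(ids, DeleteBatcher.MAX_BATCH_BYTES) is one complete batch and could be uploaded on its own, in the same way the question uploads docs.toString().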

The following class supports this. You just pass an array of the IDs to delete to its deleteDocs method:

use Aws\CloudSearch\CloudSearchClient;

class AWS_CS
{
    protected $client;

    function connect($domain)
    {
        try{
            $csClient = CloudSearchClient::factory(array(
                            'key'          => 'YOUR_KEY',
                            'secret'      => 'YOUR_SECRET',
                            'region'     =>  'us-east-1'

                        ));

            $this->client = $csClient->getDomainClient(
                        $domain,
                        array(
                            'credentials' => $csClient->getCredentials(),
                            'scheme' => 'HTTPS'
                        )
                    );
        }
        catch(Exception $ex){
            echo "Exception: ";
            echo $ex->getMessage();
        }
        //$this->client->addSubscriber(LogPlugin::getDebugPlugin());        
    }
    function search($queryStr, $domain){

        $this->connect($domain);

        $result = $this->client->search(array(
            'query' => $queryStr,
            'queryParser' => 'lucene',
            'size' => 100,
            'return' => '_score,_all_fields'
            ))->toArray();

        return json_encode($result['hits']);
        //$hitCount = $result->getPath('hits/found');
        //echo "Number of Hits: {$hitCount}\n";
    }

    // Uploads a batch of delete operations. Call connect($domain) first so $this->client is set.
    function deleteDocs($idArray, $operation = 'delete'){

        $batch = array();

        foreach($idArray as $id){
            $batch[] = array(
                        'type'        => $operation,
                        'id'        => $id);
        }
        $batch = array_filter($batch);
        $jsonObj = json_encode($batch, JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP);

        // Keep the response so its status can be checked below.
        $result = $this->client->uploadDocuments(array(
                        'documents'     => $jsonObj,
                        'contentType'     =>'application/json'
                    ));
        print_r($result);

        // Returns the length of the uploaded JSON, or 0 if the upload did not succeed.
        return $result['status'] == 'success' ? mb_strlen($jsonObj) : 0;
    }
}

Modified for C#: deleting uploaded documents in CloudSearch

    public void DeleteUploadedDocuments(string location)
    {
        // Assumes the resourcename field holds the file path passed in as location.
        SearchRequest searchRequest = new SearchRequest { Query = "resourcename:'" + location + "'", QueryParser = QueryParser.Lucene, Size = 10000 };
        AmazonCloudSearchDomainClient searchClient = new AmazonCloudSearchDomainClient( ConfigurationManager.AppSettings["awsAccessKeyId"]  ,  ConfigurationManager.AppSettings["awsSecretAccessKey"]  , new AmazonCloudSearchDomainConfig { ServiceURL = ConfigurationManager.AppSettings["CloudSearchEndPoint"] });

        SearchResponse searchResponse = searchClient.Search(searchRequest);
        JArray docs = new JArray();

        foreach (Hit hit in searchResponse.Hits.Hit)
        {
            JObject doc = new JObject();
            doc.Add("id", hit.Id);
            doc.Add("type", "delete");
            docs.Add(doc);
        }

        UpdateIndexDocument<JArray>(docs, ConfigurationManager.AppSettings["CloudSearchEndPoint"]);
    }

    public void UpdateIndexDocument<T>(T document, string DocumentUrl)
    {
        AmazonCloudSearchDomainConfig config = new AmazonCloudSearchDomainConfig { ServiceURL = DocumentUrl };
        AmazonCloudSearchDomainClient searchClient = new AmazonCloudSearchDomainClient( ConfigurationManager.AppSettings["awsAccessKeyId"]  ,  ConfigurationManager.AppSettings["awsSecretAccessKey"]   , config);
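        // Serialize the batch to UTF-8 JSON. MemoryStream (System.IO) and Encoding (System.Text)
        // stand in for the GenerateStreamFromString helper, which is not shown here.
        using (Stream stream = new MemoryStream(Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(document))))
        {
            UploadDocumentsRequest upload = new UploadDocumentsRequest()
            {
                ContentType = "application/json",
                Documents = stream
            };
            searchClient.UploadDocuments(upload);
        }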

    }