How do I get CAS to update a small subset of record properties during a partial update?
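I'm on Oracle Commerce 11.1, running a CAS-only application (no Forge).
Baseline updates work fine. My problem is with partial updates.
We have an extract file containing the subset of records that need updating. However, the file lists only a few properties for each record (that is, only the properties that actually changed).
When I run a partial update (using the default mechanism that ships with the CAS-only deployment template), it completes successfully, but the updated records contain only the fields supplied in the file; every unchanged field is simply gone. It is as if CAS replaced each existing record (with its full property set) with a new record containing only the handful of properties from the extract file.
For example, say one of the records looks like this: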
Record 23
---------
id 23
name Test
inventoryCount 23
buyable 1
imageUrl test.jpg
and the partial extract file has an entry like this:
Record 23
---------
id 23
inventoryCount 10
The result I get after the partial update is this:
Record 23
---------
id 23
inventoryCount 10
I'd like to know how to get CAS to keep those properties instead of dropping them. I know Forge can do this.
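To make the difference concrete, here is a minimal sketch in plain Java. It does not use any Endeca API; the class and method names (`PartialUpdateSemantics`, `replace`, `merge`) are made up for illustration, and it assumes every property is single-valued. `replace` models what I am seeing, `merge` models what I want.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartialUpdateSemantics {

    // Observed behaviour: the incoming record becomes the whole record,
    // so any property missing from the extract is lost.
    static Map<String, String> replace(Map<String, String> existing, Map<String, String> incoming) {
        return new LinkedHashMap<>(incoming);
    }

    // Desired behaviour: incoming values overwrite matching properties,
    // and every other existing property is kept.
    static Map<String, String> merge(Map<String, String> existing, Map<String, String> incoming) {
        Map<String, String> result = new LinkedHashMap<>(existing);
        result.putAll(incoming);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new LinkedHashMap<>();
        existing.put("id", "23");
        existing.put("name", "Test");
        existing.put("inventoryCount", "23");
        existing.put("buyable", "1");
        existing.put("imageUrl", "test.jpg");

        Map<String, String> incoming = new LinkedHashMap<>();
        incoming.put("id", "23");
        incoming.put("inventoryCount", "10");

        // prints: replace: {id=23, inventoryCount=10}
        System.out.println("replace: " + replace(existing, incoming));
        // prints: merge: {id=23, name=Test, inventoryCount=10, buyable=1, imageUrl=test.jpg}
        System.out.println("merge: " + merge(existing, incoming));
    }
}
```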
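I've confirmed that there is no really straightforward mechanism for doing this, so I invented my own.
To summarize how it works: I customized the PartialUpdate beanshell script so that, right after the last-mile crawl runs, it calls a custom component I created called DGIDXTransformer (i.e. it extends CustomComponent). The class unzips and parses the file created by the last-mile crawl, which is meant to be fed into DGIDX, and writes out a modified version of that file. Specifically, it rewrites every update entry so that the record is updated rather than replaced with the new properties. It's a bit hacky, since the DGIDX input file format isn't documented, but based on my research the format is unlikely to change much in future Endeca releases.
Here is DGIDXTransformer: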
// Package matches the class attribute in the custom-component definition
package com.chairbender;

import com.endeca.soleng.eac.toolkit.component.*;
import java.io.*;
import java.nio.file.Files;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/**
 * Custom component which runs during the PartialUpdate beanshell script. It transforms the DGIDX-compatible input file
 * that CAS produces so that records will be updated instead of replaced.
 *
 * Expects only one property entry called "dgidxInputFileDirectory", specifying the directory to look in to
 * find the file to transform (relative to the config directory).
 *
 * @author chairbender
 */
public class DGIDXTransformer extends CustomComponent {
    private static final String DGIDX_INPUT_FILE_DIRECTORY_PROPERTY_NAME = "dgidxInputFileDirectory";
    private static final String RECORD_SPEC_PROPERTY_NAME = "record.spec";

    /**
     * Does the transformation as specified in the class javadoc.
     */
    public void transformDGIDXInputFileToUpdateInsteadOfReplace() throws Exception {
        // Look up the directory that holds the last-mile crawl output
        Map<String, String> properties = getProperties();
        if (properties == null || !properties.containsKey(DGIDX_INPUT_FILE_DIRECTORY_PROPERTY_NAME)) {
            throw new Exception("Missing required property: " + DGIDX_INPUT_FILE_DIRECTORY_PROPERTY_NAME);
        }
        File directory = new File(properties.get(DGIDX_INPUT_FILE_DIRECTORY_PROPERTY_NAME));
        File[] gzipFiles = directory.listFiles(new FilenameFilter() {
            @Override
            public boolean accept(File dir, String name) {
                return name.endsWith(".xml.gz");
            }
        });
        if (gzipFiles == null || gzipFiles.length == 0) {
            throw new Exception("No .xml.gz file found in " + directory.getAbsolutePath());
        }
        // Only the first matching file is transformed
        File gzipFile = gzipFiles[0];
        File unzippedFile = unzipFile(gzipFile);
        // The transformed copy is written uncompressed, next to the original
        transformInputFile(unzippedFile, unzippedFile.getAbsolutePath().replace(".xml", "transformed.xml"));
        // Delete the extra files in a way that throws an exception if deletion fails
        Files.delete(gzipFile.toPath());
        Files.delete(unzippedFile.toPath());
    }

    /**
     * Gzips the passed file and saves it at the specified location.
     *
     * @param toGzip file to gzip
     * @param outputPath where to output the gzipped file
     */
    private void gzipFile(File toGzip, String outputPath) throws IOException {
        byte[] buffer = new byte[1024];
        // try-with-resources closes both streams even if a read or write fails;
        // closing the GZIPOutputStream also writes the gzip trailer
        try (FileInputStream inputStream = new FileInputStream(toGzip);
             GZIPOutputStream gzipOutputStream =
                     new GZIPOutputStream(new FileOutputStream(outputPath, false))) {
            int len;
            while ((len = inputStream.read(buffer)) != -1) {
                gzipOutputStream.write(buffer, 0, len);
            }
        }
    }

    /**
     * Rewrites the DGIDX input so that records are updated instead of replaced.
     *
     * @param unzippedFile file representing DGIDX input data to transform
     * @param transformedFilePath path where transformed file should go.
     * @return the transformed file
     */
    private File transformInputFile(File unzippedFile, String transformedFilePath) throws IOException {
        File outputFile = new File(transformedFilePath);
        // The XML and the transformation are simple, so read the unzipped file
        // line by line and write the output as we go
        try (BufferedReader unzippedFileReader = new BufferedReader(new FileReader(unzippedFile));
             BufferedWriter outputFileWriter = new BufferedWriter(new FileWriter(outputFile))) {
            String nextLine;
            while ((nextLine = unzippedFileReader.readLine()) != null) {
                if (nextLine.contains("RECORD_ADD_OR_REPLACE")) {
                    // Change RECORD_ADD_OR_REPLACE to RECORD_UPDATE
                    outputFileWriter.write(nextLine.replace("RECORD_ADD_OR_REPLACE", "RECORD_UPDATE"));
                } else if (nextLine.contains("<PROP NAME=")) {
                    // A <PROP NAME="..."> element is rewritten unless it names the record spec
                    String propertyName = nextLine.split("\"")[1];
                    if (!propertyName.equals(RECORD_SPEC_PROPERTY_NAME)) {
                        // Read the property value from the next line
                        String propertyValueLine = unzippedFileReader.readLine();
                        String propertyValue = propertyValueLine.replace("<PVAL>", "").replace("</PVAL>", "").trim();
                        // Write the PVAL_DELETE and PVAL_ADD entries
                        outputFileWriter.write("<PVAL_DELETE><PROPERTY_NAME NAME=\"" + propertyName + "\"/></PVAL_DELETE>");
                        outputFileWriter.newLine();
                        outputFileWriter.write("<PVAL_ADD><PROP NAME=\"" + propertyName + "\"><PVAL>" + propertyValue + "</PVAL></PROP></PVAL_ADD>");
                        // Discard the closing element line of the input file
                        unzippedFileReader.readLine();
                    } else {
                        // It is the record spec, so pass it through unchanged
                        outputFileWriter.write(nextLine);
                    }
                } else {
                    // Just output the line
                    outputFileWriter.write(nextLine);
                }
                // readLine() strips the line terminator, so put one back
                outputFileWriter.newLine();
            }
        }
        return outputFile;
    }

    /**
     * Un-gzips a file.
     *
     * @param gzipFile file to un-gzip. Will create the un-gzipped version in the same directory as gzipFile,
     * but without the ".gz" ending.
     * @return the unzipped version of the file.
     */
    private File unzipFile(File gzipFile) throws IOException {
        // Strip only the trailing ".gz", not every occurrence in the path
        File outputFile = new File(gzipFile.getAbsolutePath().replaceFirst("\\.gz$", ""));
        byte[] buffer = new byte[1024];
        // Un-gzip the file in one pass; both streams are closed even on failure
        try (GZIPInputStream gzipInputStream = new GZIPInputStream(new FileInputStream(gzipFile));
             FileOutputStream outputStream = new FileOutputStream(outputFile)) {
            int len;
            while ((len = gzipInputStream.read(buffer)) != -1) {
                outputStream.write(buffer, 0, len);
            }
        }
        return outputFile;
    }
}
This gets compiled into a JAR that goes in config/lib/java.
Here is the custom component definition in DataIngest.xml:
<custom-component id="DGIDXTransformer" host-id="ITLHost" class="com.chairbender.DGIDXTransformer">
    <properties>
        <property name="dgidxInputFileDirectory" value="../data/cas_output" />
    </properties>
</custom-component>
And here is the relevant part of the customized PartialUpdate script:
CAS.runIncrementalCasCrawl("${lastMileCrawlName}");
DGIDXTransformer.transformDGIDXInputFileToUpdateInsteadOfReplace();
CAS.archiveDvalIdMappingsForCrawlIfChanged("${lastMileCrawlName}");
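As a way to sanity-check the rewrite logic without a running Endeca install, here is a small in-memory sketch of the same line-by-line transformation. The class name `RewriteSketch` and the sample input are hypothetical. In particular, the one-element-per-line layout of the sample is only what the parsing in DGIDXTransformer implies, since the real file format is undocumented, so treat the sample as an assumption and compare it against an actual file from your own crawl output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RewriteSketch {
    static final String RECORD_SPEC = "record.spec";

    // Applies the same rules as transformInputFile, but on a list of lines.
    static List<String> rewrite(List<String> input) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = input.iterator();
        while (it.hasNext()) {
            String line = it.next();
            if (line.contains("RECORD_ADD_OR_REPLACE")) {
                // Opening and closing record tags both become RECORD_UPDATE
                out.add(line.replace("RECORD_ADD_OR_REPLACE", "RECORD_UPDATE"));
            } else if (line.contains("<PROP NAME=")) {
                String name = line.split("\"")[1];
                if (name.equals(RECORD_SPEC)) {
                    // The record spec identifies the record; leave it alone
                    out.add(line);
                } else {
                    // Assumed layout: value on the next line, then </PROP>
                    String value = it.next().replace("<PVAL>", "").replace("</PVAL>", "").trim();
                    it.next(); // skip the closing </PROP> line
                    out.add("<PVAL_DELETE><PROPERTY_NAME NAME=\"" + name + "\"/></PVAL_DELETE>");
                    out.add("<PVAL_ADD><PROP NAME=\"" + name + "\"><PVAL>" + value + "</PVAL></PROP></PVAL_ADD>");
                }
            } else {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, one element per line
        List<String> sample = Arrays.asList(
                "<RECORD_ADD_OR_REPLACE>",
                "<PROP NAME=\"record.spec\">",
                "<PVAL>23</PVAL>",
                "</PROP>",
                "<PROP NAME=\"inventoryCount\">",
                "<PVAL>10</PVAL>",
                "</PROP>",
                "</RECORD_ADD_OR_REPLACE>");
        for (String line : rewrite(sample)) {
            System.out.println(line);
        }
    }
}
```

With the sample above, the record spec element passes through untouched, while `inventoryCount` turns into a delete of the old value followed by an add of the new one, which is what lets the other properties on the record survive.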