How to search for a unique sequence in binary data?

Question

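I am trying to read a binary file that has a header. I know that certain information is stored after the unique sequence 02 06 08 22 02 02 08 00. How can I find the position of this unique sequence?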

I can use

String StreamReadAsText( ScriptObject stream, Number encoding, Number count )

to read the binary file one character at a time, but I suspect that is clumsy and slow.

Moreover, how do I compare the result of StreamReadAsText() when the output is not printable text (between 00 and 1F in the ASCII table)?

Then, how do I read a binary file as int8 (the same size as a character in a string)? For example, read 02, then 06, then 08, and so on.

Any help is welcome and appreciated.

Regards,

Roger

Answer 1

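If you are on a modern machine, just load the file into memory, then scan for the sequence using a memory-compare function and a moving index.

This is not the most memory-efficient way of doing it, nor even the fastest, but it is simple and fast enough, provided you have resources to burn.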
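
As a rough illustration of the moving-index scan described above, here is a minimal, untested sketch in dm-script. It assumes the bytes are already held in a 1D image (one value per pixel); FindBytes, haystack, and needle are hypothetical names chosen for this sketch.

```dm-script
// Hypothetical helper: naive scan with a moving index.
// Returns the index of the first match in 'data', or -1 if there is none.
number FindBytes( image data, image pattern )
{
    number n = data.ImageGetDimensionSize( 0 )
    number m = pattern.ImageGetDimensionSize( 0 )
    number i, j, isMatch
    for ( i = 0; i <= n - m; i++ )
    {
        // Compare the pattern against the window starting at i
        isMatch = 1
        for ( j = 0; j < m; j++ )
        {
            if ( data.GetPixel( i + j, 0 ) != pattern.GetPixel( j, 0 ) )
            {
                isMatch = 0
                break
            }
        }
        if ( isMatch ) return i
    }
    return -1
}

// Small in-memory check: the needle starts at index 3 of the haystack.
// Assumption: the byte values in the question are hexadecimal, so 22 is written as 34.
image haystack := [12]: { 9, 2, 6, 2, 6, 8, 34, 2, 2, 8, 0, 7 }
image needle   := [8]: { 2, 6, 8, 34, 2, 2, 8, 0 }
Result( "Pattern found at index: " + FindBytes( haystack, needle ) + "\n" )
```

The per-pixel loop is slow for large files in a scripting language, but it makes the idea explicit: try every start index and stop at the first full match.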

Answer 2

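You are already on the right track using the streaming commands to read the file. But why read the stream as text? You can read the stream as any (supported) number type by using a TagGroup object as a proxy with TagGroupReadTagDataFromStream().

The F1 help section that lists the streaming commands actually contains an example, which I simply copy here: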

 Object stream = NewStreamFromBuffer( NewMemoryBuffer( 256 ) )
 TagGroup tg = NewTagGroup();

 Number stream_byte_order = 1; // 1 == bigendian, 2 == littleendian
 Number v_uint32_0, v_uint32_1, v_sint32_0, v_uint16_0, v_uint16_1

 // Create the tags and initialize with default values
 tg.TagGroupSetTagAsUInt32( "UInt32_0", 0 )
 tg.TagGroupSetTagAsUInt32( "UInt32_1", 0 )
 tg.TagGroupSetTagAsLong( "SInt32_0", 0 )
 tg.TagGroupSetTagAsUInt16( "UInt16_0", 0 )
 tg.TagGroupSetTagAsUInt16( "UInt16_1", 0 )

 // Stream the data into the tags   
 TagGroupReadTagDataFromStream( tg, "UInt32_0", stream, stream_byte_order );
 TagGroupReadTagDataFromStream( tg, "UInt32_1", stream, stream_byte_order );
 TagGroupReadTagDataFromStream( tg, "SInt32_0", stream, stream_byte_order );
 TagGroupReadTagDataFromStream( tg, "UInt16_0", stream, stream_byte_order );
 TagGroupReadTagDataFromStream( tg, "UInt16_1", stream, stream_byte_order );

// Show the taggroup, if you want
// tg.TagGroupOpenBrowserWindow("AuxTags",0)

 // Get the data from the tags
 tg.TagGroupGetTagAsUInt32( "UInt32_0", v_uint32_0 )
 tg.TagGroupGetTagAsUInt32( "UInt32_1", v_uint32_1 )
 tg.TagGroupGetTagAsLong( "SInt32_0", v_sint32_0 )
 tg.TagGroupGetTagAsUInt16( "UInt16_0", v_uint16_0 )
 tg.TagGroupGetTagAsUInt16( "UInt16_1", v_uint16_1 )
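
To read a file one byte after another, as the question asks, one possible approach is to reuse only the calls from the help example: read unsigned 16-bit values in big-endian order from a file stream and split each value into its two bytes. This is an untested sketch; PrintLeadingBytes, the tag name "pair", and the splitting arithmetic are assumptions made for illustration and do not come from the help text.

```dm-script
// Hypothetical helper: print the leading bytes of a file, two per read
void PrintLeadingBytes( string filePath, number nBytes )
{
    number fileID = OpenFileForReading( filePath )
    object fStream = NewStreamFromFileReference( fileID, 1 )
    if ( nBytes > fStream.StreamGetSize() )
        Throw( "File '" + filePath + "' has less than " + nBytes + " bytes." )

    TagGroup tg = NewTagGroup()
    tg.TagGroupSetTagAsUInt16( "pair", 0 )

    number bigEndian = 1              // 1 == bigendian, as in the help example
    number nPairs = floor( nBytes / 2 ) // an odd trailing byte is ignored
    number i, value, firstByte, secondByte
    for ( i = 0; i < nPairs; i++ )
    {
        TagGroupReadTagDataFromStream( tg, "pair", fStream, bigEndian )
        tg.TagGroupGetTagAsUInt16( "pair", value )
        // Big-endian: the first byte in the file is the high byte
        firstByte  = floor( value / 256 )
        secondByte = value - 256 * firstByte
        Result( "byte " + ( 2 * i )     + ": " + firstByte  + "\n" )
        Result( "byte " + ( 2 * i + 1 ) + ": " + secondByte + "\n" )
    }
}

string demoPath = GetApplicationDirectory( "open_save", 0 )
if ( OpenDialog( NULL, "Select file to open", demoPath, demoPath ) )
    PrintLeadingBytes( demoPath, 16 )
```

The values are printed in decimal, so a byte shown as 22 in a hex viewer would appear as 34.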

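There is already a post on this site about searching for a pattern in a stream: it shows how you would use a stream to look through an image, but you can of course use a file stream directly.

As an alternative, you can read a whole array from the stream with ImageReadImageDataFromStream after preparing a suitable image beforehand. You can then use the image to search for the position. Here is an example: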

// Example of reading the first X bytes of a file
// as uInt16 data

image ReadHeaderAsUint16( string filepath, number nBytes )
{
    number kEndianness = 0 // Default byte order of the current platform
    if ( !DoesFileExist( filePath ) ) 
        Throw( "File '" + filePath + "' not found." )
    number fileID = OpenFileForReading( filePath )
    object fStream = NewStreamFromFileReference( fileID, 1 )
    if ( nBytes > fStream.StreamGetSize() ) 
        Throw( "File '" + filePath + "' has less than " + nBytes + " bytes." )

    image buff := IntegerImage( "Header", 2, 0, nBytes/2 )  // UINT16 array of suitable size
    ImageReadImageDataFromStream( buff, fStream, kEndianness )
    return buff 
}

number FindSignature( image header, image search )
{
    // 1D images only
    if (        ( header.ImageGetNumDimensions() != 1 ) \
            ||  ( search.ImageGetNumDimensions() != 1 ) )
        Throw( "Only 1D images supported" )

    number sx = search.ImageGetDimensionSize( 0 ) 
    number hx = header.ImageGetDimensionSize( 0 )
    if ( hx < sx )
        return -1

    // Create a mask of possible start locations
    number startV = search.getPixel( 0, 0 )
    image mask = (header == startV) ? 1 : 0

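    // Search all the occurrences, starting from the first
    number mx, my
    while( max( mask, mx, my ) )
    {
        // Only compare where the whole pattern still fits into the header
        if ( mx + sx <= hx )
        {
            // abs() prevents positive and negative differences from cancelling
            if ( 0 == sum( abs( header[0,mx,1,mx+sx] - search ) ) )
                return mx
        }
        mask.SetPixel( mx, 0, 0 )
    }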
    return -1
}

// Example
// 1) Load file header as image (up to the size you want )
string path = GetApplicationDirectory( "open_save", 0 )
number maxHeaderSize = 200
if ( !OpenDialog( NULL, "Select file to open", path, path ) ) Exit(0)
image headerImg := ReadHeaderAsUint16( path, maxHeaderSize  )
headerImg.ShowImage()

// 2) define search-header as image
image search := [8]: { 02, 06, 08, 22, 02, 02, 08, 00 }
// MatrixPrint( search )

// 3) search for it in the header
number foundAt = FindSignature( headerImg, search )
if ( -1 == foundAt ) 
    Throw( "The file header does not contain the search pattern." )
else
    OKDialog( "Found the search pattern at offset: " + ( foundAt * 2 ) + " bytes" ) // each UINT16 element is 2 bytes
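
Because the example above reads the header as 16-bit values, each image element covers two file bytes and a match can only start at an even byte offset. For a byte-wise search, a variant that reads unsigned 8-bit values may be more suitable. The following is an untested sketch: ReadHeaderAsUint8 is a hypothetical name modelled on ReadHeaderAsUint16, it reuses FindSignature as defined above (so it has to be placed after that function), and it assumes the sequence in the question is hexadecimal.

```dm-script
// Hypothetical variant: read the first nBytes of a file as UINT8 data
image ReadHeaderAsUint8( string filePath, number nBytes )
{
    number kEndianness = 0 // Byte order does not matter for single bytes
    if ( !DoesFileExist( filePath ) )
        Throw( "File '" + filePath + "' not found." )
    number fileID = OpenFileForReading( filePath )
    object fStream = NewStreamFromFileReference( fileID, 1 )
    if ( nBytes > fStream.StreamGetSize() )
        Throw( "File '" + filePath + "' has less than " + nBytes + " bytes." )

    image buff := IntegerImage( "Header bytes", 1, 0, nBytes )  // UINT8 array, one pixel per byte
    ImageReadImageDataFromStream( buff, fStream, kEndianness )
    return buff
}

string bytePath = GetApplicationDirectory( "open_save", 0 )
if ( !OpenDialog( NULL, "Select file to open", bytePath, bytePath ) ) Exit(0)
image byteImg := ReadHeaderAsUint8( bytePath, 200 )

// Assumption: 02 06 08 22 02 02 08 00 is hexadecimal, so 22 becomes 34
image bytePattern := [8]: { 2, 6, 8, 34, 2, 2, 8, 0 }
number byteOffset = FindSignature( byteImg, bytePattern )
if ( -1 == byteOffset )
    Result( "Pattern not found in the first 200 bytes.\n" )
else
    Result( "Pattern starts at byte offset " + byteOffset + "\n" )
```

With one pixel per byte, the index returned by FindSignature is already the byte offset, so no scaling is needed.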

Tags: binary, search, header, file, dm-script