How to search for a unique sequence in binary data?
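I'm trying to read a binary file that has a header. I know that certain information is stored right after the unique sequence 02 06 08 22 02 02 08 00. How can I find the position of this sequence?
I could use
String StreamReadAsText( ScriptObject stream, Number encoding, Number count )
to read the binary file one character at a time, but I suspect that is clumsy and slow.
Also, how can I compare the result of StreamReadAsText() when the output is not real text (characters between 00 and 1F in the ASCII table)?
So, how can I read the binary file as int8 (the same size as one character in a string)? For example, read 02, then 06, then 08, and so on.
Any help is welcome and appreciated.
Regards,
Roger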
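If you are on a modern machine, just load the whole file into memory and scan for the sequence with a memory-compare function and a moving index.
This is not the most memory-efficient approach, nor even the fastest, but it is simple and fast enough, provided you have resources to burn.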
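As a language-neutral illustration of this load-and-scan approach, here is a minimal sketch in Python rather than DM script. It assumes the whole file fits in memory and that the signature bytes are hexadecimal, as a hex editor displays them; the names `SIGNATURE`, `find_sequence` and `find_in_file` are hypothetical.

```python
# Assumption: the signature is written in hexadecimal, as a hex editor shows it.
SIGNATURE = bytes.fromhex("02 06 08 22 02 02 08 00")


def find_sequence(data: bytes, pattern: bytes, start: int = 0) -> int:
    """Return the offset of the first occurrence of `pattern` in `data`
    at or after `start`, or -1 if it does not occur."""
    m = len(pattern)
    # Move the index one byte at a time and compare a pattern-sized window
    # of raw bytes (no text decoding, so 0x00..0x1F compare like any other byte).
    for i in range(start, len(data) - m + 1):
        if data[i:i + m] == pattern:
            return i
    return -1


def find_in_file(path: str, pattern: bytes) -> int:
    """Load the whole file into memory, then scan it."""
    with open(path, "rb") as f:
        data = f.read()
    return find_sequence(data, pattern)
```

Indexing `data[i]` yields an int in 0..255, which is the byte-wise (uint8) read asked about. After a hit at offset `pos`, the information of interest starts at `pos + len(SIGNATURE)`. In practice Python's built-in `data.find(pattern)` performs the same scan faster.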
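You are already on the right track by reading the file with the streaming commands. But why read the stream as text? You can read the stream as any (supported) number type, using a TagGroup object as a proxy with TagGroupReadTagDataFromStream().
The F1 help documentation actually contains an example that lists the streaming commands; I have simply copied it here: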
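// Demo stream on a new 256-byte memory buffer; for a real file, create the stream
// with NewStreamFromFileReference( OpenFileForReading( path ), 1 ) instead
Object stream = NewStreamFromBuffer( NewMemoryBuffer( 256 ) )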
TagGroup tg = NewTagGroup();
Number stream_byte_order = 1; // 1 == bigendian, 2 == littleendian
Number v_uint32_0, v_uint32_1, v_sint32_0, v_uint16_0, v_uint16_1
// Create the tags and initialize with default values
tg.TagGroupSetTagAsUInt32( "UInt32_0", 0 )
tg.TagGroupSetTagAsUInt32( "UInt32_1", 0 )
tg.TagGroupSetTagAsLong( "SInt32_0", 0 )
tg.TagGroupSetTagAsUInt16( "UInt16_0", 0 )
tg.TagGroupSetTagAsUInt16( "UInt16_1", 0 )
// Stream the data into the tags
TagGroupReadTagDataFromStream( tg, "UInt32_0", stream, stream_byte_order );
TagGroupReadTagDataFromStream( tg, "UInt32_1", stream, stream_byte_order );
TagGroupReadTagDataFromStream( tg, "SInt32_0", stream, stream_byte_order );
TagGroupReadTagDataFromStream( tg, "UInt16_0", stream, stream_byte_order );
TagGroupReadTagDataFromStream( tg, "UInt16_1", stream, stream_byte_order );
// Show the taggroup, if you want
// tg.TagGroupOpenBrowserWindow("AuxTags",0)
// Get the data from the tags
tg.TagGroupGetTagAsUInt32( "UInt32_0", v_uint32_0 )
tg.TagGroupGetTagAsUInt32( "UInt32_1", v_uint32_1 )
tg.TagGroupGetTagAsLong( "SInt32_0", v_sint32_0 )
tg.TagGroupGetTagAsUInt16( "UInt16_0", v_uint16_0 )
tg.TagGroupGetTagAsUInt16( "UInt16_1", v_uint16_1 )
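For comparison, the same typed read (two uint32, one signed int32, two uint16, with an explicit byte order) can be sketched in Python with the standard `struct` module. This is an analogy to the tag-group proxy, not a DM script API; `FIELDS`, `read_fields` and `read_uint8` are hypothetical names.

```python
import struct

# Field layout mirroring the F1 example: uint32, uint32, int32, uint16, uint16.
FIELDS = "IIiHH"


def read_fields(buf: bytes, offset: int = 0, big_endian: bool = True) -> tuple:
    """Unpack the five fields starting at `offset`.

    '>' means big-endian (like stream_byte_order = 1 in the DM example),
    '<' means little-endian (like stream_byte_order = 2).
    """
    fmt = (">" if big_endian else "<") + FIELDS
    return struct.unpack_from(fmt, buf, offset)


def read_uint8(buf: bytes, offset: int) -> int:
    """A single unsigned byte: indexing bytes already yields an int in 0..255."""
    return buf[offset]
```

The explicit `>`/`<` prefix also disables native alignment padding, so the five fields occupy exactly 16 bytes regardless of platform.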
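There is already a post on this site about searching for a pattern in a stream:
It shows how to use a stream to search through an image, but you can of course use a file stream directly.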
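As a generic sketch of searching a file stream directly, without loading the whole file, here is a chunked search in Python (not DM script). It reads fixed-size chunks and carries the last len(pattern) - 1 bytes over to the next round, so a match that straddles a chunk boundary is still found. `find_in_stream` and the default chunk size are illustrative choices, and offsets are counted from the stream's position when the function is called.

```python
import io
from typing import BinaryIO


def find_in_stream(stream: BinaryIO, pattern: bytes, chunk_size: int = 64 * 1024) -> int:
    """Return the offset of the first occurrence of `pattern`, or -1."""
    if not pattern:
        return 0
    keep = len(pattern) - 1  # bytes that may start a match spanning two chunks
    tail = b""               # carry-over bytes from the previous chunk
    tail_start = 0           # offset of tail[0] relative to the starting position
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return -1
        window = tail + chunk
        idx = window.find(pattern)
        if idx != -1:
            return tail_start + idx
        # Keep only the last `keep` bytes; everything before them cannot start a match.
        new_tail = window[-keep:] if keep else b""
        tail_start += len(window) - len(new_tail)
        tail = new_tail


if __name__ == "__main__":
    sig = bytes.fromhex("02 06 08 22 02 02 08 00")
    demo = io.BytesIO(bytes(10) + sig)
    print(find_in_stream(demo, sig, chunk_size=4))  # 10
```

With a real file, pass the handle from `open(path, "rb")`; memory use stays bounded by the chunk size plus the pattern length.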
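Alternatively, you can read a whole array from the stream at once with ImageReadImageDataFromStream, after first preparing a suitably sized image.
You can then use image operations to search for the position. Here is an example: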
// Example of reading the first nBytes bytes of a file
// as unsigned 8-bit data, so that each pixel holds exactly one byte of the file
image ReadHeaderAsUint8( string filePath, number nBytes )
{
  number kEndianness = 0 // Default byte order of the current platform (irrelevant for 1-byte data)
  if ( !DoesFileExist( filePath ) )
    Throw( "File '" + filePath + "' not found." )
  number fileID = OpenFileForReading( filePath )
  object fStream = NewStreamFromFileReference( fileID, 1 )
  if ( nBytes > fStream.StreamGetSize() )
    Throw( "File '" + filePath + "' has fewer than " + nBytes + " bytes." )
  image buff := IntegerImage( "Header", 1, 0, nBytes ) // UINT8 array: one pixel per byte
  ImageReadImageDataFromStream( buff, fStream, kEndianness )
  return buff
}

number FindSignature( image header, image search )
{
  // 1D images only
  if ( ( header.ImageGetNumDimensions() != 1 ) \
    || ( search.ImageGetNumDimensions() != 1 ) )
    Throw( "Only 1D images supported" )
  number sx = search.ImageGetDimensionSize( 0 )
  number hx = header.ImageGetDimensionSize( 0 )
  if ( hx < sx )
    return -1
  // Create a mask of possible start locations
  number startV = search.GetPixel( 0, 0 )
  image mask = ( header == startV ) ? 1 : 0
  // Check all occurrences of the first value, starting from the first
  number mx, my
  while ( max( mask, mx, my ) )
  {
    // Only test starts that leave room for the whole pattern, and compare the sum
    // of ABSOLUTE differences: plain differences can cancel out (e.g. -1 + 1 = 0)
    if ( mx + sx <= hx )
      if ( 0 == sum( abs( header[0, mx, 1, mx + sx] - search ) ) )
        return mx
    mask.SetPixel( mx, 0, 0 )
  }
  return -1
}

// Example
// 1) Load the file header as an image (up to the size you want)
string path = GetApplicationDirectory( "open_save", 0 )
number maxHeaderSize = 200
if ( !OpenDialog( NULL, "Select file to open", path, path ) ) Exit( 0 )
image headerImg := ReadHeaderAsUint8( path, maxHeaderSize )
headerImg.ShowImage()
// 2) Define the search pattern as an image.
// Assumes 02 06 08 22 02 02 08 00 are hex bytes (as a hex editor shows them),
// written here in decimal: 0x22 = 34
image search := [8]: { 2, 6, 8, 34, 2, 2, 8, 0 }
// MatrixPrint( search )
// 3) Search for it in the header
number foundAt = FindSignature( headerImg, search )
if ( -1 == foundAt )
  Throw( "The file header does not contain the search pattern." )
else
  OKDialog( "Found the search pattern at byte offset " + foundAt + "; the data after it starts at byte " + ( foundAt + search.ImageGetDimensionSize( 0 ) ) + "." )
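The candidate-mask idea in FindSignature can be mirrored in Python as a sketch (not DM script; `find_signature` is a hypothetical name): collect start positions where the first byte matches, keep only those that leave room for the whole pattern, then compare each window exactly.

```python
def find_signature(header: bytes, search: bytes) -> int:
    """Return the first offset where `search` occurs in `header`, or -1."""
    sx, hx = len(search), len(header)
    if sx == 0 or hx < sx:
        return -1
    first = search[0]
    # Candidate starts: first byte matches and the whole pattern still fits.
    candidates = [i for i in range(hx - sx + 1) if header[i] == first]
    for i in candidates:
        # Exact element-wise comparison of the window.
        if header[i:i + sx] == search:
            return i
    return -1
```

Comparing each window exactly matters: a plain sum of differences can cancel out, e.g. [1, 3] versus [2, 2] sums to 0 although the windows differ, so a DM version should test the sum of absolute differences instead. Likewise, if the header is read as uint16 elements rather than bytes, an element index i corresponds to byte offset 2 * i.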