How to find a specific byte in many bytes?

Question

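I am reading a file in Java and printing its contents as a hex dump. The first and second lines of the output look like this:

one: 31 30 30 31 30 30 30 31 31 30 30 31 30 31 31 31
two: 30 31 31 30 30 31 31 30 31 31 30 30 31 31 30 31

I want to print the data between the first "31 30 30 31" and the second "31 30 30 31". My ideal output is 31 30 30 31 30 30 30 31 31 30 30 31 30 31 31 31 30 31. But the real output is wrong. I think my code cannot find 31 30 30 31 in data1. How can I figure this out?

I am using JDK 1.7, and my IDE is IDEA.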

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.File;
public class TestDemo{

  public static void main(String[] args) {


        try {
            File file = new File("/0testData/1.bin");
            DataInputStream isr = new DataInputStream(new FileInputStream(file));

            int bytesPerLine = 16;

            int byteCount = 0;
            int data;
            while ((data = isr.read()) != -1) {
                if (byteCount == 0)
                    System.out.println();
                else if (byteCount % bytesPerLine == 0)
                    System.out.printf("\n",byteCount );
                else
                    System.out.print(" ");


                String data1 = String.format("%02X",data & 0xFF);
                System.out.printf(data1);


                byteCount += 1;
                if(data1.contains("31 30 30 31")) {
                    int i=data1.indexOf("31 30 30 31",12);

                    System.out.println("find it！");
                    String strEFG=data1.substring(i,i+53);
                    System.out.println("str="+strEFG);
                }else {
                    System.out.println("cannot find it");
                }

            }

        } catch (Exception e) {
            System.out.println("Exception: " + e);
        }

    }
}

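My ideal output is 31 30 30 31 30 30 30 31 31 30 30 31 30 31 31 31 30 31. But the real output is:

31cannot find it 30cannot find it 30cannot find it 31cannot find it 30cannot find it 30cannot find it 30cannot find it 31cannot find it 31cannot find it 30cannot find it 30cannot find it 31cannot find it 30cannot find it 31cannot find it 31cannot find it 31cannot find it

30cannot find it 31cannot find it 31cannot find it 30cannot find it 30cannot find it 31cannot find it 31cannot find it 30cannot find it 31cannot find it 31cannot find it 30cannot find it 30cannot find it 31cannot find it 31cannot find it 30cannot find it 31cannot find it

31cannot find it 31cannot find it 31cannot find it 31cannot find it 30cannot find it 30cannot find it 30cannot find it 30cannot find it 30cannot find it 31cannot find it 30cannot find it 31cannot find it 30cannot find it 31cannot find it 31cannot find it 31cannot find it

31cannot find it 31cannot find it 30cannot find it 31cannot find it 31cannot find it 31cannot find it 31cannot find it 31cannot find it 31cannot find it 31cannot find it 30cannot find it 30cannot find it 31cannot find it 30cannot find it 31cannot find it 31cannot find it

30cannot find it 31cannot find it 31cannot find it 30cannot find it 30cannot find it 31cannot find it 31cannot find it 30cannot find it 30cannot find it 31cannot find it 30cannot find it 30cannot find it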

Answer 1

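Your input data seems a little confusing. Still, this may answer your question.

It does not produce exactly the output you asked for, but I think you should be able to adapt it by using the "inPattern" flag to turn output on or off. If inPattern is true, print the data read from the file; if it is false, do not print it.

This may not be the best coding style, since it consists entirely of static methods, but it does what you asked.

The problem with your code (I think) is that data1 will always be a 2-character string. It cannot possibly contain an 11-character string ("31 30 30 31"). If you tried reversing the test (i.e. "31 30 30 31".contains(data1)), it would only match a single byte, not the 4 bytes you intend to match.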
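To illustrate the point about data1, here is a minimal sketch of one way to keep the string-based approach from the question: format all of the bytes into a single hex string first, and only then search it. The class name HexStringSearchSketch and the helpers toHex and fromFirstToSecond are made up for this illustration, and the sample bytes are simply the first row from the question rather than the contents of a real file.

```java
import java.nio.charset.StandardCharsets;

class HexStringSearchSketch {

    /** Formats every byte as two uppercase hex digits, separated by single spaces. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    /**
     * Returns the text from the first occurrence of marker up to (but not
     * including) the second occurrence, or null if marker does not occur twice.
     */
    static String fromFirstToSecond(String hex, String marker) {
        int first = hex.indexOf(marker);
        if (first < 0) {
            return null;
        }
        int second = hex.indexOf(marker, first + marker.length());
        if (second < 0) {
            return null;
        }
        return hex.substring(first, second).trim();
    }

    public static void main(String[] args) {
        // Hypothetical sample: the ASCII characters '1' and '0' are the bytes 0x31 and 0x30.
        byte[] sample = "1001000110010111".getBytes(StandardCharsets.US_ASCII);
        String hex = toHex(sample);
        System.out.println(hex);
        // prints: 31 30 30 31 30 30 30 31
        System.out.println(fromFirstToSecond(hex, "31 30 30 31"));
    }
}
```

Because every byte occupies exactly two digits followed by a space, a match on a marker that contains spaces can only start on a byte boundary. For large files this builds a string three times the size of the input, so a byte-level search is likely the better fit there.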

package hexdump;

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.LinkedList;

public class HexDumpWithFilter {
//    private static final int beginPattern [] = { 0x47, 0x0d, 0x0a, 0x1a };
    private static final int beginPattern [] = { 0x00, 0x83, 0x7d, 0x8a };
    private static final int endPattern [] = { 0x23, 0x01, 0x78, 0xa5 };
    private static LinkedList<Integer> bytesRead = new LinkedList<Integer>();

    public static void main(String[] args) {
        try {
            InputStream isr = new DataInputStream(new FileInputStream("C:\\Temp\\resistor.png"));
            int bytesPerLine = 16;
            int byteCount = 0;
            int data;
            boolean inPattern = false;
            while ((data = isr.read()) != -1) {
                // Capture the data just read into an input buffer.
                bytesRead.add(data);
                // If we have too much data in the input buffer to compare to our
                // pattern, peel off the first byte.
                // Note: This assumes that the begin pattern and end Pattern are the same lengths.
                if (bytesRead.size() > beginPattern.length) {
                    bytesRead.removeFirst();
                }

                // Output a byte count at the start of each new line of output.
                if (byteCount % bytesPerLine == 0)
                    System.out.printf("\n%04x:", byteCount);

                // Output the spacing - if we have found our pattern, then also output an asterisk
                System.out.printf(inPattern ? " *%02x" : "  %02x", data);

                // Finally check to see if we have found our pattern if we have enough bytes
                // in our bytesRead buffer.
                if (bytesRead.size() == beginPattern.length) {
                    // If we are not currently in a pattern, then check for the begin pattern
                    if (!inPattern && checkPattern(beginPattern, bytesRead)) {
                        inPattern = true;
                    }
                    // if we are currently in a pattern, then check for the end pattern.
                    if (inPattern && checkPattern (endPattern, bytesRead)) {
                        inPattern = false;
                    }
                }

                byteCount += 1;
            }
            System.out.println();
        } catch (Exception e) {
            System.out.println("Exception: " + e);
        }
    }

    /**
     * Function to check whether our input buffer read from the file matches
     * the supplied pattern.
     * @param pattern the pattern to look for in the buffer.
     * @param bytesRead the buffer of bytes read from the file.
     * @return true if pattern and bytesRead have the same content.
     */
    private static boolean checkPattern (int [] pattern, LinkedList<Integer> bytesRead) {
        int ptr = 0;
        boolean patternMatch = true;
        for (int br : bytesRead) {
            if (br != pattern[ptr++]) {
                patternMatch = false;
                break;
            }
        }
        return patternMatch;
    }
}

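This code has one small quirk: it does not mark the begin pattern, but it does mark the end pattern. Hopefully that is not a problem for you. If you need the begin pattern marked correctly, or the end pattern left unmarked, there is another level of complexity. Basically, you have to read ahead in the file and write out the data 4 bytes behind what you are currently reading. This can be done by printing the value taken from the buffer at the following line:

    bytesRead.removeFirst();

instead of printing the value read from the file (i.e. the value in the "data" variable).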
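The delayed-output idea could be sketched roughly as follows. This is only an illustration under some assumptions: it works on an in-memory int array instead of a file, it assumes both patterns have the same length, and the names DelayedMarkSketch, mark, and tail are invented here. Each byte is printed only when it leaves the window, so by then it is already known whether it belongs to a begin pattern. The tail counter keeps the bytes of the end pattern marked after the flag has been cleared.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class DelayedMarkSketch {

    /** True if the window holds exactly the pattern; call only when sizes are equal. */
    static boolean matches(int[] pattern, Deque<Integer> window) {
        int i = 0;
        for (int value : window) {
            if (value != pattern[i++]) {
                return false;
            }
        }
        return true;
    }

    /** Returns a dump in which bytes from the begin pattern through the end pattern carry an asterisk. */
    static String mark(int[] data, int[] begin, int[] end) {
        int n = begin.length; // assumption: end.length == begin.length
        Deque<Integer> window = new ArrayDeque<Integer>();
        StringBuilder out = new StringBuilder();
        boolean inPattern = false;
        int tail = 0; // bytes still to be marked after the end pattern was seen

        for (int b : data) {
            window.addLast(b);
            if (window.size() > n) {
                // Emit the byte that is n positions behind the one just read.
                int oldest = window.removeFirst();
                boolean marked = inPattern || tail > 0;
                if (tail > 0) {
                    tail--;
                }
                out.append(String.format(marked ? " *%02x" : "  %02x", oldest));
            }
            if (window.size() == n) {
                if (!inPattern && matches(begin, window)) {
                    inPattern = true;
                } else if (inPattern && matches(end, window)) {
                    inPattern = false;
                    tail = n; // the end pattern itself is still waiting in the window
                }
            }
        }
        // Flush whatever is still buffered once the input is exhausted.
        while (!window.isEmpty()) {
            int oldest = window.removeFirst();
            boolean marked = inPattern || tail > 0;
            if (tail > 0) {
                tail--;
            }
            out.append(String.format(marked ? " *%02x" : "  %02x", oldest));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int[] data = { 0x01, 0x02, 0xaa, 0xbb, 0x05, 0x06, 0xcc, 0xdd, 0x09 };
        // prints:   01  02 *aa *bb *05 *06 *cc *dd  09
        System.out.println(mark(data, new int[] { 0xaa, 0xbb }, new int[] { 0xcc, 0xdd }));
    }
}
```

To leave the end pattern unmarked instead, one option would be to drop the tail counter so that marking stops as soon as the end pattern is recognised.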

Below is a sample of the output produced by running it against a PNG file containing an image of a resistor.

0000:  89  50  4e  47  0d  0a  1a  0a  00  00  00  0d  49  48  44  52
0010:  00  00  00  60  00  00  00  1b  08  06  00  00  00  83  7d  8a
0020: *3a *00 *00 *00 *09 *70 *48 *59 *73 *00 *00 *2e *23 *00 *00 *2e
0030: *23 *01 *78 *a5  3f  76  00  00  00  07  74  49  4d  45  07  e3
0040:  03  0e  17  1a  0f  c2  80  9c  d0  00  00  01  09  49  44  41
0050:  54  68  de  ed  9a  31  0b  82  40  18  86  cf  52  d4  a1  7e
0060:  45  4e  81  5b  a3  9b  10  ae  ae  4d  4d  61  7f  a1  21  1b
0070:  fa  0b  45  53  53  ab  ab  04  6e  42  4b  9b  d0  64  bf  a2
0080:  06  15  a9  6b  ef  14  82  ea  ec  e8  7d  c6  f7  0e  f1  be
0090:  e7  3b  0f  0e  25  4a  29  25  a0  31  5a  28  01  04  fc  35
00a0:  f2  73  e0  af  af  b5  93  fd  c9  8c  cd  36  cb  da  f9  ae
00b0:  ad  11  d3  50  84  2e  50  92  96  24  88  f2  ca  b1  41  7b
00c0:  cc  64  c7  db  b6  be  7e  5e  87  ef  0e  08  e3  82  64  85
00d0:  b8  47  4c  56  50  12  c6  85  b8  9f  20  1e  0b  10  bd  81
00e0:  64  1e  5b  38  49  cb  ca  31  e3  7c  67  b2  b4  c7  f6  c4
00f0:  62  da  65  b2  f9  ea  c2  64  a7  dd  90  c9  fa  a3  3d  0e
0100:  61  00  01  10  00  20  00  02  00  04  40  00  80  00  08  00
0110:  10  00  01  00  02  7e  82  af  5f  c6  99  86  42  5c  5b  7b
0120:  eb  19  be  f7  e2  8d  a4  77  f8  e8  bb  07  51  5e  7b  91
0130:  28  c4  0e  d0  55  89  38  96  2a  6c  77  3a  96  4a  74  55
0140:  12  57  00  8f  05  88  de  40  12  fe  8a  c0  21  0c  01  00
0150:  02  20  00  34  c3  03  f7  3f  46  9a  04  49  f8  9d  00  00
0160:  00  00  49  45  4e  44  ae  42  60  82

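Notice that some bytes are preceded by an asterisk? These are the bytes between beginPattern and endPattern.

Also note that I used both a beginPattern and an endPattern. You do not need to do that; I did it only to make it easier to find a pattern in my resistor.png file for testing the pattern matching. You could use a single variable for both begin and end, set both to the same value, or, if you want a single pattern for begin and end (e.g. "0x31, 0x30, 0x30, 0x31"), simply assign endPattern = beginPattern.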
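One caveat when using a single pattern with the code above: the begin check and the end check run one after the other on the same buffer, so with identical patterns inPattern would be set and cleared again on the same byte. Making the second check an else branch is one way around that. Alternatively, if the whole file fits in memory, the single-pattern case can be handled directly on a byte array. The sketch below is only an illustration; SinglePatternSketch, indexOf, and fromFirstToSecond are hypothetical names, and the sample data is invented.

```java
import java.util.Arrays;

class SinglePatternSketch {

    /** Returns the index of the first occurrence of pattern at or after from, or -1. */
    static int indexOf(byte[] data, byte[] pattern, int from) {
        for (int i = Math.max(from, 0); i <= data.length - pattern.length; i++) {
            int j = 0;
            while (j < pattern.length && data[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Returns the bytes from the start of the first occurrence of pattern up to
     * (but not including) the second occurrence, or an empty array if the
     * pattern does not occur twice.
     */
    static byte[] fromFirstToSecond(byte[] data, byte[] pattern) {
        int first = indexOf(data, pattern, 0);
        if (first < 0) {
            return new byte[0];
        }
        int second = indexOf(data, pattern, first + pattern.length);
        if (second < 0) {
            return new byte[0];
        }
        return Arrays.copyOfRange(data, first, second);
    }

    public static void main(String[] args) {
        // Invented sample data; a real program could load the file with
        // java.nio.file.Files.readAllBytes (available since Java 7).
        byte[] data = { 0x00, 0x31, 0x30, 0x30, 0x31, 0x41, 0x42, 0x31, 0x30, 0x30, 0x31, 0x43 };
        byte[] pattern = { 0x31, 0x30, 0x30, 0x31 };
        for (byte b : fromFirstToSecond(data, pattern)) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```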

java

byte

hexdump