Reading hexadecimal data from a file in Fortran

In Fortran, I am trying to read a file containing 8-bit (hexadecimal) byte data on Linux.

In 'hexedit', the first line looks like this, as it should for a tiff file.

49 49 2A 00  08 00 20 00  00 00 0B 02  00 00 00 00  II*... .........

I declared a two-byte character variable (character(len=2) :: tifhead(8)) and read it like this:

      open(1,file=filename,access='stream')
      read(1) tifhead,greyvalue

I get the first two (49 49), which print as II in a formatted write (format (2Z2)), but not the others.

How can I get all of these hex values out? I should see 49 49 2A 00 08 ....... .

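Your read statement will simply read 2 characters for tifhead(1), the next two characters for tifhead(2), and so on, spaces included. So you end up with tifhead(1)="49", tifhead(2)=" 4", tifhead(3)="9 ", etc. You think you read the first 2 bytes correctly only because you print the strings "49", " 4", "9 ", ... one after another, so it looks like "49 49" in the output. The compiler has no way of knowing that a single space separates the strings and that there are 2 spaces after every fourth one.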
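To see the effect, here is a minimal sketch (not part of the original code) that cuts the text of the first dump line into consecutive 2-character pieces, which is what reading into character(len=2) elements amounts to if the file holds the dump as plain text. The string line and the program name are assumptions made for illustration only.

```fortran
program chunk_demo
  implicit none
  ! Assumed input: the first 8 bytes of the dump, stored as plain text.
  character(len=*), parameter :: line = "49 49 2A 00  08 00 20 00"
  character(len=2) :: tifhead(8)
  integer :: i

  ! Take consecutive 2-character pieces, blanks included.
  do i = 1, 8
    tifhead(i) = line(2*i-1:2*i)
  end do

  ! Brackets make the embedded blanks visible.
  print "(8('[',a2,']'))", tifhead
end program chunk_demo
```

With the line above this prints [49][ 4][9 ][2A][ 0][0 ][ 0][8 ], so only the first element holds a complete byte.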

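To read the data correctly, you must use formatted reading, which means you also have to declare the stream as 'formatted' in the open statement. The following example shows how this can be done: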

program example
implicit none
character(len=2) :: tifhead(8), greyscale(8)
open(1, file="example.txt", access='stream', form='formatted')
read(1, "(4(a2,tr1),tr1,3(a2,tr1),a2)", advance='no') tifhead
read(1, "(tr2,4(a2,tr1),tr1,3(a2,tr1),a2)", advance='no') greyscale
close(1)
print "(a,7(a2,tr1),a2,a)", "  tifhead = (", tifhead, ")"
print "(a,7(a2,tr1),a2,a)", "greyscale = (", greyscale, ")"
end program example

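Perhaps some explanation is needed: a2,tr1 means read a 2-character string, then advance the read pointer by one position (this skips the space between your hexadecimal "numbers"; in fact, they are treated simply as strings). 4(a2,tr1) means do that 4 times. This reads the first 4 bytes plus one space. Now, there is one more space before the next data to read, so we add tr1 to skip it, and our format so far is 4(a2,tr1),tr1; then we read 3 more bytes with 3(a2,tr1), and finally the last byte with a2 alone (without skipping the space after it). So the format string is (4(a2,tr1),tr1,3(a2,tr1),a2), which reads the first 8 bytes correctly and leaves the read pointer right after the 8th byte. Note that advance='no' is necessary, otherwise Fortran assumes the record has ended and skips the rest of the data in the same record (line).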

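Now, to read the next 8 bytes we use the same format, except that we add tr2 at the beginning to skip the two blank spaces. I added formatted prints to the program to check that the data were read correctly. Running the program gives: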

  tifhead = (49 49 2A 00 08 00 20 00)
greyscale = (00 00 0B 02 00 00 00 00)

which verifies that the data were read correctly.
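If you need numbers rather than 2-character strings, each string can be converted with a z2 edit descriptor. The sketch below is only an illustration: it applies the same format to an internal file (the string line, an assumption standing in for the first 8 bytes of the text file) and then reads each piece as a hexadecimal integer.

```fortran
program hex_text_to_int
  implicit none
  ! Assumed input: the first 8 bytes of the dump, stored as plain text.
  character(len=*), parameter :: line = "49 49 2A 00  08 00 20 00"
  character(len=2) :: tifhead(8)
  integer :: bytes(8), i

  ! Same edit descriptors as in the example, applied to a string.
  read(line, "(4(a2,tr1),tr1,3(a2,tr1),a2)") tifhead

  ! Interpret each 2-character string as a hexadecimal number.
  do i = 1, 8
    read(tifhead(i), "(z2)") bytes(i)
  end do

  print "(8(i0,1x))", bytes
end program hex_text_to_int
```

For this input the printed values are 73 73 42 0 8 0 32 0.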

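Last but not least, I would recommend avoiding old-fashioned Fortran, as used in your code and in the example above. This means using newunit to let the program find the first free unit instead of giving a unit number explicitly, having some way to check whether the file you are trying to open actually exists or whether you have reached the end of the file, avoiding unnamed arguments, using the dimension attribute to declare arrays, etc. None of these is strictly necessary, and at first glance they may look like unnecessary verbosity. But in the long run, being strict (as modern Fortran encourages) will save you a lot of time when debugging larger programs. So the example above could (arguably should) be written as follows.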

program example2
implicit none
integer :: unt, status
character(len=2), dimension(8) :: tifhead, greyscale
open(newunit=unt, file="example.txt", access='stream', form='formatted',&
     action='read', status='old', iostat=status)
if (status /= 0) then
  print "(a)","Error reading file."; stop
end if
! More sophisticated reading is probably needed to check for end of file.
read(unit=unt, fmt="(4(a2,tr1),tr1,3(a2,tr1),a2)", advance='no') tifhead
read(unit=unt, fmt="(tr2,4(a2,tr1),tr1,3(a2,tr1),a2)") greyscale
close(unit=unt)
print "(a,7(a2,tr1),a2,a)", "  tifhead = (", tifhead, ")"
print "(a,7(a2,tr1),a2,a)", "greyscale = (", greyscale, ")"
end program example2

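Assuming your data are actually stored in binary format (it does indeed seem to be a tiff image data file), my first answer is only valid if you convert the data to plain text. If you prefer to read the binary file directly, the simplest way I can think of is to open the file with access='direct' and read the data byte by byte. Each byte is read as a character and then converted to an integer, which I guess is more useful than a string that is supposed to represent a hexadecimal number.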

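As an example, the following program reads the header (first 8 bytes) from a tiff data file. The example reads data from a sample tiff image I found online, but it works for any binary file.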

program read_tiff_data
implicit none
integer :: unt, status, i
character :: ch
integer, dimension(8) :: tifhead

open(newunit=unt, file="flag_t24.tif", access='direct', form='unformatted',&
     action='read', status='old', iostat=status, recl=1)
if (status /= 0) then
  print "(a)","Error reading file."; stop
end if
do i=1,8
  read(unit=unt, rec=i) ch; tifhead(i)=ichar(ch)
end do
close(unit=unt)

print "(a,7(i0,tr1),i0,a)", "tifhead = (", tifhead, ")"
end program read_tiff_data

The program gives this output:

tifhead = (73 73 42 0 8 0 0 0)

which is correct. You can easily extend the program to read more data from the file.

If you still need the hexadecimal representation, just replace i0 with z0 in the print statement so that it reads

print "(a,7(z0,tr1),z0,a)", "tifhead = (", tifhead, ")"

This prints the result in hexadecimal, in this case:

tifhead = (49 49 2A 0 8 0 0 0)
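As an alternative to direct access, the same bytes can be read in one go with unformatted stream access, as in the question. The following is a sketch under stated assumptions, not a replacement for the program above: the file name header_demo.bin is hypothetical, and the program first writes the 8 header bytes from the question into it so that the example is self-contained. The bytes are read as integer(int8), which is signed, so iand(..., 255) is used to map them to 0..255.

```fortran
program stream_bytes
  use iso_fortran_env, only: int8
  implicit none
  character(len=*), parameter :: fname = "header_demo.bin"  ! hypothetical name
  integer(int8) :: raw(8)
  integer :: unt, ios, bytes(8)

  ! Create a small demo file holding the 8 header bytes from the question.
  open(newunit=unt, file=fname, access='stream', form='unformatted', &
       action='write', status='replace', iostat=ios)
  if (ios /= 0) stop "Cannot create demo file."
  write(unt) int([73, 73, 42, 0, 8, 0, 32, 0], int8)
  close(unt)

  ! Read the 8 bytes back in a single unformatted stream read.
  open(newunit=unt, file=fname, access='stream', form='unformatted', &
       action='read', status='old', iostat=ios)
  if (ios /= 0) stop "Cannot open demo file."
  read(unt, iostat=ios) raw
  close(unt)
  if (ios /= 0) stop "File is shorter than 8 bytes."

  ! int8 is signed; mask with 255 to get values in 0..255.
  bytes = iand(int(raw), 255)
  print "(8(z2.2,1x))", bytes
end program stream_bytes
```

The z2.2 descriptor keeps leading zeros, so the output is 49 49 2A 00 08 00 20 00.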

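I was not sure whether I should heavily modify my previous answers (I believe they are still useful), so I decided to add yet another answer, hopefully the last one. I apologize for the verbosity.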

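The following Fortran module (Fortran 90 style, although newunit needs a Fortran 2008 compiler) provides a subroutine named tiff_reader_16bit, which reads any TIFF data file and returns its 16-bit content in an array of integers: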

module tiff_reader
implicit none
private
public :: tiff_reader_16bit
contains
subroutine tiff_reader_16bit(filename, tifdata, ndata)
character(len=*), intent(in) :: filename
integer, allocatable, intent(out) :: tifdata(:)
integer, intent(out) :: ndata
integer, parameter :: max_integers=10000000
integer :: unt, status, record_length, i, records, lsb, msb
character :: ch
integer, dimension(max_integers) :: temp
ndata=0
inquire(iolength=record_length) ch
open(newunit=unt, file=filename, access='direct', form='unformatted',&
     action='read', status='old', iostat=status, recl=record_length)
if (status /= 0) then
  print "(3a)","Error reading file """,filename,""": File not found."; return
end if
records=1
do i=1,max_integers
  read(unit=unt, rec=records, iostat=status) ch; msb=ichar(ch)
  if (status /= 0) then; records=records-1; ndata=i-1; exit; end if
  read(unit=unt, rec=records+1, iostat=status) ch; lsb=ichar(ch)
  if (status /= 0) then; ndata=i; temp(ndata)=msb; exit; end if
  temp(i)=lsb+256*msb; records=records+2
end do
close(unit=unt)
if (ndata==0) then
  print "(a)","File partially read."; records=records-1; ndata=max_integers
end if
allocate(tifdata(ndata), stat=status); tifdata=temp(:ndata)
print "(2(i0,a),/)",records," records read, ",ndata," 16-bit integers returned."
end subroutine tiff_reader_16bit
end module tiff_reader

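The subroutine takes a TIFF file name and returns an array of integers, together with the total number of integers read. Internally, the subroutine uses a fixed-size array temp to store the data temporarily. To save memory, the subroutine returns an allocatable array tifdata, which is the part of temp that contains only the data actually read. The maximum number of data read is set to 10 million in the parameter max_integers, but it can go up to huge(0) if needed and if memory allows (about 2.14 billion integers on my system); it can go even further if you use a "higher" kind of integer. There are other ways to do this that avoid the temporary fixed-size array, but they usually come at the cost of extra computation time, and I would not go that way. More sophisticated implementations are also possible, but they would add complexity to the code, and I do not think that fits here.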
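For completeness, one of those other ways can be sketched as follows; it is not what the module does. It asks for the file size with inquire(size=...), allocates exactly that many bytes, and reads them with a single stream read. The file name size_demo.bin is hypothetical and is written by the program itself (5 bytes, so that there is an unpaired last byte); the pairing uses the same first-byte-times-256 convention as the subroutine.

```fortran
program size_then_allocate
  use iso_fortran_env, only: int8
  implicit none
  character(len=*), parameter :: fname = "size_demo.bin"  ! hypothetical name
  integer(int8), allocatable :: raw(:)
  integer, allocatable :: words(:)
  integer :: unt, ios, nbytes, npairs

  ! Write a 5-byte demo file (odd length, so one byte stays unpaired).
  open(newunit=unt, file=fname, access='stream', form='unformatted', &
       action='write', status='replace', iostat=ios)
  if (ios /= 0) stop "Cannot create demo file."
  write(unt) int([73, 73, 42, 0, 8], int8)
  close(unt)

  ! Ask for the file size in bytes and allocate exactly that much.
  inquire(file=fname, size=nbytes)
  if (nbytes < 0) stop "File size could not be determined."
  allocate(raw(nbytes))

  open(newunit=unt, file=fname, access='stream', form='unformatted', &
       action='read', status='old', iostat=ios)
  if (ios /= 0) stop "Cannot open demo file."
  read(unt, iostat=ios) raw
  close(unt)
  if (ios /= 0) stop "Read failed."

  ! Combine complete pairs only; the first byte of each pair is multiplied by 256.
  npairs = nbytes/2
  allocate(words(npairs))
  words = 256*iand(int(raw(1:2*npairs-1:2)), 255) &
        +     iand(int(raw(2:2*npairs:2)), 255)

  print "(i0,a,i0,a)", nbytes, " bytes, ", npairs, " complete pairs"
  print "(*(z4.4,1x))", words
end program size_then_allocate
```

Whether this is faster or slower than the fixed-size buffer depends on the compiler and the file system, so treat it as an option to measure rather than a recommendation.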

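Since you need the result as 16-bit data, two consecutive bytes must be read from the file and then combined; the subroutine treats them as most significant byte first, followed by the least significant byte. That is why the first byte read in each iteration is multiplied by 256. Note that this is not always the case in binary files: some store the least significant byte first. In TIFF the byte order is declared by the first two bytes of the file ("II", 49 49, for least significant byte first; "MM", 4D 4D, for most significant byte first), so for a file that starts with 49 49, such as the one in the question, the two bytes have to be swapped to obtain the numerical values.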
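A TIFF file states its own byte order in its first two bytes: "II" (49 49) means least significant byte first, "MM" (4D 4D) means most significant byte first. The sketch below shows how two bytes could be combined according to that marker; the function word16 and the array hdr are illustrative names, not part of the module above.

```fortran
program byte_order_demo
  implicit none
  ! First four bytes of the file in the question: 49 49 2A 00.
  integer, parameter :: hdr(4) = [73, 73, 42, 0]
  logical :: little

  ! "II" (73 73) marks little-endian data, "MM" (77 77) big-endian data.
  little = (hdr(1) == 73 .and. hdr(2) == 73)

  print "(a,i0)", "magic number = ", word16(hdr(3), hdr(4), little)

contains

  ! Combine two bytes (given in file order, each 0..255) into one value.
  pure integer function word16(b1, b2, little_endian)
    integer, intent(in) :: b1, b2
    logical, intent(in) :: little_endian
    if (little_endian) then
      word16 = b1 + 256*b2      ! least significant byte comes first
    else
      word16 = 256*b1 + b2      ! most significant byte comes first
    end if
  end function word16

end program byte_order_demo
```

For the header in the question this prints magic number = 42, the value that identifies a TIFF file.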

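This subroutine is longer than the examples I posted earlier, but that is because I added error checking, which is actually necessary. You should always check whether the file exists and whether the end of the file has been reached while reading it. Special care must also be taken with TIFF images that have an "orphan" last byte (this is indeed the case for the sample file "FLAG_T24.TIF" that I found online, but not for the sample image "MARBLES.TIF" found on the same web page).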

An example driver program using the module above is:

program tiff_reader_example
use tiff_reader
implicit none
integer :: n
integer, allocatable :: tifdata(:)
call tiff_reader_16bit("FLAG_T24.TIF", tifdata, n);
if (n > 0) then
  print "(a,7(z4.4,tr1),z4.4,a)", "First 8 integers read: (", tifdata(:8), ")"
  print "(a,7(z4.4,tr1),z4.4,a)", " Last 8 integers read: (", tifdata(n-7:), ")"
  deallocate(tifdata)
end if
end program tiff_reader_example

Running the program gives:

46371 records read, 23186 16-bit integers returned.

First 8 integers read: (4949 2A00 0800 0000 0E00 FE00 0400 0100)
 Last 8 integers read: (F800 F8F8 00F8 F800 F8F8 00F8 F800 00F8)

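which is correct. Note that in this case the number of records (= bytes, since the file is opened as unformatted) is not twice the number of integers returned. That is because this particular sample image has the "orphan" last byte I mentioned earlier. Also note that I used another format to print the 16-bit hexadecimal numbers, including leading zeros where needed.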

A more detailed explanation could be given, but this thread is already long. If anything is unclear, feel free to ask in the comments.

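EDIT: By default, Intel Fortran measures direct-access record lengths in 4-byte words, which does not seem quite right to me. This unusual behavior can be changed with a compiler flag (-assume byterecl), but to avoid a lack of portability when someone uses that particular compiler without the flag, I slightly modified the module tiff_reader to work around this.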

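Here is the code that works for me. It is mostly comments. You are welcome to comment on the Fortran style. Note that I used to be familiar with fortran 77 and learned some more modern fortran in the course of writing this code.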

      program putiff

c This program is solely intended to read the data from the .tif files made by the CCD camera
c PIXIS 1024F at beamline 1-BM at the Advanced Photon Source, so that they can be manipulated
c in fortran. It is not a general .tif reader.
c A little bit of extra work may make this a reader for baseline .tif files: some of the
c information below may help with such an implementation.
c  
c The PIXIS .tif file is written in hex with the little-endian convention. 
c The hex numbers have two 8-bit bytes. They are read with an integer(kind=2) declaration.
c When describing an unsigned integer these cover numbers from 0 to 65535 (or 2**16-1). 
c For the PIXIS files the first two bytes are the decimal number 18761. The TIFF6 specification 
c gives them as a hexadecimal number (0x4949 for the little-endian convention, 0x4D4D for the
c big-endian convention). The PIXIS files are little-endian.
c
c The next two bytes should be 42 decimal, or 0x2A.
c
c The next 4 bytes give the byte offset for the first image file directory (IFD) that contains
c all the other information needed to understand how the .tif files are put together.
c This number should be read together as a 4 byte integer (kind=4). These (unsigned) integers
c go from 0 to 2**32-1, or 4294967295: this is the maximum file length for a .tif file. 
c For the PIXIS this number is 2097160, or 0x200008: in between are the image data for the
c PIXIS's 1024x1024 pixels, each with a two-byte gray range from 0 to 2**16-1 (or 65535 decimal).   
c Therefore the PIXIS image can be read without understanding the IFD. 
c
c The line right below the hex representation gives the byte order, for the 
c little-endian convention indicated by two first bytes. It's 4949  for little-endian,
c in both the first and in the second byte separately. The byte order is then least important
c part first; with two bytes together, it is byte by byte. For big-endian it is 4D4D.
c
c One way to confirm all this information is to look at the files
c with a hex dump tool (linux has xxd) or a binary editor (linux has hexedit).
c For the PIXIS image .tif file, the first 8 bytes in hexedit are indeed:
c   49 49    2A 00    08 00    20 00
c For a little-endian file, the bytes are read from the least important to the 
c most important within the two-byte number, like this:
c   49 49    2A 00    08 00    20 00
c  (34 12)  (34 12)  (78 56    34 12)
c Here the byte order is indicated below the numbers. The second two-byte number is
c therefore 10+2*16+0*256+0*4096, or 42. Likewise, the last 4-byte number is 0x00200008.
c
c (When the individual bytes are read in binary (with 'xxd -b -l 100') this gives
c for the hexadecimals    49       49       2A       00       08       00       20       00
c         binary          01001001 01001001 00101010 00000000 00001000 00000000 00100000 00000000 
c         in ASCII        I        I        *        .        .        .        .        .        )

c After the PIXIS data comes the so-called IFD (Image File Directory).
c These contain 209 bytes. They mean something, but what I do not know. I printed them
c out one by one at the end of the program. Perhaps they are better read in two-byte units
c (right now they are read as 'integer(kind=1)'; 'integer(kind=2)' may be better). But, then
c there's an odd number so you have to read one separately.

c I want to know these only because I want to use the same .tif format to
c write the results of rctopo (the max, the COM, the FWHM, and the spread).
c I know what's in the first 8 bytes, and what the data are, so I can just
c copy the ifd at the end and count on getting a good .tif file back. 
c It's sort of stupid, but it should work.
      use iso_fortran_env
      implicit logical (A-Z)
  
      integer                         :: j,jmin,jmax
      integer                         :: k,kmin,kmax
      integer                         :: ifdlength
      integer                         :: unt,ios
      parameter(jmax=1024,kmax=1024)
      parameter(ifdlength=209)

c 8-byte header that starts the PIXIS data file
      integer (kind=2)                :: tifh12,tifh34 ! each two (8-bit) bytes
      integer (kind=4)                :: tifh5678      ! 4 bytes
c image data (two bytes per pixel) and the IFD bytes that follow them
      integer (kind=2)                :: greyread(jmax,kmax)
      integer (kind=1)                :: ifd(ifdlength)

      data jmin,kmin/1,1/

c open and read the file now that you have the correct file name in the sequence
c ('tiff_file' stands for the actual file name)
      open(newunit=unt,file='tiff_file',access='stream',iostat=ios)
      if (ios /= 0) then
         write(*,*) 'could not open file, iostat = ',ios
         stop
      end if
      read (unt) tifh12,tifh34,tifh5678,greyread,ifd
      close (unt)

      stop
      end
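One point to keep in mind with the program above: integer(kind=2) is a signed type, so grey values above 32767 come out negative after the read. A minimal sketch of how they could be mapped back to the range 0 to 65535 follows; it is written in free form, and the variable names raw and grey are illustrative rather than taken from the program.

```fortran
program unsigned16
  use iso_fortran_env, only: int16, int32
  implicit none
  integer(int16) :: raw
  integer(int32) :: grey

  ! Bit pattern FFFF, as a signed 16-bit read would store it.
  raw = -1_int16

  ! Widen to 32 bits, then keep only the low 16 bits.
  grey = iand(int(raw, int32), 65535_int32)

  print "(i0)", grey
end program unsigned16
```

This prints 65535. Since iand and int are elemental, the same expression can be applied to a whole array such as greyread, provided the result array is a 32-bit integer array of the same shape.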