逐字节读取 jpeg 文件
Reading a jpeg file byte by byte
对于class cs50,我必须从存储卡中逐字节读取jpeg文件才能查看header信息。该文件编译得很好,但每当我执行该文件时,它都会 returns 一条“分段错误(核心转储)”消息。
编辑)好的,现在我知道为什么我必须使用“unsigned char”而不是“int*”了。有人可以告诉我如何将信息存储到此特定代码范围内的文件中吗?现在,我正试图在 if() 条件之外存储信息,我认为 fread 函数实际上并没有访问我打开的“图像”文件。
#include <stdio.h>
#include <string.h>
#include <math.h>
FILE * image = NULL;
int main(int argc, char* argv[])
{
FILE* infile = fopen("card.raw", "r");
if (infile == NULL)
{
printf("Could not open.\n");
fclose(infile);
return 1;
}
unsigned char storage[512];
int number = 0;
int b = floor((number) / 100);
int c = floor(((number) - (b * 100))/ 10);
int d = floor(((number) - (b * 100) - (c * 10)));
int writing = 0;
char string[5];
char* extension = ".jpg";
while (fread(&storage, sizeof(storage), 1, infile))
{
if (storage == NULL)
{
break;
}
if (storage[0] == 0xff && storage[1] == 0xd8 && storage[2] == 0xff)
{
if (storage[3] == 0xe0 || storage[3] == 0xe1)
{
if (image != NULL)
{
fclose(image);
}
sprintf(string, "%d%d%d%s", b, c, d, extension);
image = fopen(string, "w");
number++;
writing = 1;
if (writing == 1 && storage != NULL)
{
fwrite(storage, sizeof(storage), 1, image);
}
}
}
if (writing == 1 && storage != NULL)
{
fwrite(storage, sizeof(storage), 1, image);
}
if (storage == NULL)
{
fclose(image);
}
}
fclose(image);
fclose(infile);
return 0;
}
这是为了防止我的解释不清楚而设置的问题。
recover
In anticipation of this problem set, I spent the past several days snapping photos of people I know, all of which were saved by my
digital camera as JPEGs on a 1GB CompactFlash (CF) card. (It’s
possible I actually spent the past several days on Facebook instead.)
Unfortunately, I’m not very good with computers, and I somehow deleted
them all! Thankfully, in the computer world, "deleted" tends not to
mean "deleted" so much as "forgotten." My computer insists that the CF
card is now blank, but I’m pretty sure it’s lying to me.
Write in ~/Dropbox/pset4/jpg/recover.c a program that recovers these photos.
Ummm.
Okay, here’s the thing. Even though JPEGs are more complicated than BMPs, JPEGs have "signatures," patterns of bytes that distinguish
them from other file formats. In fact, most JPEGs begin with one of
two sequences of bytes. Specifically, the first four bytes of most
JPEGs are either
0xff 0xd8 0xff 0xe0 or 0xff 0xd8 0xff 0xe1
from first byte to fourth byte, left to right. Odds are, if you find one of these patterns of bytes on a disk known to store photos
(e.g., my CF card), they demark the start of a JPEG. (To be sure, you
might encounter these patterns on some disk purely by chance, so data
recovery isn’t an exact science.)
Fortunately, digital cameras tend to store photographs contiguously on CF cards, whereby each photo is stored immediately
after the previously taken photo. Accordingly, the start of a JPEG
usually demarks the end of another. However, digital cameras generally
initialize CF cards with a FAT file system whose "block size" is 512
bytes (B). The implication is that these cameras only write to those
cards in units of 512 B. A photo that’s 1 MB (i.e.,
1,048,576 B) thus takes up 1048576 ÷ 512 = 2048 "blocks" on a CF card. But so does a photo that’s, say, one byte smaller (i.e.,
1,048,575 B)! The wasted space on disk is called "slack space."
Forensic investigators often look at slack space for remnants of
suspicious data.
The implication of all these details is that you, the investigator, can probably write a program that iterates over a copy
of my CF card, looking for JPEGs' signatures. Each time you find a
signature, you can open a new file for writing and start filling that
file with bytes from my CF card, closing that file only once you
encounter another signature. Moreover, rather than read my CF card’s
bytes one at a time, you can read 512 of them at a time into a buffer
for efficiency’s sake. Thanks to FAT, you can trust that JPEGs'
signatures will be "block-aligned." That is, you need only look for
those signatures in a block’s first four bytes.
Realize, of course, that JPEGs can span contiguous blocks. Otherwise, no JPEG could be larger than 512 B. But the last byte of a
JPEG might not fall at the very end of a block. Recall the possibility
of slack space. But not to worry. Because this CF card was brand- new
when I started snapping photos, odds are it’d been "zeroed" (i.e.,
filled with 0s) by the manufacturer, in which case any slack space
will be filled with 0s. It’s okay if those trailing 0s end up in the
JPEGs you recover; they should still be viewable.
Now, I only have one CF card, but there are a whole lot of you! And so I’ve gone ahead and created a "forensic image" of the card,
storing its contents, byte after byte, in a file called card.raw . So
that you don’t waste time iterating over millions of 0s unnecessarily,
I’ve only imaged the first few megabytes of the CF card. But you
should ultimately find that the image contains 16 JPEGs. As usual, you
can open the file programmatically with
fopen , as in the below. FILE* file = fopen("card.raw", "r");
Notice, incidentally, that ~/Dropbox/pset4/jpg contains only recover.c, but it’s devoid of any code. (We leave it to you to decide
how to implement and compile recover!) For simplicity, you should
hard-code "card.raw" in your program; your program need not accept any
command-line arguments. When executed, though, your program should
recover every one of the JPEGs from card.raw, storing each as a
separate file in your current working directory. Your program should
number the files it
outputs by naming each , ###.jpg where ### is three-digit decimal number from 000 on up. (Befriend sprintf.) You need not try to
recover the JPEGs' original names. To
check whether the JPEGs your program spit out are correct, simply double-click and take a look! If each photo appears intact,
your operation was likely a success!
Odds are, though, the JPEGs that the first draft of your code spits out won’t be correct. (If you open them up and don’t see
anything, they’re probably not correct!) Execute the command below to
delete all JPEGs in your current working directory.
rm *.jpg
If you’d rather not be prompted to confirm each deletion, execute the command below instead.
rm -f *.jpg
Just be careful with that -f switch, as it "forces" deletion without prompting you.
int* storage[512];
您为 512 个整数定义了一个指向内存位置的指针,但实际上并没有保留 space(仅保留指针。
我怀疑你只是想要
int storage[512];
这之后storage还是一个指针,但是现在它实际指向了512个int
s。虽然我仍然认为你不想要这个。您需要 'bytes' 而不是 int
。最近的 C 有 unsigned char
。所以最后的声明是:
unsigned char storage[512];
为什么?因为 read
读入连续的字节。如果你读入int
s,那么你会在每个int
中读入4个字节(因为一个int占4个字节)。
你的程序中有很多问题。首先是你没有以二进制模式打开文件。
第二个是你在做不必要的指针运算。为什么不——
char buffer [BUFFERSIZE] ;
....
if (buffer [ii] == WHATEVER)
对于class cs50,我必须从存储卡中逐字节读取jpeg文件才能查看header信息。该文件编译得很好,但每当我执行该文件时,它都会 returns 一条“分段错误(核心转储)”消息。
编辑)好的,现在我知道为什么我必须使用“unsigned char”而不是“int*”了。有人可以告诉我如何将信息存储到此特定代码范围内的文件中吗?现在,我正试图在 if() 条件之外存储信息,我认为 fread 函数实际上并没有访问我打开的“图像”文件。
#include <stdio.h>
#include <string.h>
#include <math.h>
FILE * image = NULL;
int main(int argc, char* argv[])
{
FILE* infile = fopen("card.raw", "r");
if (infile == NULL)
{
printf("Could not open.\n");
fclose(infile);
return 1;
}
unsigned char storage[512];
int number = 0;
int b = floor((number) / 100);
int c = floor(((number) - (b * 100))/ 10);
int d = floor(((number) - (b * 100) - (c * 10)));
int writing = 0;
char string[5];
char* extension = ".jpg";
while (fread(&storage, sizeof(storage), 1, infile))
{
if (storage == NULL)
{
break;
}
if (storage[0] == 0xff && storage[1] == 0xd8 && storage[2] == 0xff)
{
if (storage[3] == 0xe0 || storage[3] == 0xe1)
{
if (image != NULL)
{
fclose(image);
}
sprintf(string, "%d%d%d%s", b, c, d, extension);
image = fopen(string, "w");
number++;
writing = 1;
if (writing == 1 && storage != NULL)
{
fwrite(storage, sizeof(storage), 1, image);
}
}
}
if (writing == 1 && storage != NULL)
{
fwrite(storage, sizeof(storage), 1, image);
}
if (storage == NULL)
{
fclose(image);
}
}
fclose(image);
fclose(infile);
return 0;
}
这是为了防止我的解释不清楚而设置的问题。
recover
In anticipation of this problem set, I spent the past several days snapping photos of people I know, all of which were saved by my digital camera as JPEGs on a 1GB CompactFlash (CF) card. (It’s possible I actually spent the past several days on Facebook instead.) Unfortunately, I’m not very good with computers, and I somehow deleted them all! Thankfully, in the computer world, "deleted" tends not to mean "deleted" so much as "forgotten." My computer insists that the CF card is now blank, but I’m pretty sure it’s lying to me. Write in ~/Dropbox/pset4/jpg/recover.c a program that recovers these photos.
Ummm. Okay, here’s the thing. Even though JPEGs are more complicated than BMPs, JPEGs have "signatures," patterns of bytes that distinguish them from other file formats. In fact, most JPEGs begin with one of two sequences of bytes. Specifically, the first four bytes of most JPEGs are either 0xff 0xd8 0xff 0xe0 or 0xff 0xd8 0xff 0xe1 from first byte to fourth byte, left to right. Odds are, if you find one of these patterns of bytes on a disk known to store photos (e.g., my CF card), they demark the start of a JPEG. (To be sure, you might encounter these patterns on some disk purely by chance, so data recovery isn’t an exact science.)
Fortunately, digital cameras tend to store photographs contiguously on CF cards, whereby each photo is stored immediately after the previously taken photo. Accordingly, the start of a JPEG usually demarks the end of another. However, digital cameras generally initialize CF cards with a FAT file system whose "block size" is 512 bytes (B). The implication is that these cameras only write to those cards in units of 512 B. A photo that’s 1 MB (i.e., 1,048,576 B) thus takes up 1048576 ÷ 512 = 2048 "blocks" on a CF card. But so does a photo that’s, say, one byte smaller (i.e., 1,048,575 B)! The wasted space on disk is called "slack space." Forensic investigators often look at slack space for remnants of suspicious data.
The implication of all these details is that you, the investigator, can probably write a program that iterates over a copy of my CF card, looking for JPEGs' signatures. Each time you find a signature, you can open a new file for writing and start filling that file with bytes from my CF card, closing that file only once you encounter another signature. Moreover, rather than read my CF card’s bytes one at a time, you can read 512 of them at a time into a buffer for efficiency’s sake. Thanks to FAT, you can trust that JPEGs' signatures will be "block-aligned." That is, you need only look for those signatures in a block’s first four bytes.
Realize, of course, that JPEGs can span contiguous blocks. Otherwise, no JPEG could be larger than 512 B. But the last byte of a JPEG might not fall at the very end of a block. Recall the possibility of slack space. But not to worry. Because this CF card was brand- new when I started snapping photos, odds are it’d been "zeroed" (i.e., filled with 0s) by the manufacturer, in which case any slack space will be filled with 0s. It’s okay if those trailing 0s end up in the JPEGs you recover; they should still be viewable. Now, I only have one CF card, but there are a whole lot of you! And so I’ve gone ahead and created a "forensic image" of the card, storing its contents, byte after byte, in a file called card.raw . So that you don’t waste time iterating over millions of 0s unnecessarily, I’ve only imaged the first few megabytes of the CF card. But you should ultimately find that the image contains 16 JPEGs. As usual, you can open the file programmatically with fopen , as in the below. FILE* file = fopen("card.raw", "r"); Notice, incidentally, that ~/Dropbox/pset4/jpg contains only recover.c, but it’s devoid of any code. (We leave it to you to decide how to implement and compile recover!) For simplicity, you should hard-code "card.raw" in your program; your program need not accept any command-line arguments. When executed, though, your program should recover every one of the JPEGs from card.raw, storing each as a separate file in your current working directory. Your program should number the files it outputs by naming each , ###.jpg where ### is three-digit decimal number from 000 on up. (Befriend sprintf.) You need not try to recover the JPEGs' original names. To
check whether the JPEGs your program spit out are correct, simply double-click and take a look! If each photo appears intact, your operation was likely a success! Odds are, though, the JPEGs that the first draft of your code spits out won’t be correct. (If you open them up and don’t see anything, they’re probably not correct!) Execute the command below to delete all JPEGs in your current working directory. rm *.jpg If you’d rather not be prompted to confirm each deletion, execute the command below instead. rm -f *.jpg Just be careful with that -f switch, as it "forces" deletion without prompting you.
int* storage[512];
您为 512 个整数定义了一个指向内存位置的指针,但实际上并没有保留 space(仅保留指针。
我怀疑你只是想要
int storage[512];
这之后storage还是一个指针,但是现在它实际指向了512个int
s。虽然我仍然认为你不想要这个。您需要 'bytes' 而不是 int
。最近的 C 有 unsigned char
。所以最后的声明是:
unsigned char storage[512];
为什么?因为 read
读入连续的字节。如果你读入int
s,那么你会在每个int
中读入4个字节(因为一个int占4个字节)。
你的程序中有很多问题。首先是你没有以二进制模式打开文件。
第二个是你在做不必要的指针运算。为什么不——
char buffer [BUFFERSIZE] ;
....
if (buffer [ii] == WHATEVER)