Odd behaviour when writing to a .txt file that is over 1GB
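I have some code that finds prime numbers and writes them to a .txt file. It seems to work fine until the file reaches about 1GB (I'm not sure of the exact size, but it is around there). After that, the file size grows rapidly, and I believe this is because whole blocks of numbers are being repeated. Here is my code: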
#include "pch.h"
#include <cmath>
#include <fstream>
#include <thread>
#include <iostream>
#include <string>
#include <mutex>

int nextInt = 1;
std::ofstream file;

bool TestPrime(int number)
{
    double rootInt = sqrt(number);
    for (int i = 3; i <= rootInt; i += 2)
    {
        double divValue = (double)number / i;
        if (int(divValue) == divValue)
        {
            return false;
        }
    }
    return true;
}

int GetNextNumber()
{
    static std::mutex m;
    const std::lock_guard<std::mutex> lock(m);
    return (nextInt += 2);
}

void PrimeFinderThread()
{
    while (true)
    {
        int number = GetNextNumber();
        bool isPrime = TestPrime(number);
        if (isPrime)
        {
            std::string fileOutput = std::to_string(number) + "-";
            file << fileOutput;
        }
    }
}

int main() {
    file.open("primes.txt", std::ofstream::app);
    file << "2-";
    std::thread threads[4];
    for (int i = 0; i < 4; i++) {
        threads[i] = std::thread(PrimeFinderThread);
    }
    for (int i = 0; i < 4; i++) {
        threads[i].join();
    }
    return 0;
}
Here is an excerpt from the start of the .txt file:
2-3-5-7-11-13-17-19-23-29-31-37-41-43-47-53-59-61-67-71-73-79-83-89-97-101-103-107-109-113-127-131-137-139-149-151-157-163-167-173-179-181-191-193-197-199-211-223-227-229-233-239-241-251-257-263-269-271-277-281-283-293-307-311-313-317-331-337-347-349-353-359-367-373-379-383-389-397-401-409-419-421-431-433-439-443-449-457-461-463-467-479-487-491-499-503-509-521-523-541-547-557-563-569-571-577-587-593-599-601-607-613-617-619-631-641-643-647-653-659-661-673-677-683-691-701
Here is an excerpt from somewhere in the middle of the file:
2038621267--2038621265--2038621269--2038621263--2038621259--2038621257--2038621255--2038621253--2038621261--2038621249--2038621247--2038621245--2038621367--2038621251--2038621243--2038621237--2038621239--2038621233--2038621231--2038621235--2038621241--2038621227--2038621223--2038621221--2038621219--2038621217--2038621225--2038621213--2038621215--2038621209--2038621207--2038621205--2038621211--2038621203--2038621199--2038621197--2038621229--2038621193--2038621201--2038621189--2038621187--2038621185--2038621183--2038621195
And some from the end of the file:
1812147945--1812147959--1812147941--1812147939--1812147947--1812147935--1812147933--1812147937--1812147929--1812147943--1812147925--1812147927--1812147921--1812147919--1812147917--1812147915--1812147913--1812147911--1812147923--1812147909--1812147907--1812147903--1812147901--1812147931--1812147897--1812147895--1812147893--1812147905--1812147889--1812147887--1812147885--1812147899--1812147881--1812147883--1812147891--1812147879--1812147873--1812147871--1812147875--1812147869--1812147865--1812147877--1812147867--1812147859--1812147857--1812147855--1812147853--1812147861
So there are several things wrong with this file:
- There are sometimes two dashes between numbers.
- There are numbers at the end of the file that are smaller than the ones in the middle, which should
not happen. The numbers may be slightly out of order because the program runs on multiple threads,
but not by that much.
- If we assume that every number is 10 digits long, so that each takes up 11 bytes including the dash,
and that the largest number reached was about 2.2 billion, then the prime-counting estimate
π(x) ≈ x/ln(x) gives about 102 million primes, which should take up about 1.1GB of storage.
The .txt file is 3.1GB.
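The size estimate in the last point can be checked with a few lines of C++. This is only a sketch of the arithmetic: the helper names `EstimatePrimeCount` and `EstimateFileBytes` are made up for illustration, and the 11 bytes per entry is the same assumption as above (10 digits plus one dash).

```cpp
#include <cmath>

// Rough count of primes below x, using the approximation pi(x) ≈ x / ln(x).
double EstimatePrimeCount(double x)
{
    return x / std::log(x);
}

// Expected file size in bytes if every prime below x is written as
// bytesPerEntry characters (assumed: 10 digits plus one '-' separator).
double EstimateFileBytes(double x, double bytesPerEntry = 11.0)
{
    return EstimatePrimeCount(x) * bytesPerEntry;
}
```

Calling `EstimatePrimeCount(2.2e9)` gives roughly 102 million, and `EstimateFileBytes(2.2e9)` gives roughly 1.1 billion bytes, which is why a 3.1GB file looks wrong.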
I also measured the file size at a few points in time:
10 Minutes : 760MB
20 Minutes : 3.1GB
30 Minutes : 14.6GB
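I know file size is not a good way to measure speed, but these values are so far off that I think they are enough to show something is wrong.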
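Based on my original comment on the post and the OP's reply, the problem appears to be integer overflow. Once the counter overflows, the output turns chaotic: some values have already wrapped around while others have not yet, and some are negative (hence the double dash, since the second dash is actually a minus sign).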
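To make the overflow explanation concrete, here is a minimal sketch. `TestPrimeOriginal` is a copy of the question's `TestPrime`; `FormatEntry`, `GetNextNumber64` and `IsPrime64` are hypothetical names introduced only for illustration, not the OP's code. Two points are worth stating as assumptions: signed `int` overflow is undefined behaviour in C++, though on typical platforms the counter wraps to a negative value; and on IEEE-754 platforms `sqrt` of a negative number is NaN, so the loop in `TestPrime` never runs and every negative odd number is reported as prime. That would also be consistent with the file growing so quickly after the wrap.

```cpp
#include <cmath>
#include <cstdint>
#include <mutex>
#include <string>

// Copy of the question's TestPrime. For a negative input, sqrt yields NaN,
// "i <= NaN" is false, the loop body never runs, and the function returns true.
bool TestPrimeOriginal(int number)
{
    double rootInt = std::sqrt(number);
    for (int i = 3; i <= rootInt; i += 2)
    {
        double divValue = (double)number / i;
        if (int(divValue) == divValue)
        {
            return false;
        }
    }
    return true;
}

// Formats one entry the way the question's loop does. A negative number
// brings its own minus sign, which shows up as a second dash in the file.
std::string FormatEntry(std::int64_t number)
{
    return std::to_string(number) + "-";
}

// Hypothetical 64-bit replacement for GetNextNumber(): hands out 3, 5, 7, ...
// and will not wrap anywhere near 2^31.
std::int64_t GetNextNumber64()
{
    static std::mutex m;
    static std::int64_t next = 1;
    const std::lock_guard<std::mutex> lock(m);
    return (next += 2);
}

// Integer-only trial division on 64-bit values. The test "i <= n / i"
// avoids both floating point and overflow in i * i.
bool IsPrime64(std::int64_t n)
{
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;
    for (std::int64_t i = 3; i <= n / i; i += 2)
    {
        if (n % i == 0) return false;
    }
    return true;
}
```

With these, `FormatEntry(7) + FormatEntry(-5)` produces `7--5-`, the same double-dash pattern seen in the excerpts. Switching the counter and the primality test to `std::int64_t` is one possible fix; stopping the threads before the counter reaches `INT_MAX` would be another.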