When writing a huge amount of data, parts of it get lost / When all of the data is written, the write process is very slow
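I'm having a problem with BufferedWriter when writing a large number of strings to a file.
Situation:
I have to read a large text file (>100k lines), apply some modifications to each line (remove whitespace, check for optional commands, etc.) and write the modified content to a new file.
I tried two ways of writing the file, but I only ever get one of these two outcomes:
- The write process is very slow, but all lines are processed
- Some lines are swallowed during writing, leaving an incomplete result.
Approaches and results:
- Very slow but complete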
// read file content and put it in List<String> fileContent
for (String line : fileContent)
{
try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(filename, true))))
{
writer.write(modifyFileContent(line));
}
}
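I already know that opening a file, writing a single line and closing it again performs very poorly: modifying a file of roughly 4M lines takes about 4 hours, which is not acceptable. At least it works...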
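For comparison with the per-line open/close approach, here is a minimal sketch that opens the output only once and lets try-with-resources close it; `close()` flushes whatever is still buffered, even if an exception interrupts the loop. `modifyLine` is a hypothetical stand-in for the question's `modifyFileContent` (it only trims whitespace), and the 64 KiB buffer size is an arbitrary assumption, not a tuned value.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

public class SingleWriterSketch {

    // Hypothetical stand-in for the question's modifyFileContent: it only trims whitespace.
    static String modifyLine(String line) {
        return line.trim();
    }

    // Opens the target once (overwriting, not appending), writes every modified line and
    // lets try-with-resources close the writer; close() flushes whatever is still buffered,
    // even if an exception is thrown part-way through the loop.
    static void writeAll(List<String> fileContent, String filename) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(filename), StandardCharsets.UTF_8),
                1 << 16)) { // 64 KiB buffer, an arbitrary choice rather than a tuned value
            for (String line : fileContent) {
                writer.write(modifyLine(line));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File target = File.createTempFile("single-writer", ".txt");
        target.deleteOnExit();
        writeAll(List.of("  a  ", "b ", " c"), target.getAbsolutePath());
        System.out.println(Files.readAllLines(target.toPath())); // [a, b, c]
    }
}
```

The cost of opening and closing the file is paid once instead of once per line, and the writer is closed exactly once on every path out of the method.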
- Faster, but incomplete
// read file content and put it in List<String> fileContent
// This is placed in a try/catch block, I'm omitting it here for brevity
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(filename, true)));
for (String line : fileContent)
{
writer.write(modifyFileContent(line));
}
writer.close();
This runs much faster, but the resulting file contains the following (for debugging, I append the line number from the original file):
...
Very long line with interesting content // line nb 567
Very long line with interesting content // line nb 568
Very long line with interesting content // line nb 569
Very long line wi
Very long line with interesting content // line nb 834
Very long line with interesting content // line nb 835
Very long line with interesting content // line nb 836
...
When I print the same strings to the console, the line numbers have no gaps! So there seems to be a buffering problem somewhere...
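One way to see the buffering at work: characters written to a `BufferedWriter` stay in its in-memory buffer (8192 chars by default) until the buffer fills, `flush()` is called, or the writer is closed. A file inspected before that point, for example while the program is still running or after it was killed, can therefore end in the middle of a line, much like the `Very long line wi` output above. Whether that is what happens in this case is an assumption; the sketch below only demonstrates the mechanism.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FlushDemo {

    // Returns the file size observed before flush(), after flush() and after close().
    static long[] sizes() throws IOException {
        File target = File.createTempFile("flush-demo", ".txt");
        target.deleteOnExit();
        long[] result = new long[3];
        BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(target), StandardCharsets.UTF_8));
        try {
            writer.write("Very long line with interesting content"); // 39 ASCII chars
            result[0] = target.length(); // still in the Java-side buffer: nothing on disk yet
            writer.flush();
            result[1] = target.length(); // handed to the operating system
        } finally {
            writer.close();
        }
        result[2] = target.length();
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(sizes())); // [0, 39, 39]
    }
}
```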
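Other approaches:
I also tried the NIO version, Files.newBufferedWriter, which likewise dropped several lines.
Question:
What am I missing here? Is there a way to get good write performance and still write every line correctly?
The input files are usually several hundred MB and millions of lines... Any hints are greatly appreciated :)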
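With inputs of several hundred MB, the whole file does not need to sit in `List<String> fileContent` first. A minimal sketch, assuming UTF-8 input and the same hypothetical `modifyLine` (trimming only), that streams the input through `Files.newBufferedReader` and `Files.newBufferedWriter` so only one line is held in memory at a time. Both are closed, and the writer flushed, by try-with-resources; note that `Files.newBufferedWriter` without explicit options creates or truncates the target rather than appending to it.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class StreamingConvert {

    // Hypothetical per-line modification: trims surrounding whitespace.
    static String modifyLine(String line) {
        return line.trim();
    }

    // Streams source to target line by line and returns the number of lines written.
    static long convert(Path source, Path target) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(modifyLine(line));
                writer.newLine();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("in", ".txt");
        Path out = Files.createTempFile("out", ".txt");
        Files.write(in, List.of(" one ", "two  ", "  three"));
        System.out.println(convert(in, out) + " " + Files.readAllLines(out)); // 3 [one, two, three]
    }
}
```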
[Edit]
Thanks to … I found a working solution. I had never stumbled upon RandomAccessFile before
...
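With this information, I now suspect that I ran into a race condition or some other threading issue... Since I only recently started using threads, I guess that was to be expected...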
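If the suspected race is several threads writing to the same output (through one shared writer, or through several `FileOutputStream`s on the same file), one common way to rule it out is to let exactly one thread touch the file and have all other threads hand it lines through a queue. This is a sketch of that general pattern, not the author's code; the `END` sentinel and the thread counts are arbitrary assumptions, and the unbounded `LinkedBlockingQueue` keeps the example simple at the cost of backpressure (a bounded `ArrayBlockingQueue` with `put()` would cap memory use).

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

public class SingleWriterThread {

    // Sentinel that tells the writer thread to stop; compared by reference,
    // so no producer line can accidentally match it.
    private static final String END = new String("END");

    // Producer threads only put lines on the queue; the writer thread is the
    // only one that ever touches the BufferedWriter.
    static void run(Path target, int producers, int linesPerProducer)
            throws IOException, InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicReference<IOException> failure = new AtomicReference<>();
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            Thread writerThread = new Thread(() -> {
                try {
                    for (String line = queue.take(); line != END; line = queue.take()) {
                        writer.write(line);
                        writer.newLine();
                    }
                } catch (IOException e) {
                    failure.set(e);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            writerThread.start();

            List<Thread> workers = new ArrayList<>();
            for (int p = 0; p < producers; p++) {
                final int id = p;
                Thread worker = new Thread(() -> {
                    for (int i = 0; i < linesPerProducer; i++) {
                        queue.add("producer " + id + " line " + i); // unbounded: never blocks
                    }
                });
                workers.add(worker);
                worker.start();
            }
            for (Thread worker : workers) {
                worker.join();
            }
            queue.add(END);      // every producer has finished
            writerThread.join(); // every queued line has reached the writer
        } // closing the writer flushes its buffer to the file
        if (failure.get() != null) {
            throw failure.get();
        }
    }

    public static void main(String[] args) throws Exception {
        Path target = Files.createTempFile("single-writer-thread", ".txt");
        run(target, 4, 1000);
        System.out.println(Files.readAllLines(target).size()); // 4000
    }
}
```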
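To put this into perspective, I put together a minimal example that shows the context in which my problem originally occurred. Any feedback is welcome :) :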
package minex;
import java.awt.EventQueue;
import java.awt.event.ActionEvent;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.OutputStreamWriter;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.swing.GroupLayout;
import static javax.swing.GroupLayout.Alignment.BASELINE;
import static javax.swing.GroupLayout.Alignment.LEADING;
import javax.swing.JButton;
import javax.swing.JFileChooser;
import javax.swing.JFrame;
import javax.swing.JProgressBar;
import javax.swing.SwingWorker;
import javax.swing.UIManager;
import javax.swing.WindowConstants;
/**
* Read a file line by line, modify its content and write it to another file.
* @author demo
*/
public class gui extends JFrame {
/**
* Background task, so the GUI isn't blocked and the progress bar can be updated.
*/
class fileConversionWorker extends SwingWorker<Integer, Double>
{
private final File file;
public fileConversionWorker(File file)
{
this.file = file;
}
/**
* Count the lines in the provided file. Needed to set the boundary
* settings for the progress bar.
* @param aFile File to read.
* @return Number of lines present in aFile
* @throws IOException
* @see quick and dirty taken from
*/
private int countLines(File aFile) throws IOException {
LineNumberReader reader = null;
try {
reader = new LineNumberReader(new FileReader(aFile));
while ((reader.readLine()) != null);
return reader.getLineNumber();
} catch (Exception ex) {
return -1;
} finally {
if(reader != null)
reader.close();
}
}
/**
* Reads a file line by line, modifies each line
* and immediately writes it to a different file.
* @return always 0
*/
@Override
public Integer doInBackground()
{
int totalLines = 0;
try {
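// Indicate that something is happening; Swing components must only be touched on the EDT
javax.swing.SwingUtilities.invokeLater(() -> barProgress.setIndeterminate(true));
totalLines = countLines(file);
javax.swing.SwingUtilities.invokeLater(() -> barProgress.setIndeterminate(false));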
} catch (IOException ex) {
Logger.getLogger(gui.class.getName()).log(Level.SEVERE, null, ex);
}
// only proceed, when we at least have 1 line to manipulate.
if (totalLines > 0)
{
BufferedReader br = null;
BufferedWriter writer = null;
try {
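// Update the progress bar on the EDT; the lambda needs an effectively final copy
final int maxLines = totalLines;
javax.swing.SwingUtilities.invokeLater(() -> barProgress.setMaximum(maxLines));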
br = new BufferedReader(new FileReader(file));
String filename = file.getAbsolutePath() + ".mod";
long lineNb = 0;
writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(filename, true)));
String line;
// Read original file, modify line and immediately write to new file
while ((line = br.readLine()) != null)
{
writer.write(line + " // " + lineNb);
writer.newLine();
// Cast before dividing; (lineNb / totalLines) is integer division and stays 0
publish((double) lineNb / totalLines);
lineNb++;
}
} catch (FileNotFoundException ex) {
Logger.getLogger(gui.class.getName()).log(Level.SEVERE, null, ex);
} catch ( IOException ex) {
Logger.getLogger(gui.class.getName()).log(Level.SEVERE, null, ex);
}
finally {
// Tidying up
try {
if (br != null)
br.close();
if (writer != null)
writer.close();
} catch (IOException ex) {
Logger.getLogger(gui.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
return 0;
}
/**
* Re-enable the button once the worker has finished; it stays disabled while
* the worker runs to prevent any interaction that could interrupt it.
*/
@Override
public void done()
{
butLoadFile.setEnabled(true);
}
/**
* Update the progress bar with the most recently published fraction.
* @param aDoubles fractions (0.0 to 1.0) published since the last call
*/
@Override
protected void process(java.util.List<Double> aDoubles) {
int amount = barProgress.getMaximum() - barProgress.getMinimum();
barProgress.setValue( ( int ) (barProgress.getMinimum() + ( amount * aDoubles.get( aDoubles.size() - 1 ))) );
}
}
/**
* Start the gui.
*/
public static void main(String[] args)
{
EventQueue.invokeLater(() -> {
new gui().setVisible(true);
});
}
/**
* Initialize all things needed.
*/
public gui()
{
initComponents();
}
/**
* Load a file and immediately begin processing it.
* @param evt
*/
private void butLoadFileActionListener(ActionEvent evt)
{
javax.swing.JFileChooser fc = new javax.swing.JFileChooser("/home/demo/fileFolder");
int returnVal = fc.showOpenDialog(gui.this);
if (returnVal == JFileChooser.APPROVE_OPTION) {
File file = fc.getSelectedFile();
butLoadFile.setEnabled(false);
fileConversionWorker worker = new fileConversionWorker(file);
worker.execute();
}
}
/**
* Create and lay out the components.
*/
private void initComponents()
{
setDefaultCloseOperation(WindowConstants.EXIT_ON_CLOSE);
setResizable(false);
setTitle("Min Example");
butLoadFile = new JButton("Load file");
butLoadFile.addActionListener((ActionEvent evt) -> {
butLoadFileActionListener(evt);
});
barProgress = new JProgressBar();
barProgress.setStringPainted(true);
barProgress.setMinimum(0);
javax.swing.GroupLayout layout = new GroupLayout(getContentPane());
getContentPane().setLayout(layout);
layout.setHorizontalGroup(
layout.createParallelGroup(LEADING)
.addComponent(butLoadFile, GroupLayout.PREFERRED_SIZE, 200, GroupLayout.PREFERRED_SIZE)
.addComponent(barProgress, GroupLayout.PREFERRED_SIZE, 200, GroupLayout.PREFERRED_SIZE)
);
layout.setVerticalGroup(
layout.createParallelGroup(BASELINE)
.addGroup(layout.createSequentialGroup()
.addComponent(butLoadFile, GroupLayout.PREFERRED_SIZE, 20, GroupLayout.PREFERRED_SIZE)
.addComponent(barProgress, GroupLayout.PREFERRED_SIZE, 20, GroupLayout.PREFERRED_SIZE)
)
);
pack();
}
/** Button to load a file. */
private JButton butLoadFile;
/** Progress bar to visualize progress. */
private JProgressBar barProgress;
}
[/edit]
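One detail of the minimal example is worth checking independently of any threading issue: `new FileOutputStream(filename, true)` opens the `.mod` file in append mode, so output from an earlier run, including a run that was aborted mid-write, stays at the top of the file and the new run is appended after it. Whether this accounts for the missing lines is not established by the question. The sketch below only shows the append-versus-truncate difference; the class and method names are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class AppendVsTruncate {

    // Writes the given lines to filename; append=true mirrors the minimal example.
    static void write(String filename, boolean append, String... lines) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(filename, append), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path mod = Files.createTempFile("input", ".mod");
        // Two runs with append=true: the first run's output is still there.
        write(mod.toString(), true, "run 1 // 0", "run 1 // 1");
        write(mod.toString(), true, "run 2 // 0", "run 2 // 1");
        System.out.println(Files.readAllLines(mod)); // [run 1 // 0, run 1 // 1, run 2 // 0, run 2 // 1]

        // With append=false the file is truncated first, so only the latest run remains.
        write(mod.toString(), false, "run 3 // 0");
        System.out.println(Files.readAllLines(mod)); // [run 3 // 0]
    }
}
```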
Maybe this helps:
Fastest way to write huge data in text file Java
https://www.quora.com/How-do-to-read-and-write-large-size-file-in-Java-efficiently