Trying to iterate over 2 files in Rust

I am trying to read 2 files and compare every line of one file with every line of the other to see whether they are equal.

use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() {
    let filename1 = "file1.txt";
    let filename2 = "file2.txt";

    // Open the file in read-only mode (ignoring errors).
    let file = File::open(filename1).unwrap();
    let reader = BufReader::new(file);

    let file2 = File::open(filename2).unwrap();
    let mut reader2 = BufReader::new(file2);

    // Read the file line by line using the lines() iterator from std::io::BufRead.
    for line1 in reader.lines() {

        let line1 = line1.unwrap(); // Ignore errors.

        for line2 in reader2.lines() {
            let line2 = line2.unwrap(); // Ignore errors.
            
            if line2 == line1 {
                println!("{}",line2)
            }

        }
    }
}

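However, this doesn't work. How can I nest one loop inside another when both loops read from buffered readers?

I did find a solution, but it is very slow. If anyone has a better way to find the lines that 2 files have in common, please let me know.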

use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() {
    let vec2 = findvec("file1.txt");
    let vec3 = findvec("file2.txt");

    // Compare every line of the first file with every line of the second.
    for line in &vec2 {
        for line2 in &vec3 {
            if line == line2 {
                println!("{}", line);
            }
        }
    }
}

// Read all lines of a file into a vector (ignoring errors).
fn findvec(filename: &str) -> Vec<String> {
    // Open the file in read-only mode (ignoring errors).
    let file = File::open(filename).unwrap();
    let reader = BufReader::new(file);

    let mut myvec = Vec::new();
    // Read the file line by line using the lines() iterator from std::io::BufRead.
    for line in reader.lines() {
        myvec.push(line.unwrap()); // Ignore errors.
    }
    myvec
}

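Your first problem is a duplicate of this question. TL;DR: you need to call by_ref if you want to be able to reuse reader2 after calling its lines method (e.g. in the next loop iteration). lines takes the reader by value, so without by_ref the reader is moved into the iterator on the first pass of the outer loop.

With by_ref your code will compile, but it still won't work: once you have processed the first line of the first file, you are at the end of the second file, so the second file will appear empty for all subsequent lines. You can fix this by rewinding the second file for each line. The minimal set of changes that makes your code work is: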

use std::fs::File;
use std::io::{BufRead, BufReader, Read, Seek, SeekFrom};

fn main() {
    let filename1 = "file1.txt";
    let filename2 = "file2.txt";

    // Open the file in read-only mode (ignoring errors).
    let file = File::open(filename1).unwrap();
    let reader = BufReader::new(file);

    let file2 = File::open(filename2).unwrap();
    let mut reader2 = BufReader::new(file2);

    // Read the file line by line using the lines() iterator from std::io::BufRead.
    for line1 in reader.lines() {
        let line1 = line1.unwrap(); // Ignore errors.

        reader2.seek(SeekFrom::Start(0)).unwrap(); // <-- Add this line
        for line2 in reader2.by_ref().lines() {    // <-- Use by_ref here
            let line2 = line2.unwrap(); // Ignore errors.
            
            if line2 == line1 {
                println!("{}",line2)
            }

        }
    }
}
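
To see the effect of by_ref and the rewind without creating any files, here is a small sketch that uses std::io::Cursor as an in-memory stand-in for a file. The helper name lines_on_second_pass is made up for this illustration and is not part of the code above.

```rust
use std::io::{BufRead, BufReader, Cursor, Read, Seek, SeekFrom};

// Hypothetical helper: reads all lines once, optionally rewinds, then
// returns how many lines a second pass over the same reader sees.
fn lines_on_second_pass(data: &str, rewind: bool) -> usize {
    let mut reader = BufReader::new(Cursor::new(data.as_bytes()));

    // First pass: by_ref() only borrows the reader, so it stays usable.
    let _first = reader.by_ref().lines().count();

    if rewind {
        // Go back to the start of the underlying data.
        reader.seek(SeekFrom::Start(0)).unwrap();
    }

    // Second pass: sees nothing unless the reader was rewound.
    reader.by_ref().lines().count()
}

fn main() {
    let data = "a\nb\nc\n";
    println!("without rewind: {}", lines_on_second_pass(data, false)); // 0
    println!("with rewind:    {}", lines_on_second_pass(data, true)); // 3
}
```

The same pattern should apply to a BufReader wrapping a File, since seeking a BufReader discards its internal buffer before moving the underlying position.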

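But this will be slow, because the second file is read again for every line of the first. You can make it faster by reading one of the files into a HashSet and checking whether each line of the other file is in the set: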

use std::collections::HashSet;
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() {
    let filename1 = "file1.txt";
    let filename2 = "file2.txt";

    // Open the file in read-only mode (ignoring errors).
    let file = File::open(filename1).unwrap();
    let reader = BufReader::new(file);

    let file2 = File::open(filename2).unwrap();
    let reader2 = BufReader::new(file2);
    let lines2 = reader2.lines().collect::<Result<HashSet<_>, _>>().unwrap();

    // Read the file line by line using the lines() iterator from std::io::BufRead.
    for line1 in reader.lines() {
        let line1 = line1.unwrap(); // Ignore errors.

        if lines2.contains(&line1) {
            println!("{}", line1)
        }
    }
}
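
If you would rather not unwrap every result, the same idea can be wrapped in a function that propagates I/O errors with `?`. This is only a sketch: the name common_lines and the generic BufRead parameters are my own choices, and main uses in-memory byte slices instead of real files so that it runs as is.

```rust
use std::collections::HashSet;
use std::io::{self, BufRead};

// Hypothetical helper: returns the lines of `first` that also occur in
// `second`, in the order they appear in `first` (duplicates are kept).
fn common_lines<A: BufRead, B: BufRead>(first: A, second: B) -> io::Result<Vec<String>> {
    // Load the second input into a set for fast membership checks.
    let set: HashSet<String> = second.lines().collect::<io::Result<_>>()?;

    let mut out = Vec::new();
    for line in first.lines() {
        let line = line?; // Propagate read errors instead of panicking.
        if set.contains(&line) {
            out.push(line);
        }
    }
    Ok(out)
}

fn main() -> io::Result<()> {
    // Byte slices implement BufRead, so they can stand in for files here.
    let first = "a\nb\nc\nb\n".as_bytes();
    let second = "b\nc\nd\n".as_bytes();

    for line in common_lines(first, second)? {
        println!("{}", line);
    }
    Ok(())
}
```

For real files you would pass BufReader::new(File::open("file1.txt")?) and the equivalent for the second file.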

Finally, you can also read both files into HashSets and print their intersection:

use std::collections::HashSet;
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() {
    let filename1 = "file1.txt";
    let filename2 = "file2.txt";

    // Open the file in read-only mode (ignoring errors).
    let file = File::open(filename1).unwrap();
    let reader = BufReader::new(file);
    let lines1 = reader.lines().collect::<Result<HashSet<_>, _>>().unwrap();

    let file2 = File::open(filename2).unwrap();
    let reader2 = BufReader::new(file2);
    let lines2 = reader2.lines().collect::<Result<HashSet<_>, _>>().unwrap();

    for l in lines1.intersection(&lines2) {
        println!("{}", l)
    }
}

As a bonus, the last solution removes duplicate lines. OTOH, it does not preserve the order of the lines.
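
If you want both properties, i.e. each common line printed once and in the order of the first file, one possible variation is to remove lines from the set as they are matched. This is a sketch under my own naming (common_unique_lines is hypothetical), again using in-memory data in main so it runs without any files.

```rust
use std::collections::HashSet;
use std::io::{self, BufRead};

// Hypothetical helper: lines common to both inputs, each reported once,
// in the order of their first appearance in `first`.
fn common_unique_lines<A: BufRead, B: BufRead>(first: A, second: B) -> io::Result<Vec<String>> {
    let mut remaining: HashSet<String> = second.lines().collect::<io::Result<_>>()?;

    let mut out = Vec::new();
    for line in first.lines() {
        let line = line?;
        // remove() returns true only the first time a common line is seen,
        // so later duplicates in `first` are skipped.
        if remaining.remove(&line) {
            out.push(line);
        }
    }
    Ok(out)
}

fn main() -> io::Result<()> {
    let first = "c\na\nb\na\n".as_bytes();
    let second = "a\nb\nx\n".as_bytes();

    for line in common_unique_lines(first, second)? {
        println!("{}", line); // prints "a" then "b"
    }
    Ok(())
}
```

Only the second file is held in memory here, while the first is streamed line by line.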