Rust - open dynamic number of writers
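Say I have a dynamic number of input strings (barcodes) read from a file.
I want to split a huge 111 GB text file based on matches to those input strings, and write the matching records to files.
I don't know in advance how many inputs there will be.
I have done all the file input and string matching, but I am stuck at the output step.
Ideally, I would open one file for each entry in the input vector barcodes, which contains only strings. Is there any way to open a dynamic number of output files?
A suboptimal approach is to pass a single barcode string as an input argument and search for it, but that means I have to read the huge file repeatedly.
The barcode input vector contains only strings, e.g.
"TAGAGTAT",
"TAGAGTAG",
Ideally, if the first two strings are given as input, the output should look like this: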
file1 -> TAGAGTAT.txt
file2 -> TAGAGTAG.txt
Thanks for your help.
extern crate needletail;
use needletail::{parse_fastx_file, Sequence, FastxReader};
use std::str;
use std::fs::File;
use std::io::prelude::*;
use std::path::Path;
fn read_barcodes () -> Vec<String> {
// TODO - can replace this with file reading code (OR move to an arguments based model, parse and demultiplex only one oligomer at a time..... )
// The `vec!` macro can be used to initialize a vector of strings
let barcodes = vec![
"TCTCAAAG".to_string(),
"AACTCCGC".into(),
"TAAACGCG".into()
];
println!("Initial vector: {:?}", barcodes);
return barcodes
}
fn main() {
//let filename = "test5m.fastq";
let filename = "Undetermined_S0_R1.fastq";
println!("Fastq filename: {} ", filename);
//println!("Barcodes filename: {} ", barcodes_filename);
let barcodes_vector: Vec<String> = read_barcodes();
let mut counts_vector: [i32; 30] = [0; 30];
let mut n_bases = 0;
let mut n_valid_kmers = 0;
let mut reader = parse_fastx_file(&filename).expect("Not a valid path/file");
while let Some(record) = reader.next() {
let seqrec = record.expect("invalid record");
// get sequence
let sequenceBytes = seqrec.normalize(false);
let sequenceText = str::from_utf8(&sequenceBytes).unwrap();
//println!("Seq: {} ", &sequenceText);
// get first 8 chars (ASCII bases, so 8 bytes)
let sequenceOligo = &sequenceText[0..8];
//println!("barcode vector {}, seqOligo {} ", &barcodes_vector[0], sequenceOligo);
if sequenceOligo == barcodes_vector[0]{
//println!("Hit ! Barcode vector {}, seqOligo {} ", &barcodes_vector[0], sequenceOligo);
counts_vector[0] = counts_vector[0] + 1;
}
You probably want a HashMap<String, File>. You can build it from the barcode vector like this:
use std::collections::HashMap;
use std::fs::File;
use std::path::Path;
fn build_file_map(barcodes: &[String]) -> HashMap<String, File> {
let mut files = HashMap::new();
for barcode in barcodes {
let filename = Path::new(barcode).with_extension("txt");
let file = File::create(filename).expect("failed to create output file");
files.insert(barcode.clone(), file);
}
files
}
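You would call it like this:
let barcodes = vec!["TCTCAAAG".to_string(), "AACTCCGC".into(), "TAAACGCG".into()];
let mut file_map = build_file_map(&barcodes);
You then get a file handle that you can write to like this:
// borrow the barcode; indexing by value would try to move the String out of the Vec
let barcode = &barcodes[0];
// get_mut gives a mutable handle, which writing requires
let file = file_map.get_mut(barcode).expect("barcode not in file map");
// write to file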
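To show how the map of files might be used in a single pass over the input, here is a minimal sketch. It assumes plain line-oriented text where each line starts with its barcode, not the FASTQ records that needletail yields. The names open_writers, demultiplex, out_dir and barcode_len are hypothetical, and wrapping each File in a BufWriter is a swapped-in choice to reduce the number of small writes, not something the answer above prescribes.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{self, BufRead, BufWriter, Write};
use std::path::Path;

/// Open one buffered writer per barcode, named `<barcode>.txt` inside `out_dir`.
/// (Hypothetical helper; `out_dir` is an assumption added for illustration.)
fn open_writers(
    barcodes: &[String],
    out_dir: &Path,
) -> io::Result<HashMap<String, BufWriter<File>>> {
    let mut writers = HashMap::with_capacity(barcodes.len());
    for barcode in barcodes {
        let path = out_dir.join(format!("{}.txt", barcode));
        writers.insert(barcode.clone(), BufWriter::new(File::create(path)?));
    }
    Ok(writers)
}

/// Route each input line to the writer whose barcode equals the line's first
/// `barcode_len` bytes. Returns the number of lines written.
fn demultiplex<R: BufRead>(
    input: R,
    writers: &mut HashMap<String, BufWriter<File>>,
    barcode_len: usize,
) -> io::Result<u64> {
    let mut written = 0;
    for line in input.lines() {
        let line = line?;
        // `str::get` returns None for short lines instead of panicking.
        let prefix = match line.get(..barcode_len) {
            Some(p) => p,
            None => continue,
        };
        // Lines whose prefix is not a known barcode are skipped.
        if let Some(writer) = writers.get_mut(prefix) {
            writeln!(writer, "{}", line)?;
            written += 1;
        }
    }
    // Flush explicitly so write errors are reported rather than lost on drop.
    for writer in writers.values_mut() {
        writer.flush()?;
    }
    Ok(written)
}
```

With this shape the huge input is read only once: open the writers from the barcode vector, then pass a BufReader over the input file to demultiplex together with a barcode length of 8.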
I just need an example of a) how to properly instantiate a vector of files named after the relevant string b) setup the output file objects properly c) write to those files.
Here is a commented example:
use std::io::Write;
use std::fs::File;
use std::io;
fn read_barcodes() -> Vec<String> {
// read barcodes here
todo!()
}
fn process_barcode(barcode: &str) -> String {
// process barcodes here
todo!()
}
fn main() -> io::Result<()> {
let barcodes = read_barcodes();
for barcode in barcodes {
// process barcode to get output
let output = process_barcode(&barcode);
// create file for barcode with {barcode}.txt name
let mut file = File::create(format!("{}.txt", barcode))?;
// write output to created file
file.write_all(output.as_bytes())?;
}
Ok(())
}
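Since the comment above asks specifically for a vector of files, a Vec-based variant may also be worth sketching. It keeps writer i paired with barcodes[i] and a per-barcode hit counter, much like the counts_vector in the question. The names open_writer_vec, write_match and dir are hypothetical, and the linear search over barcodes assumes the barcode list is short.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Create `<barcode>.txt` in `dir` for every barcode.
/// Writer `i` in the returned Vec belongs to `barcodes[i]`.
fn open_writer_vec(barcodes: &[String], dir: &Path) -> io::Result<Vec<BufWriter<File>>> {
    barcodes
        .iter()
        .map(|b| File::create(dir.join(format!("{}.txt", b))).map(BufWriter::new))
        .collect()
}

/// Append `record` to the file of the barcode equal to `oligo` and bump its
/// counter. Returns Ok(true) if a barcode matched, Ok(false) otherwise.
fn write_match(
    barcodes: &[String],
    writers: &mut [BufWriter<File>],
    counts: &mut [u64],
    oligo: &str,
    record: &str,
) -> io::Result<bool> {
    // Linear search; fine for a handful of barcodes.
    match barcodes.iter().position(|b| b.as_str() == oligo) {
        Some(i) => {
            writeln!(writers[i], "{}", record)?;
            counts[i] += 1;
            Ok(true)
        }
        None => Ok(false),
    }
}
```

Inside the read loop, write_match would be called once per record with the first 8 characters as oligo. BufWriter flushes when dropped but ignores errors at that point, so calling flush on each writer after the loop is the safer choice.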