Dynamically create a list of data frames from csv files
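I want to read many csv files at once into one big hash-like structure, in which a particular dataset can be accessed under a key (the name of the csv file). AFAIK R has no hashes, so I chose a list with named elements (please correct me if this is wrong). My code so far: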
csv_files <- list.files(pattern="*.csv");
datasets <- vector("list", length(csv_files));
names(datasets) <- csv_files;
for (i in 1:length(datasets)){
  csv_file <- names(datasets[i])
  datasets[i] <- read.csv(file=csv_file, header=T, sep=",", skip=0, check.names=TRUE)
}
But this code does not work (datasets contains something else, not the data frame of the respective csv file) and returns the following warnings:
Warning messages:
1: In datasets[i] <- read.csv(file = csv_file, header = T, sep = ",", :
number of items to replace is not a multiple of replacement length
2: In datasets[i] <- read.csv(file = csv_file, header = T, sep = ",", :
number of items to replace is not a multiple of replacement length
3: In datasets[i] <- read.csv(file = csv_file, header = T, sep = ",", :
number of items to replace is not a multiple of replacement length
4: In datasets[i] <- read.csv(file = csv_file, header = T, sep = ",", :
number of items to replace is not a multiple of replacement length
5: In datasets[i] <- read.csv(file = csv_file, header = T, sep = ",", :
number of items to replace is not a multiple of replacement length
Here are my troubleshooting results:
Reading from the csv file seems to work:
> csv_file <- names(datasets[1])
> temp_dataset <- read.csv(file=csv_file, header=T, sep=",", skip=0, check.names=TRUE)
> temp_dataset
   ord orig pred as o.p
1    1    0    0  1   0
2    2    0    0  1   0
3    3    0    0  1   0
4    4    0    0  0   0
5    5    0    0  0   0
6    6    0    0  0   0
7    7    0    0  0   0
8    8    0    0  0   0
9    9    0    0  0   0
10  10    0    0  0   0
11  11    0    0  0   0
12  12    0    0  0   0
13  13    0    0  0   0
14  14    0    0  0   0
15  15    0    0  0   0
16  16    0    0  0   0
17  17    0    0  0   0
18  18    0    0  0   0
19  19    0    0  0   0
20  20    0    0  0   0
21  21    0    0  0   0
22  22    0    0  0   0
23  23    4    0  0   4
24  24  402    0  1 402
25  25    0    0  1   0
26  26    0    0  1   0
27  27    0    0  1   0
28  28    1    0  1   0
The problem is assigning this data to a particular element of the list:
> datasets[1] <- temp_dataset[-1]
Warning message:
In datasets[1] <- temp_dataset[-1] :
number of items to replace is not a multiple of replacement length
It seems that only the first column gets assigned to that element of the list:
> datasets[1]
$repeating.csv
repeating.csv
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[17] 0 0 0 0 0 0 4 402 0 0 0 1
What am I missing?
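For reference, the warning shown above can be reproduced in isolation. This is a minimal sketch that uses a small made-up data frame instead of the real csv data; the names `df`, `res_single`, `res_double` and `res_wrapped` are only for illustration. A data frame is itself a list of columns, so `datasets[1] <- df` tries to fit several columns into a one-element slot and keeps only the first one, whereas `datasets[[1]] <- df` replaces the element itself.

```r
# Hypothetical stand-in for the contents of one csv file
df <- data.frame(orig = c(0, 4, 402), pred = c(0, 0, 0))

datasets <- vector("list", 1)
names(datasets) <- "repeating.csv"

# Single bracket: a sub-list of length 1 is replaced by a list of 2 columns,
# so R keeps only the first column and issues the warning
res_single <- datasets
suppressWarnings(res_single[1] <- df)
is.data.frame(res_single[[1]])   # FALSE: only the numeric vector `orig` is stored

# Double bracket: the element itself is replaced by the whole data frame
res_double <- datasets
res_double[[1]] <- df
is.data.frame(res_double[[1]])   # TRUE

# Single bracket also works if the data frame is wrapped in a list first
res_wrapped <- datasets
res_wrapped[1] <- list(df)
is.data.frame(res_wrapped[[1]])  # TRUE
```

In the loop from the question, this would mean writing `datasets[[i]]` on the left-hand side of the assignment.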
Based on the comments from @RichardScriven and @joran, I ended up with the following solution:
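CSV_PATH <- "/home/wakatana/r/csv"
# List the csv files inside CSV_PATH; pattern is a regular expression, not a glob
CSV_FILES <- list.files(path = CSV_PATH, pattern = "\\.csv$")
DATASETS <- vector("list", length(CSV_FILES))
names(DATASETS) <- CSV_FILES
for (i in seq_along(CSV_FILES)) {
  message(CSV_FILES[i])
  full.csv.path <- file.path(CSV_PATH, CSV_FILES[i])
  if (CSV_FILES[i] == "skip_first_four_lines.csv") {
    # Take the column names from the header line, then skip the first four lines
    DATASETS[[i]] <- read.csv(file = full.csv.path, header = FALSE, sep = ",", skip = 4,
                              col.names = names(read.csv(file = full.csv.path, nrows = 0)))
  } else {
    # [[i]] stores the whole data frame as a single list element
    DATASETS[[i]] <- read.csv(file = full.csv.path, header = TRUE, sep = ",", skip = 0, check.names = TRUE)
  }
}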
If someone shows a different approach and explains why it is better, I will accept their answer.
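One different approach, sketched here under stated assumptions rather than as a definitive answer, is to let `lapply()` build the list. `lapply()` always returns a list with one element per input, so there is no preallocation, no index bookkeeping and no `[` versus `[[` pitfall. The helper name `read_all_csv`, the temporary directory and the file contents below are hypothetical and exist only to make the example runnable.

```r
# Hypothetical helper: read every csv file in `dir` into a named list
read_all_csv <- function(dir) {
  # pattern is a regular expression; full.names = TRUE returns usable paths
  paths <- list.files(path = dir, pattern = "\\.csv$", full.names = TRUE)
  # lapply() returns a list holding one data frame per file
  datasets <- lapply(paths, read.csv)
  # Use the bare file names as keys
  names(datasets) <- basename(paths)
  datasets
}

# Demo on throwaway files in a temporary directory (made-up data)
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(ord = 1:3, orig = c(0, 4, 402)),
          file.path(demo_dir, "repeating.csv"), row.names = FALSE)
write.csv(data.frame(ord = 1:2, orig = c(1, 0)),
          file.path(demo_dir, "other.csv"), row.names = FALSE)

datasets <- read_all_csv(demo_dir)
names(datasets)                    # the file names serve as keys
datasets[["repeating.csv"]]$orig   # look up one data set by its key
```

A named list like this is the usual stand-in for a hash in R. If real hash semantics are needed, an environment created with `new.env()` also maps names to values.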