Manipulate and convert a *.txt file into a data frame object
I would like to build a data frame in R from the results of training a YOLO v3 model, but the output I have is a very complex object in a *.txt file. In my example:
Original file:
https://www.dropbox.com/s/pncmjwl3camap6d/log.txt?dl=0
myfile<-read.table("log.txt", sep="\t", quote="", comment.char="")
Partial structure of my file:
obj
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Resizing
416
Loaded: 0.062388 seconds
Region 82 Avg IOU: 0.254732, Class: 0.000000, Obj: 0.575008, No Obj: 0.417811, .5R: 0.000000, .75R: 0.000000, count: 4
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.496387, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.415856, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: 0.263274, Class: 0.000000, Obj: 0.306391, No Obj: 0.418069, .5R: 0.000000, .75R: 0.000000, count: 4
Region 94 Avg IOU: 0.435966, Class: 0.000000, Obj: 0.207774, No Obj: 0.496172, .5R: 0.000000, .75R: 0.000000, count: 1
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.413582, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: 0.303235, Class: 0.000000, Obj: 0.424457, No Obj: 0.418686, .5R: 0.000000, .75R: 0.000000, count: 4
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.496352, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.579218, Class: 0.000000, Obj: 0.502197, No Obj: 0.415232, .5R: 1.000000, .75R: 0.000000, count: 1
Region 82 Avg IOU: 0.187162, Class: 0.000000, Obj: 0.501398, No Obj: 0.416089, .5R: 0.000000, .75R: 0.000000, count: 5
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.496362, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.414499, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: 0.271427, Class: 0.000000, Obj: 0.481964, No Obj: 0.417647, .5R: 0.166667, .75R: 0.000000, count: 6
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.495838, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.415899, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: 0.285605, Class: 0.000000, Obj: 0.469981, No Obj: 0.417026, .5R: 0.000000, .75R: 0.000000, count: 3
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.494833, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.413943, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: 0.300229, Class: 0.000000, Obj: 0.313481, No Obj: 0.416831, .5R: 0.000000, .75R: 0.000000, count: 6
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.495936, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.413855, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: 0.384617, Class: 0.000000, Obj: 0.398042, No Obj: 0.418052, .5R: 0.333333, .75R: 0.000000, count: 3
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.496205, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.144387, Class: 0.000000, Obj: 0.349722, No Obj: 0.414624, .5R: 0.000000, .75R: 0.000000, count: 1
1: 799.219543, 799.219543 avg, 0.000000 rate, 654.661284 seconds, 24 images
Loaded: 0.000042 seconds
Region 82 Avg IOU: 0.308919, Class: 0.000000, Obj: 0.264983, No Obj: 0.418332, .5R: 0.250000, .75R: 0.000000, count: 4
Region 94 Avg IOU: 0.204282, Class: 0.000000, Obj: 0.167168, No Obj: 0.495162, .5R: 0.000000, .75R: 0.000000, count: 2
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.415848, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: 0.274081, Class: 0.000000, Obj: 0.471111, No Obj: 0.418323, .5R: 0.000000, .75R: 0.000000, count: 3
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.495826, .5R: -nan, .75R: -nan, count: 0
...
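Now I would like to create a data frame. I know that each iteration of the model starts with a Loaded: line and ends just before the next one, and that the line that closes the iteration, for example
"1: 799.219543, 799.219543 avg, 0.000000 rate, 654.661284 seconds, 24 images",
always starts with a number followed by : (the current training iteration/batch).
I need some rule that skips the unwanted information (lines starting with Region, which come in blocks of 24 lines) and first keeps only these per-iteration/batch results, like: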
1: 799.219543, 799.219543 avg, 0.000000 rate, 654.661284 seconds, 24 images
2: 799.555359, 799.253113 avg, 0.000000 rate, 672.519735 seconds, 48 images
...
55: 1025.803833, 1181.399658 avg, 0.000000 rate, 919.132681 seconds, 1320 images
and then, after some further manipulation, gives my final data frame:
iteration  total_loss   loss_error   rate      time        n_images
1          799.219543   799.219543   0.000000  654.661284  24
2          799.555359   799.253113   0.000000  672.519735  48
...
55         1025.803833  1181.399658  0.000000  919.132681  1320
Does anyone who has already worked with this kind of file have any tips?
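You could use readLines and subset with grep to the lines that start with digits followed by a colon. After that, do some cleaning with strsplit and gsub, convert with as.numeric, and name the columns with setNames. Done!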
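tmp <- readLines("log.txt")
# keep only the summary lines: one or more digits followed by a colon
tmp <- tmp[grep("^\\d+:", tmp)]
# split each line on ", " into a character matrix
tmp <- do.call(rbind, strsplit(tmp, ", "))
# split the first field ("1: 799.219543") into iteration and total loss
tmp <- data.frame(do.call(rbind, strsplit(tmp[, 1], ": ")), tmp[, -1],
                  stringsAsFactors=FALSE)
# drop trailing labels such as " avg", " rate", " seconds", " images"
tmp[] <- lapply(tmp, gsub, pattern="\\s.+", replacement="")
tmp[] <- lapply(tmp, as.numeric)
res <- setNames(tmp, c("iteration", "total_loss", "loss_error", "rate",
                       "time", "n_images"))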
head(res)
#   iteration total_loss loss_error rate     time n_images
# 1         1   799.2195   799.2195    0 654.6613       24
# 2         2   799.5554   799.2531    0 672.5197       48
# 3         3   801.0438   799.4322    0 667.1184       72
# 4         4   799.9001   799.4790    0 647.3321       96
# 5         5   801.5366   799.6848    0 660.7798      120
# 6         6   799.3589   799.6522    0 683.4424      144
The tidyverse packages work as well:
library(tidyverse)
myfile <- read_lines("log.txt")
names_col <- c("iteration", "total_loss", "loss_error", "rate", "time",
               "n_images")
mydf <- myfile %>%
  str_subset("images$") %>%
  enframe(name = NULL) %>%
  separate(col = value, into = names_col, sep = "[:,]") %>%
  mutate_all(parse_number)
head(as.data.frame(mydf))
#   iteration total_loss loss_error rate     time n_images
# 1         1   799.2195   799.2195    0 654.6613       24
# 2         2   799.5554   799.2531    0 672.5197       48
# 3         3   801.0438   799.4322    0 667.1184       72
# 4         4   799.9001   799.4790    0 647.3321       96
# 5         5   801.5366   799.6848    0 660.7798      120
# 6         6   799.3589   799.6522    0 683.4424      144
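If you want to try the idea without downloading the log, here is a minimal, self-contained sketch in base R. It is a variation on the answers above, not either author's exact method: instead of splitting on separators, it pulls every number out of each summary line with regmatches and gregexpr. The names log_lines, summary_lines, nums and res2 are made up for illustration, the sample lines are copied from the excerpt in the question, and the sketch assumes each summary line holds exactly six plain decimal numbers (no nan values and no scientific notation).

```r
# A few lines copied from the log excerpt in the question (not the full file).
log_lines <- c(
  "Loaded: 0.062388 seconds",
  "Region 82 Avg IOU: 0.254732, Class: 0.000000, Obj: 0.575008, No Obj: 0.417811, .5R: 0.000000, .75R: 0.000000, count: 4",
  "1: 799.219543, 799.219543 avg, 0.000000 rate, 654.661284 seconds, 24 images",
  "Loaded: 0.000042 seconds",
  "Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.496387, .5R: -nan, .75R: -nan, count: 0",
  "2: 799.555359, 799.253113 avg, 0.000000 rate, 672.519735 seconds, 48 images"
)

# Keep only the per-iteration summary lines: digits followed by a colon.
summary_lines <- log_lines[grepl("^[0-9]+:", log_lines)]

# Extract every integer or decimal number from each summary line.
# Assumption: six numbers per line, in the order of the column names below.
nums <- regmatches(summary_lines,
                   gregexpr("[0-9]+(\\.[0-9]+)?", summary_lines))

# One row per summary line, converted to numeric.
mat <- do.call(rbind, lapply(nums, as.numeric))
res2 <- setNames(as.data.frame(mat),
                 c("iteration", "total_loss", "loss_error", "rate",
                   "time", "n_images"))
res2
```

To run it on the real file, replace the log_lines vector with readLines("log.txt"). If the log can contain nan in a summary line, the number count per line would no longer be six, so the separator-based approaches above are the safer choice there.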