How to remove unduplicated data?
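I am trying to solve an object detection problem in R. The first dataset is image data (RGB). The second is bounding box information (class, coordinates, ...).
However, some images have no bounding box information. I checked a few of those files and they turned out to be unnecessary, so I want to remove them from my data. I cannot delete the files from the directory itself, though, because I load the images from it. Let's take a look.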
image_all <- list.files(img_dir, pattern = "png", full.names = TRUE)
head(image_all) # Then I use the magick package in R to load and resize the files.
[1] "C:/Users/Sang won kim/Desktop/train/0.png" "C:/Users/Sang won kim/Desktop/train/1.png"
[3] "C:/Users/Sang won kim/Desktop/train/10.png" "C:/Users/Sang won kim/Desktop/train/100.png"
[5] "C:/Users/Sang won kim/Desktop/train/1000.png" "C:/Users/Sang won kim/Desktop/train/1001.png"
> head(bbox)
# A tibble: 6 x 12
file_name class x1 y1 x2 y2 x3 y3 x4 y4
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <int>
1 0.png 4, 4, ~ 166.9801643~ 150.2190229~ 167.1537030~ 146.5573567~ 185.2927712~ 147.4170282~ 185.119232~ 151.078694~
2 1.png 4, 4, ~ 19.29481532~ 126.6060203~ 20.46862692~ 128.2216333~ 14.06897217~ 132.8712547~ 12.8951605~ 131.255641~
3 10.png 1, 1, ~ 99.69594772~ 187.4999087~ 102.1867698~ 191.0384564~ 84.99339110~ 203.1410626~ 82.5025689~ 199.602514~
4 100.png 4, 4, 4 100.1321271~ 63.97578214~ 102.6031447~ 64.09151524~ 102.2166042~ 72.34454343~ 99.7455865~ 72.2288103~
5 1000.png 1, 1 205.8749899~ 136.5865980~ 211.2637429~ 142.6596224~ 170.5469358~ 178.7887077~ 165.158182~ 172.715683~
6 1001.png 1, 1 35.37469380~ 89.71580246~ 41.00788160~ 96.06430034~ -0.40033915~ 132.8068949~ -6.0335269~ 126.458397~
What I want: if an image in image_all has no matching file_name in bbox, remove that element from the image_all vector.
This is a matter of match/subset. But before removing the unwanted values from image_all, get the file names without the extension using basename and sub.
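# Drop the directory, then everything from the first dot on, leaving the bare name
bn <- sub("\\..*$", "", basename(image_all))
# Position in bn of each bbox file name
i <- match(bbox$file_name, bn)
image_all <- image_all[i]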
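One edge case is worth checking: match returns NA for any bbox file name that has no corresponding image, and indexing image_all with NA produces NA entries. The following is a minimal sketch of guarding against that. The shortened paths, the sample values, and the name image_kept are hypothetical and not taken from the question.

```r
# Hypothetical values: "7" has a bbox row but no image file
image_all <- c("train/0.png", "train/1.png", "train/10.png", "train/100.png")
bbox <- data.frame(file_name = c("0", "10", "7"), stringsAsFactors = FALSE)

# Bare file names, as in the answer's approach
bn <- sub("\\..*$", "", basename(image_all))

# match gives the position of each bbox name in bn, NA when there is none
i <- match(bbox$file_name, bn)

# Drop the NA positions so no NA paths end up in the result
i <- i[!is.na(i)]
image_kept <- image_all[i]
image_kept
```

Note that indexing by match positions returns the paths in the order of bbox$file_name, and repeats a path if bbox lists the same file name more than once.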
Data
image_all <- c("C:/Users/Sang won kim/Desktop/train/0.png","C:/Users/Sang won kim/Desktop/train/1.png",
"C:/Users/Sang won kim/Desktop/train/10.png","C:/Users/Sang won kim/Desktop/train/100.png",
"C:/Users/Sang won kim/Desktop/train/1000.png","C:/Users/Sang won kim/Desktop/train/1001.png")
file_name <- c("0", "1", "10", "1001")
bbox <- data.frame(file_name, stringsAsFactors = FALSE)
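The tibble printed in the question shows file_name values that still carry the extension ("0.png", "1.png", ...), unlike the sample data here. If that is the case in the real data, one possible variation is to compare against basename(image_all) directly and build a logical mask with %in%. This is only a sketch under that assumption. The shortened paths and the names keep and image_kept are hypothetical.

```r
# Hypothetical values mirroring the question, with the extension kept in file_name
image_all <- c("train/0.png", "train/1.png", "train/10.png",
               "train/100.png", "train/1000.png", "train/1001.png")
bbox <- data.frame(file_name = c("0.png", "1.png", "10.png", "1001.png"),
                   stringsAsFactors = FALSE)

# TRUE where the image's file name (with extension) appears in bbox
keep <- basename(image_all) %in% bbox$file_name

# Subsetting by the mask preserves the original order of image_all
image_kept <- image_all[keep]
image_kept
```

Unlike match, %in% never yields NA and never repeats a path, so the result is always a subset of image_all in its original order.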