How to split a column into two in R using separate
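I have a dataset with a column of locations like this: (41.797634883, -87.708426986). I want to split it into latitude and longitude. I tried using the separate function from the tidyr package: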
library(dplyr)
library(tidyr)
df <- data.frame(x = c('(4, 9)', '(9, 10)', '(20, 100)', '(100, 200)'))
df %>% separate(x, c('Latitude', 'Longitude'))
But I get this error:
Error: Values not split into 2 pieces at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
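What am I doing wrong?
Specify the separator. By default, separate splits on every run of non-alphanumeric characters, so the parentheses (and, in the real data, the decimal points and minus sign) also act as separators and each value yields more than two pieces.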
dataframe %>% separate(Location, c('Latitude', 'Longitude'), sep=",")
However, extract looks cleaner, because you can remove the "()" at the same time:
dataframe %>% extract(x, c("Latitude", "Longitude"), "\\(([^,]+), ([^)]+)\\)")
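To see why the default call fails, here is a minimal sketch that applies the pattern tidyr documents as the default `sep` (`"[^[:alnum:]]+"`) using base R's `strsplit`. `strsplit` is only a stand-in for illustration, not necessarily what tidyr calls internally, and the variable names are made up for this example.

```r
# One value in the same format as the question's Location column
x <- "(41.797634883, -87.708426986)"

# Splitting on every run of non-alphanumeric characters (separate's default sep):
# the parentheses, decimal points, comma and minus sign all count as separators.
default_pieces <- strsplit(x, "[^[:alnum:]]+")[[1]]
print(default_pieces)
length(default_pieces)   # more than 2, hence "Values not split into 2 pieces"

# Splitting on the comma only gives exactly two pieces,
# but the parentheses (and a leading space) are still attached.
comma_pieces <- strsplit(x, ",", fixed = TRUE)[[1]]
print(comma_pieces)
length(comma_pieces)
```

The leftover parentheses in the comma split are the reason a capturing regex, as in the extract call, gives a cleaner result.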
Alternatively, you can pull out the numbers and build a data frame with the stringi package.
library(stringi)
# The pattern has no leading "-?", so the longitude's minus sign is not captured.
data.frame(lat = stri_extract_first(mydf$x, regex = "\\d{1,}\\.\\d{1,}"),
           lon = stri_extract_last(mydf$x, regex = "\\d{1,}\\.\\d{1,}"))
# lat lon
#1 41.797634883 87.708426986
#2 41.911390159 87.732635428
#3 41.672925444 87.642819748
#4 41.759925265 87.698867528
#5 41.856122914 87.717449534
#6 41.900794625 87.671240384
Data
mydf <- structure(list(x = structure(c(3L, 6L, 1L, 2L, 4L, 5L), .Label = c("(41.672925444, -87.642819748)",
"(41.759925265, -87.698867528)", "(41.797634883, -87.708426986)",
"(41.856122914, -87.717449534)", "(41.900794625, -87.671240384)",
"(41.911390159, -87.732635428)"), class = "factor")), .Names = "x", row.names = c(NA,
-6L), class = "data.frame")
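The stringi output shown above drops the longitude's sign and returns the coordinates as strings rather than numbers. A minimal sketch of one way to keep the sign and get numeric columns, assuming stringi is installed. The vector `x`, the name `coords`, and the optional `-?` in the pattern are additions for illustration and are not part of the original answer.

```r
library(stringi)

# Two sample values in the same format as mydf$x
x <- c("(41.797634883, -87.708426986)",
       "(41.911390159, -87.732635428)")

# Extract every signed decimal number; simplify = TRUE returns a character
# matrix with one row per input string and one column per match.
nums <- stri_extract_all_regex(x, "-?\\d+\\.\\d+", simplify = TRUE)

# Convert each column to numeric so the values can be used in calculations
coords <- data.frame(lat = as.numeric(nums[, 1]),
                     lon = as.numeric(nums[, 2]))
print(coords)
```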
You can do this with base R. Use gsub to strip the parentheses, then read.table to read column 'x' (using @jazzuro's example data) into two columns.
read.table(text=gsub('[()]', '', mydf$x),
           sep=",", col.names=c('Latitude', 'Longitude'))
#  Latitude Longitude
#1 41.79763 -87.70843
#2 41.91139 -87.73264
#3 41.67293 -87.64282
#4 41.75993 -87.69887
#5 41.85612 -87.71745
#6 41.90079 -87.67124
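Another base R route, sketched here as an assumption rather than something from the answers above, is `utils::strcapture` (available since R 3.4.0). It applies a capturing regex and coerces each group to the column type given in a prototype data frame. The names `x`, `proto` and `coords` are hypothetical.

```r
# Two sample values in the same format as mydf$x
x <- c("(41.797634883, -87.708426986)",
       "(41.911390159, -87.732635428)")

# The prototype fixes the column names and types of the result
proto <- data.frame(Latitude = numeric(), Longitude = numeric())

# One capture group per output column; the literal parentheses are matched
# but not captured, so they are dropped from the result.
coords <- strcapture("\\(([^,]+), ([^)]+)\\)", x, proto)
print(coords)
```

Compared with the gsub plus read.table approach, this keeps the pattern and the target types in one call, at the cost of a slightly denser regex.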