由于字符,从 Chinook 数据集中检索 table 时出现问题
Problem with retrieving a table from Chinook dataset due to character
我在 MySQL 中使用 Chinook 数据集中的数据创建了一个数据库,其中包含有关购买音乐的客户的虚构信息。
其中一个表格(“发票”)包含帐单地址,其中包含多种语言的字符:
InvoiceId CustomerId InvoiceDate BillingAddress
1 2 2009-01-01 00:00:00 Theodor-Heuss-Straße 34
2 4 2009-01-02 00:00:00 Ullevålsveien 14
3 8 2009-01-03 00:00:00 Grétrystraat 63
4 14 2009-01-06 00:00:00 8210 111 ST NW
我尝试使用 R 检索数据,代码如下:
library(DBI)
library(RMySQL)
library(dplyr)
library(magrittr)
library(lubridate)
library(stringi)
# Step 1 - Connect to the database ----------------------------------------
con <- DBI::dbConnect(MySQL(),
dbname = Sys.getenv("DB_CHINOOK"),
host = Sys.getenv("HST_CHINOOK"),
user = Sys.getenv("USR_CHINOOK"),
password = Sys.getenv("PASS_CHINOOK"),
port = XXXX)
invoices_tbl <- tbl(con, "Invoice") %>%
collect()
连接没问题,但是在尝试可视化数据时,我看不到特殊字符:
> head(invoices_tbl[,1:4])
# A tibble: 6 x 4
InvoiceId CustomerId InvoiceDate BillingAddress
<int> <int> <chr> <chr>
1 1 2 2009-01-01 00:00:00 "Theodor-Heuss-Stra\xdfe 34"
2 2 4 2009-01-02 00:00:00 "Ullev\xe5lsveien 14"
3 3 8 2009-01-03 00:00:00 "Gr\xe9trystraat 63"
4 4 14 2009-01-06 00:00:00 "8210 111 ST NW"
5 5 23 2009-01-11 00:00:00 "69 Salem Street"
6 6 37 2009-01-19 00:00:00 "Berger Stra\xdfe 10"
我的问题是,我是否应该更改 MySQL 中的配置?还是 R 的问题?我怎样才能看到特殊字符? \xdfe
是什么意思?
拜托,我们将不胜感激。
十六进制格式可以用iconv
转换
invoices_tbl$BillingAddress <- iconv(invoices_tbl$BillingAddress,
"latin1", "utf-8")
-输出
invoices_tbl
InvoiceId CustomerId InvoiceDate BillingAddress
1 1 2 2009-01-01 00:00:00 Theodor-Heuss-Straße 34
2 2 4 2009-01-02 00:00:00 Ullevålsveien 14
3 3 8 2009-01-03 00:00:00 Grétrystraat 63
4 4 14 2009-01-06 00:00:00 8210 111 ST NW
5 5 23 2009-01-11 00:00:00 69 Salem Street
6 6 37 2009-01-19 00:00:00 Berger Straße 10
数据
invoices_tbl <- structure(list(InvoiceId = 1:6, CustomerId = c(2L, 4L, 8L, 14L,
23L, 37L), InvoiceDate = c("2009-01-01 00:00:00", "2009-01-02 00:00:00",
"2009-01-03 00:00:00", "2009-01-06 00:00:00", "2009-01-11 00:00:00",
"2009-01-19 00:00:00"), BillingAddress = c("Theodor-Heuss-Stra\xdfe 34",
"Ullev\xe5lsveien 14", "Gr\xe9trystraat 63", "8210 111 ST NW",
"69 Salem Street", "Berger Stra\xdfe 10")), row.names = c("1",
"2", "3", "4", "5", "6"), class = "data.frame")
我在 MySQL 中使用 Chinook 数据集中的数据创建了一个数据库,其中包含有关购买音乐的客户的虚构信息。
其中一个表格(“发票”)包含帐单地址,其中包含多种语言的字符:
InvoiceId CustomerId InvoiceDate BillingAddress
1 2 2009-01-01 00:00:00 Theodor-Heuss-Straße 34
2 4 2009-01-02 00:00:00 Ullevålsveien 14
3 8 2009-01-03 00:00:00 Grétrystraat 63
4 14 2009-01-06 00:00:00 8210 111 ST NW
我尝试使用 R 检索数据,代码如下:
library(DBI)
library(RMySQL)
library(dplyr)
library(magrittr)
library(lubridate)
library(stringi)
# Step 1 - Connect to the database ----------------------------------------
con <- DBI::dbConnect(MySQL(),
dbname = Sys.getenv("DB_CHINOOK"),
host = Sys.getenv("HST_CHINOOK"),
user = Sys.getenv("USR_CHINOOK"),
password = Sys.getenv("PASS_CHINOOK"),
port = XXXX)
invoices_tbl <- tbl(con, "Invoice") %>%
collect()
连接没问题,但是在尝试可视化数据时,我看不到特殊字符:
> head(invoices_tbl[,1:4])
# A tibble: 6 x 4
InvoiceId CustomerId InvoiceDate BillingAddress
<int> <int> <chr> <chr>
1 1 2 2009-01-01 00:00:00 "Theodor-Heuss-Stra\xdfe 34"
2 2 4 2009-01-02 00:00:00 "Ullev\xe5lsveien 14"
3 3 8 2009-01-03 00:00:00 "Gr\xe9trystraat 63"
4 4 14 2009-01-06 00:00:00 "8210 111 ST NW"
5 5 23 2009-01-11 00:00:00 "69 Salem Street"
6 6 37 2009-01-19 00:00:00 "Berger Stra\xdfe 10"
我的问题是,我是否应该更改 MySQL 中的配置?还是 R 的问题?我怎样才能看到特殊字符? \xdfe
是什么意思?
拜托,我们将不胜感激。
十六进制格式可以用iconv
invoices_tbl$BillingAddress <- iconv(invoices_tbl$BillingAddress,
"latin1", "utf-8")
-输出
invoices_tbl
InvoiceId CustomerId InvoiceDate BillingAddress
1 1 2 2009-01-01 00:00:00 Theodor-Heuss-Straße 34
2 2 4 2009-01-02 00:00:00 Ullevålsveien 14
3 3 8 2009-01-03 00:00:00 Grétrystraat 63
4 4 14 2009-01-06 00:00:00 8210 111 ST NW
5 5 23 2009-01-11 00:00:00 69 Salem Street
6 6 37 2009-01-19 00:00:00 Berger Straße 10
数据
invoices_tbl <- structure(list(InvoiceId = 1:6, CustomerId = c(2L, 4L, 8L, 14L,
23L, 37L), InvoiceDate = c("2009-01-01 00:00:00", "2009-01-02 00:00:00",
"2009-01-03 00:00:00", "2009-01-06 00:00:00", "2009-01-11 00:00:00",
"2009-01-19 00:00:00"), BillingAddress = c("Theodor-Heuss-Stra\xdfe 34",
"Ullev\xe5lsveien 14", "Gr\xe9trystraat 63", "8210 111 ST NW",
"69 Salem Street", "Berger Stra\xdfe 10")), row.names = c("1",
"2", "3", "4", "5", "6"), class = "data.frame")