Translate encoding of Android mail in R
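Question
I am using the R package mRpostman to access my mail account from R. When I fetch mails sent from my computer via Thunderbird to the dedicated mail address, everything works fine. But when I send the same kind of mail from my Android phone, the fetched text is strangely encoded and no longer legible. How can I fix this? I tried base64enc::base64decode() but could not get it to work. I also tried changing the encoding via Encoding(), which failed as well.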
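Reprex
I sent two mails. One came from my computer using Thunderbird, with the text "Sent from thunderbird on computer". The other was sent from my Android phone using the default mail app and contains only the text "This mail is sent from Android".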
library(mRpostman) # for email communication
# Connect to mail server
imap_mail <- 'imaps://imap.gmail.com' # mail client
user_mail <- keyring::key_get('dataviz-mail')
password_mail <- keyring::key_get('dataviz-mail-password')
# Establish connection to imap server
con <- configure_imap(
url = imap_mail,
user = user_mail,
password = password_mail
)
# Switch to Inbox
con$select_folder('Inbox')
# Fetch Thunderbird mail
con$fetch_text(11)
#> $text11
#> [1] "Sent from thunderbird on computer\r\n\r\n"
# Fetch Android mail
con$fetch_text(12)
#> $text12
#> [1] "----_com.samsung.android.email_7640956728775490\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Transfer-Encoding: base64\r\n\r\nVGhpcyBtYWlsIGlzIHNlbnQgZnJvbSBBbmRyb2lk\r\n\r\n----_com.samsung.android.email_7640956728775490\r\nContent-Type: text/html; charset=utf-8\r\nContent-Transfer-Encoding: base64\r\n\r\nPGh0bWw+PGhlYWQ+PG1ldGEgaHR0cC1lcXVpdj0iQ29udGVudC1UeXBlIiBjb250ZW50PSJ0ZXh0\r\nL2h0bWw7IGNoYXJzZXQ9VVRGLTgiPjwvaGVhZD48Ym9keSBkaXI9ImF1dG8iPlRoaXMgbWFpbCBp\r\ncyBzZW50IGZyb20gQW5kcm9pZDwvYm9keT48L2h0bWw+\r\n\r\n----_com.samsung.android.email_7640956728775490--\r\n\r\n"
Created on 2022-04-06 by the reprex package (v2.0.0)
Update
Allan Cameron's solution works but removes the line breaks:
library(tidyverse)
text_that_should_contain_line_breaks <- "----_com.samsung.android.email_6729645824359240\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Transfer-Encoding: base64\r\n\r\naHR0cHM6Ly90d2l0dGVyLmNvbS9jX2dlYmhhcmQvc3RhdHVzLzE1MTA4NjcwMDkxMTM5MjM1ODg/\r\ncz0yMCZ0PWR0X3dvVkV2a3dPSjBfRGZUc2ttZUFIYW5kZHJhd24gZm9udCBoZWFkaW5nVm9uIG1l\r\naW5lbS9tZWluZXIgR2FsYXh5IGdlc2VuZGV0\r\n\r\n----_com.samsung.android.email_6729645824359240\r\nContent-Type: text/html; charset=utf-8\r\nContent-Transfer-Encoding: base64\r\n\r\nPGh0bWw+PGhlYWQ+PG1ldGEgaHR0cC1lcXVpdj0iQ29udGVudC1UeXBlIiBjb250ZW50PSJ0ZXh0\r\nL2h0bWw7IGNoYXJzZXQ9VVRGLTgiPjwvaGVhZD48Ym9keSBkaXI9ImF1dG8iPmh0dHBzOi8vdHdp\r\ndHRlci5jb20vY19nZWJoYXJkL3N0YXR1cy8xNTEwODY3MDA5MTEzOTIzNTg4P3M9MjAmYW1wO3Q9\r\nZHRfd29WRXZrd09KMF9EZlRza21lQTxkaXYgZGlyPSJhdXRvIj48YnI+PC9kaXY+PGRpdiBkaXI9\r\nImF1dG8iPkhhbmRkcmF3biBmb250IGhlYWRpbmc8L2Rpdj48ZGl2IGRpcj0iYXV0byI+PGJyPjwv\r\nZGl2PjxkaXYgaWQ9ImNvbXBvc2VyX3NpZ25hdHVyZSIgZGlyPSJhdXRvIj48ZGl2IHN0eWxlPSJm\r\nb250LXNpemU6MTJweDtjb2xvcjojNTc1NzU3IiBkaXI9ImF1dG8iPlZvbiBtZWluZW0vbWVpbmVy\r\nIEdhbGF4eSBnZXNlbmRldDwvZGl2PjwvZGl2PjxkaXYgZGlyPSJhdXRvIj48YnI+PC9kaXY+PC9i\r\nb2R5PjwvaHRtbD4=\r\n\r\n----_com.samsung.android.email_6729645824359240--\r\n\r\n"
decoded <- text_that_should_contain_line_breaks %>%
str_match('base64\r\n\r\n([[:alpha:][:digit:]/\r\n]*)----') %>%
.[, 2] %>%
base64enc::base64decode() %>%
rawToChar()
decoded
#> [1] "https://twitter.com/c_gebhard/status/1510867009113923588?s=20&t=dt_woVEvkwOJ0_DfTskmeAHanddrawn font headingVon meinem/meiner Galaxy gesendet"
# But should be
cat("https://twitter.com/c_gebhard/status/1510867009113923588?s=20&t=dt_woVEvkwOJ0_DfTskmeA\nHanddrawn font heading\nVon meinem/meiner Galaxy gesendet")
#> https://twitter.com/c_gebhard/status/1510867009113923588?s=20&t=dt_woVEvkwOJ0_DfTskmeA
#> Handdrawn font heading
#> Von meinem/meiner Galaxy gesendet
Created on 2022-04-11 by the reprex package (v2.0.0)
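Answer
The Android string does contain a base64-encoded message, but it is embedded in other text that is not base64-encoded (the multipart delimiters and the Content-Type and Content-Transfer-Encoding headers), so you have to extract it first.
If we take the string from your question: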
text12 <- "----_com.samsung.android.email_7640956728775490\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Transfer-Encoding: base64\r\n\r\nVGhpcyBtYWlsIGlzIHNlbnQgZnJvbSBBbmRyb2lk\r\n\r\n----_com.samsung.android.email_7640956728775490\r\nContent-Type: text/html; charset=utf-8\r\nContent-Transfer-Encoding: base64\r\n\r\nPGh0bWw+PGhlYWQ+PG1ldGEgaHR0cC1lcXVpdj0iQ29udGVudC1UeXBlIiBjb250ZW50PSJ0ZXh0\r\nL2h0bWw7IGNoYXJzZXQ9VVRGLTgiPjwvaGVhZD48Ym9keSBkaXI9ImF1dG8iPlRoaXMgbWFpbCBp\r\ncyBzZW50IGZyb20gQW5kcm9pZDwvYm9keT48L2h0bWw+\r\n\r\n----_com.samsung.android.email_7640956728775490--\r\n\r\n"
Then we can split out the base64 string, decode it to bytes and convert the bytes to characters like this:
library(dplyr)
library(purrr)
library(base64enc)
text12 %>%
strsplit("base64\r\n\r\n") %>%
pluck(1, 2) %>%
strsplit("----") %>%
pluck(1, 1) %>%
gsub(pattern = "[\r\n]+", replacement = "", .) %>%
base64decode() %>%
rawToChar()
#> [1] "This mail is sent from Android"
Created on 2022-04-06 by the reprex package (v2.0.1)
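The pipeline above depends on knowing the position of the wanted part after each split. As a rough sketch of a more general variant, the helper below splits the body on its first line (the multipart delimiter), keeps the parts marked as base64, and decodes each one. The name decode_mime_parts is hypothetical and is not part of mRpostman or base64enc. It assumes the body starts with the delimiter line and uses "\r\n" line endings, as in the strings shown here.

```r
library(base64enc)

# The Android mail body from the question
text12 <- "----_com.samsung.android.email_7640956728775490\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Transfer-Encoding: base64\r\n\r\nVGhpcyBtYWlsIGlzIHNlbnQgZnJvbSBBbmRyb2lk\r\n\r\n----_com.samsung.android.email_7640956728775490\r\nContent-Type: text/html; charset=utf-8\r\nContent-Transfer-Encoding: base64\r\n\r\nPGh0bWw+PGhlYWQ+PG1ldGEgaHR0cC1lcXVpdj0iQ29udGVudC1UeXBlIiBjb250ZW50PSJ0ZXh0\r\nL2h0bWw7IGNoYXJzZXQ9VVRGLTgiPjwvaGVhZD48Ym9keSBkaXI9ImF1dG8iPlRoaXMgbWFpbCBp\r\ncyBzZW50IGZyb20gQW5kcm9pZDwvYm9keT48L2h0bWw+\r\n\r\n----_com.samsung.android.email_7640956728775490--\r\n\r\n"

# Hypothetical helper: decode every base64 part of a multipart body.
# Assumption: the first line of `body` is the delimiter separating the parts.
decode_mime_parts <- function(body) {
  delimiter <- strsplit(body, "\r\n", fixed = TRUE)[[1]][1]
  parts <- strsplit(body, delimiter, fixed = TRUE)[[1]]
  # Keep only the parts whose headers declare base64 transfer encoding
  parts <- parts[grepl("Content-Transfer-Encoding: base64", parts, fixed = TRUE)]

  decoded <- vapply(parts, function(part) {
    # Headers end at the first blank line; the payload follows it
    header_end <- regexpr("\r\n\r\n", part, fixed = TRUE)[1]
    payload <- substring(part, header_end + 4)
    # Drop the line breaks that wrap the base64 text before decoding
    payload <- gsub("[\r\n]", "", payload)
    rawToChar(base64decode(payload))
  }, character(1), USE.NAMES = FALSE)

  # Name each decoded part after its declared content type
  types <- vapply(parts, function(part) {
    m <- regmatches(part, regexpr("Content-Type: *[^;\r\n]+", part))
    if (length(m) == 0) NA_character_ else sub("Content-Type: *", "", m)
  }, character(1), USE.NAMES = FALSE)

  names(decoded) <- types
  decoded
}

decode_mime_parts(text12)[["text/plain"]]
#> [1] "This mail is sent from Android"
```

The result is a character vector named by content type, so the HTML version can be picked with [["text/html"]] instead of counting split positions. The sketch ignores the charset header and does not handle parts that use other transfer encodings such as quoted-printable.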
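Update
It seems the message is stored twice: once as plain text and once as HTML-formatted text. The plain text contains no actual line breaks; the HTML version only has them because of its <br> tags. The easiest way to get the text with its line breaks preserved is to parse the HTML.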
parsed_content <- text_that_should_contain_line_breaks %>%
strsplit("base64\r\n\r\n") %>%
pluck(1, 3) %>%
strsplit("----") %>%
pluck(1, 1) %>%
base64decode() %>%
rawToChar() %>%
rvest::read_html() %>%
rvest::html_text2()
For example:
cat(parsed_content)
#> https://twitter.com/c_gebhard/status/1510867009113923588?s=20&t=dt_woVEvkwOJ0_DfTskmeA
#>
#>
#> Handdrawn font heading
#>
#>
#> Von meinem/meiner Galaxy gesendet
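The parsed text has two empty lines between the paragraphs, whereas the output asked for in the question has single line breaks. A minimal sketch of one way to close that gap is to collapse runs of newlines with base R. The string below is reconstructed from the printed output above, so the real parsed_content may carry extra trailing whitespace, which trimws() would remove.

```r
# Reconstructed from the cat() output above (an assumption, not fetched data)
parsed_content <- "https://twitter.com/c_gebhard/status/1510867009113923588?s=20&t=dt_woVEvkwOJ0_DfTskmeA\n\n\nHanddrawn font heading\n\n\nVon meinem/meiner Galaxy gesendet"

# Trim surrounding whitespace, then replace every run of two or more
# newlines with a single newline
compact <- gsub("\n{2,}", "\n", trimws(parsed_content))

cat(compact)
#> https://twitter.com/c_gebhard/status/1510867009113923588?s=20&t=dt_woVEvkwOJ0_DfTskmeA
#> Handdrawn font heading
#> Von meinem/meiner Galaxy gesendet
```

This also removes blank lines the sender typed on purpose, so it only fits cases where paragraph spacing does not matter.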