在 R 中对齐文本
Justify text in R
你如何在 R 中对齐文本?我所说的对齐是指段落中的每一行都具有完全相同的长度(就像您在开放式办公室或 excel 中对齐时一样)。我试图找到 strwrap
和 cat
的选项,但没有成功。
## Get some sample text example from wikipedia api
library(httr)
library(xml2)
name <- "Invictus"
url <- URLencode(sprintf("https://en.wikisource.org/w/api.php?action=parse&prop=text&page=%s&format=json", name))
res <- read_html(content(GET(url))$parse$text[[1]])
string <- iconv(xml_text(xml_find_all(res, "//p"), trim=TRUE), "latin1", "ASCII", sub=" ")[1:2]
(string <- trimws(gsub('\n|\s{3,}', ' ', paste(string, collapse=" "))))
# [1] "Out of the night that covers me, Black as the pit from pole to pole, I thank whatever gods may be For my unconquerable soul. In the fell clutch of circumstance I have not winced nor cried aloud. Under the bludgeonings of chance My head is bloody, but unbow'd. Beyond this place of wrath and tears Looms but the Horror of the shade, And yet the menace of the years Finds and shall find me unafraid. It matters not how strait the gate, How charged with punishments the scroll, I am the master of my fate: I am the captain of my soul."
上述功能的一些尝试
## Using these I can get left/right/center justified text but not
## justified like in other text editing programs or newspapers.
width <- 30
cat(paste(strwrap(string, width=width), collapse='\n'))
## Or with cat
tokens <- strsplit(string, '\s+')[[1]] # tokenise to pass to cat
out <- capture.output(cat(tokens, fill=width, sep=" ")) # strings <= width chars
cat(paste(out, collapse='\n'))
好吧,如果没有 built-in 方法,这对我的目的来说已经足够好了。感谢上面关于如何使用 html 样式的评论。
justify <- function(string, width=getOption('width'),
fill=c('random', 'right', 'left')) {
strs <- strwrap(string, width=width)
paste(fill_spaces(strs, width, match.arg(fill)), collapse="\n")
}
fill_spaces <- function(lines, width, fill) {
tokens <- strsplit(lines, '\s+')
res <- lapply(head(tokens, -1L), function(x) {
nspace <- length(x)-1L
extra <- width - sum(nchar(x)) - nspace
reps <- extra %/% nspace
extra <- extra %% nspace
times <- rep.int(if (reps>0) reps+1L else 1L, nspace)
if (extra > 0) {
if (fill=='right') times[1:extra] <- times[1:extra]+1L
else if (fill=='left')
times[(nspace-extra+1L):nspace] <- times[(nspace-extra+1L):nspace]+1L
else times[inds] <- times[(inds <- sample(nspace, extra))]+1L
}
spaces <- c('', unlist(lapply(times, formatC, x=' ', digits=NULL)))
paste(c(rbind(spaces, x)), collapse='')
})
c(res, paste(tail(tokens, 1L)[[1]], collapse = ' '))
}
cat(justify(string, width=40))
# Out of the night that covers me, Black
# as the pit from pole to pole, I thank
# whatever gods may be For my
# unconquerable soul. In the fell clutch
# of circumstance I have not winced nor
# cried aloud. Under the bludgeonings of
# chance My head is bloody, but unbow'd.
# Beyond this place of wrath and tears
# Looms but the Horror of the shade, And
# yet the menace of the years Finds and
# shall find me unafraid. It matters not
# how strait the gate, How charged with
# punishments the scroll, I am the master
# of my fate: I am the captain of my
# soul.
@jenesaisquoi - 很好的解决方案!但是我发现如果有一个带有 space 的段落分隔符或者如果 strs <- strwrap(string, width=width)
returns 一个或更少的元素,它就不起作用。
所以,我找到了一个改进的版本,它首先将字符串拆分为 paragraph/line 个断点,然后应用相同的逻辑:
justify = function(string, width = getOption('width'),
fill = c('random', 'right', 'left')) {
# Split text into paragraphs and remove trailing and leading white space.
paragraphs = gsub("^\s+|\s+$", "",
unlist(strsplit(x = string, split = "\n", fixed = TRUE)))
# NOTE: Empty elements are paragraphs break.
paragraphs = paragraphs[nchar(paragraphs) > 0]
formatted_text = lapply(paragraphs, function(paragraph){
strs = strwrap(paragraph, width = width)
paste(fill_spaces(strs, width, fill), collapse = "\n")
})
paste0(unlist(formatted_text, recursive = FALSE), collapse = "\n")
}
fill_spaces = function(lines, width, fill) {
tokens = strsplit(lines, '\s+')
res = lapply(head(tokens, -1L), function(x) {
nspace = length(x) - 1L
extra = width - sum(nchar(x)) - nspace
reps = extra %/% nspace
extra = extra %% nspace
times = rep.int(if (reps > 0) reps + 1L else 1L, nspace)
if (extra > 0) {
if (fill == 'right') times[1:extra] = times[1:extra] + 1L
else if (fill == 'left')
times[(nspace - extra + 1L):nspace] = times[(nspace - extra + 1L):nspace] + 1L
else times[inds] = times[(inds <- sample(nspace, extra))] + 1L
}
spaces <- c('', unlist(lapply(times, formatC, x = ' ', digits = NULL)))
paste(c(rbind(spaces, x)), collapse = '')
})
c(res, paste(tail(tokens, 1L)[[1]], collapse = ' '))
}
nchar_per_line = 50
string = "Colin\'s practice outfits have reached a new level recently. It's difficult to determine the effect they are having on his teammates - whether they serve more as a distraction or a nice bit of comice releif, but it is clear they fuel Colin's fire and act as a motivator to him. \n\n On another note, Colin's high fives per 36 have been through the roof recently. It really seems like something he's been focusing on in practice lately, as well as putting extra reps in at the gym. \n\n Keep an eye out for Colin, could be an interesting 10 day pickup down the line."
cat(justify(string, width = nchar_per_line))
# Colin's practice outfits have reached a new level
# recently. It's difficult to determine the effect
# they are having on his teammates - whether they
# serve more as a distraction or a nice bit of
# comice releif, but it is clear they fuel Colin's
# fire and act as a motivator to him.
# On another note, Colin's high fives per 36 have
# been through the roof recently. It really seems
# like something he's been focusing on in practice
# lately, as well as putting extra reps in at the
# gym.
# Keep an eye out for Colin, could be an
# interesting 10 day pickup down the line.
你如何在 R 中对齐文本?我所说的对齐是指段落中的每一行都具有完全相同的长度(就像您在开放式办公室或 excel 中对齐时一样)。我试图找到 strwrap
和 cat
的选项,但没有成功。
## Get some sample text example from wikipedia api
library(httr)
library(xml2)
name <- "Invictus"
url <- URLencode(sprintf("https://en.wikisource.org/w/api.php?action=parse&prop=text&page=%s&format=json", name))
res <- read_html(content(GET(url))$parse$text[[1]])
string <- iconv(xml_text(xml_find_all(res, "//p"), trim=TRUE), "latin1", "ASCII", sub=" ")[1:2]
(string <- trimws(gsub('\n|\s{3,}', ' ', paste(string, collapse=" "))))
# [1] "Out of the night that covers me, Black as the pit from pole to pole, I thank whatever gods may be For my unconquerable soul. In the fell clutch of circumstance I have not winced nor cried aloud. Under the bludgeonings of chance My head is bloody, but unbow'd. Beyond this place of wrath and tears Looms but the Horror of the shade, And yet the menace of the years Finds and shall find me unafraid. It matters not how strait the gate, How charged with punishments the scroll, I am the master of my fate: I am the captain of my soul."
上述功能的一些尝试
## Using these I can get left/right/center justified text but not
## justified like in other text editing programs or newspapers.
width <- 30
cat(paste(strwrap(string, width=width), collapse='\n'))
## Or with cat
tokens <- strsplit(string, '\s+')[[1]] # tokenise to pass to cat
out <- capture.output(cat(tokens, fill=width, sep=" ")) # strings <= width chars
cat(paste(out, collapse='\n'))
好吧,如果没有 built-in 方法,这对我的目的来说已经足够好了。感谢上面关于如何使用 html 样式的评论。
justify <- function(string, width=getOption('width'),
fill=c('random', 'right', 'left')) {
strs <- strwrap(string, width=width)
paste(fill_spaces(strs, width, match.arg(fill)), collapse="\n")
}
fill_spaces <- function(lines, width, fill) {
tokens <- strsplit(lines, '\s+')
res <- lapply(head(tokens, -1L), function(x) {
nspace <- length(x)-1L
extra <- width - sum(nchar(x)) - nspace
reps <- extra %/% nspace
extra <- extra %% nspace
times <- rep.int(if (reps>0) reps+1L else 1L, nspace)
if (extra > 0) {
if (fill=='right') times[1:extra] <- times[1:extra]+1L
else if (fill=='left')
times[(nspace-extra+1L):nspace] <- times[(nspace-extra+1L):nspace]+1L
else times[inds] <- times[(inds <- sample(nspace, extra))]+1L
}
spaces <- c('', unlist(lapply(times, formatC, x=' ', digits=NULL)))
paste(c(rbind(spaces, x)), collapse='')
})
c(res, paste(tail(tokens, 1L)[[1]], collapse = ' '))
}
cat(justify(string, width=40))
# Out of the night that covers me, Black
# as the pit from pole to pole, I thank
# whatever gods may be For my
# unconquerable soul. In the fell clutch
# of circumstance I have not winced nor
# cried aloud. Under the bludgeonings of
# chance My head is bloody, but unbow'd.
# Beyond this place of wrath and tears
# Looms but the Horror of the shade, And
# yet the menace of the years Finds and
# shall find me unafraid. It matters not
# how strait the gate, How charged with
# punishments the scroll, I am the master
# of my fate: I am the captain of my
# soul.
@jenesaisquoi - 很好的解决方案!但是我发现如果有一个带有 space 的段落分隔符或者如果 strs <- strwrap(string, width=width)
returns 一个或更少的元素,它就不起作用。
所以,我找到了一个改进的版本,它首先将字符串拆分为 paragraph/line 个断点,然后应用相同的逻辑:
justify = function(string, width = getOption('width'),
fill = c('random', 'right', 'left')) {
# Split text into paragraphs and remove trailing and leading white space.
paragraphs = gsub("^\s+|\s+$", "",
unlist(strsplit(x = string, split = "\n", fixed = TRUE)))
# NOTE: Empty elements are paragraphs break.
paragraphs = paragraphs[nchar(paragraphs) > 0]
formatted_text = lapply(paragraphs, function(paragraph){
strs = strwrap(paragraph, width = width)
paste(fill_spaces(strs, width, fill), collapse = "\n")
})
paste0(unlist(formatted_text, recursive = FALSE), collapse = "\n")
}
fill_spaces = function(lines, width, fill) {
tokens = strsplit(lines, '\s+')
res = lapply(head(tokens, -1L), function(x) {
nspace = length(x) - 1L
extra = width - sum(nchar(x)) - nspace
reps = extra %/% nspace
extra = extra %% nspace
times = rep.int(if (reps > 0) reps + 1L else 1L, nspace)
if (extra > 0) {
if (fill == 'right') times[1:extra] = times[1:extra] + 1L
else if (fill == 'left')
times[(nspace - extra + 1L):nspace] = times[(nspace - extra + 1L):nspace] + 1L
else times[inds] = times[(inds <- sample(nspace, extra))] + 1L
}
spaces <- c('', unlist(lapply(times, formatC, x = ' ', digits = NULL)))
paste(c(rbind(spaces, x)), collapse = '')
})
c(res, paste(tail(tokens, 1L)[[1]], collapse = ' '))
}
nchar_per_line = 50
string = "Colin\'s practice outfits have reached a new level recently. It's difficult to determine the effect they are having on his teammates - whether they serve more as a distraction or a nice bit of comice releif, but it is clear they fuel Colin's fire and act as a motivator to him. \n\n On another note, Colin's high fives per 36 have been through the roof recently. It really seems like something he's been focusing on in practice lately, as well as putting extra reps in at the gym. \n\n Keep an eye out for Colin, could be an interesting 10 day pickup down the line."
cat(justify(string, width = nchar_per_line))
# Colin's practice outfits have reached a new level
# recently. It's difficult to determine the effect
# they are having on his teammates - whether they
# serve more as a distraction or a nice bit of
# comice releif, but it is clear they fuel Colin's
# fire and act as a motivator to him.
# On another note, Colin's high fives per 36 have
# been through the roof recently. It really seems
# like something he's been focusing on in practice
# lately, as well as putting extra reps in at the
# gym.
# Keep an eye out for Colin, could be an
# interesting 10 day pickup down the line.