Replace all instances of long dashes – with regular minus - signs in R
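In my text data below, there is apparently a special character that looks like a long dash (–) where a regular minus sign (-) is needed.
Is there a way to replace every instance of the long dash – with a regular minus sign - in R, so that I can then call read.table(text = dat, header = TRUE) on dat?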
dat <- "
Study Outcome Subscale g Variance Precision
1 1 1 –.251 .024 41.455
2 1 1 –.069 .001 1,361.067
3 1 5 .138 .001 957.620
4 1 1 –.754 .085 11.809
5 1 1 –.228 .020 49.598
6 1 6 –.212 .004 246.180
6 2 7 .219 .004 246.095
7 1 1 .000 .012 83.367
8 1 2 –.103 .006 162.778
8 2 3 .138 .006 162.612
8 3 4 –.387 .006 160.133
9 1 1 –.032 .023 44.415
10 1 5 –.020 .058 17.110
11 1 1 .128 .017 59.999
12 1 1 –.262 .032 31.505
13 1 1 –.046 .071 14.080
14 1 6 –.324 .003 381.620
14 2 6 –.409 .003 378.611
14 3 7 .080 .003 386.319
14 4 7 –.140 .003 385.542
15 1 1 .311 .005 185.364
16 1 1 .036 .005 205.063
17 1 6 –.259 .001 925.643
17 2 7 .196 .001 928.897
18 1 1 .157 .013 74.094
19 1 1 .000 .056 17.985
20 1 1 .000 .074 13.600
21 1 6 –.013 .039 25.425
21 2 7 –.004 .039 25.426
22 1 1 –.202 .001 1,487.992
23 1 1 .000 .086 11.628
24 1 1 –.221 .001 713.110
25 1 1 –.099 .001 749.964
26 1 5 –.165 .000 6,505.024
27 1 1 –.523 .063 15.856
28 1 1 .000 .001 1,611.801
29 1 6 .377 .045 22.045
29 2 7 .575 .046 21.677
30 1 1 .590 .074 13.477
31 1 1 .020 .001 1,335.991
32 1 1 .121 .043 23.489
33 1 1 –.101 .003 363.163
34 1 1 –.101 .003 369.507
35 1 1 –.104 .004 255.507
36 1 1 –.270 .003 340.761
37 1 1 .179 .150 6.645
38 1 2 .468 .020 51.255
38 2 4 –.479 .020 51.193
39 1 5 –.081 .024 42.536
40 1 1 –.071 .043 23.519
41 1 1 .201 .077 13.036
42 1 6 –.070 .006 180.844
42 2 7 .190 .006 180.168
43 1 1 .277 .013 79.220
44 1 5 –.086 .001 903.924
45 1 5 –.338 .002 469.260
46 1 1 .262 .003 290.330
47 1 5 .000 .003 304.959
48 1 1 –.645 .055 18.192
49 1 5 –.120 .002 461.802
50 1 5 –.286 .009 106.189
51 1 1 –.124 .006 172.261
52 1 1 .023 .028 35.941
53 1 5 –.064 .001 944.600
54 1 1 .000 .043 23.010
55 1 1 .000 .014 72.723
56 1 5 .000 .012 85.832
57 1 1 .000 .012 85.832
"
Use gsub() from base R.
dat <- gsub(pattern = "–", replacement = "-", x = dat)
head(read.table(text = dat, header = TRUE))
Study Outcome Subscale g Variance Precision
1 1 1 1 -0.251 0.024 41.455
2 2 1 1 -0.069 0.001 1,361.067
3 3 1 5 0.138 0.001 957.620
4 4 1 1 -0.754 0.085 11.809
5 5 1 1 -0.228 0.020 49.598
6 6 1 6 -0.212 0.004 246.180
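If the script file's encoding is a concern, the same replacement can be written with a Unicode escape instead of pasting the dash itself. The sketch below assumes the character in the data is U+2013 EN DASH (utf8ToInt() on a single character can confirm this), and `sample_dat` is a hypothetical two-row stand-in for `dat`, not the original data.

```r
# Hypothetical two-row stand-in for the `dat` string above
sample_dat <- "
Study Outcome Subscale g Variance
1 1 1 \u2013.251 .024
3 1 5 .138 .001
"

# "\u2013" is EN DASH; writing it as an escape keeps the script ASCII-only.
# fixed = TRUE treats the pattern as a literal string, not a regex.
clean <- gsub("\u2013", "-", sample_dat, fixed = TRUE)

df <- read.table(text = clean, header = TRUE)
str(df$g)  # g should now be parsed as a numeric column
```

Checking the column type after reading is a quick way to see whether any stray dash characters remain.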
An example using stringr.
library(stringr)
library(dplyr)
x <- str_replace_all(dat, "–", "-")
tibble(read.table(textConnection(x), header = TRUE))
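To normalize every kind of dash in one step (note the doubled backslash, which an R string literal requires):
dat <- gsub("\\p{Pd}", "-", dat, perl = TRUE)
See https://www.fileformat.info/info/unicode/category/Pd/list.htm: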
Character Name Browser Image
U+002D HYPHEN-MINUS - view
U+058A ARMENIAN HYPHEN ֊ view
U+05BE HEBREW PUNCTUATION MAQAF ־ view
U+1400 CANADIAN SYLLABICS HYPHEN ᐀ view
U+1806 MONGOLIAN TODO SOFT HYPHEN ᠆ view
U+2010 HYPHEN ‐ view
U+2011 NON-BREAKING HYPHEN ‑ view
U+2012 FIGURE DASH ‒ view
U+2013 EN DASH – view
U+2014 EM DASH — view
U+2015 HORIZONTAL BAR ― view
U+2E17 DOUBLE OBLIQUE HYPHEN ⸗ view
U+2E1A HYPHEN WITH DIAERESIS ⸚ view
U+2E3A TWO-EM DASH ⸺ view
U+2E3B THREE-EM DASH ⸻ view
U+2E40 DOUBLE HYPHEN ⹀ view
U+301C WAVE DASH 〜 view
U+3030 WAVY DASH 〰 view
U+30A0 KATAKANA-HIRAGANA DOUBLE HYPHEN ゠ view
U+FE31 PRESENTATION FORM FOR VERTICAL EM DASH ︱ view
U+FE32 PRESENTATION FORM FOR VERTICAL EN DASH ︲ view
U+FE58 SMALL EM DASH ﹘ view
U+FE63 SMALL HYPHEN-MINUS ﹣ view
U+FF0D FULLWIDTH HYPHEN-MINUS - view
U+10EAD YEZIDI HYPHENATION MARK view
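One thing the list above does not include is U+2212 MINUS SIGN, which belongs to the math symbol category (Sm) rather than Pd, so \p{Pd} alone will not match it. A minimal sketch that covers both, assuming an R build whose PCRE supports Unicode properties; the vector `x` is a made-up example, not the original data:

```r
# Hypothetical input mixing several dash-like characters:
# U+2013 EN DASH, U+2014 EM DASH, U+2212 MINUS SIGN
x <- c("\u2013.251", "\u2014.069", "\u2212.754", ".138")

# \p{Pd} covers the dash punctuation category; U+2212 is added explicitly
# because it is a math symbol (Sm), not dash punctuation
normalized <- gsub("[\\p{Pd}\u2212]", "-", x, perl = TRUE)

# With plain hyphen-minus signs the values convert cleanly
as.numeric(normalized)
```

Whether the extra U+2212 is needed depends on where the text was copied from; it is harmless if the character never occurs.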