Removing all "$" from an entire data frame
I have a df with several columns whose dollar values are prefixed with "$", like this:
> str(data)
Classes ‘data.table’ and 'data.frame': 196879 obs. of 32 variables:
$ City : chr "" "" "" "" ...
$ Company_Goal : chr "" "" "" "" ...
$ Company_Name : chr "" "" "" "" ...
$ Event_Date : chr "5/14/2016" "9/26/2015" "9/12/2015" "6/3/2017" ...
$ Event_Year : chr "FY 2016" "FY 2016" "FY 2016" "FY 2017" ...
$ Fundraising_Goal : chr "0" "0" "0" "$0" ...
$ Name : chr "Heart Walk 2015-2016 St. Louis MO" "Heart Walk 2015-2016 Canton, OH" "Heart Walk 2015-2016 Dallas, TX" "FDA HW 2016-2017 Albany, NY WO-65355" ...
$ Participant_Id : chr "2323216" "2273391" "2419569" "4088558" ...
$ State : chr "" "OH" "TX" "" ...
$ Street : chr "" "" "" "" ...
$ Team_Average : chr "6" "3" "6" "" ...
$ Team_Captain : chr "No" "No" "Yes" "No" ...
$ Team_Count : chr "7" "6" "4" "46" ...
$ Team_Id : chr "152788" "127127" "45273" "179207" ...
$ Team_Member_Goal : chr "$0" "$0" "$0" "$0" ...
$ Team_Name : chr "Team Clayton" "Cardiac Crusaders" "BIS - Team Myers" "Independent Walkers" ...
$ Team_Total_Gifts : chr ",230 " "8" ",225 " ",145 " ...
$ Zip : chr "" "" "" "" ...
$ Gifts_Count : chr "2" "1" "2" "1" ...
$ Registration_Gift: chr "No" "No" "No" "No" ...
$ Participant_Gifts: chr "6" "8" "5" "$0" ...
$ Personal_Gift : chr "$0" "$0" "$0" "0" ...
$ Total_Gifts : chr "6" "8" "5" "0" ...
$ MATCH_CODE : chr "UX000" "UX000" "UX000" "UX000" ...
$ TAP_LEVEL : chr "X" "X" "X" "X" ...
$ TAP_DESC : chr "" "" "" "" ...
$ TAP_LIFED : chr "" "" "" "" ...
$ MEDAGE_CY : chr "0" "0" "0" "0" ...
$ DIVINDX_CY : chr "0" "0" "0" "0" ...
$ MEDHINC_CY : chr "0" "0" "0" "0" ...
$ MEDDI_CY : chr "0" "0" "0" "0" ...
$ MEDNW_CY : chr "0" "0" "0" "0" ...
- attr(*, ".internal.selfref")=<externalptr>
I'm trying to remove all the "$" signs, but I haven't managed to. I tried the suggestions in this post as well as …, but in both cases the data remains unchanged...
Any help?
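The dollar sign is a special character in regular expressions: it anchors the end of the string (see here for details). The gsub() function treats pattern as a regular expression by default.
You have to escape the dollar sign with a backslash (\$) to match a literal $. Because the backslash must itself be escaped inside an R string, the pattern is written "\\$".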
# sample data
df = data.frame(Team_Average = c("6", "3", "6"),
                Name = c("Heart Walk 2015-2016 St. Louis MO",
                         "Heart Walk 2015-2016 Canton, OH",
                         "Heart Walk 2015-2016 Dallas, TX"),
                stringsAsFactors = FALSE)
# "\\$" in R source is the regex \$, which matches a literal dollar sign
df[] = lapply(df, gsub, pattern = "\\$", replacement = "")
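A minimal sketch of why an unescaped pattern seems to do nothing, which matches the symptom in the question: as a regex, a bare "$" matches the empty position at the end of each string, so gsub() replaces "nothing" with "" and every value comes back unchanged. The vector prices is hypothetical sample data, not taken from the question.

```r
# hypothetical sample values with literal dollar signs
prices <- c("$0", "$1,230", "$56")

# Unescaped "$" is the end-of-string anchor: it matches an empty
# position, so replacing it with "" leaves every string unchanged
unescaped <- gsub("$", "", prices)

# Escaped "\\$" matches the literal dollar character instead
escaped <- gsub("\\$", "", prices)

print(unescaped)
print(escaped)
```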
Alternatively, you can use the fixed = TRUE option of gsub to match pattern literally rather than as a regular expression.
df[] = lapply(df, gsub, pattern = "$", replacement = "", fixed = TRUE)
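To check that the escaped regex and fixed = TRUE give the same result, here is a small sketch on a hypothetical data frame money whose columns are dollar-prefixed strings (the column names and values are made up for illustration).

```r
# hypothetical data frame with dollar-prefixed character columns
money <- data.frame(
  goal  = c("$0", "$250", "$1,000"),
  total = c("$6", "$18", "$0"),
  stringsAsFactors = FALSE
)

# regex version: the backslash removes the anchor meaning of $
by_regex <- money
by_regex[] <- lapply(by_regex, gsub, pattern = "\\$", replacement = "")

# literal version: fixed = TRUE turns off regex interpretation entirely
by_fixed <- money
by_fixed[] <- lapply(by_fixed, gsub, pattern = "$", replacement = "", fixed = TRUE)

print(identical(by_regex, by_fixed))
print(by_fixed$goal)
```

fixed = TRUE is the simpler choice when the pattern is a plain string, since no metacharacter in it needs escaping.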
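The other answers work well on the example provided. However, if the dataset contains any numeric columns, running gsub() or stringr::str_replace_all() through lapply() will coerce those numeric columns to character: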
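library(stringr)
library(dplyr)
# tibble() replaces the deprecated data_frame()
d <- tibble(
  x = c("$200", "$191.40", "80.12"),
  y = c("$test", "column", "$foo"),
  z = 1:3
)
# every column, including the integer z, is passed through gsub()
d[] <- lapply(d, gsub, pattern = "\\$", replacement = "")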
# A tibble: 3 x 3
x y z
<chr> <chr> <chr>
1 200 test 1
2 191.40 column 2
3 80.12 foo 3
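Note the class of z above: the integer column has become character.
Here is a tidyverse way to remove $ from all character columns while leaving the other columns alone: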
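# start again from the original d (before the lapply() call above);
# across(where(...)) replaces the superseded mutate_if() and the deprecated funs()
d %>%
  mutate(across(where(is.character), ~ str_replace_all(.x, "\\$", "")))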
# A tibble: 3 x 3
x y z
<chr> <chr> <int>
1 200 test 1
2 191.40 column 2
3 80.12 foo 3
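The same idea (touch only character columns so numeric columns keep their type) can also be sketched in base R without the tidyverse. This is an alternative, not part of the answer above; d2 is a hypothetical data.frame mirroring d.

```r
# hypothetical data mirroring d above: two character columns, one integer column
d2 <- data.frame(
  x = c("$200", "$191.40", "80.12"),
  y = c("$test", "column", "$foo"),
  z = 1:3,
  stringsAsFactors = FALSE
)

# logical index of the character columns only
is_chr <- vapply(d2, is.character, logical(1))

# strip literal "$" from those columns; z is never passed to gsub()
d2[is_chr] <- lapply(d2[is_chr], gsub, pattern = "$", replacement = "", fixed = TRUE)

print(sapply(d2, class))
print(d2$x)
```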