Finding the length of each string within a column of a data frame in R
I want to count the number of characters in each string in the name column. My data frame, sample, looks like this:
date name expenditure type
23MAR2013 KOSH ENTRP 4000 COMPANY
23MAR2013 JOHN DOE 800 INDIVIDUAL
24MAR2013 S KHAN 300 INDIVIDUAL
24MAR2013 JASINT PVT LTD 8000 COMPANY
25MAR2013 KOSH ENTRPRISE 2000 COMPANY
25MAR2013 JOHN S DOE 220 INDIVIDUAL
25MAR2013 S KHAN 300 INDIVIDUAL
26MAR2013 S KHAN 300 INDIVIDUAL
Why does nchar (or str_length() from the stringr package) give me a list of seemingly random numbers?
Length <- aggregate(nchar(sample$name), by=list(sample$name), FUN=nchar)
Output:
Group.1 x
1 JASINT PVT LTD 2
2 JOHN DOE 1
3 JOHN S DOE 2
4 KOSH ENTRP 2
5 KOSH ENTRPRISE 2
6 S KHAN 1, 1, 1
Desired output:
Group.1 x
1 JASINT PVT LTD 14
2 JOHN DOE 8
3 JOHN S DOE 10
4 KOSH ENTRP 10
5 KOSH ENTRPRISE 14
6 S KHAN 6
CSV of the table above:
"Date","name","expenditure","type"
"23MAR2013","KOSH ENTRP",4000,"COMPANY"
"23MAR2013 ","JOHN DOE",800,"INDIVIDUAL"
"24MAR2013","S KHAN",300,"INDIVIDUAL"
"24MAR2013","JASINT PVT LTD",8000,"COMPANY"
"25MAR2013","KOSH ENTRPRISE",2000,"COMPANY"
"25MAR2013","JOHN S DOE",220,"INDIVIDUAL"
"25MAR2013","S KHAN",300,"INDIVIDUAL"
"26MAR2013","S KHAN",300,"INDIVIDUAL"
Assuming the last row in "Desired Output" is a typo:
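# Group by a copy of name, then take the length of the single unique value in each group.
# as.character() guards against name having been read in as a factor.
aggregate(name ~ name1, transform(sample, name1 = name),
          FUN = function(x) nchar(as.character(unique(x))))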
# name1 name
#1 JASINT PVT LTD 14
#2 JOHN DOE 8
#3 JOHN S DOE 10
#4 KOSH ENTRP 10
#5 KOSH ENTRPRISE 14
#6 S KHAN 6
Or:
Un1 <- unique(sample$name)
data.frame(Group=Un1, x=nchar(Un1))
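On the "why" part of the question: the numbers are not random. The call in the question appears to apply nchar twice. nchar(sample$name) already yields the lengths, and FUN=nchar then counts the characters of those numbers, so 14 becomes 2 and 8 becomes 1. A minimal sketch, rebuilding the name column by hand from the CSV in the question (the variable names lens, digits and fixed are only for illustration):

```r
# Rebuild the name column from the CSV in the question.
name <- c("KOSH ENTRP", "JOHN DOE", "S KHAN", "JASINT PVT LTD",
          "KOSH ENTRPRISE", "JOHN S DOE", "S KHAN", "S KHAN")

# First call: the actual string lengths.
lens <- nchar(name)

# Second call: the lengths are coerced to text ("10", "8", ...) and
# nchar() counts their digits, which reproduces the puzzling output.
digits <- nchar(lens)

# Aggregating the lengths with a function that keeps one value per group
# gives one length per distinct name instead.
fixed <- aggregate(lens, by = list(name), FUN = unique)
print(fixed)
```

Any function that returns a single value per group (unique, max, or function(x) x[1]) would do in place of the second nchar.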
You can also apply nchar to your data frame (called temp here) and take the result from the corresponding column:
data.frame(names=temp$name,chr=apply(temp,2,nchar)[,2])
names chr
1 KOSH ENTRP 10
2 JOHN DOE 8
3 S KHAN 6
4 JASINT PVT LTD 14
5 KOSH ENTRPRISE 14
6 JOHN S DOE 10
7 S KHAN 6
8 S KHAN 6
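Since nchar is vectorised, it may be simpler to call it on the column directly rather than going through apply, which first converts the whole data frame to a character matrix. A sketch, assuming the CSV from the question is read into a data frame (named sample_df here to avoid masking base::sample; name_len is a made-up column name):

```r
# The CSV text exactly as posted in the question.
csv_text <- '"Date","name","expenditure","type"
"23MAR2013","KOSH ENTRP",4000,"COMPANY"
"23MAR2013 ","JOHN DOE",800,"INDIVIDUAL"
"24MAR2013","S KHAN",300,"INDIVIDUAL"
"24MAR2013","JASINT PVT LTD",8000,"COMPANY"
"25MAR2013","KOSH ENTRPRISE",2000,"COMPANY"
"25MAR2013","JOHN S DOE",220,"INDIVIDUAL"
"25MAR2013","S KHAN",300,"INDIVIDUAL"
"26MAR2013","S KHAN",300,"INDIVIDUAL"'

sample_df <- read.csv(text = csv_text)

# One length per row; as.character() covers the case where name is a factor.
sample_df$name_len <- nchar(as.character(sample_df$name))

print(sample_df[, c("name", "name_len")])
```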
Or, using data.table:
dtx[,PepSeqLen := nchar(PepSeq)]
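The names dtx, PepSeq and PepSeqLen above seem to come from the answerer's own data. Adapted to the name column from the question, the same idea might look like the sketch below. It assumes the data.table package is installed; dt and name_len are made-up names.

```r
library(data.table)

# Only the name column from the question, rebuilt by hand.
dt <- data.table(name = c("KOSH ENTRP", "JOHN DOE", "S KHAN", "JASINT PVT LTD",
                          "KOSH ENTRPRISE", "JOHN S DOE", "S KHAN", "S KHAN"))

# := adds the column by reference, so no reassignment is needed.
dt[, name_len := nchar(name)]

print(dt)
```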