Order factor levels according to the order in which the levels appear in the data
Here is part of a data frame that I loaded from the internet using readHTMLTable:
head(tt,59)
   year sport         event                      athlete_id medal
1  1896 Track & Field 100m Men                   BURKETOM01 GOLD
2  1896 Track & Field 100m Men                   HOFMAFRI01 SILVER
3  1896 Track & Field 100m Men                   LANEFRA01  BRONZE
4  1896 Track & Field 100m Men                   SZOKOALA01 BRONZE
5  1896 Track & Field 400m Men                   BURKETOM01 GOLD
6  1896 Track & Field 400m Men                   JAMISHER01 SILVER
7  1896 Track & Field 400m Men                   GMELICHA01 BRONZE
8  1896 Track & Field 800m Men                   FLACKTED01 GOLD
9  1896 Track & Field 800m Men                   DÁNINÁN01  SILVER
10 1896 Track & Field 800m Men                   GOLEMDEM01 BRONZE
11 1896 Track & Field 1500m Men                  FLACKTED01 GOLD
12 1896 Track & Field 1500m Men                  BLAKEART01 SILVER
13 1896 Track & Field 1500m Men                  LERMUALB01 BRONZE
14 1896 Track & Field Marathon Men               LOUISSPI01 GOLD
15 1896 Track & Field Marathon Men               VASILCHA01 SILVER
16 1896 Track & Field Marathon Men               KELLNGYU01 BRONZE
17 1896 Track & Field 110m Hurdles Men           CURTITOM01 GOLD
18 1896 Track & Field 110m Hurdles Men           GOULDGRA01 SILVER
19 1896 Track & Field High Jump Men              CLARKELL01 GOLD
20 1896 Track & Field High Jump Men              CONNOJAM01 SILVER
21 1896 Track & Field High Jump Men              GARREBOB01 SILVER
22 1896 Track & Field Pole Vault Men             HOYTBIL01  GOLD
23 1896 Track & Field Pole Vault Men             TYLERALB01 SILVER
24 1896 Track & Field Pole Vault Men             THEODIOA01 BRONZE
25 1896 Track & Field Pole Vault Men             DAMASEVA01 BRONZE
26 1896 Track & Field Long Jump Men              CLARKELL01 GOLD
27 1896 Track & Field Long Jump Men              GARREBOB01 SILVER
28 1896 Track & Field Long Jump Men              CONNOJAM01 BRONZE
29 1896 Track & Field Triple Jump Men            CONNOJAM01 GOLD
30 1896 Track & Field Triple Jump Men            TUFFÈALE01 SILVER
31 1896 Track & Field Triple Jump Men            PERSAIOA01 BRONZE
32 1896 Track & Field Shot Put Men               GARREBOB01 GOLD
33 1896 Track & Field Shot Put Men               GOUSKMIL01 SILVER
34 1896 Track & Field Shot Put Men               PAPASGEO01 BRONZE
35 1896 Track & Field Discus Throw Men           GARREBOB01 GOLD
36 1896 Track & Field Discus Throw Men           PARASPAN01 SILVER
37 1896 Track & Field Discus Throw Men           VERSISOT01 BRONZE
38 1896 Cycling       2000m Sprint (Scratch) Men MASSOPAU01 GOLD
39 1896 Cycling       2000m Sprint (Scratch) Men NIKOLSTA01 SILVER
40 1896 Cycling       2000m Sprint (Scratch) Men FLAMELÉO01 BRONZE
41 1896 Cycling       Individual Road Race Men   KONSTARI01 GOLD
42 1896 Cycling       Individual Road Race Men   GOEDRAUG01 SILVER
43 1896 Cycling       Individual Road Race Men   BATTEEDW01 BRONZE
44 1896 Cycling       One-Lap Race               MASSOPAU01 GOLD
45 1896 Cycling       One-Lap Race               NIKOLSTA01 SILVER
46 1896 Cycling       One-Lap Race               SCHMAADO01 BRONZE
47 1896 Cycling       10km Track Race            MASSOPAU01 GOLD
48 1896 Cycling       10km Track Race            FLAMELÉO01 SILVER
49 1896 Cycling       10km Track Race            SCHMAADO01 BRONZE
50 1896 Cycling       100km Track Race           FLAMELÉO01 GOLD
51 1896 Cycling       100km Track Race           KOLETGEO01 SILVER
52 1896 Cycling       12-Hour Race               SCHMAADO01 GOLD
53 1896 Cycling       12-Hour Race               KEEPIFRA01 SILVER
54 1896 Fencing       Foil, Individual           GRAVEEUG01 GOLD
55 1896 Fencing       Foil, Individual           CALLOHEN01 SILVER
56 1896 Fencing       Foil, Individual           PIERRPER01 BRONZE
57 1896 Fencing       Sabre, Individual          GEORGIOA01 GOLD
58 1896 Fencing       Sabre, Individual          KARAKTEL01 SILVER
59 1896 Fencing       Sabre, Individual          NIELSHOL01 BRONZE
As you can see, the variable sport is a factor. When I check its levels, this is what I get:
levels(tt$sport)
[1] "Cycling" "Fencing" "Gymnastics" "Shooting" "Swimming" "Tennis"
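[7] "Track & Field" "Weightlifting" "Wrestling"
For some reason, the order of the levels does not match the order in which they appear in the data frame. I am looking for a way to make the levels function return the levels organized by first appearance, something like:
levels(tt$sport)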
[1] "Track & Field" "Cycling" "Fencing" "Gymnastics" "Shooting" "Swimming"
[7] "Tennis" "Weightlifting" "Wrestling"
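One more thing to keep in mind: the sport column is not in a "block design". In the first 59 rows all identical values happen to be adjacent, but that is not the case throughout the data frame.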
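Note that I had to adapt your dataset so that all the levels you list show up, and do so in the order you specify. From there, I wrote a simple function that outputs the levels in the order in which they first appear in the dataset. The key is to use which (lists the row numbers of the observations that meet a condition), min (picks the lowest value), and order (tells you the ordering to use to go from lowest to highest).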
d <- read.table(text="rn year sport event athlete_id medal
1 1896 'Track & Field' '100m Men' 'BURKETOM01' 'GOLD'
53 1896 'Cycling' '12-Hour Race' 'KEEPIFRA01' 'SILVER'
54 1896 'Fencing' 'Foil, Individual' 'GRAVEEUG01' 'GOLD'
55 1896 'Gymnastics' 'Foil, Individual' 'CALLOHEN01' 'SILVER'
56 1896 'Shooting' 'Foil, Individual' 'PIERRPER01' 'BRONZE'
57 1896 'Swimming' 'Sabre, Individual' 'GEORGIOA01' 'GOLD'
58 1896 'Tennis' 'Sabre, Individual' 'KARAKTEL01' 'SILVER'
58 1896 'Weightlifting' 'Sabre, Individual' 'KARAKTEL01' 'SILVER'
59 1896 'Wrestling' 'Sabre, Individual' 'NIELSHOL01' 'BRONZE'",
header=TRUE, stringsAsFactors=TRUE)  # since R 4.0.0 strings are not converted to factors by default
levels(d$sport)
# [1] "Cycling" "Fencing" "Gymnastics" "Shooting"
# [5] "Swimming" "Tennis" "Track & Field" "Weightlifting"
# [9] "Wrestling"
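level.order <- function(var){
  l <- levels(var)
  o <- integer(length(l))
  for(i in seq_along(l)){
    # row number of the first observation that has level l[i]
    o[i] <- min(which(var==l[i]))
  }
  # levels sorted by the row of their first appearance
  return(l[order(o)])
}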
level.order(d$sport)
# [1] "Track & Field" "Cycling" "Fencing" "Gymnastics"
# [5] "Shooting" "Swimming" "Tennis" "Weightlifting"
# [9] "Wrestling"
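From here, if you want to change the default ordering (alphabetical) to the order in which the levels appear in the dataset, you can use factor. Consider: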
levels(d$sport)
# [1] "Cycling" "Fencing" "Gymnastics" "Shooting"
# [5] "Swimming" "Tennis" "Track & Field" "Weightlifting"
# [9] "Wrestling"
d$sport <- factor(d$sport, levels=level.order(d$sport))
levels(d$sport)
# [1] "Track & Field" "Cycling" "Fencing" "Gymnastics"
# [5] "Shooting" "Swimming" "Tennis" "Weightlifting"
# [9] "Wrestling"
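The which / min / order steps above can also be written without an explicit loop. The following is only a minimal sketch in base R, not part of the original answer; the helper name level_order_match and the toy vector x are made up for illustration. It relies on match() returning the position of the first match, which is the same number that min(which(...)) produces in the loop version.

```r
# Hypothetical helper (not from the original answer): order the levels of a
# factor by the position at which each level first appears.
level_order_match <- function(var) {
  l <- levels(var)
  # match() gives the index of the FIRST element of var equal to each level;
  # a level that never occurs gets NA, which order() puts last by default
  first_seen <- match(l, var)
  l[order(first_seen)]
}

# Toy data (assumed for illustration); values are not in contiguous blocks
x <- factor(c("Track & Field", "Cycling", "Track & Field", "Fencing", "Cycling"))
levels(x)
# [1] "Cycling"       "Fencing"       "Track & Field"
level_order_match(x)
# [1] "Track & Field" "Cycling"       "Fencing"
```

As with level.order, the result can be passed to factor(x, levels=...) to re-level the variable.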
I used the data frame that @gung set up in his answer:
d <- read.table(text="rn year sport event athlete_id medal
1 1896 'Track & Field' '100m Men' 'BURKETOM01' 'GOLD'
53 1896 'Cycling' '12-Hour Race' 'KEEPIFRA01' 'SILVER'
54 1896 'Fencing' 'Foil, Individual' 'GRAVEEUG01' 'GOLD'
55 1896 'Gymnastics' 'Foil, Individual' 'CALLOHEN01' 'SILVER'
56 1896 'Shooting' 'Foil, Individual' 'PIERRPER01' 'BRONZE'
57 1896 'Swimming' 'Sabre, Individual' 'GEORGIOA01' 'GOLD'
58 1896 'Tennis' 'Sabre, Individual' 'KARAKTEL01' 'SILVER'
58 1896 'Weightlifting' 'Sabre, Individual' 'KARAKTEL01' 'SILVER'
59 1896 'Wrestling' 'Sabre, Individual' 'NIELSHOL01' 'BRONZE'",
header=TRUE, stringsAsFactors=TRUE)  # since R 4.0.0 strings are not converted to factors by default
levels(d$sport)
Then you can use unique(d$sport) in the factor function like this:
d$sport <- factor(d$sport, levels=unique(d$sport))
# Check the results:
levels(d$sport)
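Because unique() keeps values in order of first appearance, this approach should also cover the case raised in the question, where a sport reappears later in the data frame instead of sitting in one contiguous block. A minimal sketch with a made-up vector (the name sport and its values are assumptions for illustration, not the real data):

```r
# Toy column in which sports are interleaved rather than grouped in blocks
sport <- c("Track & Field", "Cycling", "Track & Field",
           "Fencing", "Cycling", "Fencing")

# Default: the levels are the sorted unique values
f_default <- factor(sport)
levels(f_default)
# [1] "Cycling"       "Fencing"       "Track & Field"

# Levels in order of first appearance
f_inorder <- factor(sport, levels = unique(sport))
levels(f_inorder)
# [1] "Track & Field" "Cycling"       "Fencing"

# The underlying values are unchanged; only the level order differs
all(as.character(f_default) == as.character(f_inorder))
# [1] TRUE
```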