R - 来自 Reshape2 的 dcast - 按值变量排序而不是变量名称
R - dcast from Reshape2 - order by value variable and not variable name
我有以下 data.frame 名为 countries_tools
。它由 3 列组成(一个日期时间列(过去 13 个月)、一个名称列(包含国家/地区)和一个访问列(从这些特定国家/地区访问该页面的人)):
datetime name Visits
2016-07-01 00:00:00 China 5237
2016-07-01 00:00:00 Germany 1434
2016-07-01 00:00:00 United States 1530
2016-07-01 00:00:00 India 696
2016-07-01 00:00:00 Japan 569
...
2017-07-01 00:00:00 China 4484
2017-07-01 00:00:00 Germany 1593
2017-07-01 00:00:00 United States 1438
2017-07-01 00:00:00 India 1204
2017-07-01 00:00:00 Japan 538
请注意,我删除了中间的其他 11 个月。另请注意,该名称始终是相同 5 个国家/地区的列表,对应于上次分析月份(在本例中为 2017 年 7 月)访问次数较多的五个国家/地区。
在这条消息的末尾有一个 dput
和我的数据。
为了更好地了解跨月访问的数据和发展,我从 data.frame:
中做了一个 dcast
countries_tools <- dcast(countries_tools, datetime ~ name, value.var="Visits")
但是,生成的数据框按国家名称(字母顺序)对列进行排序:
> names(countries_tools)
[1] "datetime" "China" "Germany" "India" "Japan" "United States"
但是,我希望顺序是由值变量(访问)完成的,因此最佳顺序应该是:
datetime, China, Germany, United States, India, Japan
可以做到吗(最好不需要额外的步骤)?也可以使用其他功能。
数据
dput(countries_tools)
structure(list(datetime = structure(c(1467320400, 1467320400,
1467320400, 1467320400, 1467320400, 1469998800, 1469998800, 1469998800,
1469998800, 1469998800, 1472677200, 1472677200, 1472677200, 1472677200,
1472677200, 1475269200, 1475269200, 1475269200, 1475269200, 1475269200,
1477951200, 1477951200, 1477951200, 1477951200, 1477951200, 1480543200,
1480543200, 1480543200, 1480543200, 1480543200, 1483221600, 1483221600,
1483221600, 1483221600, 1483221600, 1485900000, 1485900000, 1485900000,
1485900000, 1485900000, 1488319200, 1488319200, 1488319200, 1488319200,
1488319200, 1490994000, 1490994000, 1490994000, 1490994000, 1490994000,
1493586000, 1493586000, 1493586000, 1493586000, 1493586000, 1496264400,
1496264400, 1496264400, 1496264400, 1496264400, 1498856400, 1498856400,
1498856400, 1498856400, 1498856400), class = c("POSIXct", "POSIXt"
), tzone = "Europe/Moscow"), name = c("China", "Germany", "United States",
"India", "Japan", "China", "Germany", "United States", "India",
"Japan", "China", "Germany", "United States", "India", "Japan",
"China", "Germany", "United States", "India", "Japan", "China",
"Germany", "United States", "India", "Japan", "China", "Germany",
"United States", "India", "Japan", "China", "Germany", "United States",
"India", "Japan", "China", "Germany", "United States", "India",
"Japan", "China", "Germany", "United States", "India", "Japan",
"China", "Germany", "United States", "India", "Japan", "China",
"Germany", "United States", "India", "Japan", "China", "Germany",
"United States", "India", "Japan", "China", "Germany", "United States",
"India", "Japan"), Visits = c(5237, 1434, 1530, 696, 569, 4422,
1508, 1971, 672, 461, 3993, 1521, 1901, 2027, 517, 3656, 1764,
1716, 993, 509, 5483, 3117, 2762, 1298, 594, 5548, 2804, 2365,
1222, 551, 3747, 3083, 1917, 999, 496, 3903, 2136, 1751, 1229,
611, 5638, 2721, 2074, 1569, 533, 4326, 1618, 1511, 1254, 458,
4364, 2021, 1690, 1162, 462, 4462, 1572, 1517, 1068, 574, 4484,
1593, 1438, 1204, 538)), .Names = c("datetime", "name", "Visits"
), row.names = c(NA, -65L), class = "data.frame")
您可以将 "name" 转换为有序因子,说明您想要的级别顺序:
countries_tools$name <- ordered(countries_tools$name, levels = unique(countries_tools$name))
现在可以使用了:
dcast(countries_tools, datetime ~ name, value.var="Visits")
我有以下 data.frame 名为 countries_tools
。它由 3 列组成(一个日期时间列(过去 13 个月)、一个名称列(包含国家/地区)和一个访问列(从这些特定国家/地区访问该页面的人)):
datetime name Visits
2016-07-01 00:00:00 China 5237
2016-07-01 00:00:00 Germany 1434
2016-07-01 00:00:00 United States 1530
2016-07-01 00:00:00 India 696
2016-07-01 00:00:00 Japan 569
...
2017-07-01 00:00:00 China 4484
2017-07-01 00:00:00 Germany 1593
2017-07-01 00:00:00 United States 1438
2017-07-01 00:00:00 India 1204
2017-07-01 00:00:00 Japan 538
请注意,我删除了中间的其他 11 个月。另请注意,该名称始终是相同 5 个国家/地区的列表,对应于上次分析月份(在本例中为 2017 年 7 月)访问次数较多的五个国家/地区。
在这条消息的末尾有一个 dput
和我的数据。
为了更好地了解跨月访问的数据和发展,我从 data.frame:
中做了一个dcast
countries_tools <- dcast(countries_tools, datetime ~ name, value.var="Visits")
但是,生成的数据框按国家名称(字母顺序)对列进行排序:
> names(countries_tools)
[1] "datetime" "China" "Germany" "India" "Japan" "United States"
但是,我希望顺序是由值变量(访问)完成的,因此最佳顺序应该是:
datetime, China, Germany, United States, India, Japan
可以做到吗(最好不需要额外的步骤)?也可以使用其他功能。
数据
dput(countries_tools)
structure(list(datetime = structure(c(1467320400, 1467320400,
1467320400, 1467320400, 1467320400, 1469998800, 1469998800, 1469998800,
1469998800, 1469998800, 1472677200, 1472677200, 1472677200, 1472677200,
1472677200, 1475269200, 1475269200, 1475269200, 1475269200, 1475269200,
1477951200, 1477951200, 1477951200, 1477951200, 1477951200, 1480543200,
1480543200, 1480543200, 1480543200, 1480543200, 1483221600, 1483221600,
1483221600, 1483221600, 1483221600, 1485900000, 1485900000, 1485900000,
1485900000, 1485900000, 1488319200, 1488319200, 1488319200, 1488319200,
1488319200, 1490994000, 1490994000, 1490994000, 1490994000, 1490994000,
1493586000, 1493586000, 1493586000, 1493586000, 1493586000, 1496264400,
1496264400, 1496264400, 1496264400, 1496264400, 1498856400, 1498856400,
1498856400, 1498856400, 1498856400), class = c("POSIXct", "POSIXt"
), tzone = "Europe/Moscow"), name = c("China", "Germany", "United States",
"India", "Japan", "China", "Germany", "United States", "India",
"Japan", "China", "Germany", "United States", "India", "Japan",
"China", "Germany", "United States", "India", "Japan", "China",
"Germany", "United States", "India", "Japan", "China", "Germany",
"United States", "India", "Japan", "China", "Germany", "United States",
"India", "Japan", "China", "Germany", "United States", "India",
"Japan", "China", "Germany", "United States", "India", "Japan",
"China", "Germany", "United States", "India", "Japan", "China",
"Germany", "United States", "India", "Japan", "China", "Germany",
"United States", "India", "Japan", "China", "Germany", "United States",
"India", "Japan"), Visits = c(5237, 1434, 1530, 696, 569, 4422,
1508, 1971, 672, 461, 3993, 1521, 1901, 2027, 517, 3656, 1764,
1716, 993, 509, 5483, 3117, 2762, 1298, 594, 5548, 2804, 2365,
1222, 551, 3747, 3083, 1917, 999, 496, 3903, 2136, 1751, 1229,
611, 5638, 2721, 2074, 1569, 533, 4326, 1618, 1511, 1254, 458,
4364, 2021, 1690, 1162, 462, 4462, 1572, 1517, 1068, 574, 4484,
1593, 1438, 1204, 538)), .Names = c("datetime", "name", "Visits"
), row.names = c(NA, -65L), class = "data.frame")
您可以将 "name" 转换为有序因子,说明您想要的级别顺序:
countries_tools$name <- ordered(countries_tools$name, levels = unique(countries_tools$name))
现在可以使用了:
dcast(countries_tools, datetime ~ name, value.var="Visits")