Trying to make a ggplot with two lines
The data can be found at: https://www.kaggle.com/tovarischsukhov/southparklines
SP = read.csv("/Users/michael/Desktop/stat 479 proj data/All-seasons.csv")
SP$Season = as.numeric(SP$Season)
SP$Episode = as.numeric(SP$Episode)
Clean.Boys = SP %>% select(Season, Episode, Character) %>%
arrange(Season, Episode, Character) %>%
filter(Character == "Kenny" | Character == "Cartman") %>%
group_by(Season, Episode)
count = table(Clean.Boys)
count = as.data.frame(count)
Clean = count %>% pivot_wider(names_from = Character, values_from = Freq) %>% group_by(Episode)
  Season Episode Cartman Kenny
  <fct>  <fct>     <int> <int>
1 1      1            85     5
2 2      1             1     0
3 3      1            43    19
4 4      1            83     6
5 5      1            37     3
6 6      1            67     0
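I am trying to use ggplot to make a plot with 2 lines on it, one for the Cartman variable and one for the Kenny variable. My two questions are:
Is my data in the right format to plot with geom_line()? Or do I have to pivot it longer?
I want to plot the x scale as a continuous variable, similar to a date, except that it is season and episode. For example, the first plotted point would be season 1 episode 1, then season 1 episode 2, and so on. I am stuck on how to do this with season and episode in separate columns, and even if I combined them I am not sure what the correct format would be.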
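The trick is to gather the columns you want to map into a single variable. I did not know how you wanted to draw the plot, that is, what should go on the x-axis and the y-axis, so I made a dummy plot. For the continuous-variable part, you can use as.integer() or as.numeric() to convert the values to integers or numerics, which can then be used on a continuous scale. You can check the structure of your variables by calling str(df), which shows the class of each variable; if a variable is a factor or a character, convert it to numeric.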
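One caveat on the conversion step is worth a small illustration: calling as.numeric() directly on a factor returns the internal level codes rather than the numbers shown in the labels, so going through as.character() first is the usual workaround. The sketch below uses a made-up factor (`season_fct` is a hypothetical name, not a column from the data set) and base R only.

```r
# Toy factor standing in for a column such as Season (values are made up)
season_fct <- factor(c("10", "2", "1", "10"))

# str() reports the class and the levels, which is how you spot a factor
str(season_fct)

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(season_fct)

# Converting to character first recovers the numbers the labels spell out
values <- as.numeric(as.character(season_fct))

print(codes)
print(values)
```

If the levels happen to be the contiguous integers 1, 2, 3, ... the codes and the values coincide, which is why the direct conversion can appear to work.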
#libraries
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.0.5
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
#> Warning: package 'tidyr' was built under R version 4.0.3
#your code
SP <- read.csv("C:/Users/saura/Desktop/All-seasons.csv")
SP$Season = as.numeric(SP$Season)
#> Warning: NAs introduced by coercion
SP$Episode = as.numeric(SP$Episode)
#> Warning: NAs introduced by coercion
Clean.Boys = SP %>% select(Season, Episode, Character) %>%
arrange(Season, Episode, Character) %>%
filter(Character == "Kenny" | Character == "Cartman") %>%
group_by(Season, Episode)
count = table(Clean.Boys)
count = as.data.frame(count)
Clean = count %>% pivot_wider(names_from = Character, values_from = Freq) %>% group_by(Episode)
# Up to here is your code. The reshape and plot below are a guess,
# since I do not know what you want on each axis.
new_df <- Clean %>%
gather(-Season,-Episode, key = "Views", value = "numbers")
ggplot(data = new_df, aes(
as.numeric(Episode),
numbers,
color = Views,
group = Views
)) +
geom_path()
Created on 2022-02-19 by the reprex package (v2.0.1)
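For readers on a current tidyr, the same wide-to-long reshape can be written with pivot_longer(), which the tidyr documentation recommends over gather(). This is only a sketch: the data frame `wide` is a hand-typed stand-in built from the six rows printed in the question, and the names `wide`, `long` and `Lines` are illustrative assumptions.

```r
library(tidyr)
library(ggplot2)

# Hand-typed copy of the six rows printed in the question
wide <- data.frame(
  Season  = 1:6,
  Episode = 1,
  Cartman = c(85, 1, 43, 83, 37, 67),
  Kenny   = c(5, 0, 19, 6, 3, 0)
)

# One row per Season/Episode/Character instead of one column per character
long <- pivot_longer(
  wide,
  cols      = c(Cartman, Kenny),
  names_to  = "Character",
  values_to = "Lines"
)

# With long data, a single geom_line() draws one line per character
p <- ggplot(long, aes(x = Season, y = Lines, colour = Character, group = Character)) +
  geom_line()
print(p)
```

Mapping colour and group to the same column is what produces the two separate lines.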
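In this example I use readr::read_csv to read the file and set the variable types in the call, which saves doing this in separate lines of code.
The frequency count can be done within a piped workflow using dplyr::summarise.
I am not sure what you really mean by wanting to keep the season and episode data as a continuous variable; you would have to be more explicit about how you want it to look. The approach I have taken is to provide a way of displaying season and episode with a minimum of text:
By default, seasons and episodes sort in numeric order, but once they are combined into a character string they have to be coerced back into numeric order using factor. An alternative would be to facet by season.
ggplot likes data in long format, so there is no need to convert the data to wide format.
To keep the graph legible, only the first 80 observations are shown.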
library(readr)
library(dplyr)
library(ggplot2)
SP <- read_csv("...your file path.../All-seasons.csv", col_types = "nncc")
Clean.Boys <-
SP %>%
select(-Line) %>%
arrange(Season, Episode, Character) %>%
filter(Character == "Kenny" | Character == "Cartman") %>%
group_by(Season, Episode, Character)%>%
summarise(count = n(), .groups = "keep") %>%
mutate(x_lab = factor(paste(Season, Episode, sep = "\n"))) %>%
head(n = 80)
ggplot(Clean.Boys)+
geom_line(aes(x_lab, count, group = Character, colour = Character))+
labs(x = "Season and episode")
Created on 2022-02-20 by the reprex package (v2.0.1)
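If a truly continuous x-axis is wanted, one possible reading of the question is a running episode number across seasons, with the season and episode shown only in the tick labels. The sketch below is an assumption about what was meant, not part of either answer; `counts`, `episode_index`, `episode_no` and `indexed` are hypothetical names and the values are made up.

```r
library(dplyr)
library(ggplot2)

# Toy long-format counts (made-up values, two seasons of two episodes each)
counts <- data.frame(
  Season    = c(1, 1, 1, 1, 2, 2, 2, 2),
  Episode   = c(1, 1, 2, 2, 1, 1, 2, 2),
  Character = rep(c("Cartman", "Kenny"), times = 4),
  count     = c(85, 5, 40, 12, 1, 0, 60, 8)
)

# One row per episode, ordered by season then episode, numbered 1..n
episode_index <- counts %>%
  distinct(Season, Episode) %>%
  arrange(Season, Episode) %>%
  mutate(episode_no = row_number())

# Attach the running number to every count row
indexed <- left_join(counts, episode_index, by = c("Season", "Episode"))

# Continuous x scale, labelled with season and episode at each break
p <- ggplot(indexed, aes(episode_no, count, colour = Character)) +
  geom_line() +
  scale_x_continuous(
    breaks = episode_index$episode_no,
    labels = paste0("S", episode_index$Season, "E", episode_index$Episode)
  ) +
  labs(x = "Season and episode", y = "Number of lines")
print(p)
```

With the full data set there would be far too many breaks to label each one, so thinning the breaks (for example to the first episode of each season) or mapping Episode to x and adding facet_wrap(~ Season) would likely read better.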