Calculating mean in a dataframe using R programming
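I'm new to R and need some help. I have a huge data frame containing samples from different patients. Each patient has 24 'chrom's, and each 'chrom' has 3 segments. Here is a sample of my data for patient 'A2461':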
     ID chrom loc.start  loc.end num.mark seg.mean seg.sd seg.median seg.mad
1 A2461     1     61735 23342732    13103   0.0314 0.4757     0.0221  0.4811
2 A2461     1  23345569 54962669    17435  -0.0103 0.4807    -0.0292  0.4821
3 A2461     1  54963958 55075062       57   0.4841 0.4070     0.5201  0.3519
1 A2461     2     12784 17248573    13037  -0.0037 0.4643    -0.0053  0.4583
2 A2461     2  17248890 85480817    45819  -0.0331 0.4667    -0.0352  0.4635
3 A2461     2  85481399 89121495     1626   0.0153 0.4727     0.0000  0.4617
I currently get the overall mean with the following code:
seg_mean <- df$seg.mean
mean(seg_mean)
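However, I would like to calculate the mean of 'seg.mean' for each chromosome, with the output identifying the patient ID and the chrom. So maybe something like...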
ID chrom seg.mean
A2461 1 0.1684
A2461 2 -0.0072
Any help would be appreciated! Thanks for reading.
library(dplyr)
# Mean of seg.mean within each patient/chromosome group
seg_mean <- df %>% group_by(ID, chrom) %>% summarise(seg.mean = mean(seg.mean))
You can use a base R function:
aggregate(.~ ID + chrom, data=df, mean)
This gives you:
# ID chrom loc.start loc.end num.mark seg.mean seg.sd seg.median seg.mad
# 1 A2461 1 26123754 44460154 10198.33 0.168400000 0.4544667 0.1710 0.4383667
# 2 A2461 2 34247691 63950295 20160.67 -0.007166667 0.4679000 -0.0135 0.4611667
Or you can choose to keep only the mean of seg.mean:
aggregate(.~ ID + chrom, data=df, mean)[,c("ID", "chrom","seg.mean")]
# ID chrom seg.mean
# 1 A2461 1 0.168400000
# 2 A2461 2 -0.007166667
Data
df <- structure(list(ID = c("A2461", "A2461", "A2461", "A2461", "A2461",
"A2461"), chrom = c(1L, 1L, 1L, 2L, 2L, 2L), loc.start = c(61735L,
23345569L, 54963958L, 12784L, 17248890L, 85481399L), loc.end = c(23342732L,
54962669L, 55075062L, 17248573L, 85480817L, 89121495L), num.mark = c(13103L,
17435L, 57L, 13037L, 45819L, 1626L), seg.mean = c(0.0314, -0.0103,
0.4841, -0.0037, -0.0331, 0.0153), seg.sd = c(0.4757, 0.4807,
0.407, 0.4643, 0.4667, 0.4727), seg.median = c(0.0221, -0.0292,
0.5201, -0.0053, -0.0352, 0), seg.mad = c(0.4811, 0.4821, 0.3519,
0.4583, 0.4635, 0.4617)), .Names = c("ID", "chrom", "loc.start",
"loc.end", "num.mark", "seg.mean", "seg.sd", "seg.median", "seg.mad"
), row.names = c(NA, -6L), class = "data.frame")
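Since the question mentions many patients, here is a minimal sketch of how grouping on both ID and chrom behaves with more than one patient. The A2461 values come from the question; the second patient 'B0001' and its values are made up purely for illustration, and the object names `toy` and `per_chrom` are just for this sketch.

```r
# Toy data: A2461 values are from the question; B0001 is a hypothetical
# second patient added only to illustrate grouping.
toy <- data.frame(
  ID       = rep(c("A2461", "B0001"), each = 6),
  chrom    = rep(rep(1:2, each = 3), times = 2),
  seg.mean = c(0.0314, -0.0103, 0.4841, -0.0037, -0.0331, 0.0153,
               0.1000,  0.2000, 0.3000, -0.1000, -0.2000, -0.3000),
  stringsAsFactors = FALSE
)

# One output row per (ID, chrom) combination
per_chrom <- aggregate(seg.mean ~ ID + chrom, data = toy, FUN = mean)

# Sort by patient, then chromosome, for easier reading
per_chrom <- per_chrom[order(per_chrom$ID, per_chrom$chrom), ]
print(per_chrom)
```

Each patient keeps its own row for each chromosome, so the same call should scale to the full data frame without changes.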
Just a slight modification of the aggregate() solution above:
aggregate(seg.mean~ID+chrom , df , mean)
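To see that the two formula forms agree, here is a short check on the sample values, trimmed to a few columns. With `.` on the left-hand side every non-grouping column is averaged, while naming seg.mean there averages only that column. The names `df_small`, `all_cols` and `one_col` are just for this sketch.

```r
# Sample values from the question, trimmed to the columns needed here
df_small <- data.frame(
  ID       = rep("A2461", 6),
  chrom    = rep(1:2, each = 3),
  num.mark = c(13103, 17435, 57, 13037, 45819, 1626),
  seg.mean = c(0.0314, -0.0103, 0.4841, -0.0037, -0.0331, 0.0153)
)

# "." on the left: every non-grouping column is averaged
all_cols <- aggregate(. ~ ID + chrom, data = df_small, FUN = mean)

# seg.mean on the left: only that column is averaged
one_col <- aggregate(seg.mean ~ ID + chrom, data = df_small, FUN = mean)

# Both forms give the same per-chromosome means
stopifnot(isTRUE(all.equal(all_cols$seg.mean, one_col$seg.mean)))

# Rounded to four decimals, as in the desired output from the question
round(one_col$seg.mean, 4)
```

Naming the column in the formula avoids computing means that are then thrown away, which may matter on a very large data frame.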