How do I count treated and untreated in R

I'm trying to relearn R and want to count the total number of genes "treated" and "untreated" with dex in the Bioconductor airway dataset (https://bioconductor.org/packages/release/data/experiment/html/airway.html).

I tried:

airway$dex=='trted'
#[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

It doesn't work.

Use the sum() function to count the TRUE values:

sum(airway$dex == 'trt')
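Here is a minimal base-R sketch of why the first attempt prints eight FALSE values. Comparing a factor with a string that is not one of its levels gives FALSE for every element, and the level here is "trt", not "trted". The factor below is a stand-in for airway$dex, rebuilt from the levels and codes in the str(airway) output further down, so it runs without Bioconductor. On the real object, airway$dex should return the corresponding factor, because SummarizedExperiment defines a $ method that reads columns of colData. Also note that dex annotates the 8 samples (columns), not the 64102 genes (rows), so these counts are numbers of samples.

```r
# Stand-in for airway$dex (assumption: rebuilt by hand from str(airway),
# levels "trt"/"untrt" with codes 2 1 2 1 2 1 2 1)
dex <- factor(c("untrt", "trt", "untrt", "trt", "untrt", "trt", "untrt", "trt"),
              levels = c("trt", "untrt"))

# "trted" is not a level, so every comparison is FALSE
dex == "trted"
levels(dex)

# Compare against an actual level and count the TRUE values
n_trt   <- sum(dex == "trt")
n_untrt <- sum(dex == "untrt")
n_trt
n_untrt

# table() counts every level in a single call
table(dex)
```

Checking levels() before comparing is a cheap way to catch a misspelled level, since R does not warn when the string matches nothing.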

After installing the package, I ran the following in my console (all output included):

> library(airway)
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics
Loading required package: matrixStats

Attaching package: ‘matrixStats’

The following object is masked from ‘package:dplyr’:

    count


Attaching package: ‘MatrixGenerics’

The following objects are masked from ‘package:matrixStats’:

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse, colCounts, colCummaxs, colCummins,
    colCumprods, colCumsums, colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs, colMads, colMaxs,
    colMeans2, colMedians, colMins, colOrderStats, colProds, colQuantiles, colRanges, colRanks, colSdDiffs,
    colSds, colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads, colWeightedMeans,
    colWeightedMedians, colWeightedSds, colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
    rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods, rowCumsums, rowDiffs, rowIQRDiffs,
    rowIQRs, rowLogSumExps, rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins, rowOrderStats,
    rowProds, rowQuantiles, rowRanges, rowRanks, rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs,
    rowVars, rowWeightedMads, rowWeightedMeans, rowWeightedMedians, rowWeightedSds, rowWeightedVars

Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply,
    parCapply, parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:bit64’:

    match, order, rank

The following objects are masked from ‘package:dplyr’:

    combine, intersect, setdiff, union

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval,
    evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, order,
    paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
    table, tapply, union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:Matrix’:

    expand

The following objects are masked from ‘package:data.table’:

    first, second

The following objects are masked from ‘package:tidygraph’:

    active, rename

The following object is masked from ‘package:tidyr’:

    expand

The following objects are masked from ‘package:dplyr’:

    first, rename

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges

Attaching package: ‘IRanges’

The following object is masked from ‘package:data.table’:

    shift

The following object is masked from ‘package:nlme’:

    collapse

The following object is masked from ‘package:tidygraph’:

    slice

The following object is masked from ‘package:purrr’:

    reduce

The following objects are masked from ‘package:dplyr’:

    collapse, desc, slice

Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


Attaching package: ‘Biobase’

The following object is masked from ‘package:MatrixGenerics’:

    rowMedians

The following objects are masked from ‘package:matrixStats’:

    anyMissing, rowMedians

The following object is masked from ‘package:bit64’:

    cache


Attaching package: ‘SummarizedExperiment’

The following object is masked from ‘package:SeuratObject’:

    Assays

The following object is masked from ‘package:Seurat’:

    Assays

I looked at the help page:

> help(pac=airway)

After reading it, I thought the airway dataset might already be accessible, but it wasn't:

> str(airway)
Error in str(airway) : object 'airway' not found

So I tried loading it with the data() function, which raised no error, and then looked at its structure:

> data(airway)
> str(airway)
Formal class 'RangedSummarizedExperiment' [package "SummarizedExperiment"] with 6 slots
  ..@ rowRanges      :Formal class 'GRangesList' [package "GenomicRanges"] with 3 slots
  .. .. ..@ elementMetadata:Formal class 'DataFrame' [package "IRanges"] with 6 slots
  .. .. .. .. ..@ rownames       : NULL
  .. .. .. .. ..@ nrows          : int 64102
  .. .. .. .. ..@ listData       : Named list()
  .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. ..@ metadata       : list()
  .. .. ..@ elementType    : chr "GRanges"
  .. .. ..@ metadata       :List of 1
  .. .. .. ..$ genomeInfo:List of 20
  .. .. .. .. ..$ Db type                                 : chr "TranscriptDb"
  .. .. .. .. ..$ Supporting package                      : chr "GenomicFeatures"
  .. .. .. .. ..$ Data source                             : chr "BioMart"
  .. .. .. .. ..$ Organism                                : chr "Homo sapiens"
  .. .. .. .. ..$ Resource URL                            : chr "www.biomart.org:80"
  .. .. .. .. ..$ BioMart database                        : chr "ensembl"
  .. .. .. .. ..$ BioMart database version                : chr "ENSEMBL GENES 75 (SANGER UK)"
  .. .. .. .. ..$ BioMart dataset                         : chr "hsapiens_gene_ensembl"
  .. .. .. .. ..$ BioMart dataset description             : chr "Homo sapiens genes (GRCh37.p13)"
  .. .. .. .. ..$ BioMart dataset version                 : chr "GRCh37.p13"
  .. .. .. .. ..$ Full dataset                            : chr "yes"
  .. .. .. .. ..$ miRBase build ID                        : chr NA
  .. .. .. .. ..$ transcript_nrow                         : chr "215647"
  .. .. .. .. ..$ exon_nrow                               : chr "745593"
  .. .. .. .. ..$ cds_nrow                                : chr "537555"
  .. .. .. .. ..$ Db created by                           : chr "GenomicFeatures package from Bioconductor"
  .. .. .. .. ..$ Creation time                           : chr "2014-07-10 14:55:55 -0400 (Thu, 10 Jul 2014)"
  .. .. .. .. ..$ GenomicFeatures version at creation time: chr "1.17.9"
  .. .. .. .. ..$ RSQLite version at creation time        : chr "0.11.4"
  .. .. .. .. ..$ DBSCHEMAVERSION                         : chr "1.0"
  ..@ colData        :Formal class 'DataFrame' [package "IRanges"] with 6 slots
  .. .. ..@ rownames       : chr [1:8] "SRR1039508" "SRR1039509" "SRR1039512" "SRR1039513" ...
  .. .. ..@ nrows          : int 8
  .. .. ..@ listData       :List of 9
  .. .. .. ..$ SampleName: Factor w/ 8 levels "GSM1275862","GSM1275863",..: 1 2 3 4 5 6 7 8
  .. .. .. ..$ cell      : Factor w/ 4 levels "N052611","N061011",..: 4 4 1 1 3 3 2 2
  .. .. .. ..$ dex       : Factor w/ 2 levels "trt","untrt": 2 1 2 1 2 1 2 1
  .. .. .. ..$ albut     : Factor w/ 1 level "untrt": 1 1 1 1 1 1 1 1
  .. .. .. ..$ Run       : Factor w/ 8 levels "SRR1039508","SRR1039509",..: 1 2 3 4 5 6 7 8
  .. .. .. ..$ avgLength : int [1:8] 126 126 126 87 120 126 101 98
  .. .. .. ..$ Experiment: Factor w/ 8 levels "SRX384345","SRX384346",..: 1 2 3 4 5 6 7 8
  .. .. .. ..$ Sample    : Factor w/ 8 levels "SRS508567","SRS508568",..: 2 1 3 4 5 6 7 8
  .. .. .. ..$ BioSample : Factor w/ 8 levels "SAMN02422669",..: 1 4 6 2 7 3 8 5
  .. .. ..@ elementType    : chr "ANY"
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ assays         :Reference class 'ShallowSimpleListAssays' [package "GenomicRanges"] with 1 field
  .. ..$ data:Formal class 'SimpleList' [package "IRanges"] with 4 slots
  .. .. .. ..@ listData       :List of 1
  .. .. .. .. ..$ counts: int [1:64102, 1:8] 679 0 467 260 60 0 3251 1433 519 394 ...
  .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. ..@ elementMetadata: NULL
  .. .. .. ..@ metadata       : list()
  .. ..and 12 methods.
  ..@ NAMES          : NULL
  ..@ elementMetadata:Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
  .. .. ..@ rownames       : NULL
  .. .. ..@ nrows          : int 64102
  .. .. ..@ listData       : Named list()
  .. .. ..@ elementType    : chr "ANY"
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
  ..@ metadata       :List of 1
  .. ..$ :Formal class 'MIAME' [package "Biobase"] with 13 slots
  .. .. .. ..@ name             : chr "Himes BE"
  .. .. .. ..@ lab              : chr NA
  .. .. .. ..@ contact          : chr ""
  .. .. .. ..@ title            : chr "RNA-Seq transcriptome profiling identifies CRISPLD2 as a glucocorticoid responsive gene that modulates cytokine"| __truncated__
  .. .. .. ..@ abstract         : chr "Asthma is a chronic inflammatory respiratory disease that affects over 300 million people worldwide. Glucocorti"| __truncated__
  .. .. .. ..@ url              : chr "http://www.ncbi.nlm.nih.gov/pubmed/24926665"
  .. .. .. ..@ pubMedIds        : chr "24926665"
  .. .. .. ..@ samples          : list()
  .. .. .. ..@ hybridizations   : list()
  .. .. .. ..@ normControls     : list()
  .. .. .. ..@ preprocessing    : list()
  .. .. .. ..@ other            : list()
  .. .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. .. ..@ .Data:List of 2
  .. .. .. .. .. .. ..$ : int [1:3] 1 0 0
  .. .. .. .. .. .. ..$ : int [1:3] 1 1 0

Scanning that listing of the S4-structured data, I saw this line:

      .. .. .. ..$ dex       : Factor w/ 2 levels "trt","untrt": 2 1 2 1 2 1 2 1

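So the dex item does have "trt" and "untrt" as values, but that "column" sits deeper inside the overall RangedSummarizedExperiment structure. There may be a dedicated function, whose name I don't know, for extracting values from such a structure, but we now have enough information to answer (or hack) the question. Follow the names and operators in that nested listing back up to their origin, and use the S4 extraction operators: "@" for slots and "$" for ordinary list elements.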

sum(airway@colData@listData$dex == "trt")
#[1] 4
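The @/$ chain used above can be reproduced on a small scale. The sketch below defines two hypothetical toy S4 classes (ToyExperiment and ToyColData, not part of any package) that copy only the nesting seen in str(airway). This lets the same path be run in plain R with the methods package: @ for slot access, then $ for an element of an ordinary named list.

```r
library(methods)

# Hypothetical toy classes (not from any package) that mimic only the nesting
# seen in str(airway): an object with a colData slot, whose listData slot
# holds an ordinary named list of columns
setClass("ToyColData", slots = c(listData = "list"))
setClass("ToyExperiment", slots = c(colData = "ToyColData"))

# Stand-in for the dex column: levels "trt"/"untrt", codes 2 1 2 1 2 1 2 1
dex <- factor(c("untrt", "trt", "untrt", "trt", "untrt", "trt", "untrt", "trt"),
              levels = c("trt", "untrt"))

toy <- new("ToyExperiment",
           colData = new("ToyColData", listData = list(dex = dex)))

# '@' steps into S4 slots; '$' then picks a named element of the plain list
treated <- toy@colData@listData$dex == "trt"

# Same access path as sum(airway@colData@listData$dex == "trt") above
sum(treated)
```

On the real object, SummarizedExperiment also exports a colData() accessor, so colData(airway)$dex, or simply airway$dex, reaches the same column without relying on internal slots such as listData, which are implementation details rather than a documented interface.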