Row sorting/counting in a Python Pandas dataframe

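I'm trying to develop a quality-check script that goes through a set of data (in a pandas dataframe) and counts the total number of each type of sample. Here is an example from the database: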

The samples in question are all

XXX123

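My current script selects and picks out all the QC samples, such as blanks or IRMs, but I can't count the actual XXX123 samples, because some of them have two types of replicates as internal quality checks.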

  1. One type is "ORIG" and "PREP"
  2. The second type is ".1" and ".2"

Another problem is that, occasionally, a single sample gets both types, as you can see with XXX123 85-90.
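A minimal sketch of how the two replicate types could be detected and traced back to their parent sample. It assumes, since the actual naming is not shown, that IDs look like `XXX123 85-90`, that type-1 replicates end in ` ORIG` / ` PREP` (or ` PREPDUP`, which the script below searches for), and that type-2 replicates end in `.1` / `.2`. The sample IDs and the `Parent`, `PrepRep` and `DotRep` columns are hypothetical.

```python
import pandas as pd

# Hypothetical sample IDs; the real naming scheme may differ.
df = pd.DataFrame({"SampleID": [
    "XXX123 80-85",
    "XXX123 85-90 ORIG", "XXX123 85-90 PREP",
    "XXX123 85-90.1", "XXX123 85-90.2",
    "XXX123 90-95.1", "XXX123 90-95.2",
]})

ids = df["SampleID"]
# Parent sample ID, assuming the "XXX123 <from>-<to>" prefix pattern.
df["Parent"] = ids.str.extract(r"^(XXX123 \d+-\d+)", expand=False)
# Type 1 replicate: trailing " ORIG", " PREP" or " PREPDUP".
df["PrepRep"] = ids.str.contains(r"\s(?:ORIG|PREP(?:DUP)?)$")
# Type 2 replicate: trailing ".1" or ".2".
df["DotRep"] = ids.str.contains(r"\.[12]$")

# One row per parent sample, recording which replicate types it has.
summary = df.groupby("Parent")[["PrepRep", "DotRep"]].any()
both = summary[summary["PrepRep"] & summary["DotRep"]].index.tolist()

print(len(summary))  # 3 distinct samples
print(both)          # ['XXX123 85-90']
```

Grouping on the parent ID means a sample that received both kinds of replicate is still a single row in `summary`, so it is counted once.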

Finally, the question is: how can I account for this? How do I tell Python the following:

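Let me know if I can clarify this further. Thanks! Here is the code I'm currently running, but everything below "# Replicates" doesn't do what I want, because I can't figure out how to handle it: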

# IRMs
IRMs = CorrectedDF[CorrectedDF['SampleID'].str.match('IRM')]
print('Total number of IRM samples in the run is: {}'.format(len(IRMs.index)))

# BLANKs 
searchfor = ['blk', 'Blank', 'BLK', 'blank']
BLANKs = CorrectedDF[CorrectedDF['SampleID'].str.contains('|'.join(searchfor))]
print('Total number of BLANKs in the run is: {}'.format(len(BLANKs.index)))

# OREAS 239
searchfor2 = ['OREAS 239', 'oreas 239', 'Oreas 239']
OREAS_239 = CorrectedDF[CorrectedDF['SampleID'].str.contains('|'.join(searchfor2))]
print('Total number of OREAS 239 Samples in the run is: {}'.format(len(OREAS_239.index)))

# Cal Standards 
searchfor3 = ['Standard', 'Au 15']
CalSTD = CorrectedDF[CorrectedDF['SampleID'].str.contains('|'.join(searchfor3))]
print('Total number of Cal Standard Samples in the run is: {}'.format(len(CalSTD.index)))

# Prep samples
searchfor4 = ['Prep']
Prep = CorrectedDF[CorrectedDF['SampleID'].str.contains('|'.join(searchfor4))]
print('Total number of Prep Samples in the run is: {}'.format(len(Prep.index)))

# Replicates
searchfor5 = ['ORIG', 'PREPDUP']
Replicates = CorrectedDF[CorrectedDF['SampleID'].str.contains('|'.join(searchfor5))]
print('Total number of Replicate Samples in the run is: {}'.format(len(Replicates.index)))

print('Total number of ALL Samples in the run is: {}'.format(len(CorrectedDF.index)))
ClientSamples = len(CorrectedDF.index) - (len(IRMs.index) + len(BLANKs.index)
                                          + len(OREAS_239.index) + len(CalSTD.index) 
                                          + len(Prep.index) + len(Replicates.index))
print('Total number of Client-ONLY Samples in the run is: {}'.format(ClientSamples))
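The subtraction in the script above can go wrong in two ways: a row matched by two searches is subtracted twice, and subtracting every `ORIG`/`PREPDUP` row also removes the original sample itself. A possible restructuring, sketched here under assumptions, gives each row exactly one QC label with `numpy.select` (the first true condition wins) and then counts client samples as distinct parent IDs after stripping replicate suffixes. The data, the `QCType` column, and the suffix pattern (` ORIG`, ` PREP`, ` PREPDUP`, `.1`, `.2` at the end of the ID) are hypothetical stand-ins for the real `CorrectedDF`. `case=False` replaces the hand-written case variants.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for CorrectedDF; real sample IDs may look different.
CorrectedDF = pd.DataFrame({"SampleID": [
    "IRM 1", "Blank 1", "BLK 2", "OREAS 239", "Au 15 Std",
    "Prep 1", "XXX123 80-85",
    "XXX123 85-90 ORIG", "XXX123 85-90 PREPDUP",
    "XXX123 90-95.1", "XXX123 90-95.2",
]})

ids = CorrectedDF["SampleID"].fillna("")  # avoid NaN in the boolean masks

# One label per row: np.select takes the FIRST condition that is True,
# so a row can never be counted in two QC categories.
conditions = [
    ids.str.match("IRM"),
    ids.str.contains("blk|blank", case=False),
    ids.str.contains("oreas 239", case=False),
    ids.str.contains("Standard|Au 15"),
    ids.str.contains("Prep"),  # case-sensitive, so "PREPDUP" is not caught here
]
labels = ["IRM", "Blank", "OREAS 239", "Cal Standard", "Prep"]
CorrectedDF["QCType"] = np.select(conditions, labels, default="Client")

print(CorrectedDF["QCType"].value_counts())

# Client rows still include replicates; collapse them to their parent ID
# by stripping any trailing " ORIG", " PREP", " PREPDUP", ".1" or ".2".
client_ids = ids[CorrectedDF["QCType"] == "Client"]
parent_ids = client_ids.str.replace(
    r"(?:\s+(?:ORIG|PREP(?:DUP)?)|\.[12])+$", "", regex=True
)
n_client = parent_ids.nunique()
print(f"Total number of Client-ONLY samples in the run is: {n_client}")
```

Because the client count comes from distinct parent IDs rather than from subtraction, a sample with both replicate types still contributes exactly one.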

df['Label'].str.extract(r'(XXX123 \d+-\d+)', expand=False).nunique()

You can simply extract the part you're looking for with a regular expression, then use nunique to find out how many unique values there are.
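A small self-contained run of that one-liner, using made-up labels (the answer's column is `Label`; in the question's data it would be `SampleID`). `expand=False` makes `str.extract` return a Series instead of a one-column DataFrame, so `nunique()` gives a single integer, and rows that don't match (such as blanks) become NaN, which `nunique()` ignores by default.

```python
import pandas as pd

# Hypothetical labels, assuming the "XXX123 <from>-<to>" naming.
df = pd.DataFrame({"Label": [
    "Blank 1",
    "XXX123 80-85",
    "XXX123 85-90 ORIG",
    "XXX123 85-90 PREP",
    "XXX123 85-90.1",
    "XXX123 85-90.2",
]})

# The regex stops at the interval digits, so " ORIG", " PREP", ".1" and ".2"
# suffixes are dropped and all replicates map to the same parent value.
base = df["Label"].str.extract(r"(XXX123 \d+-\d+)", expand=False)
print(base.nunique())  # 2
```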