Astropy table manipulation: How can I create a new table with rows grouped by values in a column?

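I have a table with multiple rows and columns. One of the columns contains different numbers, each repeated several times. How can I create a new astropy table that stores only the rows whose value in that column is repeated more than 3 times?

Example: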

table

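Note that 0129 is repeated 3 times in column c, and 2780 is repeated 4 times in column c. I would like my code to create the new table:

modified table

I am using the astropy module, specifically: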

from astropy.table import Table

I assume I need a for loop to accomplish this, ultimately ending with the command

new_table.add_row(table[index]) 

Big picture, what I want to accomplish is:

if column_c_value repeats >=3:
    new_table.add_row(table[index])

Thanks for your help! I'm a bit stuck here and would greatly appreciate any insight.

You can use the Table grouping functionality:

In [1]: import numpy as np
   ...: from astropy.table import Table

In [2]: t = Table([[1, 2, 3, 4, 5, 6, 7, 8],
   ...:            [10, 11, 10, 10, 11, 12, 13, 12]],
   ...:            names=['a', 'id'])

In [3]: tg = t.group_by('id')

In [4]: tg.groups
Out[4]: <TableGroups indices=[0 3 5 7 8]>

In [6]: tg.groups.keys
Out[6]: 
<Table length=4>
  id 
int64
-----
   10
   11
   12
   13

In [7]: np.diff(tg.groups.indices)
Out[7]: array([3, 2, 2, 1])

In [8]: tg
Out[8]: 
<Table length=8>
  a     id 
int64 int64
----- -----
    1    10
    3    10
    4    10
    2    11
    5    11
    6    12
    8    12
    7    13

In [9]: ok = np.zeros(len(tg), dtype=bool)

In [10]: for i0, i1 in zip(tg.groups.indices[:-1], tg.groups.indices[1:]):
    ...:     if (i1 - i0) >= 3:
    ...:         ok[i0:i1] = True
    ...: tg3 = tg[ok]
    ...: tg3
    ...: 
Out[10]: 
<Table length=3>
  a     id 
int64 int64
----- -----
    1    10
    3    10
    4    10

In [12]: for tgg in tg.groups:
    ...:     if len(tgg) >= 2:
    ...:         print(tgg)  # or do something with it
    ...:         
 a   id
--- ---
  1  10
  3  10
  4  10
 a   id
--- ---
  2  11
  5  11
 a   id
--- ---
  6  12
  8  12

The solution I propose uses fewer tools than the answer above and actually filters out short groups. Let's assume that the column with repeated elements is called 'id', as in the answer above. Then,

from collections import Counter
import numpy as np
min_repeats = 3  # selected rows must have 'id' repeated at least 3 times

cntr = Counter(t['id'])  # count how often each 'id' value occurs
repeated_elements = [k for k, v in cntr.items() if v >= min_repeats]
mask = np.isin(t['id'], repeated_elements)  # np.in1d is deprecated in recent NumPy
new_table = t[mask]