Total length of elements, and subsets, in a pandas dataframe
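How can I count the total number of elements in each cell of a dataframe column, including the elements of nested subsets, and put the result in a new column?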
import pandas as pd
# Build the data first: the original referenced x inside its own definition
data = [[1, (2, 5, 6)], [2, (3, 4)], [3, 4], [(5, 6), (7, 8, 9)]]
x = pd.Series(data, index=range(1, len(data) + 1))
df = pd.DataFrame({'A': x})
I tried the following code, but it gives 2 for every row:
df['Length'] = df['A'].apply(len)
print(df)
                     A  Length
1       [1, (2, 5, 6)]       2
2          [2, (3, 4)]       2
3               [3, 4]       2
4  [(5, 6), (7, 8, 9)]       2
However, what I want to get is:
                     A  Length
1       [1, (2, 5, 6)]       4
2          [2, (3, 4)]       3
3               [3, 4]       2
4  [(5, 6), (7, 8, 9)]       5
Thanks.
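Use itertools. Scalars are wrapped in a tuple first, because chain raises TypeError on a non-iterable such as 1. Note that this flattens one level of nesting only:
import itertools
df['Length'] = df['A'].apply(lambda x: len(list(itertools.chain(*[e if isinstance(e, (list, tuple)) else (e,) for e in x]))))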
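To see what a one-level flatten with itertools actually counts, here is a minimal sketch on plain Python lists, so it runs without pandas. The helper name `one_level_len` is hypothetical and not part of the answer above; it assumes each cell contains only scalars, lists, and tuples.

```python
import itertools

def one_level_len(cell):
    # Wrap anything that is not a list/tuple in a 1-tuple so chain can iterate it
    wrapped = (e if isinstance(e, (list, tuple)) else (e,) for e in cell)
    # Flatten exactly one level and count the resulting items
    return len(list(itertools.chain.from_iterable(wrapped)))

rows = [[1, (2, 5, 6)], [2, (3, 4)], [3, 4], [(5, 6), (7, 8, 9)]]
print([one_level_len(r) for r in rows])   # [4, 3, 2, 5]

# Deeper nesting is undercounted: the inner (3, 4) counts as one item
print(one_level_len([1, (2, (3, 4))]))    # 3
```

With the question's dataframe, this could be applied as df['A'].apply(one_level_len). For data nested more than one level deep, a recursive approach is the safer choice.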
You can try this function. It is recursive, but it works:
def recursive_len(item):
    try:
        iter(item)
        return sum(recursive_len(subitem) for subitem in item)
    except TypeError:
        # Not iterable: count it as a single element
        return 1
Then call apply like this:
df['Length'] = df['A'].apply(recursive_len)
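One caveat with the try/iter approach: a one-character string iterates to itself, so the recursion never bottoms out if a cell contains strings. Below is a hedged variant that treats str and bytes as single elements. The name `recursive_len_safe` is hypothetical, and the choice to count a string as one element is an assumption about what is wanted.

```python
def recursive_len_safe(item):
    # Treat strings/bytes as atoms: iterating 'a' yields 'a' again, forever
    if isinstance(item, (str, bytes)):
        return 1
    try:
        iterator = iter(item)
    except TypeError:
        return 1  # not iterable: a single element
    return sum(recursive_len_safe(sub) for sub in iterator)

print(recursive_len_safe([1, (2, 5, 6)]))       # 4
print(recursive_len_safe(['ab', ('c', [3])]))   # 3
```

Keeping only the iter() call inside the try block also means a TypeError raised deeper in the recursion is not silently turned into a count of 1.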
Given:
import pandas as pd
x = pd.Series([[1, (2,5,6)], [2, (3,4)], [3, 4], [(5,6), (7,8,9)]])
df = pd.DataFrame({'A': x})
You can write a recursive generator that yields 1 for each nested element that is not iterable. Something along these lines:
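try:
    from collections.abc import Iterable  # Python 3.3+
except ImportError:
    from collections import Iterable      # Python 2

def glen(LoS):
    def iselement(e):
        # Strings are iterable, but count as single elements
        return not (isinstance(e, Iterable) and not isinstance(e, str))
    for el in LoS:
        if iselement(el):
            yield 1
        else:
            for sub in glen(el):
                yield sub

df['Length'] = df['A'].apply(lambda e: sum(glen(e)))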
Yields:
>>> df
                     A  Length
0       [1, (2, 5, 6)]       4
1          [2, (3, 4)]       3
2               [3, 4]       2
3  [(5, 6), (7, 8, 9)]       5
This works in Python 2 or 3. In Python 3.3 or later, you can replace the inner loop with yield from:
from collections.abc import Iterable

def glen(LoS):
    def iselement(e):
        # Strings are iterable, but count as single elements
        return not (isinstance(e, Iterable) and not isinstance(e, str))
    for el in LoS:
        if iselement(el):
            yield 1
        else:
            yield from glen(el)
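If the nesting can be very deep, a recursive generator may run into Python's recursion limit. A rough sketch of the same count using an explicit stack instead of recursion follows; the name `stack_len` is hypothetical, and treating str and bytes as single elements is an assumption.

```python
from collections.abc import Iterable

def stack_len(cell):
    count = 0
    stack = [cell]
    while stack:
        item = stack.pop()
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            stack.extend(item)   # push the children to be visited later
        else:
            count += 1           # a leaf element
    return count

rows = [[1, (2, 5, 6)], [2, (3, 4)], [3, 4], [(5, 6), (7, 8, 9)]]
print([stack_len(r) for r in rows])   # [4, 3, 2, 5]
```

Elements are visited in a different order than in the generator version, which does not matter when only the total is needed. With the dataframe above, it could be applied as df['A'].apply(stack_len).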