Count values in column with ranges given a specific condition
I have this code:
import numpy as np
import pandas as pd
df = pd.DataFrame({'R': {0: '1', 1: '2', 2: '3', 3: '4', 4: '5', 5: '6', 6: '7'}, 'an': {0: 'f', 1: 'i', 2: '-', 3: '-', 4: 'f', 5: 'c,f,i,j', 6: 'c,d,e,j'}, 'nv1': {0: [-1.0], 1: [-1.0], 2: [], 3: [], 4: [-2.0], 5: [-2.0, -1.0, -3.0, -1.0], 6: [-2.0, -1.0, -2.0, -1.0]}})
which produces the following DataFrame (only the nv1 column is shown):
nv1
0 [-1.0]
1 [-1.0]
2 []
3 []
4 [-2.0]
5 [-2.0, -1.0, -3.0, -1.0]
6 [-2.0, -1.0, -2.0, -1.0]
I want to create a new column that counts, for each row, how many values in the df['nv1'] list are below -1.
The desired output is:
nv1 ct
0 [-1.0]
1 [-1.0]
2 []
3 []
4 [-2.0] 1
5 [-2.0, -1.0, -3.0, -1.0] 2
6 [-2.0, -1.0, -2.0, -1.0] 2
I tried each of the two lines below separately, but both ran into errors:
df['ct'] = np.sum((df['nv1']>-1))
df['ct'] = df['nv1'].mask(lambda x: x.ne(x>[-1])).transform('count')
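You need to loop here, because each cell of nv1 holds a Python list and the comparison has to be done element by element.
Use Series.apply with a lambda function and sum: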
df['ct'] = df['nv1'].apply(lambda s: sum(e<-1 for e in s))
Or a classic list comprehension:
df['ct'] = [sum(e<-1 for e in s) for s in df['nv1']]
Output:
R an nv1 ct
0 1 f [-1.0] 0
1 2 i [-1.0] 0
2 3 - [] 0
3 4 - [] 0
4 5 f [-2.0] 1
5 6 c,f,i,j [-2.0, -1.0, -3.0, -1.0] 2
6 7 c,d,e,j [-2.0, -1.0, -2.0, -1.0] 2
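If the threshold is likely to change, the same loop can be wrapped in a small helper. This is only a sketch: `count_below` and its `threshold` parameter are hypothetical names introduced here (they are not part of pandas), and the frame is a cut-down copy of the one in the question.

```python
import pandas as pd

def count_below(values, threshold=-1):
    """Count the items of `values` that are strictly less than `threshold`."""
    # Each comparison yields True/False; sum() adds them up as 1/0.
    return sum(v < threshold for v in values)

df = pd.DataFrame({'nv1': [[-1.0], [], [-2.0], [-2.0, -1.0, -3.0, -1.0]]})

# Default threshold of -1, as in the question.
df['ct'] = df['nv1'].apply(count_below)

# Extra keyword arguments given to Series.apply are forwarded to the function.
df['ct_below_minus2'] = df['nv1'].apply(count_below, threshold=-2)
```

An empty list simply sums to 0, so no special case is needed for rows without values.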
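If you really want empty strings in place of zeros (the := operator requires Python 3.8+, and the column becomes object dtype because it mixes integers and strings):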
df['ct'] = [S if (S:=sum(e<-1 for e in s)) else '' for s in df['nv1']]
Output:
R an nv1 ct
0 1 f [-1.0]
1 2 i [-1.0]
2 3 - []
3 4 - []
4 5 f [-2.0] 1
5 6 c,f,i,j [-2.0, -1.0, -3.0, -1.0] 2
6 7 c,d,e,j [-2.0, -1.0, -2.0, -1.0] 2
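Mixing integers and empty strings makes later arithmetic on ct awkward. One possible alternative, sketched here under the assumption that a missing value is an acceptable stand-in for the blank cell, is to use the nullable Int64 dtype and mask the zeros. The name `counts` is just an illustrative variable, and the frame is a shortened copy of the question's data.

```python
import pandas as pd

df = pd.DataFrame({'nv1': [[-1.0], [], [-2.0], [-2.0, -1.0, -3.0, -1.0]]})

# Same per-row count as before, stored with the nullable integer dtype.
counts = pd.Series(
    [sum(e < -1 for e in s) for s in df['nv1']],
    index=df.index,
    dtype='Int64',
)

# Replace zeros with <NA> instead of '' so the column stays numeric.
df['ct'] = counts.mask(counts == 0)
```

The column then displays `<NA>` rather than a blank for zero counts, but it can still be summed or compared without converting strings back to numbers.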
Use Series.apply with a lambda function and sum:
df['ct'] = df['nv1'].apply(lambda x: sum(y < -1 for y in x))
print(df)
R an nv1 ct
0 1 f [-1.0] 0
1 2 i [-1.0] 0
2 3 - [] 0
3 4 - [] 0
4 5 f [-2.0] 1
5 6 c,f,i,j [-2.0, -1.0, -3.0, -1.0] 2
6 7 c,d,e,j [-2.0, -1.0, -2.0, -1.0] 2
Another idea is to build a DataFrame from the lists, compare it with -1 using lt, and sum across each row:
df['ct'] = pd.DataFrame(df['nv1'].tolist(), index=df.index).lt(-1).sum(axis=1)
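To see why the one-liner above copes with lists of different lengths, it can help to split it into steps. The following is a sketch on a shortened copy of the question's data; `wide` and `below` are illustrative variable names only.

```python
import pandas as pd

df = pd.DataFrame({'nv1': [[-1.0], [], [-2.0], [-2.0, -1.0, -3.0, -1.0]]})

# Each list becomes one row; shorter (or empty) lists are padded with NaN.
wide = pd.DataFrame(df['nv1'].tolist(), index=df.index)

# Comparisons against NaN are False, so the padding never adds to the count.
below = wide.lt(-1)

# Summing booleans across the columns gives the per-row count.
df['ct'] = below.sum(axis=1)
```

The trade-off is memory: the intermediate frame has as many columns as the longest list, so this suits columns whose lists are short and of similar length.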