Exclude specific text-based values from a separated-value count and output the count without them

Question

I want to count the ;-separated values in a column of a dataframe.

A solution could look like this:

count = []
for row in gdf.itertuples():
    # Split the info string on ";" and count the resulting pieces
    newstr = row.info.split(";")
    n = len(newstr)
    count.append(n)

gdf["count"] = count

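But there is a problem: I don't want to count a ;-separated value if it is one of a few specific symbols, in my case #, ##, ### or ####. So, in the figure below, for id:2 I want the count to be two, for id:6 I want the count to be one, and so on.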

Failed attempts so far

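I tried stripping them out before counting. With the .replace method the # symbols were removed, but the separators stayed behind, which made things even messier.
I tried len-1, which didn't work.
I tried adding another for loop with an if statement, which I thought would work,
but it didn't: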

count=[]                            
for row in gdf.itertuples():
        newstr = row.info.split(";")
        for i in newstr:
            if (i !='#'):
                n = len(newstr)
        count.append(n)

Thanks for the help.

Answer 1

You can use a regex to strip out the unwanted # runs, then count the ; characters and add 1:

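import pandas as pd

df = pd.DataFrame({'info': ['1;2;3', '##;2;3', '1;##;3', '1;2;##']})

# Drop each run of '#' together with one adjacent ';', then count the remaining ';' and add 1
df['count'] = df['info'].str.replace('#+;|;#+', '', regex=True).str.count(';').add(1)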

Output:

     info  count
0   1;2;3      3
1  ##;2;3      2
2  1;##;3      2
3  1;2;##      2

The regex:

#+;   # match one or more literal # followed by ;
|     # OR
;#+   # match one or more literal # preceded by ;
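One case the pattern above does not cover is a cell that holds only placeholders, such as "##" or "#;##". A run of # with no ; left next to it is never removed, so the count stays at 1 where 0 might be expected. As a possible alternative, here is a minimal sketch in plain Python that splits on ; and skips every token made up only of # characters. The function names are hypothetical. Stripping whitespace around tokens and treating any number of # as a placeholder are assumptions, since the question only lists # through ####.

```python
import re

# Same pattern as in the answer above: a run of '#' with one adjacent ';'.
PLACEHOLDER_WITH_SEP = re.compile(r"#+;|;#+")


def count_by_regex(info: str) -> int:
    """Mirror of the pandas one-liner: strip placeholders, count ';', add 1."""
    return PLACEHOLDER_WITH_SEP.sub("", info).count(";") + 1


def count_by_split(info: str) -> int:
    """Split on ';' and count the tokens that are not made up only of '#'.

    Assumption: surrounding whitespace is ignored, and a token of any
    length consisting solely of '#' counts as a placeholder.
    """
    return sum(1 for token in info.split(";") if set(token.strip()) != {"#"})


# Compare both approaches, including cells that contain only placeholders.
samples = ["1;2;3", "##;2;3", "1;##;3", "1;2;##", "##", "#;##"]
for s in samples:
    print(f"{s!r}: regex={count_by_regex(s)} split={count_by_split(s)}")
```

The two functions agree on the four sample rows from the answer and differ only on the placeholder-only cells. On a dataframe, something like gdf["count"] = gdf["info"].apply(count_by_split) should work, assuming every value in the info column is a string.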

python

pandas

geopandas