Filling blanks in a pandas dataframe column leads to reversal of function

Question

I have the following dataframe:

   a    b      x  y    language
0  id1  id_2   3 
1  id2  id_4   6 ,0=/%
2  id3  id_6   9 |-|/#
3  id4  id_8  12 text4

I am using langdetect to detect the language of the text elements in column y.

This is the code I use for that purpose:

from langdetect import detect

for i, row in df.iterrows():
    try:
        # detect() raises when the text has no usable features
        df.loc[i, "language"] = detect(row["y"])
    except Exception:
        continue
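As a side note, the loop above leaves the language cell untouched whenever detection fails, which is where the blanks come from. A minimal sketch of one way to write the fallback value at detection time is shown below. The helper name `detect_or_na` and the stand-in `fake_detect` function are hypothetical and only there so the example runs without langdetect installed; in real use you would pass `langdetect.detect` instead.

```python
import pandas as pd


def detect_or_na(text, detector, fallback="N/A"):
    """Return detector(text), or fallback if text is blank or detection fails."""
    if not isinstance(text, str) or not text.strip():
        return fallback
    try:
        return detector(text)
    except Exception:
        return fallback


# Hypothetical stand-in for langdetect.detect, for illustration only:
# it raises on text without letters and otherwise always answers "en".
def fake_detect(text):
    if not any(ch.isalpha() for ch in text):
        raise ValueError("no features in text")
    return "en"


df = pd.DataFrame({"y": ["", ",0=/%", "|-|/#", "text4"]})

# Only the new language column is assigned; column y is left as it was.
df["language"] = df["y"].apply(lambda text: detect_or_na(text, fake_detect))
print(df)
```

With this approach there is no separate fill step afterwards, so nothing has to touch column y.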

This is the result:

   a    b      x  y    language
0  id1  id_2   3 
1  id2  id_4   6 ,0=/%
2  id3  id_6   9 |-|/#
3  id4  id_8  12 text4  en
4  id5  id_9  14 text5  de
5  id6  id_10 12

I then tried to fill the blanks in the language column with the string "N/A" using either of the following commands:

df['language'].replace([''],"N/A", inplace=True)

df['language'] = df['language'].fillna(0)

With each of the commands above, I got the following result:

      a    b    x  y    language
 0  id1  id_2   3 N/A   N/A
 1  id2  id_4   6 ,0=/% ,0=/%
 2  id3  id_6   9 |-|/# |-|/#
 3  id4  id_8  12 text4 text4  
 4  id5  id_9  14 text5 text5 
 5  id6  id_10 12 N/A   N/A

How can I get the following result instead:

   a    b      x  y    language
0  id1  id_2   3        N/A
1  id2  id_4   6 ,0=/%  N/A
2  id3  id_6   9 |-|/#  N/A
3  id4  id_8  12 text4  en
4  id5  id_9  14 text5  de
5  id6  id_10 12        N/A

Answer 1

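Use np.where() to check whether language contains at least one word character (a letter, digit, or underscore). Keep the value if it does and write 'N/A' otherwise. Passing na=False makes missing values count as non-matches, so they are filled as well.

df['language'] = np.where(df['language'].str.contains(r'\w', na=False), df['language'], 'N/A')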
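A self-contained sketch of the np.where() approach follows, for anyone who wants to try it in isolation. The sample values are made up for illustration and only loosely mirror the question's dataframe.

```python
import numpy as np
import pandas as pd

# Illustrative data: language is blank or missing wherever detection failed.
df = pd.DataFrame({
    "y": ["", ",0=/%", "text4", "text5"],
    "language": ["", None, "en", "de"],
})

# na=False treats missing values as "no match", so they are filled too.
has_word_char = df["language"].str.contains(r"\w", na=False)

# Keep detected languages, write "N/A" everywhere else.
df["language"] = np.where(has_word_char, df["language"], "N/A")
print(df)
```

Only the language column is reassigned here, so column y keeps its original contents.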

Answer 2

This works!

Initial dataframe:

   a    b     x   y language new
0  0  id1  id_2   3     None    
1  1  id2  id_4   6    ,0=/%    
2  2  id3  id_6   9           kl
3  3  id4  id_8  12    text4

I used replace to swap out the cells that contain a single space:

df.new = df['new'].replace(" ", 'n/a')  # or

df['new'].replace(" ", 'n/a', inplace=True)  # also works

Output:

   a    b     x   y language  new
0  0  id1  id_2   3     None  n/a
1  1  id2  id_4   6    ,0=/%  n/a
2  2  id3  id_6   9            kl
3  3  id4  id_8  12    text4  n/a
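Matching a literal " " only catches cells that hold exactly one space. A hedged variant, assuming the blanks may be empty strings, runs of whitespace, or missing values, is to match them with a regular expression and then fill. The sample data below is invented for illustration. The result is assigned back to the column rather than using inplace=True on a column selection, since that form may not update the dataframe under copy-on-write in newer pandas versions.

```python
import numpy as np
import pandas as pd

# Illustrative data: three different kinds of "blank" plus one real value.
df = pd.DataFrame({
    "y": ["", ",0=/%", "text4", "text5"],
    "language": ["", "  ", "en", None],
})

df["language"] = (
    df["language"]
    .replace(r"^\s*$", np.nan, regex=True)  # empty or whitespace-only -> NaN
    .fillna("N/A")                          # NaN and None -> "N/A"
)
print(df)
```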

replace

pandas

fillna