Replacing newlines with spaces for str columns through pandas dataframe

Question

Given a sample dataframe in which columns 2 and 3 contain free text, e.g.

>>> import pandas as pd
>>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\n']]
>>> pd.DataFrame(lol)
   0  1          2          3
0  1  2        abc   foo\nbar
1  3  1  def\nhaha  love it\n

The goal is to replace \n with ' ' (a space) and strip the strings in columns 2 and 3, to get:

>>> pd.DataFrame(lol)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it

How can I replace newlines with spaces in specific columns of a pandas dataframe?

I've tried this:

>>> import pandas as pd
>>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\n']]

>>> replace_and_strip = lambda x: x.replace('\n', ' ').strip()

>>> lol2 = [[replace_and_strip(col) if type(col) == str else col for col in list(row)] for idx, row in pd.DataFrame(lol).iterrows()]

>>> pd.DataFrame(lol2)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it

But there must be a better/simpler way.

Answer 1

You can use the following two regex replacements:

>>> df = pd.DataFrame(lol)
>>> df.replace({r'\A\s+|\s+\Z': '', '\n': ' '}, regex=True, inplace=True)
>>> df
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
>>>

Details

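r'\A\s+|\s+\Z' -> '' removes all leading and trailing whitespace, like strip():
- \A\s+ - matches 1 or more whitespace characters at the start of the string
- | - or
- \s+\Z - matches 1 or more whitespace characters at the end of the string
'\n' -> ' ' replaces each newline with a space.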
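To see what the two patterns above do on their own, here is a minimal sketch using only Python's standard re module on the sample strings from the question. The helper name clean is hypothetical; it applies the strip pattern first and then the newline pattern, mirroring the dict passed to df.replace.

```python
import re

# Pattern 1: leading (\A\s+) or trailing (\s+\Z) whitespace -> removed, like str.strip()
STRIP_RE = re.compile(r'\A\s+|\s+\Z')
# Pattern 2: any remaining newline -> a single space
NEWLINE_RE = re.compile(r'\n')


def clean(text):
    """Hypothetical helper: strip the ends, then turn inner newlines into spaces."""
    return NEWLINE_RE.sub(' ', STRIP_RE.sub('', text))


samples = ['abc', 'foo\nbar', 'def\nhaha', 'love it\n']
for s in samples:
    # The regex result agrees with the plain str-method version used in the question
    assert clean(s) == s.strip().replace('\n', ' ')
    print(repr(s), '->', repr(clean(s)))
```

Because \Z only matches at the very end of the string, the newline inside 'foo\nbar' is left for the second pattern, which turns it into a space.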

Answer 2

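You can use select_dtypes to select the columns of type object and then call applymap on those columns.

Because these methods have no inplace parameter, the result has to be assigned back to the dataframe: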

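df = pd.DataFrame(lol)
# applymap works element-wise; on pandas >= 2.1 it is deprecated in favor of DataFrame.map
strs = df.select_dtypes(include=['object']).applymap(lambda x: x.replace('\n', ' ').strip())
df[strs.columns] = strs
df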
#   0  1         2        3
#0  1  2       abc  foo bar
#1  3  1  def haha  love it
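A sketch of the same select_dtypes idea that also tolerates non-string values such as NaN or None in the selected columns (calling x.replace on those would raise). It assumes DataFrame.map is available on pandas 2.1 and later and falls back to applymap on older versions; tidy is a hypothetical helper name, and including 'string' alongside 'object' is an assumption meant to also catch StringDtype columns.

```python
import pandas as pd

lol = [[1, 2, 'abc', 'foo\nbar'], [3, 1, 'def\nhaha', 'love it\n']]
df = pd.DataFrame(lol)


def tidy(x):
    # Only rewrite real strings; NaN, None and other objects pass through unchanged
    return x.replace('\n', ' ').strip() if isinstance(x, str) else x


# 'object' covers classic text columns; 'string' covers StringDtype columns
text = df.select_dtypes(include=['object', 'string'])
# DataFrame.map exists from pandas 2.1 on; older versions only have applymap
cleaned = text.map(tidy) if hasattr(text, 'map') else text.applymap(tidy)
df[cleaned.columns] = cleaned
print(df)
```

The numeric columns 0 and 1 are never selected, so they keep their integer dtype.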

Answer 3

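In addition to the other good answers, here is a vectorized version of your original idea. The string methods are applied column by column with apply, so each cleaned Series is written back to its own column (assigning a plain list of Series to df.iloc[:, columns] would treat each Series as a row and transpose the result):

columns = [2, 3]
df.iloc[:, columns] = df.iloc[:, columns].apply(
    lambda s: s.str.strip().str.replace('\n', ' ', regex=False))

Details:

In [49]: df.iloc[:, columns] = df.iloc[:, columns].apply(
    ...:     lambda s: s.str.strip().str.replace('\n', ' ', regex=False))

In [50]: df
Out[50]:
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it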
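If only specific columns should be touched (here the labels 2 and 3 from the question), a simple alternative sketch is to loop over those labels and use the vectorized .str accessor on each one. Assigning one column at a time keeps each cleaned Series in its own column, and .str methods pass missing values through unchanged. The name text_cols is illustrative.

```python
import pandas as pd

lol = [[1, 2, 'abc', 'foo\nbar'], [3, 1, 'def\nhaha', 'love it\n']]
df = pd.DataFrame(lol)

text_cols = [2, 3]  # column labels holding free text (taken from the question)
for col in text_cols:
    # Vectorized per column: trim the ends, then turn inner newlines into spaces.
    # regex=False treats '\n' as a literal string on every pandas version.
    df[col] = df[col].str.strip().str.replace('\n', ' ', regex=False)

print(df)
```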

Answer 4

Use replace: first strip leading and trailing whitespace, then replace \n with a space:

df = df.replace({r'\s+$': '', r'^\s+': ''}, regex=True).replace(r'\n', ' ', regex=True)
print(df)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
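The same two-step replace can be restricted to the free-text columns instead of the whole frame. A minimal sketch, assuming the column labels 2 and 3 from the question; clean_columns is a hypothetical helper name, and it works on a copy so the input frame is left untouched.

```python
import pandas as pd


def clean_columns(df, cols):
    """Hypothetical helper: strip ends and replace newlines in the given columns only."""
    out = df.copy()
    out[cols] = (
        out[cols]
        .replace({r'^\s+': '', r'\s+$': ''}, regex=True)  # trim leading/trailing whitespace
        .replace(r'\n', ' ', regex=True)                   # inner newlines -> spaces
    )
    return out


lol = [[1, 2, 'abc', 'foo\nbar'], [3, 1, 'def\nhaha', 'love it\n']]
raw = pd.DataFrame(lol)
df = clean_columns(raw, [2, 3])
print(df)
```

Without re.MULTILINE, ^ only matches at the start of the string and $ only at the end (or just before a final newline), so the newline in the middle of 'foo\nbar' survives the first step and is replaced by the second.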

