python: processing string from each row of loaded json

Question

I have JSON tweet data in which each tweet usually begins with a Twitter handle.

import pandas as pd
data = pd.DataFrame(pd.read_json(filename, orient=columnName), columns=columnName)
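In the snippet above, orient is given columnName, but pd.read_json expects orient to name the JSON layout ('split', 'records', 'index', 'columns', 'values' or 'table'). A minimal sketch, assuming the file is a list of tweet objects with a full_text key (the sample content is hypothetical), that loads it with orient="records":

```python
import io

import pandas as pd

# Hypothetical JSON in "records" layout: a list of objects, one per tweet.
raw = """[
  {"full_text": "@ABC hi there, how much for an apple"},
  {"full_text": "hi there @ABC, how much for an car"}
]"""

# io.StringIO stands in for a real file path; pd.read_json accepts either.
data = pd.read_json(io.StringIO(raw), orient="records")
print(data["full_text"])
```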

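I can load and index the tweet data with pandas, but I'd like to know how to process each row smartly so that the handle is removed when it appears at the start of the tweet (and left alone everywhere else it is used):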

data['full_text']

Example tweets:

@ABC hi there, how much for an apple
@ABC hi there, how much for an orange
@ABC hi there, how much @ABC for an pineapple
hi there @ABC, how much for an car
@ABC hi there, how much for an tree

would become:

hi there, how much for an apple
hi there, how much for an orange
hi there, how much @ABC for an pineapple
hi there @ABC, how much for an car
hi there, how much for an tree
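The rule behind this transformation can be checked with Python's re module alone: anchoring the pattern with ^ restricts the match to the start of the string, so later occurrences of @ABC survive. A minimal sketch on the five example tweets, assuming the leading handle is always followed by whitespace:

```python
import re

tweets = [
    "@ABC hi there, how much for an apple",
    "@ABC hi there, how much for an orange",
    "@ABC hi there, how much @ABC for an pineapple",
    "hi there @ABC, how much for an car",
    "@ABC hi there, how much for an tree",
]

# ^ anchors the match at the start of the string; \s+ also removes the
# whitespace after the handle so no leading space is left behind.
leading_handle = re.compile(r"^@ABC\s+")

cleaned = [leading_handle.sub("", t) for t in tweets]
for line in cleaned:
    print(line)
```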

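There is iterrows(), but from what I've read it isn't recommended for modifying the data; it's meant more for reading rows, for example to print them: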


import re

for datum in data['full_text']:
    print(datum)
    # count=1 removes the first "@ABC" anywhere in the string, not only at the start
    datum = re.sub("@ABC", "", datum, 1)
    print(datum)

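I also tried the approach above, but isn't this bad practice? The output I see in the console looks fine, although I can't check all of it because I have a million records.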
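One concrete problem with the loop is that reassigning datum only rebinds a local name, so the DataFrame itself is never changed. A minimal sketch, using a small hypothetical DataFrame, that shows this and the usual pandas alternative of assigning the result of a column-wide .str.replace back to the column:

```python
import re

import pandas as pd

data = pd.DataFrame({"full_text": [
    "@ABC hi there, how much for an apple",
    "hi there @ABC, how much for an car",
]})

# Rebinding the loop variable does not write anything back into data.
for datum in data["full_text"]:
    datum = re.sub("@ABC", "", datum, 1)
unchanged = data["full_text"].tolist()

# Column-wide alternative: build a new Series and assign it to the column.
data["full_text"] = data["full_text"].str.replace(r"^@ABC\s+", "", regex=True)
print(data["full_text"].tolist())
```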

Answer 1

You can use replace, where ^ matches the start of the string and \s+ matches one or more whitespace characters:

data = pd.read_json(filename, orient=columnName)
data['full_text'] = data['full_text'].replace(r'^@ABC\s+', '', regex=True)
print(data)
                                  full_text
0           hi there, how much for an apple
1          hi there, how much for an orange
2  hi there, how much @ABC for an pineapple
3        hi there @ABC, how much for an car
4            hi there, how much for an tree
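Since filename and columnName are not shown, here is a self-contained sketch of the same replace call on a DataFrame built directly from the example tweets (an assumption standing in for the JSON file):

```python
import pandas as pd

data = pd.DataFrame({"full_text": [
    "@ABC hi there, how much for an apple",
    "@ABC hi there, how much for an orange",
    "@ABC hi there, how much @ABC for an pineapple",
    "hi there @ABC, how much for an car",
    "@ABC hi there, how much for an tree",
]})

# With regex=True, Series.replace treats the pattern as a regular expression
# and substitutes the matched part inside each string.
data["full_text"] = data["full_text"].replace(r"^@ABC\s+", "", regex=True)
print(data)
```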

Answer 2

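# regex=True is needed on pandas 2.0+, where str.replace no longer treats the pattern as a regex by default
data['full_text'] = data['full_text'].str.replace(r'^(?:\@[^\s]+)\s*', '', regex=True)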
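Unlike the first answer, this pattern is not tied to @ABC: [^\s]+ matches any run of non-whitespace characters after the @, so any leading handle is removed. A small comparison of the two patterns, with a hypothetical @XYZ tweet added to show the difference:

```python
import pandas as pd

data = pd.DataFrame({"full_text": [
    "@ABC hi there, how much for an apple",
    "@XYZ hello",
    "hi there @ABC, how much for an car",
]})

# First answer: strips only a leading "@ABC" followed by whitespace.
only_abc = data["full_text"].replace(r"^@ABC\s+", "", regex=True)
# Second answer: strips any leading "@handle" plus following whitespace.
any_handle = data["full_text"].str.replace(r"^(?:\@[^\s]+)\s*", "", regex=True)

print(only_abc.tolist())
print(any_handle.tolist())
```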


Tags: python, json, pandas, python-3.6