Making separate columns from dictionary items in the same df in Python

Question

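I have a huge Twitter data frame (9530232x19). The first column contains dictionaries, and I want to turn the items of those dictionaries into separate columns in the same df. I also have a dictionary in the 'entities' column that I want to split up in a similar way. I want to add the metrics as four new columns, such as 'rtcount', 'reply_count', 'like_count' and 'quote_count', and add entities['htype'] as a new column to the right of my existing df, without creating any additional data frames, because this large df already uses almost all of my 16 GB of RAM and occasionally crashes. I know that a for loop over a df this large is not efficient, but I don't know how else to go about it.
Any help is much appreciated.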

htypedf=[]
t=[]
for i in range(0,len(d)):
    if i%100==0:
        print(i)
    htype=[]
    hasht=[]
    t=d[i:i+1]
    metrics=pd.Series(t['public_metrics'][0]).to_frame().T
    try:
        htype=list(map(lambda x : x['type'], t['entities'][0]['annotations']))
    except:
        htype=('NaN')

    d.iloc[i] = pd.concat([t, metrics, pd.DataFrame({'htype': [htype]})],axis=1)

Answer 1

Instead of your for loop, try this (here `d` is your full data frame):

d = pd.concat([d.drop(columns=['public_metrics']), d['public_metrics'].apply(pd.Series)], axis=1)
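
To make the effect of that line concrete, here is a minimal sketch on a made-up two-row frame. The sample rows and the choice of only two metric keys are assumptions for illustration, not the asker's data.

```python
import pandas as pd

# Toy stand-in for the asker's frame; the values are invented.
d = pd.DataFrame({
    "text": ["first tweet", "second tweet"],
    "public_metrics": [
        {"reply_count": 1, "like_count": 5},
        {"reply_count": 0, "like_count": 2},
    ],
})

# Expand each dict into its own columns, one column per key.
expanded = d["public_metrics"].apply(pd.Series)

# Drop the original dict column and attach the new columns on the right.
d = pd.concat([d.drop(columns=["public_metrics"]), expanded], axis=1)

print(d.columns.tolist())
```

Each key of the dictionaries becomes a column, and rows keep their original index.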

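A similar approach can be used to get htype, but how you handle it depends on how you want to keep the data. If you only want htype from the entities column, you could try the following:

d = pd.concat([d.drop(columns=['entities']), d['entities'].apply(pd.Series)['htype']], axis=1)

To keep the entities column alongside the new htype column, you should be able to use the following instead (just remove the drop call):

d = pd.concat([d, d['entities'].apply(pd.Series)['htype']], axis=1)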
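
Note that the loop in the question reads the types from `entities['annotations']` rather than from an `htype` key. If your data follows that layout, a sketch along these lines may be closer to what you need. The helper name `annotation_types` is hypothetical and the sample rows are invented.

```python
import numpy as np
import pandas as pd

def annotation_types(entities):
    """Return the list of annotation types, or NaN when there are none."""
    # Missing entities show up as NaN (a float), not as a dict.
    if not isinstance(entities, dict):
        return np.nan
    annotations = entities.get("annotations")
    if not annotations:
        return np.nan
    return [a["type"] for a in annotations]

# Toy frame: one row with annotations, one without, one with no entities.
d = pd.DataFrame({
    "text": ["a", "b", "c"],
    "entities": [
        {"annotations": [{"type": "Place"}, {"type": "Person"}]},
        {"urls": []},
        np.nan,
    ],
})

# Assigning a new column places it on the right of the existing frame.
d["htype"] = d["entities"].map(annotation_types)
```

Because the result is assigned as a column of `d`, no second data frame is built for it.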

Let me know how this works for you!

New code block:

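import pandas as pd

def fetch_htype(row):
    entities_dict = row['entities']
    # A missing value is NaN (a float), so check the type instead of calling np.isnan on a dict
    if not isinstance(entities_dict, dict):
        return pd.Series(data=[''], index=['htype'])
    else:
        # Wrap the value in a list so that a list-valued htype stays in a single cell
        return pd.Series(data=[entities_dict['htype']], index=['htype'])

# axis=1 passes each row (not each column) to fetch_htype
d = pd.concat([d, d.apply(fetch_htype, axis=1)], axis=1)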
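
Since the question mentions running close to the 16 GB limit, keep in mind that every `pd.concat` call returns a new frame. A possible variant, sketched below under the assumption that every row of `public_metrics` holds a dict, writes the new columns into `d` directly and may keep peak memory lower. The sample data is invented.

```python
import pandas as pd

# Toy stand-in for the asker's frame; the values are invented.
d = pd.DataFrame({
    "text": ["first tweet", "second tweet"],
    "public_metrics": [
        {"reply_count": 1, "like_count": 5},
        {"reply_count": 0, "like_count": 2},
    ],
})

# pop() removes the dict column from d and returns it as a Series.
# Assumption: every entry is a dict; NaN entries would need handling first.
metrics = pd.DataFrame(d.pop("public_metrics").tolist(), index=d.index)

# Add the expanded columns one at a time on the right of d.
for name in metrics.columns:
    d[name] = metrics[name]
```

Building the frame from a plain list of dicts also avoids constructing one `pd.Series` per row, which is what `apply(pd.Series)` does.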
