Insert values into specific columns of a pandas DataFrame

Question

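I am using Python and pandas to analyze a large dataset. I have several arrays of different lengths, and I need to insert their values into specific columns. If a value does not exist for a column, the cell should contain 'Not defined'. The input data looks like rows whose values belong in different positions of the DataFrame.
Example input data: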

# Example 1
{'Water Solubility': 'Insoluble ', 'Melting Point': '135-138 °C', 'logP': '4.68'}

# Example 2
{'Melting Point': '71 °C (whole mAb)', 'Hydrophobicity': '-0.529', 'Isoelectric Point': '7.89', 'Molecular Weight': '51234.9', 'Molecular Formula': 'C2224H3475N621O698S36'}

# Example 3
{'Water Solubility': '1E+006 mg/L (at 25 °C)', 'Melting Point': '204-205 °C', 'logP': '1.1', 'pKa': '6.78'}

I tried adding 'Not defined' to the arrays, but I could not find the right way to do it.

Answer 1

I think the best approach is to create a DataFrame for each dictionary and then concatenate the DataFrames.

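import pandas as pd

d_1 = {'Water Solubility': 'Insoluble ', 'Melting Point': '135-138 °C', 'logP': '4.68'}

d_2 = {'Melting Point': '71 °C (whole mAb)', 'Hydrophobicity': '-0.529', 'Isoelectric Point': '7.89', 'Molecular Weight': '51234.9', 'Molecular Formula': 'C2224H3475N621O698S36'}

# One single-row DataFrame per dictionary; the keys become the columns.
df_1 = pd.DataFrame([d_1])

df_2 = pd.DataFrame([d_2])

# concat aligns on column names; fill the resulting gaps with the placeholder
# the question asks for (the original snippet used fillna(0)).
final_df = pd.concat([df_1, df_2], ignore_index=True).fillna('Not defined')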

You can build a function that accepts a list of dictionaries and returns the final DataFrame.
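A minimal sketch of such a function, assuming every input is a flat dictionary of column name to value. The names `dicts_to_frame` and `placeholder` are hypothetical and chosen only for illustration.

```python
import pandas as pd


def dicts_to_frame(records, placeholder="Not defined"):
    """Combine a list of dictionaries into one DataFrame, one row per dictionary.

    `dicts_to_frame` and `placeholder` are illustrative names, not pandas API.
    """
    # One single-row DataFrame per dictionary; the keys become the columns.
    frames = [pd.DataFrame([record]) for record in records]
    # concat aligns on column names; cells with no value come back as NaN.
    combined = pd.concat(frames, ignore_index=True)
    # Cast to object so every column can hold the string placeholder.
    return combined.astype(object).fillna(placeholder)


d_1 = {'Water Solubility': 'Insoluble ', 'Melting Point': '135-138 °C', 'logP': '4.68'}
d_2 = {'Melting Point': '71 °C (whole mAb)', 'Hydrophobicity': '-0.529',
       'Isoelectric Point': '7.89', 'Molecular Weight': '51234.9',
       'Molecular Formula': 'C2224H3475N621O698S36'}

final_df = dicts_to_frame([d_1, d_2])
print(final_df.to_string())
```

Note that `pd.concat` raises a `ValueError` when given an empty list, so a caller that may pass no dictionaries would need to handle that case separately.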

Answer 2

This should do what you need:

import pandas as pd

# Example 1
ex1 = {'Water Solubility': 'Insoluble ', 'Melting Point': '135-138 °C', 'logP': '4.68'}

# Example 2
ex2 = {'Melting Point': '71 °C (whole mAb)', 'Hydrophobicity': '-0.529', 'Isoelectric Point': '7.89', 'Molecular Weight': '51234.9', 'Molecular Formula': 'C2224H3475N621O698S36'}

# Example 3
ex3 = {'Water Solubility': '1E+006 mg/L (at 25 °C)', 'Melting Point': '204-205 °C', 'logP': '1.1', 'pKa': '6.78'}


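# Existing DataFrame that the new rows are appended to.
# Caution: the unquoted 162-165 below is integer subtraction, so the cell holds -3;
# quote it ('162-165') if a temperature range is intended.
df = pd.DataFrame({
    'Boiling Point':[162-165, 'Not defined'],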
    'Hydrophobicity':[-0.5227, -0.427],
    'Isoelectric Point':[9.02, 12.02],
    'Melting Point':[1000.0, 'Not defined'],
    'Molecular Formula':['C1970H3848N50O947S4', 'Not defined'],
    'Molecular Weight':[9.23, 7.13],
    'Radioactivity':['Practically insoluble', 'Not defined'],
    'Water Solubility':[1.23, 2.87],
    'caco2 Permeability':['63.6±55.0', 901],
    'logP':[14, 14],
    'logS':[0.618, 0.238],
    'pKa':['Not defined', 'Not defined']
})

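# Append the dictionaries as new rows; concat aligns them on column names
# and leaves NaN wherever a dictionary has no value for a column.
df = pd.concat([df, pd.DataFrame([ex1, ex2, ex3])], ignore_index=True)
# Cast to object so numeric columns can hold the string placeholder,
# then fill the gaps in the three appended rows only.
df = df.astype(object)
df.iloc[-3:] = df.iloc[-3:].fillna('Not defined')
print(df.to_string())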

Output:

  Boiling Point Hydrophobicity Isoelectric Point      Melting Point      Molecular Formula Molecular Weight          Radioactivity        Water Solubility caco2 Permeability         logP         logS          pKa
0            -3        -0.5227              9.02             1000.0    C1970H3848N50O947S4             9.23  Practically insoluble                    1.23          63.6±55.0           14        0.618  Not defined
1   Not defined         -0.427             12.02        Not defined            Not defined             7.13            Not defined                    2.87                901           14        0.238  Not defined
2   Not defined    Not defined       Not defined         135-138 °C            Not defined      Not defined            Not defined              Insoluble         Not defined         4.68  Not defined  Not defined
3   Not defined         -0.529              7.89  71 °C (whole mAb)  C2224H3475N621O698S36          51234.9            Not defined             Not defined        Not defined  Not defined  Not defined  Not defined
4   Not defined    Not defined       Not defined         204-205 °C            Not defined      Not defined            Not defined  1E+006 mg/L (at 25 °C)        Not defined          1.1  Not defined         6.78
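
When the target columns are known up front, as with the existing DataFrame above, the new rows can also be shaped to that column list before they are appended. The following is a rough sketch under that assumption; `rows_for_columns` and its parameters are hypothetical names, and dropping keys that are not in the column list is a design choice of this sketch rather than something the question requires.

```python
import pandas as pd


def rows_for_columns(records, columns, placeholder="Not defined"):
    """Build one row per dictionary, restricted to `columns`.

    `rows_for_columns` is an illustrative helper, not part of pandas.
    """
    frame = pd.DataFrame(records)
    # reindex adds any listed column that is missing (as NaN), drops keys
    # that are not listed, and puts the columns in the requested order.
    frame = frame.reindex(columns=columns)
    # Cast to object so all-NaN columns can hold the string placeholder.
    return frame.astype(object).fillna(placeholder)


target_columns = ['Melting Point', 'logP', 'logS', 'pKa']
new_rows = rows_for_columns(
    [
        {'Water Solubility': 'Insoluble ', 'Melting Point': '135-138 °C', 'logP': '4.68'},
        {'Melting Point': '204-205 °C', 'logP': '1.1', 'pKa': '6.78'},
    ],
    target_columns,
)
print(new_rows.to_string())
```

The resulting rows can then be appended with `pd.concat([df, new_rows], ignore_index=True)` without a separate fill step, because they already carry every target column.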
