How to find rows in a dataframe based on other rows and other dataframes

Question

From the API query I make, I get back a JSON response like this:

(Note: the ids in my sample data below are numeric strings, but some are alphanumeric.)

data =

{
  "state": "active",
  "team_size": 20,
  "team": {
    "id": "12345679",
    "name": "Good Guys",
    "level": 10,
    "attacks": 4,
    "destruction_percentage": 22.6,
    "members": [
      {
        "id": "1",
        "name": "John",
        "level": 12
      },
      {
        "id": "2",
        "name": "Tom",
        "level": 11,
        "attacks": [
          {
            "attacker_id": "2",
            "defender_id": "4",
            "damage": 64,
            "order": 7
          }
        ]
      }
    ]
  },
  "opponent": {
    "id": "987654321",
    "name": "Bad Guys",
    "level": 17,
    "attacks": 5,
    "damage": 20.95,
    "members": [
      {
        "id": "3",
        "name": "Betty",
        "level": 17,
        "attacks": [
          {
            "attacker_id": "3",
            "defender_id": "1",
            "damage": 70,
            "order": 1
          },
          {
            "attacker_id": "3",
            "defender_id": "7",
            "damage": 100,
            "order": 11
          }
        ],
        "opponentAttacks": 0,
        "some_useless_data": "Want to ignore, this doesn't show in every record"
      },
      {
        "id": "4",
        "name": "Fred",
        "level": 9,
        "attacks": [
          {
            "attacker_id": "4",
            "defender_id": "9",
            "damage": 70,
            "order": 4
          }
        ],
        "opponentAttacks": 0
      }
    ]
  }
}

I load it with:

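import pandas as pd

# pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
df = pd.json_normalize([data['team'], data['opponent']],
                       'members',
                       ['id', 'name'],
                       meta_prefix='team.',
                       errors='ignore')
print(df.iloc[3])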
attacks              [{'damage': 70, 'order': 4, 'defender_id': '9'...
id                                                                   4
level                                                                9
name                                                              Fred
opponentAttacks                                                      0
some_useless_data                                                  NaN
team.name                                                     Bad Guys
team.id                                                      987654321
Name: 3, dtype: object
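A self-contained version of the loading step, handy for experimenting with the answer below. It is a sketch that assumes pandas 1.0 or later (where json_normalize is exposed as pd.json_normalize) and uses a trimmed copy of the sample response with the top-level key spelled "team", matching data['team'] in the call above. record_path='members' yields one row per member, and meta=['id', 'name'] with meta_prefix='team.' copies each parent's id and name onto its members' rows without clashing with the members' own id and name columns. Lists such as attacks stay as Python lists inside a single cell.

```python
import pandas as pd

# Trimmed copy of the sample response (only the fields the loader touches).
data = {
    "team": {
        "id": "12345679",
        "name": "Good Guys",
        "members": [
            {"id": "1", "name": "John", "level": 12},
            {"id": "2", "name": "Tom", "level": 11,
             "attacks": [{"attacker_id": "2", "defender_id": "4", "damage": 64, "order": 7}]},
        ],
    },
    "opponent": {
        "id": "987654321",
        "name": "Bad Guys",
        "members": [
            {"id": "3", "name": "Betty", "level": 17,
             "attacks": [
                 {"attacker_id": "3", "defender_id": "1", "damage": 70, "order": 1},
                 {"attacker_id": "3", "defender_id": "7", "damage": 100, "order": 11},
             ],
             "opponentAttacks": 0,
             "some_useless_data": "ignore me"},
            {"id": "4", "name": "Fred", "level": 9,
             "attacks": [{"attacker_id": "4", "defender_id": "9", "damage": 70, "order": 4}],
             "opponentAttacks": 0},
        ],
    },
}

# One row per member; each parent's id/name are copied onto its members' rows
# as "team.id"/"team.name" so they do not collide with the member's own id/name.
df = pd.json_normalize(
    [data["team"], data["opponent"]],
    record_path="members",
    meta=["id", "name"],
    meta_prefix="team.",
    errors="ignore",
)

print(df[["id", "name", "team.name"]])
```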

My question essentially has three parts.

1. How do I get a row like the one above by looking up a member's tag (the id field)? I tried:

member = df[df['id']=="1"].iloc[0]
#Now this works, but am I correctly doing this?
#It just feels weird is all.

2. Only attacks are recorded, not defenses. How would I retrieve a member's defenses, given that each attack records a defender_id? I tried:

df.where(df['tag']==df['attacks'].str.get('defender_id'), df['attacks'], axis=0)
#This is totally not working.. Where am I going wrong?

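3. Because I'm pulling fresh data from the API, I need to compare it with the old data stored in my database to see whether there are any new attacks. I can then loop over the new attacks and show the attack details to the user.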

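Honestly, I can't figure this one out. I've also looked into a couple of related questions that feel close to what I need, but I still can't wrap my head around the concept. Basically, my logic is as follows: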

def get_new_attacks(old_data, new_data):
    '''params
         old_data: DataFrame loaded from the JSON stored in the database
         new_data: DataFrame loaded from the JSON API response,
                   hopefully containing new attacks
       returns:
         iterator over the new attacks
    '''

    # TODO: calculate a DataFrame listing the new attacks (the part I'm stuck on)
    new_attacks = ...  # placeholder
    return new_attacks.iterrows()

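I know the function above shows almost no effort beyond the docstring (it's mostly there to show the input/output I want), but believe me, this is the part I've been racking my brain over the most. I've been trying to merge all the attacks and then call reset_index(), but that raises an error because attacks is a list. The map() function in the second question I linked above also confuses me.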

Answer 1

Taking your questions in order (code below):

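1. It looks like id is a unique index for your data, so you can use df.set_index('id'), which lets you access a player's data by ID via df.loc['1'].
2. As I understand your data, the dictionaries listed in each attacks entry are self-contained: they don't need the corresponding player row, because attacker_id and defender_id seem sufficient to identify each one. So rather than working with rows that contain lists, I suggest moving that data into its own DataFrame, where it is easy to access.
3. Once the attacks are stored in their own DataFrame, you can simply compare indices to filter out the old data.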
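As an alternative to combining the per-member lists with pd.concat, the three points can be sketched with Series.explode (pandas 0.25 or later), which turns each list of attack dicts into one row per dict and so sidesteps the reset_index() error mentioned in the question. This runs on a small hand-built stand-in for the member-level frame; members, fred, attacks, johns_defences and the "pretend stored" rows are illustrative names and data, not part of the original code. set_index(..., verify_integrity=True) additionally raises if an id turns out not to be unique.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the member-level frame produced by json_normalize:
# "attacks" holds a list of dicts, or NaN for a member who has not attacked.
members = pd.DataFrame({
    "id": ["1", "2", "3", "4"],
    "name": ["John", "Tom", "Betty", "Fred"],
    "attacks": [
        np.nan,
        [{"attacker_id": "2", "defender_id": "4", "damage": 64, "order": 7}],
        [{"attacker_id": "3", "defender_id": "1", "damage": 70, "order": 1},
         {"attacker_id": "3", "defender_id": "7", "damage": 100, "order": 11}],
        [{"attacker_id": "4", "defender_id": "9", "damage": 70, "order": 4}],
    ],
})

# Point 1: look members up by id; verify_integrity raises on duplicate ids.
members = members.set_index("id", verify_integrity=True)
fred = members.loc["4"]  # a Series with one field per column

# Point 2: one row per attack. explode() emits one dict per row (NaN stays
# NaN) and dropna() discards members without attacks.
attack_dicts = members["attacks"].explode().dropna()
attacks = pd.DataFrame(attack_dicts.tolist()).set_index("order")

# A member's defences are the attacks whose defender_id is that member's id.
johns_defences = attacks[attacks["defender_id"] == "1"]

# Point 3: with "order" as the index, unseen attacks are those whose index is
# missing from the stored frame (here we pretend orders 7 and 1 were stored).
old_attacks = attacks.loc[[7, 1]]
new_attacks = attacks[~attacks.index.isin(old_attacks.index)]
print(new_attacks)
```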

Here is some example code illustrating each point:

import pandas as pd

# df is the DataFrame built with json_normalize in the question.

# Question 1.
df.set_index('id', inplace=True)
print(df.loc['1'])  # For example player id 1.

# Question 2 & 3.
attacks = pd.concat(map(
    lambda x: pd.DataFrame(x).set_index('order'),  # Is 'order' the right index?
    df['attacks'].dropna()  # each entry is a list of attack dicts
))

# Question 2.
print(attacks[attacks['defender_id'] == '1'])  # For example defender_id 1.

# Question 3.
old_attacks = attacks.iloc[:2]  # For example.
new_attacks = attacks[~attacks.index.isin(old_attacks.index)]
print(new_attacks)
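The get_new_attacks stub from the question could be filled in along the lines of point 3. Because the answer itself asks whether 'order' is the right index, this sketch avoids relying on it alone: it assumes an attack is identified by (attacker_id, defender_id, order), flattens both snapshots, and keeps only the new rows via an anti-join with DataFrame.merge(..., indicator=True). KEY and flatten_attacks are hypothetical names, and both inputs are assumed to be member-level frames with an 'attacks' column like the one json_normalize produces in the question.

```python
import pandas as pd

# Assumed natural key of an attack; adjust if "order" alone is known to be unique.
KEY = ["attacker_id", "defender_id", "order"]


def flatten_attacks(members: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per attack from a member-level frame whose
    'attacks' column holds lists of dicts (NaN where a member has not attacked)."""
    attack_dicts = members["attacks"].explode().dropna()
    return pd.DataFrame(attack_dicts.tolist(), columns=KEY + ["damage"])


def get_new_attacks(old_data: pd.DataFrame, new_data: pd.DataFrame):
    """Return an iterator of (index, row) pairs for the attacks present in
    new_data but not in old_data (an anti-join on KEY)."""
    old_keys = flatten_attacks(old_data)[KEY].drop_duplicates()
    new = flatten_attacks(new_data)
    merged = new.merge(old_keys, on=KEY, how="left", indicator=True)
    only_new = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
    return only_new.iterrows()


# Usage: Betty's first attack was already stored; her second and Fred's are new.
old_members = pd.DataFrame({
    "id": ["3", "4"],
    "attacks": [
        [{"attacker_id": "3", "defender_id": "1", "damage": 70, "order": 1}],
        float("nan"),
    ],
})
new_members = pd.DataFrame({
    "id": ["3", "4"],
    "attacks": [
        [{"attacker_id": "3", "defender_id": "1", "damage": 70, "order": 1},
         {"attacker_id": "3", "defender_id": "7", "damage": 100, "order": 11}],
        [{"attacker_id": "4", "defender_id": "9", "damage": 70, "order": 4}],
    ],
})

for _, attack in get_new_attacks(old_members, new_members):
    print(attack["attacker_id"], "->", attack["defender_id"], attack["damage"])
```

The left join keeps every attack from the new snapshot and marks those with no match among the stored keys as left_only; calling drop_duplicates on the stored keys keeps a repeated stored attack from duplicating rows in the result.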

Tags: python, pandas, python-3.7