Trying to select a JSONL data column nested in another column with .loc, but got KeyError even though the key exists
This is my JSONL data structure:
"content": "Not yall gassing up a gay boy with no rhythm", "place": {"_type": "snscrape.modules.twitter.Place", "fullName": "Manhattan, NY", "name": "Manhattan", "type": "city", "country": "United States", "countryCode": "US"}
I tried to use this code to select the country code from the place column:
country_df = test_df.loc[test_df['place'].notnull(), ['content', 'place']]
countrycode_df = country_df["place"].loc["countryCode"]
But it gives me this error:
KeyError: 'countryCode'
How can I solve this?
I tried this method, but it doesn't fit my case.
You can access it via str:
country_df['place'].str['countryCode']
Output:
0 US
Name: place, dtype: object
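The KeyError happens because .loc on a Series looks up row index labels (0, 1, ...), not keys inside each dict, so 'countryCode' is never found. Here is a minimal sketch of the difference, assuming a small hypothetical test_df (one row with a place dict, one without) rather than your real data:

```python
import pandas as pd

# Hypothetical stand-in for test_df: one row with a place dict, one without.
test_df = pd.DataFrame({
    "content": ["Not yall gassing up a gay boy with no rhythm", "no location"],
    "place": [{"name": "Manhattan", "countryCode": "US"}, None],
})

country_df = test_df.loc[test_df["place"].notnull(), ["content", "place"]]

# .loc searches the index labels, so a dict key raises KeyError.
try:
    country_df["place"].loc["countryCode"]
    loc_failed = False
except KeyError:
    loc_failed = True

# .str[...] indexes into each element, which also works when elements are dicts.
codes_str = country_df["place"].str["countryCode"]

# Equivalent explicit form; dict.get returns None if the key is missing.
codes_apply = country_df["place"].apply(lambda p: p.get("countryCode"))

print(loc_failed)
print(codes_str.tolist())
print(codes_apply.tolist())
```

The apply version is a bit more verbose, but it makes it obvious that each cell is being treated as a dict.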
Because "place" is basically a dict (a nested dictionary), you can access it the same way you access the higher-level dict:
country = {"content": "Not yall gassing up a gay boy with no rhythm", "place": {"_type": "snscrape.modules.twitter.Place", "fullName": "Manhattan, NY", "name": "Manhattan", "type": "city", "country": "United States", "countryCode": "US"}}
country["place"]["countryCode"]
Output:
'US'
However, using pandas json_normalize() might suit your purpose better:
import pandas as pd
country_df = pd.json_normalize(data=country)
print(country_df)
Output:
content | place._type | place.fullName | place.name | place.type | place.country | place.countryCode |
---|---|---|---|---|---|---|
Not yall gassing up a gay boy with no rhythm | snscrape.modules.twitter.Place | Manhattan, NY | Manhattan | city | United States | US |
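If the data is already in a DataFrame like the one in the question, the same idea can be applied to every row at once. This is only a sketch under assumptions: test_df and its values here are hypothetical, and rows without a place are filtered out first, as in the question, so that every remaining place is a dict.

```python
import pandas as pd

# Hypothetical frame shaped like the one in the question.
test_df = pd.DataFrame({
    "content": ["first tweet", "second tweet", "no location"],
    "place": [
        {"name": "Manhattan", "countryCode": "US"},
        {"name": "Toronto", "countryCode": "CA"},
        None,
    ],
})

# Keep only rows that have a place.
country_df = test_df.loc[test_df["place"].notnull(), ["content", "place"]]

# Turn the rows back into a list of dicts so json_normalize can flatten "place".
flat_df = pd.json_normalize(country_df.to_dict(orient="records"))

# Nested keys become dotted column names such as "place.countryCode".
print(flat_df.columns.tolist())
print(flat_df["place.countryCode"].tolist())
```

After flattening, the country code is an ordinary column, so regular column selection works and no .str or .loc tricks are needed.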