Spotify API: how to extract JSON information from different levels into one DataFrame
How can I extract the artist name, popularity, and uri from this JSON object into a DataFrame?
{
"tracks" : {
"href" : "https://api.spotify.com/v1/search?query=karma+police&offset=0&limit=20&type=track&market=BR",
"items" : [ {
"album" : {
"album_type" : "album",
"available_markets" : [ "AD", "AR", "AT", "AU", "BE", "BG", "BO", "BR", "CA", "CH", "CL", "CO", "CR", "CY", "CZ", "DE", "DK", "DO", "EC", "EE", "ES", "FI", "FR", "GB", "GR", "GT", "HK", "HN", "HU", "ID", "IE", "IS", "IT", "JP", "LI", "LT", "LU", "LV", "MC", "MT", "MX", "MY", "NI", "NL", "NO", "NZ", "PA", "PE", "PH", "PL", "PT", "PY", "SE", "SG", "SK", "SV", "TR", "TW", "US", "UY" ],
"external_urls" : {
"spotify" : "https://open.spotify.com/album/7dxKtc08dYeRVHt3p9CZJn"
},
"href" : "https://api.spotify.com/v1/albums/7dxKtc08dYeRVHt3p9CZJn",
"id" : "7dxKtc08dYeRVHt3p9CZJn",
"images" : [ {
"height" : 640,
"url" : "https://i.scdn.co/image/f89c1ecdd0cc5a23d5ad7303d4ae231d197dde98",
"width" : 640
}, {
"height" : 300,
"url" : "https://i.scdn.co/image/1b898f0b8e3ce499d0fc629a1918c144d982e475",
"width" : 300
}, {
"height" : 64,
"url" : "https://i.scdn.co/image/faf295a70a6531826a8c25d33aad7d2cd9c75c7a",
"width" : 64
} ],
"name" : "OK Computer",
"type" : "album",
"uri" : "spotify:album:7dxKtc08dYeRVHt3p9CZJn"
},
"artists" : [ {
"external_urls" : {
"spotify" : "https://open.spotify.com/artist/4Z8W4fKeB5YxbusRsdQVPb"
},
"href" : "https://api.spotify.com/v1/artists/4Z8W4fKeB5YxbusRsdQVPb",
"id" : "4Z8W4fKeB5YxbusRsdQVPb",
"name" : "Radiohead",
"type" : "artist",
"uri" : "spotify:artist:4Z8W4fKeB5YxbusRsdQVPb"
} ]
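I can access information at the same level, but I cannot get at the sub-levels of the JSON object.
If I understand correctly, you could try dropping the list structure and editing the data like this: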
data = {
"tracks": {
"href": "https://api.spotify.com/v1/search?query=karma+police&offset=0&limit=20&type=track&market=BR",
"items": {
"album": {
"album_type": "album",
"available_markets": ["AD", "AR", "AT", "AU", "BE", "BG", "BO", "BR", "CA", "CH", "CL", "CO", "CR",
"CY", "CZ", "DE", "DK", "DO", "EC", "EE", "ES", "FI", "FR", "GB", "GR", "GT",
"HK", "HN", "HU", "ID", "IE", "IS", "IT", "JP", "LI", "LT", "LU", "LV", "MC",
"MT", "MX", "MY", "NI", "NL", "NO", "NZ", "PA", "PE", "PH", "PL", "PT", "PY",
"SE", "SG", "SK", "SV", "TR", "TW", "US", "UY"],
"external_urls": {
"spotify": "https://open.spotify.com/album/7dxKtc08dYeRVHt3p9CZJn"
},
"href": "https://api.spotify.com/v1/albums/7dxKtc08dYeRVHt3p9CZJn",
"id": "7dxKtc08dYeRVHt3p9CZJn",
"images": [{
"height": 640,
"url": "https://i.scdn.co/image/f89c1ecdd0cc5a23d5ad7303d4ae231d197dde98",
"width": 640
}, {
"height": 300,
"url": "https://i.scdn.co/image/1b898f0b8e3ce499d0fc629a1918c144d982e475",
"width": 300
}, {
"height": 64,
"url": "https://i.scdn.co/image/faf295a70a6531826a8c25d33aad7d2cd9c75c7a",
"width": 64
}],
"name": "OK Computer",
"type": "album",
"uri": "spotify:album:7dxKtc08dYeRVHt3p9CZJn"
},
"artists": {
"external_urls": {
"spotify": "https://open.spotify.com/artist/4Z8W4fKeB5YxbusRsdQVPb"
},
"href": "https://api.spotify.com/v1/artists/4Z8W4fKeB5YxbusRsdQVPb",
"id": "4Z8W4fKeB5YxbusRsdQVPb",
"name": "Radiohead",
"type": "artist",
"uri": "spotify:artist:4Z8W4fKeB5YxbusRsdQVPb"
}}}}
Then this example will give you the "uri":
print(data["tracks"]["items"]["artists"]["uri"])
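If you cannot change the JSON, index into it like this instead, because it contains lists:
print(data["tracks"]["items"][0]["artists"][0]["uri"])
If there is more than one item, you can loop over them to get all the data.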
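The loop mentioned above might be sketched as follows. This is only a minimal sketch: it assumes the search-response shape from the question, where data["tracks"]["items"] is a list of tracks and each track has an "artists" list. The small data dict is a cut-down stand-in built from sample values that appear elsewhere in this thread, and the column names are just illustrative choices.

```python
import pandas as pd

# Cut-down stand-in for a search response: only the keys used below are kept.
data = {
    "tracks": {
        "items": [
            {
                "name": "Cut To The Feeling",
                "popularity": 63,
                "artists": [
                    {"name": "Carly Rae Jepsen",
                     "uri": "spotify:artist:6sFIWsNpZYqfjUpaCgueju"}
                ],
            },
            {
                "name": "Call Me Maybe",
                "popularity": 74,
                "artists": [
                    {"name": "Carly Rae Jepsen",
                     "uri": "spotify:artist:6sFIWsNpZYqfjUpaCgueju"}
                ],
            },
        ]
    }
}

rows = []
for track in data["tracks"]["items"]:    # every track in the result list
    for artist in track["artists"]:      # every artist credited on that track
        rows.append({
            "artist_name": artist["name"],
            "popularity": track["popularity"],
            "artist_uri": artist["uri"],
        })

# One row per (track, artist) pair
df = pd.DataFrame(rows)
print(df)
```

Building a list of plain dicts first and creating the DataFrame once at the end tends to be simpler than appending to a DataFrame inside the loop.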
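Nested json can be confusing and hard to access at times, but pandas can handle it in just a few steps.
Based on what I think you are working with, I decided to use the Spotify tracks API here. A sample of the data is at the bottom of this post.
TL;DR: use json_normalize():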
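import pandas as pd
# get access to tracks and put it in a nice variable
tracks = data['tracks']
# use json_normalize to flatten it into a nice df
df = pd.json_normalize(tracks)
# rename columns
df = df.rename(columns={k: f'track_{k}' for k in df.columns[:14]})
# normalize the 'artists' column in the df that contains nested json
df_artists = pd.concat([pd.json_normalize(y) for x in df.track_artists for y in x], ignore_index=True)
# rename columns
df_artists = df_artists.rename(columns={k: f'artist_{k}' for k in df_artists.columns})
# concatenate the original df and the artists df
flat_df = pd.concat([df, df_artists], axis=1)
# you can remove the original 'artists' / 'track_artists' field as it is no longer
# necessary, the values have been flattened out into their own columns.
flat_df = flat_df.drop(columns=['track_artists'])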
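After looking at your data structure and what you want to accomplish, I think this is a job for json_normalize()!
Using pandas' json_normalize() will flatten your nested json a fair bit and get you most of the way there.
The tricky part, however, is that for each track the value under the 'artists' key is a list containing a dictionary of results, rather than a plain value or another dictionary, which json_normalize() handles easily.
Note that the artist dictionaries (for example tracks[0]['artists'][0]) hold key:value pairs whose keys have the same names as keys in the 'album' dictionary.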
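To make the point about lists concrete, here is a small sketch of how json_normalize() treats a nested dict versus a list of dicts. The track record is a cut-down version of one entry from the sample data at the bottom of this post, keeping only a few keys for illustration.

```python
import pandas as pd

# One cut-down track record: 'album' is a nested dict, 'artists' is a list of dicts.
track = {
    "name": "Call Me Maybe",
    "popularity": 74,
    "album": {"name": "Kiss", "uri": "spotify:album:6SSSF9Y6MiPdQoxqBptrR2"},
    "artists": [
        {"name": "Carly Rae Jepsen",
         "uri": "spotify:artist:6sFIWsNpZYqfjUpaCgueju"}
    ],
}

df = pd.json_normalize([track])

# Nested dicts are expanded into dotted column names such as 'album.name',
# while the list under 'artists' is stored unchanged in a single cell.
print(df.columns.tolist())
print(type(df.loc[0, "artists"]))
```

This is why the 'artists' column needs a second pass: its cells still hold Python lists rather than flat values.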
It looks like everything you want is inside tracks, so let's create a variable for easier access:
import pandas as pd
tracks = data['tracks']
json_normalize() to the rescue:
df = pd.json_normalize(tracks)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 28 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 artists 3 non-null object
1 available_markets 3 non-null object
2 disc_number 3 non-null int64
3 duration_ms 3 non-null int64
4 explicit 3 non-null bool
5 href 3 non-null object
6 id 3 non-null object
7 is_local 3 non-null bool
8 name 3 non-null object
9 popularity 3 non-null int64
10 preview_url 3 non-null object
11 track_number 3 non-null int64
12 type 3 non-null object
13 uri 3 non-null object
14 album.album_type 3 non-null object
15 album.artists 3 non-null object
16 album.available_markets 3 non-null object
17 album.external_urls.spotify 3 non-null object
18 album.href 3 non-null object
19 album.id 3 non-null object
20 album.images 3 non-null object
21 album.name 3 non-null object
22 album.release_date 3 non-null object
23 album.release_date_precision 3 non-null object
24 album.type 3 non-null object
25 album.uri 3 non-null object
26 external_ids.isrc 3 non-null object
27 external_urls.spotify 3 non-null object
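If you check your dataframe at this point, you will see that most of the values are laid out flat in their own columns, but since we also need the artist information, we have to go one step further.
# the first 14 columns are track-level keys whose names also occur in the
# album and artist dicts, so prefix them with 'track_'
keys = {k: f'track_{k}' for k in df.columns[:14]}
# rename columns
df = df.rename(columns=keys)
Those columns of your df are now prefixed with 'track_', so you know they belong to the main track dictionary.
The 'artists' values are still not flat, so let's flatten them. This is a special case, because each dictionary sits inside a list.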
# normalize each artist dict and concatenate the results into one dataframe
# the nested comprehension walks each track's list to reach the dicts for json_normalize()
# note: this lines up with df row by row only when each track has exactly one artist
df_artists = pd.concat([pd.json_normalize(y) for x in df.track_artists for y in x], ignore_index=True)
# make a dict of new column names prepended with 'artist_' so we know they came from the nested 'artists' dicts
kys = {k: f'artist_{k}' for k in df_artists.columns}
# rename the columns
df_artists = df_artists.rename(columns=kys)
df_artists.info()
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 artist_href 3 non-null object
1 artist_id 3 non-null object
2 artist_name 3 non-null object
3 artist_type 3 non-null object
4 artist_uri 3 non-null object
5 artist_external_urls.spotify 3 non-null object
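We renamed the columns so that we know whether they relate to the track or to the artist, and so that there are no duplicate column names, which would make the data harder to look up and sort later.
Now we put everything together in one dataframe: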
flat_df = pd.concat([df, df_artists], axis=1)
flat_df.info()
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 track_artists 3 non-null object
1 track_available_markets 3 non-null object
2 track_disc_number 3 non-null int64
3 track_duration_ms 3 non-null int64
4 track_explicit 3 non-null bool
5 track_href 3 non-null object
6 track_id 3 non-null object
7 track_is_local 3 non-null bool
8 track_name 3 non-null object
9 track_popularity 3 non-null int64
10 track_preview_url 3 non-null object
11 track_track_number 3 non-null int64
12 track_type 3 non-null object
13 track_uri 3 non-null object
14 album.album_type 3 non-null object
15 album.artists 3 non-null object
16 album.available_markets 3 non-null object
17 album.external_urls.spotify 3 non-null object
18 album.href 3 non-null object
19 album.id 3 non-null object
20 album.images 3 non-null object
21 album.name 3 non-null object
22 album.release_date 3 non-null object
23 album.release_date_precision 3 non-null object
24 album.type 3 non-null object
25 album.uri 3 non-null object
26 external_ids.isrc 3 non-null object
27 external_urls.spotify 3 non-null object
28 artist_href 3 non-null object
29 artist_id 3 non-null object
30 artist_name 3 non-null object
31 artist_type 3 non-null object
32 artist_uri 3 non-null object
33 artist_external_urls.spotify 3 non-null object
So if all you want in the end is a dataframe with only the 3 columns you mentioned, it would look like this:
final_df = flat_df[['artist_name', 'track_popularity', 'artist_uri']]
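As a possible alternative to the two-step flatten and concat, json_normalize() also accepts record_path and meta arguments, which produce one row per artist and repeat the chosen track-level fields on each row. The sketch below is not the approach used above; it assumes a cut-down tracks list with only the keys needed, and the second artist on the first track is a made-up placeholder added purely to show what happens when a track has more than one artist.

```python
import pandas as pd

# Cut-down tracks list; "Hypothetical Guest" is a placeholder, not real data.
tracks = [
    {
        "name": "Cut To The Feeling",
        "popularity": 63,
        "uri": "spotify:track:11dFghVXANMlKmJXsNCbNl",
        "artists": [
            {"name": "Carly Rae Jepsen",
             "uri": "spotify:artist:6sFIWsNpZYqfjUpaCgueju"},
            {"name": "Hypothetical Guest",
             "uri": "spotify:artist:EXAMPLE"},
        ],
    },
    {
        "name": "Call Me Maybe",
        "popularity": 74,
        "uri": "spotify:track:20I6sIOMTCkB6w7ryavxtO",
        "artists": [
            {"name": "Carly Rae Jepsen",
             "uri": "spotify:artist:6sFIWsNpZYqfjUpaCgueju"},
        ],
    },
]

artists_df = pd.json_normalize(
    tracks,
    record_path="artists",        # one output row per entry in each 'artists' list
    meta=["popularity", "uri"],   # track-level fields repeated on every artist row
    record_prefix="artist_",      # artist keys become artist_name, artist_uri
    meta_prefix="track_",         # track keys become track_popularity, track_uri
)

final_df = artists_df[["artist_name", "track_popularity", "artist_uri"]]
print(final_df)
```

The prefixes avoid the clash between the track's uri and the artist's uri. Because each artist row carries its own copy of the track fields, rows stay aligned even when a track lists several artists. The meta columns may come back with object dtype, so a cast such as astype(int) on track_popularity may be wanted before doing arithmetic.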
Here is the data object:
data = {
"tracks": [
{
"album": {
"album_type": "single",
"artists": [
{
"external_urls": {
"spotify": "https://open.spotify.com/artist/6sFIWsNpZYqfjUpaCgueju"
},
"href": "https://api.spotify.com/v1/artists/6sFIWsNpZYqfjUpaCgueju",
"id": "6sFIWsNpZYqfjUpaCgueju",
"name": "Carly Rae Jepsen",
"type": "artist",
"uri": "spotify:artist:6sFIWsNpZYqfjUpaCgueju"
}
],
"available_markets": [
"AD",
"AR",
"AT",
"AU",
"BE",
"BG",
"BO",
"BR",
"CA",
"CH",
"CL",
"CO",
"CR",
"CY",
"CZ",
"DE",
"DK",
"DO",
"EC",
"EE",
"ES",
"FI",
"FR",
"GB",
"GR",
"GT",
"HK",
"HN",
"HU",
"ID",
"IE",
"IL",
"IS",
"IT",
"JP",
"LI",
"LT",
"LU",
"LV",
"MC",
"MT",
"MX",
"MY",
"NI",
"NL",
"NO",
"NZ",
"PA",
"PE",
"PH",
"PL",
"PT",
"PY",
"RO",
"SE",
"SG",
"SK",
"SV",
"TH",
"TR",
"TW",
"US",
"UY",
"VN",
"ZA"
],
"external_urls": {
"spotify": "https://open.spotify.com/album/0tGPJ0bkWOUmH7MEOR77qc"
},
"href": "https://api.spotify.com/v1/albums/0tGPJ0bkWOUmH7MEOR77qc",
"id": "0tGPJ0bkWOUmH7MEOR77qc",
"images": [
{
"height": 640,
"url": "https://i.scdn.co/image/966ade7a8c43b72faa53822b74a899c675aaafee",
"width": 640
},
{
"height": 300,
"url": "https://i.scdn.co/image/107819f5dc557d5d0a4b216781c6ec1b2f3c5ab2",
"width": 300
},
{
"height": 64,
"url": "https://i.scdn.co/image/5a73a056d0af707b4119a883d87285feda543fbb",
"width": 64
}
],
"name": "Cut To The Feeling",
"release_date": "2017-05-26",
"release_date_precision": "day",
"type": "album",
"uri": "spotify:album:0tGPJ0bkWOUmH7MEOR77qc"
},
"artists": [
{
"external_urls": {
"spotify": "https://open.spotify.com/artist/6sFIWsNpZYqfjUpaCgueju"
},
"href": "https://api.spotify.com/v1/artists/6sFIWsNpZYqfjUpaCgueju",
"id": "6sFIWsNpZYqfjUpaCgueju",
"name": "Carly Rae Jepsen",
"type": "artist",
"uri": "spotify:artist:6sFIWsNpZYqfjUpaCgueju"
}
],
"available_markets": [
"AD",
"AR",
"AT",
"AU",
"BE",
"BG",
"BO",
"BR",
"CA",
"CH",
"CL",
"CO",
"CR",
"CY",
"CZ",
"DE",
"DK",
"DO",
"EC",
"EE",
"ES",
"FI",
"FR",
"GB",
"GR",
"GT",
"HK",
"HN",
"HU",
"ID",
"IE",
"IL",
"IS",
"IT",
"JP",
"LI",
"LT",
"LU",
"LV",
"MC",
"MT",
"MX",
"MY",
"NI",
"NL",
"NO",
"NZ",
"PA",
"PE",
"PH",
"PL",
"PT",
"PY",
"RO",
"SE",
"SG",
"SK",
"SV",
"TH",
"TR",
"TW",
"US",
"UY",
"VN",
"ZA"
],
"disc_number": 1,
"duration_ms": 207959,
"explicit": False,
"external_ids": {
"isrc": "USUM71703861"
},
"external_urls": {
"spotify": "https://open.spotify.com/track/11dFghVXANMlKmJXsNCbNl"
},
"href": "https://api.spotify.com/v1/tracks/11dFghVXANMlKmJXsNCbNl",
"id": "11dFghVXANMlKmJXsNCbNl",
"is_local": False,
"name": "Cut To The Feeling",
"popularity": 63,
"preview_url": "https://p.scdn.co/mp3-preview/3eb16018c2a700240e9dfb8817b6f2d041f15eb1?cid=774b29d4f13844c495f206cafdad9c86",
"track_number": 1,
"type": "track",
"uri": "spotify:track:11dFghVXANMlKmJXsNCbNl"
},
{
"album": {
"album_type": "album",
"artists": [
{
"external_urls": {
"spotify": "https://open.spotify.com/artist/6sFIWsNpZYqfjUpaCgueju"
},
"href": "https://api.spotify.com/v1/artists/6sFIWsNpZYqfjUpaCgueju",
"id": "6sFIWsNpZYqfjUpaCgueju",
"name": "Carly Rae Jepsen",
"type": "artist",
"uri": "spotify:artist:6sFIWsNpZYqfjUpaCgueju"
}
],
"available_markets": [
"AD",
"AR",
"AT",
"AU",
"BE",
"BG",
"BO",
"BR",
"CA",
"CH",
"CL",
"CO",
"CR",
"CY",
"CZ",
"DE",
"DK",
"DO",
"EC",
"EE",
"ES",
"FI",
"FR",
"GB",
"GR",
"GT",
"HK",
"HN",
"HU",
"ID",
"IE",
"IL",
"IS",
"IT",
"JP",
"LI",
"LT",
"LU",
"LV",
"MC",
"MT",
"MX",
"MY",
"NI",
"NL",
"NO",
"NZ",
"PA",
"PE",
"PH",
"PL",
"PT",
"PY",
"RO",
"SE",
"SG",
"SK",
"SV",
"TH",
"TR",
"TW",
"US",
"UY",
"VN",
"ZA"
],
"external_urls": {
"spotify": "https://open.spotify.com/album/6SSSF9Y6MiPdQoxqBptrR2"
},
"href": "https://api.spotify.com/v1/albums/6SSSF9Y6MiPdQoxqBptrR2",
"id": "6SSSF9Y6MiPdQoxqBptrR2",
"images": [
{
"height": 640,
"url": "https://i.scdn.co/image/2fb20bf4c1fb29b503bfc21516ff4b1a334b6372",
"width": 640
},
{
"height": 300,
"url": "https://i.scdn.co/image/a7b076ed5aa0746a21bc71ab7d2b6ed80dd3ebfe",
"width": 300
},
{
"height": 64,
"url": "https://i.scdn.co/image/b1d4c7643cf17c06b967b50623d7d93725b31de5",
"width": 64
}
],
"name": "Kiss",
"release_date": "2012-01-01",
"release_date_precision": "day",
"type": "album",
"uri": "spotify:album:6SSSF9Y6MiPdQoxqBptrR2"
},
"artists": [
{
"external_urls": {
"spotify": "https://open.spotify.com/artist/6sFIWsNpZYqfjUpaCgueju"
},
"href": "https://api.spotify.com/v1/artists/6sFIWsNpZYqfjUpaCgueju",
"id": "6sFIWsNpZYqfjUpaCgueju",
"name": "Carly Rae Jepsen",
"type": "artist",
"uri": "spotify:artist:6sFIWsNpZYqfjUpaCgueju"
}
],
"available_markets": [
"AD",
"AR",
"AT",
"AU",
"BE",
"BG",
"BO",
"BR",
"CA",
"CH",
"CL",
"CO",
"CR",
"CY",
"CZ",
"DE",
"DK",
"DO",
"EC",
"EE",
"ES",
"FI",
"FR",
"GB",
"GR",
"GT",
"HK",
"HN",
"HU",
"ID",
"IE",
"IL",
"IS",
"IT",
"JP",
"LI",
"LT",
"LU",
"LV",
"MC",
"MT",
"MX",
"MY",
"NI",
"NL",
"NO",
"NZ",
"PA",
"PE",
"PH",
"PL",
"PT",
"PY",
"RO",
"SE",
"SG",
"SK",
"SV",
"TH",
"TR",
"TW",
"US",
"UY",
"VN",
"ZA"
],
"disc_number": 1,
"duration_ms": 193400,
"explicit": False,
"external_ids": {
"isrc": "CAB391100615"
},
"external_urls": {
"spotify": "https://open.spotify.com/track/20I6sIOMTCkB6w7ryavxtO"
},
"href": "https://api.spotify.com/v1/tracks/20I6sIOMTCkB6w7ryavxtO",
"id": "20I6sIOMTCkB6w7ryavxtO",
"is_local": False,
"name": "Call Me Maybe",
"popularity": 74,
"preview_url": "https://p.scdn.co/mp3-preview/335bede49342352cddd53cc83af582e2240303bb?cid=774b29d4f13844c495f206cafdad9c86",
"track_number": 3,
"type": "track",
"uri": "spotify:track:20I6sIOMTCkB6w7ryavxtO"
},
{
"album": {
"album_type": "album",
"artists": [
{
"external_urls": {
"spotify": "https://open.spotify.com/artist/6sFIWsNpZYqfjUpaCgueju"
},
"href": "https://api.spotify.com/v1/artists/6sFIWsNpZYqfjUpaCgueju",
"id": "6sFIWsNpZYqfjUpaCgueju",
"name": "Carly Rae Jepsen",
"type": "artist",
"uri": "spotify:artist:6sFIWsNpZYqfjUpaCgueju"
}
],
"available_markets": [
"AD",
"AR",
"AT",
"AU",
"BE",
"BG",
"BO",
"BR",
"CA",
"CH",
"CL",
"CO",
"CR",
"CY",
"CZ",
"DE",
"DK",
"DO",
"EC",
"EE",
"ES",
"FI",
"FR",
"GB",
"GR",
"GT",
"HK",
"HN",
"HU",
"ID",
"IE",
"IL",
"IS",
"IT",
"JP",
"LI",
"LT",
"LU",
"LV",
"MC",
"MT",
"MX",
"MY",
"NI",
"NL",
"NO",
"NZ",
"PA",
"PE",
"PH",
"PL",
"PT",
"PY",
"RO",
"SE",
"SG",
"SK",
"SV",
"TH",
"TR",
"TW",
"US",
"UY",
"VN",
"ZA"
],
"external_urls": {
"spotify": "https://open.spotify.com/album/1DFixLWuPkv3KT3TnV35m3"
},
"href": "https://api.spotify.com/v1/albums/1DFixLWuPkv3KT3TnV35m3",
"id": "1DFixLWuPkv3KT3TnV35m3",
"images": [
{
"height": 640,
"url": "https://i.scdn.co/image/3f65c5400c7f24541bfd48e60f646e6af4d6c666",
"width": 640
},
{
"height": 300,
"url": "https://i.scdn.co/image/ff347680d9e62ccc144926377d4769b02a1024dc",
"width": 300
},
{
"height": 64,
"url": "https://i.scdn.co/image/c836e14a8ceca89e18012cab295f58ceeba72594",
"width": 64
}
],
"name": "Emotion (Deluxe)",
"release_date": "2015-09-18",
"release_date_precision": "day",
"type": "album",
"uri": "spotify:album:1DFixLWuPkv3KT3TnV35m3"
},
"artists": [
{
"external_urls": {
"spotify": "https://open.spotify.com/artist/6sFIWsNpZYqfjUpaCgueju"
},
"href": "https://api.spotify.com/v1/artists/6sFIWsNpZYqfjUpaCgueju",
"id": "6sFIWsNpZYqfjUpaCgueju",
"name": "Carly Rae Jepsen",
"type": "artist",
"uri": "spotify:artist:6sFIWsNpZYqfjUpaCgueju"
}
],
"available_markets": [
"AD",
"AR",
"AT",
"AU",
"BE",
"BG",
"BO",
"BR",
"CA",
"CH",
"CL",
"CO",
"CR",
"CY",
"CZ",
"DE",
"DK",
"DO",
"EC",
"EE",
"ES",
"FI",
"FR",
"GB",
"GR",
"GT",
"HK",
"HN",
"HU",
"ID",
"IE",
"IL",
"IS",
"IT",
"JP",
"LI",
"LT",
"LU",
"LV",
"MC",
"MT",
"MX",
"MY",
"NI",
"NL",
"NO",
"NZ",
"PA",
"PE",
"PH",
"PL",
"PT",
"PY",
"RO",
"SE",
"SG",
"SK",
"SV",
"TH",
"TR",
"TW",
"US",
"UY",
"VN",
"ZA"
],
"disc_number": 1,
"duration_ms": 251319,
"explicit": False,
"external_ids": {
"isrc": "USUM71507009"
},
"external_urls": {
"spotify": "https://open.spotify.com/track/7xGfFoTpQ2E7fRF5lN10tr"
},
"href": "https://api.spotify.com/v1/tracks/7xGfFoTpQ2E7fRF5lN10tr",
"id": "7xGfFoTpQ2E7fRF5lN10tr",
"is_local": False,
"name": "Run Away With Me",
"popularity": 50,
"preview_url": "https://p.scdn.co/mp3-preview/3e05f5ed147ca075c7ae77c01f2cc0e14cfad78d?cid=774b29d4f13844c495f206cafdad9c86",
"track_number": 1,
"type": "track",
"uri": "spotify:track:7xGfFoTpQ2E7fRF5lN10tr"
}
]
}