How to denormalize YAML for a Pandas DataFrame?
I am trying to import data from a YAML file into a Pandas DataFrame. Take, for example, data.yml:
---
- doc: "Book1"
  reviews:
    - reviewer: "Paul"
      stars: "5"
    - reviewer: "Sam"
      stars: "2"
- doc: "Book2"
  reviews:
    - reviewer: "John"
      stars: "4"
    - reviewer: "Sam"
      stars: "3"
    - reviewer: "Pete"
      stars: "2"
...
The desired DataFrame looks like this:
     doc  reviews.reviewer  reviews.stars
0  Book1              Paul              5
1  Book1               Sam              2
2  Book2              John              4
3  Book2               Sam              3
4  Book2              Pete              2
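I have tried feeding the YAML data to Pandas in different ways (such as with open('data.yml') as f: data = pd.DataFrame(yaml.load(f))), but the cells always contain nested dicts. There is an existing approach that works, but it takes quite a bit of code, and it seems there might be a simpler solution for YAML.
Is there a built-in or Pythonic way to denormalize YAML so that it converts to a Pandas DataFrame in this way?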
You should use json_normalize after loading the YAML to flatten the dicts:
pd.io.json.json_normalize(yaml.load(f), 'reviews', 'doc')
reviewer stars doc
0 Paul 5 Book1
1 Sam 2 Book1
2 John 4 Book2
3 Sam 3 Book2
4 Pete 2 Book2
Using the approach above now results in:
FutureWarning: pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead
# Let's say the YAML file is test_sample.yaml
from os import getcwd, path

from pandas import json_normalize
from yaml import SafeLoader, load

# The "..." stands for the intermediate directories, elided in the original answer
path_to_yaml = path.join(getcwd(), ..., "test_sample.yaml")
with open(path_to_yaml) as yaml_file:
    # Pass the open file handle (not the path) to the YAML loader
    yaml_contents = load(yaml_file, Loader=SafeLoader)
yaml_df = json_normalize(yaml_contents)
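For reference, here is a minimal self-contained sketch that combines the two answers and reproduces the column layout from the question. It assumes pandas 1.0 or later (where pandas.json_normalize is available) and PyYAML. The YAML is inlined as a string so the snippet runs without a file, and the names YAML_TEXT and reviews_to_frame are made up for illustration only.

```python
import pandas as pd
import yaml

# Inline copy of the sample data from the question (assumption: same
# structure as data.yml, so no file on disk is needed).
YAML_TEXT = """
- doc: "Book1"
  reviews:
    - reviewer: "Paul"
      stars: "5"
    - reviewer: "Sam"
      stars: "2"
- doc: "Book2"
  reviews:
    - reviewer: "John"
      stars: "4"
    - reviewer: "Sam"
      stars: "3"
    - reviewer: "Pete"
      stars: "2"
"""


def reviews_to_frame(yaml_text: str) -> pd.DataFrame:
    """Flatten the nested 'reviews' lists into one row per review."""
    records = yaml.safe_load(yaml_text)  # list of dicts, one per doc
    df = pd.json_normalize(
        records,
        record_path="reviews",     # each item in 'reviews' becomes a row
        meta=["doc"],              # repeat the parent 'doc' on every row
        record_prefix="reviews.",  # prefix the columns taken from 'reviews'
    )
    # Put 'doc' first to match the layout asked for in the question.
    return df[["doc", "reviews.reviewer", "reviews.stars"]]


df = reviews_to_frame(YAML_TEXT)
print(df)
```

The stars values stay strings because they are quoted in the YAML. If numbers are needed, something like df["reviews.stars"].astype(int) should convert them.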