Store files in different subdirectories using python

I have a pandas DataFrame df that looks like this:

import pandas as pd
d = {'user': ['Peter', 'Peter', 'Peter', 'Peter', 'David', 'David', 'David', 'Emma', 'Joyce', 'Joyce', 'Joyce'], 'date': ['2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04', '2019-03-04'], 'lat': [37.749798119650926, 37.751028173710736, 37.751698332490214, 37.75180012822952, 38.122893081890844, 38.124108467926035, 38.12743379574882, 37.89363489791644, 37.53620628385582, 37.53804390907164, 37.54044296272588], 'lon': [-122.49230146408082, -122.49229073524474, -122.49170064926147, -122.48974800109862, -122.24205136299133, -122.23907947540283, -122.23867177963257, -122.07653760910033, -121.99707984924316, -121.99315309524536, -121.9914150238037]}
df = pd.DataFrame(data=d)
df

user    date        lat         lon
Peter   2019-03-04  37.749798   -122.492301
Peter   2019-03-04  37.751028   -122.492291
Peter   2019-03-04  37.751698   -122.491701
Peter   2019-03-04  37.751800   -122.489748
David   2019-03-04  38.122893   -122.242051
David   2019-03-04  38.124108   -122.239079
David   2019-03-04  38.127434   -122.238672
Emma    2019-03-04  37.893635   -122.076538
Joyce   2019-03-04  37.536206   -121.997080
Joyce   2019-03-04  37.538044   -121.993153
Joyce   2019-03-04  37.540443   -121.991415

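Using the code below, I was able to create four separate folium maps, one per user, each showing that user's coordinates. The four map files are named after the users: Peter.html, David.html, Emma.html and Joyce.html.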

import folium

users = list(df.user.unique())

def create_user_map(user):
    m = folium.Map(location=[37.733795, -122.446747],
                   zoom_start=9,
                   min_zoom=10,
                   max_zoom=19,
                   control_scale=True)

    df_user = df[df.user == user]
    for row in df_user.itertuples():
        folium.CircleMarker(location=[row.lat, row.lon],
                            radius=4,
                            fill=True,
                            fill_opacity=0.5).add_to(m)
    return m

for user in users:
    user_map = create_user_map(user)
    user_file = f"{user}.html"
    user_map.save(user_file)

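Now I want these files to be stored automatically in the matching subdirectories of the folder structure below. How can I extend the loop above to do this?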

Report/
└── Report_per_date/
    ├── 2019-03-01/
    ├── 2019-03-02/
    ├── 2019-03-03/
    └── 2019-03-04/
        └── Users/
            ├── Peter/
            │   └── Peter.html
            ├── David/
            │   └── David.html
            ├── Emma/
            │   └── Emma.html
            └── Joyce/
                └── Joyce.html
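Each map's location in this layout is fully determined by its date and its user, so the tree can be expressed as a small path-building function. A minimal sketch with pathlib; the helper name user_map_path is hypothetical and not part of pandas or folium:

```python
from pathlib import Path


def user_map_path(root: Path, date: str, user: str) -> Path:
    """Hypothetical helper: Report/Report_per_date/<date>/Users/<user>/<user>.html."""
    # The "/" operator on Path joins components with the OS-specific separator
    return root / "Report_per_date" / date / "Users" / user / f"{user}.html"


example = user_map_path(Path("Report"), "2019-03-04", "Peter")
```

Here example.as_posix() gives Report/Report_per_date/2019-03-04/Users/Peter/Peter.html; on Windows the same Path simply prints with backslashes.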

I was hoping to achieve it like this:

import pathlib
rootdir = pathlib.Path('./Report')
report_per_date = df.apply(lambda x: rootdir / 'Report_per_date' / x['date'] / 'Users' / x['user'] / f"{x['user']}.html", axis='columns')

for mapfile, data in df.groupby(report_per_date):
    mapfile.parent.mkdir(parents=True, exist_ok=True)
    user_map = create_user_map(user)
    user_map.save(mapfile)

Unfortunately, this raises AttributeError: 'PosixPath' object has no attribute 'write'

I tried the code on my machine and ran into a similar problem, except that the error was AttributeError: 'WindowsPath' object has no attribute 'write'.

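Folium appears to use the branca module to save maps to HTML, and branca appears to be unable to write to a pathlib path object. I was able to fix it by converting the path to a string: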

user_map.save(str(mapfile))

That solved the problem on my machine, and since user_map.save() already worked for you when you passed it a plain string, it should work for you too.
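If you would rather not remember the str() call in every loop, one option is a small wrapper that creates the parent folders and converts the path before calling .save(). This is a sketch: save_to_path is a hypothetical name, and the only assumption about the map object is that its .save() accepts a filename string, as folium.Map's does in the code above:

```python
import os
from pathlib import Path
from typing import Union


def save_to_path(map_obj, path: Union[str, os.PathLike]) -> Path:
    """Hypothetical helper: create missing folders, then save via a plain string path."""
    path = Path(path)
    # parents=True creates every missing intermediate folder; exist_ok=True
    # avoids an error when the folder already exists from an earlier save
    path.parent.mkdir(parents=True, exist_ok=True)
    # Passing str(path) sidesteps the "'PosixPath' object has no attribute 'write'" error
    map_obj.save(str(path))
    return path
```

In the loop this would be used as save_to_path(user_map, mapfile), replacing both the mkdir() line and the save() line.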

Also, in the following snippet you are calling create_user_map on the variable user:

for mapfile, data in df.groupby(report_per_date):
    mapfile.parent.mkdir(parents=True, exist_ok=True)
    user_map = create_user_map(user)
    user_map.save(mapfile)

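In this example, however, user will always be Joyce, because that is the value left over from the earlier for user in users: loop. One option is to take the user name from the path inside the loop, like this: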

for mapfile, data in df.groupby(report_per_date):
    mapfile.parent.mkdir(parents=True, exist_ok=True)
    user_map = create_user_map(mapfile.stem)
    user_map.save(str(mapfile))

This works because .stem returns the file name without its extension.
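Both points can be checked in isolation with the standard library: a Python for-loop variable is not scoped to the loop and keeps its last value, while the path itself already encodes the user and the date. A minimal sketch, assuming the Report/Report_per_date/<date>/Users/<user>/<user>.html layout from the question:

```python
from pathlib import Path

users = ["Peter", "David", "Emma", "Joyce"]
for user in users:
    pass
# The loop variable survives the loop and holds the last element
leftover_user = user

mapfile = Path("Report/Report_per_date/2019-03-04/Users/Peter/Peter.html")
# .stem is the file name without its extension
user_from_path = mapfile.stem
# parents[0] is .../Peter, parents[1] is .../Users, parents[2] is .../2019-03-04
date_from_path = mapfile.parents[2].name
```

Here leftover_user is "Joyce", user_from_path is "Peter" and date_from_path is "2019-03-04".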

You can also change the loop to take the date into account, and store each row's output path in a new 'path' column of df:

df['path'] = df.apply(lambda x: rootdir / 'Report_per_date' / x['date'] / 'Users' / x['user'] / f"{x['user']}.html",
                      axis='columns')

dates = list(df.date.unique())


# The map function now takes the already-filtered rows instead of a user name
def create_user_map(df_user):
    m = folium.Map(location=[37.733795, -122.446747],
                   zoom_start=9,
                   min_zoom=10,
                   max_zoom=19,
                   control_scale=True)

    for row in df_user.itertuples():
        folium.CircleMarker(location=[row.lat, row.lon],
                            radius=4,
                            fill=True,
                            fill_opacity=0.5).add_to(m)

    return m

for user in users:
    for date in dates:
        data = df[(df.user == user) & (df.date == date)]
        if data.empty:
            # This user has no points on this date, so there is no map to save
            continue
        path = data.iloc[0]['path']
        path.parent.mkdir(parents=True, exist_ok=True)
        user_map = create_user_map(data)
        user_map.save(str(path))
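An alternative to the nested loops is to let pandas produce only the (date, user) combinations that actually occur, by grouping on both columns; this also removes the empty-selection case. A minimal sketch that only plans the output paths, so it can be checked without folium; plan_user_maps is a hypothetical name, and it assumes the date column holds strings such as '2019-03-04', as in the example data:

```python
from pathlib import Path

import pandas as pd


def plan_user_maps(df: pd.DataFrame, root: Path) -> dict:
    """Hypothetical helper: map each output path to the rows that belong on that map."""
    plan = {}
    # Grouping by a list of columns yields one (date, user) tuple per non-empty group
    for (date, user), data in df.groupby(["date", "user"]):
        path = root / "Report_per_date" / date / "Users" / user / f"{user}.html"
        plan[path] = data
    return plan
```

Saving then reduces to iterating over plan_user_maps(df, rootdir).items(), calling path.parent.mkdir(parents=True, exist_ok=True) and then create_user_map(data).save(str(path)) with the DataFrame-based create_user_map above.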