Pandas: Need to increment duplicate file names starting at 1

I have a column of file names, many of them duplicates, that need to be numbered incrementally starting at 001, 002, and so on, e.g. filename_001.pdf, filename_002.pdf.

import os

import pandas as pd

df_files = pd.DataFrame([[1000, 'filename.pdf'],
                         [1001, 'filename.pdf'],
                         [1002, 'a_file.txt'],
                         [1003, 'a_file.txt'],
                         [1004, 'a_file.txt']],
                        columns=['ID', 'filename'])

Every method I have found starts numbering at 2.

First, extract the extension and the file name without the extension:

df_files['ext'] = [os.path.splitext(f)[-1] for f in df_files['filename']]
df_files['Filestub'] = [os.path.splitext(f)[0] for f in df_files['filename']]

The following approach increments successfully, but it does not start at 1, and it does not zero-pad the counter to three digits (e.g. 00X).

df_files['NumberedCopy'] = df_files['filename'].where(~df_files['filename'].duplicated(), 
                                                           df_files['Filestub'] + "_"\
                                                           + df_files.groupby('Filestub').cumcount().add(1).astype(str) + df_files['ext'])

Output [incorrect]:

    ID          filename        Filestub    ext     NumberedCopy
0   1000        filename.pdf    filename    .pdf    filename.pdf
1   1001        filename.pdf    filename    .pdf    filename_2.pdf
2   1002        a_file.txt      a_file      .txt    a_file.txt
3   1003        a_file.txt      a_file      .txt    a_file_2.txt
4   1004        a_file.txt      a_file      .txt    a_file_3.txt
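
The result above starts at 2 most likely because duplicated() defaults to keep="first", so the first occurrence in each group is not flagged and where() leaves it unnumbered, while cumcount().add(1) has already assigned it position 1. A minimal sketch of that behaviour, using a small illustrative Series (the variable names are assumptions, not part of the original code):

```python
import pandas as pd

names = pd.Series(["a_file.txt", "a_file.txt", "a_file.txt"])

# Default keep="first": the first occurrence is NOT flagged as a duplicate,
# so where(~duplicated(), ...) keeps its original, unnumbered name.
first_kept = names.duplicated()

# keep=False flags every member of a group that occurs more than once.
all_flagged = names.duplicated(keep=False)

# cumcount() is 0-based within each group, so add(1) gives 1, 2, 3.
position = names.groupby(names).cumcount().add(1)

print(first_kept.tolist())   # [False, True, True]
print(all_flagged.tolist())  # [True, True, True]
print(position.tolist())     # [1, 2, 3]
```

The counter itself is already 1-based; it is only the mask that hides the first row.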

Desired output:

    ID      filename        Filestub    ext     NumberedCopy
0   1000    filename.pdf    filename    .pdf    filename_001.pdf
1   1001    filename.pdf    filename    .pdf    filename_002.pdf
2   1002    a_file.txt      a_file      .txt    a_file_001.txt
3   1003    a_file.txt      a_file      .txt    a_file_002.txt
4   1004    a_file.txt      a_file      .txt    a_file_003.txt

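Try this. It builds a numbered name for every row, zero-padding the counter with str.zfill(3), and then keeps the original name only where the stub is not duplicated: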

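# Build a numbered name for every row: stub + "_" + zero-padded 1-based counter + extension.
numbered = df_files["Filestub"] + "_" + df_files.groupby("Filestub").cumcount().add(1).astype(str).str.zfill(3) + df_files["ext"]

# Use the numbered name wherever the stub occurs more than once; keep=False also flags the first occurrence.
df_files["NumberedCopy"] = numbered.where(df_files["Filestub"].duplicated(keep=False), df_files["filename"])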

>>> df_files
     ID      filename   ext  Filestub      NumberedCopy
0  1000  filename.pdf  .pdf  filename  filename_001.pdf
1  1001  filename.pdf  .pdf  filename  filename_002.pdf
2  1002    a_file.txt  .txt    a_file    a_file_001.txt
3  1003    a_file.txt  .txt    a_file    a_file_002.txt
4  1004    a_file.txt  .txt    a_file    a_file_003.txt
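
If this is needed in more than one place, the same steps can be wrapped in a small helper. The following is only a sketch under some assumptions: the function name number_duplicates and its width parameter are made up for illustration, and, like the snippet above, it groups by the stub alone, so a.txt and a.pdf would share one counter.

```python
import os

import pandas as pd


def number_duplicates(names: pd.Series, width: int = 3) -> pd.Series:
    """Append a zero-padded, 1-based counter to names whose stub repeats.

    Hypothetical helper: the name and the `width` parameter are illustrative.
    """
    # Split each name once into (stub, extension); splitext keeps the dot.
    parts = [os.path.splitext(name) for name in names]
    stub = pd.Series([p[0] for p in parts], index=names.index)
    ext = pd.Series([p[1] for p in parts], index=names.index)

    # cumcount() is 0-based within each group, so add 1 before padding.
    counter = stub.groupby(stub).cumcount().add(1).astype(str).str.zfill(width)
    numbered = stub + "_" + counter + ext

    # keep=False flags every member of a repeated group, including the first;
    # names whose stub occurs only once are returned unchanged.
    return numbered.where(stub.duplicated(keep=False), names)


files = pd.Series(["filename.pdf", "filename.pdf",
                   "a_file.txt", "a_file.txt", "a_file.txt",
                   "unique.csv"])
result = number_duplicates(files)
print(result.tolist())
```

With the frame from the question, df_files["NumberedCopy"] = number_duplicates(df_files["filename"]) should give the same column as the two-line version, without needing the intermediate ext and Filestub columns.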