Sort file paths according to their respective file extensions

Question

I am trying to sort file paths according to their respective file extensions.

I want output like this:

FileType	FilePath
.h	a/b/c/d/xyz.h
.h	a/b/c/d/xyz1.h
.class	a/b/c/d/xyz.class
.class	a/b/c/d/xyz1.class
.jar	a/b/c/d/xyz.jar
.jar	a/b/c/d/xyz1.jar

But my current output looks like this: (image: output in Excel)

Here is my code:

import pandas as pd
import glob

path = "The path goes here"

yes = [glob.glob(path+e,recursive = True) for e in ["/**/*.h","/**/*.class","/**/*..jar"]]

print(type(yes))  #File type is list
    
df = pd.DataFrame(yes)
df = df.transpose()
df.columns = [".h", ".class",".jar"]
print (df)

writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='filepath', index=False)
writer.save()

Can anyone help me solve this? Thanks in advance!

Answer 1

I think this should work:

import os
import pandas as pd
import glob

# "**" is needed for recursive=True to descend into subdirectories
source = "./**/*"
paths = glob.glob(source, recursive=True)

# I'll hard code some to demonstrate:
paths = [
    "a/b/c/d/xyz.h",
    "a/b/c/d/xyz1.h",
    "a/b/c/d/xyz.class",
    "a/b/c/d/xyz1.class",
    "a/b/c/d/xyz.jar",
    "a/b/c/d/xyz1.jar",
]
df = pd.DataFrame(paths, columns=["FilePath"])

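# Take the extension (including the leading dot) from each path
df["FileType"] = df.FilePath.apply(lambda x: os.path.splitext(x)[-1])
# Sort by extension first, then by path within each extension
df = df.sort_values(["FileType", "FilePath"]).reset_index(drop=True)
print(df)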

Output:

             FilePath FileType
0   a/b/c/d/xyz.class   .class
1  a/b/c/d/xyz1.class   .class
2       a/b/c/d/xyz.h       .h
3      a/b/c/d/xyz1.h       .h
4     a/b/c/d/xyz.jar     .jar
5    a/b/c/d/xyz1.jar     .jar
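
The sort above is alphabetical on the extension, so `.class` comes before `.h`, whereas the question lists `.h`, `.class`, `.jar`. If that specific order matters, one possible approach is an ordered categorical column. This is only a minimal sketch and not part of the original answer: `ext_order` is a name introduced here, the sample paths are hard-coded, and it assumes the three extensions from the question are the only ones of interest.

```python
import os
import pandas as pd

# Hard-coded sample paths (deliberately shuffled)
paths = [
    "a/b/c/d/xyz.class",
    "a/b/c/d/xyz1.jar",
    "a/b/c/d/xyz.h",
    "a/b/c/d/xyz1.class",
    "a/b/c/d/xyz.jar",
    "a/b/c/d/xyz1.h",
]

# Desired extension order, taken from the question (assumption: no other types matter)
ext_order = [".h", ".class", ".jar"]

df = pd.DataFrame({"FilePath": paths})
df["FileType"] = df["FilePath"].map(lambda p: os.path.splitext(p)[1])

# An ordered categorical sorts by position in ext_order instead of alphabetically;
# extensions missing from ext_order become NaN and are sorted last
df["FileType"] = pd.Categorical(df["FileType"], categories=ext_order, ordered=True)

df = df.sort_values(["FileType", "FilePath"]).reset_index(drop=True)
df = df[["FileType", "FilePath"]]  # put FileType first, as in the question
print(df)
```

With this, the rows come out grouped as `.h`, then `.class`, then `.jar`, with paths sorted inside each group.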

Answer 2

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transpose.html

You can add one more transpose to your existing code:

import pandas as pd
import glob

path = "The path goes here"

# "/**/*.jar" here; the "/**/*..jar" in the question looks like a typo
yes = [glob.glob(path+e,recursive = True) for e in ["/**/*.h","/**/*.class","/**/*.jar"]]

print(type(yes))  #File type is list
    
df = pd.DataFrame(yes)
df = df.transpose()
df.columns = [".h", ".class",".jar"]
df = df.transpose() #<-one more transpose
print (df)

writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
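# Note: after the second transpose the extensions are the row index,
# so index=False leaves them out of the sheet
df.to_excel(writer, sheet_name='filepath', index=False)
writer.close()  # writer.save() was removed in pandas 2.0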
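
As an alternative to transposing a second time, the wide table from the question (one column per extension) could be reshaped into the two-column `FileType` / `FilePath` layout with `DataFrame.melt`. The following is a sketch under assumptions, not the answerer's code: the nested list stands in for the real `glob` results, and the name `tidy` is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the glob results; the lists may differ in length
yes = [
    ["a/b/c/d/xyz.h", "a/b/c/d/xyz1.h"],
    ["a/b/c/d/xyz.class", "a/b/c/d/xyz1.class"],
    ["a/b/c/d/xyz.jar"],
]

# Same wide layout as in the question: one column per extension
wide = pd.DataFrame(yes).transpose()
wide.columns = [".h", ".class", ".jar"]

# melt stacks the columns into (FileType, FilePath) rows, keeping column order;
# dropna removes the padding that appears when the lists have unequal lengths
tidy = wide.melt(var_name="FileType", value_name="FilePath").dropna()
tidy = tidy.reset_index(drop=True)
print(tidy)
```

The resulting frame can then be passed to `to_excel` with `index=False` in the same way as in the question.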

Answer 3

Please try this code:

import os
import pathlib

path = 'C:/'

full_file_paths = []
file_suffix = []
# Walk the tree and record every file's full path and its extension
for (root, dirs, files) in os.walk(path):
    for f in files:
        file_suffix.append(pathlib.PurePath(f).suffix)
        # Join with root (not path) so files in subdirectories get the right path
        full_file_paths.append(os.path.join(root, f))

file_suffix = set(file_suffix)
processed_files = dict()
for fs in file_suffix:
    processed_files[fs] = []
    for f in full_file_paths:
        # Compare the actual suffix; a substring search for ".h" would also match ".html"
        if pathlib.PurePath(f).suffix == fs:
            processed_files[fs].append(f)
    print('--------------------------------')
    print(fs)
    print(processed_files[fs])
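
The same grouping can also be done in a single pass over the paths with `collections.defaultdict`, which avoids re-scanning the full list once per extension. This is a rough sketch rather than the answerer's method: `group_by_extension` is a hypothetical helper name, and the paths are hard-coded instead of coming from `os.walk`.

```python
import os
from collections import defaultdict


def group_by_extension(paths):
    """Return {extension: sorted list of paths} (hypothetical helper)."""
    groups = defaultdict(list)
    for p in paths:
        # splitext returns (root, ext); ext includes the leading dot
        ext = os.path.splitext(p)[1]
        groups[ext].append(p)
    return {ext: sorted(found) for ext, found in groups.items()}


# Hard-coded sample; note that ".html" must not be mixed up with ".h"
sample = [
    "a/b/c/d/xyz1.h",
    "a/b/c/d/index.html",
    "a/b/c/d/xyz.jar",
    "a/b/c/d/xyz.h",
    "a/b/c/d/xyz.class",
]

grouped = group_by_extension(sample)
for ext in sorted(grouped):
    print(ext, grouped[ext])
```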

python

dataframe

pandas

pandas.excelwriter