Sort filepaths according to their respective file extensions
I am trying to sort file paths according to their respective file extensions.
I want output like this:
FileType | FilePath |
---|---|
.h | a/b/c/d/xyz.h |
.h | a/b/c/d/xyz1.h |
.class | a/b/c/d/xyz.class |
.class | a/b/c/d/xyz1.class |
.jar | a/b/c/d/xyz.jar |
.jar | a/b/c/d/xyz1.jar |
But my current output looks like this:
[screenshot: output in Excel]
Below is my code:
import pandas as pd
import glob
path = "The path goes here"
yes = [glob.glob(path+e,recursive = True) for e in ["/**/*.h","/**/*.class","/**/*..jar"]]
print(type(yes)) #File type is list
df = pd.DataFrame(yes)
df = df.transpose()
df.columns = [".h", ".class",".jar"]
print (df)
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='filepath', index=False)
writer.save()
Can anyone help me solve this?
Thanks in advance!
I think this should work:
import os
import pandas as pd
import glob
# "**" only descends into subdirectories when recursive=True is passed
source = "./**/*"
paths = glob.glob(source, recursive=True)
# I'll hard code some to demonstrate:
paths = [
    "a/b/c/d/xyz.h",
    "a/b/c/d/xyz1.h",
    "a/b/c/d/xyz.class",
    "a/b/c/d/xyz1.class",
    "a/b/c/d/xyz.jar",
    "a/b/c/d/xyz1.jar",
]
df = pd.DataFrame(paths, columns=["FilePath"])
# os.path.splitext returns (root, ext); the last element is the extension
df["FileType"] = df.FilePath.apply(lambda x : os.path.splitext(x)[-1])
# Sort by extension first, then by path within each extension
df = df.sort_values(["FileType", "FilePath"]).reset_index(drop=True)
print(df)
Output:
FilePath FileType
0 a/b/c/d/xyz.class .class
1 a/b/c/d/xyz1.class .class
2 a/b/c/d/xyz.h .h
3 a/b/c/d/xyz1.h .h
4 a/b/c/d/xyz.jar .jar
5 a/b/c/d/xyz1.jar .jar
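Note that the output above is sorted alphabetically (.class, .h, .jar) and lists FilePath first, while the table in the question shows .h, .class, .jar with FileType first. If that exact order matters, one possible approach is to sort on a rank derived from a list of extensions. This is only a sketch: the function name `sort_by_extension`, its `order` parameter, and the helper column `_rank` are made up for illustration and are not part of the original answer.

```python
import os
import pandas as pd

def sort_by_extension(paths, order=(".h", ".class", ".jar")):
    """Return a FileType/FilePath frame sorted by a custom extension order.

    `order` is an assumption: it lists the extensions in the order the
    question's table shows them. Extensions not listed sort last.
    """
    rank = {ext: i for i, ext in enumerate(order)}
    df = pd.DataFrame({"FilePath": list(paths)})
    # os.path.splitext returns (root, ext); index 1 is the extension
    df["FileType"] = df["FilePath"].map(lambda p: os.path.splitext(p)[1])
    # Map each extension to its position in `order` (NaN if not listed)
    df["_rank"] = df["FileType"].map(rank)
    df = df.sort_values(["_rank", "FilePath"], na_position="last")
    # Drop the helper column and put FileType first, as in the question
    return df[["FileType", "FilePath"]].reset_index(drop=True)

if __name__ == "__main__":
    demo = [
        "a/b/c/d/xyz1.jar",
        "a/b/c/d/xyz.class",
        "a/b/c/d/xyz1.h",
        "a/b/c/d/xyz.h",
    ]
    print(sort_by_extension(demo))
```

Passing a different `order` tuple changes the grouping order without touching the rest of the code.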
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transpose.html
You can add one more transpose to your existing code:
import pandas as pd
import glob
path = "The path goes here"
# One list of matches per extension (pattern corrected to "*.jar")
yes = [glob.glob(path+e,recursive = True) for e in ["/**/*.h","/**/*.class","/**/*.jar"]]
print(type(yes)) # yes is a list of lists
df = pd.DataFrame(yes)
df = df.transpose()
df.columns = [".h", ".class",".jar"]
df = df.transpose() #<-one more transpose
print (df)
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='filepath', index=False)
# close() writes the file; save() was removed in pandas 2.0
writer.close()
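A second transpose still leaves one row per extension rather than one row per file. If the goal is the two-column FileType / FilePath layout from the question, reshaping the wide frame with `DataFrame.melt` may be closer to what is wanted. The following is a rough sketch, not the answerer's method: `wide_to_long` and its `files_by_ext` argument are hypothetical names, and the dict stands in for the per-pattern `glob.glob` results.

```python
import pandas as pd

def wide_to_long(files_by_ext):
    """Turn {extension: [paths]} into a long FileType/FilePath frame.

    `files_by_ext` is a hypothetical input standing in for the lists
    returned by the per-extension glob calls in the question.
    """
    # Wrapping each list in a Series lets pandas pad the shorter
    # columns with NaN when the lists have different lengths
    wide = pd.DataFrame(
        {ext: pd.Series(paths, dtype="object") for ext, paths in files_by_ext.items()}
    )
    # melt stacks the columns: one row per (extension, path) pair
    long = wide.melt(var_name="FileType", value_name="FilePath")
    # Remove the NaN padding introduced by the unequal lengths
    long = long.dropna(subset=["FilePath"])
    return long.reset_index(drop=True)

if __name__ == "__main__":
    demo = {
        ".h": ["a/b/c/d/xyz.h", "a/b/c/d/xyz1.h"],
        ".class": ["a/b/c/d/xyz.class", "a/b/c/d/xyz1.class"],
        ".jar": ["a/b/c/d/xyz.jar"],
    }
    print(wide_to_long(demo))
```

The result can then be written with `to_excel(..., index=False)` as in the question, assuming an Excel engine such as xlsxwriter is installed.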
Please try this code:
import os
import pathlib

path = 'C:/'
full_file_paths = []
file_suffix = []
# Walk the tree and record each file's full path and extension
for (root, dirs, files) in os.walk(path):
    for f in files:
        file_suffix.append(pathlib.PurePath(f).suffix)
        # Join with root (not path) so files in subdirectories get the right path
        full_file_paths.append(os.path.join(root, f))
file_suffix = set(file_suffix)
processed_files = dict()
for fs in file_suffix:
    processed_files[fs] = []
    for f in full_file_paths:
        # Compare the real suffix; a substring search would also match directory names
        if pathlib.PurePath(f).suffix == fs:
            processed_files[fs].append(f)
    print('--------------------------------')
    print(fs)
    print(processed_files[fs])
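The nested loop above rescans every path once per suffix. If the tree is large, a single pass that appends each file to its suffix bucket might be simpler. This is a sketch using only the standard library; the function name `group_by_suffix` is made up for illustration.

```python
import os
from collections import defaultdict
from pathlib import Path

def group_by_suffix(root):
    """Walk `root` once and group full file paths by extension."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            # Path.suffix is "" for files without an extension
            groups[full.suffix].append(str(full))
    # Sort suffixes and the paths inside each group for stable output
    return {suffix: sorted(found) for suffix, found in sorted(groups.items())}

if __name__ == "__main__":
    for suffix, found in group_by_suffix(".").items():
        print("--------------------------------")
        print(suffix)
        print(found)
```

Files without an extension end up under the empty-string key, so they are kept rather than silently dropped.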