Is there a way to match a JSON file with a dataset (images) in Python?
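I'm working on machine learning (image classification).
I found a dataset that contains two parts:
- Images (20,000 images): an "images" folder whose files are numbered from 1 to 20,000 (not sorted into classes)
- A JSON file that contains the information and classification for each image (12 classes of images)
The JSON file is structured as follows: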
{
    "<image_number>": {
        "image_filepath": "images/<image_number>.jpg",
        "anomaly_class": "<class_name>"
    },
    ...
}
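So I'm trying to read the JSON file and split the dataset so that I can work with each class separately, then take 80% of each class as the training set and 20% as the test set.
I've been trying to find a way to match the JSON file with the dataset (images), so that I can sort the images into one folder per class and then split them into training and test sets.
Can anyone help me?
Thank you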
Something like the following will create a folder for each class and then move the images into it.
import json
import os
from os import path

# Open the JSON file containing the classifications
# (adjust the file name to match your dataset)
with open("clasification.json", "r") as f:
    classification = json.load(f)

# Create a set which contains all the classes
classes = {i["anomaly_class"] for i in classification.values()}

# For each of the classes, make a folder to contain its images
# (exist_ok=True lets the script be re-run without failing)
for c in classes:
    os.makedirs(c, exist_ok=True)

# For each image entry in the JSON, move the image to the folder named after its class
for image_number, image_data in classification.items():
    os.rename(
        image_data["image_filepath"],
        path.join(image_data["anomaly_class"], "{}.jpg".format(image_number)),
    )
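Before moving anything, it may be worth checking that the JSON file and the image folder actually agree, since `os.rename` raises an error on the first missing file. The following is a minimal sketch of such a check, not part of the original answer: the helper name `find_unmatched` and its `root` parameter are hypothetical, and it assumes the images are `.jpg` files sitting directly inside an `images` folder under `root`, as the JSON structure in the question suggests.

```python
from pathlib import Path


def find_unmatched(classification, root="."):
    """Compare JSON entries against the image files found on disk.

    Returns (missing, unlisted):
      missing  - paths named in the JSON that do not exist under root
      unlisted - .jpg files in root/images that no JSON entry refers to
    """
    root = Path(root)
    # Paths as written in the JSON, normalised to forward slashes
    listed = {
        Path(entry["image_filepath"]).as_posix()
        for entry in classification.values()
    }
    # Paths actually present on disk, expressed relative to root
    on_disk = {
        p.relative_to(root).as_posix()
        for p in (root / "images").glob("*.jpg")
    }
    return sorted(listed - on_disk), sorted(on_disk - listed)
```

If both returned lists are empty, every JSON entry has a file and every file has an entry, so the move loop should not hit a missing source file.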
Something like this should work:
import json
from pathlib import Path

currDir = Path(__file__).resolve().parent

# Path where the images will be moved to
imagesDir = currDir / 'images'
testingDir = imagesDir / 'testing'
trainingDir = imagesDir / 'training'

# Load data
# This has to be the path to the file containing the data
# I assumed it is in the current directory
infoFilePath = currDir / 'data.json'
with infoFilePath.open() as f:
    infoPerImage = json.load(f)

# Separate into classes
infoPerClass = {}
for imageNumber, imageInfo in infoPerImage.items():
    imageClass = imageInfo['anomaly_class']
    imagePath = imageInfo['image_filepath']
    currentClassImages = infoPerClass.setdefault(imageClass, [])
    currentClassImages.append(imagePath)

# Create directories for the classes
# (exist_ok=True lets the script be re-run without failing)
for imageClass in infoPerClass:
    pathToImageClassTraining = trainingDir / imageClass
    pathToImageClassTraining.mkdir(parents=True, exist_ok=True)
    pathToImageClassTesting = testingDir / imageClass
    pathToImageClassTesting.mkdir(parents=True, exist_ok=True)

# Separate into training and testing images (first 80% / last 20% of each class)
trainingImages = {}
testingImages = {}
for imageClass, imagePaths in infoPerClass.items():
    lenImagePaths = len(imagePaths)
    upperLimit = int(lenImagePaths * 0.8)
    trainingImages[imageClass] = imagePaths[:upperLimit]
    testingImages[imageClass] = imagePaths[upperLimit:]


def moveImagesToTheirDir(imagesDict, imagesBasePath):
    for imageClass, imagePaths in imagesDict.items():
        for imagePath in imagePaths:
            # The JSON paths are relative, so run the script from the
            # directory that contains the 'images' folder
            imageSrc = Path(imagePath)
            imageDest = imagesBasePath / imageClass / imageSrc.name
            imageSrc.rename(imageDest)


moveImagesToTheirDir(trainingImages, trainingDir)
moveImagesToTheirDir(testingImages, testingDir)
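Note that the split above slices each class list in the order the entries appear in the JSON file. If that order is not random (for example, if the images were captured in sequence), the test set may not be representative. A possible variation, sketched here as an assumption rather than as part of the original answer, shuffles each class with a fixed seed before cutting. The function name `split_per_class` and its parameters are hypothetical; its input has the same shape as `infoPerClass`.

```python
import random


def split_per_class(paths_per_class, train_fraction=0.8, seed=0):
    """Shuffle each class's paths, then cut the list at train_fraction.

    paths_per_class maps a class name to a list of image paths.
    Returns two dicts (training, testing) with the same keys.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    training, testing = {}, {}
    for image_class, paths in paths_per_class.items():
        shuffled = list(paths)  # copy so the caller's list is left untouched
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        training[image_class] = shuffled[:cut]
        testing[image_class] = shuffled[cut:]
    return training, testing
```

The two returned dictionaries could then be passed to `moveImagesToTheirDir` in place of `trainingImages` and `testingImages`.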