I'm trying to parse the phrases from the Visual Genome region descriptions JSON and split the data into training and test sets

Question

import argparse
import json

from sklearn.model_selection import train_test_split
    
parser = argparse.ArgumentParser(description='Splits visual_genome file into training and test sets.')
parser.add_argument('phrase', metavar='phrase', type=str, help='Path to visual_genome annotations file.')
parser.add_argument('train', type=str, help='Where to store visual_genome training annotations')
parser.add_argument('test', type=str, help='Where to store visual_genome test annotations')
parser.add_argument('-s', dest='split', type=float, required=True, help="A percentage of a split; a number in (0, 1)")
args = parser.parse_args()
    
def save_visual_genome(file, id, x, y, width, height, phrase, image):
    with open(file, 'wt', encoding='UTF-8') as vg:
        json.dump({'id': id, 'x': x, 'y': y, 'width': width, 'height': height, 'phrase': phrase, 'image': image}, vg, indent=2, sort_keys=True)
    
def main(args):
    with open(args.phrase, 'rt', encoding='UTF-8') as phrase:
        vg = json.load(phrase)
        id = vg['id']
        x = vg['x']
        y = vg['y']
        width = vg['width']
        height = vg['height']
        phrase = vg['phrase']
        image = vg['image']
    
        a,b = train_test_split(phrase, train_size=args.split)
    
        save_visual_genome(args.train, id, x, y, width, height,a, image)
        save_visual_genome(args.test, id, x, y, width, height, b, image)
    
        print("Saved {} entries in {} and {} in {}".format(len(a), args.train, len(b), args.test))
    
if __name__ == "__main__":
    main(args)

I get this error:

id = vg['id']    # this line throws the error
TypeError: list indices must be integers or slices, not str

What could be the problem? Is there a more efficient or simpler way to do the same task?

Answer 1

Try replacing

vg = json.load(phrase)

with

vg = json.loads(phrase.read())
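
Note that json.load(f) and json.loads(f.read()) parse the same text, so it may also be worth checking what type the parsed object is. The traceback says vg is a list, which suggests the file holds a list of records rather than a single dict. A minimal sketch, assuming a hypothetical two-record sample (the keys in the real file may differ):

```python
import io
import json

# Hypothetical sample shaped as a list of records; this layout is an assumption.
sample_text = '[{"id": 1, "phrase": "a green clock"}, {"id": 2, "phrase": "a red car"}]'

# json.load on a file object and json.loads on its text give the same result.
from_load = json.load(io.StringIO(sample_text))
from_loads = json.loads(sample_text)
same_result = from_load == from_loads

# The top level is a list, so from_load['id'] would raise the TypeError above.
top_level_type = type(from_load).__name__

# Index by position or iterate over the records instead.
phrases = [record["phrase"] for record in from_load]
```

If the top level is a list, looping over the records (or indexing with an integer) should avoid the TypeError.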

Also:

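The train_test_split method takes array-like input, such as a list or a numpy array. See https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
Be careful when you assign the same variable name twice (i.e. phrase).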
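
Putting the two points together, here is a rough sketch of splitting whole records rather than a single field. The record layout, the helper name split_records, and the fixed seed are assumptions for illustration, not the actual Visual Genome schema:

```python
import json

from sklearn.model_selection import train_test_split


def split_records(records, train_fraction, seed=0):
    """Split a plain Python list of records into train and test lists."""
    return train_test_split(records, train_size=train_fraction, random_state=seed)


# Hypothetical flat region records; the real file may nest these differently.
regions = [
    {"id": i, "x": 0, "y": 0, "width": 10, "height": 10,
     "phrase": "phrase {}".format(i), "image": 1}
    for i in range(10)
]

# Distinct names for the two halves, so nothing shadows the input list.
train_regions, test_regions = split_records(regions, 0.8)

# Each half is still a list of dicts and can be written out with json.dump.
train_json = json.dumps(train_regions, indent=2, sort_keys=True)
test_json = json.dumps(test_regions, indent=2, sort_keys=True)
```

Because each record stays intact, every phrase keeps its own id, box and image in whichever file it ends up in.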

argparse

python-3.x