How to decode vggish audioset embeddings from tfrecord?
I am trying to use the 128-byte embeddings produced by the pre-trained base of the VGGish model for transfer learning on audio data. Using python vggish_inference_demo.py --wav_file ...
to encode my training data to a tfrecord worked fine, but now I want to use this as an input to another model (e.g. a neural network I create with keras or something else). Using some similar questions and the documentation, I got this far with the first embedding record of one file:
tfrecords_filename = 'example1.tfrecord'
record_iterator = tf.python_io.tf_record_iterator(path=tfrecords_filename)
string_record = next(record_iterator)
example = tf.train.SequenceExample()
example.ParseFromString(string_record)
print(example.feature_lists.feature_list['audio_embedding'].feature[0].bytes_list.value)
This produces
[b'\x99\x07\xaa>\xd2_R_\x9f\xbbqN\x99\xa18V\xad\x7f\x93\xf0)\xdd4\x80~\xb0\xa4d\x8e\x85\xb6\x88\xa3?U\xa6Q[\x9b\x038\xff\x00EE>OJ\xa5\xb8\x828)\x97^\x8a\xaa\x12h\xff\xff\xc39\xce\x9b\x13\x80\x00j\xcaZ\xac\xff\xff\x0f\xac\x1c\x90&\xd2.b\xe2{\xc1\x15\xe9\xba\xed\xd4\xa9\xff\xdc\xb5\x99]!\x04\xca\xff\xa6;b\xe0\x19\xbfW\xebP!\xff\xc5\xff\x82\xff\x1a\xbe\xec-h\xff\x8d\xff\r\x96\x00\x00\xff']
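I am not even sure what this b'...'
is (there are more than 64 and fewer than 128 \x's, so I am not sure how this relates to anything).
Maybe I am missing some basic Python knowledge here, but how do I convert this into an array of numbers that I can use as input to another model?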
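It turns out that b'...' is a Python bytes object. Bytes that correspond to printable ASCII characters are displayed as those characters (for example, 0x3e is shown as >), which is why the printout contains fewer than 128 \x escapes even though each embedding is 128 bytes long. The bytes can be converted to hexadecimal, which can in turn be converted to an array of integers between 0 and 255: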
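import tensorflow as tf
import numpy as np
tfrecords_filename = 'example1.tfrecord'
# tf.python_io is the TensorFlow 1.x API; in TensorFlow 2.x the same
# iterator should be reachable as tf.compat.v1.io.tf_record_iterator.
record_iterator = tf.python_io.tf_record_iterator(path=tfrecords_filename)
# Read the first serialized record and parse it as a SequenceExample.
string_record = next(record_iterator)
example = tf.train.SequenceExample()
example.ParseFromString(string_record)
# Take the first embedding (a bytes object) and render it as a hex string.
hexembed = example.feature_lists.feature_list['audio_embedding'].feature[0].bytes_list.value[0].hex()
# Every two hex digits encode one byte, i.e. one integer between 0 and 255.
arrayembed = [int(hexembed[i:i+2],16) for i in range(0,len(hexembed),2)]
print(arrayembed)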
This produces output in the format I wanted:
[153, 7, 170, 62, 210, 95, 82, 95, 159, 187, 113, 78, 153, 161, 56,
86, 173, 127, 147, 240, 41, 221, 52, 128, 126, 176, 164, 100, 142,
133, 182, 136, 163, 63, 85, 166, 81, 91, 155, 3, 56, 255, 0, 69, 69,
62, 79, 74, 165, 184, 130, 56, 41, 151, 94, 138, 170, 18, 104, 255,
255, 195, 57, 206, 155, 19, 128, 0, 106, 202, 90, 172, 255, 255, 15,
172, 28, 144, 38, 210, 46, 98, 226, 123, 193, 21, 233, 186, 237, 212,
169, 255, 220, 181, 153, 93, 33, 4, 202, 255, 166, 59, 98, 224, 25,
191, 87, 235, 80, 33, 255, 197, 255, 130, 255, 26, 190, 236, 45, 104,
255, 141, 255, 13, 150, 0, 0, 255]
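As a side note, the hex round trip is not strictly necessary, because iterating over a bytes object in Python 3 already yields integers between 0 and 255. The sketch below illustrates this on the first 8 bytes of the embedding printed in the question. The helper name bytes_to_ints is made up for this example, and the shortened raw value is only an excerpt, not a full 128-byte embedding.

```python
# First 8 bytes of the embedding printed in the question (shortened for illustration).
raw = b'\x99\x07\xaa>\xd2_R_'

def bytes_to_ints(raw_bytes):
    """Return the byte values (0-255) of a bytes object as a list of ints.

    Hypothetical helper for this example, not part of VGGish or TensorFlow.
    """
    # Iterating over a bytes object in Python 3 yields ints directly.
    return list(raw_bytes)

values = bytes_to_ints(raw)
print(values)    # [153, 7, 170, 62, 210, 95, 82, 95]
print(len(raw))  # 8: printable bytes such as '>' and '_' still count as one byte each

# The result matches the hex round trip used above.
hexembed = raw.hex()
assert values == [int(hexembed[i:i+2], 16) for i in range(0, len(hexembed), 2)]
```

If a NumPy array is preferred as model input, np.frombuffer(raw, dtype=np.uint8) should give the same values without an intermediate list.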