What is the meaning of the EchoNest API's getTimbre vector?

The EchoNest Analyzer documentation says the following about timbre:

timbre is the quality of a musical note or sound that distinguishes different types of musical instruments, or voices. It is a complex notion also referred to as sound color, texture, or tone quality, and is derived from the shape of a segment’s spectro-temporal surface, independently of pitch and loudness. The Echo Nest Analyzer’s timbre feature is a vector that includes 12 unbounded values roughly centered around 0. Those values are high level abstractions of the spectral surface, ordered by degree of importance. For completeness however, the first dimension represents the average loudness of the segment; second emphasizes brightness; third is more closely correlated to the flatness of a sound; fourth to sounds with a stronger attack; etc. See an image below representing the 12 basis functions (i.e. template segments). The actual timbre of the segment is best described as a linear combination of these 12 basis functions weighted by the coefficient values: timbre = c1 x b1 + c2 x b2 + ... + c12 x b12, where c1 to c12 represent the 12 coefficients and b1 to b12 the 12 basis functions as displayed below. Timbre vectors are best used in comparison with each other.

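My understanding is that the b vector ({b1...b12}) is what the getTimbre method of your API returns. But where do the {c1...c12} coefficients come from? I don't understand how to get a scalar timbre from the vector timbre (mainly because your analysis API is closed source). Can you help me with this?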

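Please note that answers on this site come from volunteers. For official support for a library, you need to contact the publisher directly.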

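b1 … b12 are not a result of the audio analysis; they only describe what the analysis measures. They are fixed constants, pictured in the documentation's image of the 12 basis functions.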

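The vector of scalars c1 … c12 is what the analyzer produces; these are the 12 values in the timbre vector. Of course, 12 numbers cannot describe a sound perfectly. Multiplying the scalars by the basis functions will not reproduce the original music, because there is not enough data there; it is only an approximation. You will probably get a similar "mood" from each segment, though, so it could be fun to try it and listen.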
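
To make the formula from the documentation concrete, here is a minimal Python sketch of the weighted sum c1 x b1 + ... + c12 x b12, plus a plain Euclidean distance for comparing two timbre vectors (the documentation only says they are best compared with each other; Euclidean distance is my assumption, not something it prescribes). Everything named here is hypothetical: `reconstruct_surface`, `timbre_distance`, the placeholder `basis` grids and the coefficient values are made up for illustration. The real basis functions are only shown as an image in the documentation, so the placeholders below are not them.

```python
import math

N_COEFFS = 12  # the documentation describes 12 timbre coefficients


def reconstruct_surface(coeffs, basis):
    """Return the weighted sum of basis grids: sum of coeffs[i] * basis[i].

    Each basis[i] is a 2-D grid (a list of rows) of equal shape.
    """
    if len(coeffs) != len(basis):
        raise ValueError("need exactly one coefficient per basis function")
    rows, cols = len(basis[0]), len(basis[0][0])
    surface = [[0.0] * cols for _ in range(rows)]
    for c, b in zip(coeffs, basis):
        for r in range(rows):
            for k in range(cols):
                surface[r][k] += c * b[r][k]
    return surface


def timbre_distance(t1, t2):
    """Euclidean distance between two timbre vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t1, t2)))


# Placeholder basis, NOT the real Echo Nest basis functions:
# basis[i] is a 4x3 grid that is 1.0 in cell number i and 0.0 elsewhere.
basis = [
    [[1.0 if r * 3 + k == i else 0.0 for k in range(3)] for r in range(4)]
    for i in range(N_COEFFS)
]

# Made-up coefficient vectors standing in for two segments' timbre values.
seg_a = [42.0, 15.5, -30.0, 8.0, 0.0, -12.5, 3.0, 1.0, -4.0, 6.0, 0.5, -2.0]
seg_b = [45.0, 19.5, -30.0, 8.0, 0.0, -12.5, 3.0, 1.0, -4.0, 6.0, 0.5, -2.0]

surface_a = reconstruct_surface(seg_a, basis)
print(surface_a[0])                   # first row of the approximated surface
print(timbre_distance(seg_a, seg_b))  # how far apart the two segments are
```

With real basis functions in place of the placeholders, `surface_a` would be the approximate spectro-temporal surface of the segment. In practice you rarely need that step: comparing the coefficient vectors directly, as `timbre_distance` does, is usually enough to tell whether two segments sound alike.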