Reduce messy words into word seeds
For example, song genres from the Spotify API:
['alternative rock', 'comic', 'funk rock', 'garage rock', 'indie rock', 'pop rock', 'post-grunge', 'rock']
['g funk', 'gangster rap', 'hip hop', 'pop rap', 'rap', 'west coast rap']
['canadian pop', 'dance pop', 'pop', 'pop christmas']
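The three lists represent the genres of three songs. But genres like these look messy, and I would like to easily "extract" a "genre seed" from each, i.e. the three songs are
rock
rap
pop
respectively.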
How can I reduce these messy words into word seeds?
Thanks
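Well, if you have a list of seeds, you can, for example, count how many times each seed occurs in the genres and return the one with the biggest weight.
Suppose the list of seeds is called "seeds" and the list of genres is called "genres". We cross-check every seed/genre combination and add 1 to a seed's weight whenever that seed appears in a genre.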
```python
def max_seed_return(seeds, genres):
    # count how many genres contain each seed (substring match)
    weights = {seed: 0 for seed in seeds}
    for genre in genres:
        for seed in seeds:
            if seed in genre:
                weights[seed] += 1
    max_weight, result = 0, None
    # pick the seed with the biggest weight
    for seed, seed_weight in weights.items():
        if seed_weight > max_weight:
            max_weight = seed_weight
            result = seed
    # return it, or None if no seed is found in the genres
    return result
```
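A quick way to try the function on the three genre lists from the question. The question never gives an actual seed list, so `["rock", "rap", "pop"]` below is an assumption. Note that `seed in genre` is a plain substring test, so a seed like "rap" would also match a genre such as "trap". The second function, `max_seed_by_word`, is a hypothetical variant (not part of the answer above) that only counts a seed when it starts and ends on a word boundary, using Python's `re` module.

```python
import re


def max_seed_return(seeds, genres):
    # substring version, as in the answer above
    weights = {seed: 0 for seed in seeds}
    for genre in genres:
        for seed in seeds:
            if seed in genre:
                weights[seed] += 1
    max_weight, result = 0, None
    for seed, seed_weight in weights.items():
        if seed_weight > max_weight:
            max_weight, result = seed_weight, seed
    return result


def max_seed_by_word(seeds, genres):
    # hypothetical variant: the seed must be a whole word (or phrase) of the genre
    weights = {seed: 0 for seed in seeds}
    for genre in genres:
        for seed in seeds:
            # \b on both sides: "rap" matches "gangster rap" but not "trap"
            if re.search(r"\b" + re.escape(seed) + r"\b", genre):
                weights[seed] += 1
    max_weight, result = 0, None
    for seed, seed_weight in weights.items():
        if seed_weight > max_weight:
            max_weight, result = seed_weight, seed
    return result


# the three genre lists from the question
songs = [
    ['alternative rock', 'comic', 'funk rock', 'garage rock', 'indie rock', 'pop rock', 'post-grunge', 'rock'],
    ['g funk', 'gangster rap', 'hip hop', 'pop rap', 'rap', 'west coast rap'],
    ['canadian pop', 'dance pop', 'pop', 'pop christmas'],
]
seeds = ["rock", "rap", "pop"]  # assumed seed list

for genres in songs:
    print(max_seed_return(seeds, genres))  # prints rock, then rap, then pop
```

Ties go to whichever tied seed comes first in `seeds`, because the loop only replaces the result on a strictly bigger weight.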