How to highlight text on a website in real time as the audio narrates it
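I am trying to figure out which technology to use to highlight text in sync with audio, much like what https://speechify.com/ is doing.
This assumes that I am able to run a TTS algorithm and convert text to speech.
I have tried multiple sources, but I cannot pin down the exact technique or method for highlighting text as the audio speaks it.
Any help would be much appreciated. I have already wasted 2 days on the internet trying to figure this out, with no luck :(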
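One simple approach is to use the event listener provided by the SpeechSynthesisUtterance boundary event to highlight words with vanilla JS. The emitted event gives us character indices, so there is no need to go crazy with regexes or super AI stuff :)
First, make sure the API is available: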
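// This check is meant to run inside a function (e.g. a click handler),
// since `return` is only valid there.
const synth = window.speechSynthesis
if (!synth) {
  console.error('no tts for you!')
  return
}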
The TTS utterance emits a 'boundary' event, which we can use to highlight the text.
let text = document.getElementById('text')
// Keep a copy of the plain text so every highlight starts from the original
let originalText = text.innerText
let utterance = new SpeechSynthesisUtterance(originalText)
utterance.addEventListener('boundary', event => {
  // charIndex: where the spoken chunk starts; charLength: how long it is
  const { charIndex, charLength } = event
  text.innerHTML = highlight(originalText, charIndex, charIndex + charLength)
})
synth.speak(utterance)
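One thing to watch for: charLength may be missing from the boundary event in some browsers or older versions, and then charIndex + charLength is NaN and the highlight breaks. Here is a minimal sketch of a fallback, assuming words are separated by whitespace. wordRange is a hypothetical helper name of mine, not part of the Web Speech API.

```javascript
// Hypothetical helper: resolve the [from, to) range of the chunk being spoken.
// Uses charLength when the browser supplies a usable value; otherwise scans
// forward from charIndex to the next whitespace character (an assumption
// about how words are delimited).
function wordRange(text, charIndex, charLength) {
  // Clamp the start so a stray index can never fall outside the text
  const from = Math.max(0, Math.min(charIndex, text.length))
  if (Number.isInteger(charLength) && charLength > 0) {
    return { from, to: Math.min(from + charLength, text.length) }
  }
  // Fallback: the word ends at the next whitespace, or at the end of the text
  const offset = text.slice(from).search(/\s/)
  const to = offset === -1 ? text.length : from + offset
  return { from, to }
}
```

Inside the listener you would then call something like `const { from, to } = wordRange(originalText, event.charIndex, event.charLength)` and pass `from` and `to` to `highlight`. The event also carries a `name` property ('word' or 'sentence'), so you can skip sentence boundaries if you only want word-level highlighting.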
Full example:
const btn = document.getElementById("btn")

// Wrap the characters in [from, to) in a highlighted span
const highlight = (text, from, to) => {
  let replacement = highlightBackground(text.slice(from, to))
  return text.substring(0, from) + replacement + text.substring(to)
}
const highlightBackground = sample => `<span style="background-color:yellow;">${sample}</span>`

btn && btn.addEventListener('click', () => {
  const synth = window.speechSynthesis
  if (!synth) {
    console.error('no tts')
    return
  }
  let text = document.getElementById('text')
  let originalText = text.innerText
  let utterance = new SpeechSynthesisUtterance(originalText)
  // Re-render the text with the current chunk highlighted on each boundary
  utterance.addEventListener('boundary', event => {
    const { charIndex, charLength } = event
    text.innerHTML = highlight(originalText, charIndex, charIndex + charLength)
  })
  synth.speak(utterance)
})
This is very basic, and you can (and should) improve on it.
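One improvement worth considering: the vanilla example writes the text back through innerHTML, so any `<`, `>` or `&` in the narrated text gets interpreted as markup. Here is a rough sketch of an escaped variant, assuming the same yellow span as above. escapeHtml and highlightSafe are names I made up for illustration.

```javascript
// Characters that must be escaped before text is placed into innerHTML
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;'
}

// Hypothetical helper: escape a plain-text string for use inside HTML
const escapeHtml = s => s.replace(/[&<>"']/g, ch => HTML_ESCAPES[ch])

// Hypothetical variant of highlight(): slice first (indices refer to the
// plain text), then escape each piece separately so the offsets stay valid
const highlightSafe = (text, from, to) =>
  escapeHtml(text.slice(0, from)) +
  '<span style="background-color:yellow;">' +
  escapeHtml(text.slice(from, to)) +
  '</span>' +
  escapeHtml(text.slice(to))
```

The order matters here: escaping the whole string first would shift the character positions, and charIndex would no longer line up with the text that is being spoken.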
Edit
Oops, I forgot this was tagged ReactJs. Here is the same example in React (the codesandbox link is in the comments):
import React from "react";

const ORIGINAL_TEXT =
  "Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world.";

// Split the text into the parts before, inside, and after the highlight
const splitText = (text, from, to) => [
  text.slice(0, from),
  text.slice(from, to),
  text.slice(to)
];

const HighlightedText = ({ text, from, to }) => {
  const [start, highlight, finish] = splitText(text, from, to);
  return (
    <p>
      {start}
      <span style={{ backgroundColor: "yellow" }}>{highlight}</span>
      {finish}
    </p>
  );
};

export default function App() {
  // Range of characters currently being spoken
  const [highlightSection, setHighlightSection] = React.useState({
    from: 0,
    to: 0
  });

  const handleClick = () => {
    const synth = window.speechSynthesis;
    if (!synth) {
      console.error("no tts");
      return;
    }
    let utterance = new SpeechSynthesisUtterance(ORIGINAL_TEXT);
    // Update state on each boundary event so React re-renders the highlight
    utterance.addEventListener("boundary", (event) => {
      const { charIndex, charLength } = event;
      setHighlightSection({ from: charIndex, to: charIndex + charLength });
    });
    synth.speak(utterance);
  };

  return (
    <div className="App">
      <HighlightedText text={ORIGINAL_TEXT} {...highlightSection} />
      <button onClick={handleClick}>klik me</button>
    </div>
  );
}