How to highlight text on a website in real time as the audio narrates it

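I am trying to figure out which technology to use to highlight text in sync with audio, much like what https://speechify.com/ does.

This assumes that I can run a TTS algorithm and convert text to speech. I have tried multiple sources, but I cannot pin down the exact technology or methodology for highlighting text as the audio speaks it.

Any help would be much appreciated. I have already wasted 2 days on the internet trying to figure this out, with no luck :(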

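A simple approach is to use the event listener provided by the SpeechSynthesisUtterance boundary event to highlight words with vanilla JS. The emitted event gives us character indices, so there is no need to go crazy with regexes or super AI stuff :)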

Before anything else, make sure the API is available

const synth = window.speechSynthesis
if (!synth) {
  console.error('no tts for you!')
  // `return` is only valid inside a function (e.g. a click handler, as in the complete example)
  return
}
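If you want a slightly stricter check, you could also confirm that the utterance constructor exists. This is only a sketch and not part of the original snippet: `hasSpeechSynthesis` is a hypothetical helper name, and it takes the global object as a parameter (an assumption made so it can be tried outside a browser too).

```javascript
// Hypothetical helper: true only when both the synthesizer and the
// utterance constructor are present on the given global object.
function hasSpeechSynthesis(env = globalThis) {
  return (
    typeof env === 'object' &&
    env !== null &&
    'speechSynthesis' in env &&
    typeof env.SpeechSynthesisUtterance === 'function'
  )
}
```

In a browser you would call it as `hasSpeechSynthesis(window)` before creating any utterance.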

The TTS utterance emits a 'boundary' event, which we can use to highlight the text.

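const text = document.getElementById('text')
// Keep the plain text around so every highlight starts from a clean copy
const originalText = text.innerText
const utterance = new SpeechSynthesisUtterance(originalText)
utterance.addEventListener('boundary', event => {
  // charIndex: where the spoken chunk starts; charLength: how long it is
  const { charIndex, charLength } = event
  // highlight() is defined in the complete example
  text.innerHTML = highlight(originalText, charIndex, charIndex + charLength)
})
synth.speak(utterance)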
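One caveat, stated as an assumption rather than a guarantee: some speech engines may not report `charLength` on the boundary event, in which case `charIndex + charLength` is not a usable end position. A minimal sketch of a fallback is below. `wordRangeAt` is a hypothetical helper name, and it does use one tiny regex to find the next whitespace character.

```javascript
// Hypothetical helper: turn a boundary event's charIndex/charLength into a
// { from, to } range. When charLength is missing or zero, scan forward to
// the next whitespace character (or the end of the text) instead.
function wordRangeAt(text, charIndex, charLength) {
  if (Number.isInteger(charLength) && charLength > 0) {
    return { from: charIndex, to: Math.min(charIndex + charLength, text.length) }
  }
  // search() returns -1 when there is no whitespace left in the text
  const offset = text.slice(charIndex).search(/\s/)
  const to = offset === -1 ? text.length : charIndex + offset
  return { from: charIndex, to }
}
```

Inside the listener you could then write `const { from, to } = wordRangeAt(originalText, event.charIndex, event.charLength)` and pass `from` and `to` to the highlighting function.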

Complete example:

const btn = document.getElementById("btn")

const highlight = (text, from, to) => {
  let replacement = highlightBackground(text.slice(from, to))
  return text.substring(0, from) + replacement + text.substring(to)
}
const highlightBackground = sample => `<span style="background-color:yellow;">${sample}</span>`

btn && btn.addEventListener('click', () => {
  const synth = window.speechSynthesis
  if (!synth) {
    console.error('no tts')
    return
  }
  let text = document.getElementById('text')
  let originalText = text.innerText
  let utterance = new SpeechSynthesisUtterance(originalText)
  utterance.addEventListener('boundary', event => {
    const { charIndex, charLength } = event
    text.innerHTML = highlight(originalText, charIndex, charIndex + charLength)
  })
  synth.speak(utterance)
})

CodeSandbox link

This is very basic, and you can (and should) improve it.
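One possible improvement, as a sketch: the vanilla example reads `innerText` and writes the result back through `innerHTML`, so characters such as `<` or `&` in the narrated text end up being parsed as markup. Escaping each piece before concatenating avoids that. `escapeHtml` and `highlightSafe` are hypothetical names, not part of the example above.

```javascript
// Escape the characters that have special meaning in HTML.
const escapeHtml = s =>
  s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')

// Same idea as highlight(), but every slice of the text is escaped first,
// so only the wrapping <span> is treated as markup.
const highlightSafe = (text, from, to) =>
  escapeHtml(text.slice(0, from)) +
  `<span style="background-color:yellow;">${escapeHtml(text.slice(from, to))}</span>` +
  escapeHtml(text.slice(to))
```

It takes the same arguments as `highlight`, so it can be swapped in inside the boundary listener.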

Edit

Oops, I forgot that this was tagged as ReactJs. Here is the same example with React (the CodeSandbox link is in the comments):

import React from "react";

const ORIGINAL_TEXT =
  "Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world.";

const splitText = (text, from, to) => [
  text.slice(0, from),
  text.slice(from, to),
  text.slice(to)
];

const HighlightedText = ({ text, from, to }) => {
  const [start, highlight, finish] = splitText(text, from, to);
  return (
    <p>
      {start}
      <span style={{ backgroundColor: "yellow" }}>{highlight}</span>
      {finish}
    </p>
  );
};

export default function App() {
  const [highlightSection, setHighlightSection] = React.useState({
    from: 0,
    to: 0
  });
  const handleClick = () => {
    const synth = window.speechSynthesis;
    if (!synth) {
      console.error("no tts");
      return;
    }

    let utterance = new SpeechSynthesisUtterance(ORIGINAL_TEXT);
    utterance.addEventListener("boundary", (event) => {
      const { charIndex, charLength } = event;
      setHighlightSection({ from: charIndex, to: charIndex + charLength });
    });
    synth.speak(utterance);
  };

  return (
    <div className="App">
      <HighlightedText text={ORIGINAL_TEXT} {...highlightSection} />
      <button onClick={handleClick}>click me</button>
    </div>
  );
}
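Both versions share the same speaking logic, so it could be pulled out into a framework-agnostic function. The sketch below is my own assumption about how one might structure it, not something from the sandbox: `speakWithHighlight` is a hypothetical name, and `synth` and `Utterance` are passed in so the function can be tried with fakes outside a browser. It also cancels any narration still in progress, ignores sentence boundaries, and clears the highlight when the utterance ends.

```javascript
// Hypothetical helper. In a browser, pass window.speechSynthesis as `synth`
// and SpeechSynthesisUtterance as `Utterance`.
function speakWithHighlight({ synth, Utterance, text, onRange }) {
  // Stop anything that is still being spoken, e.g. after a second click.
  synth.cancel()
  const utterance = new Utterance(text)
  utterance.addEventListener('boundary', event => {
    // Boundary events are fired for words and for sentences; keep only words.
    if (event.name !== 'word') return
    const { charIndex, charLength = 0 } = event
    onRange({ from: charIndex, to: charIndex + charLength })
  })
  // Clear the highlight once narration has finished.
  utterance.addEventListener('end', () => onRange({ from: 0, to: 0 }))
  synth.speak(utterance)
  return utterance
}
```

In the React component, `handleClick` could then call `speakWithHighlight({ synth: window.speechSynthesis, Utterance: SpeechSynthesisUtterance, text: ORIGINAL_TEXT, onRange: setHighlightSection })`.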