
feat: read matched word aloud on hover #2869

Open
snomiao wants to merge 6 commits into birchill:main from snomiao:feat/auto-speak-minimal

Conversation

@snomiao

@snomiao snomiao commented Apr 7, 2026

Summary

Add an opt-in Audio options section with a single checkbox that auto-reads the matched Japanese word aloud when the popup is shown on hover, using the browser's built-in Web Speech API. Defaults to off.

The spoken text is the actual matched surface form (so 寄与しませんでした reads as the whole inflected phrase, not just 寄与), falling back to the dictionary headword reading if no surface match is available.

Why

A common request from learners is to hear how a word actually sounds. The browser's built-in speechSynthesis provides this for free with no additional dependencies, no network calls, no user data leaving the device, and no extra permissions. Wiring it into the existing hover/popup flow is a small, contained change that gives users immediate value while staying out of the way of everyone who doesn't want it.

Behaviour details

  • Speech is triggered from commitPopup() so it rides the existing 400 ms ghost→hover delay (acts as a natural debounce).
  • Same word is not re-spoken while the popup stays on it (#lastSpokenText dedupe).
  • Any in-flight utterance is cancelled when the popup is hidden, and again at the start of the next utterance, so changing words mid-speech feels responsive.
  • The Web Speech API is feature-detected at runtime; environments without speechSynthesis are silently skipped.
  • A ja-JP voice is preferred when available, otherwise we fall back to whatever voice the browser picks for lang='ja-JP'.
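The behaviour above can be sketched as a small, testable unit. This is an illustration only: `SpeechLike` and `AutoSpeaker` are assumed names standing in for the PR's actual content-script code, with `window.speechSynthesis` abstracted behind an interface so the dedupe and cancellation logic are easy to see in isolation.

```typescript
// Illustrative sketch only; the real implementation lives in
// src/content/content.ts and talks to window.speechSynthesis directly.
interface SpeechLike {
  cancel(): void;
  speak(text: string, lang: string): void;
}

class AutoSpeaker {
  #lastSpokenText: string | undefined;

  // synth is undefined when the Web Speech API is unavailable.
  constructor(private synth: SpeechLike | undefined) {}

  speakCurrentReading(text: string | undefined) {
    // Feature detection + dedupe: skip silently when there is no speech
    // support, no text, or the popup is still on the same word.
    if (!this.synth || !text || text === this.#lastSpokenText) {
      return;
    }

    // Cancel any in-flight utterance so changing words mid-speech feels
    // responsive, then speak the new one. (The real code additionally
    // prefers an installed ja-JP voice when one is available.)
    this.synth.cancel();
    this.synth.speak(text, 'ja-JP');
    this.#lastSpokenText = text;
  }

  hidePopup() {
    this.#lastSpokenText = undefined;
    this.synth?.cancel();
  }
}
```

Because the synthesis backend is injected, the dedupe ("same word not re-spoken") and hide/reset behaviour can be exercised without a browser.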

Files changed

  • src/common/content-config-params.ts, src/common/config.ts, src/content/content-config.ts — new autoSpeak boolean setting (default false), wired through the existing storage/snapshot pattern that mirrors readingOnly.
  • src/content/content.ts — new speakCurrentReading() helper, hook in commitPopup(), cancel + reset in hidePopup(), #lastSpokenText dedupe field.
  • src/options/AudioSettings.tsx (new) + src/options/OptionsPage.tsx — new Audio settings section with one CheckboxRow, placed after Popup interactivity.
  • _locales/en/messages.json — two new i18n keys (options_audio_heading, options_auto_speak).

Total: +136 lines, 0 deletions across 7 files.

Test plan

  • pnpm test:unit — 124/124 pass
  • pnpm build:{firefox,chrome,edge,safari,thunderbird} — all 5 targets compile clean (only the pre-existing Rspack code-splitting size warning)
  • Load the built extension, enable Audio → Read matched word aloud on hover in options
  • Hover a Japanese word → reading is spoken once after the popup appears
  • Move to a different word → previous utterance is cancelled, new one plays
  • Hover the same word repeatedly → not re-spoken while popup remains
  • Move mouse away to dismiss popup → speech stops
  • Disable the setting → no speech, no behaviour change anywhere else

🤖 Generated with Claude Code

Add an "Audio" options section with a single checkbox that, when
enabled, reads the matched Japanese word aloud when its popup is
shown on hover, using the browser's built-in Web Speech API.

The spoken text is the actual matched surface form (including any
inflection, e.g. 寄与しませんでした) so users hear the whole word as
it appears on the page, falling back to the dictionary headword
reading if no surface match is available.

The same utterance is not repeated while the popup remains on the
same word, and any in-flight speech is cancelled when the popup is
hidden. The setting defaults to off.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 7, 2026 10:08

Copilot AI left a comment


Pull request overview

Adds an opt-in “Audio” option to automatically speak the matched Japanese word when the hover popup is committed, using the browser Web Speech API.

Changes:

  • Introduces a new autoSpeak boolean config setting (default false) and wires it through the existing config/content snapshot flow.
  • Adds a new Options UI section (“Audio”) with a single checkbox to enable auto-speaking on hover.
  • Implements speech synthesis in the content script with basic dedupe and cancellation behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Summary per file:

  • src/options/OptionsPage.tsx — Adds the new AudioSettings section into the Options page layout.
  • src/options/AudioSettings.tsx — New options UI for toggling autoSpeak.
  • src/content/content.ts — Triggers speech on commitPopup(), adds dedupe state, and cancels speech on hide.
  • src/content/content-config.ts — Exposes autoSpeak via ContentConfig.
  • src/common/content-config-params.ts — Adds autoSpeak to the shared content-config parameter interface.
  • src/common/config.ts — Persists autoSpeak in sync storage and includes it in the contentConfig snapshot.
  • _locales/en/messages.json — Adds i18n strings for the new Audio options section/checkbox.


Comment thread src/content/content.ts
Comment on lines +2636 to +2646
    this.#lastSpokenText = undefined;

    if (
      typeof window !== 'undefined' &&
      typeof window.speechSynthesis !== 'undefined'
    ) {
      try {
        window.speechSynthesis.cancel();
      } catch {
        // Ignore.
      }

Copilot AI Apr 7, 2026


hidePopup() calls window.speechSynthesis.cancel() unconditionally whenever the popup is hidden. This will cancel any page-initiated speech synthesis even when the user has not enabled the auto-speak option. Gate the cancel/reset logic behind this.#config.autoSpeak and/or only cancel when the content script previously started an utterance (e.g., track the last created SpeechSynthesisUtterance).

Author


Fixed: hidePopup() now only calls speechSynthesis.cancel() when we previously created an utterance ourselves (tracked via #lastUtterance), so page-initiated speech is left alone.

Comment thread src/content/content.ts
Comment on lines +2608 to +2616
    if (!text || text === this.#lastSpokenText) {
      return;
    }

    this.#lastSpokenText = text;

    try {
      window.speechSynthesis.cancel();
      const utterance = new SpeechSynthesisUtterance(text);

Copilot AI Apr 7, 2026


#lastSpokenText is set before the try { ... } block. If SpeechSynthesisUtterance construction or speechSynthesis.speak() throws, the catch will swallow the error but the dedupe value remains set, preventing subsequent attempts to speak the same text. Consider only updating #lastSpokenText after a successful speak(), or resetting it in the catch path.

Author


Fixed: #lastSpokenText is now only set after a successful speak() call, and reset to undefined in the catch path so subsequent attempts to speak the same text aren't blocked.

Comment thread src/common/config.ts
Comment on lines +1178 to +1194
    // autoSpeak: Defaults to false

    get autoSpeak(): boolean {
      return !!this.#settings.autoSpeak;
    }

    set autoSpeak(value: boolean) {
      if (
        typeof this.#settings.autoSpeak !== 'undefined' &&
        this.#settings.autoSpeak === value
      ) {
        return;
      }

      this.#settings.autoSpeak = value;
      void browser.storage.sync.set({ autoSpeak: value });
    }

Copilot AI Apr 7, 2026


A new persisted setting (autoSpeak) was added, but the unit tests for Config don't appear to cover its default value or its storage behavior. Since src/common/config.test.ts already verifies defaults for other settings, add assertions for config.autoSpeak === false by default and (ideally) that toggling it writes to browser.storage.sync and flows into contentConfig.

Author


Added a default-value assertion (expect(config.autoSpeak).toEqual(false)) in config.test.ts.

Comment thread _locales/en/messages.json
Comment on lines +1370 to +1377
"options_audio_heading": {
"message": "Audio",
"description": "Heading for the audio / speech settings section"
},
"options_auto_speak": {
"message": "Read matched word aloud on hover",
"description": "Label for the checkbox enabling automatic text-to-speech of the matched word when the popup is shown on hover"
},

Copilot AI Apr 7, 2026


New i18n keys were added here, but they are missing from the other shipped locales (_locales/ja/messages.json and _locales/zh_CN/messages.json). Since browser.i18n.getMessage returns an empty string for missing keys, the Options UI will show a blank heading/label for those locales. Add corresponding entries (even as untranslated placeholders) in the other locale files.

Author


Added options_audio_heading and options_auto_speak entries to both _locales/ja/messages.json and _locales/zh_CN/messages.json in a separate i18n commit.

@birtles
Member

birtles commented Apr 8, 2026

Hi! Thanks for doing this. I gave it a try and it's pretty good. On Windows, however, the accuracy is not great. For example, it reads 四畳半 with the wrong reading rather than よじょうはん. I came across many other examples where it wasn't right, and I suspect Windows probably has better platform support than the other OSes.

The code in the PR is pretty good but it's still missing some things like the localization of UI strings, and there are a few minor issues from your Copilot review.

I'm hiring someone to help out with audio support starting next month and I think I'd like to roll this PR into a bigger audio feature that provides the option of playing pre-recorded samples. That way the user can choose between remote samples (more accurate) and local samples (less lag, less network traffic, etc.).

snomiao added 2 commits April 8, 2026 14:19
- Only cancel speech synthesis we started ourselves on hidePopup
- Update lastSpokenText only after a successful speak() call
- Add autoSpeak default-value test in config.test.ts
@snomiao
Author

snomiao commented Apr 8, 2026

Thanks for the review! I've pushed two follow-up commits:

  1. Copilot review fixes — only cancel speech we initiated ourselves (track #lastUtterance), only update #lastSpokenText after a successful speak() call, and added a default-value test for autoSpeak in config.test.ts.
  2. i18n — added options_audio_heading and options_auto_speak translations for ja and zh_CN.

Re: the Windows accuracy issue (四畳半 → よじょうはん) and the bigger audio feature plan — totally agree, the platform TTS quality varies a lot and pre-recorded samples are clearly the right answer for dictionary readings.

I've actually been experimenting with this on my fork too. My current thinking is that there's room for a layered approach where the user can pick a TTS engine, with Web Speech API as the free default:

  • Web Speech API (current PR) — free, zero-setup, works everywhere, good enough as a baseline. Quality varies by OS but it's a decent starting point.
  • Pre-recorded samples — the most accurate option for dictionary headwords, exactly as you described (remote for accuracy, local for low-latency / offline / less network).
  • Local voice LLMs via WASM — caveat: this only really works well on post-2025 devices that have dedicated AI chips. Not viable as a default but a nice option for users with the hardware.
  • Online APIs / BYOK — for longer sentences and example translations, users could plug in their own Gemini/OpenAI key to use those TTS services. Useful for the example sentence audio case rather than headwords.

For this PR I'd suggest keeping it scoped to Web Speech API as a minimal foundation — it's a good free baseline that everyone gets, and the engine selector + pre-recorded sample pipeline can layer on top once the bigger audio feature lands next month. Happy to rebase / restructure / wait, whatever works best for how you want to land the larger feature. Let me know!

@birtles
Member

birtles commented Apr 8, 2026

Thanks for those fixes. This looks great. I hope you don't mind if I don't merge it just yet, however.

I want to tackle this as part of the bigger feature. I don't want to ship just the platform TTS since it will cause too much churn for users as we change the available settings and their defaults. Instead I'd like to prepare both options with suitable default settings and ship them together.

Once the other half is ready, I think this PR should be mostly usable as-is.

If we need to merge it sooner to avoid bitrot then I'd want to drop the options screen part so that it's temporarily disabled.

birtles requested in birchill#2869 that we drop the options screen part so the
feature is temporarily disabled while the larger audio feature
(pre-recorded samples + engine selection) is being prepared. The
underlying autoSpeak config field, i18n strings and content-script
speech logic remain in place so this can be re-enabled by restoring
<AudioSettings /> in OptionsPage.
@snomiao
Author

snomiao commented Apr 9, 2026

Sounds good — totally understand wanting to ship both halves together to avoid settings churn.

To keep this PR from bitrotting in the meantime, I've gone ahead and dropped the options screen part in 6e46f82. The Audio settings section is removed from OptionsPage.tsx and AudioSettings.tsx is deleted, but the underlying autoSpeak config field, the _locales strings, and the content-script speech logic all stay in place. Re-enabling is just a one-line restore of <AudioSettings /> (or wiring it into the new audio settings section) once the bigger feature is ready.

Happy to leave this open and rebase as needed, or close it and reopen later — whichever fits your workflow best.
