feat: read matched word aloud on hover #2869
Add an "Audio" options section with a single checkbox that, when enabled, reads the matched Japanese word aloud when its popup is shown on hover, using the browser's built-in Web Speech API. The spoken text is the actual matched surface form (including any inflection, e.g. 寄与しませんでした) so users hear the whole word as it appears on the page, falling back to the dictionary headword reading if no surface match is available. The same utterance is not repeated while the popup remains on the same word, and any in-flight speech is cancelled when the popup is hidden. The setting defaults to off.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Adds an opt-in “Audio” option to automatically speak the matched Japanese word when the hover popup is committed, using the browser Web Speech API.
Changes:
- Introduces a new `autoSpeak` boolean config setting (default `false`) and wires it through the existing config/content snapshot flow.
- Adds a new Options UI section ("Audio") with a single checkbox to enable auto-speaking on hover.
- Implements speech synthesis in the content script with basic dedupe and cancellation behavior.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/options/OptionsPage.tsx` | Adds the new AudioSettings section into the Options page layout. |
| `src/options/AudioSettings.tsx` | New options UI for toggling `autoSpeak`. |
| `src/content/content.ts` | Triggers speech on `commitPopup()`, adds dedupe state, and cancels speech on hide. |
| `src/content/content-config.ts` | Exposes `autoSpeak` via `ContentConfig`. |
| `src/common/content-config-params.ts` | Adds `autoSpeak` to the shared content-config parameter interface. |
| `src/common/config.ts` | Persists `autoSpeak` in sync storage and includes it in the `contentConfig` snapshot. |
| `_locales/en/messages.json` | Adds i18n strings for the new Audio options section/checkbox. |
```ts
this.#lastSpokenText = undefined;

if (
  typeof window !== 'undefined' &&
  typeof window.speechSynthesis !== 'undefined'
) {
  try {
    window.speechSynthesis.cancel();
  } catch {
    // Ignore.
  }
}
```
hidePopup() calls window.speechSynthesis.cancel() unconditionally whenever the popup is hidden. This will cancel any page-initiated speech synthesis even when the user has not enabled the auto-speak option. Gate the cancel/reset logic behind this.#config.autoSpeak and/or only cancel when the content script previously started an utterance (e.g., track the last created SpeechSynthesisUtterance).
Fixed: hidePopup() now only calls speechSynthesis.cancel() when we previously created an utterance ourselves (tracked via #lastUtterance), so page-initiated speech is left alone.
```ts
if (!text || text === this.#lastSpokenText) {
  return;
}

this.#lastSpokenText = text;

try {
  window.speechSynthesis.cancel();
  const utterance = new SpeechSynthesisUtterance(text);
```
#lastSpokenText is set before the try { ... } block. If SpeechSynthesisUtterance construction or speechSynthesis.speak() throws, the catch will swallow the error but the dedupe value remains set, preventing subsequent attempts to speak the same text. Consider only updating #lastSpokenText after a successful speak(), or resetting it in the catch path.
Fixed: #lastSpokenText is now only set after a successful speak() call, and reset to undefined in the catch path so subsequent attempts to speak the same text aren't blocked.
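The corrected ordering can be sketched like this. `SpeakDedupe` and `speakFn` are hypothetical names used so the dedupe logic is testable without a browser speech engine; the real code calls `speechSynthesis.speak()` directly:

```typescript
// Sketch of the fixed dedupe ordering: record #lastSpokenText only
// after speak() succeeds, and clear it if speaking throws, so a
// transient failure doesn't permanently block that word.
class SpeakDedupe {
  #lastSpokenText: string | undefined;
  readonly speakFn: (text: string) => void;

  constructor(speakFn: (text: string) => void) {
    this.speakFn = speakFn;
  }

  trySpeak(text: string) {
    if (!text || text === this.#lastSpokenText) {
      return;
    }
    try {
      this.speakFn(text);
      // Only set after a successful speak() call.
      this.#lastSpokenText = text;
    } catch {
      // Reset so a subsequent attempt at the same text isn't blocked.
      this.#lastSpokenText = undefined;
    }
  }
}
```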
```ts
// autoSpeak: Defaults to false

get autoSpeak(): boolean {
  return !!this.#settings.autoSpeak;
}

set autoSpeak(value: boolean) {
  if (
    typeof this.#settings.autoSpeak !== 'undefined' &&
    this.#settings.autoSpeak === value
  ) {
    return;
  }

  this.#settings.autoSpeak = value;
  void browser.storage.sync.set({ autoSpeak: value });
}
```
A new persisted setting (autoSpeak) was added, but the unit tests for Config don't appear to cover its default value or its storage behavior. Since src/common/config.test.ts already verifies defaults for other settings, add assertions for config.autoSpeak === false by default and (ideally) that toggling it writes to browser.storage.sync and flows into contentConfig.
Added a default-value assertion (expect(config.autoSpeak).toEqual(false)) in config.test.ts.
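The default and no-op-write behaviour under test can be illustrated with a stand-in for the getter/setter above. `MiniConfig` and its `writes` array are illustrative only (the real test in `config.test.ts` exercises the actual `Config` class against mocked `browser.storage.sync`):

```typescript
// Stand-in mirroring the autoSpeak getter/setter: writes are recorded
// in an array instead of browser.storage.sync. Hypothetical names.
type Settings = { autoSpeak?: boolean };

class MiniConfig {
  #settings: Settings = {};
  writes: Array<{ autoSpeak: boolean }> = [];

  get autoSpeak(): boolean {
    // Undefined (never set) reads as the default: false.
    return !!this.#settings.autoSpeak;
  }

  set autoSpeak(value: boolean) {
    // Skip the storage write when the value hasn't actually changed.
    if (
      typeof this.#settings.autoSpeak !== 'undefined' &&
      this.#settings.autoSpeak === value
    ) {
      return;
    }
    this.#settings.autoSpeak = value;
    this.writes.push({ autoSpeak: value });
  }
}
```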
```json
"options_audio_heading": {
  "message": "Audio",
  "description": "Heading for the audio / speech settings section"
},
"options_auto_speak": {
  "message": "Read matched word aloud on hover",
  "description": "Label for the checkbox enabling automatic text-to-speech of the matched word when the popup is shown on hover"
},
```
New i18n keys were added here, but they are missing from the other shipped locales (_locales/ja/messages.json and _locales/zh_CN/messages.json). Since browser.i18n.getMessage returns an empty string for missing keys, the Options UI will show a blank heading/label for those locales. Add corresponding entries (even as untranslated placeholders) in the other locale files.
Added options_audio_heading and options_auto_speak entries to both _locales/ja/messages.json and _locales/zh_CN/messages.json in a separate i18n commit.
Hi! Thanks for doing this. I gave it a try and it's pretty good. On Windows, however, the accuracy is not great. For example, it reads 四畳半 as よんじょうはん instead of よじょうはん. There are many other examples I came across where it wasn't right and I suppose Windows probably has better platform support than others. The code in the PR is pretty good but it's still missing some things like the localization of UI strings, and there are a few minor issues from your Copilot review. I'm hiring someone to help out with audio support starting next month and I think I'd like to roll this PR into a bigger audio feature that provides the option of playing pre-recorded samples. That way the user can choose between remote samples (more accurate) and local samples (less lag, less network traffic, etc.).
- Only cancel speech synthesis we started ourselves on `hidePopup`
- Update `lastSpokenText` only after a successful `speak()` call
- Add `autoSpeak` default-value test in `config.test.ts`
Thanks for the review! I've pushed two follow-up commits:
Re: the Windows accuracy issue (四畳半 → よんじょうはん) and the bigger audio feature plan — totally agree, the platform TTS quality varies a lot and pre-recorded samples are clearly the right answer for dictionary readings. I've actually been experimenting with this on my fork too. My current thinking is that there's room for a layered approach where the user can pick a TTS engine, with Web Speech API as the free default.
For this PR I'd suggest keeping it scoped to Web Speech API as a minimal foundation — it's a good free baseline that everyone gets, and the engine selector + pre-recorded sample pipeline can layer on top once the bigger audio feature lands next month. Happy to rebase / restructure / wait, whatever works best for how you want to land the larger feature. Let me know!
Thanks for those fixes. This looks great. I hope you don't mind if I don't merge it just yet, however. I want to tackle this as part of the bigger feature. I don't want to ship just the platform TTS since it will cause too much churn for users as we change the available settings and their defaults. Instead I'd like to prepare both options with suitable default settings and ship them together. Once the other half is ready, I think this PR should be mostly usable as-is. If we need to merge it sooner to avoid bitrot then I'd want to drop the options screen part so that it's temporarily disabled.
birtles requested in birchill#2869 that we drop the options screen part so the feature is temporarily disabled while the larger audio feature (pre-recorded samples + engine selection) is being prepared. The underlying autoSpeak config field, i18n strings and content-script speech logic remain in place so this can be re-enabled by restoring <AudioSettings /> in OptionsPage.
Sounds good — totally understand wanting to ship both halves together to avoid settings churn. To keep this PR from bitrotting in the meantime, I've gone ahead and dropped the options screen part in 6e46f82. The Audio settings section is removed from `OptionsPage`. Happy to leave this open and rebase as needed, or close it and reopen later — whichever fits your workflow best.
Summary
Add an opt-in Audio options section with a single checkbox that auto-reads the matched Japanese word aloud when the popup is shown on hover, using the browser's built-in Web Speech API. Defaults to off.
The spoken text is the actual matched surface form (so 寄与しませんでした reads as the whole inflected phrase, not just 寄与), falling back to the dictionary headword reading if no surface match is available.
Why
A common request from learners is to hear how a word actually sounds. The browser's built-in `speechSynthesis` provides this for free with no additional dependencies, no network calls, no user data leaving the device, and no extra permissions. Wiring it into the existing hover/popup flow is a small, contained change that gives users immediate value while staying out of the way of everyone who doesn't want it.
Behaviour details
- Speech is triggered in `commitPopup()` so it rides the existing 400 ms ghost→hover delay (acts as a natural debounce).
- The same text is not spoken again while the popup stays on the same word (`#lastSpokenText` dedupe).
- Environments without `speechSynthesis` are silently skipped.
- A `ja-JP` voice is preferred when available, otherwise we fall back to whatever voice the browser picks for `lang='ja-JP'`.
Files changed
- `src/common/content-config-params.ts`, `src/common/config.ts`, `src/content/content-config.ts` — new `autoSpeak` boolean setting (default `false`), wired through the existing storage/snapshot pattern that mirrors `readingOnly`.
- `src/content/content.ts` — `speakCurrentReading()` helper, hook in `commitPopup()`, cancel + reset in `hidePopup()`, `#lastSpokenText` dedupe field.
- `src/options/AudioSettings.tsx` (new) + `src/options/OptionsPage.tsx` — new Audio settings section with one `CheckboxRow`, placed after Popup interactivity.
- `_locales/en/messages.json` — two new i18n keys (`options_audio_heading`, `options_auto_speak`).

Total: +136 lines, 0 deletions across 7 files.
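The `ja-JP` voice preference described under Behaviour details can be sketched like this. `VoiceLike` mirrors the relevant shape of `SpeechSynthesisVoice`, and `pickJapaneseVoice` is an illustrative helper name, not the PR's actual code:

```typescript
// Sketch of the voice-selection preference: exact ja-JP match first,
// then any Japanese voice; returning undefined lets the browser pick
// a voice from utterance.lang. Names here are hypothetical.
interface VoiceLike {
  lang: string;
  name: string;
}

function pickJapaneseVoice(voices: VoiceLike[]): VoiceLike | undefined {
  return (
    voices.find((v) => v.lang === 'ja-JP') ??
    voices.find((v) => v.lang.startsWith('ja'))
  );
}
```

In a browser this would be fed the result of `speechSynthesis.getVoices()`; when it returns `undefined`, setting `utterance.lang = 'ja-JP'` still lets the engine choose a suitable default.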
Test plan
- `pnpm test:unit` — 124/124 pass
- `pnpm build:{firefox,chrome,edge,safari,thunderbird}` — all 5 targets compile clean (only the pre-existing Rspack code-splitting size warning)

🤖 Generated with Claude Code