
feat: read matched word aloud on hover #2869

Open
snomiao wants to merge 6 commits into birchill:main from snomiao:feat/auto-speak-minimal

Conversation

@snomiao

@snomiao snomiao commented Apr 7, 2026

Summary

Add an opt-in Audio options section with a single checkbox that auto-reads the matched Japanese word aloud when the popup is shown on hover, using the browser's built-in Web Speech API. Defaults to off.

The spoken text is the actual matched surface form (so 寄与しませんでした reads as the whole inflected phrase, not just 寄与), falling back to the dictionary headword reading if no surface match is available.

Why

A common request from learners is to hear how a word actually sounds. The browser's built-in speechSynthesis provides this for free with no additional dependencies, no network calls, no user data leaving the device, and no extra permissions. Wiring it into the existing hover/popup flow is a small, contained change that gives users immediate value while staying out of the way of everyone who doesn't want it.

Behaviour details

  • Speech is triggered from commitPopup() so it rides the existing 400 ms ghost→hover delay (acts as a natural debounce).
  • Same word is not re-spoken while the popup stays on it (#lastSpokenText dedupe).
  • Any in-flight utterance is cancelled when the popup is hidden, and again at the start of the next utterance, so changing words mid-speech feels responsive.
  • The Web Speech API is feature-detected at runtime; environments without speechSynthesis are silently skipped.
  • A ja-JP voice is preferred when available, otherwise we fall back to whatever voice the browser picks for lang='ja-JP'.
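The behaviour above can be sketched as a small, testable unit. This is an illustration only: `SpeechLike` and `AutoSpeaker` are assumed names standing in for the PR's actual content-script code, with `window.speechSynthesis` abstracted behind an interface so the dedupe and cancellation logic are easy to see in isolation.

```typescript
// Illustrative sketch only; the real implementation lives in
// src/content/content.ts and talks to window.speechSynthesis directly.
interface SpeechLike {
  cancel(): void;
  speak(text: string, lang: string): void;
}

class AutoSpeaker {
  #lastSpokenText: string | undefined;

  // synth is undefined when the Web Speech API is unavailable.
  constructor(private synth: SpeechLike | undefined) {}

  speakCurrentReading(text: string | undefined) {
    // Feature detection + dedupe: skip silently when there is no speech
    // support, no text, or the popup is still on the same word.
    if (!this.synth || !text || text === this.#lastSpokenText) {
      return;
    }

    // Cancel any in-flight utterance so changing words mid-speech feels
    // responsive, then speak the new one. (The real code additionally
    // prefers an installed ja-JP voice when one is available.)
    this.synth.cancel();
    this.synth.speak(text, 'ja-JP');
    this.#lastSpokenText = text;
  }

  hidePopup() {
    this.#lastSpokenText = undefined;
    this.synth?.cancel();
  }
}
```

Because the synthesis backend is injected, the dedupe ("same word not re-spoken") and hide/reset behaviour can be exercised without a browser.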

Files changed

  • src/common/content-config-params.ts, src/common/config.ts, src/content/content-config.ts — new autoSpeak boolean setting (default false), wired through the existing storage/snapshot pattern that mirrors readingOnly.
  • src/content/content.ts — new speakCurrentReading() helper, hook in commitPopup(), cancel + reset in hidePopup(), #lastSpokenText dedupe field.
  • src/options/AudioSettings.tsx (new) + src/options/OptionsPage.tsx — new Audio settings section with one CheckboxRow, placed after Popup interactivity.
  • _locales/en/messages.json — two new i18n keys (options_audio_heading, options_auto_speak).

Total: +136 lines, 0 deletions across 7 files.

Test plan

  • pnpm test:unit — 124/124 pass
  • pnpm build:{firefox,chrome,edge,safari,thunderbird} — all 5 targets compile clean (only the pre-existing Rspack code-splitting size warning)
  • Load the built extension, enable Audio → Read matched word aloud on hover in options
  • Hover a Japanese word → reading is spoken once after the popup appears
  • Move to a different word → previous utterance is cancelled, new one plays
  • Hover the same word repeatedly → not re-spoken while popup remains
  • Move mouse away to dismiss popup → speech stops
  • Disable the setting → no speech, no behaviour change anywhere else

🤖 Generated with Claude Code

Add an "Audio" options section with a single checkbox that, when
enabled, reads the matched Japanese word aloud when its popup is
shown on hover, using the browser's built-in Web Speech API.

The spoken text is the actual matched surface form (including any
inflection, e.g. 寄与しませんでした) so users hear the whole word as
it appears on the page, falling back to the dictionary headword
reading if no surface match is available.

The same utterance is not repeated while the popup remains on the
same word, and any in-flight speech is cancelled when the popup is
hidden. The setting defaults to off.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 7, 2026 10:08

Copilot AI left a comment


Pull request overview

Adds an opt-in “Audio” option to automatically speak the matched Japanese word when the hover popup is committed, using the browser Web Speech API.

Changes:

  • Introduces a new autoSpeak boolean config setting (default false) and wires it through the existing config/content snapshot flow.
  • Adds a new Options UI section (“Audio”) with a single checkbox to enable auto-speaking on hover.
  • Implements speech synthesis in the content script with basic dedupe and cancellation behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Summary per file:

  • src/options/OptionsPage.tsx — Adds the new AudioSettings section into the Options page layout.
  • src/options/AudioSettings.tsx — New options UI for toggling autoSpeak.
  • src/content/content.ts — Triggers speech on commitPopup(), adds dedupe state, and cancels speech on hide.
  • src/content/content-config.ts — Exposes autoSpeak via ContentConfig.
  • src/common/content-config-params.ts — Adds autoSpeak to the shared content-config parameter interface.
  • src/common/config.ts — Persists autoSpeak in sync storage and includes it in the contentConfig snapshot.
  • _locales/en/messages.json — Adds i18n strings for the new Audio options section/checkbox.


Comment thread src/content/content.ts
Comment on lines +2636 to +2646
    this.#lastSpokenText = undefined;

    if (
      typeof window !== 'undefined' &&
      typeof window.speechSynthesis !== 'undefined'
    ) {
      try {
        window.speechSynthesis.cancel();
      } catch {
        // Ignore.
      }

Copilot AI Apr 7, 2026


hidePopup() calls window.speechSynthesis.cancel() unconditionally whenever the popup is hidden. This will cancel any page-initiated speech synthesis even when the user has not enabled the auto-speak option. Gate the cancel/reset logic behind this.#config.autoSpeak and/or only cancel when the content script previously started an utterance (e.g., track the last created SpeechSynthesisUtterance).

Author


Fixed: hidePopup() now only calls speechSynthesis.cancel() when we previously created an utterance ourselves (tracked via #lastUtterance), so page-initiated speech is left alone.

Comment thread src/content/content.ts
Comment on lines +2608 to +2616
    if (!text || text === this.#lastSpokenText) {
      return;
    }

    this.#lastSpokenText = text;

    try {
      window.speechSynthesis.cancel();
      const utterance = new SpeechSynthesisUtterance(text);

Copilot AI Apr 7, 2026


#lastSpokenText is set before the try { ... } block. If SpeechSynthesisUtterance construction or speechSynthesis.speak() throws, the catch will swallow the error but the dedupe value remains set, preventing subsequent attempts to speak the same text. Consider only updating #lastSpokenText after a successful speak(), or resetting it in the catch path.

Author


Fixed: #lastSpokenText is now only set after a successful speak() call, and reset to undefined in the catch path so subsequent attempts to speak the same text aren't blocked.

Comment thread src/common/config.ts
Comment on lines +1178 to +1194
    // autoSpeak: Defaults to false

    get autoSpeak(): boolean {
      return !!this.#settings.autoSpeak;
    }

    set autoSpeak(value: boolean) {
      if (
        typeof this.#settings.autoSpeak !== 'undefined' &&
        this.#settings.autoSpeak === value
      ) {
        return;
      }

      this.#settings.autoSpeak = value;
      void browser.storage.sync.set({ autoSpeak: value });
    }

Copilot AI Apr 7, 2026


A new persisted setting (autoSpeak) was added, but the unit tests for Config don't appear to cover its default value or its storage behavior. Since src/common/config.test.ts already verifies defaults for other settings, add assertions for config.autoSpeak === false by default and (ideally) that toggling it writes to browser.storage.sync and flows into contentConfig.

Author


Added a default-value assertion (expect(config.autoSpeak).toEqual(false)) in config.test.ts.

Comment thread _locales/en/messages.json
Comment on lines +1370 to +1377
"options_audio_heading": {
"message": "Audio",
"description": "Heading for the audio / speech settings section"
},
"options_auto_speak": {
"message": "Read matched word aloud on hover",
"description": "Label for the checkbox enabling automatic text-to-speech of the matched word when the popup is shown on hover"
},

Copilot AI Apr 7, 2026


New i18n keys were added here, but they are missing from the other shipped locales (_locales/ja/messages.json and _locales/zh_CN/messages.json). Since browser.i18n.getMessage returns an empty string for missing keys, the Options UI will show a blank heading/label for those locales. Add corresponding entries (even as untranslated placeholders) in the other locale files.

Author


Added options_audio_heading and options_auto_speak entries to both _locales/ja/messages.json and _locales/zh_CN/messages.json in a separate i18n commit.

@birtles
Member

birtles commented Apr 8, 2026

Hi! Thanks for doing this. I gave it a try and it's pretty good. On Windows, however, the accuracy is not great. For example, it reads 四畳半 with the wrong reading rather than よじょうはん. I came across many other examples where it wasn't right, and I suspect Windows probably has better platform support than the other OSes.

The code in the PR is pretty good but it's still missing some things like the localization of UI strings, and there are a few minor issues from your Copilot review.

I'm hiring someone to help out with audio support starting next month and I think I'd like to roll this PR into a bigger audio feature that provides the option of playing pre-recorded samples. That way the user can choose between remote samples (more accurate) and local samples (less lag, less network traffic, etc.).

snomiao added 2 commits April 8, 2026 14:19
- Only cancel speech synthesis we started ourselves on hidePopup
- Update lastSpokenText only after a successful speak() call
- Add autoSpeak default-value test in config.test.ts
@snomiao
Author

snomiao commented Apr 8, 2026

Thanks for the review! I've pushed two follow-up commits:

  1. Copilot review fixes — only cancel speech we initiated ourselves (track #lastUtterance), only update #lastSpokenText after a successful speak() call, and added a default-value test for autoSpeak in config.test.ts.
  2. i18n — added options_audio_heading and options_auto_speak translations for ja and zh_CN.

Re: the Windows accuracy issue (四畳半 → よじょうはん) and the bigger audio feature plan — totally agree, the platform TTS quality varies a lot and pre-recorded samples are clearly the right answer for dictionary readings.

I've actually been experimenting with this on my fork too. My current thinking is that there's room for a layered approach where the user can pick a TTS engine, with Web Speech API as the free default:

  • Web Speech API (current PR) — free, zero-setup, works everywhere, good enough as a baseline. Quality varies by OS but it's a decent starting point.
  • Pre-recorded samples — the most accurate option for dictionary headwords, exactly as you described (remote for accuracy, local for low-latency / offline / less network).
  • Local voice LLMs via WASM — caveat: this only really works well on post-2025 devices that have dedicated AI chips. Not viable as a default but a nice option for users with the hardware.
  • Online APIs / BYOK — for longer sentences and example translations, users could plug in their own Gemini/OpenAI key to use those TTS services. Useful for the example sentence audio case rather than headwords.

For this PR I'd suggest keeping it scoped to Web Speech API as a minimal foundation — it's a good free baseline that everyone gets, and the engine selector + pre-recorded sample pipeline can layer on top once the bigger audio feature lands next month. Happy to rebase / restructure / wait, whatever works best for how you want to land the larger feature. Let me know!

@birtles
Member

birtles commented Apr 8, 2026

Thanks for those fixes. This looks great. I hope you don't mind if I don't merge it just yet, however.

I want to tackle this as part of the bigger feature. I don't want to ship just the platform TTS since it will cause too much churn for users as we change the available settings and their defaults. Instead I'd like to prepare both options with suitable default settings and ship them together.

Once the other half is ready, I think this PR should be mostly usable as-is.

If we need to merge it sooner to avoid bitrot then I'd want to drop the options screen part so that it's temporarily disabled.

birtles requested in birchill#2869 that we drop the options screen part so the
feature is temporarily disabled while the larger audio feature
(pre-recorded samples + engine selection) is being prepared. The
underlying autoSpeak config field, i18n strings and content-script
speech logic remain in place so this can be re-enabled by restoring
<AudioSettings /> in OptionsPage.
@snomiao
Author

snomiao commented Apr 9, 2026

Sounds good — totally understand wanting to ship both halves together to avoid settings churn.

To keep this PR from bitrotting in the meantime, I've gone ahead and dropped the options screen part in 6e46f82. The Audio settings section is removed from OptionsPage.tsx and AudioSettings.tsx is deleted, but the underlying autoSpeak config field, the _locales strings, and the content-script speech logic all stay in place. Re-enabling is just a one-line restore of <AudioSettings /> (or wiring it into the new audio settings section) once the bigger feature is ready.

Happy to leave this open and rebase as needed, or close it and reopen later — whichever fits your workflow best.
