
Refactor: extract the crawler layer into a pure-Dart package nkust_crawler #428

Merged
abc873693 merged 32 commits into master from refactor/extract-crawler-package
May 8, 2026

abc873693 (Member) commented May 2, 2026

Summary

Moves lib/api/, lib/models/, lib/utils/eucdist.dart, and lib/utils/captcha_utils.dart wholesale into a new pure-Dart package, packages/nkust_crawler/. After the merge, the package can run dart test on its own (no Flutter SDK required), can be used server-side, and can be shared with future native clients.

  • 63 git renames are recorded, spread across the 5 phase commits (visible with git -c diff.renames=true log --diff-filter=R)
  • lib/api/ shim files: 0. All consumers (pages / widgets / extensions / tests) switch their imports to package:nkust_crawler/nkust_crawler.dart in the same commit that moves the files. lib/api/exceptions/api_exception_l10n.dart is the only file left behind, because it depends on BuildContext + NkustLocalizations
  • 5 dependency-injection interfaces keep the package Flutter-free: CrashReporter, KeyValueStore, CaptchaSolver + CaptchaTemplateProvider, and PdfTextExtractor. Host adapters live in lib/integrations/crawler/ and are wired up in one pass by bootstrapCrawler(), called from main.dart
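
To illustrate the injection pattern described in the last bullet, here is a minimal sketch using CrashReporter alone. The PR names the interface but does not show its members, so the recordError signature, CollectingCrashReporter, bootstrapCrawlerSketch, and guarded are all assumptions made for illustration, not the package's actual API.

```dart
// Assumed interface: the PR names CrashReporter but not its members.
abstract class CrashReporter {
  void recordError(Object error, StackTrace stack);
}

// Pure-Dart stand-in; the app would pass its Firebase-backed adapter instead.
class CollectingCrashReporter implements CrashReporter {
  final List<Object> errors = [];

  @override
  void recordError(Object error, StackTrace stack) => errors.add(error);
}

CrashReporter? _crashReporter;

// Hypothetical analogue of bootstrapCrawler(): the host calls it once at startup.
void bootstrapCrawlerSketch({required CrashReporter crashReporter}) {
  _crashReporter = crashReporter;
}

// Package code reports through the interface and never imports Firebase.
// Returns null when the body throws.
T? guarded<T>(T Function() body) {
  try {
    return body();
  } catch (error, stack) {
    _crashReporter?.recordError(error, stack);
    return null;
  }
}
```

Because the package only sees the abstract class, a server-side host or a unit test can pass a stand-in like CollectingCrashReporter without pulling in Flutter or Firebase.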

Scale

  • 133 files changed (+3019 / −639). Most additions are the package skeleton plus test code; the net change to the main app's code is small
  • 33 commits in the end
  • Commit groups (after the Phase 0 cleanup + master merge):
    1. Package skeleton + git mv crash_reporter
    2. Data layer (exceptions, session, registry, capabilities, models)
    3. Parsers (5 files) + build_mode.dart (pure-Dart kDebugMode shim)
    4. Helper + facade + CaptchaSolver / PdfTextExtractor injection points
    5. Captcha (eucdist; captcha_utils replaced by captcha_solver_impl)
    6. bootstrapCrawler() centralized wiring
    7. Tests + follow-up fixes (live integration test scaffolding, antiforgery added to getSemesters, wall-clock semester helper, fixture updates)

Architecture

packages/nkust_crawler/: pure Dart. Its only dependencies are ap_common_core, dio, cookie_jar, html, http, http_parser, image, intl, json_annotation, and sprintf. No Flutter, Firebase, ImagePicker, native_dio_adapter, or syncfusion.

Host-side adapters (in lib/integrations/crawler/):

  • firebase_crash_reporter.dart: CrashReporter backed by Firebase
  • preference_util_key_value_store.dart: KeyValueStore backed by PreferenceUtil
  • asset_captcha_template_provider.dart: CaptchaTemplateProvider backed by rootBundle
  • syncfusion_pdf_text_extractor.dart: PdfTextExtractor backed by syncfusion_flutter_pdf (that package transitively depends on dart:ui, which is why this interface is needed to isolate it)
  • crawler_bootstrap.dart: the single entry point that consolidates all wiring; main.dart calls it at startup

Models with persisted state (bus_reservations_data, crawler_selector, leave_data) route their save / load through crawlerStorage, which is connected to PreferenceUtilKeyValueStore at bootstrap.
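
A rough sketch of how a model might persist itself through crawlerStorage follows. Only the names KeyValueStore and crawlerStorage come from the PR; the getString / setString members, the in-memory default, and the LeaveDraft model with its single field are hypothetical.

```dart
import 'dart:convert';

// Assumed interface: the PR names KeyValueStore but not its members.
abstract class KeyValueStore {
  String? getString(String key);
  void setString(String key, String value);
}

class InMemoryKeyValueStore implements KeyValueStore {
  final Map<String, String> _data = {};

  @override
  String? getString(String key) => _data[key];

  @override
  void setString(String key, String value) => _data[key] = value;
}

// Package-level slot the host assigns at bootstrap (name from the PR;
// the type and the in-memory default are assumptions).
KeyValueStore crawlerStorage = InMemoryKeyValueStore();

// Hypothetical model; the real leave_data fields are not shown in the PR.
class LeaveDraft {
  LeaveDraft(this.reason);

  final String reason;

  static const _key = 'leave_draft';

  // Serialises to JSON and writes through whatever store was wired in.
  void save() => crawlerStorage.setString(_key, jsonEncode({'reason': reason}));

  // Returns null when nothing has been saved yet.
  static LeaveDraft? load() {
    final raw = crawlerStorage.getString(_key);
    if (raw == null) return null;
    final json = jsonDecode(raw) as Map<String, dynamic>;
    return LeaveDraft(json['reason'] as String);
  }
}
```

In the app, the bootstrap step would assign a PreferenceUtil-backed store to crawlerStorage; in package tests the in-memory store keeps the run hermetic.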

Verification

| Check | Result |
| --- | --- |
| flutter analyze lib/ | 0 errors |
| flutter test test/ | 37 pass / 1 skip / 0 fail |
| cd packages/nkust_crawler && dart analyze | 0 errors / 0 warnings |
| cd packages/nkust_crawler && dart test | 14 pass (hermetic, no Flutter SDK needed) |
| dart test -P live-anonymous | live site acad.nkust.edu.tw + all 6 health-check endpoints pass |
| dart test -P live (with NKUST_USER / NKUST_PASS) | login + getUserInfo + getSemester + getCourseTables pass against the live site; getScores is deliberately skipped (PDF parsing needs the Flutter-bound syncfusion) |

CI integration (also resolves #331)

Issue #331's "Plan B: scheduled cron monitoring" previously could not run because of native dependencies (native_dio_adapter, syncfusion, image_picker); this PR fixes that along the way:

  • crawler-monitor.yml: flutter test → cd packages/nkust_crawler && dart test -P live, secrets mapped, subosito/flutter-action → dart-lang/setup-dart
  • ci.yml: adds dart analyze + dart test for the package and removes the || true from flutter test || true (so main-app test failures also block the PR)
  • Deletes test/crawler_monitor_test.dart; all of its coverage moves to packages/nkust_crawler/test/live/{health_check,notifications,authenticated}_live_test.dart

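Documentation

Two new documents under docs/:

  • crawler-package-migration-plan.md: the step-by-step implementation plan for this PR
  • extracting-flutter-crawler-as-dart-package.md: a general-purpose cookbook for other projects doing a similar extraction later, so the investigation does not have to be repeated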

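Test plan

  • iOS flutter run: login + fetch course table + fetch scores smoke test (verifies that native_dio_adapter injection + stdsys SSO still work)
  • Android flutter run: same as above
  • flutter build web: confirm the fallback path without NativeAdapter installed can still reach the basic anonymous endpoints
  • Bus reservation create / cancel: manual test (the write path is deliberately left out of the live tests)
  • Leave request photo upload: manual test (the XFile→bytes-tuple change touches this path)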

Future work (out of scope for this PR)

  • Add getRoomList, getRoomCourseTables, getEnrollmentLetter, getRewardAndPenalty, and getMidtermAlerts to the live integration tests (read-only, low risk)
  • Remove the Helper.username / Helper.password static fields and make Helper instance-based (the reviewer already pointed out that cacheSaveTag uses this static field)
  • Publish nkust_crawler to pub.dev once the package is stable

abc873693 added 26 commits May 2, 2026 22:43
…shReporter

Package skeleton (analysis_options, pubspec, main library file, abstractions
test) plus a clean rename of the existing CrashReporter from lib/api/ to
the package's lib/src/abstractions/. Consumers in helpers / parsers /
firebase adapter import from package:nkust_crawler directly — no shim file
left behind at lib/api/crash_reporter.dart.
…lities, models) to nkust_crawler

All 26 .dart files (plus 17 .g.dart counterparts and 6 capability interfaces)
move as git renames. Storage-coupled models route their save/load through
the new KeyValueStore abstraction; PreferenceUtilKeyValueStore is wired at
app bootstrap.

Consumers across lib/pages/, lib/widgets/, lib/extensions/, lib/utils/, and
lib/api/ are updated in this same commit to import from
`package:nkust_crawler/nkust_crawler.dart` directly — no shim files left
behind at lib/api/ or lib/models/.
…e shim

Parsers (ap/stdsys/leave/vms_bus/nkust) all appear as git renames.
build_mode.dart is a new pure-Dart kDebugMode shim that lets parser code
stay independent of package:flutter/foundation.dart.

Consumers updated to import from package:nkust_crawler — no shim files.
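
For readers unfamiliar with the technique, a pure-Dart kDebugMode shim can be as small as the sketch below. It is not the contents of build_mode.dart (the PR does not show that file); it assumes the shim keys off the dart.vm.product compile-time flag, and debugMessage is a hypothetical consumer.

```dart
// Sketch of a pure-Dart replacement for Flutter's kDebugMode.
// 'dart.vm.product' is true in release (product) builds and false otherwise.
const bool kReleaseMode = bool.fromEnvironment('dart.vm.product');
const bool kDebugMode = !kReleaseMode;

/// Hypothetical consumer: only produce parser diagnostics outside release builds.
String? debugMessage(String message) =>
    kDebugMode ? '[parser] $message' : null;
```

Unlike Flutter's own constant, this sketch does not distinguish profile mode, which is a simplification that may or may not match the real shim.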
…aSolver/PdfTextExtractor seams

api_config, helper (facade), and the 5 helper classes (ap/leave/nkust/stdsys/vms_bus) all appear as git renames.

New seams introduced:
- CaptchaSolver — webap/nkust login no longer hard-codes the EucDist solver
- PdfTextExtractor — stdsys transcript parsing routes through this abstraction so syncfusion stays in the host app

Consumers updated to import from package:nkust_crawler — no shim files at lib/api/.
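
A seam of this kind might look roughly like the following. The solve signature, LoginFlow, and FixedCaptchaSolver are hypothetical; the real interface may well be asynchronous and is kept synchronous here only to keep the sketch short.

```dart
import 'dart:typed_data';

// Assumed interface shape; the PR only names CaptchaSolver.
abstract class CaptchaSolver {
  String solve(Uint8List imageBytes);
}

// Hypothetical login helper: receives the solver through its constructor
// instead of hard-coding one implementation.
class LoginFlow {
  LoginFlow(this._solver);

  final CaptchaSolver _solver;

  // Builds the form fields for a login request, including the solved captcha.
  Map<String, String> buildLoginForm({
    required String username,
    required String password,
    required Uint8List captchaImage,
  }) {
    final code = _solver.solve(captchaImage);
    return {'username': username, 'password': password, 'captcha': code};
  }
}

// Test double standing in for the EucDist-based solver.
class FixedCaptchaSolver implements CaptchaSolver {
  FixedCaptchaSolver(this.answer);

  final String answer;

  @override
  String solve(Uint8List imageBytes) => answer;
}
```

With the solver injected, a hermetic test can exercise the login path with FixedCaptchaSolver and never touch image decoding.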
…cted templates

eucdist.dart moves as a git rename. captcha_utils.dart is replaced by
captcha_solver_impl.dart, which is a substantial rewrite (now a
CaptchaSolver-implementing class with injected CaptchaTemplateProvider)
— honestly recorded as delete+add rather than a forced rename.

AssetCaptchaTemplateProvider on the host side reads BMP templates from
rootBundle and is wired at app bootstrap.
Single entry point for wiring the crawler's host-side dependencies
(platform Dio adapter, KeyValueStore, CrashReporter, CaptchaSolver,
PdfTextExtractor, onLogout). main.dart calls it once after PreferenceUtil
init.

github-actions Bot commented May 2, 2026

Failed to generate code suggestions for PR

gemini-code-assist Bot left a comment

Code Review

This pull request extracts the NKUST crawler logic into a standalone pure-Dart package, nkust_crawler, to facilitate code reuse across different platforms. The refactoring involves moving core business logic, models, and parsers to the new package while introducing abstraction interfaces for host-side dependencies like storage and crash reporting. Feedback focuses on consolidating redundant imports across several page files, using dynamic filenames instead of hardcoded strings for proof image uploads, and refactoring extension methods to remove dependencies on static global state for improved maintainability.

Comment thread lib/pages/bus/bus_page.dart Outdated
Comment on lines +3 to +5
import 'package:nkust_crawler/nkust_crawler.dart';
import 'package:nkust_crawler/nkust_crawler.dart';
import 'package:nkust_crawler/nkust_crawler.dart';

medium

There are multiple redundant imports of package:nkust_crawler/nkust_crawler.dart. Please consolidate them into a single import statement to improve code clarity and maintainability.

Suggested change
- import 'package:nkust_crawler/nkust_crawler.dart';
- import 'package:nkust_crawler/nkust_crawler.dart';
- import 'package:nkust_crawler/nkust_crawler.dart';
+ import 'package:nkust_crawler/nkust_crawler.dart';

Comment thread lib/pages/leave/leave_apply_page.dart Outdated
if (image != null) {
proof = (
bytes: await image!.readAsBytes(),
filename: 'proof_image.jpg',

medium

The filename for the proof image is hardcoded as 'proof_image.jpg'. It would be more robust to use the original filename from the XFile object via image.name, as XFile provides this information.

Suggested change
- filename: 'proof_image.jpg',
+ filename: image.name,

Comment on lines +6 to +9
import 'package:nkust_crawler/nkust_crawler.dart';
import 'package:nkust_crawler/nkust_crawler.dart';
import 'package:nkust_ap/api/exceptions/api_exception_l10n.dart';
import 'package:nkust_ap/api/stdsys_helper.dart';
import 'package:nkust_crawler/nkust_crawler.dart';

medium

There are multiple redundant imports of package:nkust_crawler/nkust_crawler.dart. Please consolidate them into a single import statement to improve code clarity.

Suggested change
- import 'package:nkust_crawler/nkust_crawler.dart';
- import 'package:nkust_crawler/nkust_crawler.dart';
- import 'package:nkust_ap/api/exceptions/api_exception_l10n.dart';
- import 'package:nkust_ap/api/stdsys_helper.dart';
- import 'package:nkust_crawler/nkust_crawler.dart';
+ import 'package:nkust_crawler/nkust_crawler.dart';
+ import 'package:nkust_ap/api/exceptions/api_exception_l10n.dart';

Comment on lines 4 to 6
import 'package:nkust_ap/api/exceptions/api_exception_l10n.dart';
import 'package:nkust_ap/api/helper.dart';
import 'package:nkust_crawler/nkust_crawler.dart';
import 'package:nkust_ap/config/constants.dart';

medium

There are multiple redundant imports of package:nkust_crawler/nkust_crawler.dart. Please consolidate them into a single import statement to improve code clarity.

Suggested change
- import 'package:nkust_ap/api/exceptions/api_exception_l10n.dart';
- import 'package:nkust_ap/api/helper.dart';
- import 'package:nkust_crawler/nkust_crawler.dart';
- import 'package:nkust_ap/config/constants.dart';
+ import 'package:nkust_crawler/nkust_crawler.dart';
+ import 'package:nkust_ap/api/exceptions/api_exception_l10n.dart';

Comment thread lib/pages/study/room_list_page.dart Outdated
Comment on lines +3 to +5
import 'package:nkust_crawler/nkust_crawler.dart';
import 'package:nkust_ap/api/exceptions/api_exception_l10n.dart';
import 'package:nkust_ap/api/helper.dart';
import 'package:nkust_ap/models/room_data.dart';
import 'package:nkust_crawler/nkust_crawler.dart';

medium

There are multiple redundant imports of package:nkust_crawler/nkust_crawler.dart. Please consolidate them into a single import statement to improve code clarity.

Suggested change
- import 'package:nkust_crawler/nkust_crawler.dart';
- import 'package:nkust_ap/api/exceptions/api_exception_l10n.dart';
- import 'package:nkust_ap/api/helper.dart';
- import 'package:nkust_ap/models/room_data.dart';
- import 'package:nkust_crawler/nkust_crawler.dart';
+ import 'package:nkust_crawler/nkust_crawler.dart';
+ import 'package:nkust_ap/api/exceptions/api_exception_l10n.dart';

}

extension SemesterExtension on Semester {
String get cacheSaveTag => '${Helper.username}_$code';

medium

This extension method relies on the static field Helper.username. Using static fields for user-specific data can be problematic, especially in scenarios involving user logout and login. While the PR description notes this as future work, consider making this dependency explicit to improve maintainability. For example, the method could accept the username as a parameter: String cacheSaveTag(String username) => '${username}_$code';
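
Spelled out as a compilable sketch, the reviewer's suggestion could look like this. The Semester class here is a minimal stand-in with only the code field, and the extension name SemesterCacheTag is hypothetical.

```dart
// Minimal stand-in for the package's Semester model (only `code` is used).
class Semester {
  const Semester(this.code);

  final String code;
}

extension SemesterCacheTag on Semester {
  /// Same tag format as the original getter, but the username is passed in
  /// rather than read from the static Helper.username.
  String cacheSaveTag(String username) => '${username}_$code';
}
```

Every call site then has to supply the username, which is the cost the author cites for deferring this change.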

abc873693 added 2 commits May 3, 2026 00:16
Live tests previously printed real student id / name / dept / class /
course title / score detail when run with NKUST_USER+NKUST_PASS, which
would leak the running account's identity if the output was pasted into
PR comments, screenshots, CI logs, or chat.

Mask via redact() so e.g. 'C112151117' shows as 'C••••••••7' and '梁晨恩'
shows as '梁•恩'. Picture URLs (which embed the student id) collapse to
'<set>'. Score detail (conduct, average) is hidden entirely; only the
row count is printed.

NKUST_HTTP_LOG=1 dumps response bodies + cookies — emit a loud warning
when that flag is on so callers know the output must stay private.

.gitignore picks up .env / *.har / *.dump / nkust_*_dump patterns to
block accidental commits.
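
The masking rule implied by those two examples (keep the first and last character, mask the rest) can be sketched as below. This is a guess at the behaviour, not the actual redact() from the PR; in particular, fully masking inputs of one or two characters and the optional mask parameter are assumptions.

```dart
/// Keeps the first and last character and masks everything in between.
/// Inputs of one or two characters are masked entirely (an assumption: the
/// commit message only shows longer inputs).
String redact(String value, {String mask = '•'}) {
  // Work on code points so multi-byte characters such as CJK names count as one.
  final chars = value.runes.toList();
  if (chars.length <= 2) return mask * chars.length;
  final first = String.fromCharCode(chars.first);
  final last = String.fromCharCode(chars.last);
  return '$first${mask * (chars.length - 2)}$last';
}
```

Preserving the length and the outer characters keeps log lines recognisable to the person running the test while hiding the identifying middle.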
The remaining lib/api/{helper,ap_helper,stdsys_helper}.dart shims were
removed in the package extraction, so this test had stale imports CI
flagged as URI doesn't exist. Replace them with the single
`package:nkust_crawler/nkust_crawler.dart` facade import (which
re-exports Helper, WebApHelper, StdsysHelper, and the data types this
file uses).
abc873693 added 4 commits May 3, 2026 00:29
- Dedup duplicate package:nkust_crawler/nkust_crawler.dart imports across
  9 page files (reviewer flagged 4; the same sed regex regression hit 5
  more I missed).
- Use image.name as the multipart filename for leave proof uploads
  instead of hardcoded 'proof_image.jpg', so the server (and any
  log/audit downstream of it) sees what the user actually picked.

Skipped: the cacheSaveTag-on-Helper.username comment. That static
dependency is pre-existing in master and removing it would require
threading the username through every page's score/course cache call.
PR description already lists 'drop Helper static fields' as future work.
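
As an illustration of the bytes-plus-filename tuple that replaced XFile on this path, a Dart 3 record could be used as follows. The field names match the record literal quoted in the review thread; the ProofFile typedef, buildProof, and the fallback to 'proof_image.jpg' for a missing name are assumptions.

```dart
import 'dart:typed_data';

// Field names follow the record literal quoted in the review thread
// (bytes, filename); the typedef name is hypothetical.
typedef ProofFile = ({Uint8List bytes, String filename});

/// Builds the upload tuple from whatever the picker returned. In the host app
/// `name` would come from XFile.name; here it is a plain parameter so the
/// sketch stays free of Flutter plugins.
ProofFile? buildProof(Uint8List? bytes, String? name) {
  if (bytes == null) return null;
  // Assumed fallback: reuse the old hardcoded name when the picker gives none.
  final filename = (name == null || name.isEmpty) ? 'proof_image.jpg' : name;
  return (bytes: bytes, filename: filename);
}
```

Passing plain bytes and a string across the package boundary is what lets the crawler avoid depending on image_picker.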
Root-level flutter analyze was picking up
packages/nkust_crawler/test/*.dart and failing on
package:test/test.dart URI-not-exist — that import resolves via the
sub-package's dev_dependencies, which are not part of the main app's
pub resolution.

The sub-package is still analysed (and its tests still run) via
'cd packages/nkust_crawler && dart analyze && dart test' from inside
its own directory. Add the gotcha to the extraction guide.
Issue #331's 'Plan B' (scheduled live monitor) was workflow-passing but
test-failing because flutter test in crawler-monitor.yml could not
resolve native_dio_adapter / syncfusion_flutter_pdf / image_picker
plugins on a Linux runner. Now that the crawler is a pure-Dart package,
that whole class of failure goes away.

ci.yml — Analyze & Test job:
- watch packages/** on push/PR
- analyze + test the sub-package via dart analyze / dart test inside
  packages/nkust_crawler (no Flutter SDK needed)
- drop '|| true' from 'flutter test' so main-app fixture test failures
  block PR merge again (Issue #331 Plan A is now actually enforced)

crawler-monitor.yml:
- replace flutter test with dart test -P live against the package
- map existing NKUST_USERNAME / NKUST_PASSWORD secrets to the package's
  expected NKUST_USER / NKUST_PASS env vars
- swap subosito/flutter-action for dart-lang/setup-dart

New Health Check suite in packages/nkust_crawler/test/live/ covers the
6 endpoints the old test had (Oosaf supersedes the retired
leave.nkust.edu.tw DNS record). Test names still start with 'Health Check'
so the classify-failures step can distinguish server-down from
parser-drift.
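
The naming convention makes the classification a simple prefix check. The sketch below expresses that idea in Dart; the actual classify-failures step lives in the CI workflow and is not shown in this PR, so FailureKind, classifyFailure, and groupFailures are hypothetical.

```dart
enum FailureKind { serverDown, parserDrift }

/// Hypothetical classifier: a failing test whose name starts with
/// 'Health Check' points at the server; anything else points at the parser.
FailureKind classifyFailure(String testName) =>
    testName.startsWith('Health Check')
        ? FailureKind.serverDown
        : FailureKind.parserDrift;

/// Groups failed test names by the kind of problem they suggest.
Map<FailureKind, List<String>> groupFailures(Iterable<String> failedTests) {
  final groups = <FailureKind, List<String>>{};
  for (final name in failedTests) {
    groups.putIfAbsent(classifyFailure(name), () => []).add(name);
  }
  return groups;
}
```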

Delete test/crawler_monitor_test.dart — its coverage is now in
packages/nkust_crawler/test/live/{health_check,authenticated}_live_test.dart.
dart analyze defaults to --fatal-warnings, so the new
'Analyze (nkust_crawler package)' CI step was failing on 9 unused
imports (helpers, parsers, facade) plus one unnecessary null-aware
'?.' operator in the live test. Drop them — these were leftovers from
the import-shuffle during the package extraction.
abc873693 changed the title from "refactor: extract crawler layer as pure-Dart nkust_crawler package" to "重構:將爬蟲層抽出成純 Dart 套件 nkust_crawler" (Refactor: extract the crawler layer into the pure-Dart package nkust_crawler) May 2, 2026
abc873693 requested review from ryan940618 and yappy2000d May 2, 2026 17:10
abc873693 merged commit 6d1ae62 into master May 8, 2026
6 checks passed
abc873693 deleted the refactor/extract-crawler-package branch May 8, 2026 03:27