
[Feature] Implement KeyClass (Weak Supervision) for Clinical Notes #1107

@ShuqingZou


Hi PyHealth Team,

I am Shuqing Zou, an MSCS student at USF. My current work and internship focus on heuristic-based label generation and clinical risk prediction from unstructured EHR text, which aligns perfectly with the KeyClass pipeline.

Given this overlap, I would like to implement KeyClass from scratch for PyHealth. Since this is a non-trivial feature, I'd like to propose a phased approach to keep the PRs easy to review:

  • Phase 1: Data processing pipeline for MIMIC-III discharge summaries (including the TF-IDF filtering step).
  • Phase 2: Integrate weak supervision logic (Labeling Functions + Snorkel Label Model).
  • Phase 3: End-to-end self-training classifier (e.g., via BERT).
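As a rough illustration of the Phase 1 and Phase 2 scope, here is a standard-library-only sketch. All names (`tfidf_filter`, `make_keyword_lf`, `majority_vote`, `weak_labels`) are hypothetical and not existing PyHealth APIs. The filtering criterion (keep terms whose maximum TF-IDF across notes clears a threshold) is an assumption, not necessarily the exact KeyClass step. The unweighted majority vote is a simple stand-in for Snorkel's Label Model, which instead learns per-function accuracies. Phase 3 (self-training) is not covered here.

```python
import math
import re
from collections import Counter

ABSTAIN = -1  # Snorkel-style convention: a labeling function may abstain


def tokenize(text):
    """Lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_filter(docs, min_score):
    """Hypothetical Phase 1 step: keep terms whose max TF-IDF over all docs >= min_score."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per doc
    best = {}
    for toks in tokenized:
        for term, count in Counter(toks).items():
            # Terms that appear in every note get idf = log(1) = 0 and are dropped.
            score = (count / len(toks)) * math.log(n_docs / df[term])
            best[term] = max(best.get(term, 0.0), score)
    return {term for term, score in best.items() if score >= min_score}


def make_keyword_lf(keywords, label):
    """Hypothetical Phase 2 labeling function: vote `label` if any keyword appears."""
    keywords = set(keywords)

    def lf(tokens):
        return label if keywords & set(tokens) else ABSTAIN

    return lf


def majority_vote(votes):
    """Stand-in for Snorkel's Label Model: unweighted vote; abstain on ties or no votes."""
    counts = Counter(v for v in votes if v != ABSTAIN).most_common()
    if not counts or (len(counts) > 1 and counts[0][1] == counts[1][1]):
        return ABSTAIN
    return counts[0][0]


def weak_labels(docs, lfs, vocab):
    """Apply every LF to each note's TF-IDF-filtered tokens and aggregate the votes."""
    labels = []
    for doc in docs:
        tokens = [t for t in tokenize(doc) if t in vocab]
        labels.append(majority_vote([lf(tokens) for lf in lfs]))
    return labels


# Toy notes (made up for illustration, not MIMIC-III data).
notes = [
    "Patient presents with chest pain and elevated troponin.",
    "History of diabetes with elevated glucose.",
    "Patient discharged home in stable condition.",
]
vocab = tfidf_filter(notes, min_score=0.1)  # drops "patient", "with", "elevated"
lfs = [
    make_keyword_lf({"chest", "troponin"}, 0),   # hypothetical class 0: cardiac
    make_keyword_lf({"diabetes", "glucose"}, 1),  # hypothetical class 1: metabolic
]
print(weak_labels(notes, lfs, vocab))  # [0, 1, -1]
```

The third note receives no vote and stays unlabeled (`-1`). In the real pipeline, such notes would be left for the Phase 3 classifier to label during self-training.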

I have read the contributing guidelines and am ready to branch off develop. Could you let me know if anyone is currently working on this? If not, I can start putting together a draft PR for Phase 1!

Thanks!
