v0.1.3

@adaamko released this 18 Mar 20:21

Fixed

  • Model-agnostic inference: replaced the hardcoded Qwen ChatML template with tokenizer.apply_chat_template(); the local transformers backend now works with any model family (first sketch below)
  • vLLM model name bug: the server model name set via environment variable was not passed through to API calls, causing 400 errors on providers such as Groq (second sketch below)
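
A minimal sketch of the templating change, assuming a chat-tuned Hugging Face checkpoint; the model name and messages here are placeholders, not the library's defaults:

```python
from transformers import AutoTokenizer

# Any chat model works; "Qwen/Qwen2.5-7B-Instruct" is only an example.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

messages = [
    {"role": "system", "content": "You extract structured data from text."},
    {"role": "user", "content": "Extract all dates from the document below."},
]

# apply_chat_template() renders the model's own chat template instead of a
# hardcoded ChatML string, so switching model families needs no code change.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```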
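
And a rough sketch of the model-name fix, assuming an OpenAI-compatible client; the SQUEEZ_* env var names are hypothetical, since the notes only say the configured name was previously dropped before reaching the API call:

```python
import os

from openai import OpenAI

# Hypothetical env var names, for illustration only.
base_url = os.environ.get("SQUEEZ_BASE_URL", "http://localhost:8000/v1")
model = os.environ["SQUEEZ_MODEL"]  # e.g. a Groq-hosted model id

client = OpenAI(base_url=base_url, api_key=os.environ.get("SQUEEZ_API_KEY", "EMPTY"))

# Forwarding the configured model avoids the 400 errors that providers such
# as Groq return when the request's model field is missing or unknown.
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)
```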

Added

  • Pooled line classifier backend: new pooled backend for sentence-level classification (first sketch below)
  • LoRA auto-detection: the transformers backend auto-detects and loads LoRA/PEFT checkpoints (second sketch below)
  • Batch extraction: extract_many() issues concurrent requests against remote backends (third sketch below)
  • Encoder model: token-level line classification with mmBERT (fourth sketch below)
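
The notes don't spell out the pooled backend's internals, so this is a generic sketch of sentence-level classification by mean-pooling an encoder's token states; the MiniLM checkpoint and two-label head are illustrative:

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "sentence-transformers/all-MiniLM-L6-v2"  # illustrative encoder
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)
head = torch.nn.Linear(encoder.config.hidden_size, 2)  # e.g. keep vs. drop

lines = ["First sentence of the document.", "Second sentence."]
batch = tokenizer(lines, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state      # (lines, tokens, dim)
    mask = batch["attention_mask"].unsqueeze(-1)     # zero out padding tokens
    pooled = (hidden * mask).sum(1) / mask.sum(1)    # mean-pool each line
    logits = head(pooled)                            # one prediction per line
print(logits.argmax(-1))
```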
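
LoRA auto-detection can be as simple as checking for the adapter_config.json that PEFT writes next to every adapter; the helper below is an assumption about the approach, not the library's actual code:

```python
from pathlib import Path

from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM

def load_maybe_lora(path: str):
    """Hypothetical helper: treat the checkpoint as a LoRA adapter if a
    PEFT adapter_config.json is present, else load it as full weights."""
    if (Path(path) / "adapter_config.json").exists():
        cfg = PeftConfig.from_pretrained(path)
        base = AutoModelForCausalLM.from_pretrained(cfg.base_model_name_or_path)
        return PeftModel.from_pretrained(base, path)  # attach the adapter
    return AutoModelForCausalLM.from_pretrained(path)  # plain checkpoint
```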
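
extract_many() itself is confirmed by the notes, but its signature is not; this sketch shows one common shape for concurrent remote requests, with a hypothetical extract_one() standing in for a single API call:

```python
import asyncio

async def extract_one(text: str) -> dict:
    await asyncio.sleep(0.1)  # stand-in for one remote extraction request
    return {"input": text, "fields": {}}

async def extract_many(texts: list[str], max_concurrency: int = 8) -> list[dict]:
    sem = asyncio.Semaphore(max_concurrency)  # cap in-flight requests

    async def bounded(text: str) -> dict:
        async with sem:
            return await extract_one(text)

    # gather() runs requests concurrently but preserves input order
    return await asyncio.gather(*(bounded(t) for t in texts))

print(asyncio.run(extract_many(["doc one", "doc two"])))
```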
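
For the encoder path, token-level line classification with mmBERT might look like the following; the notes name mmBERT but not a checkpoint, so the jhu-clsp/mmBERT-base id and two-label head are assumptions:

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

name = "jhu-clsp/mmBERT-base"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=2)

text = "First line of the document.\nSecond line."
batch = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits   # (1, tokens, 2): a label per token
print(logits.argmax(-1)[0])          # a backend would aggregate these per line
```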

Changed

  • Inference refactor: the vLLM and transformers backends now share a single _build_messages() helper (see the sketch after this list)
  • Query tag: renamed the <task> tag to <query> in prompt formatting
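
A minimal sketch of what the shared builder might look like; _build_messages() and the <query> tag come from the notes above, while the system prompt and exact layout are assumptions:

```python
def _build_messages(system_prompt: str, document: str, query: str) -> list[dict]:
    # One message builder feeds both the vLLM and transformers backends, so
    # prompt formatting lives in a single place. <query> replaced <task>.
    user_content = f"{document}\n\n<query>\n{query}\n</query>"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]
```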

Full changelog: v0.1.2...v0.1.3
PyPI: https://pypi.org/project/squeez/0.1.3/