Fixed
- Model-agnostic inference: replaced the hardcoded Qwen ChatML template with `tokenizer.apply_chat_template()`; the local transformers backend now works with any model family (see the chat-template sketch after this list)
- vLLM model name bug: the server model name from the env var was not being passed through to API calls, causing 400 errors on providers like Groq (see the model-name sketch after this list)
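
How the chat-template fix works, as a minimal sketch: instead of hardcoding ChatML, the prompt is rendered through the tokenizer's own template, so any instruct model family works. The model id and message contents below are illustrative, not squeez defaults.

```python
from transformers import AutoTokenizer

# Any instruct model works; this id is just an example.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")

messages = [
    {"role": "system", "content": "Return only the lines relevant to the query."},
    {"role": "user", "content": "<query>...</query>"},
]

# Renders the model family's own template (ChatML, Llama, etc.)
# instead of a hardcoded Qwen ChatML string.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,             # return a string rather than token ids
    add_generation_prompt=True, # append the assistant turn header
)
```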
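
And a sketch of the model-name fix against an OpenAI-compatible server (vLLM, Groq, etc.): the name read from the environment has to be forwarded on every call, since these servers reject requests with a missing or unknown `model` with a 400. The env var names here are assumptions, not squeez's actual configuration keys.

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url=os.environ["SQUEEZ_BASE_URL"],  # vLLM server or Groq endpoint (assumed var name)
    api_key=os.environ.get("SQUEEZ_API_KEY", "EMPTY"),
)

response = client.chat.completions.create(
    # Previously the env-var model name was dropped here, so providers
    # like Groq answered 400; it must travel with every call.
    model=os.environ["SQUEEZ_MODEL"],  # assumed var name
    messages=[{"role": "user", "content": "ping"}],
)
```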
Added
- Pooled line classifier backend: new `pooled` backend for sentence-level classification (see the pooling sketch after this list)
- LoRA auto-detection: the transformers backend auto-detects and loads LoRA/PEFT checkpoints (see the detection sketch after this list)
- Batch extraction: `extract_many()` with concurrent requests for remote backends (see the concurrency sketch after this list)
- Encoder model: token-level line classification with mmBERT (see the encoder sketch after this list)
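
A rough sketch of what a pooled line classifier can look like: mean-pool the encoder's token embeddings per line, then score each line with a small linear head. The encoder id and the two-label head are assumptions, not squeez's actual architecture.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "sentence-transformers/all-MiniLM-L6-v2"  # illustrative encoder
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
encoder = AutoModel.from_pretrained(MODEL_ID)
head = torch.nn.Linear(encoder.config.hidden_size, 2)  # keep/drop logits (assumed)

lines = ["First candidate line.", "Second candidate line."]
enc = tokenizer(lines, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**enc).last_hidden_state  # (batch, seq, hidden)
mask = enc["attention_mask"].unsqueeze(-1)     # ignore padding tokens
pooled = (hidden * mask).sum(1) / mask.sum(1)  # mean pooling per line
scores = head(pooled)                          # one keep/drop score per line
```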
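
LoRA/PEFT checkpoints can be recognized by the `adapter_config.json` file PEFT writes next to the adapter weights; a detection sketch under that assumption (the function name is hypothetical):

```python
from pathlib import Path

from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM

def load_checkpoint(path: str):
    # Presence of adapter_config.json is the auto-detection signal assumed here.
    if (Path(path) / "adapter_config.json").exists():
        peft_config = PeftConfig.from_pretrained(path)
        base = AutoModelForCausalLM.from_pretrained(
            peft_config.base_model_name_or_path  # base model recorded in the adapter
        )
        return PeftModel.from_pretrained(base, path)  # attach the LoRA adapter
    return AutoModelForCausalLM.from_pretrained(path)  # plain full checkpoint
```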
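
The concurrency idea behind batch extraction, sketched with a thread pool; `extract_many()`'s real signature and internals may differ, and the `extract` callable here is a stand-in for a single remote request.

```python
from concurrent.futures import ThreadPoolExecutor

def extract_many(documents, query, extract, max_workers=8):
    # Remote backends are I/O bound, so requests are fanned out
    # concurrently; results come back in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda doc: extract(doc, query), documents))
```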
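
Token-level line classification with an encoder, roughly: run the document through a token-classification head and read off a per-token keep/drop label. The checkpoint id and two-label scheme are assumptions, and the head below is freshly initialized rather than squeez's trained one.

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

MODEL_ID = "jhu-clsp/mmBERT-base"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForTokenClassification.from_pretrained(MODEL_ID, num_labels=2)

document = "First line.\nSecond line.\nThird line."
enc = tokenizer(document, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**enc).logits  # (1, seq_len, 2)
token_labels = logits.argmax(-1)  # per-token keep/drop; map back to lines via offsets
```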
Changed
- Inference refactor: shared `_build_messages()` used by both the vLLM and transformers backends (see the sketch after this list)
- Query tag: renamed `<task>` to `<query>` in prompt formatting
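
A minimal sketch of what the shared builder plausibly looks like (field contents assumed), including the renamed `<query>` tag:

```python
def _build_messages(context: str, query: str) -> list[dict]:
    # Both the vLLM and transformers backends consume the same
    # OpenAI-style message list; only the rendering step differs.
    return [
        {"role": "system", "content": "Return only the lines relevant to the query."},
        {"role": "user", "content": f"{context}\n\n<query>{query}</query>"},  # was <task>
    ]
```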
Full changelog: v0.1.2...v0.1.3
PyPI: https://pypi.org/project/squeez/0.1.3/