MINIFICPP-2719 - Add multimodal capability to llama.cpp processor#2107
MINIFICPP-2719 - Add multimodal capability to llama.cpp processor#2107adamdebreceni wants to merge 19 commits into
Conversation
8cb0923 to
b207ec4
Compare
b207ec4 to
09c3416
Compare
There was a problem hiding this comment.
Pull request overview
This PR updates the MiNiFi C++ llama.cpp extension to support multimodal (mtmd) inference, including wiring FlowFile content as “files” into the llama.cpp mtmd pipeline and optionally writing model output to a FlowFile attribute instead of overwriting content.
Changes:
- Bump vendored llama.cpp to
b8944and apply a new patch to build mtmd support and fix missing includes. - Extend
RunLlamaCppInferencewith multimodal model configuration + optional “output to attribute” behavior. - Update the LlamaContext interface and DefaultLlamaContext implementation to accept file buffers and perform mtmd tokenization/eval.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| thirdparty/llamacpp/mtmd-fix.patch | Adds mtmd subdirectory to llama.cpp build, fixes an include, and removes some mtmd tool executables. |
| thirdparty/llamacpp/lu8_macro_fix.patch | Removes an older llama.cpp patch no longer applied after the version bump. |
| thirdparty/llamacpp/cpp-23-fixes.patch | Removes an older llama.cpp patch no longer applied after the version bump. |
| cmake/LlamaCpp.cmake | Bumps llama.cpp tag, enables LLAMA_BUILD_COMMON, applies mtmd patch, and extends include dirs for common/tools/vendor headers. |
| extensions/llamacpp/CMakeLists.txt | Links the extension against mtmd and llama-common in addition to llama. |
| extensions/llamacpp/processors/LlamaContext.h | Extends generate() to accept a list of binary “files” (e.g., images/audio). |
| extensions/llamacpp/processors/DefaultLlamaContext.h | Adds mtmd/chat-template state and updates constructor/generate signature for multimodal support. |
| extensions/llamacpp/processors/DefaultLlamaContext.cpp | Implements mtmd initialization, multimodal tokenization/eval, and updated decode loop. |
| extensions/llamacpp/processors/RunLlamaCppInference.h | Adds MultiModal Model Path and Output Attribute Name properties and stores them in member state. |
| extensions/llamacpp/processors/RunLlamaCppInference.cpp | Passes FlowFile bytes as multimodal files, inserts mtmd marker, and optionally writes output to an attribute. |
| extensions/llamacpp/tests/RunLlamaCppInferenceTests.cpp | Updates the mock context to match the new generate() signature. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (!multimodal_model_path) { | ||
| logger->log_info("No multimodal model path provided"); | ||
| return; | ||
| } | ||
|
|
||
| mtmd_context_params mparams = mtmd_context_params_default(); | ||
| mparams.use_gpu = false; | ||
| mparams.flash_attn_type = LLAMA_FLASH_ATTN_TYPE_DISABLED; | ||
|
|
||
| multimodal_ctx_ = mtmd_init_from_file(multimodal_model_path->string().c_str(), llama_model_, mparams); | ||
| if (!multimodal_ctx_) { | ||
| throw Exception(ExceptionType::PROCESS_SCHEDULE_EXCEPTION, fmt::format("Failed to load multimodal model from '{}'", multimodal_model_path->string())); | ||
| } | ||
|
|
||
| logger->log_info("Successfully loaded multimodal model from '{}'", multimodal_model_path->string()); |
There was a problem hiding this comment.
I would extract this to a separate function and have something like
if (multimodal_model_path) {
initializeMultimodalContext();
}
| auto batch_deleter = gsl::finally([&] {llama_batch_free(batch);}); | ||
| batch.n_tokens = 1; | ||
| batch.n_seq_id[0] = 1; | ||
| batch.seq_id[0][0] = 0; | ||
| batch.logits[0] = true; |
There was a problem hiding this comment.
This can be moved before the while (decode_status == 0) { line as it is only used in the loop. Also it might be better to use a wrapper object for automatic initialization and destruction.
| if (files.empty()) { | ||
| return std::unexpected{"Multimodal input requires at least one file"}; | ||
| } | ||
| std::vector<unique_bitmap_ptr> bitmaps; | ||
| for (auto& file : files) { | ||
| unique_bitmap_ptr bitmap{mtmd_helper_bitmap_init_from_buf(multimodal_ctx_, reinterpret_cast<const unsigned char*>(file.data()), file.size())}; | ||
| if (!bitmap) { | ||
| throw Exception(PROCESSOR_EXCEPTION, "Failed to create multimodal bitmap from buffer"); | ||
| } | ||
| bitmaps.push_back(std::move(bitmap)); | ||
| } | ||
| mtmd_input_text inp_txt = { | ||
| .text = prompt.c_str(), | ||
| .add_special = true, | ||
| .parse_special = true, | ||
| }; | ||
| unique_mtmd_input_chunks_ptr chunks{mtmd_input_chunks_init()}; | ||
| auto bitmap_c_ptrs = bitmaps | ranges::views::transform([] (auto& ptr) {return static_cast<const mtmd_bitmap*>(ptr.get());}) | ranges::to<std::vector>(); | ||
| auto tokenized = mtmd_tokenize(multimodal_ctx_, chunks.get(), &inp_txt, bitmap_c_ptrs.data(), bitmap_c_ptrs.size()); | ||
| if (tokenized != 0) { | ||
| throw Exception(PROCESSOR_EXCEPTION, fmt::format("Failed to tokenize multimodal prompt, error: {}", tokenized)); | ||
| } | ||
| auto status = mtmd_helper_eval_chunks(multimodal_ctx_, llama_ctx_, chunks.get(), 0, 0, 1, true, &n_past); | ||
| if (status != 0) { | ||
| throw Exception(PROCESSOR_EXCEPTION, fmt::format("Failed to eval multimodal chunks, error: {}", status)); | ||
| } |
There was a problem hiding this comment.
I would extract this to a separate function. Additionally why is llama_decode run in case of the of string tokenization, but not in the multimodal use case?
| if (multimodal_) { | ||
| if (files.empty()) { |
There was a problem hiding this comment.
This could be merged to if (multimodal_ && files.empty())
|
|
||
| @step('a directory at "{directory}" has a file with the content from "{path}"') | ||
| @step("a directory at '{directory}' has a file with the content from '{path}'") | ||
| def create_file_with_content_in_directory(context: MinifiTestContext, directory: str, path: str): |
There was a problem hiding this comment.
we should keep function names unique:
| def create_file_with_content_in_directory(context: MinifiTestContext, directory: str, path: str): | |
| def create_file_with_content_from_path_in_directory(context: MinifiTestContext, directory: str, path: str): |
| EXTENSIONAPI static constexpr auto Properties = std::to_array<core::PropertyReference>({ | ||
| ModelPath, | ||
| OutputAttributeName, | ||
| MultiModalModelPath, |
There was a problem hiding this comment.
please add the new properties to PROCESSORS.md, too
Thank you for submitting a contribution to Apache NiFi - MiNiFi C++.
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with MINIFICPP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically main)?
Is your initial contribution a single, squashed commit?
For code changes:
For documentation related changes:
Note:
Please ensure that once the PR is submitted, you check GitHub Actions CI results for build issues and submit an update to your PR as soon as possible.