Skip to content

MINIFICPP-2719 - Add multimodal capability to llama.cpp processor#2107

Open
adamdebreceni wants to merge 19 commits into
apache:mainfrom
adamdebreceni:multimodal_llama
Open

MINIFICPP-2719 - Add multimodal capability to llama.cpp processor#2107
adamdebreceni wants to merge 19 commits into
apache:mainfrom
adamdebreceni:multimodal_llama

Conversation

@adamdebreceni
Copy link
Copy Markdown
Contributor

Thank you for submitting a contribution to Apache NiFi - MiNiFi C++.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with MINIFICPP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically main)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file?
  • If applicable, have you updated the NOTICE file?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check GitHub Actions CI results for build issues and submit an update to your PR as soon as possible.

@adamdebreceni adamdebreceni marked this pull request as ready for review May 4, 2026 12:18
@lordgamez lordgamez self-requested a review May 5, 2026 11:32
@martinzink martinzink requested review from Copilot and martinzink May 5, 2026 12:27
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the MiNiFi C++ llama.cpp extension to support multimodal (mtmd) inference, including wiring FlowFile content as “files” into the llama.cpp mtmd pipeline and optionally writing model output to a FlowFile attribute instead of overwriting content.

Changes:

  • Bump vendored llama.cpp to b8944 and apply a new patch to build mtmd support and fix missing includes.
  • Extend RunLlamaCppInference with multimodal model configuration + optional “output to attribute” behavior.
  • Update the LlamaContext interface and DefaultLlamaContext implementation to accept file buffers and perform mtmd tokenization/eval.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
thirdparty/llamacpp/mtmd-fix.patch Adds mtmd subdirectory to llama.cpp build, fixes an include, and removes some mtmd tool executables.
thirdparty/llamacpp/lu8_macro_fix.patch Removes an older llama.cpp patch no longer applied after the version bump.
thirdparty/llamacpp/cpp-23-fixes.patch Removes an older llama.cpp patch no longer applied after the version bump.
cmake/LlamaCpp.cmake Bumps llama.cpp tag, enables LLAMA_BUILD_COMMON, applies mtmd patch, and extends include dirs for common/tools/vendor headers.
extensions/llamacpp/CMakeLists.txt Links the extension against mtmd and llama-common in addition to llama.
extensions/llamacpp/processors/LlamaContext.h Extends generate() to accept a list of binary “files” (e.g., images/audio).
extensions/llamacpp/processors/DefaultLlamaContext.h Adds mtmd/chat-template state and updates constructor/generate signature for multimodal support.
extensions/llamacpp/processors/DefaultLlamaContext.cpp Implements mtmd initialization, multimodal tokenization/eval, and updated decode loop.
extensions/llamacpp/processors/RunLlamaCppInference.h Adds MultiModal Model Path and Output Attribute Name properties and stores them in member state.
extensions/llamacpp/processors/RunLlamaCppInference.cpp Passes FlowFile bytes as multimodal files, inserts mtmd marker, and optionally writes output to an attribute.
extensions/llamacpp/tests/RunLlamaCppInferenceTests.cpp Updates the mock context to match the new generate() signature.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread extensions/llamacpp/processors/RunLlamaCppInference.cpp
Comment thread extensions/llamacpp/processors/RunLlamaCppInference.cpp
Comment thread extensions/llamacpp/processors/DefaultLlamaContext.cpp Outdated
Comment thread extensions/llamacpp/processors/DefaultLlamaContext.cpp
Comment thread extensions/llamacpp/processors/DefaultLlamaContext.cpp Outdated
Comment thread extensions/llamacpp/processors/RunLlamaCppInference.cpp
Comment thread extensions/llamacpp/processors/RunLlamaCppInference.cpp
Comment thread extensions/llamacpp/processors/RunLlamaCppInference.cpp
Comment on lines +82 to +96
if (!multimodal_model_path) {
logger->log_info("No multimodal model path provided");
return;
}

mtmd_context_params mparams = mtmd_context_params_default();
mparams.use_gpu = false;
mparams.flash_attn_type = LLAMA_FLASH_ATTN_TYPE_DISABLED;

multimodal_ctx_ = mtmd_init_from_file(multimodal_model_path->string().c_str(), llama_model_, mparams);
if (!multimodal_ctx_) {
throw Exception(ExceptionType::PROCESS_SCHEDULE_EXCEPTION, fmt::format("Failed to load multimodal model from '{}'", multimodal_model_path->string()));
}

logger->log_info("Successfully loaded multimodal model from '{}'", multimodal_model_path->string());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would extract this to a separate function and have something like

if (multimodal_model_path) {
  initializeMultimodalContext();
}

Comment on lines +149 to +153
auto batch_deleter = gsl::finally([&] {llama_batch_free(batch);});
batch.n_tokens = 1;
batch.n_seq_id[0] = 1;
batch.seq_id[0][0] = 0;
batch.logits[0] = true;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be moved before the while (decode_status == 0) { line as it is only used in the loop. Also it might be better to use a wrapper object for automatic initialization and destruction.

Comment on lines +156 to +181
if (files.empty()) {
return std::unexpected{"Multimodal input requires at least one file"};
}
std::vector<unique_bitmap_ptr> bitmaps;
for (auto& file : files) {
unique_bitmap_ptr bitmap{mtmd_helper_bitmap_init_from_buf(multimodal_ctx_, reinterpret_cast<const unsigned char*>(file.data()), file.size())};
if (!bitmap) {
throw Exception(PROCESSOR_EXCEPTION, "Failed to create multimodal bitmap from buffer");
}
bitmaps.push_back(std::move(bitmap));
}
mtmd_input_text inp_txt = {
.text = prompt.c_str(),
.add_special = true,
.parse_special = true,
};
unique_mtmd_input_chunks_ptr chunks{mtmd_input_chunks_init()};
auto bitmap_c_ptrs = bitmaps | ranges::views::transform([] (auto& ptr) {return static_cast<const mtmd_bitmap*>(ptr.get());}) | ranges::to<std::vector>();
auto tokenized = mtmd_tokenize(multimodal_ctx_, chunks.get(), &inp_txt, bitmap_c_ptrs.data(), bitmap_c_ptrs.size());
if (tokenized != 0) {
throw Exception(PROCESSOR_EXCEPTION, fmt::format("Failed to tokenize multimodal prompt, error: {}", tokenized));
}
auto status = mtmd_helper_eval_chunks(multimodal_ctx_, llama_ctx_, chunks.get(), 0, 0, 1, true, &n_past);
if (status != 0) {
throw Exception(PROCESSOR_EXCEPTION, fmt::format("Failed to eval multimodal chunks, error: {}", status));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would extract this to a separate function. Additionally why is llama_decode run in case of the of string tokenization, but not in the multimodal use case?

Comment on lines +45 to +46
if (multimodal_) {
if (files.empty()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be merged to if (multimodal_ && files.empty())


@step('a directory at "{directory}" has a file with the content from "{path}"')
@step("a directory at '{directory}' has a file with the content from '{path}'")
def create_file_with_content_in_directory(context: MinifiTestContext, directory: str, path: str):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should keep function names unique:

Suggested change
def create_file_with_content_in_directory(context: MinifiTestContext, directory: str, path: str):
def create_file_with_content_from_path_in_directory(context: MinifiTestContext, directory: str, path: str):

EXTENSIONAPI static constexpr auto Properties = std::to_array<core::PropertyReference>({
ModelPath,
OutputAttributeName,
MultiModalModelPath,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add the new properties to PROCESSORS.md, too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants