dbus: rauc: replace tacd-based update polling with native update polling#90

Open
hnez wants to merge 35 commits intolinux-automation:mainfrom
hnez:rauc-polling

@hnez
Member

@hnez hnez commented Apr 2, 2025

This uses the native RAUC update polling support introduced in rauc/rauc#1672 to replace the tacd-internal update polling.

Among the various benefits of this approach are the following:

  • ETag-based check for changed bundles on the server. This means unchanged bundles do not have to be inspected on every poll.
  • Automatic installation and boot into new bundles.
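
To illustrate the ETag point: in HTTP the client sends the stored ETag in `If-None-Match` and the server answers `304 Not Modified` when nothing changed. The sketch below models only the resulting decision on the client side. `Poller` and `PollOutcome` are hypothetical names for illustration and are not taken from RAUC or the tacd.

```rust
/// Outcome of one poll, as seen by the client.
#[derive(Debug, PartialEq)]
pub enum PollOutcome {
    /// The server announced the ETag we already know: skip bundle inspection.
    Unchanged,
    /// The ETag differs or is missing: the bundle has to be inspected again.
    NeedsInspection,
}

/// Hypothetical poller state: remembers the ETag of the last inspected bundle.
#[derive(Default)]
pub struct Poller {
    last_etag: Option<String>,
}

impl Poller {
    /// Compare the ETag announced by the server with the stored one.
    /// A missing ETag is treated as "changed" to stay on the safe side.
    pub fn poll(&mut self, server_etag: Option<&str>) -> PollOutcome {
        let unchanged = matches!(
            (self.last_etag.as_deref(), server_etag),
            (Some(old), Some(new)) if old == new
        );

        if unchanged {
            return PollOutcome::Unchanged;
        }

        // Remember the new ETag so the next poll can be skipped if it matches.
        self.last_etag = server_etag.map(str::to_owned);
        PollOutcome::NeedsInspection
    }
}
```

Only the first poll and polls after a bundle change pay the cost of inspecting the bundle.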

Other PRs related to this one:

TODO before un-drafting:

  • Write proper commit messages

@hnez hnez marked this pull request as ready for review April 3, 2025 07:19
@hnez hnez requested a review from jluebbe April 3, 2025 07:22
@hnez
Member Author

hnez commented Apr 3, 2025

Hi @jluebbe,

I'd suggest you go first and review the polling integration, and maybe in a next step we ask @KarlK90 to have a look from a Rust point of view?

@jluebbe
Member

jluebbe commented May 13, 2025

I've created a RAUC feature request to support service config reloading: rauc/rauc#1709

Comment thread src/dbus/rauc/system_conf.rs Outdated
Comment thread src/dbus/rauc/system_conf.rs Outdated
Comment thread src/dbus/rauc/system_conf.rs Outdated
@hnez
Member Author

hnez commented Feb 20, 2026

I have decided to base this PR on #105 so I do not have to worry about conflicting changes too much.
Since that PR is a draft - because it is currently untested - I will also mark this PR as draft until I have had a chance to change that.

@hnez hnez marked this pull request as draft February 20, 2026 12:18
Comment thread src/dbus/rauc.rs
@hnez hnez force-pushed the rauc-polling branch 3 times, most recently from 989ed7d to e3e26fc Compare February 24, 2026 09:36
@hnez hnez marked this pull request as ready for review February 24, 2026 09:36
@hnez
Member Author

hnez commented Feb 24, 2026

I think we could review and merge this and carry the RAUC changes as patches in meta-lxatac for a bit, until we are sure that we like the API and how it works in RAUC.
What do you think @jluebbe?

@hnez
Member Author

hnez commented Mar 5, 2026

We need to get this reviewed. Who could do that? @jluebbe for the RAUC integration and @KarlK90 for the rest?

Comment thread src/dbus/rauc.rs Outdated
Comment on lines +242 to +248
while let Some(ev) = status.next().await {
    info!("Current status: {} ({})", ev.active_state, ev.sub_state);

    if ev.active_state == "active" {
        break;
    }
}
Member

Shouldn't we handle the case where the state doesn't end up as "active" here?

Member Author

For example with a timeout?

Member Author

I have moved from the tacd::broker::Topic Rube Goldberg machine based service restart to just using the dbus endpoint directly to trigger the restart and get immediate feedback.
This should remove the need for timeouts on our side. Instead the timeouts from the RAUC service file (or the defaults) should take effect.

Member Author

It turns out some amount of Rube Goldberg machinery is actually required to reload/restart a systemd service and be notified about the result of this action.
I have just pushed an updated version in the form of 6a64740.
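
For readers following this thread, here is a minimal sketch of what handling the "never becomes active" case could look like. It assumes a synchronous iterator of state events instead of the async stream in the reviewed snippet, and it is not the code from the pushed commit. `UnitState` and `wait_for_active` are hypothetical names. `"active"` and `"failed"` are systemd `ActiveState` values.

```rust
/// Hypothetical, simplified unit state event (field names follow the reviewed snippet).
pub struct UnitState {
    pub active_state: String,
    pub sub_state: String,
}

/// Consume state events until the unit is "active".
/// Returns an error if the unit reports "failed" or the event source ends first.
pub fn wait_for_active<I>(events: I) -> Result<(), String>
where
    I: IntoIterator<Item = UnitState>,
{
    for ev in events {
        match ev.active_state.as_str() {
            "active" => return Ok(()),
            "failed" => return Err(format!("unit failed ({})", ev.sub_state)),
            // Intermediate states such as "activating": keep waiting.
            _ => continue,
        }
    }

    Err("event stream ended before the unit became active".to_owned())
}
```

With this shape the caller gets an explicit error instead of silently falling out of the loop.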

}

#[cfg(not(feature = "demo_mode"))]
pub(super) fn update_from_poll_status(&mut self, poll_status: zvariant::Dict) -> Result<bool> {
Member

This would benefit from some documentation. How do we find the channel to update from the RAUC status information?

Member Author

I will add some text.

Member Author

Added some text

Comment thread src/dbus/rauc/system_conf.rs Outdated
Comment thread src/dbus/rauc/system_conf.rs Outdated
Comment thread src/dbus/rauc/system_conf.rs Outdated
let inhibit_files = primary_channel
    .inhibit_files
    .as_deref()
    .unwrap_or("/var/run/tacd/inhibit/dut-pwr;/var/run/tacd/inhibit/setup-mode");
Member Author

Suggested change
-    .unwrap_or("/var/run/tacd/inhibit/dut-pwr;/var/run/tacd/inhibit/setup-mode");
+    .unwrap_or("/run/tacd/inhibit/dut-pwr;/run/tacd/inhibit/setup-mode");

Member

Does this cover any existing paths in /var/run/tacd?

Member Author

I am not sure whether I understand the question. We generate/remove these inhibit files in the tacd and check their existence in RAUC.
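
A minimal sketch of the consumer side of this contract, assuming the `;`-separated list format from the default value above and assuming that an existing file means "inhibited". `active_inhibitors` is a hypothetical helper for illustration and does not show how RAUC implements the check.

```rust
use std::path::Path;

/// Return the entries of a `;`-separated inhibit file list that currently exist.
/// An empty result means nothing inhibits polling right now.
pub fn active_inhibitors(inhibit_files: &str) -> Vec<&str> {
    inhibit_files
        .split(';')
        .map(str::trim)
        // Skip empty entries (e.g. from a trailing `;`) and files that are absent.
        .filter(|p| !p.is_empty() && Path::new(p).exists())
        .collect()
}
```

Returning the matching paths rather than a plain `bool` makes it easy to log which condition is blocking the poll.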

Comment thread src/inhibit.rs Outdated
Comment thread src/dbus/rauc/update_channels.rs Outdated
@hnez hnez force-pushed the rauc-polling branch 2 times, most recently from d9272db to adb2b67 Compare April 8, 2026 14:14
hnez added 28 commits May 8, 2026 07:59
RAUC is in the process of adding native polling support, which we want to
integrate into the tacd.

To do the switch in a reviewable way first remove the tacd-based polling
and then add the native support in separate commits.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
RAUC native update polling only supports a single update channel,
while our tacd-based update polling did support multiple
(all channels which RAUC would have accepted updates from,
based on the enabled signing certificates, were polled for updates and the
user was asked if they wanted to install updates from them).

Prepare for the change by adding a concept of a single primary update
channel. The primary channel is the first enabled one, based on the
channel definition file name.

E.g. on production TACs these channel files are available:

    root@lxatac-00011:~# ls /usr/share/tacd/update_channels/
    01_stable.yaml 05_testing.yaml

They are sorted by name when they are read from disk, so if both
`stable.cert.pem` and `testing.cert.pem` are found
in `/etc/rauc/certificates-enabled/`, then the stable channel will be
the primary channel, but bundles from the testing channel may still be
installed via the command line interface (e.g. to facilitate a channel
switch).

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
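
The selection rule from the commit message above (sort by file name, take the first enabled channel) can be sketched as follows. `Channel` is a hypothetical, reduced struct and `primary_channel` a hypothetical helper. Neither is the tacd's actual type or function.

```rust
/// Hypothetical, reduced view of an update channel definition.
#[derive(Debug, Clone, PartialEq)]
pub struct Channel {
    /// File name of the definition, e.g. "01_stable.yaml".
    pub file_name: String,
    /// Whether the matching signing certificate is enabled.
    pub enabled: bool,
}

/// Sort the channels by file name and return the first enabled one, if any.
pub fn primary_channel(mut channels: Vec<Channel>) -> Option<Channel> {
    channels.sort_by(|a, b| a.file_name.cmp(&b.file_name));
    channels.into_iter().find(|c| c.enabled)
}
```

With both `01_stable.yaml` and `05_testing.yaml` enabled, this yields the stable channel. With only testing enabled, it yields the testing channel.
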
This restricts the sources that the `/v1/tac/update/install` endpoint will
accept update requests from to only the primary channel.
The web interface has not exposed the feature to install arbitrary URLs
for some time now and users that want to do so are better served by using
the command line interface instead.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
This configures RAUC to poll for updates on our behalf.
We do not use the information yet or enable automatic installation but
those are next steps.
We also need to trigger RAUC to re-read the file for this to be useful.
All of these features are added in follow-up commits.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
We will need that in the future to implement cleaner service restarts.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
This triggers RAUC systemd service reload or restart (currently RAUC does
not support reloads, so it will be a restart) and waits for the result.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
RAUC native polling provides us with information about the recent poll
attempts. This includes information about the bundle version and whether
it is an update over what is currently running on the device.

In other words: it gives us everything we need to show update
notifications again. Forward this information to the same places we used
with the tacd-based update polling.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
The RAUC native polling interface provides more information than
just the basic `compatible` and `version` fields.
Among these extra pieces of information are the following:

  - `manifest_hash`

    By using the `manifest_hash` in the `InstallBundle` call we can
    (cryptographically) ensure that the exact bundle (content) that the
    user agreed to install is actually being installed and that no switch
    has happened in between.

  - `effective_url`

    This is the bundle URL after all HTTP redirects have been followed.
    This is e.g. relevant when a "clever" update server is used that
    redirects poll requests to specific bundles, e.g. to implement staged
    rollouts or to prevent updates from incompatible bundle versions.

    By using this one can ensure that the bundle URL used matches
    the `manifest_hash` provided and that the redirects did not change
    (e.g. because the next step in a staged update was reached)
    between the last poll and the user accepting an update.

The update dialog on the LCD is updated to use this mechanism now,
while the web interface will be updated later.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
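
A minimal sketch of the pinning idea from the commit message above: the install request carries the `effective_url` and `manifest_hash` that were shown to the user, and a bundle with a different manifest hash is refused. `PollInfo`, `InstallRequest`, `install_request` and `check_bundle` are hypothetical names. In the real setup the hash check happens inside RAUC, so `check_bundle` is only a stand-in for it.

```rust
/// Hypothetical subset of the information from the most recent poll.
#[derive(Debug, Clone, PartialEq)]
pub struct PollInfo {
    pub version: String,
    pub manifest_hash: String,
    pub effective_url: String,
}

/// Hypothetical install request: the URL to fetch and the hash the bundle must have.
#[derive(Debug, PartialEq)]
pub struct InstallRequest {
    pub url: String,
    pub required_manifest_hash: String,
}

/// Pin the request to exactly what was shown to the user.
pub fn install_request(shown: &PollInfo) -> InstallRequest {
    InstallRequest {
        url: shown.effective_url.clone(),
        required_manifest_hash: shown.manifest_hash.clone(),
    }
}

/// Stand-in for the installer-side check: refuse a bundle whose manifest hash differs.
pub fn check_bundle(req: &InstallRequest, actual_manifest_hash: &str) -> Result<(), String> {
    if req.required_manifest_hash == actual_manifest_hash {
        Ok(())
    } else {
        Err(format!(
            "manifest hash mismatch: expected {}, got {}",
            req.required_manifest_hash, actual_manifest_hash
        ))
    }
}
```

If the server swaps the bundle between the poll and the user's confirmation, the hash no longer matches and the installation is rejected instead of installing something the user never saw.
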
Automatic installation and boot of updates can be useful when managing a
fleet of devices. This is however a feature that requires strict user
consent, which is why it is off by default.

Add backend support for enabling this feature. Frontend support in the
web interface will be added later.

We always enable auto-reboot together with auto-install,
since the migration scripts only run once at the end of the installation.
A system that is updated, but not rebooted, would thus accumulate changes
that are not migrated to the other slot.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
Users managing a fleet of devices with custom-built update bundles and
update channels may want to automatically enable update polling and
automatic installation of updates without having to do so explicitly via
the web UI. (At least we at Pengutronix do).

Enable this use case by adding optional `force_polling` and
`force_auto_install` config options to the update channel definition
files.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
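
A sketch of how such overrides could combine with the user's settings. It assumes the options are optional booleans that, when present, win over the user's choice in either direction. The exact semantics in the tacd are not shown here, and `UpdateSettings`, `ChannelOverrides` and `effective_settings` are hypothetical names.

```rust
/// What the user selected (e.g. in the web interface).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct UpdateSettings {
    pub polling: bool,
    pub auto_install: bool,
}

/// Hypothetical per-channel overrides; `None` means "leave it to the user".
#[derive(Debug, Default, Clone, Copy)]
pub struct ChannelOverrides {
    pub force_polling: Option<bool>,
    pub force_auto_install: Option<bool>,
}

/// Combine user settings with the channel overrides.
pub fn effective_settings(user: UpdateSettings, ch: ChannelOverrides) -> UpdateSettings {
    let polling = ch.force_polling.unwrap_or(user.polling);
    // Auto-install without polling does nothing, so only report it while polling is on.
    let auto_install = polling && ch.force_auto_install.unwrap_or(user.auto_install);

    UpdateSettings { polling, auto_install }
}
```
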
The native RAUC polling feature allows configuring when a new update
bundle is even considered as an update candidate (`candidate_criteria`),
when it is considered for automatic installation (`install_criteria`)
and under which conditions to auto-boot into another slot after
installation (`reboot_criteria`).

The defaults we have chosen in previous commits generally make sense,
but allow users with custom update channels to customize them if they
deem it necessary.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
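
A sketch of how optional criteria from a channel definition could be turned into `key=value` config lines, emitting only the ones that are set so that defaults stay in effect otherwise. The hyphenated key spelling and the example value used in the test are assumptions for illustration. Check the RAUC documentation for the accepted keys and values. `Criteria` and `render_criteria` are hypothetical names.

```rust
/// Hypothetical criteria overrides from a channel definition; `None` keeps the default.
#[derive(Default)]
pub struct Criteria {
    pub candidate_criteria: Option<String>,
    pub install_criteria: Option<String>,
    pub reboot_criteria: Option<String>,
}

/// Render one `key=value` line for each override that is set.
pub fn render_criteria(c: &Criteria) -> String {
    // Key spelling is an assumption, see the note above.
    let entries = [
        ("candidate-criteria", &c.candidate_criteria),
        ("install-criteria", &c.install_criteria),
        ("reboot-criteria", &c.reboot_criteria),
    ];

    let mut out = String::new();
    for (key, value) in entries {
        if let Some(value) = value {
            out.push_str(&format!("{}={}\n", key, value));
        }
    }
    out
}
```
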
The struct itself and also the functions that take parts of it as
parameters (in particular `channel_list_update_task`) are getting
unwieldy.

Adding another parameter to `channel_list_update_task` would cross cargo
clippy's threshold on "too many arguments".

Work around that by splitting the Rauc struct and just passing the config
part of it to `channel_list_update_task` as a whole.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
When in setup mode we do not want the system to auto-install updates and
suddenly reboot without user input, as that would lead to a bad first
experience with the TAC.

Instead use an inhibit file to delay the first RAUC update poll to when
the setup mode is exited.

This commit does not yet make use of the inhibit file, it only generates
and removes it.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
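
Generating and removing an inhibit file can be sketched like this. `sync_inhibit_file` is a hypothetical helper, not the function from `src/inhibit.rs`. It assumes that an empty file is sufficient, i.e. that only the file's existence matters.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Create the inhibit file while `inhibit` is true and remove it otherwise.
pub fn sync_inhibit_file(path: &Path, inhibit: bool) -> io::Result<()> {
    if inhibit {
        // Make sure the directory exists before creating the (empty) file.
        if let Some(parent) = path.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::write(path, b"")
    } else {
        match fs::remove_file(path) {
            // The file already being absent is the desired state.
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
            other => other,
        }
    }
}
```

Calling it on every state change keeps the function idempotent: repeated calls with the same value leave the file system unchanged.
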
If a DUT is currently powered by the TAC it is unlikely that we should
reboot for an update.

This commit does not yet make use of the generated inhibit file.
It only creates and removes it based on the DUT power state.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
It would be quite surprising (in the negative sense) if the TAC would
reboot without a warning while you are setting it up after unboxing.
Instead wait for the setup to complete before looking for updates.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
Ideally we would not want to auto-reboot when the TAC is in use,
but deciding when that is, is not easy.
One piece of information we do have is if the DUT is currently powered.
In that case we very likely do not want to reboot right now.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
The return value from InstallerProxy::get_slot_status() needs some
post-processing.
Break that out into a separate function, because we will need to call
`get_slot_status` from another place soon.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
This function looks up the two rootfs slots and arranges them based on
which one is booted and which one is not.
This is currently only used in one place, but will be used in another
soon.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
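
The arrangement described above can be sketched as follows, assuming exactly two rootfs slots of which exactly one is booted. `Slot` and `arrange_slots` are hypothetical names, and the slot names in the usage are placeholders.

```rust
/// Hypothetical, reduced slot description.
#[derive(Debug, Clone, PartialEq)]
pub struct Slot {
    pub name: String,
    pub booted: bool,
}

/// Return `(booted, other)` for two rootfs slots, exactly one of which must be booted.
pub fn arrange_slots(slots: [Slot; 2]) -> Result<(Slot, Slot), String> {
    let [a, b] = slots;

    match (a.booted, b.booted) {
        (true, false) => Ok((a, b)),
        (false, true) => Ok((b, a)),
        // Both or neither booted: the slot status is not what we expect.
        _ => Err("expected exactly one booted rootfs slot".to_owned()),
    }
}
```

Returning an error for the unexpected cases keeps callers from guessing which slot is the update target.
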
I like this better now, because it makes the control flow more obvious
(`bail!` returns right there, while `Err(anyhow!(...))` follows the normal
control flow).

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
At least for now the RAUC service will only poll for updates if the booted
slot was marked good during the current runtime of the service.
We restart the service to dynamically update the configuration.
This means we have to mark the booted slot as good again if we want polling.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
We include the `boot-id` and `uptime` options in the RAUC `send-headers`
config in the hopes of detecting boot-loops during update roll-out.
This was not anticipated when writing the setup page, so we add it now.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
The channel list in the web interface now contains an "Upgrade" column
with one of the following:

  - "Not enabled" for channels which are not enabled, which means bundles
    from them cannot be installed.
  - "Not primary" (this one is new) for channels which are enabled,
    but are not the primary one and are thus not polled by the native RAUC
    polling feature.
  - "Polling disabled" if the polling feature is not enabled.
  - A spinner if we do not know the status yet.
  - "Up to date" if the TAC is in sync with this update channel.
  - "Upgrade" (a button) if an update is available.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
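
The column states listed above can be read as a decision chain. The sketch below assumes the checks are applied in the order of the list. The web interface itself is not written in Rust, so this only models the logic. `ChannelView`, `UpgradeColumn` and `upgrade_column` are hypothetical names.

```rust
/// What the "Upgrade" column shows for one channel.
#[derive(Debug, PartialEq)]
pub enum UpgradeColumn {
    NotEnabled,
    NotPrimary,
    PollingDisabled,
    Spinner,
    UpToDate,
    UpgradeButton,
}

/// Hypothetical per-channel state as seen by the frontend.
pub struct ChannelView {
    pub enabled: bool,
    pub primary: bool,
    pub polling_enabled: bool,
    /// `None` until the first poll result is known.
    pub update_available: Option<bool>,
}

/// Pick the column content, checking the conditions in the order of the list above.
pub fn upgrade_column(c: &ChannelView) -> UpgradeColumn {
    if !c.enabled {
        UpgradeColumn::NotEnabled
    } else if !c.primary {
        UpgradeColumn::NotPrimary
    } else if !c.polling_enabled {
        UpgradeColumn::PollingDisabled
    } else {
        match c.update_available {
            None => UpgradeColumn::Spinner,
            Some(false) => UpgradeColumn::UpToDate,
            Some(true) => UpgradeColumn::UpgradeButton,
        }
    }
}
```
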
…dren

This allows us to only show some configuration options when a condition is
met.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
Only make the toggle visible when polling is enabled, since auto-update
without polling does nothing.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
This ensures that the exact bundle (content) that the user agreed to
install is actually installed.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
With the change to native RAUC update polling only the primary update
channel is checked for updates, not all update channels as before.

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
This is an advanced config option that most users should not actually
need, so hide it behind an expandable section (and only show it if polling
is active).

Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>
@hnez
Member Author

hnez commented May 8, 2026

I have just rebased the PR on top of the recently merged #107 and a small unmerged fixup commit #108.

I would prefer if we merged this sooner rather than later, because it should be included in the new tacos release, which I will be preparing next week.

3 participants