Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
113 changes: 113 additions & 0 deletions CLAUDE_PLAN_AND_RESOLUTIONS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
# Axiomatic Adapter — Plan & Resolutions

This document captures the design plan and the resolution status of every issue identified during the Axiomatic adapter audit. It mirrors the working plan used while implementing the fixes on branch `zeerek/bugfix-axiomatic-flips-error-logic`.

---

## Context

The Axiomatic CAN-to-Ethernet adapter ([axiomatic_adapter/](axiomatic_adapter/)) is the TCP-over-Ethernet alternative to a Kvaser SocketCAN device. ECU flashing through it failed intermittently, while flashing via Kvaser was reliable. Two audit passes pinned the asymmetry to the TCP transport layer: the adapter treated the byte stream as one-CAN-frame-per-read and only understood the deprecated "CAN Stream" message type. Under the bursty traffic of a flash protocol, this dropped frames three different ways.

The work was staged into three phases so the flash bench could validate intermediate progress.

---

## Audit findings — status table

| # | Severity | Finding | Resolved in |
|---|----------|---------|-------------|
| 1 | CRITICAL | `send()` wrote 8 data bytes regardless of DLC; declared `message_length` disagreed with actual payload | Phase 1 |
| 2 | CRITICAL | No TCP stream framing — coalesced or split CAN frames were silently dropped | Phase 1 (partial) → Phase 2 (complete) |
| 3 | CRITICAL | Heartbeat (Message ID 4, every 1 s) and Status Response (ID 3) were rejected as garbage, corrupting stream alignment | Phase 2 |
| 4 | CRITICAL | The protocol's Message Data Length at bytes 9-10 was ignored; framing was guessed from the control byte | Phase 2 |
| 5 | CRITICAL | A single protocol message can carry multiple CAN + Notification frames; only the first was delivered | Phase 2 |
| 6 | CRITICAL | Bridge error callbacks had inverted logic + UB dereference of an empty `std::optional` | Already fixed in tree prior to this branch (verified) |
| 7 | HIGH | No `TCP_NODELAY` — Nagle's algorithm added up to ~40 ms latency per small write | Phase 3 |
| 8 | HIGH | Synchronous `boost::asio::write()` from the bridge thread raced `async_receive` on the same socket | Phase 3 |

---

## Phase 1 — Quick correctness fixes (DONE)

Goal: ship the smallest changes that make the wire protocol behave so the flash bench could validate immediately.

- **`send()` DLC fix.** `send()` now writes exactly `frame_data_length` bytes after the header instead of all 8 of `frame.get_data()`. The declared `message_length` and the actual payload bytes finally match.
- **First-cut TCP framer.** Added `AxiomaticFrameParser` with a persistent `std::vector<uint8_t>` buffer. The parser scans for a sync header, decodes a complete frame, leaves trailing partial bytes for the next read, and skips garbage prefixes.
- **`receive()` rewired** to drain the parser before doing any I/O and to append fresh bytes after a successful read. Coalesced frames now deliver back-to-back without additional `async_receive` calls.
- **`INCOMPLETE_FRAME_SENTINEL`** suppresses the "bytes arrived but not yet a full frame" case from the reception loop's `error_callback_`.
- **Stale test NOTE removed** — the "first message may fail" comment was a symptom of the absent framer.

Tests: 11 parser unit tests + 4 loopback send-length tests, no hardware required.

### Phase 1 gap that motivated Phase 2

The Phase 1 parser hard-coded a **7-byte** "sync header" including `0x01` as the last byte. That `0x01` is actually the first byte of the Message ID — specifically Message ID 1, "deprecated CAN Stream." The protocol also defines Message IDs 3 (Status Response), 4 (Heartbeat), and 5 (CAN FD Stream), all of which share the first **6** bytes but differ at bytes 6-7. The Phase 1 parser:

- Rejected heartbeats as garbage, dropping their bytes and risking mid-message desync.
- Computed total frame size from the control byte (only meaningful for CAN Stream messages), with no way to skip other message types cleanly.
- Returned only one CAN frame per protocol message, while the spec allows multiple CAN + Notification frames packed together.

---

## Phase 2 — Protocol-correct streaming parser (DONE)

Goal: make the parser understand the actual Axiomatic message envelope, not just a sync prefix.

- **`SYNC_PREFIX` is now 6 bytes** (`AXIO` + `0xBA 0x36`). Message ID lives at bytes 6-7 and is treated as data, not header.
- **Framing uses the protocol's Message Data Length field** (bytes 9-10). Total message size is `11 + data_length`. The parser waits for that many bytes before consuming anything.
- **Message ID dispatch.** CAN Stream messages (ID 1) are decoded; Heartbeat (ID 4), Status Response (ID 3), CAN FD Stream (ID 5), and any unknown ID are consumed and counted, never surfaced as errors.
- **Multi-frame iteration.** `decodeCanStreamBody()` walks the CAN Stream payload one control byte at a time, handling CAN frames (control bit 7 = 0) and skipping Notification frames (bit 7 = 1, fixed 5 bytes). Decoded CAN frames flow through an internal `std::deque` so the existing one-frame-per-call API still works.
- **Diagnostic counters** added: `heartbeats_seen`, `status_responses_seen`, `can_fd_messages_skipped`, `unknown_messages_skipped`, `notification_frames_skipped`, `dropped_bytes`.
- **Verbose mode.** `AxiomaticFrameParser::set_verbose(true)` prints one `std::cout` line per non-CAN event. Plumbed all the way through: `ros2 run axiomatic_adapter axiomatic_socketcan_bridge ... -v` now flips parser verbosity too.
- **`send()` wire layout unchanged**, but the source now references `SYNC_PREFIX` + `MSG_ID_CAN_STREAM_DEPRECATED` explicitly so the intent is obvious.

Tests: parser tests refactored around a `encodeCanStreamMessage` / `encodeHeartbeatMessage` / etc. helper set, plus new cases for packed multi-frame messages, mixed CAN+Notification bodies, heartbeats coalesced with CAN traffic, status responses, FD frames, and split-across-append scenarios.

---

## Phase 3 — Concurrency & throughput (DONE)

Goal: eliminate the remaining intermittent failure classes — Nagle-induced latency and the unsynchronized cross-thread socket access.

- **TCP_NODELAY** is set immediately after a successful connect. Failure is logged but non-fatal.
- **Persistent `io_context`** runs on a dedicated worker thread for the adapter's lifetime, held alive by an `executor_work_guard`. The per-call `restart()`/`run()` pattern is gone.
- **`boost::asio::strand`** serializes every socket operation. `send()` and `receive()` post async work onto the strand and block the calling thread on a `std::promise`/`std::future` — public sync signatures unchanged, so the bridge keeps working as-is.
- **Reception loop is now self-rescheduling.** `startReceptionThread()` no longer spawns a thread; it posts the first `async_receive` to the strand, and the completion handler chains itself. `joinReceptionThread()` posts a `tcp_socket_.cancel()` and waits for the chain to drain.
- **`closeSocket()` posts the close onto the strand** so it can't race the io thread.
- **Parser access is mutex-guarded** between the io thread (writer) and any sync `receive()` caller (reader).
- **`DEFAULT_SOCKET_RECEIVE_TIMEOUT_MS` lowered from 100 ms → 20 ms** now that the loop is async and the timeout only bounds the sync `receive()` public API.
- **Sync `receive()` refuses to run while the reception thread is active** — the two paths would otherwise interleave bytes between two parsers. Pick one mode per adapter instance.

Tests: new `test_axiomatic_concurrent_io` exercise; a localhost peer simultaneously consumes our `send()` traffic and emits inbound CAN Stream messages while one thread hammers `send()` and the reception chain delivers frames via callback. Ready to be re-run under `-fsanitize=thread` for the formal TSAN sign-off.

---

## Verification

### Automated (no hardware required)

`colcon build --merge-install --packages-select axiomatic_adapter --cmake-args -DBUILD_TESTING=ON`
`colcon test --merge-install --packages-select axiomatic_adapter`

| Test binary | Cases | Assertions |
|-------------|-------|------------|
| `test_axiomatic_frame_parser` | 19 | 203 |
| `test_axiomatic_send_wire_format` | 4 | 60 |
| `test_axiomatic_concurrent_io` | 2 | 10 |
| **Total** | **25** | **273** |

All green on a clean build.

### Still owed (manual, requires hardware)

- Re-run the existing hardware Catch2 suite at `192.168.0.34:4000` and confirm the previously flaky "first read" no longer fails.
- Capture a raw TCP stream from the converter for ~30 s while idle and feed those bytes through the parser; expect zero CAN frames, `heartbeats_seen() ≈ 30`, `dropped_bytes() == 0`.
- End-to-end ECU flash through the bridge — should succeed without retries.
- Latency probe: a single `send()` → `receive()` round trip should drop from the previous ~40 ms (first-frame Nagle) to sub-millisecond.
- Optional: rebuild with `-fsanitize=thread` and run `test_axiomatic_concurrent_io` for the TSAN sign-off on Phase 3.

---

## Public API impact

None for existing consumers. The `AxiomaticAdapter` constructor gained a defaulted `bool verbose = false` parameter, so old call sites compile unchanged. `AxiomaticSocketcanBridge` now forwards its `-v` flag into the adapter.
68 changes: 67 additions & 1 deletion axiomatic_adapter/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,10 @@ find_package(CLI11 REQUIRED)
find_package(socketcan_adapter REQUIRED)

# Axiomatic Adapter Library
add_library(axiomatic_adapter SHARED src/axiomatic_adapter.cpp)
add_library(axiomatic_adapter SHARED
src/axiomatic_adapter.cpp
src/axiomatic_frame_parser.cpp
)

target_compile_features(axiomatic_adapter PUBLIC c_std_99 cxx_std_17)
target_include_directories(axiomatic_adapter PUBLIC
Expand Down Expand Up @@ -100,6 +103,69 @@ if(BUILD_TESTING)

option(HARDWARE_CONNECTED "Enable tests that require a connected Axiomatic device and CAN interface" OFF)

# Pure-software tests: framer logic and the send-length wire encoding via a
# localhost TCP loopback. No Axiomatic hardware required.
#
# Sources are compiled directly into each test executable rather than linked
# via the shared library. This makes the tests independent of any older
# libaxiomatic_adapter.so that might be on LD_LIBRARY_PATH from a prior
# install (e.g. /opt/polymath) and would otherwise shadow the freshly-built
# symbols at runtime.
add_executable(test_axiomatic_frame_parser
test/axiomatic_frame_parser_test.cpp
src/axiomatic_frame_parser.cpp
)
target_include_directories(test_axiomatic_frame_parser PRIVATE
${CMAKE_CURRENT_SOURCE_DIR}/include
)
target_link_libraries(test_axiomatic_frame_parser PRIVATE
socketcan_adapter::socketcan_adapter
Catch2::Catch2WithMain
)
if(COMMAND catch_discover_tests)
catch_discover_tests(test_axiomatic_frame_parser)
else()
add_test(NAME test_axiomatic_frame_parser COMMAND test_axiomatic_frame_parser)
endif()

add_executable(test_axiomatic_send_wire_format
test/axiomatic_send_wire_format_test.cpp
src/axiomatic_adapter.cpp
src/axiomatic_frame_parser.cpp
)
target_include_directories(test_axiomatic_send_wire_format PRIVATE
${CMAKE_CURRENT_SOURCE_DIR}/include
)
target_link_libraries(test_axiomatic_send_wire_format PRIVATE
socketcan_adapter::socketcan_adapter
Boost::system
Catch2::Catch2WithMain
)
if(COMMAND catch_discover_tests)
catch_discover_tests(test_axiomatic_send_wire_format)
else()
add_test(NAME test_axiomatic_send_wire_format COMMAND test_axiomatic_send_wire_format)
endif()

add_executable(test_axiomatic_concurrent_io
test/axiomatic_concurrent_io_test.cpp
src/axiomatic_adapter.cpp
src/axiomatic_frame_parser.cpp
)
target_include_directories(test_axiomatic_concurrent_io PRIVATE
${CMAKE_CURRENT_SOURCE_DIR}/include
)
target_link_libraries(test_axiomatic_concurrent_io PRIVATE
socketcan_adapter::socketcan_adapter
Boost::system
Catch2::Catch2WithMain
)
if(COMMAND catch_discover_tests)
catch_discover_tests(test_axiomatic_concurrent_io)
else()
add_test(NAME test_axiomatic_concurrent_io COMMAND test_axiomatic_concurrent_io)
endif()

if(HARDWARE_CONNECTED)
add_executable(test_axiomatic_adapter test/axiomatic_adapter_test.cpp)
target_link_libraries(test_axiomatic_adapter PRIVATE
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -48,21 +48,25 @@ class AxiomaticAdapter : public std::enable_shared_from_this<AxiomaticAdapter>
/// @brief Mapped to std lib, but should be remapped to Polymath Safety compatible versions
using socket_error_string_t = std::string;

static constexpr std::chrono::milliseconds DEFAULT_SOCKET_RECEIVE_TIMEOUT_MS{100};
static constexpr std::chrono::milliseconds DEFAULT_SOCKET_RECEIVE_TIMEOUT_MS{20};
static constexpr std::chrono::milliseconds JOIN_RECEPTION_TIMEOUT_MS{100};

/// @brief AxiomaticAdapter Class Init
/// @param ip_address Axiomatic Device IP address to connect
/// @param port Axiomatic Device Port to connect to
/// @param receive_timeout_ms receive timeout in milliseconds
/// @param verbose when true the parser prints a line to std::cout for each
/// non-CAN protocol message it sees (heartbeat, status, FD skip,
/// unknown, notification). Off by default.
AxiomaticAdapter(
const std::string & ip_address,
const std::string & port,
const std::function<void(std::unique_ptr<const polymath::socketcan::CanFrame> frame)> && receive_callback_function =
[](std::unique_ptr<const polymath::socketcan::CanFrame> /*frame*/) { /*do nothing*/ },
const std::function<void(socket_error_string_t error)> && error_callback_function =
[](socket_error_string_t /*error*/) { /*do nothing*/ },
const std::chrono::milliseconds & receive_timeout_ms = AxiomaticAdapter::DEFAULT_SOCKET_RECEIVE_TIMEOUT_MS);
const std::chrono::milliseconds & receive_timeout_ms = AxiomaticAdapter::DEFAULT_SOCKET_RECEIVE_TIMEOUT_MS,
bool verbose = false);

/// @brief Destructor for AxiomaticAdapter
virtual ~AxiomaticAdapter();
Expand Down
119 changes: 119 additions & 0 deletions axiomatic_adapter/include/axiomatic_adapter/axiomatic_frame_parser.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,119 @@
// Copyright (c) 2025-present Polymath Robotics, Inc. All rights reserved
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#ifndef AXIOMATIC_ADAPTER__AXIOMATIC_FRAME_PARSER_HPP_
#define AXIOMATIC_ADAPTER__AXIOMATIC_FRAME_PARSER_HPP_

#include <array>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <vector>

#include "socketcan_adapter/can_frame.hpp"

namespace polymath
{
namespace can
{

/// @class polymath::can::AxiomaticFrameParser
/// @brief Stateful framer for the Axiomatic CAN-over-TCP byte stream.
///
/// The Axiomatic protocol wraps every transmission in an 11-byte envelope:
/// bytes 0-3 : 'A','X','I','O' tag
/// bytes 4-5 : Protocol ID 0xBA 0x36
/// bytes 6-7 : Message ID (little-endian) — selects payload format
/// byte 8 : Message Version (currently always 0)
/// bytes 9-10 : Message Data Length (little-endian) — bytes that follow
/// Total wire size of a message is therefore 11 + data_length. Only the first
/// six bytes are constant across message types; the parser uses bytes 9-10 as
/// the canonical framing field so heartbeats, status responses, and CAN FD
/// streams can be skipped cleanly without losing alignment.
///
/// A single CAN Stream message may carry multiple CAN frames and Notification
/// frames packed sequentially. The parser decodes all CAN frames into an
/// internal queue and delivers them one per tryParseFrame() call so the
/// existing one-frame-per-call consumer continues to work.
class AxiomaticFrameParser
{
public:
/// 6-byte constant prefix shared by every Axiomatic message type.
static constexpr std::array<uint8_t, 6> SYNC_PREFIX = {'A', 'X', 'I', 'O', 0xBA, 0x36};

/// Message IDs from the Axiomatic protocol spec.
static constexpr uint16_t MSG_ID_CAN_STREAM_DEPRECATED = 1;
static constexpr uint16_t MSG_ID_STATUS_RESPONSE = 3;
static constexpr uint16_t MSG_ID_HEARTBEAT = 4;
static constexpr uint16_t MSG_ID_CAN_FD_STREAM = 5;

/// Sanity bound on the declared Message Data Length; anything larger almost
/// certainly indicates the sync prefix matched garbage. Resync when exceeded.
static constexpr uint16_t MAX_REASONABLE_DATA_LENGTH = 2048;

AxiomaticFrameParser() = default;
~AxiomaticFrameParser() = default;

/// Append raw bytes from a TCP read into the internal buffer.
void append(const uint8_t * data, size_t len);

/// Pop the next CAN frame from the parser. Returns nullopt if more bytes
/// are required to assemble one. Non-CAN messages (heartbeat, status, etc.)
/// are consumed and counted without ever surfacing through this method.
std::optional<polymath::socketcan::CanFrame> tryParseFrame();

/// Discard all buffered bytes and pending frames (e.g. on reconnect).
void reset();

/// When enabled, the parser prints one std::cout line per heartbeat,
/// status response, FD-stream skip, unknown message, and notification frame
/// it encounters. Off by default.
void set_verbose(bool enabled) { verbose_ = enabled; }

// -------- Diagnostics ----------------------------------------------------
size_t buffered_bytes() const { return buffer_.size(); }
uint64_t dropped_bytes() const { return dropped_bytes_; }
uint64_t heartbeats_seen() const { return heartbeats_seen_; }
uint64_t status_responses_seen() const { return status_responses_seen_; }
uint64_t can_fd_messages_skipped() const { return can_fd_messages_skipped_; }
uint64_t unknown_messages_skipped() const { return unknown_messages_skipped_; }
uint64_t notification_frames_skipped() const { return notification_frames_skipped_; }

private:
/// Decode the body of a CAN Stream message: a packed sequence of CAN frames
/// (control byte bit 7 = 0) and Notification frames (bit 7 = 1, fixed 5 B).
/// Pushes every decoded CAN frame onto pending_frames_. Returns true if the
/// body parsed cleanly; false if a malformed control byte or truncated
/// frame was encountered mid-iteration (caller still advances past the
/// outer message using the protocol's data_length).
bool decodeCanStreamBody(const uint8_t * body, size_t body_len);

std::vector<uint8_t> buffer_;
std::deque<polymath::socketcan::CanFrame> pending_frames_;

bool verbose_{false};

uint64_t dropped_bytes_{0};
uint64_t heartbeats_seen_{0};
uint64_t status_responses_seen_{0};
uint64_t can_fd_messages_skipped_{0};
uint64_t unknown_messages_skipped_{0};
uint64_t notification_frames_skipped_{0};
};

} // namespace can
} // namespace polymath

#endif // AXIOMATIC_ADAPTER__AXIOMATIC_FRAME_PARSER_HPP_
Loading
Loading