mdrideout
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 3 additions & 0 deletions
diff --git a/HARDENING_PLAN.md b/HARDENING_PLAN.md
Lines changed: 33 additions & 20 deletions
diff --git a/docs/opentelemetry.rst b/docs/opentelemetry.rst
Lines changed: 50 additions & 0 deletions
diff --git a/examples/base/src/base/main.py b/examples/base/src/base/main.py
Lines changed: 18 additions & 4 deletions
diff --git a/examples/base/src/base/otel_config.py b/examples/base/src/base/otel_config.py
Lines changed: 7 additions & 2 deletions
diff --git a/examples/base/src/base/sample_workflow/hooks.py b/examples/base/src/base/sample_workflow/hooks.py
Lines changed: 14 additions & 7 deletions
diff --git a/src/junjo/__init__.py b/src/junjo/__init__.py
Lines changed: 4 additions & 0 deletions
diff --git a/src/junjo/hooks.py b/src/junjo/hooks.py
Lines changed: 4 additions & 1 deletion
@@ -63,6 +63,9 @@ execution, state management, and lifecycle observation.
 - Workflow telemetry now records `junjo.workflow.execution_graph_snapshot` to make it explicit that the graph payload is an execution-scoped compiled snapshot containing both runtime and structural identities.
 - Failed workflow, subflow, node, and concurrent spans now set the standard OpenTelemetry `error.type` attribute in addition to Junjo-specific error metadata.
 - `JunjoOtelExporter` now exposes `shutdown()`, and the docs/examples now teach provider shutdown as the normal OpenTelemetry lifecycle while keeping `flush()` as a targeted manual drain tool.
+- Core library execution paths now use the standard Python logging system under the `junjo` logger hierarchy instead of writing directly to stdout, and the library root now installs only a `NullHandler`.
+- `JunjoOtelExporter.flush()` and `shutdown()` now log warning details through `junjo.telemetry` when Junjo-owned export components fail or refuse to flush cleanly.
+- Runtime log records now include run-scoped correlation fields such as `run_id`, and propagated execution failures are logged once at the owning workflow or subflow boundary instead of emitting duplicate stack traces from nested execution layers.
 - `on_state_changed` hook payloads and state-change telemetry context now identify the active executable that performed the mutation, rather than mixing workflow metadata with node or subflow runtime identities.
 - Hook callback failures are now recorded as `junjo.hook_error` events on the surrounding workflow, subflow, node, or concurrent span, and terminal hooks now dispatch before span close so those events stay attached to the real execution span.
 - Lifecycle observation examples and docs now show hook registration as a separate concern from workflow definition.
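
The "logged once at the owning boundary" entry above can be pictured with a small sketch. This is not Junjo's implementation: the function names and the `demo.workflow` logger are hypothetical and only illustrate the general pattern, in which inner layers let the exception propagate untouched and only the outermost owner logs the stack trace before re-raising.

```python
import logging

logger = logging.getLogger("demo.workflow")  # hypothetical logger name


def run_node() -> None:
    # Innermost layer: raise without logging.
    raise ValueError("node failed")


def run_subflow() -> None:
    # Middle layer: let the failure propagate so it is not logged a second time.
    run_node()


def run_workflow(run_id: str) -> None:
    # Owning boundary: the single place that logs the stack trace.
    try:
        run_subflow()
    except Exception:
        # logger.exception logs at ERROR level and attaches the active traceback.
        logger.exception("workflow failed", extra={"run_id": run_id})
        raise
```

A caller still sees the original exception, but a handler attached anywhere above `demo.workflow` receives exactly one error record for it, carrying `run_id` as a record attribute.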
 
@@ -21,17 +21,19 @@ summarized briefly. Open work is described in more detail.
 - Removal of the old subscriber implementation
 - Replacement of `HookManager` with the new `Hooks` + internal lifecycle split
 - Graph validation, compilation, structural IDs, and rendering hardening
+- Standard OpenTelemetry `error.type` alignment on failed spans
+- Provider-owned OpenTelemetry shutdown lifecycle with wrapper-local `shutdown()` / `flush()` semantics
+- Hook callback failure telemetry attached to the surrounding execution span instead of standalone hook-error spans
 - Public docstring, docs, and example alignment for the hardened runtime model
+- Changelog and agent guidance cleanup for the current hardening work
 
 ### Partially Complete
 
 - Regression coverage for major known runtime/store failures
-- Documentation and example truthfulness
-- Changelog and agent guidance cleanup
+- Observability operational safety
 
 ### Still Open
 
-- Production-safe observability controls
 - Release/process discipline improvements
 
 ## Completed Work
@@ -72,17 +74,20 @@ Delivered:
 - Runtime execution, internal lifecycle dispatch, public hooks, and telemetry are separated.
 - Hook failures are isolated and recorded without failing workflow execution.
 - State change hooks now receive detached state snapshots and JSON patch payloads.
+- Hook callback failures are now recorded on the surrounding execution span instead of being modeled as standalone hook-error spans.
+- Terminal hooks now dispatch before span close so hook failure telemetry stays attached to the real workflow, subflow, node, or concurrent span.
 
 ### 4. Docs And Example Truthfulness
 
-Status: largely completed
+Status: completed
 
 Delivered:
 
 - Public runtime docstrings were updated instead of being shortened away.
 - `Workflow` and `Subflow` constructor docs remain on `__init__` for generated docs and hover help.
 - Examples now separate workflow definition from hook/logging wiring.
 - Hook documentation and examples now show real usage rather than placeholder configuration.
+- OpenTelemetry docs now describe provider-owned shutdown and state-model-controlled telemetry serialization truthfully.
 
 ### 5. Graph Hardening
 
@@ -116,36 +121,43 @@ Delivered:
 
 ## Phase B - Observability Operational Safety
 
+Status: partially complete
+
+### Delivered so far
+
+- Failed workflow, subflow, node, and concurrent spans now set standard OpenTelemetry `error.type`.
+- `JunjoOtelExporter` now exposes `shutdown()`, and docs/examples now teach provider shutdown as the normal lifecycle instead of exporter-local flush on exit.
+- Hook callback failures now stay attached to the surrounding execution span rather than creating standalone hook-error spans.
+- State telemetry docs now explain the current control point clearly:
+  state snapshots and JSON patches follow the state model's Pydantic serialization.
+- Core runtime execution paths now emit through package logging instead of direct `print()`.
+- Junjo now uses the standard Python `logging` package under the `junjo` logger hierarchy and installs only a `NullHandler` at the library root.
+- Exporter-local `flush()` and `shutdown()` failures now log through `junjo.telemetry` instead of failing silently.
+- Runtime log records now carry run-scoped correlation fields, and propagated failures now log once at the owning workflow or subflow boundary instead of duplicating nested stack traces.
+
 ### Why this is still open
 
-Telemetry correctness improved, but operational controls are still missing.
-Core runtime paths still use `print()` and there is no real telemetry
-configuration model for redaction, payload size, or capture profiles.
+Telemetry correctness improved, but the library still has no explicit
+library-level telemetry capture policy for redaction, payload size ceilings, or
+different observability profiles.
 
 ### Remaining changes
 
-- Replace direct runtime `print()` calls with package logging.
-- Remove stdout emission from shipped library execution paths, currently including:
-  - workflow start / progress / completion / failure messages in ``src/junjo/workflow.py``
-  - node failure prints in ``src/junjo/node.py``
-  - run-concurrent start / completion / failure prints in ``src/junjo/run_concurrent.py``
-- Define package logger names, log levels, and expectations for library consumers.
-- Keep example-app logging and demo ``print()`` usage out of the core library runtime.
-- Introduce explicit telemetry configuration:
+- Decide whether Junjo needs explicit library-level telemetry capture configuration beyond the current state-model serialization controls and current graph snapshot defaults.
+- If explicit capture controls are added, design them deliberately:
   - state capture policy
   - graph capture policy
   - patch capture policy
   - redaction/masking support
   - size ceilings
   - AI Studio vs generic OTLP profiles
-- Add explicit exporter lifecycle behavior such as shutdown/flush expectations.
-- Consider versioning Junjo-specific telemetry schema fields.
+- Decide later whether point-in-time lifecycle telemetry should stay span/event-first or move toward correlated logs; current instrumentation remains span/event-first intentionally.
+- Consider versioning Junjo-specific telemetry schema fields once the capture model stabilizes.
 
 ### Exit criteria
 
-- Core runtime emits through logging instead of `print()`.
-- Telemetry payload controls are configurable and documented.
-- Exporter lifecycle and failure behavior are explicit.
+- The repo has a clear documented stance on telemetry payload controls, whether that remains state-model serialization only or expands into explicit capture configuration.
+- Exporter lifecycle and failure behavior remain explicit and accurate in docs/examples.
 
 ## Phase C - Quality Gates And Release Discipline
 
@@ -195,4 +207,5 @@ The highest-risk runtime correctness work is already done. The remaining work is
 primarily about:
 
 - making observability production-safe
+- deciding how much explicit telemetry capture policy Junjo should own as a library
 - making release quality enforceable by process instead of memory
@@ -70,6 +70,56 @@ It also exposes:
 Use ``flush()`` for targeted cases such as tests or short-lived scripts. Use
 provider shutdown for the normal application lifecycle.
 
+Library Logging
+===============
+
+Junjo emits library logs under the ``junjo`` logger hierarchy. Applications own
+handlers, formatting, and log levels.
+
+The main library loggers are:
+
+- ``junjo.workflow``
+- ``junjo.node``
+- ``junjo.run_concurrent``
+- ``junjo.telemetry``
+
+Junjo does not install real log handlers of its own. If you want to see Junjo
+execution diagnostics, configure logging in your application and opt in to the
+``junjo`` logger namespace.
+
+.. code-block:: python
+
+    import logging
+
+    logging.basicConfig(
+        level=logging.INFO,
+        format="%(levelname)s %(name)s %(message)s",
+    )
+    logging.getLogger("junjo").setLevel(logging.DEBUG)
+
+With that configuration, Junjo emits debug-level execution progress through the
+standard Python logging system without taking over your application's logging
+setup.
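
Why the configuration above surfaces Junjo debug output while the rest of the application stays at INFO can be checked with the standard library alone. The sketch below simulates the library with a plain `junjo.workflow` logger call (Junjo itself is not imported) and uses a hypothetical `myapp` logger and an in-memory handler in place of `basicConfig`.

```python
import logging

records: list[logging.LogRecord] = []


class ListHandler(logging.Handler):
    """Collects records in memory so the effect of the levels is visible."""

    def emit(self, record: logging.LogRecord) -> None:
        records.append(record)


root = logging.getLogger()
root.setLevel(logging.INFO)      # application default, as with basicConfig(level=INFO)
root.addHandler(ListHandler())   # handler level stays NOTSET, so it accepts what it is handed

# Opt in to debug output for the library namespace only.
logging.getLogger("junjo").setLevel(logging.DEBUG)

# Simulated library record: kept, because "junjo.workflow" inherits DEBUG from "junjo".
logging.getLogger("junjo.workflow").debug("library debug")
# Application records: "myapp" inherits INFO from the root logger.
logging.getLogger("myapp").debug("app debug")  # dropped
logging.getLogger("myapp").info("app info")    # kept

captured = [(r.name, r.getMessage()) for r in records]
```

The level check happens on the logger that creates the record. Once a record is created, it propagates to ancestor handlers without the ancestors' logger levels being consulted again, which is why the root logger's INFO level does not filter the `junjo.workflow` debug record.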
+
+Runtime log records include run-scoped correlation fields through standard
+logging ``extra`` attributes when that execution context exists:
+
+- ``run_id``
+- ``executable_definition_id``
+- ``executable_runtime_id``
+- ``span_type``
+
+Applications using structured logging handlers can capture those fields
+directly from the log record without parsing log message text.
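
As a minimal sketch of what "capture those fields directly" could look like, the formatter below copies the four documented field names from the record into a JSON line when they are present. The formatter class, the `example.structured` logger, and the simulated log call are assumptions for illustration; only the field names come from the text above.

```python
import io
import json
import logging

# Field names taken from the documentation above; everything else is illustrative.
CORRELATION_FIELDS = (
    "run_id",
    "executable_definition_id",
    "executable_runtime_id",
    "span_type",
)


class JsonFormatter(logging.Formatter):
    """Renders a record as one JSON object, including correlation fields if set."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for field in CORRELATION_FIELDS:
            # Values passed via `extra` become plain attributes on the record.
            value = getattr(record, field, None)
            if value is not None:
                payload[field] = value
        return json.dumps(payload)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

demo_logger = logging.getLogger("example.structured")  # hypothetical logger name
demo_logger.addHandler(handler)
demo_logger.setLevel(logging.DEBUG)
demo_logger.propagate = False

# Simulated record standing in for what a library would emit with `extra`.
demo_logger.info("workflow started", extra={"run_id": "run-123", "span_type": "workflow"})

line = json.loads(stream.getvalue())
```

In an application, the same handler would presumably be attached to the root logger or to `logging.getLogger("junjo")` rather than to a demo logger.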
+
+Execution failures are logged at the owning workflow or subflow boundary so one
+propagated failure produces one library-owned error log instead of multiple
+stack traces from each nested execution layer.
+
+Exporter-local warning logs under ``junjo.telemetry`` also include the OTLP
+``endpoint`` on the log record so operational failures can be tied back to the
+destination that failed.
+
 Choosing an OpenTelemetry Exporter
 ===================================
 
 
@@ -1,31 +1,45 @@
+import logging
+
 from dotenv import load_dotenv
 
 from base.otel_config import init_otel
 from base.sample_workflow.hooks import create_logging_hooks
 from base.sample_workflow.workflow import create_sample_workflow
 
+logger = logging.getLogger(__name__)
+
+
+def configure_logging() -> None:
+    """Configure application logging for the base example."""
+    logging.basicConfig(
+        level=logging.INFO,
+        format="%(levelname)s %(name)s %(message)s",
+    )
+    logging.getLogger("junjo").setLevel(logging.DEBUG)
+
 
 async def main():
     """The main entry point for the application."""
 
     # Load the environment variables
     load_dotenv()
+    configure_logging()
 
     # Setup OpenTelemetry before anything else happens
     telemetry_providers = init_otel(service_name="Junjo Base Example")
 
     try:
         workflow = create_sample_workflow(hooks=create_logging_hooks())
 
-        print("Executing the workflow...")
+        logger.info("Executing the workflow...")
         result = await workflow.execute()
-        print("Final state: ", result.state.model_dump_json())
+        logger.info("Final state: %s", result.state.model_dump_json())
 
-        print("Done executing the base example workflow.")
+        logger.info("Done executing the base example workflow.")
     finally:
         if telemetry_providers is not None:
             tracer_provider, meter_provider = telemetry_providers
-            print("Shutting down OpenTelemetry providers...")
+            logger.info("Shutting down OpenTelemetry providers...")
             tracer_provider.shutdown()
             meter_provider.shutdown()
 
 
@@ -1,3 +1,4 @@
+import logging
 import os
 
 from junjo.telemetry.junjo_otel_exporter import JunjoOtelExporter
@@ -7,6 +8,8 @@
 from opentelemetry.sdk.resources import Resource
 from opentelemetry.sdk.trace import TracerProvider
 
+logger = logging.getLogger(__name__)
+
 
 def init_otel(
     service_name: str,
@@ -16,8 +19,10 @@ def init_otel(
     # Load the JUNJO_AI_STUDIO_API_KEY from the environment variable
     JUNJO_AI_STUDIO_API_KEY = os.getenv("JUNJO_AI_STUDIO_API_KEY")
     if JUNJO_AI_STUDIO_API_KEY is None:
-        print("JUNJO_AI_STUDIO_API_KEY environment variable is not set. "
-                         "Generate a new API key in the Junjo AI Studio UI.")
+        logger.warning(
+            "JUNJO_AI_STUDIO_API_KEY environment variable is not set. "
+            "Generate a new API key in the Junjo AI Studio UI."
+        )
         return None
 
     # Configure OpenTelemetry for this application
 
@@ -1,5 +1,7 @@
 from __future__ import annotations
 
+import logging
+
 from junjo import Hooks
 from junjo.hooks import (
     LifecycleEvent,
@@ -11,6 +13,8 @@
 from base.sample_workflow.sample_subflow.store import SampleSubflowState
 from base.sample_workflow.store import SampleWorkflowState
 
+logger = logging.getLogger(__name__)
+
 
 def _base_event_details(event: LifecycleEvent) -> dict:
     details = {
@@ -55,15 +59,16 @@ def _format_event_details(event: LifecycleEvent) -> dict:
 
 
 def _log_event(event: LifecycleEvent) -> None:
-    print(f"[hook] {event.hook_name}", _format_event_details(event))
+    logger.info("[hook] %s %s", event.hook_name, _format_event_details(event))
 
 
 def _log_workflow_completed(
     event: WorkflowCompletedEvent[SampleWorkflowState],
 ) -> None:
     state = event.result.state
-    print(
-        f"[hook] {event.hook_name}",
+    logger.info(
+        "[hook] %s %s",
+        event.hook_name,
         {
             **_base_event_details(event),
             "counter": state.counter,
@@ -78,8 +83,9 @@ def _log_subflow_completed(
     event: SubflowCompletedEvent[SampleSubflowState],
 ) -> None:
     state = event.result.state
-    print(
-        f"[hook] {event.hook_name}",
+    logger.info(
+        "[hook] %s %s",
+        event.hook_name,
         {
             **_base_event_details(event),
             "item_count": len(state.items or []),
@@ -91,8 +97,9 @@ def _log_subflow_completed(
 
 def _log_state_changed(event: StateChangedEvent[SampleWorkflowState]) -> None:
     state = event.state
-    print(
-        f"[hook] {event.hook_name}",
+    logger.info(
+        "[hook] %s %s",
+        event.hook_name,
         {
             **_base_event_details(event),
             "action_name": event.action_name,
 
@@ -7,6 +7,8 @@
 This library also produces annotated Opentelemetry Spans to help make sense of
 execution telemetry.
 """
+import logging
+
 from .condition import Condition
 from .edge import Edge
 from .graph import (
@@ -32,6 +34,8 @@
     Workflow,
 )
 
+logging.getLogger("junjo").addHandler(logging.NullHandler())
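
The effect of the `NullHandler` line can be illustrated with a stand-in namespace. In this sketch `mylib` is a hypothetical library name, not Junjo: with a `NullHandler` on the library root, records go nowhere until the application attaches a real handler, and Python's stderr "last resort" handler is not used because a handler already exists in the hierarchy.

```python
import io
import logging

lib_logger = logging.getLogger("mylib")        # hypothetical library namespace
lib_logger.addHandler(logging.NullHandler())   # same pattern as the added line

# With no application handlers configured, this record is discarded by the
# NullHandler instead of falling through to logging's stderr fallback.
logging.getLogger("mylib.worker").warning("emitted before configuration")

# Once the application attaches a real handler, library records flow to it.
stream = io.StringIO()
app_handler = logging.StreamHandler(stream)
app_handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
logging.getLogger().addHandler(app_handler)

logging.getLogger("mylib.worker").warning("now visible")
output = stream.getvalue()
```

Only the second warning reaches the application's stream, so the library stays quiet by default while handlers, formatting, and levels remain the application's choice.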
+
 __all__ = [
     "Condition",
     "Graph",
 
@@ -220,10 +220,13 @@ class Hooks:
 
     .. code-block:: python
 
+        import logging
+
         hooks = Hooks()
+        logger = logging.getLogger(__name__)
 
         def log_completed(event: WorkflowCompletedEvent[MyState]) -> None:
-            print(event.hook_name, event.result.state.model_dump())
+            logger.info("%s %s", event.hook_name, event.result.state.model_dump())
 
         hooks.on_workflow_completed(log_completed)
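
One practical difference between the old f-string `print()` calls and the `%s`-style logger calls in this change is when the message gets rendered. The sketch below uses a hypothetical `Expensive` class and `demo.hooks` logger to show that logging skips string rendering entirely when the level is disabled.

```python
import logging


class Expensive:
    """Counts how often it is rendered to a string."""

    def __init__(self) -> None:
        self.renders = 0

    def __str__(self) -> str:
        self.renders += 1
        return "expensive payload"


logger = logging.getLogger("demo.hooks")  # hypothetical logger name
logger.setLevel(logging.WARNING)          # INFO is disabled for this logger

eager = Expensive()
logger.info(f"{eager}")   # the f-string renders before logging can check the level

lazy = Expensive()
logger.info("%s", lazy)   # the argument is only rendered if the record is emitted
```

Here `eager.renders` ends at 1 and `lazy.renders` stays at 0. Only the string formatting is deferred: an argument expression such as `event.result.state.model_dump()` is still evaluated at the call site, so truly expensive arguments would need an explicit `logger.isEnabledFor(...)` guard.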