A complete specification for adding structured logging and developer tracing to every module
in the CLI package (src/agentweave/) and the Hub server (hub/hub/).
AgentWeave uses two separate logging channels that serve different purposes:
| Channel | API | Destination (offline) | Destination (Hub connected) | Who reads it |
|---|---|---|---|---|
| Structured events | log_event() from eventlog.py | .agentweave/logs/events.jsonl | Local file + Hub /api/v1/logs | agentweave log, Hub UI, human operators |
| Developer tracing | Python logging module | stderr or AW_LOG_FILE | stderr or AW_LOG_FILE only, never Hub | Developers debugging a running process |
These are not interchangeable. Use each channel for its intended purpose:
- log_event() is for every observable business event: message sent, task created, lock
  timeout, transport error, watchdog spawn, agent exit. These events are the audit trail.
  They are always written locally and forwarded to the Hub when the HTTP transport is active.
- logger.* is for fine-grained developer tracing: every function entry, every file read,
  every poll iteration. These never leave the local process. They are off by default
  (AW_LOG_LEVEL=WARNING) and surfaced only when a developer is investigating a problem.
Always appends a JSON line to .agentweave/logs/events.jsonl (swallows exceptions — never
crashes the caller).
If severity ∈ {INFO, WARN, ERROR} and get_transport() returns an HTTP transport,
also calls transport.push_log(event, agent, data, severity) to POST to Hub /api/v1/logs.
DEBUG events are local-only — they are never pushed to the Hub (too noisy).
```text
log_event("x", WARN)
  │   ┌─────────────────────────────────┐
  ├── │ .agentweave/logs/events.jsonl   │  ← always (offline + online)
  │   └─────────────────────────────────┘
  │
  └── severity ≥ INFO and transport == http
      ┌─────────────────────────────────┐
      │ Hub POST /api/v1/logs           │  ← only when HTTP transport
      └─────────────────────────────────┘
```
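The routing rules above can be sketched in a few lines. This is a minimal illustration, not the actual eventlog.py code: `log_event_sketch`, `FakeHttpTransport`, the lowercase severity strings, and the `agent` field are assumptions made for the example.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Any, Optional

FORWARDED = {"info", "warn", "error"}  # DEBUG is never forwarded


class FakeHttpTransport:
    """Hypothetical stand-in for the HTTP transport; records pushed events."""

    def __init__(self) -> None:
        self.pushed: list = []

    def get_transport_type(self) -> str:
        return "http"

    def push_log(self, event: str, agent: str, data: dict, severity: str) -> None:
        self.pushed.append((event, agent, severity))


def log_event_sketch(log_file: Path, transport: Optional[Any], event: str,
                     severity: str = "info", **data: Any) -> None:
    """Append one JSON line locally, then forward INFO/WARN/ERROR over HTTP."""
    try:
        entry = {"ts": datetime.now().isoformat(timespec="seconds"),
                 "event": event, "severity": severity, **data}
        log_file.parent.mkdir(parents=True, exist_ok=True)
        with log_file.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")
        if (severity in FORWARDED and transport is not None
                and transport.get_transport_type() == "http"):
            transport.push_log(event, str(data.get("agent", "system")), data, severity)
    except Exception:
        pass  # logging must never crash the caller


hub = FakeHttpTransport()
path = Path(tempfile.mkdtemp()) / "logs" / "events.jsonl"
log_event_sketch(path, hub, "lock_timeout", severity="warn", agent="agent-a")
log_event_sketch(path, hub, "poll_tick", severity="debug")
lines = path.read_text(encoding="utf-8").splitlines()
```

Both events land in the local file, but only the WARN event reaches the fake Hub transport.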
Before adding ~90 new log_event() calls across the codebase, it is worth asking whether
eventlog.py is a solid base to build on — or whether Python's standard logging library
would serve better.
- Cannot silence a specific noisy subsystem (e.g. transport.git) without touching code.
- get_transport() called on every WARN/ERROR: creates a new transport instance per event, which is expensive for GitTransport.
- No exc_info capture: exceptions are stored as str(e), so stack traces are lost.
- Severity mismatch: uses "warn" instead of "warning", out of step with Python's logging level names and with most log aggregators.
- format_event() does not scale: every new event name needs a new if ev == "..." branch to render in agentweave log.
- Two parallel systems: adding logger.debug() everywhere (which this guide requires for tracing) means two separate logging systems running side by side, which is confusing to maintain.
- Reinvented wheel: handler/filter/formatter/rotation infrastructure that Python provides for free has to be rebuilt manually.
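For comparison, the standard library already covers two of these gaps: per-subsystem level control and traceback capture. The sketch below uses only stdlib calls. The `agentweave_demo` logger namespace and the event names are illustrative assumptions, not names taken from the codebase.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))

root = logging.getLogger("agentweave_demo")  # hypothetical logger namespace
root.setLevel(logging.DEBUG)
root.addHandler(handler)
root.propagate = False

# Silence one noisy subsystem without touching its code.
logging.getLogger("agentweave_demo.transport.git").setLevel(logging.WARNING)

logging.getLogger("agentweave_demo.transport.git").debug("poll iteration")  # dropped
logging.getLogger("agentweave_demo.messaging").debug("send_message called")  # kept

try:
    raise TimeoutError("lock held too long")
except TimeoutError:
    # logger.exception() logs at ERROR level and attaches the traceback.
    logging.getLogger("agentweave_demo.locking").exception("lock_timeout")

output = stream.getvalue()
```

The child logger's level acts as the per-subsystem switch, and the traceback is kept alongside the message instead of being flattened to str(e).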
Recommendation: migrate to Python's logging + two custom handlers
Replace log_event() with Python's standard logging module and two lightweight custom
handlers that replicate eventlog.py's behaviour while adding everything it lacks:
Handler 1 — JSONRotatingFileHandler (replaces the file-write block in log_event()):
```python
# src/agentweave/logging_handlers.py
import json
import logging
import logging.handlers
from datetime import datetime


class JSONRotatingFileHandler(logging.handlers.RotatingFileHandler):
    """Writes one JSON object per line to a rotating log file."""

    def emit(self, record: logging.LogRecord) -> None:
        try:
            entry = {
                "ts": datetime.fromtimestamp(record.created).isoformat(timespec="seconds"),
                "event": getattr(record, "event", record.getMessage()),
                "severity": record.levelname.lower().replace("warning", "warn"),
                **getattr(record, "data", {}),
            }
            # RotatingFileHandler manages the file and rotation
            self.stream.write(json.dumps(entry) + "\n")
            self.flush()
            if self.shouldRollover(record):
                self.doRollover()
        except Exception:
            self.handleError(record)
```
Handler 2 — HubHandler (replaces the push_log() side-effect in log_event()):
```python
class HubHandler(logging.Handler):
    """Forwards INFO/WARN/ERROR records to the Hub when HTTP transport is active.

    Never raises: logging failures must never crash the caller.
    push_log() itself must not log; see §4 recursion guard.
    """

    def emit(self, record: logging.LogRecord) -> None:
        if record.levelno < logging.INFO:
            return  # DEBUG stays local
        try:
            from .transport import get_transport

            t = get_transport()
            if t.get_transport_type() == "http":
                agent = getattr(record, "data", {}).get("agent", "system")
                severity = record.levelname.lower().replace("warning", "warn")
                t.push_log(
                    getattr(record, "event", record.getMessage()),
                    str(agent),
                    getattr(record, "data", {}),
                    severity,
                )
        except Exception:
            pass  # Never raise from a log handler
```
Configuration (cli.py and watchdog.py):
```python
import logging
import logging.handlers
import os
from pathlib import Path

# Assumption: both handlers live in logging_handlers.py
# (only JSONRotatingFileHandler's location is stated explicitly).
from .logging_handlers import HubHandler, JSONRotatingFileHandler


def _configure_logging(log_dir: Path) -> None:
    root = logging.getLogger("agentweave")
    root.setLevel(logging.DEBUG)  # handlers control what actually emits

    # Handler 1: JSONL file (structured events + developer tracing)
    jsonl_path = log_dir / "events.jsonl"
    file_handler = JSONRotatingFileHandler(
        jsonl_path, maxBytes=10 * 1024 * 1024, backupCount=5, encoding="utf-8"
    )
    file_handler.setLevel(logging.DEBUG)

    # Handler 2: Hub forwarding (INFO+ only)
    hub_handler = HubHandler()
    hub_handler.setLevel(logging.INFO)

    # Handler 3: stderr for developer tracing (respects AW_LOG_LEVEL)
    level_name = os.environ.get("AW_LOG_LEVEL", "WARNING").upper()
    stderr_handler = logging.StreamHandler()
    stderr_handler.setLevel(getattr(logging, level_name, logging.WARNING))

    root.addHandler(file_handler)
    root.addHandler(hub_handler)
    root.addHandler(stderr_handler)
```
Call sites — callers use one unified API instead of two:
```python
# Before (two separate calls for different purposes):
log_event("msg_sent", severity=INFO, msg_id=mid, **{"from": sender}, to=recipient)
logger.debug("[Messaging] send_message %s → %s", sender, recipient)

# After (one call, handler routing is automatic):
logger.info(
    "msg_sent",
    extra={"event": "msg_sent", "data": {"msg_id": mid, "from": sender, "to": recipient}},
)
logger.debug("[Messaging] send_message %s → %s", sender, recipient)
# ↑ same logger, DEBUG goes to file only, INFO goes to file + Hub
```
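To make the level-based routing concrete, here is a small self-contained sketch that swaps the real sinks for in-memory handlers. `ListHandler` and the logger name `agentweave_routing_demo` are hypothetical. The only assumption carried over from the proposal is that structured fields travel in `extra={"event": ..., "data": ...}`.

```python
import logging


class ListHandler(logging.Handler):
    """Collects (level, event, data) tuples in memory; stands in for a real sink."""

    def __init__(self, level: int) -> None:
        super().__init__(level)
        self.records: list = []

    def emit(self, record: logging.LogRecord) -> None:
        event = getattr(record, "event", record.getMessage())
        self.records.append((record.levelname, event, getattr(record, "data", {})))


logger = logging.getLogger("agentweave_routing_demo")
logger.setLevel(logging.DEBUG)
logger.propagate = False

file_sink = ListHandler(logging.DEBUG)  # plays the role of the JSONL file handler
hub_sink = ListHandler(logging.INFO)    # plays the role of the Hub handler
logger.addHandler(file_sink)
logger.addHandler(hub_sink)

logger.info("msg_sent", extra={"event": "msg_sent", "data": {"msg_id": "m1", "to": "agent-b"}})
logger.debug("[Messaging] send_message %s -> %s", "agent-a", "agent-b")
```

The call site never decides where a record goes. Each handler's level threshold does that, so the DEBUG trace reaches only the file-like sink.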
The rest of this document is written against current state (log_event() + logger.*)
because that is what exists today. When the migration above is done:
log_event() calls t.push_log(...) with a # type: ignore[attr-defined] annotation
because push_log() is only implemented on HttpTransport. LocalTransport and
GitTransport do not have it. This is a latent AttributeError if the transport check
ever silently fails.
Fix: Add push_log() to BaseTransport as a no-op default, exactly like push_session()
was added:
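```python
# src/agentweave/transport/base.py
from typing import Any, Dict, Optional  # needed for the annotations below

# Add as a method of BaseTransport:
def push_log(
    self,
    event_type: str,
    agent: str,
    data: Optional[Dict[str, Any]],
    severity: str,
) -> None:
    """Push a log event to the backend (no-op on non-HTTP transports)."""
    return
```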
Then remove the # type: ignore from eventlog.py and call t.push_log() unconditionally
(non-HTTP transports simply return immediately).
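A minimal sketch of why the no-op default removes the latent AttributeError. The classes below are simplified stand-ins that reuse the real class names, and the `sent` list is a hypothetical replacement for the actual POST to /api/v1/logs.

```python
from typing import Any, Dict, List, Optional


class BaseTransport:
    """Simplified stand-in for the real base class (hypothetical shape)."""

    def push_log(self, event_type: str, agent: str,
                 data: Optional[Dict[str, Any]], severity: str) -> None:
        """No-op default: non-HTTP transports have nowhere to push logs."""
        return


class LocalTransport(BaseTransport):
    pass  # inherits the no-op


class HttpTransport(BaseTransport):
    def __init__(self) -> None:
        self.sent: List[Dict[str, Any]] = []  # stands in for POST /api/v1/logs

    def push_log(self, event_type: str, agent: str,
                 data: Optional[Dict[str, Any]], severity: str) -> None:
        self.sent.append({"event": event_type, "agent": agent,
                          "data": data or {}, "severity": severity})


def forward(transport: BaseTransport) -> None:
    # No transport-type check and no "type: ignore" needed at the call site.
    transport.push_log("task_created", "agent-a", {"task_id": "t1"}, "info")


local, http = LocalTransport(), HttpTransport()
forward(local)  # silently does nothing
forward(http)   # records one event
```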
Gap 2 — Python logging is not configured anywhere
The CLI package and watchdog subprocess never call logging.basicConfig(). As a result,
logger.debug(...) calls throughout the codebase are silently discarded with no handler
warning.
Fix: Add _configure_logging() to cli.py and watchdog.py (see §4.2 below).
Gap 3 — eventlog.py has no formatter for most new event types
format_event() in eventlog.py formats known event types (msg_sent, task_status, etc.)
and falls back to a generic repr for everything else. As new event names are added, add a
matching if ev == "new_event_name": branch to format_event() so that agentweave log
renders them cleanly.
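The branch-plus-fallback shape described above might look roughly like this. It is an illustrative sketch rather than the real format_event(): the function name `format_event_sketch`, the `lock_timeout` branch, and its `path` field are assumptions.

```python
from typing import Any, Dict


def format_event_sketch(entry: Dict[str, Any]) -> str:
    """Render one events.jsonl entry as a single human-readable line."""
    ev = entry.get("event", "?")
    ts = entry.get("ts", "")
    if ev == "msg_sent":
        return f"{ts} msg_sent {entry.get('from')} -> {entry.get('to')}"
    if ev == "lock_timeout":  # example of a newly added branch
        return f"{ts} lock_timeout on {entry.get('path')}"
    # Generic fallback for event names without a dedicated branch.
    rest = {k: v for k, v in entry.items() if k not in ("event", "ts", "severity")}
    return f"{ts} {ev} {rest!r}"


known = format_event_sketch({"ts": "T", "event": "msg_sent", "from": "a", "to": "b"})
unknown = format_event_sketch({"ts": "T", "event": "agent_exit", "code": 1})
```

Events without a branch still render, just less readably, which is why each new event name should get its own branch.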
Failures that prevent an operation from completing
CRITICAL does not exist in eventlog.py. Use ERROR for the most severe events.
Python's logging.CRITICAL is available for Hub startup failures (see §6).
The Hub server receives log_event() payloads from the CLI via POST /api/v1/logs.
It does not call log_event() itself. For its own internal tracing it uses Python's
logging module, integrated with uvicorn:
Critical rule: push_log() must never call log_event() or logger.*.
Doing so would create an infinite loop: log_event → push_log → log_event → ….
Silently swallow all exceptions inside push_log() with a blanket except Exception: pass.
The same applies to anything called from inside push_log().
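A sketch of a push_log() that respects this rule. `HttpTransportSketch`, the injected `post` callable, and the payload field names are hypothetical. The point is only that the except branch stays silent.

```python
from typing import Any, Callable, Dict, List, Optional


class HttpTransportSketch:
    """Hypothetical transport; `post` stands in for the real HTTP client call."""

    def __init__(self, post: Callable[[str, Dict[str, Any]], None]) -> None:
        self._post = post

    def push_log(self, event_type: str, agent: str,
                 data: Optional[Dict[str, Any]], severity: str) -> None:
        try:
            self._post("/api/v1/logs", {"event_type": event_type, "agent": agent,
                                        "data": data or {}, "severity": severity})
        except Exception:
            # Deliberately no log_event() and no logger.* here: reporting the
            # failure through the logging path would re-enter push_log().
            pass


def failing_post(path: str, payload: Dict[str, Any]) -> None:
    raise ConnectionError("hub unreachable")


delivered: List[Dict[str, Any]] = []


def working_post(path: str, payload: Dict[str, Any]) -> None:
    delivered.append(payload)


result = HttpTransportSketch(failing_post).push_log("transport_error", "agent-a", None, "error")
HttpTransportSketch(working_post).push_log("msg_sent", "agent-a", {"msg_id": "m1"}, "info")
```

A failed push is simply dropped. The event has already been written to the local events.jsonl, so nothing is lost from the audit trail.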
Other methods may use log_event() normally (they already do — transport_error is emitted
from push_session()). Extend this pattern:
The Hub is the backend. It receives log_event() payloads from CLI via POST /api/v1/logs
and stores them in the EventLog table. For its own internal observability it uses Python's
logging module only — no log_event() calls.
For all REST endpoints, the pattern is:
- DEBUG for query parameters and result counts (too noisy for INFO)
- INFO for state-changing operations: message created, task updated, question answered
- WARNING for 404s and invalid input
- ERROR for unexpected DB or SSE failures
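Applied to a single state-changing handler, the level pattern above could look like the sketch below. It is framework-free on purpose: `update_task`, the `hub.api.tasks` logger name, and the in-memory `_TASKS` store with its `_save` helper are stand-ins for a real endpoint and database layer.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger("hub.api.tasks")  # hypothetical logger name

_TASKS: Dict[str, dict] = {"t1": {"status": "open"}}  # stands in for the DB


def _save(task_id: str, task: dict) -> None:
    """Stand-in for a DB commit that could raise in a real backend."""
    _TASKS[task_id] = task


def update_task(task_id: str, status: str) -> Optional[dict]:
    # DEBUG: request parameters (too noisy for INFO)
    logger.debug("update_task params: task_id=%s status=%s", task_id, status)
    task = _TASKS.get(task_id)
    if task is None:
        # WARNING: client-side problem, maps to a 404 response
        logger.warning("update_task: task %s not found (404)", task_id)
        return None
    try:
        _save(task_id, {**task, "status": status})
    except Exception:
        # ERROR: unexpected backend failure, logged with traceback
        logger.exception("update_task: unexpected failure for %s", task_id)
        raise
    # INFO: a state-changing operation completed
    logger.info("task updated: %s -> %s", task_id, status)
    return _TASKS[task_id]
```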
- Initial draft: incorrectly used Python logging as the primary mechanism throughout.
- v2: Corrected to use log_event() as primary channel; identified push_log() gap in BaseTransport; fixed severity constant names (WARN not WARNING); separated Hub server from CLI logging; listed existing event names.
- v3: Added §2: honest assessment of eventlog.py weaknesses, recommendation to migrate to Python logging + JSONRotatingFileHandler + HubHandler, migration plan table, before/after call-site examples; expanded existing event name table with all watchdog events.
Hub rows show 0 for log_event() because the Hub stores those events, it does not
emit them. CLI log_event() calls reach the Hub automatically via push_log() when
the HTTP transport is active.
After the §2 migration: all 89 log_event() calls become logger.* calls with structured
extra= data. The total count stays the same; the column split disappears.