Minga.SystemObserver (Minga v0.1.0)


Collects BEAM process metrics and serves multiple visualization features from a single data source.

Three tiers of data collection, each with different cost profiles:

  1. Always-on (trivially cheap): Monitors named supervisors via Process.monitor/1. Tracks restart events and recovery times. Cost is one handle_info({:DOWN, ...}) per supervisor crash, which is rare. This powers Resilience-as-UX (#1109).

  2. On-demand polling (activated when subscribers exist): Walks the supervision tree via Supervisor.which_children/1, calls Process.info/2 for each process, and stores snapshots in a circular buffer (last 300 samples = 5 minutes at 1Hz). Activated when a UI panel subscribes, deactivated when all subscribers disconnect. This powers BEAM Observatory (#1081) and Living Architecture (#1098).

  3. Domain state queries (no collection): Downstream features read existing APIs (Agent.Session, etc.) directly. SystemObserver doesn't collect this data; it's listed here for completeness.
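The on-demand tier can be pictured with a small sketch. This is an illustration under stated assumptions, not the module's actual implementation: `TreeWalkSketch` is a hypothetical name, the set of `Process.info/2` keys is an assumption, and the monotonic millisecond timestamp is a guess (the documented type only says `integer()`).

```elixir
defmodule TreeWalkSketch do
  # Hypothetical helper, not part of Minga.
  # The keys collected here are an assumption; the real ProcessSnapshot
  # fields are not listed on this page.
  @info_keys [:registered_name, :memory, :message_queue_len, :reductions]

  # Returns the pids of all running descendants of `supervisor`.
  def walk(supervisor) do
    supervisor
    |> Supervisor.which_children()
    |> Enum.flat_map(fn
      {_id, pid, :supervisor, _modules} when is_pid(pid) -> [pid | walk(pid)]
      {_id, pid, :worker, _modules} when is_pid(pid) -> [pid]
      # Children that are :undefined (not running) or :restarting are skipped.
      _other -> []
    end)
  end

  # Builds a map shaped like process_tree_snapshot().
  def snapshot(supervisor) do
    processes =
      supervisor
      |> walk()
      |> Enum.flat_map(fn pid ->
        case Process.info(pid, @info_keys) do
          # The process exited between the walk and the info call.
          nil -> []
          info -> [{pid, Map.new(info)}]
        end
      end)
      |> Map.new()

    %{timestamp: System.monotonic_time(:millisecond), processes: processes}
  end
end
```

A walk like this costs one `which_children` call per supervisor and one `Process.info/2` call per process, which is presumably why it runs only while subscribers exist.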

Supervision placement

SystemObserver is the last child of the top-level Minga.Supervisor, so it starts after Foundation, Services, and Runtime are all up and has full visibility into the process tree. Under the rest_for_one strategy, a SystemObserver crash restarts only SystemObserver itself, because no sibling starts after it. A Foundation, Services, or Runtime crash also restarts SystemObserver, which is the desired behavior: the restart re-establishes its monitors.
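The ordering argument can be tried out in isolation. The sketch below is not Minga's supervisor: `SupervisionSketch` and the child ids are hypothetical, and plain `Agent` processes stand in for Foundation, Services, Runtime, and SystemObserver.

```elixir
defmodule SupervisionSketch do
  # Hypothetical stand-in for a top-level supervisor; not Minga.Supervisor.
  use Supervisor

  def start_link(opts \\ []), do: Supervisor.start_link(__MODULE__, :ok, opts)

  @impl true
  def init(:ok) do
    children = [
      # Stand-ins for Foundation, Services, and Runtime.
      Supervisor.child_spec({Agent, fn -> :foundation end}, id: :foundation),
      Supervisor.child_spec({Agent, fn -> :services end}, id: :services),
      Supervisor.child_spec({Agent, fn -> :runtime end}, id: :runtime),
      # Stand-in for the observer: listed last, so no sibling starts after it.
      Supervisor.child_spec({Agent, fn -> :observer end}, id: :observer)
    ]

    # rest_for_one: when a child crashes, it and every child started
    # after it are restarted; children started before it are untouched.
    Supervisor.init(children, strategy: :rest_for_one)
  end
end
```

Killing the `:foundation` stand-in gives every later child, including `:observer`, a new pid. Killing `:observer` leaves the other three untouched.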

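Summary

Types

process_tree_snapshot()

A snapshot of process metrics for the entire supervision tree.

t()

Internal state for the SystemObserver GenServer.

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

classify_process(registered_name, child_type)

Classifies a process by registered name and child type.

classify_process(pid, registered_name, child_type)

Classifies a process for Observatory rendering.

classify_process(pid, registered_name, child_type, child_modules)

Classifies a process for Observatory rendering using supervisor child modules when available.

restart_history(server \\ __MODULE__)

Returns the restart history as a list, most recent first.

samples(server \\ __MODULE__)

Returns all collected process tree samples as a list, oldest first.

snapshot(server \\ __MODULE__)

Returns the latest process tree snapshot, or nil if no samples have been collected yet.

start_link(opts \\ [])

Starts the SystemObserver GenServer.

subscribe(server \\ __MODULE__)

Subscribes the calling process to process tree snapshots.

unsubscribe(server \\ __MODULE__)

Unsubscribes the calling process from process tree snapshots.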
Types

child_modules()

@type child_modules() :: [module()] | :dynamic

child_type()

process_class()

process_tree_snapshot()

@type process_tree_snapshot() :: %{
  timestamp: integer(),
  processes: %{required(pid()) => Minga.SystemObserver.ProcessSnapshot.t()}
}

A snapshot of process metrics for the entire supervision tree.

t()

@type t() :: %{
  monitors: %{required(reference()) => atom()},
  restart_history: [Minga.SystemObserver.RestartRecord.t()],
  subscribers: MapSet.t(pid()),
  subscriber_monitors: %{required(reference()) => pid()},
  samples: :queue.queue(process_tree_snapshot()),
  sample_count: non_neg_integer(),
  poll_timer: reference() | nil
}

Internal state for the SystemObserver GenServer.

Functions

child_spec(init_arg)

Returns a specification to start this module under a supervisor.

See Supervisor.

classify_process(registered_name, child_type)

@spec classify_process(atom() | nil, child_type()) :: process_class()

Classifies a process by registered name and child type.

classify_process(pid, registered_name, child_type)

@spec classify_process(pid(), atom() | nil, child_type()) :: process_class()

Classifies a process for Observatory rendering.

classify_process(pid, registered_name, child_type, child_modules)

@spec classify_process(pid(), atom() | nil, child_type(), child_modules()) ::
  process_class()

Classifies a process for Observatory rendering using supervisor child modules when available.

restart_history(server \\ __MODULE__)

@spec restart_history(GenServer.server()) :: [Minga.SystemObserver.RestartRecord.t()]

Returns the restart history as a list, most recent first.

The last 50 restart events are retained. This is always available (always-on tier), regardless of subscriber count.
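One simple way to keep a capped, newest-first history is to prepend and trim. This is a sketch only: `RestartHistorySketch` is a hypothetical module, and `record` is treated as an opaque term because the RestartRecord fields are not shown on this page.

```elixir
defmodule RestartHistorySketch do
  # Hypothetical helper, not part of Minga.
  @max_history 50

  # Prepends `record` so the newest event comes first, then trims the
  # list to the cap so the oldest events fall off the end.
  def record_restart(history, record) do
    Enum.take([record | history], @max_history)
  end
end
```

With this approach a 51st event silently evicts the oldest one, matching the "last 50" retention described above.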

samples(server \\ __MODULE__)

@spec samples(GenServer.server()) :: [process_tree_snapshot()]

Returns all collected process tree samples as a list, oldest first.

The maximum number of samples retained is 300 (5 minutes at 1Hz). Returns an empty list if polling has not been activated.
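The state type pairs a `:queue` with a `sample_count`, which suggests a bounded FIFO. The following sketch shows one way such a buffer could behave. `SampleBufferSketch` is a hypothetical name, and the eviction logic is an assumption rather than the module's code.

```elixir
defmodule SampleBufferSketch do
  # Hypothetical helper, not part of Minga.
  @max_samples 300

  # The buffer is a {queue, count} pair; tracking the count separately
  # avoids an O(n) :queue.len/1 call on every push.
  def new, do: {:queue.new(), 0}

  def push({queue, count}, sample) when count >= @max_samples do
    # Buffer is full: drop the oldest sample from the front, then append.
    {_dropped, queue} = :queue.out(queue)
    {:queue.in(sample, queue), count}
  end

  def push({queue, count}, sample) do
    {:queue.in(sample, queue), count + 1}
  end

  # Oldest first, the same order samples/1 is documented to return.
  def to_list({queue, _count}), do: :queue.to_list(queue)
end
```

At 1Hz, 300 entries cover the most recent 5 minutes. Older samples are discarded one at a time as new ones arrive.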

snapshot(server \\ __MODULE__)

@spec snapshot(GenServer.server()) :: process_tree_snapshot() | nil

Returns the latest process tree snapshot, or nil if no samples have been collected yet.

This is a one-shot query. For continuous monitoring, subscribe and read samples/0 periodically.
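Callers need to handle the nil case before touching the snapshot. A minimal sketch follows. `SnapshotSummarySketch` is a hypothetical helper that relies only on the `timestamp` and `processes` keys from the documented type.

```elixir
defmodule SnapshotSummarySketch do
  # Hypothetical helper, not part of Minga.

  # No samples yet, for example because nothing has subscribed.
  def summarize(nil), do: :no_samples

  # Relies only on the two keys documented in process_tree_snapshot().
  def summarize(%{timestamp: ts, processes: processes}) do
    {:ok, %{timestamp: ts, process_count: map_size(processes)}}
  end
end

# With a running observer, usage might look like this (not executed here):
#   Minga.SystemObserver.snapshot() |> SnapshotSummarySketch.summarize()
```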

start_link(opts \\ [])

@spec start_link(keyword()) :: GenServer.on_start()

Starts the SystemObserver GenServer.

subscribe(server \\ __MODULE__)

@spec subscribe(GenServer.server()) :: :ok

Subscribes the calling process to process tree snapshots.

While at least one subscriber exists, SystemObserver polls the process tree at 1Hz and stores snapshots. The subscriber receives no messages from SystemObserver directly; use snapshot/0 or samples/0 to read the collected data.

The subscriber is automatically unsubscribed when it exits.
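Automatic unsubscription of this kind is usually built on process monitors. The sketch below shows the bookkeeping using the `subscribers` and `subscriber_monitors` fields from `t()`. The module and function names are hypothetical, and the logic is an assumption about how it might work, not the actual implementation.

```elixir
defmodule SubscriberSketch do
  # Hypothetical helper, not part of Minga.
  def new, do: %{subscribers: MapSet.new(), subscriber_monitors: %{}}

  # Monitors `pid` so a :DOWN message arrives when it exits.
  # Subscribing twice is a no-op, so only one monitor exists per pid.
  def subscribe(state, pid) do
    if MapSet.member?(state.subscribers, pid) do
      state
    else
      ref = Process.monitor(pid)

      %{
        state
        | subscribers: MapSet.put(state.subscribers, pid),
          subscriber_monitors: Map.put(state.subscriber_monitors, ref, pid)
      }
    end
  end

  # Called with the reference from a {:DOWN, ref, :process, pid, reason} message.
  def handle_down(state, ref) do
    case Map.pop(state.subscriber_monitors, ref) do
      # Unknown reference, e.g. a monitor on a supervisor rather than a subscriber.
      {nil, _monitors} ->
        state

      {pid, monitors} ->
        %{
          state
          | subscribers: MapSet.delete(state.subscribers, pid),
            subscriber_monitors: monitors
        }
    end
  end

  # Polling is only needed while at least one subscriber remains.
  def polling?(state), do: MapSet.size(state.subscribers) > 0
end
```

Keying the monitor map by reference makes the :DOWN lookup direct, and the emptiness check on the subscriber set is a natural place to decide whether to keep the poll timer running.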

unsubscribe(server \\ __MODULE__)

@spec unsubscribe(GenServer.server()) :: :ok

Unsubscribes the calling process from process tree snapshots.

If this was the last subscriber, polling stops.