Python API¶
Reference for the two Python entry points. For a guided tour with full workloads, see Python examples. For a 60-second introduction, see the Quickstart.
- `codegreen.Session` — manual span-based measurement, imported and used directly in your code.
- CLI auto-instrumenter — runs `codegreen measure ...` over a script, injects checkpoints automatically.
Both share the same NEMB C++ backend, the same JSON output envelope, and the same libcodegreen-nemb.so ABI (v2+). They can coexist in one process.
Manual API: codegreen.Session¶
For end-to-end examples, see Python examples → Manual measurement with codegreen.Session.
import codegreen

with codegreen.Session("training-run") as s:
    with s.task("data_load"):
        load_data()
    with s.task("train"):
        train_model()
By default, results are written to `codegreen_<pid>.json` in the working directory. CSV is opt-in (pass `output_file="x.csv"` or `output_format="csv"`). Pass `save_to_file=False` to suppress file output.
Three usage forms are supported — context manager, explicit `start_task` / `stop_task`, and the `@codegreen.task` decorator. Full code for each is in Python examples.
Constructor parameters¶
| Param | Default | Notes |
|---|---|---|
| `name` | `"default"` | Session name written to output |
| `output_file` | `codegreen_<pid>.json` | Output path; CSV chosen automatically when the path ends in `.csv` |
| `output_format` | `"auto"` | `"auto"` \| `"json"` \| `"csv"` \| `"none"`; `"auto"` sniffs from the extension, defaults to JSON |
| `save_to_file` | `True` | Set `False` to suppress file writes entirely |
| `warn_on_concurrent` | `True` | Warn at construction if another codegreen process is active on the same host (RAPL is system-wide) |
| `record_time_series` | `False` | Capture sampled (timestamp, power, energy, per-domain) tuples for each task |
| `buffer_samples` | `None` | Power-user override of the C++ ring-buffer size; usually unnecessary because the Python drain is adaptive |
| `sample_interval_ms` | `None` (uses config.json) | Per-session override of the sampler's measurement interval; routes to the existing `coordinator.measurement_interval_ms` field via `nemb_set_measurement_interval_ms` — no parallel state |
| `sampling_mode` | `"fixed"` | `"adaptive"` is reserved for a future runtime-rate-control mode; today only `"fixed"` is implemented |
Output schema¶
Top-level keys: session_name, tasks (list of task dicts), totals (energy_j, duration_s, n_tasks), providers, abi_version. Per-task fields match the TaskResult dataclass.
{
  "session_name": "training-run",
  "tasks": [
    {"name": "data_load", "depth": 0, "parent": null,
     "energy_j": 12.4, "avg_power_w": 4.0, "duration_s": 3.1,
     "started_at": 1714155600.123, "ended_at": 1714155603.234,
     "domains": {"package-0": 10.2, "core": 0.8, "gpu0": 1.4},
     "timeseries": [
       {"t_ns": 20364878312447553, "energy_j": 7.94, "power_w": 37.4,
        "domain_j": {"core": 0.0018, "package-0": 7.92, "gpu0": 0.022},
        "domain_w": {"core": 0.27, "package-0": 31.5, "gpu0": 5.6}}
     ]}
  ],
  "totals": {"energy_j": 857.4, "duration_s": 123.1, "n_tasks": 2},
  "abi_version": 3
}
- `domains` — per-domain RAPL/NVML energy for the task, computed atomically with the session stop (ABI v2 — race-free under concurrent threads).
- `timeseries` — present only when `record_time_series=True` (ABI v3+). Each sample is self-describing:
| Key | Type | Unit | Meaning |
|---|---|---|---|
| `t_ns` | int | nanoseconds | `CLOCK_MONOTONIC` timestamp at sample (Linux); `mach_continuous_time` on macOS; `QueryPerformanceCounter` on Windows — all converted to ns |
| `energy_j` | float | joules | system-wide cumulative energy from session start (sum across all providers) |
| `power_w` | float | watts | system-wide instantaneous power at this sample (sum across all domains) |
| `domain_j` | `Dict[str, float]` | joules | per-domain cumulative energy from session start (e.g. `package-0`, `core`, `dram`, `gpu0`) |
| `domain_w` | `Dict[str, float]` | watts | per-domain average power since the previous sample. Domains whose provider does not expose per-domain power (Darwin IOReport, Windows EMI, AMD RAPL) are absent rather than reported as 0, so callers can distinguish "0 W" from "not measured" |
So to get only GPU watts directly: `[s["domain_w"].get("gpu0", 0.0) for s in ts]`.
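As a concrete sketch of consuming that envelope, the snippet below inlines data shaped like the schema above and pulls out a per-domain power series (the values themselves are illustrative):

```python
# Sketch: extracting a per-domain power series from a parsed result dict.
# The inlined data mirrors the schema documented above; values are made up.
result = {
    "tasks": [
        {"name": "data_load",
         "timeseries": [
             {"t_ns": 100, "power_w": 37.4,
              "domain_w": {"core": 0.27, "package-0": 31.5, "gpu0": 5.6}},
             {"t_ns": 200, "power_w": 35.0,
              "domain_w": {"core": 0.25, "package-0": 30.1}},  # gpu0 absent: not measured
         ]},
    ],
}

ts = result["tasks"][0]["timeseries"]
# Absent domains mean "not measured", so .get() with a default keeps the series aligned.
gpu_watts = [s["domain_w"].get("gpu0", 0.0) for s in ts]
print(gpu_watts)  # [5.6, 0.0]
```

Using `.get()` rather than indexing matters on providers that omit per-domain power, as noted in the `domain_w` row above.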
TaskResult fields¶
| Field | Type | Meaning |
|---|---|---|
| `name` | str | task name passed to `start_task` / `task()` |
| `energy_j` | float | total joules during the task (atomic via `nemb_stop_session_v2`) |
| `avg_power_w` | float | average watts over the task window |
| `duration_s` | float | wall-clock seconds |
| `started_at`, `ended_at` | float | wall-clock epoch seconds |
| `depth`, `parent` | int, `Optional[str]` | nesting info |
| `domains` | `Dict[str, float]` | per-RAPL/NVML-domain energy (J) for the task |
| `timeseries` | `Optional[List[Dict]]` | `[{t_ns, energy_j, power_w, domain_j, domain_w}]` samples; present only when `record_time_series=True` |
| `noise` | `Optional[Dict]` | quality summary for the time-series; populated only when `record_time_series=True` |
See the schema table above for each timeseries sample's keys (t_ns, energy_j, power_w, domain_j, domain_w) — all keys carry their unit suffix.
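For illustration, the record shape can be mirrored as a plain dataclass. This is a sketch of the documented fields, not the library's actual class definition:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class TaskResult:
    """Sketch mirroring the documented per-task fields (illustrative only)."""
    name: str
    energy_j: float
    avg_power_w: float
    duration_s: float
    started_at: float
    ended_at: float
    depth: int = 0
    parent: Optional[str] = None
    domains: Dict[str, float] = field(default_factory=dict)
    timeseries: Optional[List[Dict]] = None  # only with record_time_series=True
    noise: Optional[Dict] = None             # only with record_time_series=True

# Values taken from the JSON example above.
r = TaskResult("data_load", 12.4, 4.0, 3.1, 1714155600.123, 1714155603.234,
               domains={"package-0": 10.2, "core": 0.8, "gpu0": 1.4})
```

A dataclass keeps the optional time-series fields explicit: both default to `None` unless sampling was enabled.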
Noise / quality reporting¶
When record_time_series=True, every task carries a noise dict and totals carry a roll-up:
"noise": {
  "samples_captured": 2847,
  "samples_expected": 3000,
  "drop_ratio": 0.0510,
  "power_mean_w": 102.3,
  "power_std_w": 7.4,
  "power_cv_percent": 7.25,
  "sample_interval_ms": 1,
  "quality": "moderate"
},
"totals": {
  ...,
  "worst_power_cv_percent": 7.25,
  "noise_warnings": []
}
quality is bucketed by power_cv_percent: excellent <2 %, good <5 %, moderate <10 %, high-noise ≥10 %. A RuntimeWarning is emitted (and the task is added to totals.noise_warnings) when CV ≥10 % or drop_ratio ≥20 % so the user is told that the measurement is unreliable instead of silently using a noisy number. Computation runs once at stop() time (purely on already-captured samples) and is independently verified to add no measurement bias of its own (~0.05 % vs record_time_series=False on identical workloads).
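The documented thresholds can be reproduced in a few lines of stdlib Python. This is an illustrative sketch (it assumes the population standard deviation; the library's exact estimator is not specified here):

```python
import statistics

def quality_bucket(power_samples_w):
    """Bucket a power trace by its coefficient of variation, per the documented
    thresholds: <2% excellent, <5% good, <10% moderate, else high-noise."""
    mean = statistics.fmean(power_samples_w)
    std = statistics.pstdev(power_samples_w)  # assumption: population std dev
    cv_percent = 100.0 * std / mean
    if cv_percent < 2:
        bucket = "excellent"
    elif cv_percent < 5:
        bucket = "good"
    elif cv_percent < 10:
        bucket = "moderate"
    else:
        bucket = "high-noise"
    return cv_percent, bucket

# Synthetic trace: mostly 100 W with a 120 W burst -> CV of ~5.9%, "moderate".
cv, bucket = quality_bucket([100.0] * 90 + [120.0] * 10)
```

Running the check on already-captured samples, as the text notes, is why the summary adds no measurement bias of its own.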
Note — slight overhead when record_time_series=True.
The drain thread that pulls samples out of the C++ ring buffer is cheap but not free. On reproducibility benchmarks (3 fresh subprocesses each, identical workload):
- The mean energy/duration is unchanged: `record_time_series=True` vs `=False` agreed to ≤ 0.3 % (within run-to-run jitter).
- The run-to-run spread is slightly wider with sampling on (CV of total energy ~5 % vs ~1 % with it off), because the drain wakes up at irregular intervals and competes briefly with the workload for CPU.
So enabling time-series gives you per-sample power, plot export, and the noise/quality summary, at the cost of a marginally noisier individual total. For the best of both worlds: use it during development to inspect power traces and pick the right code regions, then turn it off for production benchmark runs where you want the tightest possible run-to-run CV.
Power-vs-time plotting¶
record_time_series=True collects samples at the coordinator's configured rate (config.json's coordinator.measurement_interval_ms, default 1 ms on this build). The Session.export_plot(path) helper renders a power-vs-time chart per task; area under the curve equals the task's energy.
with codegreen.Session("training", record_time_series=True) as s:
    with s.task("epoch1"): train_one_epoch()
    with s.task("epoch2"): train_one_epoch()
s.export_plot("training.html")  # Plotly (interactive)
s.export_plot("training.png")   # Matplotlib (static image)
Numerically, integrating w(t) over a task's window with the trapezoidal rule recovers the NEMB-reported energy_j to within ~0.2% (verified on a 5 s task with ~4,800 samples).
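A self-contained sketch of that check, using a synthetic constant-power trace where the integral is exact (timestamps and powers here are made up, not NEMB output):

```python
def trapezoid_energy_j(samples):
    """Integrate instantaneous power over time with the trapezoidal rule.
    samples: list of (t_ns, power_w) tuples, monotonically increasing in time."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        dt_s = (t1 - t0) * 1e-9          # nanoseconds -> seconds
        total += 0.5 * (p0 + p1) * dt_s  # trapezoid area for this interval
    return total

# Synthetic trace: a constant 10 W over 5 s, sampled every 1 ms.
trace = [(i * 1_000_000, 10.0) for i in range(5001)]
energy = trapezoid_energy_j(trace)  # ~50 J (10 W x 5 s)
```

On real traces the same function applied to a task's `(t_ns, power_w)` samples should land close to its reported `energy_j`, per the ~0.2% figure above.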
Time-series correctness for long tasks¶
The C++ sampling ring buffer is fixed-size (default 1000 samples — at the default 1 ms interval that's a ~1 s window; with sample_interval_ms=10 it's a ~10 s window, etc.). To prevent silent loss on long tasks, the Session runs a Python drain thread that pulls samples out faster than the buffer rotates. Drain is adaptive:
- starts at 0.5 s,
- halves to a 50 ms floor when the buffer is >50 % saturated on a single drain pass,
- doubles to a 2 s ceiling when it stays <10 % for three consecutive drains,
- emits a warning at >90 % saturation suggesting a `buffer_samples` override.
Verified on a 30-second task with defaults only: 28,460 samples, full span, zero gaps >50 ms.
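The schedule above can be sketched as a tiny state machine. The thresholds come from the list above; the function itself is illustrative, not the Session's actual drain code:

```python
FLOOR_S, CEILING_S, START_S = 0.050, 2.0, 0.5

def next_drain_interval(interval_s, saturation, quiet_streak):
    """Return (new_interval, new_quiet_streak) after one drain pass.
    saturation: fraction of the ring buffer occupied when drained."""
    if saturation > 0.50:                      # filling fast: halve, 50 ms floor
        return max(interval_s / 2, FLOOR_S), 0
    if saturation < 0.10:
        quiet_streak += 1
        if quiet_streak >= 3:                  # three quiet passes: double, 2 s ceiling
            return min(interval_s * 2, CEILING_S), 0
        return interval_s, quiet_streak
    return interval_s, 0

# One saturated pass halves the interval; three quiet passes double it back.
interval, streak = START_S, 0
for sat in (0.6, 0.05, 0.05, 0.05):
    interval, streak = next_drain_interval(interval, sat, streak)
```

The floor and ceiling bound the drain's CPU cost on both ends: it never wakes more often than every 50 ms, and never sleeps past the buffer's rotation window at default sizes.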
Sampling rate¶
Pre-existing: config.json's coordinator.measurement_interval_ms is the startup default (loaded by nemb::ConfigLoader::load_config()).
Per-session override: pass sample_interval_ms=N to Session(...) — it calls nemb_set_measurement_interval_ms which writes the same config_.measurement_interval field the sample loop reads. No parallel sampling-rate state, no duplicate config parsing.
Behavior rules¶
- Single session per process. Constructing a second `Session` while one is active raises `RuntimeError`.
- Mismatched stops raise `RuntimeError` with the actual innermost task name.
- A forgotten `.stop()` is recovered by an `atexit` hook — the file is still written, the JSON envelope still emitted.
- Concurrent threads can each maintain their own task stack (per-thread); `nemb_stop_session_v2` makes the domain breakdown race-free.
- Forked children become no-ops automatically; only the parent process reports.
- No NEMB lib loaded (CodeGreen built without the C++ backend) → `Session` degrades to a warning plus zero-energy results; your program still runs.
Multi-process / RAPL caveat¶
RAPL counters are system-wide, not per-process. If two CodeGreen sessions overlap in wall time on the same socket, both readings include the other's energy (double-counting). The Session constructor warns when it detects another live CodeGreen pid via $XDG_RUNTIME_DIR/codegreen-<uid>.pids. For benchmarks, run sequentially or accept "system energy during this window" semantics.
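A hedged sketch of that detection logic; the pid-file path comes from the text, but the file format assumed here (one decimal pid per line) and the liveness probe are illustrative, not the constructor's actual implementation:

```python
import os
import tempfile

def live_codegreen_pids(pids_path, self_pid=None):
    """Return pids listed in a codegreen pid file that are still alive,
    excluding our own. Assumes one decimal pid per line (format is a guess)."""
    self_pid = os.getpid() if self_pid is None else self_pid
    try:
        with open(pids_path) as f:
            pids = [int(line) for line in f if line.strip().isdigit()]
    except FileNotFoundError:
        return []
    alive = []
    for pid in pids:
        if pid == self_pid:
            continue
        try:
            os.kill(pid, 0)       # signal 0 probes existence without sending anything
        except ProcessLookupError:
            continue              # stale entry: process is gone
        except PermissionError:
            pass                  # exists, but owned by another user
        alive.append(pid)
    return alive

# Demo: a file holding our own pid plus an (almost certainly) dead one.
demo = os.path.join(tempfile.gettempdir(), f"codegreen-demo-{os.getpid()}.pids")
with open(demo, "w") as f:
    f.write(f"{os.getpid()}\n999999999\n")
print(live_codegreen_pids(demo))  # [] -- own pid excluded, dead pid skipped
os.remove(demo)
```

Any non-empty result would mean another session's energy is bleeding into this window's RAPL readings.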
Runtime module (auto-instrumenter)¶
codegreen/instrumentation/language_runtimes/python/codegreen_runtime.py
This module is injected into instrumented code automatically. It uses ctypes to call libcodegreen-nemb.so.
checkpoint()¶
def checkpoint(checkpoint_id: str, name: str, checkpoint_type: str):
    """Mark a checkpoint in the energy measurement stream."""
Called by instrumented code at function boundaries:
from codegreen_runtime import checkpoint
checkpoint(checkpoint_id="1", name="my_function", checkpoint_type="enter")
# ... function body ...
checkpoint(checkpoint_id="2", name="my_function", checkpoint_type="exit")
Each call records a ~100ns timestamp signal. The NEMB backend tracks invocations automatically (#inv_N suffix).
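For illustration, per-checkpoint invocation counting that yields the documented `#inv_N` suffix might look like this (a sketch; the backend's real checkpoint ids carry additional components):

```python
from collections import defaultdict

_inv_counts = defaultdict(int)  # per-checkpoint call counter (sketch state)

def invocation_id(checkpoint_id: str) -> str:
    """Append the documented #inv_N suffix, counting calls per checkpoint id."""
    _inv_counts[checkpoint_id] += 1
    return f"{checkpoint_id}#inv_{_inv_counts[checkpoint_id]}"

print(invocation_id("enter:main:1"))  # enter:main:1#inv_1
print(invocation_id("enter:main:1"))  # enter:main:1#inv_2
```

Each loop iteration or repeated call of an instrumented function thus gets its own distinguishable measurement record.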
measure_checkpoint()¶
def measure_checkpoint(checkpoint_id: str, checkpoint_type: str,
                       name: str, line_number: int, context: str):
    """Record a checkpoint marker with full metadata."""
Lower-level function with additional context. checkpoint() delegates to this.
Auto-instrumenter output format¶
At process exit (atexit), the runtime prints checkpoint data to stdout:
--- CODEGREEN_RESULT_START ---
{"measurements": [
  {"checkpoint_id": "enter:main:1#inv_1_t...", "timestamp": 13973..., "joules": 6.80, "watts": 0.76},
  {"checkpoint_id": "exit:main:2#inv_1_t...", "timestamp": 13973..., "joules": 8.91, "watts": 71.94}
]}
--- CODEGREEN_RESULT_END ---
The CLI parses this output to extract measurement results.
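A sketch of that parsing step, assuming the markers each appear exactly once on their own lines (the function and variable names are illustrative, not the CLI's internals):

```python
import json

START = "--- CODEGREEN_RESULT_START ---"
END = "--- CODEGREEN_RESULT_END ---"

def parse_result(stdout_text: str) -> dict:
    """Extract and decode the JSON payload between the result markers."""
    body = stdout_text.split(START, 1)[1].split(END, 1)[0]
    return json.loads(body)

out = """some program output
--- CODEGREEN_RESULT_START ---
{"measurements": [{"checkpoint_id": "enter:main:1#inv_1", "joules": 6.8}]}
--- CODEGREEN_RESULT_END ---
"""
print(parse_result(out)["measurements"][0]["joules"])  # 6.8
```

Delimiting the payload with sentinel lines lets the instrumented program print anything else to stdout without corrupting the measurement channel.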
CLI usage¶
These commands drive the auto-instrumenter; the Quickstart and CLI reference cover them in full:
codegreen measure python script.py # basic
codegreen measure python script.py -g fine --export-plot energy.html
codegreen measure python script.py --json
codegreen analyze python script.py --save-instrumented --output-dir ./out
Package structure¶
codegreen/
  cli/cli.py                        # Typer CLI
  instrumentation/
    engine.py                       # MeasurementEngine
    language_engine.py              # Tree-sitter parsing + query matching
    ast_processor.py                # Checkpoint injection
    configs/*.json                  # Language-specific instrumentation configs
    language_runtimes/
      python/codegreen_runtime.py   # Python ctypes bridge to NEMB + Session
      java/CodeGreenRuntime.java    # Java JNI bridge to NEMB
  analyzer/plot.py                  # Plotly / matplotlib visualization
measurement/src/nemb/
  codegreen_energy.cpp              # C API + EnergyMeter implementation