Glossary¶
Every paper symbol the codebase tracks, with its in-code counterpart and a one-liner meaning. The same notation is used throughout the docs and the docstrings; this page collects it in one place so you can cross-reference quickly.
The authoritative source is the paper. This page summarises the code/paper contract the package maintains.
Core symbols (paper Definitions 1–10)¶
| Symbol | Name | In code | Meaning |
|---|---|---|---|
V |
Vocabulary | (conceptual) | Finite set of tokens. |
L = V* |
Token sequences | (conceptual) | Finite sequences over V — natural-language statements live here. |
B |
Bearer set | Benchmark.bearers (keys); Bearer at runtime |
Non-empty set of bearers — the items that play implicational roles. |
δ : B → L |
Expression function | BearerModel.expression (+ paraphrases) |
Maps each bearer to its natural-language expression. Analyst-supplied. |
ctx_Γ : ℘(B) → L |
Premise-set context builder | ContextBuilders.premise |
Packages a premise set as natural-language context. Default: conjunction by "and". |
ctx_Δ : ℘(B) → L |
Conclusion-set context builder | ContextBuilders.conclusion |
Packages a (single-bearer) conclusion set as natural-language context. |
⟨B, I⟩ |
Implication frame | (Hlobil–Brandom) | A bearer set plus an implication relation I ⊆ ℘(B) × ℘(B). |
⟨B, I_M⟩ |
Derived implication frame | DerivedFrame |
The frame derived from M's endorsement verdicts (paper's Definition 3). Containment-closed by construction (clause i). |
E_M |
Endorsement function | endorse() |
E_M(⟨Γ, {ψ}⟩) ∈ {good, bad, abstain}. Computed by majority vote over n_samples provider calls. |
RSR |
Range of subjunctive robustness | (analytical) | For a target inference ⟨X, {ψ}⟩, the side-premise extensions Y that preserve endorsement of ψ from X. |
β |
Benchmark | Benchmark |
{(I_1, V_1), …, (I_n, V_n)}. Items + analyst verdicts. Paper's Definition 4. |
η |
Evaluation | Evaluation |
{(I_i, V_i, E_M(I_i))}. Paper's Definition 5. |
V_i = (v_{i,1}, …, v_{i,m}) |
Analyst verdict tuple | BenchmarkItem.analyst_verdicts |
Verdicts of each of the m analysts on item i. |
m |
Number of analysts | Benchmark.m |
len(benchmark.analysts). |
n |
Number of items | Benchmark.n |
len(benchmark.items). |
c_i |
Analyst consensus | consensus_verdict() |
Strict majority of (v_{i,1}, …, v_{i,m}); abstain on tie. Paper's Definition 8. |
Agreement measures (paper Definitions 6–10)¶
| Symbol | Name | In code | Meaning |
|---|---|---|---|
cov(η) |
Coverage | coverage() |
Fraction of items where M produced a substantive verdict. |
cov_j(η) |
Per-analyst coverage | coverage() per column |
Analog for each human analyst. |
S(η, r) |
Substantive index | (internal filter) | Items where both M and reference r are substantive. Paper's Definition 7. |
S_F |
Fleiss substantive index | (internal filter) | Items where every annotator is substantive. Paper's Definition 10. |
κ_C(η, r) |
Cohen's kappa | cohens_kappa() |
M's agreement with reference r (consensus or a single analyst). Chance-corrected against {good, bad} marginals. Paper's Definition 9. |
κ_F(η) |
Fleiss' kappa | fleiss_kappa() |
Agreement across all m + 1 annotators (analysts + M). Paper's Definition 10. |
κ_F^*(β) |
Inter-analyst Fleiss baseline | inter_analyst_fleiss() |
Fleiss' kappa over analyst verdicts alone. The baseline against which κ_C and κ_F are interpreted (paper's Remark 4). Undefined when m < 2 or analysts unanimous. |
analyst vs annotator — load-bearing¶
Treat as a hard distinction in code and prose:
| Term | What it means | Count |
|---|---|---|
| Analyst | A human labeler whose verdicts appear in V_i. |
m |
| Annotator | A human-plus-M ensemble member. The (m+1)th annotator in κ_F is M. |
m + 1 |
fleiss_kappa(η) operates on m + 1 annotators (analysts plus M).
inter_analyst_fleiss(β) operates on m analysts (no M). This is the
load-bearing distinction in the Fleiss definition (paper's Definition 10).
Construct-validity terminology¶
Introduced in the v0.3.0–v0.5.x series. See Construct-validity workflow and Closing the gap for the end-to-end story.
| Term | In code | Meaning |
|---|---|---|
| Carving | (analyst-chosen) | The way the discursive practice is partitioned into bearers + δ + context builders. Content-attribution is relative to it (paper's Remark 6). |
| Mastery sense | MasterySenseClaim.sense |
One of evaluative / generative / standing / combination. Declares which sense of mastery the claim concerns (paper's Remark 7). |
| Scope | ScopeClaim.scope |
One of items_in_benchmark / domain_D_as_sampled / general_capacity. The breadth of the claim. Broader scopes require strictly more competing-explanation checks. |
| Constitution | ConstitutionClaim.position |
One of evidence_of_mastery / constitutive_of_mastery. The philosophical position the analyst is taking. |
| Carving-indexed claim | CarvingClaim.acknowledges_carving_indexed |
Required True at non-items_in_benchmark scopes — in-principle claims must take the carving-indexed form (paper's Remark 10). |
κ_F^*-stability |
SweepResult.stability_verdict |
The sensitivity-sweep verdict: stable / moderately sensitive / substantively variable. |
Role tags (RSR-targeted benchmarks)¶
When an item declares an rsr_target and a role tag, the structural check
infereval structure
compares M's verdict to the role-predicted verdict.
| Tag | Predicted verdict | Meaning |
|---|---|---|
base-inference |
(anchors the target) | Items establishing the unconditional inference ⟨X, {ψ}⟩. |
irrelevant-addition |
Same as base | A side premise that should preserve the inference under RSR. |
supporter |
good when base is good |
A side premise that strengthens the inference. |
defeater |
bad when base is good |
A side premise that defeats the inference. |
Construction-metadata fields¶
Per-item provenance for the construct-validity audit (paper-aligned with R5 / R8 / R9 in Closing the gap).
| Field | Type | Meaning |
|---|---|---|
authored_by |
str \| None |
Identifier of the author of this item (R5). |
authored_on |
date \| None |
ISO date the item was authored — substrate for temporal training-data separation arguments (R9). |
authored_blind_to_models |
list[str] |
Models the author had not observed on a draft of this item — the held-out declaration (R8). |
source |
str \| None |
Free-form citation for the primary material the author worked from. |
analyst_rationales |
list[str] \| None |
Optional per-analyst, per-item natural-language rationale, positionally aligned to analyst_verdicts. None (or absent) means "no rationale discipline"; an empty string means "verdict given, no reason recorded" — semantically distinct (paper-aligned with the AR1–AR12 spec). |