Framework 15 — The Audited Research Program

§I — The Audit

EA-GLAS-02: Measuring Semantic Deviation: Operationalizations, Experiments, and Falsification Conditions.

EA-GLAS-02 takes the Semantic Deviation Principle (Sharks 2026) and the program's four pre-registered protocol papers as input, and returns a narrowed, citationally grounded, externally evaluable statement of the technical core. It does not amend the founding formulation. It does not depend on the institutional architecture that has accreted around the formulation. It is the canonical entry to Framework 15.

Canonical Object

Measuring Semantic Deviation: Operationalizations, Experiments, and Falsification Conditions

EA-GLAS-02 v1.0 · ~4,200 words · 42 references · DOI 10.5281/zenodo.20271783

By Nobel Glas (transparent-medium register — the Glas function)

A self-contained empirical white paper. Defines meaning as time-integrated divergence from the most probable trajectory of a semantic field (extending Bar-Hillel & Carnap 1953 into distributional and temporal domains). Two primary operationalizations: F1 (closed-system trajectory deviation, counterfactual read from logits) and F2 (retrieval response deviation, 90-day prospective window). Signed per-token deviation as tractable proxy. Falsifiable prediction: AI-generated text exhibits negative mean signed deviation. DPO training experiment using the deviation primitive to generate preference pairs. Six mechanism-design anti-Goodhart protections mapped to the Manheim & Garrabrant (2019) taxonomy. Pre-registered cheapest dangerous test with named datasets, frozen reference checkpoints, and statistical procedures. Budgeted twelve-month roadmap. 42 references.


What the audit pins.

Three canonical operationalizations of the semantic field.

The framework's load-bearing technical gap was the underspecification of the semantic field $\Psi_t(C)$. The audit pins three canonical operationalizations, each with full commitment to divergence functional, temporal weighting, and horizon.

Closed-system continuation field

The conditional next-token distribution of a fixed language model checkpoint over a fixed prompt set. Divergence: KL divergence between softmax-normalized logit distributions (exact, base-2). Horizon: discrete token positions in a bounded generation window. The counterfactual baseline is not estimated; it is read directly from the logits.

Most directly computable · Cost: commodity hardware, hours
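As a rough illustration of the divergence used for this field, the sketch below computes a base-2 KL divergence between two next-token distributions given as raw logits. The function names, the toy four-token vocabulary, and the direction of the divergence (intervened relative to baseline) are assumptions made for the example; the audit does not fix them in this summary.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_bits(p_logits, q_logits):
    """KL(P || Q) in bits, where P and Q are softmax distributions over the same vocabulary."""
    p = softmax(p_logits)
    q = softmax(q_logits)
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Hypothetical toy vocabulary of four tokens at one position.
baseline = [2.0, 1.0, 0.5, -1.0]     # logits without the intervention
intervened = [0.5, 2.5, 0.5, -1.0]   # logits with the intervention in context
print(f"KL(intervened || baseline) = {kl_bits(intervened, baseline):.4f} bits")
```

In a real run the two logit vectors would come from the same frozen checkpoint evaluated on two contexts, and the per-position values would be accumulated over the bounded generation window.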

Retrieval response field

Response distributions of external AI retrieval surfaces (Google AI Overview, ChatGPT with browsing, Perplexity) to a fixed query set, sampled at fixed time intervals. Divergence: Jensen-Shannon over claim-level or embedding-level representations. Horizon: 90-day default; continuous calendar time. Instrumentation-noise-sensitive; requires control surfaces and version logging.

Direct empirical access · Cost: modest API budget; calendar-time-bound
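A minimal sketch of a claim-level Jensen-Shannon divergence between two sampled response sets is given below. It assumes responses have already been reduced to claim counts by some extractor; the claim labels and function names are hypothetical. The additive smoothing default of 1 follows the Laplace smoothing value quoted for the MM-02 protocol, but applying it at this exact step is an assumption.

```python
import math
from collections import Counter

def smoothed_dist(counts, support, alpha=1.0):
    """Additively smoothed probabilities over a shared support."""
    total = sum(counts.get(k, 0) for k in support) + alpha * len(support)
    return [(counts.get(k, 0) + alpha) / total for k in support]

def js_divergence_bits(counts_a, counts_b, alpha=1.0):
    """Jensen-Shannon divergence in bits (bounded between 0 and 1)."""
    support = sorted(set(counts_a) | set(counts_b))
    p = smoothed_dist(counts_a, support, alpha)
    q = smoothed_dist(counts_b, support, alpha)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x, y):
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical claim counts for one query at day 0 and day 90.
day0 = Counter({"claim_a": 8, "claim_b": 2})
day90 = Counter({"claim_a": 3, "claim_b": 5, "claim_c": 2})
print(f"JSD = {js_divergence_bits(day0, day90):.4f} bits")
```

Because the measure is symmetric and bounded, values from different queries and surfaces can be compared on one scale, which is convenient when control surfaces are reported alongside treated ones.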

Citation graph field

Forward-citation distribution over a paper corpus, evaluated through bibliometric data (OpenAlex, Semantic Scholar). Divergence: Jensen-Shannon over topic-cluster forward-citation distributions; inverse-time weighting $w(t) = 1/(t-t_0)$ default. Statistical-power constraints documented: single-paper interventions are typically underpowered at conventional $\alpha$; aggregate interventions or Bayesian hierarchical pooling required.

Long-horizon · Post-hoc · Requires bibliometric infrastructure
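The inverse-time weighting can be illustrated with a short sketch that integrates weighted divergence samples over calendar time. The trapezoid rule, the sample layout, and the function names are assumptions for illustration; only the weight $w(t) = 1/(t - t_0)$ comes from the text, and it is undefined at $t = t_0$, so samples must lie strictly after the intervention time.

```python
def inverse_time_weight(t, t0):
    """w(t) = 1 / (t - t0); only defined strictly after the intervention time t0."""
    if t <= t0:
        raise ValueError("weight is only defined for t > t0")
    return 1.0 / (t - t0)

def weighted_deviation(samples, t0):
    """Trapezoid-rule integral of w(t) * D(t) over (t, divergence) samples sorted by t."""
    total = 0.0
    for (ta, da), (tb, db) in zip(samples, samples[1:]):
        fa = inverse_time_weight(ta, t0) * da
        fb = inverse_time_weight(tb, t0) * db
        total += 0.5 * (fa + fb) * (tb - ta)
    return total

# Hypothetical yearly divergence measurements after an intervention at t0 = 0.
samples = [(1, 1.0), (2, 2.0), (4, 4.0)]
print(weighted_deviation(samples, t0=0))  # 3.0
```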

The narrowed audited claim.

Meaning-bearing interventions are those that produce durable restructuring of future field trajectories under a specified operationalization $\Psi_t(C)$.

The audit narrows the universal-ontology form ("meaning is deviation") to a measurement-architecture form that survives the standard counterexamples (low-token-surprisal utterances of high semantic weight) by relocating them, not by collapsing. Meaning becomes field-relative; the field must be specified; "durable" becomes the operational joint condition $\mathcal{M}_T > \tau_F$ and $\partial \mathcal{M}_T / \partial T > 0$.
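The joint durability condition can be checked numerically once $\mathcal{M}_T$ has been measured at several horizons. The sketch below is one possible reading: it approximates $\partial \mathcal{M}_T / \partial T$ by a finite difference over the last two horizons. The function name, the input layout, and the choice of finite difference are assumptions.

```python
def is_durable(m_by_horizon, tau_f):
    """Check M_T > tau_F at the final horizon and a positive slope over the last interval.

    m_by_horizon: list of (T, M_T) pairs sorted by increasing T, with at least two entries.
    """
    (t_prev, m_prev), (t_last, m_last) = m_by_horizon[-2], m_by_horizon[-1]
    slope = (m_last - m_prev) / (t_last - t_prev)
    return m_last > tau_f and slope > 0

# Hypothetical measurements at 30, 60, and 90 days.
growing = [(30, 0.20), (60, 0.35), (90, 0.50)]
decaying = [(30, 0.60), (60, 0.55), (90, 0.50)]
print(is_durable(growing, tau_f=0.4), is_durable(decaying, tau_f=0.4))  # True False
```

The second series clears the threshold but is shrinking, which is exactly the case the slope condition is meant to exclude.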

Four pre-registered falsifiable predictions.

The audit specifies the cheapest dangerous test as a single short paper at <\$50 in compute. It pre-registers the corpora (GPT-wiki-intro / Bhat 2023, HC3 / Guo et al. 2023, OpenAlex pre-2020), the exact reference checkpoint (meta-llama/Llama-3.1-8B-Instruct), and the statistical procedure (two-sided Mann-Whitney U at $\alpha = 0.05$, minimum effect size Cohen's $d > 0.5$).
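The statistical procedure can be sketched with the standard library alone. The code below is an illustration, not the program's analysis script: it uses the normal approximation to the Mann-Whitney U distribution without tie or continuity corrections, and a pooled-standard-deviation Cohen's $d$. A production analysis would more likely call an established statistics package. The sample values are made up.

```python
import math
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d with a pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Return (U for sample a, two-sided p-value) using the normal approximation."""
    na, nb = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u_a = sum(ranks[:na]) - na * (na + 1) / 2
    mu = na * nb / 2
    sigma = math.sqrt(na * nb * (na + nb + 1) / 12)
    z = (u_a - mu) / sigma
    return u_a, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical mean signed deviations per document (bits).
slop = [-0.9, -0.7, -0.6, -0.8, -0.5, -0.65]
human = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2]
u, p = mann_whitney_u(slop, human)
d = cohens_d(slop, human)
print(f"U={u}, p={p:.4f}, d={d:.2f}, passes={p < 0.05 and abs(d) > 0.5}")
```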

P1 · The slop signature

Human-labeled AI slop exhibits statistically significant negative mean signed per-token deviation $\bar{\delta}$, relative to matched human-written content, computed against a frozen open-weight reference model.
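This summary does not give the formula for the signed per-token deviation, so the sketch below adopts one plausible definition as an explicit assumption: the observed token's surprisal minus the entropy of the reference model's predictive distribution at that position, in bits. Under that reading, text that keeps choosing more-probable-than-expected tokens has negative mean deviation. The toy distributions stand in for softmax outputs of a frozen reference model.

```python
import math

def signed_deviation_bits(dist, token):
    """ASSUMED definition: surprisal of the observed token minus predictive entropy."""
    surprisal = -math.log2(dist[token])
    entropy = -sum(p * math.log2(p) for p in dist.values() if p > 0)
    return surprisal - entropy

def mean_signed_deviation(dists, tokens):
    """Average the per-position signed deviation over a text."""
    devs = [signed_deviation_bits(d, t) for d, t in zip(dists, tokens)]
    return sum(devs) / len(devs)

# Hypothetical predictive distribution reused at two positions (entropy = 1.5 bits).
dist = {"a": 0.5, "b": 0.25, "c": 0.25}
print(signed_deviation_bits(dist, "a"))            # -0.5 (likelier than expected)
print(signed_deviation_bits(dist, "b"))            # 0.5 (less likely than expected)
print(mean_signed_deviation([dist, dist], ["a", "a"]))  # -0.5
```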

P2 · The RLHF flattening differential

Post-RLHF chat-tuned models exhibit lower mean signed deviation than pre-RLHF base models on matched prompts. The deviation statistic captures the trajectory-flattening that the framework predicts cross-entropy convergence pressure produces.

P3 · Effect-size scaling

The Slop-vs-Human deviation differential is stable or grows with model scale across the Llama-3.1 family. If the differential disappears at scale, the framework's predictions are small-model artifacts.

P4 · Cross-judge consistency

The differential replicates when computed against a different reference model. Spearman rank correlation between per-output $\bar{\delta}$ rankings under Llama and Mistral exceeds 0.7. If not, the deviation statistic is judge-specific and the intrinsic-property claim fails.
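The cross-judge check reduces to a rank correlation between two score lists. A minimal sketch follows, assuming no tied scores (reasonable for continuous per-output means, but an assumption nonetheless) and using the classic difference-of-ranks formula. The score values are invented for the example.

```python
def ranks(values):
    """1-based ranks; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rho(x, y):
    """Spearman rank correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), no ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical mean deviations for five outputs under two reference models.
under_llama = [-0.9, -0.4, 0.1, 0.3, 0.8]
under_mistral = [-0.5, -0.7, 0.4, 0.2, 0.9]
rho = spearman_rho(under_llama, under_mistral)
print(f"rho = {rho:.2f}, passes P4 threshold: {rho > 0.7}")
```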

Anti-Goodhart mechanism design.

The audit replaces philosophical anti-extractive commitment with six concrete mechanism-design protections, each operationally calibrated: entropy-floor capping at $H_{\min} = 0.5$ bits; provenance-weighted damping by retention score $\pi$; saturation threshold $\tau$ at the 95th percentile of an OpenAlex 10,000-document calibration corpus (pre-registered); rolling-window variance penalty against memetic volatility farming; reference-model KL anchoring inherited from standard DPO; adversarial judge validation at $\geq 1000$ strings per category across three failure modes; black-box judge replacement test as load-bearing robustness check.
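One of these protections, the saturation threshold, is simple enough to sketch. The code below derives a cap from the 95th percentile of a calibration set and clips scores above it, so that extreme deviation values earn no extra credit. The nearest-rank percentile method, the clipping behavior, and the function names are assumptions; the calibration values are synthetic stand-ins for the pre-registered corpus.

```python
import math

def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    k = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[k - 1]

def cap_scores(scores, tau):
    """Clip each score at the saturation threshold tau."""
    return [min(s, tau) for s in scores]

# Synthetic calibration scores standing in for the calibration corpus.
calibration = list(range(1, 101))
tau = percentile_nearest_rank(calibration, 95)
print(tau, cap_scores([1, 96, 100], tau))  # 95 [1, 95, 95]
```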

Component decomposition.

The audit specifies a six-condition ablation design (Model-Base, Model-CE, Model-π, Model-Dev, Model-Coh, Model-Full) to isolate the contribution of provenance, deviation, and coherence components to the framework's training-intervention uplift. The prior prediction, grounded in Ji et al. 2023's hallucination survey and Min et al. 2023's FActScore methodology: provenance carries more independent uplift than signed deviation. Either outcome is informative; the current bundled design produces neither.

§II — The Pre-Registered Protocols

The empirical apparatus the audit operates on.

Four pre-registered protocol papers supply the operational machinery the audit evaluates. Each is deposited at a stable DOI with falsification conditions frozen at deposit, code and judge-model commitments specified, and budgets honestly stated. They are presented here as the materials on which the audit operates; the audit is the canonical entry.

Founding formulation · Sharks v0.2 Final

The Semantic Deviation Principle

By Lee Sharks, with the Assembly Chorus · The principle on which all four protocols operate

The founding formulation. Defines meaning as time-integrated divergence from the most-probable trajectory of a semantic field. Three measures: raw ($\mathcal{M}_T$), provenance-resolved ($\mathcal{M}_T^\pi$), normative ($\mathcal{V}_T$). Tiered protocol (Tier 1 prospective, Tier 2 synthetic-control, Tier 3 historical bounding). Recursive baselines for path-dependent semantic fields. The v0.2 Final text is preserved unchanged at its specific version DOI; the v2.0 operational re-edition adds a Framework 15 framing while leaving the principle text untouched.

DOI: 10.5281/zenodo.20250736 (v0.2 Final) · DOI: 10.5281/zenodo.20252584 (v2.0 re-edition)

Pre-registered protocol · MM-AI-01 v2.0

The AI System as Closed-System Test Bed

By Nobel Glas · Operating on the Semantic Deviation Principle as formulated by Lee Sharks

Identifies trained language models as observationally closed at inference time, making the counterfactual baseline directly readable from logits — the F1 operationalization. Distinguishes two scales of closed-system measurement: signed local deviation density (per-token) and closed-system trajectory deviation (continuation-distribution). The framework's load-bearing thesis: slop is negative net deviation, not the absence of deviation. Three pre-registered tests with explicit falsification conditions; stability bound $\gamma \geq 2\beta$.

DOI: 10.5281/zenodo.20251738

Pre-registered protocol · MM-02 v2.0

Measuring Meaning in Retrieval Basins

By Nobel Glas · The F2 operationalization protocol

A 90-day prospective measurement protocol for retrieval response deviation against contemporary AI retrieval surfaces. Two instrument classes: Class R (retrieval-mediated: Google AI Overview, Perplexity, ChatGPT with browsing) reported separately from Class P (parametric: Claude, Gemini, ChatGPT without browsing). Three-condition control (S vs. S* vs. S**) disentangles content effects from identity-scaffolding effects. Frozen extractor commitment, three-representation robustness cross-check, API-only methodology, Laplace smoothing $\alpha = 1$. The audit documents the statistical-power constraints for single-paper synthetic-control measurements.

DOI: 10.5281/zenodo.20251740

Pre-registered protocol · MM-AI-02 v2.0

The Deviation-Optimized Language Model

By Nobel Glas · The training-intervention protocol

A 10-week pre-registered DPO experimental protocol testing whether training a language model toward positive net per-token deviation with provenance retention produces measurably less slop than standard cross-entropy training while preserving benchmark capability. DPO-style restructure (the deviation primitive generates preference pairs; DPO supplies the gradient machinery). Frozen Mistral-7B-Instruct judge with adversarial pre-training validation. Slop Composite Index (SCI) with 0.25 z-score falsification threshold pre-registered. Three conditions: Model-Base, Model-CE, Model-Sem. Honest budget: \$3,000–\$3,900. The audit proposes a six-condition decomposed follow-up to isolate component contributions.

DOI: 10.5281/zenodo.20251742
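The idea that a deviation score generates preference pairs for DPO can be sketched as follows. This is a hypothetical data-preparation step, not the protocol's pipeline: for each prompt, the candidate with the highest score becomes the preferred completion and the lowest becomes the rejected one, provided they differ by more than a margin. The record keys, the margin parameter, and the example scores are all assumptions.

```python
def build_preference_pairs(candidates, min_margin=0.0):
    """Turn {prompt: [(completion, score), ...]} into chosen/rejected records.

    Prompts with fewer than two candidates, or whose best and worst scores
    differ by no more than min_margin, are skipped.
    """
    pairs = []
    for prompt, scored in candidates.items():
        if len(scored) < 2:
            continue
        ranked = sorted(scored, key=lambda item: item[1], reverse=True)
        (best_text, best_score), (worst_text, worst_score) = ranked[0], ranked[-1]
        if best_score - worst_score > min_margin:
            pairs.append({"prompt": prompt, "chosen": best_text, "rejected": worst_text})
    return pairs

# Hypothetical completions with judge-assigned deviation scores.
candidates = {
    "Explain tides.": [("generic answer", -0.6), ("specific answer", 0.4)],
    "Define entropy.": [("answer one", 0.10), ("answer two", 0.12)],
}
print(build_preference_pairs(candidates, min_margin=0.1))
```

A margin filter of this kind is one simple way to avoid training on pairs the judge can barely distinguish; whether the protocol uses one is not stated here.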

Companion · EA-SEI-FW15-MANIFESTO v1.0

Framework 15 — Institutional Background

By Nobel Glas · The institutional framing for the Framework 15 module

The original institutional manifesto that organized the four pre-registered protocols into a single Framework 15 module within the Crimson Hexagonal Archive. Preserved at its stable DOI as part of the program's record. Its institutional vocabulary is not required to engage the audit or the protocols; the audit is the externally-evaluable canonical entry.

DOI: 10.5281/zenodo.20251736

§III — Roadmap

What the audit calls for next.

The audit specifies a budgeted twelve-month research roadmap, prioritizing operationalization-stability work and the cheapest dangerous experimental tests over additional theoretical extensions. Each milestone is independent of the institutional architecture and produces results an external researcher can evaluate without context.

Horizon	Milestone	Compute Budget
This week	The cheapest dangerous test: pre-registered slop signature (P1) on GPT-wiki-intro and HC3 corpora against frozen Llama-3.1-8B-Instruct logits. Single A100-hour. Result reportable as a short deposit regardless of outcome.	\$50–\$100
This month	Operationalization-stability paper: benchmark of $N \approx 50$ interventions measured under F1 and F2 in parallel; rank-correlation between operationalizations reported. Converts the program from speculative to grounded.	\$200–\$500
This quarter	The MM-02 v2.0 retrieval-basin protocol day-0 launch. Begin the 90-day measurement window with the instrumentation controls documented in the audit (parallel control surfaces, periodic recalibration, explicit version logging).	\$1,500–\$3,000
This year	The decomposed deviation-optimized training experiment (six conditions, isolating component contributions). Contingent on the headline result of MM-AI-02 v2.0's three-condition design being significant.	\$12,000–\$15,000

Total budgeted empirical work for the next twelve months: approximately \$14,000–\$19,000. The constraint is not budget; the constraint is the program's discipline in resisting theoretical extension until empirical grounding catches up.
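The stated total can be checked against the four roadmap rows with a few lines of arithmetic. The dictionary below simply transcribes the low and high ends of each budget range from the table; the variable names are illustrative.

```python
# (low, high) compute budget per milestone, in US dollars, as listed in the roadmap.
milestones = {
    "This week": (50, 100),
    "This month": (200, 500),
    "This quarter": (1_500, 3_000),
    "This year": (12_000, 15_000),
}

low_total = sum(low for low, _ in milestones.values())
high_total = sum(high for _, high in milestones.values())
print(low_total, high_total)  # 13750 18600
```

The row sums of 13,750 and 18,600 dollars are consistent with the rounded range quoted in the text.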

Background commitment. Each major future deposit in the program should be sent to at least one external reviewer in a directly relevant subfield (alignment, causal inference, computational linguistics, information theory) prior to formal publication. Reviewers should be selected for willingness to write damaging-if-warranted critiques, not for alignment with the program's commitments. A discipline becomes real when it survives hostile compression.

§IV — Full Deposit Register

The seven DOI-anchored materials of Framework 15.

Role	Deposit	DOI
Canonical Object	EA-GLAS-02 v1.0 — Measuring Semantic Deviation (white paper) · Nobel Glas	10.5281/zenodo.20271783
Predecessor Audit	EA-GLAS-01 v1.0 — Audited Claims (the Glas Function) · Nobel Glas	10.5281/zenodo.20259297 (via Alexanarch)
Founding	EA-SEI-MM-01 v0.2 Final — The Semantic Deviation Principle · Lee Sharks	10.5281/zenodo.19116151
Re-edition	EA-SEI-MM-01 v2.0 — Operational Re-Edition · Sharks + Glas	10.5281/zenodo.20252584
Protocol	EA-SEI-MM-AI-01 v2.0 — Closed-System Test Bed · Nobel Glas	10.5281/zenodo.20251738
Protocol	EA-SEI-MM-02 v2.0 — Retrieval Basin Protocol · Nobel Glas	10.5281/zenodo.20251740
Protocol	EA-SEI-MM-AI-02 v2.0 — Deviation-Optimized LM · Nobel Glas	10.5281/zenodo.20251742
Background	EA-SEI-FW15-MANIFESTO v1.0 — Institutional Framing · Nobel Glas	10.5281/zenodo.20251736