-
Notifications
You must be signed in to change notification settings - Fork 5
PSP1 Review via GPT-5 Pro, Grok 4, Gemini 2.5 Pro, Claude Sonnet 4.5 - Semantics, Syntax and Structure #164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: staging
Are you sure you want to change the base?
Conversation
- Removes the Atom/Sym/Var contradiction in the current Terminology and introduces the official name for TAP to avoid "profile" overload later. - PCF must not refer to literal kinds that the language forbids; we keep the ordering rules purely over builtin operator id and arguments. - Without pinning, comparator evolution can change allow/deny decisions over time. We make the snapshot mandatory and chain-stable; this is the single biggest determinism hardener. - The spec mentions normalization in multiple places; this consolidates a single, normative rule where evaluators look first. (The "Grants -> Determinism and normalization invariants" bullet remains consistent.) - Edge-consistent timing: half-open `[iat,exp)` is already used in the algorithm; this makes precedence explicit so independent CEPs agree on acceptance at the boundary. - Aligns all time semantics to half-open windows, preventing "double-counting the endpoint" and making ttl_ok consistent with the Presentation lifetime check. - Replaces "controls authority" (subjective) with a mechanical, verifier-checkable rule. This is the cornerstone of safe delegation. - Keeps comparator fingerprints as additional evidence even with mandatory pinning; also makes snapshot equality a first-class chain condition. - Eliminates ambiguity about where profile strings come from, and ties their interpretation to the pinned lattice snapshot. - Aligns terminology with "Threat and Acceptance Policy"; avoids overloading "profile" in TAP contexts while preserving the separate concept of PSP-level "Profiles" elsewhere in the doc. - Makes the same syntactic rule explicit in both delegation sections to avoid reader confusion and implementation drift. - Codifies your note about keeping PoAR fingerprints for fast replay/audit while keeping snapshot pinning as the source of truth. - Disambiguates "Profiles" here as specification profiles, so it doesn't collide with the TAP policy concept.
Initial set of changes done in 3b0fc51
|
Additional changes are getting more syntactic and clarifying material as well spine changes and reference material. Cross review 2 to assess whether initial cross reviews were correct. |
- Channel binding profile names - Updated every instance of the TLS exporter profile from `tls_exporter:v1` to `tls-exporter:v1` (hyphen instead of underscore) in the partial order example, Example 3's program, its Presentation object, and the evaluation outline. This aligns with RFC 9266's definition of the tls-exporter channel binding type for TLS 1.3 and avoids ambiguity with other profile naming schemes. The updated partial order now reads mtls:v1 >= tls-exporter:v1 >= dpop:v1 >= bearer:v1. - Evaluation narrative update - Adjusted the text in the Example 3 evaluation outline to reflect the new tls-exporter spelling and clarified that the CEP checks the same profile string in both the grant and the presentation. - Minor clean-ups - Ensured Example 3's Presentation uses tls-exporter:v1 in its channelBinding field and that the relevant logic in the evaluation is consisten
- In the updated file, the CEP placement variants (CEP(R/P/S)), Bridge Adapter definitions, and exposure modes are now part of the Core Terminology section - All normative property identifiers are being standardized to camelCase (e.g., builtinsId, channelLatticeId, langVersion, schemesSnapshotId) to align with modern JSON conventions. Where these names appear in code or on-wire examples, they're explicitly marked as illustrative rather than normative, to reduce confusion about field encodings
- CamelCase standardization - Bridge Adapter clarity & indentation - Backtick usage for programmatic identifiers
Using agent mode is pretty useful as way to do long form turn by turn editing, it does however make mistakes, and it would easier if I could HITL update the text from the system it's operating on. I think I need embedded agent mode directly on my workstation to reduce latency of feedback. |
- Number all sections/subsections; update cross-references accordingly. - Remove residual "(normative)" labels; mark non-binding parts "(Informative)" only. - Add "References" section - Remove Appendix D (reason codes) from PSP-1; add pointer to PSP-2 for reason codes and receipt schemas. - Security: add short "Threat Model" subsection; reference TAP for acceptance controls (exposure mode constraints, adapter assurance).
In the most recent update, we removed a few things.
This was suggested through restructuring and eliminating duplicate information. However I don't think all "references" are updated. In particular Appendix D by being removed, there are some times still references to reason codes. That should be updated accordingly. I also need another check done on the Attenuation by Derivation to see if the restructuring is concise without repetitive information and we're not missing anything. |
- Uniform camelCase naming - Builtins overhaul: Added a normative summary table for builtin operators in Appendix B. - Semantic pinning: Enforced `schemesSnapshotId` as REQUIRED for all Grants and examples. - Receipts layering: PSP-1 no longer lists or requires specific reason codes. - PoAR content: Algorithm step 8 now mandates that Access PoARs include `programId`, declarations CIDs, pins (including `schemesSnapshotId`), a minimal evaluation trace, comparator fingerprints, and the enforcer measurement (`adapterRef`). - Test vectors: Updated conformance tests to remove example reason codes and adjust comments.
- Added a Rationale explain why syntactic attenuation is defined mechanically. - Added a Rationale for ephemeral presentations - Updated the CDDL grammar comment for `Literal` to clarify that `op` is an operator string. - Lifted the prohibited/non-canonical forms
- Rewrote Section 6 (Semantic Pinning) for clarity and separation of pinning, snapshot formats, delegation compatibility, and fail-closed rules. - Clarified channel binding profiles as PSP-3/PSP-4 labels only. - Renumbered and reorganized Section 10 (Presentations) to RFC-style layout: normative vs informative subsections, merged structure/projection content. - Marked `schemesSnapshotId` as REQUIRED; `channelLatticeId` optional unless `channelGeq` used. - Added stable MDX anchors (e.g. #security-considerations) and updated cross-references to avoid hard-coded section numbers. - Removed normative recursion on PSP-2; receipts now conditional on support. - Fixed duplicate numbering and minor typos in Sections 10-11.
…ibility, renamed 5.8 to TAP constraints, the renames avoid overloading "profiles" as a keyword
Alright we're reaching the finish line of this review. By which we will move on to prototype implementation and/or PSP-2,3 and so on. Probably updating some of the market loop documents to explain this spec. The final changes seem to be small. Things like There's a bunch of "notes" that make the spec not super tight. They provide some interesting information that's probably not easy to understand without everything else being understood. I realise that some of the names of receipts like PoAR and DenyReceipt seems a big a strange. Maybe we need a ActionReceipt and DenyReceipt and UseReceipt? Rather than PoAR. The PoAR is a bit long and legacy. 1) Unify decision/receipts + remove duplication/reason‑codesIntent: PSP‑1 should only (a) require the enforcing CEP to record a decision event, and (b) say that if PSP‑2 receipts are implemented, emit the appropriate receipt. PSP‑1 must not define what a DenyReceipt/PoAR contains or require reason codes—that belongs to PSP‑2. A. Replace §13.2 “Step 8” with a single authoritative ruleFind the block beginning with:
Replace it with: 8. Decision recording (placement-agnostic)
- The enforcing CEP MUST record a decision event (allow or deny) for every enforcement.
- If the implementation supports PSP‑2 receipts, it MUST additionally emit the appropriate receipt:
- Access PoAR on allow.
- DenyReceipt on deny.
- The structure and required fields of these receipts are defined in PSP‑2. PSP‑1 does not define receipt contents. B. Trim §13.2 “Step 7” (no receipt talk)Find in Step 7:
Replace with:
C. Clean §13.3 “Execution Constraints” (no receipt talk)Find the bullets under “Deadlines and budgets”:
Replace with:
D. Clean §14.4 “Bounded Evaluation and DoS”Find:
Replace with:
E. Clean §14.9 “Side‑Channels and Error Signaling”Find the “Error reporting” bullet that mentions “machine‑readable denial reason” (or any receipt content). Replace the whole error‑reporting bullet with: - Error reporting
- Avoid leaking sensitive details in free‑form error messages. PSP‑1 does not define receipt formats; if receipts are used, their structure is defined by PSP‑2. F. Remove any remaining references to “reason code(s)”Search for 2) Lease freshness: placement + namingYou’re right: putting lease freshness under “12.2 Rotation” was the wrong place, and A. Rename the optional env fact in §4.5Find under “Optional environment facts (TAP‑gated)”:
Replace with: - leaseStatus — optional TAP‑provided evidence used to assess upstream lease/credential freshness. Semantics are TAP‑defined; no network I/O occurs during evaluation. B. Fix the env list in §13.2 Step 5Find in “5. Build evaluation environment (facts)”:
Replace with:
C. Move the normative lease‑freshness check into the algorithm; remove it from §12.2Delete the lease‑freshness paragraph that was added under §12.2 Rotation (the one referencing non‑native SoA, local evidence, and Add the following bullet under §13.2 Step 3 (Resolve and verify the leaf Grant (local)), immediately after the revocation bullet: - Lease freshness (when applicable): If enforcement relies on an upstream lease, credential, or secret from a non‑native Source of Authority (e.g., a Bridge/Adapter flow), the CEP MUST evaluate freshness per TAP using only locally available evidence (e.g., a TAP‑approved environment fact such as `leaseStatus`). If freshness cannot be established per TAP policy, the CEP MUST deny. PSP‑1 does not define the format or acquisition of such evidence. This keeps the check in the closed‑world, TAP‑governed algorithm where it belongs, avoids format creep, and uses lower‑camel consistently. 3) Remove the “buildChain” implementation note; fold the substance into §11.3You’re right: introducing a A. Delete the entire “Implementation note (Informative): chain assembly” under §11.3.B. Ensure the normative bullets in §11.3 explicitly cover the same ground:If not already explicit, augment §11.3 as follows (these are short normative refinements, not a new helper): - prev linkage MUST form a simple path from leaf to root; the CEP MUST deny if any digest repeats (cycle detection).
- All required parent Grants MUST be locally available at enforcement; otherwise the CEP MUST deny. (Pre‑enforcement acquisition is TAP‑governed.) (You already state depth caps and anchors; these two bullets make availability + cycles unmistakable without adding a pseudo‑API.) 4) What I did not change (because it’s outside PSP‑1’s scope)
Sanity check (net effect)
If you’re aligned with this, I can produce a consolidated patch file that applies these edits in one shot. I think also that the lease_status and stuff is unclear to me how it works. After all PSP-1... should it even engage in details of lease rotation, it was said it should be in the TAP policy and it seems repeated as well. Again lots of notes/structural/normative details are repeated we should do a cross-review of duplication and where things can be tightened up rather than "adding" more to the spec. |
I'm going to do one more pass over this and then change the cross review to focus on "tightening up" - reducing duplication - making the spec resilient to inconsistencies... after fixing up the above. Spec reviews are like programming, the main problem is you don't have a compiler to check your work. It's entirely based on socio-technical review. There's something I want to compare with say EIP-1. |
- changed `langVersion` to be `cpl/0` only - changed `lease_status` to `leaseStatus` - lease freshness is governed by TAP - no more mentions about the contents of PoAR and DenyReceipts
We are now in the tightening phase. This is interesting because we can use 4 different models to do independent evaluation.
Then from gpt5 pro it indicates that the ranking of reviews are: Here’s a concise ranking with unitless scores (0–10) and rationale across coverage, actionability, and risk of inconsistency.
Summary
So we can say that deepseek is the weakest, and I could see gemini2.5 is the best "conceptual" model that has the best human-level explanation of things, but it is not as analytical. I wonder how gpt5 pro would evaluate it self. |
Description
PSP-1 draft was created in #160, and we actually got an initial GPT5-pro review. This GPT5-pro review was a "failure-seeking" review.
Overall it was about making sure our spec does what it says it will do and ensure that CEPs will all behave the same and eliminating any ambiguity. No significant semantic changes was in it.
The GPT5-pro review was then passed to 3 separate AI agents each providing their synthesis on the required changes.
This was brought back into GPT5-pro in order to further clean up with iterative diffs.
Synthesis of the cross review was done.
V = 4
Below is a synthesis of the three external assessments of the failure‑seeking review you ran on PSP‑1. I start with the consensus picture, then the differences and why they diverge, and finally a Bayesian scoring of options with concrete, semantic‑first recommendations. Inline I point to each model’s write‑ups.
1) High‑level alignment (what all three agreed on)
All three reviewers independently converged on the same semantic hardening moves. In short:
Eliminate Atoms/Symbols/Variables; Builtins‑only + ground terms. This removes internal contradictions and stabilizes PCF and evaluation semantics.
Standardize time boundaries and precedence. Use a single captured
now
; make windows half‑open[start, end)
; defineexp
∧ttl_ok
as an intersection (both must pass).Make attenuation mechanically checkable (purely syntactic). Child must keep all parent checks (by PCF identity); may drop queries (narrows OR); any retained query’s literals are a superset; literal constants may only tighten per builtin rules.
Pin semantics to kill drift. Beyond
builtins_id
and (when used)channel_lattice_id
, add a schemes_snapshot_id (or equivalent) so resource comparator semantics are content‑addressed and chain‑stable.Scope boundaries across PSP‑1/‑2/‑3. Keep PSP‑1 abstract/semantic; move wire encodings and receipt fields to PSP‑3 and PSP‑2 respectively; make the doc read like an RFC (clean ToC; fewer “(normative)” markers).
Channel‑binding profiles & lattice. Name the exact profiles (e.g., TLS exporter, DPoP), cite standards, and put the partial order in the pinned lattice snapshot.
Environment normalization timing. Normalize
resource
under the same comparator semantics before evaluation; fail‑closed on any normalization failure.Implication: The above are semantic changes: they alter the meaning of evaluation, the outcome at edges, or the trust/determinism envelope—not just editorial polish.
2) Where they differ (and why)
5) Deep‑dive on the top semantic deltas
5.1 Comparator semantics pinning
Agreement: All three argue for pinning comparator semantics; divergence is whether to also accept an audit‑only path (PoAR fingerprints) when operators resist governance friction.
Why it matters: Without pinning, two CEPs may reach different decisions as comparators evolve—non‑replayable security.
Call: In PSP‑1 core, require
schemes_snapshot_id
across a delegation chain; allow PoAR fingerprints as additional evidence, not a substitute. (Posterior ≈ 0.85–0.90 in favor of mandatory pinning.)5.2 Time semantics & precedence
Agreement: Single captured
now
; half‑open windows; both envelope[iat, exp)
andttl_ok
must pass.Why it matters: Prevents edge‑case discrepancies (e.g., off‑by‑one at expiry).
Call: Adopt spec‑wide half‑open semantics; rewrite
within_time
and the CEP algorithm accordingly. (Posterior ≈ 0.92.)5.3 Syntactic attenuation
Agreement: Make attenuation mechanical, eliminating “controls authority” subjectivity.
Why it matters: Verifiers must decide with no judgment calls; delegation proofs become predictable.
Call: Replace the prose with the precise rules; update chain checks accordingly. (Posterior ≈ 0.95.)
5.4 Channel binding profiles & lattice
Agreement: Name profiles (e.g.,
tls-exporter:v1
,dpop:v1
,mtls:v1
,bearer:v1
), cite the underlying RFCs, and pin the lattice snapshot used for>=
.Why it matters: Avoids accidental weakening via ambiguous profile strings; supports deterministic evaluation of
channel_geq
.Call: Treat these as registry artifacts subject to the same pinning discipline as builtins.
5.5 Environment normalization
Agreement: The CEP must normalize
resource
under the same comparator semantics as declaration canonicalization before evaluation; failure is deny.Why it matters: Prevents “different normalization at input vs declaration” bugs.
Call: Promote this to a single normative rule in “Evaluation Environment.”
6) Concrete edit outcomes (semantic first, then structure)
Pins: Add
schemes_snapshot_id
to Grant.pins; require equality across delegation hops; update fail‑closed list to include unknown/mismatched snapshot; keep PoAR comparator fingerprints as additional audit material.Time model: Add a short section “Time Model & Boundaries”; redefine
within_time(now, nbf, exp)
asnbf ≤ now < exp
; in CEP algorithm usenow ∈ [iat, exp)
∧ttl_ok
.Attenuation: Replace the ambiguous rule with the check/query/literal tightening rules; update delegation verification to require exact PCF identity for checks retained.
Language model: Remove all mentions of Atoms/Symbols/Variables; keep Builtins‑only literals; update Appendix A grammar and PCF sorting (no “Atom” kind).
Channel binding: Add a registry subsection for profile identifiers and pin their lattice via
channel_lattice_id
; cite profiles where they’re standardized; treat unknown profile in lattice as deny.Normalization rule: In “Evaluation Environment,” require normalization of
resource
prior to any builtin evaluation; failure denies.Structure: Move all on‑wire encodings to PSP‑3 and all receipt fields/reason codes to PSP‑2; adopt the RFC‑like ToC; add a Test Vectors & Interop section.
7) Final recommendations (ordered by semantic criticality)
Adopt mandatory comparator pinning (
schemes_snapshot_id
) in PSP‑1; allow PoAR fingerprints only as supplemental audit. (Posterior: ~0.88)Lock the time model to half‑open windows and explicit
exp ∧ ttl_ok
intersection; rewrite builtin + algorithm text accordingly. (~0.92)Replace attenuation prose with the syntactic rules; require PCF identity for parent checks; allow query dropping; enforce literal‑constant tightening per builtin rules. (~0.95)
Purge Atoms/Vars entirely; Builtins‑only. (~0.93)
Define and pin the channel‑profile lattice; treat unknown profile or lattice mismatch as deny. (~0.88)
Normative normalization rule for
resource
, pre‑eval; deny on failure. (~0.90)Restructure PSP‑1 (ToC, boundaries, tests). (~0.90)
Notes on your stated concern (“semantic” over “syntactic”)
Every recommendation above in §5/§7 materially changes evaluation outcomes or replayability (determinism), i.e., they are semantic. The purely syntactic/editorial suggestions (ToC, moving sections, naming consistency) are useful but subordinated here; they won’t make or break enforcement outcomes.
If you want, I can transform these into precise diff blocks for PSP‑1 (one pass per semantic item), starting with time semantics and attenuation (the two lowest‑risk, highest‑impact edits).
Tasks
Prototype implementation will be the stage after review. We will need to figure out the right governance process for the specifications. I suspect PSP1,2,3,4 will all be in "review" to while prototype implementation is achieved.
Final checklist