Transparent AI Framework — Protocol for Open AI System

A public, versioned, transparent, replayable semantic commons — governed openly and designed to resist capture.

Version: 0.1.0
Status: Draft
Author: Technology Shield
Last Updated: 2026-03-25

1. Preamble

The large AI systems being built today are funded by billions of dollars, trained on opaque data, governed by invisible rules, and designed to concentrate power.

This protocol is a countermeasure.

It does not claim to produce unbiased truth. Instead, it creates a system where:

All inputs are inspectable
All transforms are declared
All rankings are attributable
All disputes are preserved
Alternative interpretations can coexist

You do not eliminate bias by claiming neutrality. You reduce bias by making every assumption legible, forkable, challengeable, and reversible.

That is the heart of this protocol.

2. Protocol Layers

The Transparent Semantic Commons operates across five layers:

┌──────────────────────────────────────────────────────┐
│                 5. RETRIEVAL LAYER                     │
│  Search, ranking, neighbourhood lookup,               │
│  contradiction surfacing, competing views              │
├──────────────────────────────────────────────────────┤
│                 4. GOVERNANCE LAYER                    │
│  Reputation, dispute resolution, weighting rules,     │
│  moderation, anti-poisoning, forkability              │
├──────────────────────────────────────────────────────┤
│                 3. PROOF LAYER                         │
│  Signatures, hashes, timestamps, lineage,             │
│  challenge records, verification                      │
├──────────────────────────────────────────────────────┤
│                 2. TRANSFORMATION LAYER                │
│  Chunking, normalisation, tagging, translation,       │
│  embedding, summarisation                             │
├──────────────────────────────────────────────────────┤
│                 1. SOURCE LAYER                        │
│  Raw public artefacts: text, images, audio,           │
│  datasets, claims, arguments                          │
└──────────────────────────────────────────────────────┘

Each layer has clear responsibilities, and objects at each layer are defined by the Open Embeddings Schema (separate document).

3. Layer 1 — Source

Purpose

Accept and preserve raw public artefacts with verified provenance.

Rules

Rule	Description
All artefacts must be content-hashed	Integrity verifiable at any time
All artefacts must be signed by their submitter	Attribution is non-repudiable
All artefacts must declare a licence	Consumption rights are clear
All artefacts must be stored durably	Content must survive individual node failure
No silent modification	Updates create new artefact versions linked to the original
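
The hashing and versioning rules above can be sketched as follows. This is a minimal illustration, not the protocol's reference implementation: the `Artefact` class, its field names, and the choice of SHA-256 are assumptions, and signing is left out because it needs a cryptographic library.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Artefact:
    """Hypothetical artefact record; field names are illustrative only."""
    content: bytes
    licence: str
    submitter: str                        # e.g. a DID string
    previous_hash: Optional[str] = None   # hash of the version this one updates

    @property
    def content_hash(self) -> str:
        # SHA-256 is an assumption; the rules only require a content hash.
        return hashlib.sha256(self.content).hexdigest()


def revise(original: Artefact, new_content: bytes) -> Artefact:
    """Create a new version linked to the original instead of editing in place."""
    return Artefact(
        content=new_content,
        licence=original.licence,
        submitter=original.submitter,
        previous_hash=original.content_hash,
    )


v1 = Artefact(b"The sky is blue.", "CC-BY-4.0", "did:example:alice")
v2 = revise(v1, b"The sky is blue at noon.")
```

Because the record is frozen and `revise` returns a new object, the original version stays intact and the update carries an explicit link back to it.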

Storage

Artefacts are stored off-chain in durable decentralised storage (IPFS, Arweave, Filecoin, or federated node operators). Content hashes and signatures are anchored on-chain for tamper-evidence.

4. Layer 2 — Transformation

Purpose

Record every operation applied to a source artefact, with full transparency about who did it, what software they used, what model they ran, and what parameters they chose.

Rules

Rule	Description
Every transformation must be recorded	No hidden transforms
Every transformation must declare its toolchain	Software, model, version, parameters
Every transformation must be signed	Accountability for the transform
Transformations are append-only	Re-running a transform creates a new record; existing records are never overwritten
Multiple transforms per artefact are expected	Different models, chunking strategies, and approaches coexist

Transform Types

Type	Description
`chunk`	Split an artefact into smaller units
`translate`	Translate to another language
`classify`	Assign classification labels
`summarize`	Produce a summary
`embed`	Produce a vector embedding
`redact`	Remove sensitive content (with declaration)
`normalize`	Standardise format or encoding
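
One way to picture a transformation record and an append-only log is sketched below. The `TransformRecord` and `TransformLog` names, the field layout, and the example toolchain values are hypothetical; the actual object shapes are defined by the Open Embeddings Schema, and signatures are represented here only by a signer field.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

TRANSFORM_TYPES = {"chunk", "translate", "classify", "summarize",
                   "embed", "redact", "normalize"}


@dataclass(frozen=True)
class TransformRecord:
    transform_type: str
    input_id: str        # content hash of the artefact being transformed
    software: str
    model: str
    version: str
    parameters: tuple    # (key, value) pairs, kept immutable
    signer: str          # stand-in for a real signature

    def record_id(self) -> str:
        # The ID is derived from the record's own content (SHA-256 assumed).
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()


class TransformLog:
    """Append-only: records can be added and read, never replaced or removed."""

    def __init__(self) -> None:
        self._records: list[TransformRecord] = []

    def append(self, record: TransformRecord) -> str:
        if record.transform_type not in TRANSFORM_TYPES:
            raise ValueError(f"unknown transform type: {record.transform_type}")
        self._records.append(record)
        return record.record_id()

    def for_artefact(self, input_id: str) -> list[TransformRecord]:
        return [r for r in self._records if r.input_id == input_id]


log = TransformLog()
first = TransformRecord("embed", "abc123", "example-embedder", "bge-m3",
                        "1.0", (("max_tokens", 512),), "did:example:alice")
second = TransformRecord("embed", "abc123", "example-embedder", "e5-large-v2",
                         "1.0", (("max_tokens", 512),), "did:example:bob")
first_id = log.append(first)
second_id = log.append(second)
```

Both embeddings of the same artefact coexist in the log under different record IDs, which matches the expectation that several models and strategies are applied side by side.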

5. Layer 3 — Proof

Purpose

Provide cryptographic evidence that the provenance chain is intact, that objects have not been tampered with, and that challenges and disputes are preserved.

Mechanisms

Mechanism	Purpose
Content-addressed IDs	Object identity is derived from content; any change produces a different ID
Ed25519 signatures	Every object is signed by its creator
DID-based identity	Signers are identified by decentralised identifiers
Timestamp anchoring	Object timestamps are anchored on a public ledger
Provenance chain verification	Any embedding can be traced back through its transform to the source artefact
Challenge preservation	Challenges are immutable; they cannot be silently dismissed
Governance decision publication	Every governance action is a signed, public record

Verification Protocol

For any object retrieved from the commons:

Hash verification — Does the object's content match its declared hash?
Signature verification — Is the signature valid for the declared signer?
Identity resolution — Does the signer's DID resolve to a valid public key?
Lineage verification — Does the provenance chain back to the source artefact verify?
Challenge check — Are there open challenges against this object or any object in its chain?
Governance check — Are there governance decisions (tombstones, supersessions) affecting this object?

Result: verified | unverifiable | disputed | tombstoned | superseded
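
The six checks can be combined into a single function, sketched below under several assumptions. Signature validation and DID resolution are reduced to boolean stand-ins, SHA-256 is assumed as the hash, and the order in which failed checks map onto the five result values is one plausible reading rather than something this section specifies.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class CommonsObject:
    content: bytes
    declared_hash: str
    signature_valid: bool                     # stand-in for Ed25519 verification
    signer_did_resolves: bool                 # stand-in for DID resolution
    parent: Optional["CommonsObject"] = None  # next link towards the source
    open_challenges: int = 0
    governance_state: Optional[str] = None    # "tombstoned", "superseded" or None


def chain(obj: Optional[CommonsObject]) -> Iterator[CommonsObject]:
    """Walk from the object back to its source artefact."""
    while obj is not None:
        yield obj
        obj = obj.parent


def verify(obj: CommonsObject) -> str:
    links = list(chain(obj))
    for link in links:
        # Hash, signature and identity checks apply to every link (lineage).
        if hashlib.sha256(link.content).hexdigest() != link.declared_hash:
            return "unverifiable"
        if not (link.signature_valid and link.signer_did_resolves):
            return "unverifiable"
    # Governance decisions on the object itself take precedence (assumption).
    if obj.governance_state in ("tombstoned", "superseded"):
        return obj.governance_state
    # An open challenge anywhere in the chain marks the object as disputed.
    if any(link.open_challenges for link in links):
        return "disputed"
    return "verified"


def make(content: bytes, **extra) -> CommonsObject:
    return CommonsObject(content, hashlib.sha256(content).hexdigest(),
                         True, True, **extra)


source = make(b"raw artefact")
embedding = make(b"[0.1, 0.2]", parent=source)
status = verify(embedding)
```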

6. Layer 4 — Governance

Purpose

Govern the commons in a way that resists capture by any single party, organisation, or alliance.

6.1 Core Governance Rules

These rules are constitutional — they define the minimum governance that all participants accept:

#	Rule
1	No silent edits. Every change creates a new versioned record.
2	No hard deletes. Only tombstones and supersession.
3	Every transformation signed. No anonymous transforms.
4	Every moderation decision public. No shadow bans, no invisible removal.
5	Every ranking policy versioned. No invisible algorithm changes.
6	Every challenge preserved. Challenges cannot be deleted, only resolved.
7	Every fork portable. Any community can fork the full dataset, embeddings, policies, and reputation.
8	Consumers should contribute back. Pure extraction without reciprocity is discouraged through governance mechanisms.

6.2 Roles

Role	Responsibility	Power
Contributors	Submit artefacts, embeddings, claims, evidence links	Create objects
Validators	Verify provenance chains and object integrity	Flag invalid objects
Domain Stewards	Curate quality within a specific domain	Recommend, prioritise, tag
Challenge Reviewers	Evaluate and resolve challenges	Issue governance decisions
Protocol Maintainers	Evolve the protocol and schema	Propose protocol changes (subject to community approval)
Index Operators	Run retrieval indexes	Serve queries (cannot alter underlying data)

Anti-capture constraint: No single role may control all of:

Data admission
Ranking
Dispute resolution

These three powers must be held by different parties.
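
A deployment could check this constraint mechanically. The sketch below takes the strict reading, in which each of the three powers has a different holder; the power names and party identifiers are hypothetical.

```python
POWERS = ("data_admission", "ranking", "dispute_resolution")


def capture_violations(assignments: dict[str, str]) -> list[str]:
    """Return the parties that hold more than one of the three powers."""
    held: dict[str, list[str]] = {}
    for power in POWERS:
        held.setdefault(assignments[power], []).append(power)
    return sorted(party for party, powers in held.items() if len(powers) > 1)


separated = {
    "data_admission": "validators-coop",
    "ranking": "index-operator-1",
    "dispute_resolution": "review-panel",
}
concentrated = dict(separated, dispute_resolution="index-operator-1")

ok = capture_violations(separated)
flagged = capture_violations(concentrated)
```

An empty list means the separation holds; any party named in the result holds at least two of the powers and would need to give one up.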

6.3 Challenge Resolution Process

Challenge opened
    │
    ▼
Evidence gathering period (configurable, default 14 days)
    │
    ▼
Review by Challenge Reviewers (minimum 3 independent reviewers)
    │
    ├── Upheld → Governance action (tombstone, supersede, flag)
    ├── Dismissed → Challenge marked as dismissed with rationale
    └── Escalated → Broader community vote
         │
         ▼
    Appeal window (configurable, default 30 days)
         │
         ├── No appeal → Decision final
         └── Appeal → Re-review with expanded panel
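
The process above can be modelled as a small state machine. This sketch makes two assumptions the diagram leaves open: every review outcome, not only an escalated one, passes through the appeal window, and an appeal sends the challenge back to review. The state and class names are illustrative.

```python
from enum import Enum


class State(Enum):
    OPENED = "opened"
    EVIDENCE = "evidence_gathering"
    REVIEW = "review"
    UPHELD = "upheld"
    DISMISSED = "dismissed"
    ESCALATED = "escalated"
    APPEAL_WINDOW = "appeal_window"
    FINAL = "final"


MIN_REVIEWERS = 3  # minimum independent reviewers, as stated in the process

TRANSITIONS = {
    State.OPENED: {State.EVIDENCE},
    State.EVIDENCE: {State.REVIEW},
    State.REVIEW: {State.UPHELD, State.DISMISSED, State.ESCALATED},
    State.UPHELD: {State.APPEAL_WINDOW},
    State.DISMISSED: {State.APPEAL_WINDOW},
    State.ESCALATED: {State.APPEAL_WINDOW},   # after the community vote
    State.APPEAL_WINDOW: {State.FINAL, State.REVIEW},
    State.FINAL: set(),
}


class Challenge:
    def __init__(self) -> None:
        self.state = State.OPENED
        self.history = [State.OPENED]   # never truncated: challenges are preserved

    def advance(self, target: State, reviewers: tuple = ()) -> None:
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state.value} -> {target.value} is not allowed")
        leaving_review = self.state is State.REVIEW
        if leaving_review and len(set(reviewers)) < MIN_REVIEWERS:
            raise ValueError("a review outcome needs at least 3 distinct reviewers")
        self.state = target
        self.history.append(target)


challenge = Challenge()
challenge.advance(State.EVIDENCE)
challenge.advance(State.REVIEW)
challenge.advance(State.DISMISSED, reviewers=("r1", "r2", "r3"))
challenge.advance(State.APPEAL_WINDOW)
challenge.advance(State.FINAL)
```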

6.4 Forkability

This is the ultimate anti-capture safety valve.

If a community believes governance has been captured, they must be able to fork:

What Can Be Forked	How
The dataset (artefacts)	Content-addressed; any node can replicate
The embeddings	Content-addressed; stored off-chain
The governance policies	Versioned, public documents
The reputation graph	Exported as signed objects
The retrieval policies	Public, versioned objects
The challenge history	Immutable, content-addressed

The fork inherits the full history. The new community can then modify governance going forward while preserving the shared past.
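
Content addressing is what makes this kind of fork cheap to verify. The sketch below uses an in-memory dictionary as a stand-in for a real content-addressed store and assumes SHA-256; the `put` and `fork` helpers are hypothetical.

```python
import hashlib


def put(store: dict[str, bytes], content: bytes) -> str:
    """Store content under its own hash and return that hash."""
    key = hashlib.sha256(content).hexdigest()
    store[key] = content
    return key


def fork(store: dict[str, bytes]) -> dict[str, bytes]:
    """Copy every object, re-checking each hash so corruption is not inherited."""
    forked: dict[str, bytes] = {}
    for key, content in store.items():
        if hashlib.sha256(content).hexdigest() != key:
            raise ValueError(f"object {key} failed its integrity check")
        forked[key] = content
    return forked


origin: dict[str, bytes] = {}
artefact_id = put(origin, b"shared artefact")
policy_id = put(origin, b"governance policy v1")

community_fork = fork(origin)
new_policy_id = put(community_fork, b"governance policy v2")
```

The fork holds the full shared history under the same identifiers, while the new policy exists only in the fork and leaves the original store untouched.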

7. Layer 5 — Retrieval

Purpose

Provide transparent, policy-driven retrieval that makes bias visible rather than invisible.

7.1 Retrieval Policy as a Public Object

Retrieval policies are themselves signed, versioned objects:

{
  "type": "RetrievalPolicy",
  "policy_id": "cid:bafy...",
  "candidate_sources": ["public", "science", "english"],
  "similarity_metric": "cosine",
  "recency_weight": 0.1,
  "reputation_weight": 0.2,
  "challenge_penalty": 0.5,
  "diversity_requirement": {
    "minority_view_floor": 0.15,
    "contradiction_surfacing": true,
    "max_single_source_share": 0.3
  },
  "jurisdictional_filters": [],
  "version": "1.0.0",
  "published_by": "did:key:z6Mk...",
  "signature": "ed25519:..."
}

Different communities can publish different retrieval policies while sharing the same underlying commons. This is how pluralism works in practice.
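
To illustrate how two policies can rank the same candidates differently, the sketch below applies a hypothetical scoring formula. The policy fields come from the example above, but the way they are combined (an additive score, scaled down by the challenge penalty) is an assumption, as are the candidate field names.

```python
def score(candidate: dict, policy: dict) -> float:
    """Hypothetical scoring: weighted sum, reduced when the object is challenged."""
    base = (candidate["similarity"]
            + policy["recency_weight"] * candidate["recency"]
            + policy["reputation_weight"] * candidate["reputation"])
    if candidate["challenged"]:
        base *= 1.0 - policy["challenge_penalty"]
    return base


def rank(candidates: list[dict], policy: dict) -> list[str]:
    ordered = sorted(candidates, key=lambda c: score(c, policy), reverse=True)
    return [c["id"] for c in ordered]


cautious = {"recency_weight": 0.1, "reputation_weight": 0.2, "challenge_penalty": 0.5}
permissive = {"recency_weight": 0.1, "reputation_weight": 0.2, "challenge_penalty": 0.0}

candidates = [
    {"id": "x", "similarity": 0.9, "recency": 0.0, "reputation": 0.0, "challenged": True},
    {"id": "y", "similarity": 0.8, "recency": 0.0, "reputation": 0.0, "challenged": False},
]

cautious_order = rank(candidates, cautious)
permissive_order = rank(candidates, permissive)
```

The underlying objects are identical in both cases; only the published policy differs, so anyone can see why the challenged object drops under one community's rules and not the other's.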

7.2 Bias-Resistant Retrieval Pattern

For any query, the retrieval response includes:

{
  "query": "...",
  "policy_used": "cid:bafy...",
  "model_spaces_used": ["bge-m3", "e5-large-v2"],
  "results": {
    "supporting": [
      {
        "embedding_id": "cid:bafy...",
        "similarity": 0.94,
        "provenance_status": "verified",
        "challenge_status": "none"
      }
    ],
    "contradicting": [
      {
        "embedding_id": "cid:bafy...",
        "similarity": 0.87,
        "provenance_status": "verified",
        "challenge_status": "none"
      }
    ],
    "disputed": [
      {
        "embedding_id": "cid:bafy...",
        "challenge_id": "cid:bafy...",
        "challenge_status": "open"
      }
    ]
  },
  "policy_explanation": "Results include top 10 supporting and top 5 contradicting. Minority view floor of 15% applied. Disputed objects surfaced separately."
}

Instead of returning one answer, the system returns:

Closest matches (supporting evidence)
Strongest counter-matches (contradicting evidence)
Unresolved disputes (challenged objects)
Policy explanation (which rules shaped the response)
Provenance status (verification state of each result)

Unlike a single invisible ranking, this lets a reader see what was found, what contradicts it, what is disputed, and which rules shaped the result.
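
A response in this shape could be assembled roughly as follows. The sketch assumes each candidate already carries a `stance` label ("supporting" or "contradicting"), which is a hypothetical field, and it uses the top 10 and top 5 limits mentioned in the example explanation as defaults. It does not implement the minority view floor.

```python
def build_response(query: str, policy_id: str, candidates: list[dict],
                   top_supporting: int = 10, top_contradicting: int = 5) -> dict:
    by_similarity = sorted(candidates, key=lambda c: c["similarity"], reverse=True)
    # Objects with an open challenge are surfaced separately, never hidden.
    disputed = [c for c in by_similarity if c["challenge_status"] == "open"]
    settled = [c for c in by_similarity if c["challenge_status"] != "open"]
    supporting = [c for c in settled if c["stance"] == "supporting"]
    contradicting = [c for c in settled if c["stance"] == "contradicting"]
    return {
        "query": query,
        "policy_used": policy_id,
        "results": {
            "supporting": supporting[:top_supporting],
            "contradicting": contradicting[:top_contradicting],
            "disputed": disputed,
        },
    }


candidates = [
    {"embedding_id": "a", "similarity": 0.94, "stance": "supporting", "challenge_status": "none"},
    {"embedding_id": "b", "similarity": 0.87, "stance": "contradicting", "challenge_status": "none"},
    {"embedding_id": "c", "similarity": 0.91, "stance": "supporting", "challenge_status": "open"},
    {"embedding_id": "d", "similarity": 0.97, "stance": "supporting", "challenge_status": "none"},
]

response = build_response("example query", "cid:example-policy", candidates)
```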

7.3 Index Operator Rules

Anyone can run an index. Index operators:

Must declare which retrieval policies they support
Must not alter underlying object data
Must serve provenance chains on request
Must surface challenges and disputes
Must declare their funding and governance
Can be forked (anyone can run a competing index)

8. Anti-Capture Mechanisms

8.1 Structural

Mechanism	How It Resists Capture
Content addressing	Data cannot be silently altered
Multiple embedding spaces	No single model priesthood
Multiple index operators	No single retrieval monopoly
Public governance decisions	No shadow moderation
Forkability	Exit option prevents lock-in
Role separation	No single party controls admission + ranking + disputes

8.2 Economic

Mechanism	How It Resists Capture
Contribution receipts	Consumers are expected to give back; pure extraction is discouraged
Influence caps	Contribution does not buy unlimited influence
Transparent funding	Index operators and stewards declare funding sources
No pay-for-ranking	Objects cannot be promoted through payment

8.3 Social

Mechanism	How It Resists Capture
Public challenge process	Anyone can challenge any object
Appeal process	Governance decisions can be appealed
Minority view floors	Retrieval policies must surface dissenting views
Contradiction surfacing	Counter-evidence is returned alongside supporting evidence
Community forking	Disagree? Fork. History is preserved.

9. Bootstrapping Plan

Phase 1 — Foundation

Deliverable	Description
Manifesto	Public statement of principles and intent
Protocol specification	This document, formalised and versioned in a Git repository
Schema repository	The Open Embeddings Schema as a versioned spec
Reference node	A single operational node demonstrating the protocol
Sample corpus	A small, curated dataset with full provenance
Challenge flow demo	A working example of the challenge and resolution process
Community home	A Matrix space for working groups and discussion

Phase 2 — Community

Deliverable	Description
Signed contributors	First cohort of contributors with verified DIDs
Multiple embedding providers	Embeddings from at least 3 different models
Public retrieval explorer	A web UI for browsing artefacts, embeddings, claims, and challenges
Contradiction surfacing demo	Retrieval that shows supporting and contradicting evidence
Governance handbook	Published rules for challenge review and governance decisions

Phase 3 — Federation

Deliverable	Description
Multi-node federation	Multiple independent nodes sharing the commons
Reputation/stake system	Contribution-weighted influence
Alternate index operators	At least 2 independent retrieval services
Client applications	Tools for researchers, journalists, educators
Domain communities	Specialised communities (science, law, history, etc.)

Phase 4 — Maturity

Deliverable	Description
Cross-language support	Artefacts and embeddings in multiple languages
Institutional adoption	Libraries, universities, NGOs contributing
Governance evolution	Community-driven protocol changes
Ecosystem growth	Third-party tools and integrations

10. Community and Communication

Recommended Stack

Layer	Platform	Purpose
Core collaboration	Matrix	Working groups, design discussions, governance rooms, moderated community channels
Public publishing	Fediverse / ActivityPub	Public announcements, essays, manifestos, recruiting aligned people, federated discussion
Discovery	AT Protocol / Bluesky	Short-form thought leadership, finding protocol and open-web people, sharing milestones
Specification	Git repository	Versioned protocol and schema specifications
In-person	FOSDEM-style events	Finding serious contributors, presenting the protocol, meeting adjacent communities

Why This Stack

Matrix is an open network for decentralised communication, governed by a foundation committed to open standards. It fits a serious protocol-building community.
ActivityPub is a W3C Recommendation for decentralised social networking. It provides reach without platform dependence.
AT Protocol is an open framework for public conversation with account portability. It is good for discovery and attracting technical contributors.
Git provides permanence, versioning, and collaboration for specifications.
FOSDEM and similar events connect the open-source and decentralised-web communities in person.

11. Charter

Transparent Semantic Commons Charter v0.1

No hidden transforms.
No silent edits.
No opaque ranking.
No single embedding monopoly.
Every claim must be linkable to evidence.
Every object must be challengeable.
Every governance action must be public.
Every community must be able to fork.
Consumers should contribute back.
The commons exists for collective flourishing, not extraction.

12. The Hard Truths

The enemy is not just secrecy. It is also:

Convenience — closed systems are easier to use
Apathy — most people will not participate in governance
Scale asymmetry — billion-dollar systems have more resources
Moderation burden — open systems attract abuse
Slow institutional capture — governance can be co-opted gradually

This protocol wins only if it is:

Simpler to inspect than closed systems
Easier to fork than captured systems
Useful in practice, not merely morally appealing

A working prototype matters more than a perfect philosophy.

13. Naming

Candidate names for the broader initiative:

Name	Rationale
Transparent Semantic Commons	Says exactly what it is: transparent, semantic, and common
Open Semantic Commons	Emphasises openness
Civic Knowledge Ledger	Emphasises the civic and knowledge dimensions
Forkable Truth Protocol	Emphasises the anti-capture mechanism

The recommended name is Transparent Semantic Commons — it is precise, unglamorous, and resistant to marketing capture.

14. Relationship to Technology Shield

Technology Shield contributes this protocol and framework as part of its commitment to cybersecurity that serves people, not just organisations.

The Transparent AI Framework connects to Technology Shield's other collateral:

Collateral	Relationship
Pattern Blueprint	The protocol layers are themselves architectural patterns
Security Reference Architecture	The commons infrastructure follows zoning principles
Secure Development Framework	Protocol node software is built with the SDF pipeline
Secure Collaborative Development	The open-source commons benefits from SCDA patterns for community contribution
Shield Business	Future integration for organisations tracking AI supply chain risk

15. What to Build First

One repo. One schema. One Matrix room. One public manifesto. One explorer UI. One sample corpus. One challenge flow.

Start there. Everything else follows.
