Transparent AI Framework — Protocol for Open AI System

A public, versioned, transparent, replayable semantic commons — governed openly and designed to resist capture.

Version: 0.1.0
Status: Draft
Author: Technology Shield
Last Updated: 2026-03-25


1. Preamble

The large AI systems being built today are funded by billions of dollars, trained on opaque data, governed by invisible rules, and designed to concentrate power.

This protocol is a countermeasure.

It does not claim to produce unbiased truth. Instead, it creates a system where:

  • All inputs are inspectable
  • All transforms are declared
  • All rankings are attributable
  • All disputes are preserved
  • Alternative interpretations can coexist

You do not eliminate bias by claiming neutrality. You reduce bias by making every assumption legible, forkable, challengeable, and reversible.

That is the heart of this protocol.


2. Protocol Layers

The Transparent Semantic Commons operates across five layers:

┌──────────────────────────────────────────────────────┐
│                 5. RETRIEVAL LAYER                     │
│  Search, ranking, neighbourhood lookup,               │
│  contradiction surfacing, competing views              │
├──────────────────────────────────────────────────────┤
│                 4. GOVERNANCE LAYER                    │
│  Reputation, dispute resolution, weighting rules,     │
│  moderation, anti-poisoning, forkability              │
├──────────────────────────────────────────────────────┤
│                 3. PROOF LAYER                         │
│  Signatures, hashes, timestamps, lineage,             │
│  challenge records, verification                      │
├──────────────────────────────────────────────────────┤
│                 2. TRANSFORMATION LAYER                │
│  Chunking, normalisation, tagging, translation,       │
│  embedding, summarisation                             │
├──────────────────────────────────────────────────────┤
│                 1. SOURCE LAYER                        │
│  Raw public artefacts: text, images, audio,           │
│  datasets, claims, arguments                          │
└──────────────────────────────────────────────────────┘

Each layer has clear responsibilities, and objects at each layer are defined by the Open Embeddings Schema (separate document).


3. Layer 1 — Source

Purpose

Accept and preserve raw public artefacts with verified provenance.

Rules

Rule | Description
All artefacts must be content-hashed | Integrity verifiable at any time
All artefacts must be signed by their submitter | Attribution is non-repudiable
All artefacts must declare a licence | Consumption rights are clear
All artefacts must be stored durably | Content must survive individual node failure
No silent modification | Updates create new artefact versions linked to the original

Storage

Artefacts are stored off-chain in durable decentralised storage (IPFS, Arweave, Filecoin, or federated node operators). Content hashes and signatures are anchored on-chain for tamper-evidence.
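The content-hashing and no-silent-modification rules can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function and a hypothetical `ArtefactRecord` shape; neither is fixed by this document, and signing is left out because it would need a cryptography library. The `did:example:` identifier is a placeholder.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArtefactRecord:
    """Hypothetical record shape, for illustration only."""
    content_hash: str          # hex SHA-256 of the raw artefact bytes
    licence: str               # declared licence identifier
    submitter: str             # placeholder for the submitter's DID
    previous: Optional[str]    # record_id of the version this one updates

    @property
    def record_id(self) -> str:
        # ID derived from the canonical JSON form, so any change yields a new ID.
        body = json.dumps(
            {
                "content_hash": self.content_hash,
                "licence": self.licence,
                "submitter": self.submitter,
                "previous": self.previous,
            },
            sort_keys=True,
            separators=(",", ":"),
        )
        return hashlib.sha256(body.encode("utf-8")).hexdigest()


def make_record(data: bytes, licence: str, submitter: str,
                previous: Optional[ArtefactRecord] = None) -> ArtefactRecord:
    """Build a record for raw bytes; an update links back to the prior version."""
    return ArtefactRecord(
        content_hash=hashlib.sha256(data).hexdigest(),
        licence=licence,
        submitter=submitter,
        previous=previous.record_id if previous is not None else None,
    )


def content_matches(data: bytes, record: ArtefactRecord) -> bool:
    """Integrity check: recompute the hash and compare with the declared one."""
    return hashlib.sha256(data).hexdigest() == record.content_hash


v1 = make_record(b"original text", "CC-BY-4.0", "did:example:alice")
v2 = make_record(b"corrected text", "CC-BY-4.0", "did:example:alice", previous=v1)
```

Here `v2` does not overwrite `v1`: it is a new record whose `previous` field points at the original, which is one way to read "updates create new artefact versions linked to the original".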


4. Layer 2 — Transformation

Purpose

Record every operation applied to a source artefact, with full transparency about who did it, what software they used, what model they ran, and what parameters they chose.

Rules

Rule | Description
Every transformation must be recorded | No hidden transforms
Every transformation must declare its toolchain | Software, model, version, parameters
Every transformation must be signed | Accountability for the transform
Transformations are append-only | You cannot silently re-run a transform
Multiple transforms per artefact are expected | Different models, chunking strategies, and approaches coexist

Transform Types

Type | Description
chunk | Split an artefact into smaller units
translate | Translate to another language
classify | Assign classification labels
summarize | Produce a summary
embed | Produce a vector embedding
redact | Remove sensitive content (with declaration)
normalize | Standardise format or encoding
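As a rough illustration of the rules and transform types above, the sketch below models an append-only transform log. The `TransformRecord` fields and the `TransformLog` class are hypothetical names chosen for this example; the authoritative object shapes belong to the Open Embeddings Schema. Signatures are omitted, and the software and DID values are placeholders.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List

# Transform types listed in this section.
TRANSFORM_TYPES = {
    "chunk", "translate", "classify", "summarize",
    "embed", "redact", "normalize",
}


@dataclass(frozen=True)
class TransformRecord:
    transform_type: str
    input_id: str      # ID of the object the transform was applied to
    software: str      # declared toolchain: software name
    model: str         # declared toolchain: model name
    version: str       # declared toolchain: version
    operator: str      # placeholder for the signer's DID
    parameters: Dict[str, str] = field(default_factory=dict)

    def record_id(self) -> str:
        # Content-derived ID over the canonical JSON form of the record.
        body = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(body.encode("utf-8")).hexdigest()


class TransformLog:
    """Append-only log: records can be added and read, never replaced."""

    def __init__(self) -> None:
        self._records: List[TransformRecord] = []

    def append(self, record: TransformRecord) -> str:
        if record.transform_type not in TRANSFORM_TYPES:
            raise ValueError(f"unknown transform type: {record.transform_type}")
        self._records.append(record)
        return record.record_id()

    def for_input(self, input_id: str) -> List[TransformRecord]:
        return [r for r in self._records if r.input_id == input_id]


log = TransformLog()
log.append(TransformRecord("embed", "artefact-1", "example-embedder", "bge-m3",
                           "1.0", "did:example:alice", {"max_tokens": "512"}))
log.append(TransformRecord("embed", "artefact-1", "example-embedder", "e5-large-v2",
                           "1.0", "did:example:bob"))
```

Both `embed` records for `artefact-1` remain in the log side by side, which reflects the expectation that multiple transforms per artefact coexist rather than replace one another.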

5. Layer 3 — Proof

Purpose

Provide cryptographic evidence that the provenance chain is intact, that objects have not been tampered with, and that challenges and disputes are preserved.

Mechanisms

Mechanism | Purpose
Content-addressed IDs | Object identity is derived from content; any change produces a different ID
Ed25519 signatures | Every object is signed by its creator
DID-based identity | Signers are identified by decentralised identifiers
Timestamp anchoring | Object timestamps are anchored on a public ledger
Provenance chain verification | Any embedding can be traced back through its transform to the source artefact
Challenge preservation | Challenges are immutable; they cannot be silently dismissed
Governance decision publication | Every governance action is a signed, public record

Verification Protocol

For any object retrieved from the commons:

  1. Hash verification — Does the object's content match its declared hash?
  2. Signature verification — Is the signature valid for the declared signer?
  3. Identity resolution — Does the signer's DID resolve to a valid public key?
  4. Lineage verification — Does the provenance chain back to the source artefact verify?
  5. Challenge check — Are there open challenges against this object or any object in its chain?
  6. Governance check — Are there governance decisions (tombstones, supersessions) affecting this object?

Result: verified | unverifiable | disputed | tombstoned | superseded
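The six checks and the five result values can be combined in more than one way, and this document does not state a precedence between them. The sketch below shows one possible ordering as an assumption: integrity failures yield `unverifiable`, then governance decisions are reported, then open challenges. The `Checks` structure and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Checks:
    """Outcomes of the six verification steps (hypothetical field names)."""
    hash_ok: bool          # step 1: content matches declared hash
    signature_ok: bool     # step 2: signature valid for declared signer
    did_resolves: bool     # step 3: signer's DID resolves to a public key
    lineage_ok: bool       # step 4: provenance chain verifies
    open_challenges: int   # step 5: open challenges on the object or its chain
    tombstoned: bool       # step 6: a tombstone decision applies
    superseded: bool       # step 6: a supersession decision applies


def verification_result(c: Checks) -> str:
    # Assumed precedence: integrity first, then governance, then disputes.
    if not (c.hash_ok and c.signature_ok and c.did_resolves and c.lineage_ok):
        return "unverifiable"
    if c.tombstoned:
        return "tombstoned"
    if c.superseded:
        return "superseded"
    if c.open_challenges > 0:
        return "disputed"
    return "verified"


clean = Checks(True, True, True, True, 0, False, False)
challenged = Checks(True, True, True, True, 2, False, False)
broken_chain = Checks(True, True, True, False, 0, False, False)
```

A real implementation might instead return several statuses at once (for example, both `disputed` and `superseded`); collapsing to a single value is a simplification made here.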


6. Layer 4 — Governance

Purpose

Govern the commons in a way that resists capture by any single party, organisation, or alliance.

6.1 Core Governance Rules

These rules are constitutional — they define the minimum governance that all participants accept:

# | Rule
1 | No silent edits. Every change creates a new versioned record.
2 | No hard deletes. Only tombstones and supersession.
3 | Every transformation signed. No anonymous transforms.
4 | Every moderation decision public. No shadow bans, no invisible removal.
5 | Every ranking policy versioned. No invisible algorithm changes.
6 | Every challenge preserved. Challenges cannot be deleted, only resolved.
7 | Every fork portable. Any community can fork the full dataset, embeddings, policies, and reputation.
8 | Consumers should contribute back. Pure extraction without reciprocity is discouraged through governance mechanisms.

6.2 Roles

Role | Responsibility | Power
Contributors | Submit artefacts, embeddings, claims, evidence links | Create objects
Validators | Verify provenance chains and object integrity | Flag invalid objects
Domain Stewards | Curate quality within a specific domain | Recommend, prioritise, tag
Challenge Reviewers | Evaluate and resolve challenges | Issue governance decisions
Protocol Maintainers | Evolve the protocol and schema | Propose protocol changes (subject to community approval)
Index Operators | Run retrieval indexes | Serve queries (cannot alter underlying data)

Anti-capture constraint: No single role may control all of:

  • Data admission
  • Ranking
  • Dispute resolution

These three powers must be held by different parties.
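A check for this constraint might look like the sketch below. It takes the stricter reading, that no party should hold more than one of the three powers; the power names and the mapping format are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Set

# The three powers named in the anti-capture constraint (identifiers assumed).
POWERS = ("data_admission", "ranking", "dispute_resolution")


def parties_holding_multiple_powers(holders: Dict[str, Set[str]]) -> Set[str]:
    """Return every party that appears under more than one power."""
    counts: Counter = Counter()
    for power in POWERS:
        for party in holders.get(power, set()):
            counts[party] += 1
    return {party for party, n in counts.items() if n > 1}


separated = {
    "data_admission": {"org-a"},
    "ranking": {"org-b"},
    "dispute_resolution": {"org-c"},
}
captured = {
    "data_admission": {"org-a"},
    "ranking": {"org-a", "org-b"},
    "dispute_resolution": {"org-c"},
}
```

With these inputs, `separated` produces an empty set and `captured` flags `org-a`. Detecting parties that are nominally distinct but commonly controlled is a governance problem this check does not address.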

6.3 Challenge Resolution Process

Challenge opened
    │
    ▼
Evidence gathering period (configurable, default 14 days)
    │
    ▼
Review by Challenge Reviewers (minimum 3 independent reviewers)
    │
    ├── Upheld → Governance action (tombstone, supersede, flag)
    ├── Dismissed → Challenge marked as dismissed with rationale
    └── Escalated → Broader community vote
         │
         ▼
    Appeal window (configurable, default 30 days)
         │
         ├── No appeal → Decision final
         └── Appeal → Re-review with expanded panel
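The timing and quorum defaults in the flow above can be sketched as simple date arithmetic. The function names are hypothetical, distinct reviewer identifiers are used as a stand-in for "independent", and treating the deadline day itself as still inside the appeal window is an assumption.

```python
from datetime import date, timedelta
from typing import Iterable

EVIDENCE_DAYS = 14   # default evidence gathering period
APPEAL_DAYS = 30     # default appeal window
MIN_REVIEWERS = 3    # minimum number of independent reviewers


def review_may_start(opened_on: date, today: date, reviewers: Iterable[str],
                     evidence_days: int = EVIDENCE_DAYS,
                     min_reviewers: int = MIN_REVIEWERS) -> bool:
    """True once the evidence period has elapsed and enough distinct reviewers exist."""
    evidence_closed = today >= opened_on + timedelta(days=evidence_days)
    return evidence_closed and len(set(reviewers)) >= min_reviewers


def appeal_deadline(decided_on: date, appeal_days: int = APPEAL_DAYS) -> date:
    return decided_on + timedelta(days=appeal_days)


def decision_is_final(decided_on: date, today: date, appealed: bool,
                      appeal_days: int = APPEAL_DAYS) -> bool:
    """A decision becomes final when the window passes with no appeal filed."""
    return (not appealed) and today > appeal_deadline(decided_on, appeal_days)


opened = date(2026, 1, 1)
decided = date(2026, 2, 1)
```

Both periods are configurable in the protocol, so they are parameters here rather than constants baked into the logic.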

6.4 Forkability

This is the ultimate anti-capture safety valve.

If a community believes governance has been captured, they must be able to fork:

What Can Be Forked | How
The dataset (artefacts) | Content-addressed; any node can replicate
The embeddings | Content-addressed; stored off-chain
The governance policies | Versioned, public documents
The reputation graph | Exported as signed objects
The retrieval policies | Public, versioned objects
The challenge history | Immutable, content-addressed

The fork inherits the full history. The new community can then modify governance going forward while preserving the shared past.


7. Layer 5 — Retrieval

Purpose

Provide transparent, policy-driven retrieval that makes bias visible rather than invisible.

7.1 Retrieval Policy as a Public Object

Retrieval policies are themselves signed, versioned objects:

{
  "type": "RetrievalPolicy",
  "policy_id": "cid:bafy...",
  "candidate_sources": ["public", "science", "english"],
  "similarity_metric": "cosine",
  "recency_weight": 0.1,
  "reputation_weight": 0.2,
  "challenge_penalty": 0.5,
  "diversity_requirement": {
    "minority_view_floor": 0.15,
    "contradiction_surfacing": true,
    "max_single_source_share": 0.3
  },
  "jurisdictional_filters": [],
  "version": "1.0.0",
  "published_by": "did:key:z6Mk...",
  "signature": "ed25519:..."
}

Different communities can publish different retrieval policies while sharing the same underlying commons. This is how pluralism works in practice.
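How an index applies a policy field is not specified here. As one possible interpretation of `max_single_source_share`, the sketch below caps the number of slots any one source may take in a result list of size `k`. The function name, the `(object_id, source_id)` input format, and the rounding rule are all assumptions.

```python
import math
from collections import Counter
from typing import List, Tuple


def cap_source_share(ranked: List[Tuple[str, str]], k: int,
                     max_share: float) -> List[Tuple[str, str]]:
    """Select up to k items, best first, limiting each source's slot count.

    ranked holds (object_id, source_id) pairs, already sorted best first.
    Each source may take at most floor(max_share * k) slots, minimum 1.
    """
    # Small epsilon guards against floating-point results just under an integer.
    per_source_limit = max(1, math.floor(max_share * k + 1e-9))
    counts: Counter = Counter()
    selected: List[Tuple[str, str]] = []
    for object_id, source_id in ranked:
        if len(selected) == k:
            break
        if counts[source_id] < per_source_limit:
            selected.append((object_id, source_id))
            counts[source_id] += 1
    return selected


ranked = (
    [(f"a{i}", "A") for i in range(1, 6)]
    + [(f"b{i}", "B") for i in range(1, 4)]
    + [(f"c{i}", "C") for i in range(1, 4)]
    + [(f"d{i}", "D") for i in range(1, 4)]
)
top10 = cap_source_share(ranked, k=10, max_share=0.3)
```

With a share of 0.3 and ten slots, source A keeps only its three best items even though it has the five top-ranked ones. Note that the cap is relative to the requested `k`; if fewer than `k` items are available, a source's actual share of the returned list can exceed the limit.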

7.2 Bias-Resistant Retrieval Pattern

For any query, the retrieval response includes:

{
  "query": "...",
  "policy_used": "cid:bafy...",
  "model_spaces_used": ["bge-m3", "e5-large-v2"],
  "results": {
    "supporting": [
      {
        "embedding_id": "cid:bafy...",
        "similarity": 0.94,
        "provenance_status": "verified",
        "challenge_status": "none"
      }
    ],
    "contradicting": [
      {
        "embedding_id": "cid:bafy...",
        "similarity": 0.87,
        "provenance_status": "verified",
        "challenge_status": "none"
      }
    ],
    "disputed": [
      {
        "embedding_id": "cid:bafy...",
        "challenge_id": "cid:bafy...",
        "challenge_status": "open"
      }
    ]
  },
  "policy_explanation": "Results include top 10 supporting and top 5 contradicting. Minority view floor of 15% applied. Disputed objects surfaced separately."
}

Instead of returning one answer, the system returns:

  1. Closest matches (supporting evidence)
  2. Strongest counter-matches (contradicting evidence)
  3. Unresolved disputes (challenged objects)
  4. Policy explanation (which rules shaped the response)
  5. Provenance status (verification state of each result)

This is fundamentally healthier than a single invisible ranking.
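A response in this shape could be assembled roughly as follows. The sketch assumes each candidate already carries a `stance` label ("supporting" or "contradicting"); how stance is determined is outside what this document specifies. The function name and the defaults of 10 and 5 (taken from the sample `policy_explanation`) are illustrative.

```python
from typing import Dict, List


def build_response(query: str, policy_id: str, candidates: List[Dict],
                   top_supporting: int = 10, top_contradicting: int = 5) -> Dict:
    """Group candidates into supporting, contradicting, and disputed lists."""
    ranked = sorted(candidates, key=lambda c: c["similarity"], reverse=True)

    def public_view(c: Dict) -> Dict:
        return {
            "embedding_id": c["embedding_id"],
            "similarity": c["similarity"],
            "provenance_status": c["provenance_status"],
            "challenge_status": c["challenge_status"],
        }

    # Objects with an open challenge are surfaced separately, whatever their stance.
    disputed = [
        {
            "embedding_id": c["embedding_id"],
            "challenge_id": c["challenge_id"],
            "challenge_status": "open",
        }
        for c in ranked if c["challenge_status"] == "open"
    ]
    undisputed = [c for c in ranked if c["challenge_status"] != "open"]
    supporting = [public_view(c) for c in undisputed if c["stance"] == "supporting"]
    contradicting = [public_view(c) for c in undisputed if c["stance"] == "contradicting"]

    return {
        "query": query,
        "policy_used": policy_id,
        "results": {
            "supporting": supporting[:top_supporting],
            "contradicting": contradicting[:top_contradicting],
            "disputed": disputed,
        },
    }


candidates = [
    {"embedding_id": "e1", "similarity": 0.94, "stance": "supporting",
     "provenance_status": "verified", "challenge_status": "none"},
    {"embedding_id": "e2", "similarity": 0.87, "stance": "contradicting",
     "provenance_status": "verified", "challenge_status": "none"},
    {"embedding_id": "e3", "similarity": 0.91, "stance": "supporting",
     "provenance_status": "verified", "challenge_status": "open",
     "challenge_id": "c1"},
]
response = build_response("example query", "policy-1", candidates)
```

In this example `e3` is highly similar but has an open challenge, so it appears under `disputed` rather than being silently ranked or silently dropped.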

7.3 Index Operator Rules

Anyone can run an index. Index operators:

  • Must declare which retrieval policies they support
  • Must not alter underlying object data
  • Must serve provenance chains on request
  • Must surface challenges and disputes
  • Must declare their funding and governance
  • Can be forked (anyone can run a competing index)

8. Anti-Capture Mechanisms

8.1 Structural

Mechanism | How It Resists Capture
Content addressing | Data cannot be silently altered
Multiple embedding spaces | No single model priesthood
Multiple index operators | No single retrieval monopoly
Public governance decisions | No shadow moderation
Forkability | Exit option prevents lock-in
Role separation | No single party controls admission + ranking + disputes

8.2 Economic

Mechanism | How It Resists Capture
Contribution receipts | Consumers are expected to give back
Influence caps | Contribution does not buy unlimited influence
Transparent funding | Index operators and stewards declare funding sources
No pay-for-ranking | Objects cannot be promoted through payment

8.3 Social

Mechanism | How It Resists Capture
Public challenge process | Anyone can challenge any object
Appeal process | Governance decisions can be appealed
Minority view floors | Retrieval policies must surface dissenting views
Contradiction surfacing | Counter-evidence is returned alongside supporting evidence
Community forking | Disagree? Fork. History is preserved.

9. Bootstrapping Plan

Phase 1 — Foundation

Deliverable | Description
Manifesto | Public statement of principles and intent
Protocol specification | This document, formalised and versioned in a Git repository
Schema repository | The Open Embeddings Schema as a versioned spec
Reference node | A single operational node demonstrating the protocol
Sample corpus | A small, curated dataset with full provenance
Challenge flow demo | A working example of the challenge and resolution process
Community home | A Matrix space for working groups and discussion

Phase 2 — Community

Deliverable | Description
Signed contributors | First cohort of contributors with verified DIDs
Multiple embedding providers | Embeddings from at least 3 different models
Public retrieval explorer | A web UI for browsing artefacts, embeddings, claims, and challenges
Contradiction surfacing demo | Retrieval that shows supporting and contradicting evidence
Governance handbook | Published rules for challenge review and governance decisions

Phase 3 — Federation

Deliverable | Description
Multi-node federation | Multiple independent nodes sharing the commons
Reputation/stake system | Contribution-weighted influence
Alternate index operators | At least 2 independent retrieval services
Client applications | Tools for researchers, journalists, educators
Domain communities | Specialised communities (science, law, history, etc.)

Phase 4 — Maturity

Deliverable | Description
Cross-language support | Artefacts and embeddings in multiple languages
Institutional adoption | Libraries, universities, NGOs contributing
Governance evolution | Community-driven protocol changes
Ecosystem growth | Third-party tools and integrations

10. Community and Communication

Layer | Platform | Purpose
Core collaboration | Matrix | Working groups, design discussions, governance rooms, moderated community channels
Public publishing | Fediverse / ActivityPub | Public announcements, essays, manifestos, recruiting aligned people, federated discussion
Discovery | AT Protocol / Bluesky | Short-form thought leadership, finding protocol and open-web people, sharing milestones
Specification | Git repository | Versioned protocol and schema specifications
In-person | FOSDEM-style events | Finding serious contributors, presenting the protocol, meeting adjacent communities

Why This Stack

  • Matrix is an open network for decentralised communication, governed by a foundation committed to open standards. It fits a serious protocol-building community.
  • ActivityPub is a W3C Recommendation for decentralised social networking. It provides reach without platform dependence.
  • AT Protocol is an open framework for public conversation with account portability. It is good for discovery and attracting technical contributors.
  • Git provides permanence, versioning, and collaboration for specifications.
  • FOSDEM and similar events connect the open-source and decentralised-web communities in person.

11. Charter

Transparent Semantic Commons Charter v0.1

  1. No hidden transforms.
  2. No silent edits.
  3. No opaque ranking.
  4. No single embedding monopoly.
  5. Every claim must be linkable to evidence.
  6. Every object must be challengeable.
  7. Every governance action must be public.
  8. Every community must be able to fork.
  9. Consumers should contribute back.
  10. The commons exists for collective flourishing, not extraction.

12. The Hard Truths

The enemy is not just secrecy. It is also:

  • Convenience — closed systems are easier to use
  • Apathy — most people will not participate in governance
  • Scale asymmetry — billion-dollar systems have more resources
  • Moderation burden — open systems attract abuse
  • Slow institutional capture — governance can be co-opted gradually

This protocol wins only if it is:

  1. Simpler to inspect than closed systems
  2. Easier to fork than captured systems
  3. Useful in practice, not merely morally appealing

A working prototype matters more than a perfect philosophy.


13. Naming

Candidate names for the broader initiative:

Name | Rationale
Transparent Semantic Commons | Says exactly what it is: transparent, semantic, and common
Open Semantic Commons | Emphasises openness
Civic Knowledge Ledger | Emphasises the civic and knowledge dimensions
Forkable Truth Protocol | Emphasises the anti-capture mechanism

The recommended name is Transparent Semantic Commons — it is precise, unglamorous, and resistant to marketing capture.


14. Relationship to Technology Shield

Technology Shield contributes this protocol and framework as part of its commitment to cybersecurity that serves people, not just organisations.

The Transparent AI Framework connects to Technology Shield's other collateral:

Collateral | Relationship
Pattern Blueprint | The protocol layers are themselves architectural patterns
Security Reference Architecture | The commons infrastructure follows zoning principles
Secure Development Framework | Protocol node software is built with the SDF pipeline
Secure Collaborative Development | The open-source commons benefits from SCDA patterns for community contribution
Shield Business | Future integration for organisations tracking AI supply chain risk

15. What to Build First

One repo. One schema. One Matrix room. One public manifesto. One explorer UI. One sample corpus. One challenge flow.

Start there. Everything else follows.