# ClinicalMem — Full LLM Index

> Persistent, auditable, contradiction-safe clinical memory for healthcare AI agents.
> A 6-layer drug-safety pipeline anchored by a 50,949-parameter BitNet b1.58 ternary
> classifier with bit-identical Q16.16 fixed-point replay across phones, browsers,
> $15 Raspberry Pi Zero 2 W boards, NVIDIA A100s, and NVIDIA GB300 devices.

ClinicalMem is published by **STARGA, Inc.** under the **Apache License 2.0**.
This `llms-full.txt` is the expanded counterpart of [`/llms.txt`](https://clinicalmem-demo.pages.dev/llms.txt)
and follows the proposed [llmstxt.org](https://llmstxt.org) standard so language
models can ingest the project's full claim surface in a single fetch.

- **Version:** 4.1.0 (hackathon submission cut, 2026-05-11)
- **License:** Apache-2.0 with explicit § 3 patent grant
- **Repo:** https://github.com/star-ga/clinicalmem
- **Demo:** https://clinicalmem-demo.pages.dev/
- **Live MCP:** https://clinicalmem-mcp.thankfulpond-9c3fdc1e.eastus.azurecontainerapps.io/mcp
- **Live A2A card:** https://clinicalmem-a2a.thankfulpond-9c3fdc1e.eastus.azurecontainerapps.io/.well-known/agent-card.json
- **Model card:** https://huggingface.co/star-ga/clinicalmem-bitnet-b158
- **Devpost:** https://devpost.com/software/clinimalmem
- **PyPI memory layer:** https://pypi.org/project/mind-mem/ (>= 4.0.0)
- **Contact:** info@star.ga

---

## What ClinicalMem actually claims (verified, machine-checkable)

| Claim | Number | Source-of-truth |
|---|---|---|
| PCCP cohort recall | **139 / 139** (44 contraindicated · 4 major · 69 serious · 22 moderate) | `scripts/run_clinical_regression_eval.py` over `docs/openevidence_cache.json` |
| Negative-control false positives | **0 / 10** (incl. clopidogrel + pantoprazole, atorvastatin + amlodipine, simvastatin + diltiazem, spironolactone + trimethoprim) | `scripts/run_negative_control_eval.py` over `docs/negative_control_cohort.json` |
| Cross-architecture determinism | **1200 / 1200** bit-identical Q16.16 replays (12 pairs × 100 iterations) | `scripts/run_bitnet_determinism_stress.py` |
| Python ↔ JavaScript parity | byte-identical `repro_hash` between server pass and browser pass | `tests/test_engine/test_browser_bitnet_pin.py` |
| FHIR cohort | **30 patients · 47 NPIs · 239 resources**, all CMS-Luhn-valid synthetic | `docs/synthea_demo_cohort.json` |
| Engine + scripts test suite | **1425 / 1425** PASS (pinned floor) | `tests/test_engine/test_test_count_drift_pin.py` |
| Federation invariants | **21 typed runtime invariants** in `JointMemoryFederation.flow.mind`, 21/21 PASS | `scripts/federation_mock_demo.py` |
| Audit-replay verifier | **47 / 47** `repro_hash` values reproduced byte-for-byte under live bundle | `scripts/verify_audit_replay.py` |
| Layer-4 LLM consensus surface | 6 US-headquartered providers (OpenAI GPT-5.5, Google Gemini 3.1 Pro, xAI Grok 4.3, Anthropic Claude Opus 4.7, Perplexity Sonar Pro, Meta Llama 4 Maverick (400B MoE) via NIM) | `engine/consensus_engine.py` |
| Layer-4.5 BitNet bundle A (contra/major gate) | bundle_id `1f0f88591c05af57…` · 50,688 ternary weights + 261 Q16.16 biases · ~118 KB | `engine/bitnet_weights.json` |
| Layer-4.5 BitNet bundle B (tier-2 specialist) | bundle_id `5f7ed5f6…` · 12,741 ternary params · ~30 KB | `engine/bitnet_weights_b_specialist.json` |
| Live MCP serverInfo | `ClinicalMem` v4.1.0 · 18 tools | `mcp_server/server.py` |
| Live A2A agent-card | `clinicalmem_agent` v4.1.0 · 5 skills · 13 tools · `supportedInterfaces` present | `a2a_agent/app.py` |
| Independent multi-model code review | Six frontier models on a 10-category rubric | `docs/eval_runs/README.md` |
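The "CMS-Luhn-valid" NPI claim above is machine-checkable: CMS NPIs carry a Luhn check digit computed over the number with the `80840` issuer prefix prepended. A minimal sketch (function names are illustrative, not ClinicalMem's API; `1234567893` is the published CMS example NPI):

```python
def luhn_valid(digits: str) -> bool:
    """Standard Luhn mod-10 check over a digit string."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9      # equivalent to summing the two digits
        total += d
    return total % 10 == 0

def npi_valid(npi: str) -> bool:
    """CMS NPIs are Luhn-checked with the constant '80840' card-issuer prefix."""
    return len(npi) == 10 and npi.isdigit() and luhn_valid("80840" + npi)

print(npi_valid("1234567893"))   # the CMS documentation example NPI
```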

## What ClinicalMem deliberately does NOT claim

- **Real-world adverse-event reduction** on actual patient data — requires an IRB-approved cohort and 4–12 weeks of retrospective review. Out of scope for a hackathon entry.
- **Head-to-head accuracy vs Epic BPA / Cerner First Databank** under NPI-attributed prospective study.
- **Alert-fatigue measurement** — requires a real prescribing flow with clinician end-users.
- **FDA 510(k) clearance** — `docs/fda_q_sub_draft.md` documents the Q-sub-ready posture, not a cleared product.

The full "what's not claimed" list and the v2 resolution path live in `docs/clinical_validation.md` § Limitations and `docs/fda_q_sub_draft.md` § Gaps + Planned Resolution.

---

## Six-layer drug-safety pipeline

Each drug pair flowing through ClinicalMem traverses the layers below in order (Layer 4.5 sits between the consensus and synthesis layers); a high-confidence verdict from an early deterministic layer short-circuits the later, more expensive LLM layers.

1. **Layer 1 — Deterministic table.** Rule-based safety rails catch known drug-drug interactions, allergy conflicts, lab contraindications, and provider disagreements. Microsecond response; cannot hallucinate. Source: `engine/clinical_scoring.py`.

2. **Layer 2 — OpenEvidence API.** For medication pairs not in the deterministic table, ClinicalMem queries OpenEvidence — the medical AI engine that powers Elsevier's ClinicalKey AI (developed with Mayo Clinic). Result cached in `docs/openevidence_cache.json`. Source: `engine/openevidence_cache.py`.

3. **Layer 3 — NIH RxNorm + Drug Interaction API.** Resolves drug names to RxCUI identifiers, normalizes medication lists, and pulls pairwise interactions from the federal database used by Epic, Cerner, and every certified EHR. No API key required. Source: `engine/rxnorm_client.py` and `engine/nih_interaction_client.py`.

4. **Layer 4 — Six-model US-headquartered LLM consensus.** Verifies findings across six providers in parallel: OpenAI GPT-5.5, Google Gemini 3.1 Pro, xAI Grok 4.3, Anthropic Claude Opus 4.7, Perplexity Sonar Pro, and Meta Llama 4 Maverick (400B MoE, via NIM at `integrate.api.nvidia.com`). HIPAA-compatible data residency: all six providers are US-headquartered. The cascade scales gracefully (1–6 providers); a HIGH verdict requires majority consensus. Source: `engine/consensus_engine.py`.

5. **Layer 4.5 — BitNet b1.58 ternary classifier (FDA SaMD reproducibility primitive).** A clean-room Python implementation of the architecture from Ma et al. ([arXiv:2402.17764](https://arxiv.org/abs/2402.17764)). Pure-integer Q16.16 fixed-point forward pass over ternary weights ∈ {−1, 0, +1} — no multiplication, only addition and subtraction. **Output is bit-identical across ARM, x86_64, CUDA, and NPU targets.** Every classification carries a SHA-256 `repro_hash` that an auditor can re-verify in under 1 ms per pair with no proprietary toolchain. The iter-421 Path B 2-bundle ensemble cascades the frozen v8 contra/major gate (bundle A, `1f0f8859…`) into a tier-2 specialist (bundle B, `5f7ed5f6…`) under constrained argmax, hitting 100% recall on every severity class of the live 139-pair cohort. Total ensemble footprint **~148 KB**. Source: `engine/bitnet_classifier.py` + `engine/bitnet_features_v8.py`.

6. **Layer 5 — Medical LLM synthesis cascade.** Generates patient-specific clinical narratives from detected findings, citing evidence blocks by ID. The LLM never invents facts — it explains what the detection layers found. Source: `engine/llm_synthesizer.py`.

7. **Layer 6 — Abstention gate.** When confidence is insufficient (BM25 low, missing evidence, conflicting providers without quorum), the system refuses to generate a narrative. In healthcare, "I don't know" saves lives. Source: `engine/clinical_memory.py::recall_with_abstention()`.
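The Layer-4.5 determinism argument can be sketched in a few lines: quantize floats to Q16.16 once at the boundary, accumulate with exact integer adds and subtracts (ternary weights need no multiplies), and hash a canonical byte encoding of the result. This is a minimal illustration, not the real `engine/bitnet_classifier.py` — the function names, encoding, and 4-weight toy layer are assumptions:

```python
import hashlib

Q = 16  # Q16.16: 16 integer bits, 16 fractional bits

def to_q16_16(x: float) -> int:
    """Quantize a float feature to Q16.16 once, at the input boundary."""
    return int(round(x * (1 << Q)))

def ternary_dot(weights: list[int], features_q: list[int]) -> int:
    """Ternary weights in {-1, 0, +1}: no multiplies, only add/subtract."""
    acc = 0
    for w, f in zip(weights, features_q):
        if w == 1:
            acc += f
        elif w == -1:
            acc -= f
    return acc  # still Q16.16 — integer adds are exact on every CPU

def repro_hash(logits_q: list[int]) -> str:
    """SHA-256 over a canonical little-endian int64 encoding of the logits."""
    raw = b"".join(v.to_bytes(8, "little", signed=True) for v in logits_q)
    return hashlib.sha256(raw).hexdigest()

weights = [1, 0, -1, 1]
features_q = [to_q16_16(x) for x in (0.5, 2.0, 0.25, 1.0)]
logit = ternary_dot(weights, features_q)          # 0.5 - 0.25 + 1.0 = 1.25
print(logit == to_q16_16(1.25), repro_hash([logit]))
```

Because every intermediate value is an exact integer, the same weights and inputs yield the same bytes — and therefore the same `repro_hash` — on any architecture, which is the property the cross-architecture stress test pins.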

---

## MCP server — 18 tools (FastMCP 2.x · Streamable HTTP)

The MCP server exposes 18 SHARP-on-MCP tools over Streamable HTTP. The currently deployed version is `4.1.0` (read via `initialize`):

```
ingest_fhir_bundle · ingest_clinical_note · recall_clinical_context · medication_safety_review ·
detect_record_contradictions · what_if_add_medication · what_if_remove_medication · what_if_swap_medication ·
explain_clinical_conflict · query_drug_safety_alerts · match_clinical_trials · detect_phi · validate_hallucination ·
audit_chain_verify · export_audit_part11 · get_engine_health · list_active_invariants · diagnose_block_lineage
```

Live endpoint: https://clinicalmem-mcp.thankfulpond-9c3fdc1e.eastus.azurecontainerapps.io/mcp

Source: `mcp_server/server.py`.

## A2A agent — 5 skills · 13 tools (Google ADK)

The A2A agent wraps 13 underlying tools behind 5 declared skills:

| Skill | Description |
|---|---|
| `medication-safety-review` | Drug-drug interaction detection, allergy cross-reference, severity scoring. Uses MIND Lang clinical kernels. |
| `clinical-context-recall` | Patient-history Q&A using persistent memory with importance-scored retrieval + confidence gating + abstention. |
| `contradiction-assessment` | Cross-provider conflict scan: dangerous interactions, allergy-medication conflicts, conflicting care plans. |
| `care-transition-summary` | Structured handoff summary highlighting active issues, medication conflicts, pending actions. |
| `explain-conflict` | GenAI-powered patient-specific rationale for a detected safety conflict, with evidence citations and hard abstention. |

The card at `/.well-known/agent-card.json` reports `version=4.1.0` and includes both legacy 0.3.x fields (`url`, `preferredTransport`, `protocolVersion`) and forward-compat 1.0.x fields (`supportedInterfaces`, `securityRequirements`). FHIR R4 context is required as an `AgentExtension`, so the caller (e.g. Prompt Opinion) injects patient identity and FHIR-server credentials per request.

Live endpoint: https://clinicalmem-a2a.thankfulpond-9c3fdc1e.eastus.azurecontainerapps.io/

Source: `a2a_agent/app.py` + `a2a_agent/agent.py` + `a2a_agent/tools/`.

---

## Federation — 21 typed runtime invariants

`flows/JointMemoryFederation.flow.mind` declares 21 typed runtime invariants enforced by the MIND compiler:

1–9: structural · PHI-gate · severity-quorum · KeyEpoch revocation
10–14: X25519 sealing · ChaCha20-Poly1305 envelope · Ed25519 attestation
15–17: vector clock advancement · conflict-log monotonicity · merge-strategy determinism
18–21: cross-site cohort dedup · audit-chain replay · cohort cardinality · plan_hash equality

The mock demo (`scripts/federation_mock_demo.py`) exercises **all 21 invariants end-to-end**. The X25519 sealing invariants 10–14 run in-process via a `SealedEnvelope` + `_x25519_seal` / `_x25519_open` round-trip that mirrors the v4 federation HTTP wire transport released in `mind-mem v4.0.1` (commit `16a3e25`: 4 endpoints flag-gated by `v4.federation`, `X-MindMem-Token` auth, a 1 MiB body cap; 11/11 wire tests and 40/40 existing transport tests pass).

ClinicalMem's `engine/federation_transport.py` will bridge through `mind_mem.v4.federation_client.FederationClient` starting with the next Azure rebuild (pin: `mind-mem>=4.0.1`).
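The vector-clock invariants (15 and 17 above) have a compact shape worth spelling out: a merged clock must dominate both inputs, and merging in either order must yield the same clock. The sketch below illustrates that shape with plain dicts — the names and site labels are illustrative, not the `JointMemoryFederation.flow.mind` API:

```python
def vc_merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Component-wise max — the canonical, order-independent vector-clock merge."""
    return {site: max(a.get(site, 0), b.get(site, 0)) for site in a.keys() | b.keys()}

def vc_dominates(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if clock `a` is >= clock `b` on every component."""
    return all(a.get(site, 0) >= n for site, n in b.items())

site_a = {"clinic-east": 3, "clinic-west": 1}
site_b = {"clinic-west": 4, "lab": 2}

merged = vc_merge(site_a, site_b)
# Advancement invariant: the merged clock dominates both inputs.
assert vc_dominates(merged, site_a) and vc_dominates(merged, site_b)
# Merge-determinism invariant: merge order cannot change the result.
assert merged == vc_merge(site_b, site_a)
print(merged)
```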

---

## Audit chain — 21 CFR Part 11

`engine/audit_export_part11.py` implements a 21 CFR Part 11 § 11.10 export shape:

- Ed25519-signed event chain
- SHA-256 predecessor hashing (TAG_v1 NUL-separated preimages)
- Q16.16 fixed-point scoring inside the audit hash preimages
- 30 unit tests pinning the structural integrity

Every logged clinical decision carries a cryptographic `audit_hash` proof. The ClinicalMem demo emits `Audit: <hash>` at the end of every response.

The replay verifier `scripts/verify_audit_replay.py` re-classifies every pinned drug pair under the live bundle_id and asserts byte-for-byte agreement on every `repro_hash`. Current pin set: **47 pairs** (3 hand-picked severity-class anchors + every contraindicated cache entry).
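The replay idea reduces to recomputing every hash link and demanding byte-for-byte agreement. A minimal sketch in that spirit — the field layout, tag string, and event shape here are illustrative assumptions, not the exact `engine/audit_export_part11.py` preimage format:

```python
import hashlib

GENESIS = "0" * 64

def link_hash(prev_hash: str, event: dict) -> str:
    """NUL-separated preimage: tag, predecessor hash, then sorted key=value pairs."""
    fields = [f"{k}={event[k]}" for k in sorted(event)]
    preimage = b"\x00".join(s.encode() for s in ["TAG_v1", prev_hash, *fields])
    return hashlib.sha256(preimage).hexdigest()

def build_chain(events: list[dict]) -> list[str]:
    """Hash each event over its predecessor's hash, genesis-anchored."""
    hashes, prev = [], GENESIS
    for ev in events:
        prev = link_hash(prev, ev)
        hashes.append(prev)
    return hashes

def verify_replay(events: list[dict], pinned: list[str]) -> bool:
    """Recompute every link and require byte-for-byte agreement with the pins."""
    return build_chain(events) == pinned

events = [
    {"pair": "warfarin+aspirin", "severity": "major", "score_q16_16": 57672},
    {"pair": "clopidogrel+pantoprazole", "severity": "none", "score_q16_16": 0},
]
pinned = build_chain(events)
assert verify_replay(events, pinned)
events[0]["severity"] = "moderate"   # any tamper breaks the chain from that link on
assert not verify_replay(events, pinned)
```

Because each link's preimage embeds its predecessor's hash, editing any event invalidates every pinned hash downstream of it, which is what makes the 47-pair pin set a meaningful tamper check.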

---

## License + patent grant

- **Apache-2.0** with explicit § 3 patent grant (every contributor licenses their patent claims to every user).
- Single `LICENSE` file at the repo root — no buried clauses.
- The patent-pending MIC@2 / MAP / binary wire formats ride inside `mind-mem` (also Apache-2.0); the grant transitively applies.
- IRB-exempt synthetic cohort under 45 C.F.R. § 46.102(e)(1) — see `docs/irb_exemption.md`.

## Clinical review pathway

- The target reviewer profile (US-licensed family medicine, multi-hospital affiliation, CMS-NPPES-verifiable NPI) and the scope of review (Sarah Mitchell demo bundle, NTI cohort severity calls, abstention-gate triggers, BP-target conflict) are described in `docs/clinical_validation.md`.
- Independent clinical sign-off remains a v2 follow-up.

## Brand + visual identity

- **Palette:** Prompt Opinion orange (`#FF6B35`) accent · Trust teal (`#0F766E`) primary · Mint canvas
- **Type:** Lexend display · Source Sans 3 body · Baloo 2 brand-accent
- **Logo:** `docs/logo.svg` (canonical) · `docs/og-image.svg` (Open Graph, 1200×630)
- **Aesthetic:** FDA-grade clinical, production-grade polish

---

## How to cite

```bibtex
@software{clinicalmem_2026,
  title  = {ClinicalMem: Auditable BitNet b1.58 Drug-Safety Memory for Healthcare AI},
  author = {{STARGA, Inc.}},
  year   = {2026},
  version = {4.1.0},
  url    = {https://github.com/star-ga/clinicalmem},
  license = {Apache-2.0}
}
```

## Files an LLM should fetch first for deep context

1. `README.md` — quick-tour, six-layer summary, audit-replay invitation
2. `JUDGES.md` — 5-minute audit script + 30-second pillar table
3. `DEVPOST.md` — clinical narrative, "Inspiration" + "What we built" sections
4. `docs/architecture.md` — single-site / multi-site / edge deployment patterns
5. `docs/clinical_validation.md` — every claim mapped to its source
6. `docs/federated_memory.md` — typed federation flow + PHI gate
7. `docs/why_bitnet_b158.md` — reproducibility rationale for the FDA SaMD primitive
8. `docs/bitnet_training.md` — bundle-A + bundle-B retrain recipe
9. `docs/fda_q_sub_draft.md` — Q-sub-ready filing draft + gap acknowledgements
10. `docs/eval_runs/README.md` — external judge-LLM evaluation summary

These ten files cover the entire ClinicalMem submission surface and total ≈ 80 KB of focused markdown — comfortably within any frontier LLM's context window.

---

*ClinicalMem v4.1.0 · STARGA, Inc. · Apache-2.0 · 2026-05-11*
