Module 1 — Site & Investigator Intelligence
This document defines 30 user requirements for ClinicalOS Module 1. Each requirement has a unique ID, priority (Critical/High), acceptance criteria, and rationale. Requirements are grouped by functional category.
| ID | Category | Requirement, Acceptance Criteria, and Rationale | Priority |
|---|---|---|---|
| URS-001 | Data Ingestion | The system shall ingest clinical trial site data from ClinicalTrials.gov via its public REST API (v2). Acceptance: Sites are retrieved, parsed, and stored in the database with source provenance recorded. Minimum 5,000 sites ingested. Rationale: Primary data source for site intelligence. Public data, no patient data involved. | Critical |
| URS-002 | Data Ingestion | The system shall ingest investigator publication data from PubMed via NCBI E-utilities API. Acceptance: Investigators are enriched with h-index and publication count. At least 400 investigators enriched. Rationale: Publication metrics are a key indicator of investigator research output. | Critical |
| URS-003 | Data Ingestion | The system shall support incremental (daily) and full ingestion modes. Acceptance: Daily mode fetches trials updated within a lookback window of 24 to 168 hours. Full mode performs a broad search by condition and country. Rationale: Daily updates keep the data current; full mode enables a comprehensive initial load. | High |
| URS-004 | Data Ingestion | The system shall record data provenance for each ingestion run (source, date, record count, hash). Acceptance: Each ingestion run creates a data_provenance record with a SHA-256 integrity hash, in compliance with ALCOA+. Rationale: ALCOA+ requires data to be Attributable, Legible, Contemporaneous, Original, and Accurate, as well as Complete, Consistent, Enduring, and Available. | Critical |
| URS-005 | Site Scoring | The system shall compute a composite site score (0-100) from 5 weighted dimensions. Acceptance: Each dimension is normalized to 0-1, and Score = 100 × (0.30 × Recruitment + 0.25 × Experience + 0.15 × Publications + 0.15 × Infrastructure + 0.15 × Regulatory). The weights sum to 1.0, so the score spans 0-100. Rationale: Multi-dimensional scoring provides objective, comparable site evaluation. | Critical |
| URS-006 | Site Scoring | The system shall provide score explainability showing the contributing factors and data sources for each dimension. Acceptance: Each score includes a per-dimension breakdown with source attribution (e.g., 'trial_count from ClinicalTrials.gov: +0.3'). Rationale: RC-3 AI Governance requires scores to be explainable and auditable. | Critical |
| URS-007 | Site Scoring | The system shall allow users to customize dimension weights and recalculate scores in real time. Acceptance: Custom weights are applied via the API, and the recalculated score is returned within 2 seconds. Rationale: Different trials have different priorities (e.g., oncology vs. rare disease). | High |
| URS-008 | Site Scoring | Scoring shall be deterministic: the same input and the same model version produce the same output. Acceptance: Repeatedly scoring the same site with the same data produces identical results (no random variance). Rationale: Reproducibility is a core ALCOA+ and ICH requirement for auditable systems. | Critical |
| URS-009 | Search & Discovery | The system shall support structured search with filters: therapeutic area, phase, country, enrollment status, min trials, min capacity. Acceptance: All filter combinations return correct, tenant-isolated results. Pagination supported. Rationale: Users need to quickly find relevant sites from large datasets. | Critical |
| URS-010 | Search & Discovery | The system shall support natural language (AI-powered) search queries with synonym expansion. Acceptance: The query 'oncology sites in Switzerland' returns relevant results. Synonyms are expanded (e.g., HCC to hepatocellular carcinoma). Rationale: Natural language reduces friction and improves discoverability. | High |
| URS-011 | Search & Discovery | The system shall provide a conversational AI agent for multi-turn site queries. Acceptance: Agent maintains session context, interprets follow-up questions, returns relevant sites with explanations. Rationale: Complex queries benefit from conversational interaction. | High |
| URS-012 | Investigator Profiling | The system shall maintain investigator profiles with affiliation, specialty, h-index, publication count, and trial history. Acceptance: Profiles display all fields. Data sourced from ClinicalTrials.gov and PubMed. Rationale: Investigator selection requires comprehensive profile information. | Critical |
| URS-013 | Investigator Profiling | The system shall support side-by-side comparison of 2-5 investigators. Acceptance: Comparison view shows all metrics in parallel columns for selected investigators. Rationale: Sponsors need to compare candidates during investigator selection. | High |
| URS-014 | Export & Reporting | The system shall export search results and site details in PDF and Excel formats. Acceptance: PDF exports include formatted scores and details. Excel exports include raw data suitable for further analysis. Rationale: Users need to share results with stakeholders and import them into other systems. | Critical |
| URS-015 | Export & Reporting | The system shall support project workspaces with site shortlisting and notes. Acceptance: Users can create projects, add sites to and remove sites from a shortlist, add notes per site, and track selection status. Rationale: Site selection is a workflow, not a single search. | High |
| URS-016 | Recruitment Prediction | The system shall predict patient recruitment rates per site with confidence intervals. Acceptance: Predictions include optimistic/realistic/pessimistic scenarios with confidence score and explanatory factors. Rationale: Recruitment prediction is a key differentiator for site selection decisions. | High |
| URS-017 | Multi-Tenancy | The system shall enforce tenant isolation: each organization sees only its own data. Acceptance: All database queries filtered by org_id. No cross-tenant data leakage. Verified by automated tests. Rationale: GDPR and contractual requirement for B2B SaaS platform. | Critical |
| URS-018 | Authentication & Security | The system shall authenticate users via JSON Web Tokens (JWTs), using 15-minute access tokens and 7-day refresh tokens. Acceptance: Login returns a JWT. Access tokens expire after 15 minutes. A refresh token extends the session for up to 7 days. Rationale: Industry-standard session management balancing security and usability. | Critical |
| URS-019 | Authentication & Security | The system shall enforce RBAC with 4 levels: Super Admin, Org Admin, User, Read Only. Acceptance: Each role has defined permissions. Unauthorized actions return 403. Role verified on every request. Rationale: 21 CFR Part 11 and ICH E6 require access control commensurate with responsibility. | Critical |
| URS-020 | Authentication & Security | The system shall support Two-Factor Authentication (TOTP) for all user accounts. Acceptance: Users can enable 2FA. Login requires TOTP code when enabled. Backup codes provided. Rationale: Enhanced security for access to clinical trial data. | High |
| URS-021 | Audit Trail | The system shall log all state-changing actions (POST, PUT, PATCH, DELETE) to an immutable audit trail. Acceptance: Audit log records: user, action, entity, timestamp, IP hash. Records cannot be updated or deleted (trigger enforced). Rationale: 21 CFR Part 11 requires complete, immutable audit trail. | Critical |
| URS-022 | Audit Trail | The system shall maintain a SHA-256 hash chain on audit records for tamper detection. Acceptance: Each audit record has record_hash and prev_hash. Chain integrity verifiable programmatically. Rationale: Hash chain provides cryptographic proof of audit trail integrity. | Critical |
| URS-023 | Data Encryption | The system shall encrypt data in transit (TLS 1.3) and at rest (AES-256). Acceptance: All HTTP traffic uses TLS 1.3. Sensitive database columns are encrypted with AES-256 using the PostgreSQL pgcrypto extension. Rationale: GDPR Article 32 requires appropriate technical and organisational security measures, and 21 CFR Part 11 requires controls that protect the integrity and confidentiality of electronic records. | Critical |
| URS-024 | Performance | The site search API shall respond within 5 seconds at the 95th percentile (p95) for queries returning up to 500 results. Acceptance: A load test with concurrent users confirms p95 latency below 5 s. Rationale: A responsive search is essential to the user experience. | High |
| URS-025 | Performance | The system shall maintain 99.5% availability (SLA). Acceptance: Monitoring confirms at least 99.5% uptime over a rolling 30-day window. Rationale: Clinical operations require reliable access to site selection tools. | High |
| URS-026 | AI Governance | The system shall maintain an AI model registry tracking model versions, deployment dates, and benchmark scores. Acceptance: ai_models table records every model version used. No 'latest' model references in scoring. Rationale: RC-3 requirement: AI models must be versioned and auditable. | Critical |
| URS-027 | AI Governance | The system shall log every AI inference (input hash, model version, output hash, timestamp) to an AI audit log. Acceptance: ai_audit_log table records all AEGIS API calls with input/output hashes. Rationale: Reproducibility and auditability of AI-driven decisions. | Critical |
| URS-028 | User Interface | The system shall display a legal disclaimer stating that it is a decision support tool, not a GxP system of record. Acceptance: The disclaimer is visible in the app footer and on the /legal/disclaimer page. Non-validated modules display a badge. Rationale: RC-1 requirement: clear product classification to avoid liability. | Critical |
| URS-029 | User Interface | The system shall support 9 languages for the user interface (EN, FR, DE, ES, IT, PT, JA, ZH, KO). Acceptance: Language switcher available. All UI labels translated. Fallback to English for missing translations. Rationale: Global clinical trials require multilingual support. | High |
| URS-030 | User Interface | The system shall present sites in three view modes: card grid, sortable table, and geographic map. Acceptance: All three views available and switchable. Map shows markers colored by score. Rationale: Different users prefer different visualization modes for site evaluation. | High |
Total requirements: 30 (18 Critical, 12 High)
Document approval: Validated by Quality Assurance
Next review: Upon major feature change or annually
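URS-003's incremental mode can be expressed as a query against the ClinicalTrials.gov API v2 studies endpoint. The sketch below only builds the request URL and makes no network call. It assumes the Essie expression AREA[LastUpdatePostDate]RANGE[start,MAX] passed in the filter.advanced parameter, along with the pageSize and format parameters; check these against the current API documentation. The function name and the rounding of the 24-168 hour window up to whole days are illustrative choices, not part of the requirement.

```python
import math
from datetime import date, timedelta
from urllib.parse import urlencode

# ClinicalTrials.gov API v2 studies endpoint.
BASE_URL = "https://clinicaltrials.gov/api/v2/studies"


def incremental_query_url(today: date, lookback_hours: int = 24, page_size: int = 100) -> str:
    """Build a hypothetical URS-003 daily-mode query URL (no request is sent)."""
    if not 24 <= lookback_hours <= 168:
        raise ValueError("URS-003 lookback must be between 24 and 168 hours")
    # LastUpdatePostDate has day granularity, so round the window up to whole days.
    since = today - timedelta(days=math.ceil(lookback_hours / 24))
    params = {
        # Assumed Essie syntax: studies last updated on or after `since`.
        "filter.advanced": f"AREA[LastUpdatePostDate]RANGE[{since.isoformat()},MAX]",
        "pageSize": page_size,
        "format": "json",
    }
    return f"{BASE_URL}?{urlencode(params)}"
```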
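A minimal sketch of the URS-004 provenance record, assuming the hash is computed over the raw response bytes exactly as received so that a stored payload can be re-verified later. The DataProvenance class and its field names are hypothetical; only the recorded attributes (source, date, record count, SHA-256 hash) come from the requirement.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DataProvenance:
    source: str        # e.g. "ClinicalTrials.gov API v2"
    ingested_at: str   # UTC timestamp in ISO 8601
    record_count: int
    sha256: str        # hex digest of the raw payload


def record_provenance(source: str, raw_payload: bytes, record_count: int) -> DataProvenance:
    # Hash the bytes exactly as received, before any parsing or normalization.
    return DataProvenance(
        source=source,
        ingested_at=datetime.now(timezone.utc).isoformat(),
        record_count=record_count,
        sha256=hashlib.sha256(raw_payload).hexdigest(),
    )


def payload_matches(provenance: DataProvenance, raw_payload: bytes) -> bool:
    """Re-verify a stored payload against its provenance record."""
    return hashlib.sha256(raw_payload).hexdigest() == provenance.sha256
```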
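The URS-005 formula and the URS-007 custom weights can be sketched as one function. This is a hypothetical implementation: it assumes custom weights are rescaled to sum to 1.0 so the result stays on the 0-100 scale, and it sums in a fixed order and rounds to two decimals so repeated runs store identical values (URS-008). The dimension keys are illustrative names.

```python
from __future__ import annotations

# URS-005 default weights (fractions of 1.0), in a fixed iteration order.
DEFAULT_WEIGHTS = {
    "recruitment": 0.30,
    "experience": 0.25,
    "publications": 0.15,
    "infrastructure": 0.15,
    "regulatory": 0.15,
}


def composite_score(
    dimensions: dict[str, float], weights: dict[str, float] | None = None
) -> float:
    """Return a 0-100 site score from dimension values normalized to 0-1."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    if set(weights) != set(DEFAULT_WEIGHTS):
        raise ValueError("weights must name exactly the five URS-005 dimensions")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    # Iterate in the fixed DEFAULT_WEIGHTS order so summation order never varies.
    total_weight = sum(weights[name] for name in DEFAULT_WEIGHTS)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    weighted_sum = 0.0
    for name in DEFAULT_WEIGHTS:
        value = dimensions[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalized to 0-1, got {value}")
        weighted_sum += weights[name] * value
    # Assumption: custom weights (URS-007) are rescaled to sum to 1.0.
    # Rounding to a fixed precision keeps stored scores stable (URS-008).
    return round(100.0 * weighted_sum / total_weight, 2)
```

For example, dimension values of 0.8, 0.6, 0.5, 0.4, and 1.0 give 100 × (0.24 + 0.15 + 0.075 + 0.06 + 0.15) = 67.5.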
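URS-010 names one synonym pair (HCC to hepatocellular carcinoma). A dictionary-based expansion is one simple way to realize that part of the requirement. The sketch below assumes a hand-maintained SYNONYMS map whose other entries are hypothetical, and it does not attempt the AI-powered interpretation the requirement also mentions.

```python
from __future__ import annotations

import re

# Hypothetical synonym map; only the HCC entry is taken from URS-010.
SYNONYMS = {
    "hcc": ("hepatocellular carcinoma",),
    "oncology": ("cancer", "neoplasm"),
}


def expand_query(query: str) -> list[str]:
    """Lowercase, tokenize, and append synonyms, keeping first-seen order without duplicates."""
    terms = []
    for token in re.findall(r"\w+", query.lower()):
        for term in (token, *SYNONYMS.get(token, ())):
            if term not in terms:
                terms.append(term)
    return terms
```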
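URS-002 and URS-012 rely on the h-index: the largest h such that an investigator has h publications with at least h citations each. The sketch below assumes per-publication citation counts are already available; the URS does not state where they come from.

```python
from __future__ import annotations


def h_index(citations: list[int]) -> int:
    """Largest h such that h publications each have at least h citations."""
    h = 0
    # Walk publications from most to least cited; rank is 1-based.
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h
```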
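One way to make the URS-017 org_id filter hard to omit is to bind the tenant when a repository object is created, so every query carries it as a bound parameter. The sketch uses SQLite for self-containment, and the table and class names are hypothetical. In PostgreSQL, row-level security policies are a common way to enforce the same filter at the database layer.

```python
from __future__ import annotations

import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shortlist (id INTEGER PRIMARY KEY, org_id TEXT NOT NULL, site_name TEXT NOT NULL)"
    )
    return conn


class TenantRepository:
    """Data access bound to one org_id; every statement filters or stamps that org_id."""

    def __init__(self, conn: sqlite3.Connection, org_id: str) -> None:
        if not org_id:
            raise ValueError("org_id is required")
        self._conn = conn
        self._org_id = org_id

    def add_site(self, site_name: str) -> None:
        with self._conn:  # commit on success
            self._conn.execute(
                "INSERT INTO shortlist (org_id, site_name) VALUES (?, ?)",
                (self._org_id, site_name),
            )

    def list_sites(self) -> list[str]:
        rows = self._conn.execute(
            "SELECT site_name FROM shortlist WHERE org_id = ? ORDER BY id", (self._org_id,)
        )
        return [row[0] for row in rows]
```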
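A sketch of the URS-019 role check. The four roles come from the requirement, but the permission names and the matrix itself are hypothetical, since the URS only says each role has defined permissions. The check is meant to run on every request and returns 403 for any action outside the role's set.

```python
from enum import Enum
from http import HTTPStatus


class Role(Enum):
    SUPER_ADMIN = "super_admin"
    ORG_ADMIN = "org_admin"
    USER = "user"
    READ_ONLY = "read_only"


# Hypothetical permission matrix; the URS defines the roles but not their permissions.
PERMISSIONS = {
    Role.SUPER_ADMIN: frozenset({"read", "write", "manage_org_users", "manage_tenants"}),
    Role.ORG_ADMIN: frozenset({"read", "write", "manage_org_users"}),
    Role.USER: frozenset({"read", "write"}),
    Role.READ_ONLY: frozenset({"read"}),
}


def authorize(role: Role, permission: str) -> HTTPStatus:
    """Return 200 OK if the role holds the permission, 403 Forbidden otherwise."""
    return HTTPStatus.OK if permission in PERMISSIONS[role] else HTTPStatus.FORBIDDEN
```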
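URS-020 TOTP codes follow RFC 6238, which applies the RFC 4226 HOTP algorithm (HMAC-SHA-1 with dynamic truncation) to a time-step counter. The sketch below uses only the standard library and assumes the common defaults of a 30-second step, 6 digits, and a tolerance of one step either side for clock drift. Backup codes are not shown.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import struct
import time


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA-1 over the 8-byte counter, then dynamic truncation."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low 4 bits of the last byte select a 4-byte window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)


def totp(secret_b32: str, at: float | None = None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP with counter = floor(unix_time / step)."""
    key = base64.b32decode(secret_b32, casefold=True)
    now = time.time() if at is None else at
    return hotp(key, int(now // step), digits)


def verify_totp(secret_b32: str, code: str, at: float | None = None,
                step: int = 30, digits: int = 6, window: int = 1) -> bool:
    """Accept codes from up to `window` steps before or after `at` (clock drift)."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret_b32, now + drift * step, step, digits), code)
        for drift in range(-window, window + 1)
    )
```

With the RFC 6238 test secret (the ASCII string 12345678901234567890), the 8-digit code at T = 59 s is 94287082, which matches Appendix B of the RFC.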
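The trigger enforcement in URS-021 can be sketched with SQLite, which keeps the example self-contained. The production database (PostgreSQL, given the pgcrypto reference in URS-023) would instead use a trigger function that raises an exception. Table, column, and function names are hypothetical. Only POST, PUT, PATCH, and DELETE requests produce a row.

```python
from __future__ import annotations

import hashlib
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE audit_log (
    id      INTEGER PRIMARY KEY,
    user_id TEXT NOT NULL,
    action  TEXT NOT NULL,
    entity  TEXT NOT NULL,
    ts      TEXT NOT NULL,
    ip_hash TEXT NOT NULL
);
-- Reject every UPDATE and DELETE so rows can only be appended.
CREATE TRIGGER audit_log_no_update BEFORE UPDATE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
CREATE TRIGGER audit_log_no_delete BEFORE DELETE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
"""

STATE_CHANGING = {"POST", "PUT", "PATCH", "DELETE"}


def open_audit_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def log_action(conn: sqlite3.Connection, user_id: str, method: str,
               entity: str, client_ip: str) -> bool:
    """Record state-changing requests only; return True if a row was written."""
    method = method.upper()
    if method not in STATE_CHANGING:
        return False
    ip_hash = hashlib.sha256(client_ip.encode("utf-8")).hexdigest()
    ts = datetime.now(timezone.utc).isoformat()
    with conn:  # commit on success
        conn.execute(
            "INSERT INTO audit_log (user_id, action, entity, ts, ip_hash) VALUES (?, ?, ?, ?, ?)",
            (user_id, method, entity, ts, ip_hash),
        )
    return True
```

A plain SHA-256 of an IPv4 address can be reversed by brute force over the roughly 4.3 billion possible addresses, so a keyed hash (HMAC with a server-side secret) may suit the IP hash field better.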
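A minimal sketch of the URS-022 hash chain: each record's hash covers its canonical JSON payload plus the previous record's hash, so altering an earlier record breaks every later link. The all-zero genesis value and the JSON canonicalization (sorted keys, compact separators) are assumptions not specified in the URS.

```python
from __future__ import annotations

import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed prev_hash for the first record


def compute_hash(payload: dict, prev_hash: str) -> str:
    # Canonical JSON so the same payload always serializes to the same bytes.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_record(chain: list[dict], payload: dict) -> dict:
    prev_hash = chain[-1]["record_hash"] if chain else GENESIS_HASH
    record = {
        "payload": payload,
        "prev_hash": prev_hash,
        "record_hash": compute_hash(payload, prev_hash),
    }
    chain.append(record)
    return record


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link; False means some record's content or linkage no longer matches."""
    expected_prev = GENESIS_HASH
    for record in chain:
        if record["prev_hash"] != expected_prev:
            return False
        if record["record_hash"] != compute_hash(record["payload"], expected_prev):
            return False
        expected_prev = record["record_hash"]
    return True
```

Deleting the most recent records is not detectable from the chain alone; storing the latest record_hash somewhere independent closes that gap.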
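For the URS-024 load test, p95 can be computed from the collected latency samples with the standard library. The choice of the "inclusive" interpolation method is an assumption; the URS does not specify how the percentile is estimated. The helper names are hypothetical.

```python
from __future__ import annotations

import statistics


def p95(latencies_s: list[float]) -> float:
    """95th-percentile latency in seconds; needs at least two samples."""
    # n=100 returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(latencies_s, n=100, method="inclusive")[94]


def meets_urs_024(latencies_s: list[float], threshold_s: float = 5.0) -> bool:
    return p95(latencies_s) < threshold_s
```

For 100 samples evenly spaced from 0.01 s to 1.00 s, p95 interpolates between 0.95 s and 0.96 s and returns 0.9505 s.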
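The URS-025 target translates into a downtime budget: 0.5% of a 30-day window (43,200 minutes) is 216 minutes, or 3.6 hours. The helper names below are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(target: float, window_days: int = 30) -> float:
    """Maximum downtime that still meets an availability target over the window."""
    return (1.0 - target) * window_days * MINUTES_PER_DAY


def measured_availability(downtime_minutes: float, window_days: int = 30) -> float:
    """Fraction of the rolling window during which the service was up."""
    total = window_days * MINUTES_PER_DAY
    return (total - downtime_minutes) / total
```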
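URS-026 and URS-027 together imply that every inference is tied to a registered, pinned model version and logged as hashes rather than raw content. The sketch below assumes an in-memory set standing in for the ai_models table and a list standing in for ai_audit_log. The entry fields follow URS-027; the class and function names and the canonical JSON hashing are assumptions.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


def sha256_json(obj: object) -> str:
    """Hash a JSON-serializable object in canonical form (sorted keys, no spaces)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class AIAuditEntry:
    model_version: str
    input_hash: str
    output_hash: str
    timestamp: str


def log_inference(audit_log: list[AIAuditEntry], registry: set[str],
                  model_version: str, request: dict, response: dict) -> AIAuditEntry:
    # URS-026: reject the floating "latest" alias and any unregistered version.
    if model_version.lower() == "latest" or model_version not in registry:
        raise ValueError(f"unregistered or unpinned model version: {model_version!r}")
    entry = AIAuditEntry(
        model_version=model_version,
        input_hash=sha256_json(request),
        output_hash=sha256_json(response),
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    audit_log.append(entry)
    return entry
```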
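The URS-029 fallback rule can be sketched as a lookup in the requested language first, then in English. The catalog contents are hypothetical, and returning the raw key when even English lacks a label is an added assumption that makes missing translations visible rather than blank.

```python
SUPPORTED_LANGUAGES = ("en", "fr", "de", "es", "it", "pt", "ja", "zh", "ko")
FALLBACK_LANGUAGE = "en"

# Hypothetical message catalogs; real catalogs would hold every UI label.
CATALOGS = {
    "en": {"search.button": "Search", "nav.projects": "Projects"},
    "fr": {"search.button": "Rechercher"},
    "de": {"search.button": "Suchen", "nav.projects": "Projekte"},
}


def translate(key: str, lang: str) -> str:
    if lang not in SUPPORTED_LANGUAGES:
        lang = FALLBACK_LANGUAGE
    localized = CATALOGS.get(lang, {}).get(key)
    if localized is not None:
        return localized
    # Fall back to English, then to the raw key.
    return CATALOGS[FALLBACK_LANGUAGE].get(key, key)
```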