COS-CSV-URS-001 | Approved | v1.0

User Requirements Specification (URS)

Module 1 — Site & Investigator Intelligence

System: ClinicalOS v1.0

Module: Site & Investigator Intelligence

Classification: GAMP 5 Category 5

Intended use: Decision support for site selection

Data classification: Public data only (no patient data)

Regulatory scope: 21 CFR Part 11, GDPR, ICH E6(R3)

This document defines 30 user requirements for ClinicalOS Module 1. Each requirement has a unique ID, priority (Critical/High), acceptance criteria, and rationale. Requirements are grouped by functional category.

ID	Category	Requirement	Priority
URS-001	Data Ingestion	The system shall ingest clinical trial site data from ClinicalTrials.gov via its public REST API (v2). Acceptance: Sites are retrieved, parsed, and stored in the database with source provenance recorded. Minimum 5,000 sites ingested. Rationale: Primary data source for site intelligence. Public data, no patient data involved.	Critical
URS-002	Data Ingestion	The system shall ingest investigator publication data from PubMed via NCBI E-utilities API. Acceptance: Investigators are enriched with h-index and publication count. At least 400 investigators enriched. Rationale: Publication metrics are a key indicator of investigator research output.	Critical
URS-003	Data Ingestion	The system shall support incremental (daily) and full ingestion modes. Acceptance: Daily mode fetches trials updated within a configurable lookback window of 24 to 168 hours. Full mode performs a broad search by conditions and countries. Rationale: Daily updates keep data current; full mode enables a comprehensive initial load.	High
URS-004	Data Ingestion	The system shall record data provenance for each ingestion run (source, date, record count, hash). Acceptance: Each ingestion creates a data_provenance record with a SHA-256 integrity hash, consistent with ALCOA+. Rationale: ALCOA+ requires data to be Attributable, Legible, Contemporaneous, Original, and Accurate, as well as Complete, Consistent, Enduring, and Available.	Critical
URS-005	Site Scoring	The system shall compute a composite site score (0-100) based on 5 weighted dimensions. Acceptance: Score = 100 × (0.30 × Recruitment + 0.25 × Experience + 0.15 × Publications + 0.15 × Infrastructure + 0.15 × Regulatory), where each dimension is normalized to 0-1 and the weights sum to 100%. Rationale: Multi-dimensional scoring provides objective, comparable site evaluation.	Critical
URS-006	Site Scoring	The system shall provide score explainability showing contributing factors and data sources for each dimension. Acceptance: Each score includes breakdown by dimension with source attribution (e.g., 'trial_count from ClinicalTrials.gov: +0.3'). Rationale: RC-3 AI Governance: scores must be explicable and auditable.	Critical
URS-007	Site Scoring	The system shall allow users to customize dimension weights and recalculate scores in real-time. Acceptance: Custom weights applied via API; recalculated score returned within 2 seconds. Rationale: Different trials have different priorities (e.g., oncology vs. rare disease).	High
URS-008	Site Scoring	Scoring shall be deterministic: same input + same model version = same output. Acceptance: Repeated scoring of same site with same data produces identical results (no random variance). Rationale: Reproducibility is a core ALCOA+ and ICH requirement for auditable systems.	Critical
URS-009	Search & Discovery	The system shall support structured search with filters: therapeutic area, phase, country, enrollment status, min trials, min capacity. Acceptance: All filter combinations return correct, tenant-isolated results. Pagination supported. Rationale: Users need to quickly find relevant sites from large datasets.	Critical
URS-010	Search & Discovery	The system shall support natural language (AI-powered) search queries with synonym expansion. Acceptance: The query 'oncology sites in Switzerland' returns relevant results. Synonyms are expanded (e.g., HCC to hepatocellular carcinoma). Rationale: Natural language reduces friction and improves discoverability.	High
URS-011	Search & Discovery	The system shall provide a conversational AI agent for multi-turn site queries. Acceptance: Agent maintains session context, interprets follow-up questions, returns relevant sites with explanations. Rationale: Complex queries benefit from conversational interaction.	High
URS-012	Investigator Profiling	The system shall maintain investigator profiles with affiliation, specialty, h-index, publication count, and trial history. Acceptance: Profiles display all fields. Data sourced from ClinicalTrials.gov and PubMed. Rationale: Investigator selection requires comprehensive profile information.	Critical
URS-013	Investigator Profiling	The system shall support side-by-side comparison of 2-5 investigators. Acceptance: Comparison view shows all metrics in parallel columns for selected investigators. Rationale: Sponsors need to compare candidates during investigator selection.	High
URS-014	Export & Reporting	The system shall export search results and site details in PDF and Excel formats. Acceptance: PDF exports include formatted scores and details. Excel exports include raw data suitable for further analysis. Rationale: Users need to share results with stakeholders and import them into other systems.	Critical
URS-015	Export & Reporting	The system shall support project workspaces with site shortlisting and notes. Acceptance: Users can create projects, add sites to and remove sites from a shortlist, add notes per site, and track selection status. Rationale: Site selection is a workflow, not a single search.	High
URS-016	Recruitment Prediction	The system shall predict patient recruitment rates per site with confidence intervals. Acceptance: Predictions include optimistic/realistic/pessimistic scenarios with confidence score and explanatory factors. Rationale: Recruitment prediction is a key differentiator for site selection decisions.	High
URS-017	Multi-Tenancy	The system shall enforce tenant isolation: each organization sees only its own data. Acceptance: All database queries filtered by org_id. No cross-tenant data leakage. Verified by automated tests. Rationale: GDPR and contractual requirement for B2B SaaS platform.	Critical
URS-018	Authentication & Security	The system shall authenticate users via JWT tokens with 15-minute access and 7-day refresh tokens. Acceptance: Login returns JWT. Access token expires in 15 min. Refresh token extends session up to 7 days. Rationale: Industry-standard session management balancing security and usability.	Critical
URS-019	Authentication & Security	The system shall enforce RBAC with 4 levels: Super Admin, Org Admin, User, Read Only. Acceptance: Each role has defined permissions. Unauthorized actions return 403. Role verified on every request. Rationale: 21 CFR Part 11 and ICH E6 require access control commensurate with responsibility.	Critical
URS-020	Authentication & Security	The system shall support Two-Factor Authentication (TOTP) for all user accounts. Acceptance: Users can enable 2FA. Login requires TOTP code when enabled. Backup codes provided. Rationale: Enhanced security for access to clinical trial data.	High
URS-021	Audit Trail	The system shall log all state-changing actions (POST, PUT, PATCH, DELETE) to an immutable audit trail. Acceptance: Audit log records: user, action, entity, timestamp, IP hash. Records cannot be updated or deleted (trigger enforced). Rationale: 21 CFR Part 11 requires complete, immutable audit trail.	Critical
URS-022	Audit Trail	The system shall maintain a SHA-256 hash chain on audit records for tamper detection. Acceptance: Each audit record has record_hash and prev_hash. Chain integrity verifiable programmatically. Rationale: Hash chain provides cryptographic proof of audit trail integrity.	Critical
URS-023	Data Encryption	The system shall encrypt data in transit (TLS 1.3) and at rest (AES-256). Acceptance: All HTTP traffic uses TLS 1.3. Database uses pgcrypto extension. Sensitive columns encrypted. Rationale: GDPR Article 32 and 21 CFR Part 11 require appropriate technical measures.	Critical
URS-024	Performance	The site search API shall respond within 5 seconds at the 95th percentile (p95) for queries returning up to 500 results. Acceptance: A load test with concurrent users confirms p95 latency below 5 s. Rationale: User experience requires responsive search.	High
URS-025	Performance	The system shall maintain 99.5% availability (SLA). Acceptance: Monitoring confirms uptime over rolling 30-day window. Rationale: Clinical operations require reliable access to site selection tools.	High
URS-026	AI Governance	The system shall maintain an AI model registry tracking model versions, deployment dates, and benchmark scores. Acceptance: ai_models table records every model version used. No 'latest' model references in scoring. Rationale: RC-3 requirement: AI models must be versioned and auditable.	Critical
URS-027	AI Governance	The system shall log every AI inference (input hash, model version, output hash, timestamp) to an AI audit log. Acceptance: ai_audit_log table records all AEGIS API calls with input/output hashes. Rationale: Reproducibility and auditability of AI-driven decisions.	Critical
URS-028	User Interface	The system shall display a legal disclaimer indicating it is a decision support tool, not a GxP system of record. Acceptance: Disclaimer visible in app footer and on /legal/disclaimer page. Badges on non-validated modules. Rationale: RC-1 requirement: clear product classification to avoid liability.	Critical
URS-029	User Interface	The system shall support 9 languages for the user interface (EN, FR, DE, ES, IT, PT, JA, ZH, KO). Acceptance: Language switcher available. All UI labels translated. Fallback to English for missing translations. Rationale: Global clinical trials require multilingual support.	High
URS-030	User Interface	The system shall present sites in three view modes: card grid, sortable table, and geographic map. Acceptance: All three views available and switchable. Map shows markers colored by score. Rationale: Different users prefer different visualization modes for site evaluation.	High

Total requirements: 30 (18 Critical, 12 High)
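For URS-004, the following is a minimal Python sketch of building one provenance record, assuming the raw API response bytes are hashed exactly as received, before any parsing. The `ProvenanceRecord` type, its field names, and the `make_provenance` helper are hypothetical; only the recorded items (source, date, record count, SHA-256 hash) come from the requirement.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    """One data_provenance row (hypothetical field names)."""
    source: str
    ingested_at: str   # ISO 8601 UTC timestamp captured at ingestion time
    record_count: int
    sha256: str        # hex digest of the raw payload as received


def make_provenance(source: str, raw_payload: bytes, record_count: int) -> ProvenanceRecord:
    # Hash the original bytes before parsing so the digest reflects the source data.
    return ProvenanceRecord(
        source=source,
        ingested_at=datetime.now(timezone.utc).isoformat(),
        record_count=record_count,
        sha256=hashlib.sha256(raw_payload).hexdigest(),
    )


rec = make_provenance("ClinicalTrials.gov API v2", b'{"studies": []}', 0)
```

Re-hashing the stored raw payload later and comparing the result with `sha256` gives a simple integrity check for the "Original" and "Accurate" ALCOA+ attributes.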
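For URS-005 and URS-006, a sketch of the composite score with a per-dimension breakdown in points, using the URS-005 weights and assuming each dimension value has already been normalized to 0-1. The dictionary keys and the `composite_score` name are hypothetical; in a full implementation each breakdown entry would also carry its source attribution (URS-006).

```python
from __future__ import annotations

# Weights from URS-005; they sum to 1.0, so a site scoring 1.0 everywhere gets 100.
WEIGHTS = {
    "recruitment": 0.30,
    "experience": 0.25,
    "publications": 0.15,
    "infrastructure": 0.15,
    "regulatory": 0.15,
}


def composite_score(dimensions: dict[str, float],
                    weights: dict[str, float] = WEIGHTS) -> tuple[float, dict[str, float]]:
    """Return the 0-100 site score and each dimension's contribution in points."""
    breakdown: dict[str, float] = {}
    for name, weight in weights.items():
        value = dimensions[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalized to 0-1, got {value}")
        breakdown[name] = 100.0 * weight * value
    return sum(breakdown.values()), breakdown


score, parts = composite_score({
    "recruitment": 0.8, "experience": 0.6, "publications": 0.5,
    "infrastructure": 1.0, "regulatory": 0.9,
})
```

In this example the contributions are 24, 15, 7.5, 15, and 13.5 points, for a total of 75 (floating-point rounding may add trailing digits, so stored scores may be rounded for display).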
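For URS-007, custom weights need validation before a score is recalculated. This sketch assumes, although the requirement does not say so, that a custom weight set must cover the same five dimensions, be non-negative, and sum to 1.0 so recalculated scores stay on the 0-100 scale. `validate_weights` and `rescore` are hypothetical names.

```python
from __future__ import annotations

import math

DIMENSIONS = ("recruitment", "experience", "publications", "infrastructure", "regulatory")


def validate_weights(weights: dict[str, float]) -> dict[str, float]:
    """Return a copy of the weights, or raise ValueError if they break the 0-100 scale."""
    if set(weights) != set(DIMENSIONS):
        raise ValueError(f"weights must cover exactly {DIMENSIONS}")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    # A tolerance is needed because decimal weights may not sum to exactly 1.0 in floating point.
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1.0 (100%)")
    return dict(weights)


def rescore(dimensions: dict[str, float], weights: dict[str, float]) -> float:
    """Recalculate the 0-100 score with validated custom weights."""
    w = validate_weights(weights)
    return 100.0 * sum(w[d] * dimensions[d] for d in DIMENSIONS)
```

Validating on the server before recalculation keeps a malformed weight set from producing a score outside the documented range, and the check itself is cheap relative to the 2-second target.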
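For URS-008, one way to make determinism testable is to store, alongside each score, a fingerprint of the exact inputs and the pinned model version; repeated scoring with the same fingerprint must then return an identical score. This is a sketch under that assumption; `scoring_fingerprint` and the version strings are hypothetical.

```python
import hashlib
import json


def scoring_fingerprint(site_inputs: dict, model_version: str) -> str:
    """SHA-256 over canonical JSON of the scoring inputs plus the model version."""
    # Sorted keys and fixed separators make the bytes independent of dict insertion order.
    canonical = json.dumps(
        {"inputs": site_inputs, "model_version": model_version},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = scoring_fingerprint({"trial_count": 12, "h_index": 31}, "site-score-1.2.0")
b = scoring_fingerprint({"h_index": 31, "trial_count": 12}, "site-score-1.2.0")
```

`a` and `b` are equal because only key order differs. For stored scores to match on re-run, the scoring code itself must also avoid unseeded randomness and order-dependent floating-point summation.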
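For URS-009, together with the URS-017 tenant filter, a sketch of building a parameterized search query. It assumes a PostgreSQL driver that uses `%s` placeholders (as psycopg does) and a hypothetical `sites` table with `org_id`, `trial_count`, `capacity`, `score`, and `id` columns. Column names come only from fixed code, never from user input, and every filter value is passed as a bound parameter. Capping the page size at 500 is an assumption tied to the URS-024 figure.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteSearchFilters:
    therapeutic_area: Optional[str] = None
    phase: Optional[str] = None
    country: Optional[str] = None
    enrollment_status: Optional[str] = None
    min_trials: Optional[int] = None
    min_capacity: Optional[int] = None


def build_site_query(org_id: str, f: SiteSearchFilters,
                     page: int = 1, page_size: int = 50) -> tuple[str, list]:
    if page < 1 or not 1 <= page_size <= 500:
        raise ValueError("invalid pagination parameters")
    # The tenant condition is always present and always first (URS-017).
    clauses = ["org_id = %s"]
    params: list = [org_id]
    equality = {
        "therapeutic_area": f.therapeutic_area,
        "phase": f.phase,
        "country": f.country,
        "enrollment_status": f.enrollment_status,
    }
    for column, value in equality.items():
        if value is not None:
            clauses.append(f"{column} = %s")
            params.append(value)
    if f.min_trials is not None:
        clauses.append("trial_count >= %s")
        params.append(f.min_trials)
    if f.min_capacity is not None:
        clauses.append("capacity >= %s")
        params.append(f.min_capacity)
    # A total ordering (score, then id) keeps pages stable across requests.
    sql = ("SELECT * FROM sites WHERE " + " AND ".join(clauses)
           + " ORDER BY score DESC, id LIMIT %s OFFSET %s")
    params += [page_size, (page - 1) * page_size]
    return sql, params
```

Because the tenant clause is added unconditionally inside the builder, no combination of user filters can produce a query without it, which is the property the URS-017 automated tests would check.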
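For URS-010, a dictionary-based sketch of the synonym-expansion step only; the AI interpretation of the query is outside its scope. The `SYNONYMS` table and the `expand_query` helper are hypothetical, and a real table would be curated and versioned.

```python
from __future__ import annotations

import re

# Hypothetical synonym table: lowercase term -> expansions.
SYNONYMS = {
    "hcc": ["hepatocellular carcinoma"],
    "nsclc": ["non-small cell lung cancer"],
    "oncology": ["cancer", "tumor", "neoplasm"],
}


def expand_query(query: str) -> list[str]:
    """Return the query terms plus their synonyms, lowercased and de-duplicated in order."""
    terms = re.findall(r"[a-z0-9-]+", query.lower())
    expanded: list[str] = []
    for term in terms:
        for candidate in [term, *SYNONYMS.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

For example, `expand_query("HCC sites in Switzerland")` keeps the original terms and adds "hepatocellular carcinoma" right after "hcc", so downstream matching can search on both forms.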
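For URS-018, a standard-library sketch of issuing and verifying HS256-signed JWTs with the 15-minute and 7-day lifetimes. It is meant to show how expiry is enforced; a production system would normally use a maintained JWT library (such as PyJWT) rather than hand-rolled encoding. The `kind` claim and all function names are hypothetical; `sub`, `iat`, and `exp` are standard JWT claims.

```python
import base64
import hashlib
import hmac
import json
import time

TTL_SECONDS = {"access": 15 * 60, "refresh": 7 * 24 * 3600}  # URS-018 lifetimes


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, key: bytes) -> str:
    return _b64url(hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest())


def issue_token(sub: str, kind: str, key: bytes, now=None) -> str:
    now = int(time.time()) if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": sub, "kind": kind, "iat": now, "exp": now + TTL_SECONDS[kind]}
    payload = _b64url(json.dumps(claims).encode())
    return f"{header}.{payload}.{_sign(header + '.' + payload, key)}"


def verify_token(token: str, key: bytes, expected_kind: str, now=None) -> dict:
    now = int(time.time()) if now is None else now
    header, payload, signature = token.split(".")
    # Constant-time comparison; the algorithm is fixed server-side, not read from the header.
    if not hmac.compare_digest(signature, _sign(header + "." + payload, key)):
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(payload))
    if claims["kind"] != expected_kind:
        raise ValueError("wrong token type")
    if now >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

Checking the token type prevents a long-lived refresh token from being presented where a 15-minute access token is expected.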
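For URS-019, a sketch of a per-request authorization check. It assumes the four roles form a strict hierarchy in which each role inherits the permissions of the roles below it, which the requirement does not state; the action names and the `MIN_ROLE` map are hypothetical. Unknown actions are denied by default.

```python
from enum import IntEnum


class Role(IntEnum):
    """The four URS-019 roles; a higher value is assumed to mean broader authority."""
    READ_ONLY = 1
    USER = 2
    ORG_ADMIN = 3
    SUPER_ADMIN = 4


# Hypothetical map from action to the minimum role allowed to perform it.
MIN_ROLE = {
    "site:read": Role.READ_ONLY,
    "project:write": Role.USER,
    "user:manage": Role.ORG_ADMIN,
    "tenant:manage": Role.SUPER_ADMIN,
}


def authorize(role: Role, action: str) -> int:
    """Return 200 if the role may perform the action, else 403 (unknown actions are denied)."""
    required = MIN_ROLE.get(action)
    if required is None or role < required:
        return 403
    return 200
```

Calling this on every request, with the role read from the verified token rather than from client input, matches the "role verified on every request" acceptance criterion.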
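For URS-020, a sketch of TOTP generation and verification following RFC 6238 (HMAC-SHA1, 30-second steps, 6 digits, the common authenticator-app defaults), built on RFC 4226 HOTP. The one-step clock-drift window is an assumption. Shared secrets are usually exchanged base32-encoded and would be decoded (for example with `base64.b32decode`) before use; backup codes are not shown.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over the 8-byte counter, then dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, for_time=None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP over the number of time steps since the Unix epoch."""
    t = time.time() if for_time is None else for_time
    return hotp(secret, int(t // step), digits)


def verify_totp(secret: bytes, code: str, for_time=None, window: int = 1) -> bool:
    """Accept codes from the current 30-second step +/- `window` steps."""
    t = time.time() if for_time is None else for_time
    return any(
        hmac.compare_digest(totp(secret, t + k * 30), code)
        for k in range(-window, window + 1)
    )
```

With the RFC 6238 test secret `12345678901234567890` at T = 59 s, `totp(secret, 59, digits=8)` returns `94287082`, matching the RFC's published SHA-1 test vector.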
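For URS-021, a sketch of trigger-enforced immutability. It uses SQLite from Python's standard library so it runs on its own; in PostgreSQL the equivalent is a trigger function that raises an exception, attached `BEFORE UPDATE OR DELETE ... FOR EACH ROW`. The table and column names are hypothetical. Because privileged database roles can drop triggers, this control is normally paired with restricted database privileges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE audit_log (
        id INTEGER PRIMARY KEY,
        user_id TEXT NOT NULL,
        action TEXT NOT NULL,
        entity TEXT NOT NULL,
        ts TEXT NOT NULL,
        ip_hash TEXT NOT NULL
    );
    -- Both triggers abort the statement, making existing rows immutable.
    CREATE TRIGGER audit_no_update BEFORE UPDATE ON audit_log
    BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
    CREATE TRIGGER audit_no_delete BEFORE DELETE ON audit_log
    BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
    """
)
conn.execute(
    "INSERT INTO audit_log (user_id, action, entity, ts, ip_hash) VALUES (?, ?, ?, ?, ?)",
    ("user-1", "PATCH", "project/42", "2024-01-01T00:00:00+00:00", "ab12cd34"),
)


def is_blocked(sql: str) -> bool:
    """Return True if the statement was rejected by a trigger."""
    try:
        conn.execute(sql)
    except sqlite3.DatabaseError:
        return True
    return False
```

Inserts still succeed, so the application can append new audit records while updates and deletes of existing ones are rejected by the database itself.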
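For URS-022, a sketch of a SHA-256 hash chain in which each record's `record_hash` covers its own content plus the previous record's hash, so changing any earlier record breaks every later link. The all-zero genesis value, the canonical-JSON serialization, and the function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # prev_hash of the first record (assumed convention)


def record_hash(record: dict, prev_hash: str) -> str:
    # Canonical JSON so the same record always serializes to the same bytes.
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, record: dict) -> None:
    prev = chain[-1]["record_hash"] if chain else GENESIS
    chain.append({"data": record, "prev_hash": prev, "record_hash": record_hash(record, prev)})


def verify(chain: list) -> bool:
    """Recompute every link; any edited, reordered, or removed middle record fails."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev or entry["record_hash"] != record_hash(entry["data"], prev):
            return False
        prev = entry["record_hash"]
    return True
```

A chain detects edits and deletions in the middle but not removal of the newest records; periodically anchoring the latest `record_hash` outside the database closes that gap.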
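For URS-024, load-test results can be checked against the 5-second target with a nearest-rank 95th percentile. Nearest-rank is one of several percentile definitions, and load-testing tools that interpolate may report slightly different values. `p95` and `meets_urs_024` are hypothetical names.

```python
from __future__ import annotations


def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile: smallest sample with at least 95% of samples at or below it."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) using integer arithmetic
    return ordered[rank - 1]


def meets_urs_024(latencies_ms: list[float], limit_ms: float = 5000.0) -> bool:
    """True if the p95 latency is strictly below the limit."""
    return p95(latencies_ms) < limit_ms
```

Integer arithmetic for the rank avoids floating-point surprises in `0.95 * n` near whole numbers.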
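For URS-025, the 99.5% target translates into a downtime budget: 0.5% of a 30-day window (43,200 minutes) is 216 minutes, or 3.6 hours. A sketch of that arithmetic and of measured availability over the same window, with hypothetical function names:

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(sla: float = 0.995, window_days: int = 30) -> float:
    """Maximum downtime over the rolling window that still meets the SLA."""
    return (1.0 - sla) * window_days * MINUTES_PER_DAY


def measured_availability(downtime_minutes: float, window_days: int = 30) -> float:
    """Fraction of the window during which the service was up."""
    total = window_days * MINUTES_PER_DAY
    return (total - downtime_minutes) / total
```

Monitoring can alert when cumulative downtime in the rolling window approaches the 216-minute budget rather than only after the SLA is breached.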
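For URS-026, a sketch of resolving a model reference against the registry so that only exact, registered versions can be used in scoring. It assumes semantic-version strings and treats `latest` and similar floating aliases as errors; the registry entry uses placeholder values, not real deployment or benchmark data, and all names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    name: str
    version: str
    deployed_on: str
    benchmark_score: float


# Placeholder registry contents keyed by (name, exact version).
REGISTRY = {
    ("site-scorer", "1.2.0"): ModelEntry("site-scorer", "1.2.0", "YYYY-MM-DD", 0.0),
}
FLOATING_ALIASES = {"latest", "stable", "default"}
PINNED_VERSION = re.compile(r"\d+\.\d+\.\d+")


def resolve_model(name: str, version: str) -> ModelEntry:
    """Return the registry entry for an exact version; floating aliases are rejected."""
    if version.lower() in FLOATING_ALIASES or not PINNED_VERSION.fullmatch(version):
        raise ValueError(f"model version must be pinned, got {version!r}")
    entry = REGISTRY.get((name, version))
    if entry is None:
        raise LookupError(f"{name} {version} is not in the model registry")
    return entry
```

Failing loudly on unpinned references keeps the model version recorded with each score meaningful for the URS-008 reproducibility check.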
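For URS-029, a sketch of UI label lookup with the English fallback described in the acceptance criteria. The message catalogs and key names are hypothetical, and the further fallback to the key itself is an assumption for labels missing even in English.

```python
SUPPORTED = ("en", "fr", "de", "es", "it", "pt", "ja", "zh", "ko")

# Hypothetical message catalogs; only a few keys shown.
CATALOGS = {
    "en": {"search.button": "Search", "site.score": "Site score"},
    "fr": {"search.button": "Rechercher"},
}


def translate(key: str, lang: str) -> str:
    """Look up a UI label, falling back to English, then to the key itself."""
    lang = lang.lower().split("-")[0]  # e.g. "fr-CH" -> "fr"
    if lang not in SUPPORTED:
        lang = "en"
    return CATALOGS.get(lang, {}).get(key) or CATALOGS["en"].get(key, key)
```

A supported language with an incomplete catalog (here, French lacking "site.score", or Japanese with no catalog yet) still renders a usable English label instead of an empty string.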

Document approval: Approved by Quality Assurance

Next review: Upon major feature change or annually