COS-CSV-URS-001 | Approved | v1.0

User Requirements Specification (URS)

Module 1 — Site & Investigator Intelligence

System: ClinicalOS v1.0
Module: Site & Investigator Intelligence
Classification: GAMP 5 Category 5
Intended use: Decision support for site selection
Data classification: Public data only (no patient data)
Regulatory scope: 21 CFR Part 11, GDPR, ICH E6(R3)

This document defines 30 user requirements for ClinicalOS Module 1. Each requirement has a unique ID, priority (Critical/High), acceptance criteria, and rationale. Requirements are grouped by functional category.

ID | Category | Requirement | Priority
URS-001 | Data Ingestion

The system shall ingest clinical trial site data from ClinicalTrials.gov via its public REST API (v2).

Acceptance: Sites are retrieved, parsed, and stored in the database with source provenance recorded. Minimum 5,000 sites ingested.

Rationale: Primary data source for site intelligence. Public data, no patient data involved.

Critical
URS-002 | Data Ingestion

The system shall ingest investigator publication data from PubMed via NCBI E-utilities API.

Acceptance: Investigators are enriched with h-index and publication count. At least 400 investigators enriched.

Rationale: Publication metrics are a key indicator of investigator research output.

Critical
URS-003 | Data Ingestion

The system shall support incremental (daily) and full ingestion modes.

Acceptance: Daily mode fetches trials updated within the last 24 to 168 hours. Full mode performs a broad search by condition and country.

Rationale: Daily updates keep data current; full mode enables comprehensive initial load.

High
URS-004 | Data Ingestion

The system shall record data provenance for each ingestion run (source, date, record count, hash).

Acceptance: Each ingestion creates a data_provenance record with SHA-256 integrity hash. ALCOA+ compliant.

Rationale: ALCOA+ requirement: data must be Attributable, Legible, Contemporaneous, Original, and Accurate, plus Complete, Consistent, Enduring, and Available.

Critical
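As a minimal sketch of the provenance record URS-004 describes, the Python below hashes a canonical JSON serialization of the ingested records with SHA-256. The function name, the field names (`source`, `ingested_at`, `record_count`, `sha256`), and the sample record are illustrative assumptions, not the actual `data_provenance` schema.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_provenance_record(source, records):
    """Build a provenance entry for one ingestion run (hypothetical field names)."""
    # Sorted keys and fixed separators make the serialization canonical, so the
    # same payload always produces the same bytes and therefore the same hash.
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return {
        "source": source,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "record_count": len(records),
        "sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }


# Made-up sample record, for illustration only.
sample = [{"nct_id": "NCT00000000", "facility": "Example Hospital"}]
record = build_provenance_record("ClinicalTrials.gov", sample)
```

Hashing a canonical form rather than the raw response means the hash can be recomputed later from the stored records to check integrity, regardless of key order.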
URS-005 | Site Scoring

The system shall compute a composite site score (0-100) based on 5 weighted dimensions.

Acceptance: Score = Recruitment (30%) + Experience (25%) + Publications (15%) + Infrastructure (15%) + Regulatory (15%). Each dimension is normalized to 0-1, and the weighted sum is scaled to 0-100.

Rationale: Multi-dimensional scoring provides objective, comparable site evaluation.

Critical
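The weighted sum in URS-005 can be sketched as follows. The weights come from the acceptance criterion. The function name, the dictionary keys, the rounding to two decimals, and the example dimension values are assumptions made for illustration. It also assumes the weighted sum is scaled by 100 to reach the 0-100 range.

```python
DEFAULT_WEIGHTS = {
    "recruitment": 0.30,
    "experience": 0.25,
    "publications": 0.15,
    "infrastructure": 0.15,
    "regulatory": 0.15,
}


def composite_score(dimensions, weights=DEFAULT_WEIGHTS):
    """Return a 0-100 score from dimension values normalized to 0-1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    for name in weights:
        if not 0.0 <= dimensions[name] <= 1.0:
            raise ValueError(f"dimension {name!r} must be within 0-1")
    weighted_sum = sum(weights[name] * dimensions[name] for name in weights)
    return round(100 * weighted_sum, 2)


# Hypothetical site: 0.24 + 0.15 + 0.075 + 0.105 + 0.135 = 0.705, scaled to 70.5.
example_score = composite_score({
    "recruitment": 0.8,
    "experience": 0.6,
    "publications": 0.5,
    "infrastructure": 0.7,
    "regulatory": 0.9,
})
```

Because the function is pure (no randomness, clock, or hidden state), repeated calls with the same inputs return the same score, and a caller can pass a different `weights` mapping to recalculate.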
URS-006 | Site Scoring

The system shall provide score explainability showing contributing factors and data sources for each dimension.

Acceptance: Each score includes breakdown by dimension with source attribution (e.g., 'trial_count from ClinicalTrials.gov: +0.3').

Rationale: RC-3 AI Governance: scores must be explicable and auditable.

Critical
URS-007 | Site Scoring

The system shall allow users to customize dimension weights and recalculate scores in real-time.

Acceptance: Custom weights applied via API; recalculated score returned within 2 seconds.

Rationale: Different trials have different priorities (e.g., oncology vs. rare disease).

High
URS-008 | Site Scoring

Scoring shall be deterministic: same input + same model version = same output.

Acceptance: Repeated scoring of same site with same data produces identical results (no random variance).

Rationale: Reproducibility is a core ALCOA+ and ICH requirement for auditable systems.

Critical
URS-009 | Search & Discovery

The system shall support structured search with filters: therapeutic area, phase, country, enrollment status, min trials, min capacity.

Acceptance: All filter combinations return correct, tenant-isolated results. Pagination supported.

Rationale: Users need to quickly find relevant sites from large datasets.

Critical
URS-010 | Search & Discovery

The system shall support natural language (AI-powered) search queries with synonym expansion.

Acceptance: Query 'oncology sites in Switzerland' returns relevant results. Synonyms expanded (e.g., HCC to hepatocellular).

Rationale: Natural language reduces friction and improves discoverability.

High
URS-011 | Search & Discovery

The system shall provide a conversational AI agent for multi-turn site queries.

Acceptance: Agent maintains session context, interprets follow-up questions, returns relevant sites with explanations.

Rationale: Complex queries benefit from conversational interaction.

High
URS-012 | Investigator Profiling

The system shall maintain investigator profiles with affiliation, specialty, h-index, publication count, and trial history.

Acceptance: Profiles display all fields. Data sourced from ClinicalTrials.gov and PubMed.

Rationale: Investigator selection requires comprehensive profile information.

Critical
URS-013 | Investigator Profiling

The system shall support side-by-side comparison of 2-5 investigators.

Acceptance: Comparison view shows all metrics in parallel columns for selected investigators.

Rationale: Sponsors need to compare candidates during investigator selection.

High
URS-014 | Export & Reporting

The system shall export search results and site details as PDF and Excel formats.

Acceptance: PDF includes formatted scores and details. Excel includes raw data suitable for further analysis.

Rationale: Users need to share results with stakeholders and import into other systems.

Critical
URS-015 | Export & Reporting

The system shall support project workspaces with site shortlisting and notes.

Acceptance: Users can create projects, add sites to or remove them from a shortlist, add notes per site, and track selection status.

Rationale: Site selection is a workflow, not a single search.

High
URS-016 | Recruitment Prediction

The system shall predict patient recruitment rates per site with confidence intervals.

Acceptance: Predictions include optimistic/realistic/pessimistic scenarios with confidence score and explanatory factors.

Rationale: Recruitment prediction is a key differentiator for site selection decisions.

High
URS-017 | Multi-Tenancy

The system shall enforce tenant isolation: each organization sees only its own data.

Acceptance: All database queries filtered by org_id. No cross-tenant data leakage. Verified by automated tests.

Rationale: GDPR and contractual requirement for B2B SaaS platform.

Critical
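The `org_id` filtering in URS-017 can be illustrated with a small sketch. It uses Python's built-in `sqlite3` as a stand-in for the production database, and the `sites` table, its columns, and the `list_sites` helper are hypothetical.

```python
import sqlite3


def list_sites(conn, org_id):
    """Return site names visible to one tenant; org_id is always bound as a parameter."""
    rows = conn.execute(
        "SELECT name FROM sites WHERE org_id = ? ORDER BY name",
        (org_id,),
    ).fetchall()
    return [row[0] for row in rows]


# In-memory stand-in database with two tenants (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sites (id INTEGER PRIMARY KEY, org_id TEXT NOT NULL, name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO sites (org_id, name) VALUES (?, ?)",
    [("org_a", "Site A1"), ("org_a", "Site A2"), ("org_b", "Site B1")],
)

org_a_sites = list_sites(conn, "org_a")
```

An automated isolation test of the kind the acceptance criterion mentions would assert that no row belonging to `org_b` ever appears in a result fetched for `org_a`.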
URS-018 | Authentication & Security

The system shall authenticate users via JWT, with 15-minute access tokens and 7-day refresh tokens.

Acceptance: Login returns JWT. Access token expires in 15 min. Refresh token extends session up to 7 days.

Rationale: Industry-standard session management balancing security and usability.

Critical
URS-019 | Authentication & Security

The system shall enforce RBAC with 4 levels: Super Admin, Org Admin, User, Read Only.

Acceptance: Each role has defined permissions. Unauthorized actions return 403. Role verified on every request.

Rationale: 21 CFR Part 11 and ICH E6(R3) require access control commensurate with responsibility.

Critical
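A per-request role check for URS-019 might look like the sketch below. The four role names come from the requirement. The strictly hierarchical ordering, the action names, and the minimum-role table are assumptions made for illustration, since the source only states that each role has defined permissions.

```python
from enum import IntEnum


class Role(IntEnum):
    # Assumed ordering: a higher value inherits every lower role's permissions.
    READ_ONLY = 1
    USER = 2
    ORG_ADMIN = 3
    SUPER_ADMIN = 4


# Hypothetical actions mapped to the minimum role allowed to perform them.
REQUIRED_ROLE = {
    "view_site": Role.READ_ONLY,
    "edit_shortlist": Role.USER,
    "manage_users": Role.ORG_ADMIN,
    "manage_tenants": Role.SUPER_ADMIN,
}


def authorize(role, action):
    """Return an HTTP status: 200 if the role may perform the action, else 403."""
    required = REQUIRED_ROLE.get(action)
    # Unknown actions are denied by default rather than allowed.
    if required is None or role < required:
        return 403
    return 200
```

Calling `authorize` on every request, rather than caching the decision at login, matches the "role verified on every request" criterion.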
URS-020 | Authentication & Security

The system shall support Two-Factor Authentication (TOTP) for all user accounts.

Acceptance: Users can enable 2FA. Login requires TOTP code when enabled. Backup codes provided.

Rationale: Enhanced security for access to clinical trial data.

High
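For reference, the TOTP codes in URS-020 can be computed with the standard library alone. The sketch below follows RFC 6238 with HMAC-SHA-1, a 30-second step, and 6 digits. These are common defaults and are assumed here, since the requirement does not state which parameters ClinicalOS uses. The function names are illustrative.

```python
import hashlib
import hmac
import struct
import time


def totp(secret, at=None, step=30, digits=6):
    """Compute an RFC 6238 TOTP code for the given shared secret (bytes)."""
    now = time.time() if at is None else at
    counter = int(now // step)
    # HOTP (RFC 4226): HMAC over the 8-byte big-endian counter.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte selects a 4-byte window.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def verify_totp(secret, code, at=None):
    """Compare a submitted code in constant time against the current code."""
    return hmac.compare_digest(totp(secret, at), code)
```

A production verifier would typically also accept the adjacent time step to tolerate clock drift and would reject reuse of an already accepted code. Both are omitted here for brevity.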
URS-021 | Audit Trail

The system shall log all state-changing actions (POST, PUT, PATCH, DELETE) to an immutable audit trail.

Acceptance: Audit log records: user, action, entity, timestamp, IP hash. Records cannot be updated or deleted (trigger enforced).

Rationale: 21 CFR Part 11 requires complete, immutable audit trail.

Critical
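The trigger-enforced immutability in URS-021 can be demonstrated with a short sketch. It uses SQLite through Python's `sqlite3` module as a stand-in for the production database, and the table layout, trigger names, and sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY,
    user_id TEXT NOT NULL,
    action TEXT NOT NULL,
    entity TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Reject every UPDATE and DELETE so the table is append-only.
CREATE TRIGGER audit_log_no_update BEFORE UPDATE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
CREATE TRIGGER audit_log_no_delete BEFORE DELETE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
""")
conn.execute(
    "INSERT INTO audit_log (user_id, action, entity) VALUES (?, ?, ?)",
    ("user-1", "POST", "project/42"),
)


def statement_succeeds(sql):
    """Return True if the statement ran, False if the database rejected it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False
```

Enforcing the rule in the database rather than in application code means that a bug or a direct SQL session cannot silently alter history. A database superuser can still drop a trigger, which is one reason to pair it with tamper detection.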
URS-022 | Audit Trail

The system shall maintain a SHA-256 hash chain on audit records for tamper detection.

Acceptance: Each audit record has record_hash and prev_hash. Chain integrity verifiable programmatically.

Rationale: Hash chain provides cryptographic proof of audit trail integrity.

Critical
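A hash chain of the kind URS-022 describes can be sketched in a few lines. The `record_hash` and `prev_hash` names come from the acceptance criterion. The all-zero genesis value, the payload layout, and the choice to hash `prev_hash` concatenated with canonical JSON are assumptions for illustration.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed prev_hash for the first record


def compute_hash(payload, prev_hash):
    """SHA-256 over the previous hash plus a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(chain, payload):
    """Append a record linked to the current tail of the chain."""
    prev_hash = chain[-1]["record_hash"] if chain else GENESIS_HASH
    chain.append({
        "payload": payload,
        "prev_hash": prev_hash,
        "record_hash": compute_hash(payload, prev_hash),
    })


def verify_chain(chain):
    """Recompute every hash in order; any edit, insertion, or removal breaks a link."""
    prev_hash = GENESIS_HASH
    for record in chain:
        if record["prev_hash"] != prev_hash:
            return False
        if record["record_hash"] != compute_hash(record["payload"], prev_hash):
            return False
        prev_hash = record["record_hash"]
    return True


audit_chain = []
append_record(audit_chain, {"user": "user-1", "action": "POST", "entity": "project/42"})
append_record(audit_chain, {"user": "user-2", "action": "PATCH", "entity": "site/7"})
append_record(audit_chain, {"user": "user-1", "action": "DELETE", "entity": "note/3"})
```

Because each hash covers the previous one, changing any earlier record invalidates every later `record_hash`, so verification only needs a single pass from the first record.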
URS-023 | Data Encryption

The system shall encrypt data in transit (TLS 1.3) and at rest (AES-256).

Acceptance: All HTTP traffic uses TLS 1.3. Database uses pgcrypto extension. Sensitive columns encrypted.

Rationale: GDPR Article 32 and 21 CFR Part 11 require appropriate technical measures.

Critical
URS-024 | Performance

Site search API shall respond within 5 seconds (p95) for queries returning up to 500 results.

Acceptance: Load test confirms p95 latency < 5s with concurrent users.

Rationale: User experience requires responsive search.

High
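To make the p95 criterion in URS-024 concrete, the sketch below computes a 95th percentile with the nearest-rank method and compares it with the 5-second limit. The sample latencies are synthetic, and the choice of nearest-rank is an assumption, since load-testing tools may interpolate differently.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile: the smallest value with at least 95% of samples at or below it."""
    if not latencies_ms:
        raise ValueError("at least one sample is required")
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


LIMIT_MS = 5000  # 5 seconds, from the requirement

# Synthetic sample: 100 responses from 50 ms to 5000 ms in 50 ms increments.
samples_ms = list(range(50, 5050, 50))
observed_p95 = p95(samples_ms)
meets_requirement = observed_p95 < LIMIT_MS
```

With this sample the slowest 5% of requests take longer than the observed p95, which is why a p95 target tolerates a small tail of slow responses that a maximum-latency target would not.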
URS-025 | Performance

The system shall maintain 99.5% availability (SLA).

Acceptance: Monitoring confirms at least 99.5% uptime over a rolling 30-day window.

Rationale: Clinical operations require reliable access to site selection tools.

High
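It can help to translate the availability target in URS-025 into a downtime budget. The short calculation below assumes availability is measured as uptime minutes divided by total minutes in the window. The function name is illustrative.

```python
def downtime_budget_minutes(availability_pct, window_days=30):
    """Maximum downtime, in minutes, that still meets the availability target."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100.0 - availability_pct) / 100.0


# 30 days = 43,200 minutes; 0.5% of that is 216 minutes (3 hours 36 minutes).
budget = downtime_budget_minutes(99.5)
```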
URS-026 | AI Governance

The system shall maintain an AI model registry tracking model versions, deployment dates, and benchmark scores.

Acceptance: ai_models table records every model version used. No 'latest' model references in scoring.

Rationale: RC-3 requirement: AI models must be versioned and auditable.

Critical
URS-027 | AI Governance

The system shall log every AI inference (input hash, model version, output hash, timestamp) to an AI audit log.

Acceptance: ai_audit_log table records all AEGIS API calls with input/output hashes.

Rationale: Reproducibility and auditability of AI-driven decisions.

Critical
URS-028 | User Interface

The system shall display a legal disclaimer indicating it is a decision support tool, not a GxP system of record.

Acceptance: Disclaimer visible in app footer and on /legal/disclaimer page. Badges on non-validated modules.

Rationale: RC-1 requirement: clear product classification to avoid liability.

Critical
URS-029 | User Interface

The system shall support 9 languages for the user interface (EN, FR, DE, ES, IT, PT, JA, ZH, KO).

Acceptance: Language switcher available. All UI labels translated. Fallback to English for missing translations.

Rationale: Global clinical trials require multilingual support.

High
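The English fallback in URS-029 amounts to a two-step lookup, sketched below. The translation keys, the sample strings, and the decision to return the key itself when even English is missing are assumptions for illustration.

```python
# Hypothetical translation tables; French is deliberately incomplete.
TRANSLATIONS = {
    "en": {"search.button": "Search", "export.pdf": "Export as PDF"},
    "fr": {"search.button": "Rechercher"},
}


def translate(key, lang):
    """Look up a UI label in the requested language, then fall back to English."""
    for table in (TRANSLATIONS.get(lang, {}), TRANSLATIONS["en"]):
        if key in table:
            return table[key]
    # Last resort: show the key so a missing label is visible rather than blank.
    return key
```

For example, `translate("export.pdf", "fr")` returns the English label because the French table has no entry for that key.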
URS-030 | User Interface

The system shall present sites in three view modes: card grid, sortable table, and geographic map.

Acceptance: All three views available and switchable. Map shows markers colored by score.

Rationale: Different users prefer different visualization modes for site evaluation.

High

Total requirements: 30 (18 Critical, 12 High)

Document approval: Validated by Quality Assurance

Next review: Upon major feature change or annually