COS-CSV-FS-001 | Approved | v1.0

Functional Specification (FS)

Module 1 — Site & Investigator Intelligence

This document describes how each URS requirement is implemented in ClinicalOS. Each specification maps 1:1 to a URS requirement, detailing data flows, algorithms, API contracts, and source file locations.

FS-001←URS-001

ClinicalTrials.gov Ingestion

REST API v2 client sends search queries filtered by condition, country, and date range. Response parsed into Site model fields (name, institution, city, country, specialties, trial counts). Each site matched by external_id (NCT number) for deduplication. Provenance record created per run.

Source: services/pipeline/ingestion.py, services/daily_ingestion.py

FS-002←URS-002

PubMed Enrichment

NCBI E-utilities API queried by investigator name + affiliation. Retrieves publication count and h-index. Results stored in investigators table. Enrichment is idempotent (re-running updates existing records).

Source: scripts/enrich_specialties.py, services/pipeline/entity_resolver.py

FS-003←URS-003

Ingestion Modes (Daily/Full)

Daily mode: queries trials updated in last 24-168 hours, max 10,000 studies. Full mode: broad search by configurable conditions and countries list, includes optional PubMed enrichment (max 50 investigators). Mode selected via POST /api/v1/ingestion/trigger body parameter.

Source: api/v1/ingestion.py, services/daily_ingestion.py

FS-004←URS-004

Data Provenance Recording

Each ingestion run inserts a record into data_provenance table: source URL, ingestion timestamp, API version, record count, SHA-256 hash of raw response, transformation log (raw > clean > enriched > scored), quality score. Complies with ALCOA+ requirements.

Source: models/data_provenance.py, services/pipeline/ingestion.py
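
A minimal sketch of how a provenance record with a SHA-256 digest of the raw response might be assembled. The `ProvenanceRecord` dataclass and `build_provenance` helper are hypothetical names used for illustration only; the actual schema lives in models/data_provenance.py and also carries a quality score, which is omitted here.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    # Hypothetical shape mirroring some of the fields listed in FS-004.
    source_url: str
    api_version: str
    record_count: int
    raw_sha256: str
    ingested_at: datetime
    transformation_log: list = field(default_factory=list)


def build_provenance(source_url: str, api_version: str,
                     raw_response: bytes, record_count: int) -> ProvenanceRecord:
    """Hash the raw response bytes and capture the run metadata."""
    return ProvenanceRecord(
        source_url=source_url,
        api_version=api_version,
        record_count=record_count,
        raw_sha256=hashlib.sha256(raw_response).hexdigest(),
        ingested_at=datetime.now(timezone.utc),
        transformation_log=["raw", "clean", "enriched", "scored"],
    )
```

Hashing the bytes exactly as received, before any parsing, lets a later reviewer confirm that a stored raw payload is the one the run actually processed.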

FS-005←URS-005

5-Dimension Composite Scoring

Score formula: total = ((recruitment * 0.30) + (experience * 0.25) + (publications * 0.15) + (infrastructure * 0.15) + (regulatory * 0.15)) * 100. The five weights sum to 1.0, so total ranges from 0 to 100 when each dimension is in 0-1. Recruitment: log(1+total_trials)/log(501) + active bonus. Experience: breadth(30%) + active(40%) + total(30%). Publications: h-index(60%) + pub_count(40%). Infrastructure: capacity tier + institution type bonus. Regulatory: country tier + experience modifier.

Source: services/pipeline/scoring.py
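
The weighted sum can be sketched as follows, assuming each dimension value is already normalized to 0-1 and that the factor of 100 applies to the whole weighted sum. The function names are hypothetical, and the cap at 1.0 in `recruitment_base` is an assumption; the active bonus and the other sub-formulas are not reproduced because the specification does not give their exact values.

```python
import math
from typing import Optional

# Default weights from FS-005; they sum to 1.0.
DEFAULT_WEIGHTS = {
    "recruitment": 0.30,
    "experience": 0.25,
    "publications": 0.15,
    "infrastructure": 0.15,
    "regulatory": 0.15,
}


def recruitment_base(total_trials: int) -> float:
    """log(1 + total_trials) / log(501); the cap at 1.0 is an assumption."""
    return min(1.0, math.log(1 + total_trials) / math.log(501))


def composite_score(dimensions: dict, weights: Optional[dict] = None) -> float:
    """Weighted sum of the five dimension values, scaled to 0-100."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    weighted = sum(dimensions[name] * w for name, w in weights.items())
    return weighted * 100
```

With this normalization a site with 500 total trials reaches the maximum recruitment base, and each additional trial contributes less than the one before it.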

FS-006←URS-006

Score Explainability

API endpoint /api/v1/sites/{id}/score/explain returns: each dimension value with contributing factors (e.g., trial_count: +0.3), data source attribution (ClinicalTrials.gov, PubMed), confidence level, and human-readable explanation text. Frontend renders radar chart + factor table.

Source: api/v1/sites.py (explain endpoint), components/scoring/score-explainability.tsx

FS-007←URS-007

Custom Weight Scoring

POST /api/v1/sites/{id}/score/customize accepts custom weights object (5 floats summing to 1.0). Scoring service recalculates with provided weights. Returns new composite score. Does not persist custom weights (stateless recalculation).

Source: api/v1/sites.py (customize endpoint), services/pipeline/scoring.py
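
A minimal sketch of the weight validation and stateless recalculation described above. The helper names, the floating-point tolerance, and the non-negativity check are assumptions; the specification only states that the five floats must sum to 1.0.

```python
import math

DIMENSIONS = ("recruitment", "experience", "publications",
              "infrastructure", "regulatory")


def validate_weights(weights: dict, tol: float = 1e-6) -> dict:
    """Return the weights unchanged if valid; raise ValueError otherwise."""
    if set(weights) != set(DIMENSIONS):
        raise ValueError(f"expected exactly these keys: {DIMENSIONS}")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")  # assumption
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=tol):
        raise ValueError("weights must sum to 1.0")
    return weights


def rescore(dimensions: dict, weights: dict) -> float:
    """Stateless recalculation: nothing is stored, the score is only returned."""
    validate_weights(weights)
    return 100 * sum(dimensions[d] * weights[d] for d in DIMENSIONS)
```

Comparing the sum with a tolerance rather than with `==` avoids rejecting inputs such as five weights of 0.2, whose floating-point sum may not be exactly 1.0.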

FS-008←URS-008

Deterministic Scoring

All random operations use seeded Random instances: random.Random(hash(site_id) + offset). No global random state mutation. AEGIS ML scoring uses temperature=0. Rule-based fallback is purely mathematical (no stochastic components).

Source: services/pipeline/scoring.py
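
One caveat applies when site_id is a string: Python's built-in hash() for str is salted per process unless PYTHONHASHSEED is fixed, so a seed derived from it can differ between runs. The sketch below swaps in a SHA-256 digest to derive the seed instead; this is an illustrative alternative, not the method in services/pipeline/scoring.py, and `seeded_rng` is a hypothetical name.

```python
import hashlib
import random


def seeded_rng(site_id: str, offset: int = 0) -> random.Random:
    """Return a private Random instance seeded from a stable digest of site_id."""
    digest = hashlib.sha256(site_id.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big") + offset
    # A dedicated instance leaves the module-level random state untouched.
    return random.Random(seed)
```

Two calls with the same site_id and offset yield identical sequences in any process, which is what reproducible scoring requires.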

FS-009←URS-009

Structured Search with Filters

POST /api/v1/sites/search accepts: therapeutic_area, phase, countries[], enrollment_status, min_trials, min_capacity, keyword. SQL query built dynamically with AND logic. Results paginated (page/size or cursor-based). All queries filtered by org_id from JWT.

Source: api/v1/sites.py, services/site_service.py
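
The dynamic AND-composed query can be sketched as below, using plain parameterized SQL for brevity. The function name and the column names (country, total_trials, name) are hypothetical, and only three of the listed filters are shown; the real service may build the query through SQLAlchemy instead.

```python
from typing import Any


def build_site_query(org_id: str, filters: dict) -> tuple:
    """Build a parameterized SELECT; every supplied filter is ANDed together."""
    clauses = ["org_id = ?"]  # tenant filter is always present
    params: list = [org_id]

    if filters.get("countries"):
        placeholders = ", ".join("?" for _ in filters["countries"])
        clauses.append(f"country IN ({placeholders})")
        params.extend(filters["countries"])
    if filters.get("min_trials") is not None:
        clauses.append("total_trials >= ?")
        params.append(filters["min_trials"])
    if filters.get("keyword"):
        clauses.append("name LIKE ?")
        params.append(f"%{filters['keyword']}%")

    sql = "SELECT * FROM sites WHERE " + " AND ".join(clauses)
    return sql, params
```

User-supplied values only ever travel as bound parameters, so the generated SQL text depends on which filters are present and never on their content.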

FS-010←URS-010

AI-Powered Natural Language Search

User query sent to AEGIS AI Agent API for intent extraction. Agent returns structured filters (name_contains, specialty_contains, countries, institution_contains, min_trials). Synonym expansion maps medical terms (e.g., HCC to hepatocellular, liver_cancer). 18 canonical therapeutic areas with synonym dictionaries. Results cached 5 minutes by query hash.

Source: services/smart_search.py
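
A minimal sketch of synonym expansion combined with a 5-minute cache keyed by a hash of the query. The one-entry dictionary, the in-process dict cache, and the function names are illustrative assumptions; the specification does not say where the cache lives or how the key is normalized.

```python
import hashlib
import time
from typing import Optional

# Tiny illustrative dictionary; the real one covers 18 therapeutic areas.
SYNONYMS = {"hcc": ["hepatocellular", "liver_cancer"]}

CACHE_TTL_SECONDS = 300  # 5 minutes, per FS-010
_cache: dict = {}


def expand_terms(query: str) -> list:
    """Lower-case the query, split on whitespace, and append known synonyms."""
    terms: list = []
    for token in query.lower().split():
        terms.append(token)
        terms.extend(SYNONYMS.get(token, []))
    return terms


def cached_expand(query: str, now: Optional[float] = None) -> list:
    """Cache expansions for CACHE_TTL_SECONDS, keyed by SHA-256 of the query."""
    now = time.monotonic() if now is None else now
    key = hashlib.sha256(query.strip().lower().encode("utf-8")).hexdigest()
    hit = _cache.get(key)
    if hit is not None and now - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]
    result = expand_terms(query)
    _cache[key] = (now, result)
    return result
```

Normalizing the query before hashing lets trivially different spellings such as "HCC" and "hcc " share one cache entry.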

FS-011←URS-011

Conversational Site Agent

POST /api/v1/site-agent accepts message, optional session_id, context_sites, and conversation history. Agent maintains session state for multi-turn queries. Returns response text, matching sites list, follow-up suggestions, and insights. Session persists for duration of user interaction.

Source: api/v1/site_agent.py, services/site_agent_service.py

FS-012←URS-012

Investigator Profile Management

Investigator model stores: name, affiliation, country, city, specialty, h_index, publication_count, trial_count, active_trials, email, bio_summary. Data sourced from ClinicalTrials.gov (trial participation) and PubMed (publications). GET /api/v1/investigators with filters. GET /api/v1/investigators/{id} for detail.

Source: api/v1/investigators.py, models/investigator.py

FS-013←URS-013

Investigator Comparison

POST /api/v1/investigators/compare accepts list of 2-5 investigator IDs. Fetches the full profiles in parallel and returns them for side-by-side display. Frontend renders comparison table with all metrics.

Source: api/v1/investigators.py

FS-014←URS-014

PDF and Excel Export

POST /api/v1/exports/sites/pdf generates formatted PDF with scores, dimensions, and details using ReportLab. POST /api/v1/exports/sites/excel generates XLSX with raw data using openpyxl. Both accept search filters to export matching results. StreamingResponse for large exports.

Source: api/v1/exports.py

FS-015←URS-015

Project Workspace & Shortlisting

CRUD operations on projects table (name, indication, phase, countries, target_patients). POST /api/v1/projects/{id}/shortlist adds/removes sites via project_sites junction table. Each shortlist entry records added_by user, notes, and status (shortlisted/selected/rejected).

Source: api/v1/projects.py, models/project.py, models/project_site.py

FS-016←URS-016

Recruitment Prediction

POST /api/v1/predictions/recruitment accepts site IDs and study parameters. Returns per-site predictions with optimistic/realistic/pessimistic patient counts, confidence score (0-1), and explanatory factors array. Model uses historical trial completion rates and site capacity.

Source: api/v1/predictions.py, models/prediction.py

FS-017←URS-017

Multi-Tenancy Enforcement

TenantMiddleware extracts org_id from JWT payload on every request. OrgId dependency injects org_id into all endpoint functions. All SQLAlchemy queries filter by org_id via BaseModel.TenantMixin. Cross-tenant access returns empty results (never 403, to avoid disclosing that a resource exists in another tenant).

Source: core/middleware.py, api/deps.py, models/base.py

FS-018←URS-018

JWT Authentication

POST /api/v1/auth/login validates email+password (bcrypt), returns access_token (15min, HS256) and refresh_token (7 days). Tokens stored as httpOnly cookies. POST /api/v1/auth/refresh generates new access token from valid refresh token. Token payload: sub (user_id), org_id, role, type, exp.

Source: api/v1/auth.py, core/security.py, core/cookies.py

FS-019←URS-019

RBAC Enforcement

4 roles: super_admin (all operations), org_admin (org management + all features), user (standard features), read_only (GET only). Role stored in JWT. CurrentUser dependency validates role on protected endpoints. Admin endpoints check role in ('super_admin', 'org_admin').

Source: api/deps.py, core/security.py, models/organization.py
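
A minimal sketch of the role check, assuming a simple method-based rule for read_only users. The function names and the `admin_endpoint` flag are hypothetical; the real checks are implemented as dependencies in api/deps.py.

```python
ADMIN_ROLES = frozenset({"super_admin", "org_admin"})
KNOWN_ROLES = ADMIN_ROLES | {"user", "read_only"}


def is_admin(role: str) -> bool:
    # Membership test. Note that the expression
    # `role == "super_admin" or "org_admin"` is always truthy in Python.
    return role in ADMIN_ROLES


def is_allowed(role: str, method: str, admin_endpoint: bool = False) -> bool:
    """Decide whether a role may call an endpoint with the given HTTP method."""
    if role not in KNOWN_ROLES:
        return False
    if admin_endpoint:
        return is_admin(role)
    if role == "read_only":
        return method.upper() == "GET"
    return True
```

Unknown roles are rejected first, so a malformed or outdated role claim fails closed.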

FS-020←URS-020

Two-Factor Authentication (TOTP)

POST /api/v1/auth/2fa/setup generates TOTP secret and QR code URI. POST /api/v1/auth/2fa/verify validates 6-digit TOTP code. Once enabled, login requires code via POST /api/v1/auth/2fa/login. Backup codes generated at setup (10 single-use codes).

Source: api/v1/auth.py, services/totp_service.py
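
For reference, 6-digit TOTP validation as defined in RFC 6238 (HMAC-SHA1, 30-second step) can be sketched with the standard library alone. The function names and the one-step drift window are assumptions; services/totp_service.py may use a dedicated library and different tolerances.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute the TOTP code for the time step containing `at`."""
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, candidate: str, at: Optional[float] = None,
                step: int = 30, window: int = 1) -> bool:
    """Accept codes from the current step and `window` steps either side."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret, now + drift * step, step), candidate)
        for drift in range(-window, window + 1)
    )
```

The comparison uses hmac.compare_digest so that verification time does not depend on how many leading digits match.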

FS-021←URS-021

Immutable Audit Trail

AuditMiddleware intercepts POST/PUT/PATCH/DELETE requests. On success (status < 400), inserts record into audit_log: id, org_id, user_id, action (HTTP method), entity_type (URL path), details (JSON), ip_address (SHA-256 hashed), created_at. PostgreSQL trigger prevents UPDATE/DELETE on audit_log table.

Source: core/middleware.py (AuditMiddleware), models/audit.py, migration 011

FS-022←URS-022

Audit Hash Chain

Each audit record includes record_hash = SHA-256(id + org_id + action + entity_type + ip + timestamp) and prev_hash = record_hash of previous record for same org. Chain integrity verifiable by replaying hashes. Index on record_hash for fast verification.

Source: core/middleware.py (_write_audit_log), migration 011
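
The chain construction and its replay check can be sketched as follows. The plain string concatenation without separators, the dict-based records, and the function names are assumptions made for illustration; the actual serialization in _write_audit_log may differ.

```python
import hashlib

HASHED_FIELDS = ("id", "org_id", "action", "entity_type", "ip", "timestamp")


def record_hash(rec: dict) -> str:
    """SHA-256 over the concatenated fields named in FS-022."""
    material = "".join(str(rec[k]) for k in HASHED_FIELDS)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def append_record(chain: list, rec: dict) -> dict:
    """Attach prev_hash and record_hash, then append to the org's chain."""
    rec = dict(rec)
    rec["prev_hash"] = chain[-1]["record_hash"] if chain else None
    rec["record_hash"] = record_hash(rec)
    chain.append(rec)
    return rec


def verify_chain(chain: list) -> bool:
    """Replay: each record must match its own hash and link to its predecessor."""
    prev = None
    for rec in chain:
        if rec["prev_hash"] != prev or rec["record_hash"] != record_hash(rec):
            return False
        prev = rec["record_hash"]
    return True
```

Editing any hashed field of a stored record makes its recomputed hash differ from the stored one, and replacing the stored hash as well breaks the link held by the following record.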

FS-023←URS-023

Data Encryption

Transit: TLS 1.3 enforced by Cloud Run + Cloudflare. At rest: PostgreSQL pgcrypto extension enabled (migration 011). Sensitive columns (bio_summary_encrypted, specialties_encrypted, EDC credentials) use pgp_sym_encrypt with key from GCP Secret Manager. SSL connection verified via SHOW ssl.

Source: migration 011, core/config.py (SECRET_MANAGER)

FS-024←URS-024

Search Response Time

Site search uses indexed queries (ix_sites_org_id, ix_sites_country). Cursor-based pagination keeps query cost independent of page depth (index seek instead of OFFSET scan). Smart search caches results 5 minutes by query hash. EXPLAIN ANALYZE confirms index usage. Target: p95 < 5 seconds.

Source: api/v1/sites.py, services/site_service.py
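
Cursor-based (keyset) pagination can be sketched as below. SQLite from the standard library stands in for PostgreSQL, and the `fetch_page` name, the integer id cursor, and the two-column result are assumptions for illustration.

```python
import sqlite3


def fetch_page(conn: sqlite3.Connection, org_id: str,
               after_id: int = 0, size: int = 2):
    """Return (rows, next_cursor); next_cursor is None on the last page."""
    rows = conn.execute(
        "SELECT id, name FROM sites "
        "WHERE org_id = ? AND id > ? ORDER BY id LIMIT ?",
        (org_id, after_id, size + 1),  # one extra row reveals a next page
    ).fetchall()
    has_more = len(rows) > size
    rows = rows[:size]
    next_cursor = rows[-1][0] if has_more else None
    return rows, next_cursor
```

Because each page starts from `id > cursor`, the database seeks directly to the start of the page through the index instead of counting and discarding earlier rows as OFFSET does.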

FS-025←URS-025

High Availability (99.5%)

Cloud Run auto-scales (0 to N instances). Cloud SQL HA configuration (regional, automatic failover). Health check endpoint /health monitored. Datadog alerts on error rate > 1%. GCP SLA: 99.95% for Cloud Run, 99.95% for Cloud SQL HA.

Source: main.py (/health), GCP infrastructure

FS-026←URS-026

AI Model Registry

Table ai_models stores: model_id, provider, model_name, version, deployment_date, benchmark_score, status (active/deprecated/retired), change_reason, description. API /api/v1/ai-modules/registry lists all registered models. No 'latest' references: all model calls use explicit versioned IDs.

Source: models/ai_model_registry.py, api/v1/ai_modules.py

FS-027←URS-027

AI Audit Log

Table ai_audit_log stores: id, org_id, user_id, model_id, input_hash (SHA-256 of request), output_hash (SHA-256 of response), latency_ms, token_count, timestamp. Every AEGIS API call logged. API /api/v1/ai-explain/audit-log lists entries with filters.

Source: models/ai_model_registry.py, integrations/aegis/mcp_client.py

FS-028←URS-028

Legal Disclaimer Display

Footer component shows 'Decision Support Tool' badge. /legal/disclaimer page with full product classification. Sidebar badges: 'Validated' (green) for Module 1, 'Preview' (amber) for modules 2-7, 'Beta' (red) for modules 8-10. Dashboard layout includes disclaimer banner.

Source: app/legal/disclaimer/page.tsx, components/layout/sidebar.tsx, app/dashboard/layout.tsx

FS-029←URS-029

Internationalization (9 languages)

Translation system uses React context + dictionary lookup. lib/i18n/translations.ts contains key-value pairs for EN, FR, DE, ES, IT, PT, JA, ZH, KO. LanguageSwitcher component in header. Fallback: missing key returns English value. All UI labels are translation keys.

Source: lib/i18n/context.tsx, lib/i18n/translations.ts, components/layout/language-switcher.tsx

FS-030←URS-030

Multi-View Site Display

Sites page offers three view modes: (1) Card grid: responsive grid of SiteCard components with score gauge, key metrics, and action buttons. (2) Table: sortable DataTable with columns for all metrics, click-to-sort. (3) Map: interactive map with markers colored by score (green > 70, amber 40-70, red < 40). View mode persisted in component state.

Source: app/dashboard/sites/page.tsx, components/sites/

Total specifications: 30 (1:1 mapping with URS)

All source paths relative to: src/backend/app/