COS-CSV-DS-001Approvedv1.0

Design Specification (DS)

Module 1 — Site & Investigator Intelligence

1. System Architecture

Frontend

Next.js 15 (App Router, React Server Components)
TypeScript (strict mode)
Tailwind CSS
PWA capable (offline mode)

Backend

FastAPI (Python 3.12+, async)
Pydantic v2 (validation)
SQLAlchemy 2.0 (async ORM)
Alembic (migrations)

Database

PostgreSQL 16 (Cloud SQL)
pgcrypto extension (encryption)
43 tables, 14 migrations
Async connection pool (asyncpg)

Infrastructure

GCP Cloud Run (Zurich, europe-west6)
Cloud SQL (HA, automatic failover)
GCP Secret Manager (credentials)
Cloudflare CDN (TLS 1.3)

2. Database Schema — Module 1 Core Tables

`sites`— Clinical trial sites sourced from ClinicalTrials.gov

Column	Type	Description
id	UUID	PK, auto-generated
org_id	VARCHAR(50)	Tenant isolation key, indexed
external_id	VARCHAR(100)	ClinicalTrials.gov NCT number, indexed
name	VARCHAR(500)	Site/facility name
institution	VARCHAR(500)	Parent institution
city	VARCHAR(200)	City
country	VARCHAR(100)	Country, indexed
region	VARCHAR(100)	Geographic region
latitude	FLOAT	GPS latitude
longitude	FLOAT	GPS longitude
site_type	VARCHAR(50)	hospital, clinic, research_center
capacity	INTEGER	Estimated patient capacity
specialties	TEXT	JSON array of therapeutic areas
total_trials	INTEGER	Cumulative trial count
active_trials	INTEGER	Currently active trials
created_at	TIMESTAMP	Record creation time
updated_at	TIMESTAMP	Last update time

`investigators`— Investigator profiles enriched from PubMed

Column	Type	Description
id	UUID	PK
org_id	VARCHAR(50)	Tenant key
external_id	VARCHAR(100)	External identifier
name	VARCHAR(300)	Full name
affiliation	VARCHAR(500)	Institution affiliation
country	VARCHAR(100)	Country, indexed
specialty	VARCHAR(200)	Primary specialty
h_index	INTEGER	H-index from PubMed
publication_count	INTEGER	Total publications
trial_count	INTEGER	Total trials participated
active_trials	INTEGER	Currently active
email	VARCHAR(255)	Contact email
bio_summary	TEXT	AI-generated profile summary
bio_summary_encrypted	BYTEA	pgcrypto encrypted copy

`site_scores`— Computed scores with 5-dimension breakdown

Column	Type	Description
id	UUID	PK
org_id	VARCHAR(50)	Tenant key
site_id	UUID	FK to sites.id, indexed
project_id	UUID	FK to projects.id (optional)
score_total	FLOAT	Composite 0-100
dim_recruitment	FLOAT	Recruitment dimension 0-1
dim_experience	FLOAT	Experience dimension 0-1
dim_publications	FLOAT	Publications dimension 0-1
dim_infrastructure	FLOAT	Infrastructure dimension 0-1
dim_regulatory	FLOAT	Regulatory dimension 0-1
explanation	TEXT	LLM-generated explanation

`audit_log`— Immutable audit trail (21 CFR Part 11)

Column	Type	Description
id	UUID	PK
org_id	VARCHAR(50)	Tenant key
user_id	UUID	FK to users.id
action	VARCHAR(20)	HTTP method
entity_type	VARCHAR(500)	URL path
entity_id	VARCHAR(100)	Affected entity
details	JSON	Action details
ip_address	VARCHAR(64)	SHA-256 hashed IP
record_hash	VARCHAR(64)	SHA-256 of record
prev_hash	VARCHAR(64)	Previous record hash
created_at	TIMESTAMP	Immutable timestamp

3. API Design

Method	Endpoint	Auth	Description
POST	/api/v1/sites/search	JWT	Search sites with filters
POST	/api/v1/sites/search/cursor	JWT	Cursor-based pagination
GET	/api/v1/sites/{id}	JWT	Site detail with score
GET	/api/v1/sites/{id}/score	JWT	Score breakdown
GET	/api/v1/sites/{id}/score/explain	JWT	Score explainability
POST	/api/v1/sites/{id}/score/customize	JWT	Custom weight scoring
GET	/api/v1/sites/therapeutic-areas	JWT	Filter options
GET	/api/v1/sites/phases	JWT	Phase filter options
GET	/api/v1/investigators	JWT	Search investigators
GET	/api/v1/investigators/{id}	JWT	Investigator detail
POST	/api/v1/investigators/compare	JWT	Compare 2-5 investigators
POST	/api/v1/site-agent	JWT	Conversational AI query
POST	/api/v1/predictions/recruitment	JWT	Recruitment prediction
POST	/api/v1/exports/sites/pdf	JWT	PDF export
POST	/api/v1/exports/sites/excel	JWT	Excel export
POST	/api/v1/ingestion/trigger	Admin	Trigger data ingestion
GET	/api/v1/ingestion/status	Admin	Ingestion status

4. Security Architecture

Authentication

JWT HS256 (access: 15min, refresh: 7d)
httpOnly secure cookies (SameSite=Lax)
TOTP 2FA with backup codes
Password: bcrypt (12 rounds)
CSRF protection middleware

Encryption

Transit: TLS 1.3 (Cloudflare + Cloud Run)
At rest: pgcrypto (pgp_sym_encrypt)
Key management: GCP Secret Manager
FIPS 140-2 Level 3 (Cloud KMS)
6 encrypted columns in production

Access Control

RBAC: super_admin, org_admin, user, read_only
Multi-tenancy: org_id on every table
Rate limiting middleware
Security headers (CSP, HSTS, X-Frame-Options)

Audit & Compliance

Immutable audit_log (PG trigger blocks UPDATE/DELETE)
SHA-256 hash chain (tamper detection)
AI audit log (every inference logged)
Data provenance (ALCOA+ compliant)

5. Deployment Topology

Region: europe-west6 (Zurich, Switzerland) — GDPR/nLPD compliance

Backend: Cloud Run (auto-scaling, min 0 / max 100 instances)

Database: Cloud SQL PostgreSQL 16 (HA, 34.65.122.178, private IP)

Frontend: Cloud Run (Next.js SSR, Docker container)

Secrets: GCP Secret Manager (DATABASE_URL, JWT_SECRET_KEY, etc.)

CDN: Cloudflare (310 PoPs, TLS 1.3 termination)

CI/CD: GitHub Actions → Cloud Build → Cloud Run deploy

Monitoring: Cloud Run metrics, Cloud Logging, custom health endpoint

6. Scoring Algorithm — Mathematical Definition

score_total = ( dim_recruitment * 0.30 + dim_experience * 0.25 + dim_publications * 0.15 + dim_infrastructure * 0.15 + dim_regulatory * 0.15 ) * 100 dim_recruitment: base = log(1 + total_trials) / log(501) # 0-1, log scale bonus = min(active_trials / 10, 0.2) # up to +20% result = min(base + bonus, 1.0) dim_experience: breadth = min(unique_specialties / 5, 1.0) * 0.30 active = min(active_trials / 20, 1.0) * 0.40 total = min(total_trials / 200, 1.0) * 0.30 result = breadth + active + total dim_publications: h_score = min(h_index / 50, 1.0) * 0.60 pub_cnt = min(publication_count / 200, 1.0) * 0.40 result = h_score + pub_cnt fallback = min(total_trials / 300, 0.7) # if no PubMed data dim_infrastructure: capacity >= 800: base = 1.0 capacity >= 500: base = 0.85 capacity >= 200: base = 0.70 capacity >= 50: base = 0.55 else: base = 0.40 bonus: +0.15 (university), +0.10 (research center) result = min(base + bonus, 1.0) dim_regulatory: country_score = REGULATORY_TIER[country] # 0.55–0.90 exp_modifier = +0.10 (100+ trials), +0.05 (30+ trials) result = min(country_score + exp_modifier, 1.0)

Source files: All paths relative to src/backend/app/

Database version: Migration 014 (current)