← Back to CSV Documentation
COS-CSV-DS-001Approvedv1.0

Design Specification (DS)

Module 1 — Site & Investigator Intelligence

1. System Architecture

Frontend

  • Next.js 15 (App Router, React Server Components)
  • TypeScript (strict mode)
  • Tailwind CSS
  • PWA capable (offline mode)

Backend

  • FastAPI (Python 3.12+, async)
  • Pydantic v2 (validation)
  • SQLAlchemy 2.0 (async ORM)
  • Alembic (migrations)

Database

  • PostgreSQL 16 (Cloud SQL)
  • pgcrypto extension (encryption)
  • 43 tables, 14 migrations
  • Async connection pool (asyncpg)

Infrastructure

  • GCP Cloud Run (Zurich, europe-west6)
  • Cloud SQL (HA, automatic failover)
  • GCP Secret Manager (credentials)
  • Cloudflare CDN (TLS 1.3)

2. Database Schema — Module 1 Core Tables

sitesClinical trial sites sourced from ClinicalTrials.gov

ColumnTypeDescription
idUUIDPK, auto-generated
org_idVARCHAR(50)Tenant isolation key, indexed
external_idVARCHAR(100)ClinicalTrials.gov NCT number, indexed
nameVARCHAR(500)Site/facility name
institutionVARCHAR(500)Parent institution
cityVARCHAR(200)City
countryVARCHAR(100)Country, indexed
regionVARCHAR(100)Geographic region
latitudeFLOATGPS latitude
longitudeFLOATGPS longitude
site_typeVARCHAR(50)hospital, clinic, research_center
capacityINTEGEREstimated patient capacity
specialtiesTEXTJSON array of therapeutic areas
total_trialsINTEGERCumulative trial count
active_trialsINTEGERCurrently active trials
created_atTIMESTAMPRecord creation time
updated_atTIMESTAMPLast update time

investigatorsInvestigator profiles enriched from PubMed

ColumnTypeDescription
idUUIDPK
org_idVARCHAR(50)Tenant key
external_idVARCHAR(100)External identifier
nameVARCHAR(300)Full name
affiliationVARCHAR(500)Institution affiliation
countryVARCHAR(100)Country, indexed
specialtyVARCHAR(200)Primary specialty
h_indexINTEGERH-index from PubMed
publication_countINTEGERTotal publications
trial_countINTEGERTotal trials participated
active_trialsINTEGERCurrently active
emailVARCHAR(255)Contact email
bio_summaryTEXTAI-generated profile summary
bio_summary_encryptedBYTEApgcrypto encrypted copy

site_scoresComputed scores with 5-dimension breakdown

ColumnTypeDescription
idUUIDPK
org_idVARCHAR(50)Tenant key
site_idUUIDFK to sites.id, indexed
project_idUUIDFK to projects.id (optional)
score_totalFLOATComposite 0-100
dim_recruitmentFLOATRecruitment dimension 0-1
dim_experienceFLOATExperience dimension 0-1
dim_publicationsFLOATPublications dimension 0-1
dim_infrastructureFLOATInfrastructure dimension 0-1
dim_regulatoryFLOATRegulatory dimension 0-1
explanationTEXTLLM-generated explanation

audit_logImmutable audit trail (21 CFR Part 11)

ColumnTypeDescription
idUUIDPK
org_idVARCHAR(50)Tenant key
user_idUUIDFK to users.id
actionVARCHAR(20)HTTP method
entity_typeVARCHAR(500)URL path
entity_idVARCHAR(100)Affected entity
detailsJSONAction details
ip_addressVARCHAR(64)SHA-256 hashed IP
record_hashVARCHAR(64)SHA-256 of record
prev_hashVARCHAR(64)Previous record hash
created_atTIMESTAMPImmutable timestamp

3. API Design

MethodEndpointAuthDescription
POST/api/v1/sites/searchJWTSearch sites with filters
POST/api/v1/sites/search/cursorJWTCursor-based pagination
GET/api/v1/sites/{id}JWTSite detail with score
GET/api/v1/sites/{id}/scoreJWTScore breakdown
GET/api/v1/sites/{id}/score/explainJWTScore explainability
POST/api/v1/sites/{id}/score/customizeJWTCustom weight scoring
GET/api/v1/sites/therapeutic-areasJWTFilter options
GET/api/v1/sites/phasesJWTPhase filter options
GET/api/v1/investigatorsJWTSearch investigators
GET/api/v1/investigators/{id}JWTInvestigator detail
POST/api/v1/investigators/compareJWTCompare 2-5 investigators
POST/api/v1/site-agentJWTConversational AI query
POST/api/v1/predictions/recruitmentJWTRecruitment prediction
POST/api/v1/exports/sites/pdfJWTPDF export
POST/api/v1/exports/sites/excelJWTExcel export
POST/api/v1/ingestion/triggerAdminTrigger data ingestion
GET/api/v1/ingestion/statusAdminIngestion status

4. Security Architecture

Authentication

  • JWT HS256 (access: 15min, refresh: 7d)
  • httpOnly secure cookies (SameSite=Lax)
  • TOTP 2FA with backup codes
  • Password: bcrypt (12 rounds)
  • CSRF protection middleware

Encryption

  • Transit: TLS 1.3 (Cloudflare + Cloud Run)
  • At rest: pgcrypto (pgp_sym_encrypt)
  • Key management: GCP Secret Manager
  • FIPS 140-2 Level 3 (Cloud KMS)
  • 6 encrypted columns in production

Access Control

  • RBAC: super_admin, org_admin, user, read_only
  • Multi-tenancy: org_id on every table
  • Rate limiting middleware
  • Security headers (CSP, HSTS, X-Frame-Options)

Audit & Compliance

  • Immutable audit_log (PG trigger blocks UPDATE/DELETE)
  • SHA-256 hash chain (tamper detection)
  • AI audit log (every inference logged)
  • Data provenance (ALCOA+ compliant)

5. Deployment Topology

Region: europe-west6 (Zurich, Switzerland) — GDPR/nLPD compliance

Backend: Cloud Run (auto-scaling, min 0 / max 100 instances)

Database: Cloud SQL PostgreSQL 16 (HA, 34.65.122.178, private IP)

Frontend: Cloud Run (Next.js SSR, Docker container)

Secrets: GCP Secret Manager (DATABASE_URL, JWT_SECRET_KEY, etc.)

CDN: Cloudflare (310 PoPs, TLS 1.3 termination)

CI/CD: GitHub Actions → Cloud Build → Cloud Run deploy

Monitoring: Cloud Run metrics, Cloud Logging, custom health endpoint

6. Scoring Algorithm — Mathematical Definition

score_total = ( dim_recruitment * 0.30 + dim_experience * 0.25 + dim_publications * 0.15 + dim_infrastructure * 0.15 + dim_regulatory * 0.15 ) * 100 dim_recruitment: base = log(1 + total_trials) / log(501) # 0-1, log scale bonus = min(active_trials / 10, 0.2) # up to +20% result = min(base + bonus, 1.0) dim_experience: breadth = min(unique_specialties / 5, 1.0) * 0.30 active = min(active_trials / 20, 1.0) * 0.40 total = min(total_trials / 200, 1.0) * 0.30 result = breadth + active + total dim_publications: h_score = min(h_index / 50, 1.0) * 0.60 pub_cnt = min(publication_count / 200, 1.0) * 0.40 result = h_score + pub_cnt fallback = min(total_trials / 300, 0.7) # if no PubMed data dim_infrastructure: capacity >= 800: base = 1.0 capacity >= 500: base = 0.85 capacity >= 200: base = 0.70 capacity >= 50: base = 0.55 else: base = 0.40 bonus: +0.15 (university), +0.10 (research center) result = min(base + bonus, 1.0) dim_regulatory: country_score = REGULATORY_TIER[country] # 0.55–0.90 exp_modifier = +0.10 (100+ trials), +0.05 (30+ trials) result = min(country_score + exp_modifier, 1.0)

Source files: All paths relative to src/backend/app/

Database version: Migration 014 (current)