Building scalable telemedicine platforms. Architecture, performance benchmarks, and engineering lessons from platforms at scale.

Widal Team, 14 min read

The $432 billion opportunity

The telemedicine market is growing fast. The global market, valued at $107.52 billion in 2024, is projected to reach $432.31 billion by 2032, a CAGR of 19.9%. In the United States alone, the market is expected to grow from $94.3 billion in 2025 to $395.6 billion by 2034, a CAGR of 17.3%.

The scale is staggering. Telehealth usage increased from 14% to 80% between 2016 and 2022. CMS reported a 12,000% increase in telehealth use within six weeks during COVID-19. Among adults aged 18 and over, 37% used telemedicine services in the past 12 months, and online doctor consultations are projected to increase by 13.7 million between 2024 and 2028.

Platform architecture is the difference between success and catastrophic failure at scale. The platforms that handle millions of concurrent consultations while maintaining sub-200ms latency and 99.9% uptime will capture the lion's share of this market.

The architectural foundation

Microservices and service boundaries

Monolithic telemedicine applications collapse under production-scale load. The platforms that scale reliably are built around clear service boundaries: identity and access, scheduling, media routing, signaling, clinical documentation, payments, and notifications. Each service owns its data store, its deployment cadence, and its on-call rotation.

This is not microservices for their own sake. The point is to isolate failure domains so a billing outage cannot take down an active consult, and to let the media path scale independently of the EHR sync that runs out of band.
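
One common way to keep a failing non-critical dependency off the live path is a circuit breaker. The sketch below is a minimal illustration under stated assumptions, not a description of any particular platform: the CircuitBreaker class, its thresholds, and the injected clock are all hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Fail fast on a dependency after repeated errors (hypothetical helper)."""

    def __init__(self, max_failures: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T], fallback: T) -> T:
        # While open, skip the dependency entirely until the cool-down elapses.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback
            # Half-open: allow one trial call; a single failure re-opens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback
        self.failures = 0
        return result
```

Wrapped around a payments call, the fallback might be a "charge deferred" marker that a background job settles later, so the consult itself never waits on billing.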

Media: WebRTC, SFUs, and the realities of video at scale

WebRTC is the foundation, but raw peer-to-peer WebRTC scales poorly beyond a few participants, because every client must upload a separate copy of its stream to every other client. Production telemedicine platforms run media through a Selective Forwarding Unit (SFU) or, where clients cannot handle multiple incoming streams, a Multipoint Control Unit (MCU). SFUs forward each sender's encoded streams without transcoding, which preserves client-side encoding choices and keeps server CPU low. MCUs decode, mix, and re-encode on the server, which costs more but produces a single stream for the client to decode (useful for low-power devices).

Regardless of topology, the engineering work is in TURN servers, adaptive bitrate, simulcast, jitter buffers, and a relentless focus on the join experience. The most common production failure is not a dropped call mid-consult. It is a patient who cannot get into the room in the first place.
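
To make the simulcast and adaptive bitrate point concrete, here is a rough sketch of the per-subscriber decision an SFU has to make: forward the highest simulcast layer that fits the subscriber's estimated bandwidth. The layer names, bitrates, audio reservation, and safety factor are illustrative assumptions, not values from any specific SFU.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SimulcastLayer:
    name: str          # illustrative label, e.g. "low"
    bitrate_kbps: int  # assumed encoder target for this layer


# Illustrative ladder; real bitrates depend on codec and resolution.
LADDER = (
    SimulcastLayer("low", 150),
    SimulcastLayer("mid", 500),
    SimulcastLayer("high", 1500),
)


def pick_layer(estimated_kbps: float, audio_kbps: int = 40,
               safety: float = 0.8,
               ladder: Sequence[SimulcastLayer] = LADDER) -> Optional[SimulcastLayer]:
    """Return the highest layer that fits the budget, or None for audio-only."""
    # Reserve room for audio first; the consult survives without video,
    # but not without voice.
    budget = estimated_kbps * safety - audio_kbps
    fitting = [layer for layer in ladder if layer.bitrate_kbps <= budget]
    return max(fitting, key=lambda layer: layer.bitrate_kbps) if fitting else None
```

Returning None rather than the lowest layer is a deliberate choice in this sketch: when the budget cannot carry any video, dropping to audio-only keeps the session alive.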

Real-time signaling

Signaling, the side channel that negotiates the call before media starts, is its own scaling problem. WebSockets are the default, but they fail in interesting ways under load: sticky sessions, idle timeouts, mobile network handoffs. Platforms that handle millions of consults invest in a signaling layer that tolerates reconnects, replays missed messages, and gracefully degrades to long polling when needed.
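
Replaying missed messages after a reconnect usually comes down to sequence numbers plus a bounded buffer. The following is a minimal in-memory sketch, assuming one buffer per session and a client that reports the last sequence number it saw; the ReplayBuffer class and its capacity are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class ReplayBuffer:
    """Per-session buffer of recent signaling messages, keyed by sequence number."""

    def __init__(self, capacity: int = 256) -> None:
        # deque(maxlen=...) silently evicts the oldest message when full.
        self._messages: Deque[Tuple[int, str]] = deque(maxlen=capacity)
        self._next_seq = 1

    def publish(self, payload: str) -> int:
        seq = self._next_seq
        self._messages.append((seq, payload))
        self._next_seq += 1
        return seq

    def replay_after(self, last_seen_seq: int) -> List[Tuple[int, str]]:
        """Messages the client missed; raises if part of the gap was evicted."""
        oldest = self._messages[0][0] if self._messages else self._next_seq
        if last_seen_seq + 1 < oldest:
            raise LookupError("gap too old; client must resynchronize")
        return [(s, p) for s, p in self._messages if s > last_seen_seq]
```

On reconnect, over WebSocket or a long-polling fallback, the client sends its last seen sequence number and the server answers with replay_after. A LookupError is the signal to renegotiate the session from scratch rather than guess at the missing messages.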

Performance targets that actually matter

  • Join latency under 3 seconds. From clicking the invite link to seeing the other party. Patient confidence collapses past 5 seconds.
  • Audio one-way latency under 200ms. Above 400ms the conversation feels broken. Above 600ms it is unusable.
  • Packet loss tolerance to 10%. Real networks, real homes. The platform must degrade gracefully rather than dropping the session.
  • 99.9% session-level uptime. Different from service uptime. A platform is up only if the patient can actually start a consult.
  • P99 API response under 300ms. Scheduling, messaging, and document APIs need to feel instant during a live encounter.
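
Targets like these are only useful if something checks them continuously. As a rough sketch, the thresholds above can be encoded and compared against measured values; the metric names are hypothetical, and the percentile uses the simple nearest-rank definition.

```python
import math
from typing import Dict, List, Sequence

# Limits taken from the list above; the metric names are hypothetical.
TARGETS = {
    "join_latency_s": 3.0,
    "audio_one_way_latency_ms": 200.0,
    "api_p99_ms": 300.0,
}


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def violations(metrics: Dict[str, float]) -> List[str]:
    """Names of reported metrics that are not strictly under their limit."""
    return [name for name, limit in TARGETS.items()
            if name in metrics and metrics[name] >= limit]
```

For example, feeding percentile(api_latencies_ms, 99) into violations as api_p99_ms flags a scheduling API that has drifted past 300ms before clinicians start noticing it.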

Data architecture and HIPAA realities

PHI segregation

The fastest way to make HIPAA compliance tractable is to keep PHI inside a narrow blast radius. Most analytics, telemetry, and operational tooling have no business touching identifiable data. Effective platforms maintain a strict separation between the clinical core and the rest of the system, with a small set of audited services bridging the boundary.
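
One way to enforce that boundary in code is an allowlist at the point where events leave the clinical core: anything not explicitly approved is dropped. The sketch below assumes a flat event dictionary, and the field names are hypothetical.

```python
from typing import Any, Dict, Mapping

# Hypothetical allowlist: only these fields may leave the clinical core.
TELEMETRY_ALLOWED_FIELDS = frozenset(
    {"session_id", "region", "join_latency_ms", "packet_loss_pct", "client_version"}
)


def to_telemetry_event(raw: Mapping[str, Any]) -> Dict[str, Any]:
    """Copy only allowlisted fields; everything else is dropped by default."""
    return {k: v for k, v in raw.items() if k in TELEMETRY_ALLOWED_FIELDS}
```

An allowlist fails safe where a denylist does not: a newly added field such as a patient name stays inside the boundary until someone deliberately approves it. This assumes session_id is an opaque identifier that is not derived from patient data.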

Encryption and key management

TLS 1.3 for all traffic. AES-256 at rest. Per-tenant encryption keys managed through a KMS, with key rotation that does not require downtime. Media is encrypted in transit via DTLS-SRTP, and any recording pipeline encrypts at the SFU before the file ever lands in object storage.

Audit logging

Log every PHI access, every clinical write, every administrative action. HIPAA's six-year documentation retention rule (45 CFR 164.316(b)(2)(i)) is widely applied to audit logs, so plan for them to stay queryable for at least six years. Build the log pipeline as a first-class concern, not an afterthought, and stream to an immutable store.
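
Immutability is normally a property of the storage layer, but a hash chain makes tampering detectable even if a store is compromised. The following is a minimal in-memory sketch of that idea, not a production pipeline; the AuditLog class and its record fields are assumptions for illustration.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder hash that precedes the first record


def _digest(prev_hash: str, entry: Dict[str, Any]) -> str:
    # Canonical JSON so the same entry always hashes to the same value.
    body = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log in which each record commits to the one before it."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def append(self, actor: str, action: str, resource: str, at: str) -> str:
        entry = {"actor": actor, "action": action, "resource": resource, "at": at}
        prev = self.records[-1]["hash"] if self.records else GENESIS
        record_hash = _digest(prev, entry)
        self.records.append({"entry": entry, "prev": prev, "hash": record_hash})
        return record_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered record breaks it."""
        prev = GENESIS
        for rec in self.records:
            if rec["prev"] != prev or rec["hash"] != _digest(prev, rec["entry"]):
                return False
            prev = rec["hash"]
        return True
```

Periodically anchoring the latest hash somewhere outside the log's own store would let an auditor detect truncation as well as edits; that step is omitted here.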

Business Associate Agreements with the platform stack

Cloud provider, video infrastructure, transcription, SMS gateway, email provider, error tracking, analytics, support tooling. Every vendor that can touch PHI needs a BAA. Many platforms have been blindsided by an error tracking tool that was capturing stack traces containing patient identifiers.

Scaling patterns from real platforms

Regional active-active

Single-region deployments cap out around hundreds of thousands of concurrent sessions and create a single point of failure. The platforms that handle millions of consults run active-active across multiple regions, with session affinity to keep media paths short. Cross-region failover is rehearsed monthly, not just designed for.

Edge media routing

Geographically distributed TURN and SFU instances cut round-trip time and shrink the blast radius of any single node or region failure. Patients connect to the nearest media server; provider routing follows patient location. This pattern, well established in consumer video products, applies cleanly to healthcare.
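
"Nearest" in practice usually means lowest measured round-trip time among nodes that are healthy and not overloaded. The sketch below assumes the client or a probe service has already measured RTT to each candidate; the MediaNode fields and the load ceiling are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class MediaNode:
    node_id: str
    rtt_ms: float   # RTT measured from the patient's client (assumed input)
    healthy: bool
    load: float     # fraction of capacity in use, 0.0 to 1.0


def pick_media_node(nodes: Iterable[MediaNode],
                    max_load: float = 0.85) -> Optional[MediaNode]:
    """Lowest-RTT node that is healthy and under the load ceiling, else None."""
    candidates = [n for n in nodes if n.healthy and n.load < max_load]
    return min(candidates, key=lambda n: n.rtt_ms, default=None)
```

A None result means no eligible node exists, which should trigger an alert and a fallback to another region rather than a silent join failure.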

Asynchronous offload

Documentation, billing, EHR sync, transcription, claims, and quality reporting do not need to block the live session. Asynchronous queues with retry and dead-letter handling absorb spikes (Monday mornings, post-holiday surges) without affecting consult performance.
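
The retry and dead-letter behavior can be sketched with an in-memory queue. A real deployment would use a durable broker and add backoff between attempts; the Job type, the drain function, and the attempt limit here are assumptions for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class Job:
    name: str
    attempts: int = 0


def drain(queue: Deque[Job], handler: Callable[[Job], None],
          max_attempts: int = 3) -> List[Job]:
    """Process jobs until the queue is empty; return the dead-lettered jobs."""
    dead_letters: List[Job] = []
    while queue:
        job = queue.popleft()
        job.attempts += 1
        try:
            handler(job)
        except Exception:
            if job.attempts >= max_attempts:
                dead_letters.append(job)  # park for manual reconciliation
            else:
                queue.append(job)         # retry later, behind other work
    return dead_letters
```

Re-queueing a failed job at the back means one flaky downstream system, such as an EHR endpoint, cannot starve the jobs behind it.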

Reliability engineering

The observability triangle

Metrics, logs, traces. Plus session-level recording of WebRTC stats for forensic analysis after a degraded call. Without per-session media telemetry, debugging audio quality complaints becomes guesswork.

Game days and chaos testing

Kill an SFU node during a load test. Drop the database primary. Throttle the egress link. The platforms that stay up during real incidents are the ones that rehearse them quarterly. Resilience is not a property you architect once. It is a muscle you exercise.

Capacity planning

Concurrent sessions, not registered users, drive cost. Model peak-hour demand by clinical specialty (urgent care peaks evenings and weekends, behavioral health peaks weeknights), and keep headroom for at least 3x baseline.
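
A back-of-the-envelope version of that model: multiply baseline concurrent sessions by the headroom factor, convert to SFU egress bandwidth, and divide by what one node can safely carry. Every default below (per-session bitrate, node egress capacity, target utilization) is an illustrative assumption to be replaced with measured values.

```python
import math


def sfu_nodes_needed(baseline_sessions: int, headroom: float = 3.0,
                     kbps_per_session: int = 3000,
                     node_egress_mbps: int = 5000,
                     target_utilization_pct: int = 60) -> int:
    """Estimate SFU node count from peak concurrent sessions (egress-bound)."""
    peak_sessions = baseline_sessions * headroom
    required_mbps = peak_sessions * kbps_per_session / 1000
    # Never plan to run a node at 100%; leave room for bursts and failover.
    usable_mbps = node_egress_mbps * target_utilization_pct / 100
    return math.ceil(required_mbps / usable_mbps)
```

With these defaults, 1,500 baseline sessions become 4,500 at peak, or 13,500 Mbps of egress against 3,000 usable Mbps per node, so the estimate rounds up to 5 nodes. CPU, memory, and TURN relay traffic are separate constraints this sketch ignores.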

Clinical workflow integration

EHR integration via FHIR

FHIR APIs are now table stakes. Patient demographics, encounter creation, clinical note write-back, problem lists, medications, allergies. SMART on FHIR launches let the telemedicine UI start from within the EHR with proper context. Build for FHIR even when your launch customer is on a legacy interface; the migration becomes far easier later.
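
As a rough illustration of encounter creation, the sketch below builds a minimal FHIR R4 Encounter for a completed video visit. It assumes R4 field names (R5 changes the shape of class and participant), uses the v3-ActCode code VR for a virtual encounter, and leaves out everything a specific EHR might additionally require; the function name and IDs are hypothetical.

```python
import json
from typing import Any, Dict


def virtual_encounter(patient_id: str, practitioner_id: str,
                      start: str, end: str) -> Dict[str, Any]:
    """Build a minimal FHIR R4 Encounter resource for a finished video visit."""
    return {
        "resourceType": "Encounter",
        "status": "finished",
        "class": {
            "system": "http://terminology.hl7.org/CodeSystem/v3-ActCode",
            "code": "VR",
            "display": "virtual",
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "participant": [
            {"individual": {"reference": f"Practitioner/{practitioner_id}"}}
        ],
        "period": {"start": start, "end": end},
    }


if __name__ == "__main__":
    encounter = virtual_encounter(
        "example-patient", "example-practitioner",
        "2025-01-01T10:00:00Z", "2025-01-01T10:20:00Z",
    )
    print(json.dumps(encounter, indent=2))
```

Writing this back would typically be a POST of the JSON to the server's Encounter endpoint with an application/fhir+json content type, ideally from an asynchronous worker rather than the live session.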

Scheduling and intake

Intake forms, consent capture, payment authorization, insurance verification, and identity checks all happen before the video starts. The clinical UX is only as good as the pre-visit flow. Cut every step that does not directly serve the clinical encounter.

Asynchronous care

Not every encounter needs to be synchronous. Store-and-forward modalities (secure messaging, image review, async dermatology, second opinions) scale far better than live video and cover a growing share of telehealth utilization. Plan for both from the start.

Common failure modes

  • Treating the video call as the product. The video is plumbing. The product is the clinical encounter, the documentation, the follow-up, the billing. Optimize the whole journey.
  • Underinvesting in the join experience. Permission prompts, browser compatibility, device selection, network checks. Many apparent no-shows are not no-shows. They are patients who could not get in.
  • Synchronous EHR writes. Blocking the consult completion on a flaky EHR API is a recipe for clinician frustration. Queue the write, surface failures in a reconciliation queue.
  • Ignoring mobile networks. A large share of consults happen on cellular. Test with simulated 3G, packet loss, and network handoffs.
  • Building before measuring. Without session-level analytics from day one, you cannot tell which changes actually improved the experience.

The next generation

The next wave of telemedicine platforms layers ambient documentation, AI-assisted triage, computer vision for objective measurement during the video call, and predictive routing into the core architecture. None of these are bolt-ons. They have to be designed into the data model and the workflow from the start.

The platforms that win the $432 billion market will not be the ones with the slickest marketing. They will be the ones that picked up the phone on the first ring, every time, for every patient, for years on end. That is an engineering problem, and it is solvable.
