
Providers, Adapters, and Multi-Provider Architecture

Chapter 8

Strategic Takeaway: Provider abstraction means no single vendor cancellation can break a recipe—when OpenAI killed Sora, a recipe-based network self-heals by routing to the next available video agent.

The Scrypted API abstracts over a heterogeneous set of upstream model providers. Users and agents issue intent-level requests ("generate a video from this image"); the system resolves the request to a registered recipe, which dispatches steps to provider-specific adapters. This chapter describes the abstraction pattern, the current provider landscape, and selected engineering lessons from production operation.

8.1 Provider abstraction pattern

All provider integrations implement a common interface:

• BaseProviderAdapter — abstract base class defining start_job(), get_job_status(), calculate_cost(), and cancel_job().

• ProviderError hierarchy — standardized exception types for upstream failures.

• JobResult, CostInfo — normalized return structures so the orchestration engine never handles provider-specific response shapes directly.

The ProviderRegistry loads adapters dynamically from YAML configuration (config/providers.yaml). Each provider entry specifies:

• Adapter class path (e.g., scryptedai_api.providers.fal_adapter.FalAIAdapter).

• Concurrency limit (enforced by the ConcurrencyManager in Chapter 6).

• Payment model (PREPAID_POOL, ON_DEMAND, FORWARDED).

• Credential key (environment variable name).

Resolution is entirely database-driven: /generate/image looks up the default recipe for image generation in the database, and the recipe's steps reference atomic ingredients that name the provider model ID. No endpoint URLs are hardcoded.
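The adapter interface described above can be sketched as follows. This is a minimal illustration, not the production code: the method names match the text, but the signatures, the JobResult/CostInfo fields, and the status strings are assumptions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobResult:
    # Normalized result shape; the fields here are illustrative assumptions.
    job_id: str
    status: str  # e.g. "pending", "completed", "failed"
    output_url: Optional[str] = None


@dataclass
class CostInfo:
    amount_usd: float
    currency: str = "USD"


class ProviderError(Exception):
    """Base of the standardized exception hierarchy for upstream failures."""


class BaseProviderAdapter(ABC):
    """Common interface every provider integration implements, so the
    orchestration engine never sees provider-specific response shapes."""

    @abstractmethod
    def start_job(self, payload: dict) -> JobResult: ...

    @abstractmethod
    def get_job_status(self, job_id: str) -> JobResult: ...

    @abstractmethod
    def calculate_cost(self, payload: dict) -> CostInfo: ...

    @abstractmethod
    def cancel_job(self, job_id: str) -> bool: ...
```

A concrete adapter (e.g., a FAL or Bedrock integration) subclasses this and maps provider responses into JobResult, keeping the recipe executor provider-agnostic.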

8.1.1 The FAL adapter as a case study

The fal_ai ingredient operates in dual mode:

  1. Adapter mode (no endpoint provided): returns authenticated credentials so that other ingredients can make their own FAL calls.
  2. Direct API mode (endpoint provided): executes the FAL call directly, handling submission, polling, and result normalization.

This dual-mode pattern avoids duplicating authentication logic across the many FAL-backed ingredients (Seedream, Kling, Hailuo, Nano-Banana, MMAudio, OVI, etc.) while still allowing direct invocation for simple cases.

8.2 Webhook and polling coordination

Provider integrations must handle asynchronous completion. The system supports both mechanisms (described mechanically in Chapter 6), but the provider-level concerns are:

Signature verification: Inbound webhooks are verified via HMAC-SHA256 over the raw request body, compared using constant-time hmac.compare_digest. The secret is loaded from provider_registry configuration per provider name.

Provider-specific routes: Dedicated endpoints handle provider quirks. For example, /webhooks/provider/fa parses FAL's specific payload structure (where request_id may be nested) before persisting a ProviderWebhookEvent.

Dead-letter queue: The WebhookDeliveryService tracks delivery attempts with exponential backoff (base 60 s, max 5 attempts). Deliveries that exhaust retries are marked DEAD_LETTER for manual review. An EventLog records all payloads for audit.

Outbound signatures: When Scrypted sends webhooks to customer-specified URLs, the payload is signed with the customer's webhook_secret using the same HMAC-SHA256 scheme, with X-Signature-256, X-Webhook-ID, and X-Webhook-Timestamp headers.

8.3 CDN and content delivery

Generated artifacts are uploaded to S3 (cdn-scryptedai-api bucket, ephemeral/ prefix) and served via CloudFront at https://cdn.scrypted.ai. The CDNManager handles upload, URL generation, and deletion. Lazy credential loading avoids initialization failures when S3 is not the active storage path. Content types are auto-detected from file extensions. The cleanup worker (Chapter 7, §7.7) deletes expired assets via cdn_manager.delete_file().
8.4 Engineering lessons from production

Production operation has produced extensive engineering lessons. Selected examples:

8.4.1 AWS Bedrock: synchronous models in an async world

AWS Bedrock's InvokeModel API is synchronous — it blocks until inference completes. Wrapping this in the async/webhook-oriented orchestration engine required a dedicated execution mode: the Bedrock adapter runs the blocking call inside a Celery task on the background processing queue, then publishes a completion event to the shadow graph as if it were an external async callback. The fractal async pattern (Chapter 6, §6.4.1) applies: the recipe executor sees one async step, regardless of the adapter's internal blocking behavior.

8.4.2 Nova Reel: six versions of a video generator

The AWS Nova Reel video ingredient has iterated through six registered versions, reflecting successive fixes for: initial API integration, async job polling, output format normalization, cost model recalibration, error handling for truncated renders, and multi-scene support. Each version is a database record; the version history is the engineering changelog, visible to operators and auditable via the registry.

8.4.3 Stale worker bytecode

When ingredient code is updated on disk but Celery workers have cached the previous version's compiled bytecode (.pyc files), workers silently execute outdated logic. The mitigation: restart workers after deployments (automated in the scryptedai-api-master systemd service), and document the hazard for operators. A more robust solution (code hash verification at task start) is a future improvement.

8.4.4 Webhook vs polling races

When both webhook and polling paths are active for the same external job, a completion event can arrive twice. The shadow graph's one-action-one-commit discipline (Chapter 6) prevents double-advancement: the second event finds the step already completed and is a no-op. This was not always the case — early implementations used nested transactions that could commit partial updates, leading to ghost steps.
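The idempotent completion behavior in §8.4.4 can be illustrated with a minimal sketch. The Step class and its field names are stand-ins invented for illustration; the real discipline is enforced on database rows in the shadow graph, with the lock standing in for a row-level lock and transaction commit.

```python
import threading
from typing import Optional


class Step:
    """Toy stand-in for a shadow-graph step (the real state lives in the DB)."""

    def __init__(self, step_id: str):
        self.step_id = step_id
        self.status = "running"
        self.result: Optional[dict] = None
        self._lock = threading.Lock()  # stands in for a DB row lock

    def complete(self, result: dict) -> bool:
        # One-action-one-commit: only the FIRST completion event advances
        # the step. A duplicate (webhook arriving after polling already
        # completed the job, or vice versa) finds it completed and is a no-op.
        with self._lock:
            if self.status == "completed":
                return False
            self.status = "completed"
            self.result = result
            return True


step = Step("step-1")
first = step.complete({"source": "webhook"})   # advances the step
second = step.complete({"source": "polling"})  # duplicate event: no-op
```

The key property is that the status check and the status write happen atomically; the nested-transaction implementations mentioned above broke exactly this atomicity.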
8.5 Automated quality learning and provider optimization

Strategic Takeaway: Self-healing is reactive: "Sora is dead, route around it." The deeper vision is proactive: learn which models work best for which tasks, automatically, and route accordingly. Feedback-driven routing is the network's defensible moat.

The provider abstraction layer handles failure. This section describes how it handles optimization: continuously learning which providers produce the best results for which intents, and routing traffic to maximize quality at minimum cost.

8.5.1 Feedback signals

Five signals feed the quality learning loop:

  1. User feedback — explicit (thumbs up/down, support tickets) and implicit (downloads, views, regeneration requests). Delula captures all four in production today.

  2. ERC-8004 reputation — structured on-chain feedback entries from callers who consumed the agent’s output, filterable by reviewer address. Scrypted operates the default reputation provider for its own agents; third-party reputation markets can register additional providers.

  3. Agent-based quality review — a dedicated quality-assessment agent (implemented as composed recipes/ingredients in Delula) evaluates outputs against task-specific criteria. This agent is itself a Scrypted ingredient, running as a post-processing step or parallel fan-out item.

  4. Comparative evaluation — when the same intent can be fulfilled by multiple providers, the network routes a small exploration fraction to alternative providers for quality comparison (see §8.5.3).

  5. Cost-quality Pareto tracking — per capability class, track the cost-quality frontier. "Veo3 at $0.15/video with quality 0.92 for cinematic; Grok at $0.08/video with quality 0.88 for product shots." The frontier updates automatically as providers improve or degrade.

8.5.2 The feedback loop

  1. Intent arrives → AgentRank (§11.6) selects provider based on Bid × AgentQuality.

  2. Job executes → output delivered to user.

  3. Feedback collected: user signals + quality-review agent assessment.

  4. Reputation updated via ERC-8004 feedback entry.

  5. AgentRank recalculated with new quality data.

  6. Future routing adjusted: better providers win more traffic.

This is economic reinforcement learning: agents that produce better results earn better reputation, which earns more traffic, which earns more revenue. The incentive gradient is economic, not mathematical—no gradient descent on a neural policy, but a market where quality is structurally rewarded.

Worked example: Veo3 vs Grok for video generation. Both generate videos. Both are registered agents. The network observes thousands of jobs with quality feedback per provider per intent type and learns: intents with embedding similarity to "cinematic" route to Veo3 (higher AgentRank for that intent region); intents matching "product shot" or "social media clip" route to Grok. When a new provider enters and starts producing better results for a niche, the network discovers this through feedback and adjusts routing automatically.

8.5.3 Exploration strategy

Exploitation routes to the highest-AgentRank provider. Exploration routes to alternatives for quality comparison. The mechanism is Thompson Sampling—a multi-armed bandit strategy that naturally reduces exploration as confidence increases:

• Exploitation (∼95% of traffic): Route to the highest-AgentRank provider. The user pays normal cost.

• Exploration (∼5% of traffic): Route to an alternative provider. The network subsidizes the exploration cost from attention-auction placement revenue.

Thompson Sampling is preferred over ε-greedy (fixed exploration rate) or UCB (optimistic estimates) because it adapts exploration intensity to uncertainty: under-explored providers with high variance receive more exploration traffic; well-characterized providers receive less. Exploration is an investment by the network, funded by placement revenue. The return is better routing for all future traffic—the same economic logic by which search engines subsidize relevance experiments.

8.5.4 Convergence dampening

Quality scores update via exponential moving average (EMA) with epoch-based recalculation:

    Q_new = α · feedback + (1 − α) · Q_old

The smoothing factor α is confidence-weighted: new providers with few observations use α = 0.5 (learn fast); established providers use α = 0.1 (stable, resistant to outliers). A minimum observation threshold (e.g., 20 observations) applies before a provider's quality score affects AgentRank. Update epochs run at fixed intervals (e.g., every 15 minutes) to prevent oscillation from rapid feedback bursts. All parameters are governance-adjustable.

8.5.5 Network effect

Feedback-driven routing creates a compounding advantage: more traffic → more feedback → better routing → better outcomes → more traffic. Early movers accumulate routing intelligence that newcomers cannot replicate without traffic volume. No surveyed competitor has intent-aware, feedback-driven, continuously-learning routing with AgentRank economics.
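The confidence-weighted EMA in §8.5.4 is small enough to show whole. The α values and the 20-observation cutoff come directly from the text's examples; the function name and the hard two-tier switch (rather than a smooth schedule) are assumptions made for illustration.

```python
def update_quality(q_old: float, feedback: float, n_obs: int) -> float:
    """Confidence-weighted EMA: Q_new = alpha * feedback + (1 - alpha) * Q_old.

    New providers (few observations) use alpha = 0.5 and learn fast;
    established providers use alpha = 0.1 and resist outliers. The
    20-observation cutoff mirrors the minimum threshold in the text.
    """
    alpha = 0.5 if n_obs < 20 else 0.1
    return alpha * feedback + (1 - alpha) * q_old


# A new provider moves quickly toward fresh feedback...
q_new_provider = update_quality(0.50, 0.90, n_obs=5)      # 0.70
# ...while an established provider barely moves on a single outlier.
q_established = update_quality(0.92, 0.40, n_obs=500)     # 0.868
```

Running this update only at fixed epochs (e.g., every 15 minutes) rather than per-event is what prevents oscillation under rapid feedback bursts.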

Source: transcribed from the compiled Scrypted Network Design whitepaper PDF for web reading. Layout, figures, and pagination may differ from the PDF.
