Trust, Safety, and Content Policy

Chapter 9 Trust, Safety, and Content Policy Strategic Takeaway Pre-flight content binding and two-stage moderation ensure that safety checks cannot be bypassed by recipe composition—policy enforcement is structural, not optional. Composable agent marketplaces inherit three overlapping safety problems: malicious or policy-violating user content (prompts, uploads), harmful model outputs (toxic, illegal, IPinfringing), and supply-chain behavior when jobs wrap or delegate across ingredients. This chapter maps the implemented mechanisms and the regulatory landscape at desk level. 9.1 Threat model Layer Failure mode Control owner Ingress Prompt injection, jailbreaks, exfiltration via tools API + guardrails + app design Orchestration Unsafe recipe composition (e.g., skipping checks) Recipe author + platform policy Model provider Base model emits prohibited content FM usage policies + provider filters Output / media CSAM, violence, deepfakes, copyright Legal + content filters + provenance Wrapper graph Opaque relay hides true generator Registry disclosure + pathaware reputation Agent autonomy increases indirect injection risk: a tool-using agent may fetch untrusted content that manipulates its behavior. This aligns with the OWASP emphasis on indirect prompt injection (LLM01). 9.2 As-built: moderation and guardrails 9.2.1 Pre-flight moderation binding The moderation_validator enforces a binding between moderation decisions and job content:

A moderation_id in the job request must belong to the same user.
The content’s SHA-256 hash must match the stored ModerationContent record — preventing post-moderation tampering. 44
Decisions are allowed, blocked, or rewrite_required. Blocked content returns HTTP 403. If no moderation_id is provided, the system can invoke guardrails inline (check_moderation_or_run_guardr returning pending while the check runs. 9.2.2 Two-stage assessment recipe The moderation text assessment recipe implements a layered check:
AWS Bedrock Guardrails (baseline): content filters for hate, sexual content, violence, prompt attacks, and configurable denied topics. This is a hard gate — violations reject the content.
LLM rule-based assessment: an LLM evaluates the content against partner-defined rules, producing structured JSON output with per-rule scores and an overall decision. Bedrock Guardrails support additional modes: PII masking, word filters, contextual grounding checks, and automated reasoning. The API surface is POST /moderations/text/assessments with Bearer auth and optional webhook notification. 9.2.3 Content moderation router The content_moderation_router ingredient dispatches content to the appropriate moderation path based on content type and configured safety level (open, strict, permissive). A moderation_cache_store and moderation_cache_lookup pair avoid redundant guardrail calls for identical content, keyed by content hash and safety level. 9.3 Policy layering: network, ingredient, and upstream Approach Strength Weakness Network-default guardrails on all jobs Consistent baseline Latency, cost, false positives Opt-in per recipe / ingredient Flexibility for specialists Gaps if authors skip checks Delegate to upstream API Simple No guarantee for openweight paths Buyer-configurable safety tier Market segmentation Abuse of permissive tier The current posture is pragmatic: strong baseline from a cloud-provider guardrail, programmable rules for tenant-specific policy, and optional pre-flight binding for regulated partners. For the network narrative, the chapter on identity (Chapter 11) discusses how verifiable disclosure of which checks ran (via ERC-8004 validation metadata) can replace trust-me claims. 9.4 Regulatory desk survey Three external frameworks inform the safety posture: NIST AI RMF / NIST.AI.600-1: The Generative AI Profile lists GenAI-specific risks (abusive content, IP infringement, misinformation, CBRN misuse, privacy, value-chain integration) as a checklist for governance sections. OWASP Top 10 for LLM Applications (2025): Prompt injection is LLM01, with direct, indirect, multi-turn, and semantic variants. RAG does not fully mitigate indirect injection. 45

EU AI Act (GPAI obligations): Documentation and transparency duties apply to generalpurpose AI model providers above specified thresholds. Downstream deployers face separate obligations for high-risk use cases. Commission guidelines (2025) clarify scope. Scrypted sits between model provider and end deployer: ingredients invoke third-party foundation models; recipes compose them. Legal classification (platform vs deployer vs intermediary) is jurisdiction-specific. The whitepaper uses functional language (safety hooks, disclosure, audit) rather than asserting a single regulatory role. 9.5 Data governance, retention, and erasure The platform holds overlapping data classes with different retention obligations: Data class Erasure vs retention User profile Target for full delete or anonymization. Ledger / billing Often must retain for years (tax, audit); may pseudonymize user ID. Jobs & workflow state Dispute and support may require retention; minimize prompt payloads in logs. Assets expires_at + purge worker; storage minimization by design. Webhooks / event logs Security investigation value; prefer TTL or aggregate statistics over indefinite raw bodies. Observability traces Self-host + redaction; separate retention from OLTP. GDPR Article 17 (right to erasure) is qualified: controllers may refuse where processing is necessary for legal compliance, legal claims defense, or archiving in the public interest. Account-level erasure must orchestrate across these classes in the right order for referential integrity. 9.6 Abuse resistance and quota design Economic barriers and technical controls complement content policy: SCRYPTOSHI floors: MIN_INGREDIENT_INVOCATION_COST (500) and MIN_EXTERNAL_API_INVOCATION_COS (15,000) ensure every invocation has a non-zero economic cost, bounding Sybil attack ROI. Transaction caps: MAX_TRANSACTION_USDC (10,000), MAX_JOB_TIMEOUT_SECONDS (3,600), MAX_REQUEST_SIZE MAX_FILE_UPLOAD_MB, and MAX_ASSET_RETENTION_DAYS (365) limit blast radius. Provider concurrency: The Redis ConcurrencyManager (Chapter 6) prevents one user from monopolizing all provider slots. Declared rate-limit constants: DEFAULT_REQUESTS_PER_HOUR (1,000), MAX_REQUESTS_PER_HOUR (10,000), and RATE_LIMIT_BURST_CAPACITY (100) are defined in constants.py. Enforcement constants are defined; empirical load data from testnet will calibrate final values before mainnet. Currently these constants are not referenced by FastAPI middleware in the application tree — enforcement may occur at an API gateway, WAF, or infrastructure layer. This is documented as a known status, not a claim of end-to-end rate limiting in application code. The canonical identity key for quotas (user UUID, bearer token, wallet address, IP, or composite) and the semantics of 429 responses (Retry-After, RFC 9457 Problem Details) are open design questions for the network layer. 46

9.7 Protocol neutrality and platform liability As the network decentralizes, the line between “platform” and “protocol” determines liability. If an AVB uses Scrypted’s x402 rails to spin up illegal compute, or a recipe generates prohibited content by composing individually compliant ingredients, who bears responsibility? Current posture (centralized): Scrypted operates the API, hosts workers, and applies preflight content moderation. The company is the platform operator and accepts corresponding obligations (CSAM reporting, DMCA, EU AI Act provider duties where applicable). Target posture (decentralized): The Scrypted Network becomes a protocol, not a company service. Operators (ingredient hosts, committee members, facilitators) bear individual compliance obligations for their jurisdictions. The DUNA-based DAO governs protocol-level policy; enforcement is economic (slashing, deregistration), not centralized content filtering. Transition risk: During the centralized-to-decentralized transition, the project must avoid claiming protocol immunity prematurely. Regulatory classification under the EU AI Act (generalpurpose AI system vs. deployer vs. provider) depends on the degree of editorial control exercised. The design target is to reduce editorial control progressively as governance and moderation tooling matures—not to disclaim responsibility before the infrastructure exists to enforce policy without a central operator. No guarantees are made. All liability, token, and governance positions are subject to legal review and may change before mainnet. This section documents the design intent, not a legal opinion. 47

Source: transcribed from the compiled Scrypted Network Design whitepaper PDF for web reading. Layout, figures, and pagination may differ from the PDF.