Binary Collections

By default every Starfish collection stores JSON documents. Setting allowedMimeTypes to anything other than "application/json" switches the collection to binary mode: the server stores raw bytes instead of a structured document, and the client uses pullBlob / pushBlob (TypeScript) or pull_blob / push_blob (Python) instead of pull / push.

Typical uses: user avatars, PDF attachments, audio files, pre-built WASM modules, or any opaque binary asset.

Prerequisites: StarfishClient, Collection Patterns


Server configuration

// TypeScript server config
{
  name: "avatars",
  storagePath: "users/{identity}/avatar",
  readRoles: ["self"],
  writeRoles: ["self"],
  encryption: "none", // binary collections accept only "none" or "delegated"
  maxBodyBytes: 2_097_152, // 2 MB limit
  allowedMimeTypes: ["image/*"], // accept any image type
}

# Python server config
CollectionConfig(
    name="avatars",
    storage_path="users/{identity}/avatar",
    read_roles=["self"],
    write_roles=["self"],
    encryption="none",  # binary collections accept only "none" or "delegated"
    max_body_bytes=2_097_152,  # 2 MB limit
    allowed_mime_types=["image/*"],  # accept any image type
)

allowedMimeTypes patterns

Pattern              Matches
"image/png"          Exactly image/png
"image/*"            Any image subtype
"application/pdf"    PDF only
"*/*"                Any content type

The server returns 415 Unsupported Media Type if the Content-Type header doesn't match.
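
The pattern table can be read as a simple type/subtype match with `*` as a wildcard. The sketch below illustrates that reading; it is not the server's actual matcher. The function names are hypothetical, and stripping parameters such as `; charset=...` and ignoring case are assumptions.

```typescript
// Illustrative matcher for allowedMimeTypes patterns (not the server's code).
function matchesMimePattern(pattern: string, contentType: string): boolean {
  // Assumption: parameters after ";" are ignored and matching is case-insensitive.
  const normalized = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  const [type, subtype] = normalized.split("/");
  if (!type || !subtype) return false; // not a well-formed type/subtype pair

  const [patType, patSubtype] = pattern.trim().toLowerCase().split("/");
  const typeOk = patType === "*" || patType === type;
  const subtypeOk = patSubtype === "*" || patSubtype === subtype;
  return typeOk && subtypeOk;
}

// True when at least one configured pattern accepts the Content-Type.
function isAllowed(allowedMimeTypes: string[], contentType: string): boolean {
  return allowedMimeTypes.some((p) => matchesMimePattern(p, contentType));
}
```

A client could run a check like this before uploading, to avoid a round trip that would end in a 415.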

Constraints

  • encryption must be "none" or "delegated"; "identity", "server", and "group" are rejected by the config validator
  • objectSchema, bundle, remote, and appendOnly cannot be combined with binary collections
  • The GET /config endpoint exposes allowedMimeTypes so clients can discover it at runtime
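
The first two constraints can be expressed as a small pre-flight check. This is a minimal sketch, not the real config validator: the interface and function name are hypothetical, the field types are guesses, and treating `appendOnly: false` as acceptable is an assumption.

```typescript
// Hypothetical subset of a collection config; only the field names come from this page.
interface BinaryCollectionConfig {
  encryption: string;
  allowedMimeTypes: string[];
  objectSchema?: unknown;
  bundle?: unknown;
  remote?: unknown;
  appendOnly?: boolean;
}

// Returns a list of human-readable problems; an empty list means the config passes.
function checkBinaryConstraints(config: BinaryCollectionConfig): string[] {
  const errors: string[] = [];
  if (config.encryption !== "none" && config.encryption !== "delegated") {
    errors.push(`encryption "${config.encryption}" is not allowed for binary collections`);
  }
  if (config.objectSchema !== undefined) errors.push("objectSchema cannot be combined with binary collections");
  if (config.bundle !== undefined) errors.push("bundle cannot be combined with binary collections");
  if (config.remote !== undefined) errors.push("remote cannot be combined with binary collections");
  // Assumption: only appendOnly: true conflicts; false or absent is fine.
  if (config.appendOnly) errors.push("appendOnly cannot be combined with binary collections");
  return errors;
}
```

Running a check like this in tests or CI surfaces a misconfigured collection before the server rejects it at startup.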

Client API

TypeScript

import {
  StarfishClient,
  bootstrapRootIdentity,
} from "@drakkar.software/starfish-client"

const creds = await bootstrapRootIdentity(passphrase)

const client = new StarfishClient({
  baseUrl: "https://api.example.com/v1",
  capProvider: {
    getCap: async () => ({ cap: creds.capCert, devEdPrivHex: creds.device.edPriv }),
  },
})

// ── Push a file upload ──────────────────────────────────────────────────────

// userId is assumed to hold the current user's identity string, defined elsewhere.

// From a browser File / Blob
async function uploadAvatar(file: File) {
  const result = await client.pushBlob(
    `/push/users/${userId}/avatar`,
    file,
    file.type, // e.g. "image/jpeg"
  )
  console.log("stored hash:", result.hash)
}

// From an ArrayBuffer (e.g. blob.arrayBuffer() on the Blob produced by canvas.toBlob)
async function uploadPng(buffer: ArrayBuffer) {
  await client.pushBlob(
    `/push/users/${userId}/avatar`,
    buffer,
    "image/png",
  )
}

// ── Pull and display ────────────────────────────────────────────────────────

async function loadAvatar(): Promise<string> {
  const result = await client.pullBlob(`/pull/users/${userId}/avatar`)
  // result.data → ArrayBuffer
  // result.hash → SHA-256 hex from ETag header (null if server omitted it)
  // result.contentType → e.g. "image/jpeg"

  const blob = new Blob([result.data], { type: result.contentType })
  // Use as <img src=...>; call URL.revokeObjectURL on it once the image is no longer needed.
  return URL.createObjectURL(blob)
}

Python

from starfish_sdk import StarfishClient

# Run this inside an async function; auth and user_id are assumed to be defined elsewhere.
async with StarfishClient("https://api.example.com/v1", auth=auth) as client:
    # Push bytes
    with open("avatar.png", "rb") as f:
        data = f.read()

    result = await client.push_blob(
        f"/push/users/{user_id}/avatar",
        data,
        "image/png",
    )
    print("stored hash:", result.hash)

    # Pull bytes
    blob = await client.pull_blob(f"/pull/users/{user_id}/avatar")
    # blob.data → bytes
    # blob.hash → SHA-256 hex from ETag header (None if server omitted it)
    # blob.content_type → e.g. "image/png"

    with open("downloaded.png", "wb") as f:
        f.write(blob.data)

Conflict model

Binary collections do not use hash-based conflict detection. Every push unconditionally overwrites the stored bytes — there is no baseHash parameter.

This is the correct model for most binary assets: user avatars, thumbnails, and generated files are naturally last-write-wins. If you need versioning (e.g. allow a user to revert to a previous logo), use a versioned path pattern like logos/{versionId} so each push targets a new key rather than overwriting.
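
The versioned-path idea could look roughly like the sketch below. It assumes a collection whose storagePath is `logos/{versionId}` and that push paths follow the same `/push/` + storage path shape as the avatar examples. `BlobPusher`, `versionedLogoPath`, and `pushNewLogoVersion` are hypothetical names, and the caller decides how version ids are generated.

```typescript
// Minimal shape of the client this sketch needs; the real pushBlob may accept more types.
interface BlobPusher {
  pushBlob(path: string, data: ArrayBuffer, contentType: string): Promise<unknown>;
}

// Build the push path for one logo version (assumed layout: /push/logos/{versionId}).
function versionedLogoPath(versionId: string): string {
  return `/push/logos/${encodeURIComponent(versionId)}`;
}

// Push the bytes under a fresh key so earlier versions are never overwritten.
async function pushNewLogoVersion(
  client: BlobPusher,
  png: ArrayBuffer,
  versionId: string,
): Promise<string> {
  const path = versionedLogoPath(versionId);
  await client.pushBlob(path, png, "image/png");
  return path; // the caller records this as the current version
}
```

Reverting then means pointing the application back at an earlier path, with no copy of the bytes required. Where the pointer to the current version lives (for example, a small JSON collection) is left to the application.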


Caching

Set cacheDurationMs to add a Cache-Control header to pull responses:

{
  name: "avatars",
  storagePath: "users/{identity}/avatar",
  readRoles: ["public"],
  writeRoles: ["self"],
  encryption: "none",
  maxBodyBytes: 2_097_152,
  allowedMimeTypes: ["image/*"],
  cacheDurationMs: 3_600_000, // 1 hour; adds Cache-Control: max-age=3600
}

Pull responses also include an ETag header containing the SHA-256 of the stored bytes. Clients can use this for conditional requests (If-None-Match) to avoid re-downloading unchanged assets.
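
A conditional pull might be sketched as follows. This page does not say whether pullBlob can send If-None-Match, so the sketch takes a fetch-like function instead. The helper names, the quoted ETag format, and the 304 response for unchanged assets are assumptions, and authentication headers are omitted (as for a public collection).

```typescript
// Structural subset of fetch, so the sketch can be exercised with a stub.
type FetchLike = (url: string, init: { headers: Record<string, string> }) => Promise<{
  status: number;
  headers: { get(name: string): string | null };
  arrayBuffer(): Promise<ArrayBuffer>;
}>;

interface CachedBlob {
  hash: string | null;
  data: ArrayBuffer;
}

// Send If-None-Match only when a hash is already cached (assumes a quoted ETag).
function conditionalHeaders(cachedHash: string | null): Record<string, string> {
  return cachedHash ? { "If-None-Match": `"${cachedHash}"` } : {};
}

// Strip the optional weak prefix and the quotes from an ETag header value.
function hashFromEtag(etag: string | null): string | null {
  return etag ? etag.replace(/^W\//, "").replace(/"/g, "") : null;
}

// Return the cached copy on 304, otherwise the freshly downloaded bytes.
async function pullIfChanged(
  fetchLike: FetchLike,
  url: string,
  cached: CachedBlob | null,
): Promise<CachedBlob> {
  const res = await fetchLike(url, { headers: conditionalHeaders(cached?.hash ?? null) });
  if (res.status === 304 && cached) return cached; // unchanged: reuse local bytes
  if (res.status !== 200) throw new Error(`unexpected status ${res.status}`);
  return { hash: hashFromEtag(res.headers.get("ETag")), data: await res.arrayBuffer() };
}
```

Keeping `conditionalHeaders` and `hashFromEtag` as separate pure functions lets them be tested without a network call.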


Public read-only assets

To serve files that anyone can read but only certain roles can write (e.g. brand logos):

{
  name: "logo",
  storagePath: "brand/logo",
  readRoles: ["public"],
  writeRoles: ["admin"],
  encryption: "none",
  maxBodyBytes: 1_048_576,
  allowedMimeTypes: ["image/png", "image/svg+xml"],
  cacheDurationMs: 86_400_000, // 24 hours
}

Because readRoles contains "public", the Cache-Control header is emitted without the private directive, making CDN caching safe.
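
The relationship between cacheDurationMs, readRoles, and the resulting header can be pictured with a small function. It is a sketch of the behaviour described here, not the server's code: the function name is hypothetical, and the exact header string for non-public collections (`private, max-age=...`) is an assumption.

```typescript
// Derive a Cache-Control value from the collection settings, or null when caching is off.
function cacheControlHeader(
  cacheDurationMs: number | undefined,
  readRoles: string[],
): string | null {
  if (!cacheDurationMs || cacheDurationMs <= 0) return null; // no header emitted
  const maxAge = Math.floor(cacheDurationMs / 1000); // header uses seconds, config uses ms
  const isPublic = readRoles.includes("public");
  // Assumption: non-public collections get the private directive prepended.
  return isPublic ? `max-age=${maxAge}` : `private, max-age=${maxAge}`;
}
```

With the logo config above, this yields `max-age=86400`, which shared caches such as CDNs are allowed to store.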


What binary collections cannot do

Feature                          Available?
Incremental sync (checkpoint)    No (always returns full bytes)
Hash-based conflict detection    No (last-write-wins)
Server-side encryption           No (encryption must be "none" or "delegated")
Group encryption                 No
JSON Schema validation           No
Field-level permissions          No
Replica / remote collections     No
appendOnly                       No