OKTO Server Management

Full-featured remote management of edge devices from the factory server. This document describes the command protocol, WebSocket channel, REST API, RBAC model, firmware workflow, and security model.


1. Architecture

 ┌────────────────┐      ┌──────────────────────────────────────────┐
 │ OKTO Cloud     │      │ Management UI (React Admin)              │
 │ (app.okto.ru)  │      │                                          │
 └───────▲────────┘      └────────────────┬─────────────────────────┘
         │ POST /companies/{id}/bottles   │ REST (JWT) + WS(/dashboard)
         │ POST /batches  POST /pallets   │
         │  ▲                             │
         │  │  CloudSyncService           ▼
 ┌───────┴──┴───────────────────────────────────────────────┐
 │ Factory Server (Ktor + Postgres)                         │
 │   • FactoryCloudClient  (bearer authToken)               │
 │   • EdgeSyncService     (persists + enqueues)            │
 │   • DeviceConnectionRegistry                             │
 │   • CommandDispatchService   FirmwareService             │
 │   • AuthService + JwtService                             │
 └───────▲──────────────────────┬───────────────────────────┘
         │ POST /api/v1/sync    │ WS /ws/device
         │ (VIA_LOCAL_SERVER)   │ (commands + events)
 ┌───────┴──────────────────────┴─────────────────┐
 │ Edge Service (Kotlin/Ktor + SQLite)            │
 │   • ServerConnectionService (persistent WS)    │
 │   • CommandHandlerService                      │
 │   • OfflineQueueService (forwards to factory   │
 │     OR direct to OKTO Cloud in DIRECT_CLOUD)   │
 └────────────────────────────────────────────────┘

Two end-to-end paths — factory-mode and direct-cloud — coexist and can be switched per device via connection-mode:

  1. VIA_LOCAL_SERVER (default): edge → factory → OKTO cloud. Data is persisted into the aggregated_* tables on the factory server, then enqueued on cloud_sync_queue, then pushed by CloudSyncService using the real OKTO cloud endpoints (/companies/{id}/bottles, /batches, /pallets, /batches/fixate). FactoryCloudClient wraps those calls.
  2. DIRECT_CLOUD: edge → OKTO cloud (existing OktoCloudClient). Factory-server is optional in this mode and is used only for device management (commands, firmware).

  • Each edge device opens a single persistent WebSocket to wss://<factory>/ws/device?token=<deviceJwt> and re-establishes it with exponential backoff on failure (1s → 30s cap).
  • The server pushes device commands down the socket; the device responds with CommandResult, optional CommandProgress, and arbitrary telemetry (StatusEvent, LogLineEvent, scan/print events, alerts).
  • The dashboard subscribes to /ws/dashboard to receive the same telemetry stream, filtered by device or event type.
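
The reconnect policy described above (start at 1s, cap at 30s) can be sketched as a delay generator. This is an illustration only, not the ServerConnectionService code: the doubling factor and the absence of jitter are assumptions, since the text states only the starting delay and the cap.

```python
from itertools import islice
from typing import Iterator


def backoff_delays(initial_s: float = 1.0, cap_s: float = 30.0,
                   factor: float = 2.0) -> Iterator[float]:
    """Yield reconnect delays: initial_s, then multiplied by factor, never above cap_s.

    The factor of 2 is an assumption; the document gives only 1s and the 30s cap.
    """
    delay = initial_s
    while True:
        yield min(delay, cap_s)
        delay = min(delay * factor, cap_s)


# Delays a device would wait between the first seven failed connection attempts.
schedule = list(islice(backoff_delays(), 7))
print(schedule)  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

A real client would typically reset the sequence after a successful connection and may add random jitter so that many devices do not reconnect in lockstep after a server restart.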

2. Authentication

Two flavours of JWT, both signed with HS256 (auth.jwtSecret):

Token        Issued by                                                  Subject      Scope claim
User JWT     POST /api/v1/auth/login                                    <userId>     user
Device JWT   POST /api/v1/devices/{id}/token (with X-Enrollment-Key)    <deviceId>   device

  • User JWTs carry a role claim (ADMIN, MANAGER, OPERATOR, VIEWER).
  • /ws/dashboard accepts only user JWTs.
  • /ws/device accepts only device JWTs.
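
To make the two-token scheme concrete, here is a minimal sketch of HS256 signing and per-endpoint scope checking using only the Python standard library. It is not the JwtService implementation. The claim names "scope", "role", "sub", and "exp" are assumptions (the table above names a scope claim but not its exact key), and the secret is a placeholder for auth.jwtSecret.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url without padding, as used by JWT."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, secret: str) -> str:
    """Build a compact HS256 JWT from the given claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [_b64url(json.dumps(p, separators=(",", ":")).encode()) for p in (header, claims)]
    signing_input = ".".join(parts)
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_for_endpoint(token: str, secret: str, required_scope: str) -> dict:
    """Return the claims if the algorithm, signature, expiry, and scope all check out."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise PermissionError("unexpected algorithm")
    expected = hmac.new(secret.encode(), f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) <= time.time():
        raise PermissionError("token expired")
    if claims.get("scope") != required_scope:  # claim name "scope" is an assumption
        raise PermissionError("wrong token type for this endpoint")
    return claims


secret = "change-me"  # placeholder for auth.jwtSecret
user_token = sign_hs256({"sub": "u-1", "scope": "user", "role": "OPERATOR",
                         "exp": int(time.time()) + 3600}, secret)
# /ws/dashboard would require scope "user"; /ws/device would require "device".
claims = verify_for_endpoint(user_token, secret, required_scope="user")
```

Checking the scope after the signature means a valid user JWT presented to /ws/device is rejected the same way as a forged token, which is the behaviour the bullets above describe.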

Device enrollment flow

The edge service bootstraps itself on first boot by calling:

POST /api/v1/devices/{deviceId}/token?name=<name>&companyId=<company>&productionLineId=<line>&version=<version>
X-Enrollment-Key: <shared-secret>

  • If the device exists, a fresh device JWT is returned.
  • If the device is unknown and auth.allowAutoEnrollment = true, the server auto-registers it (OFFLINE status) and returns a token.
  • If the enrollment key is missing/wrong, the request is rejected.

Configure both sides symmetrically:

  • Factory: auth.deviceEnrollmentKey
  • Edge: factoryServer.enrollmentKey
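
As an illustration of the enrollment call, the sketch below builds (but does not send) the request with the Python standard library. The host name and all parameter values are hypothetical; the response body format is not specified in this document, so no parsing is shown.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_enrollment_request(base_url: str, device_id: str, enrollment_key: str,
                             name: str, company_id: str, production_line_id: str,
                             version: str) -> Request:
    """Build the POST request a device would send to obtain its device JWT."""
    query = urlencode({
        "name": name,
        "companyId": company_id,
        "productionLineId": production_line_id,
        "version": version,
    })
    url = f"{base_url}/api/v1/devices/{quote(device_id, safe='')}/token?{query}"
    return Request(url, method="POST", headers={"X-Enrollment-Key": enrollment_key})


# Hypothetical values for illustration only.
req = build_enrollment_request("https://factory.example", "edge-1", "s3cret",
                               name="filler-a", company_id="c-42",
                               production_line_id="line-3", version="1.2.3")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would perform the call; the enrollment key must match auth.deviceEnrollmentKey on the factory server.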

Default admin credentials on a fresh database:

admin / admin123

Change the password immediately and remove the seeded accounts in production.


3. Device commands

All commands extend DeviceCommand (see common/api/DeviceControl.kt):

Command          Purpose                                                      Dangerous
force_sync       Immediately process the offline queue                        No
clear_queue      Delete pending (+ optionally completed) queue rows           Yes
pull_logs        Stream the last N log lines back as LogLineEvents            No
restart_service  Gracefully exit edge-service (supervisor restarts it)        Medium
reboot_os        systemctl reboot                                             Yes
shutdown_os      systemctl poweroff                                           Yes
push_config      Merge a JSON patch into device config                        No
update_firmware  Download + sha256-verify + stage a new edge-service binary   Medium
exec_shell       Run an allow-listed shell template (see ShellTemplates)      Medium
enable_device    Resume production on the device                              No
disable_device   Pause production (requires a reason)                         Yes

Dispatch request

POST /api/v1/devices/{id}/commands
Authorization: Bearer <userJwt>
Content-Type: application/json

{
  "command": { "type": "force_sync", "id": "cmd-uuid" },
  "timeoutMs": 15000
}

Response (200):

{
  "success": true,
  "data": {
    "commandId": "cmd-uuid",
    "success": true,
    "output": "Force-sync completed. In-progress: 7"
  }
}

If the device is offline, the server returns success=false and persists the command with status FAILED.
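
A client-side sketch of the request and response handling above. It assumes the caller generates the command id (the "cmd-uuid" example suggests this but does not confirm it) and treats a command as successful only when both the outer envelope and the inner result report success, which covers either reading of "success=false" for an offline device.

```python
import json
import uuid


def build_dispatch_body(command_type: str, timeout_ms: int = 15000) -> str:
    """Serialize a dispatch request with a freshly generated command id."""
    command = {"type": command_type, "id": str(uuid.uuid4())}
    return json.dumps({"command": command, "timeoutMs": timeout_ms})


def command_succeeded(response_text: str) -> bool:
    """True only if the API envelope and the device-level result both report success."""
    envelope = json.loads(response_text)
    result = envelope.get("data") or {}
    return bool(envelope.get("success")) and bool(result.get("success"))


body = build_dispatch_body("force_sync")
ok = command_succeeded(
    '{"success": true, "data": {"commandId": "cmd-uuid", "success": true,'
    ' "output": "Force-sync completed. In-progress: 7"}}'
)
```

Commands that carry extra fields (for example the reason required by disable_device) would add them to the command object; their exact field names are defined in DeviceControl.kt and are not reproduced here.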

Bulk dispatch

POST /api/v1/device-groups/{groupId}/commands

Body is identical to the single-device endpoint. The server fans out a fresh command-id per target and returns Map<deviceId, CommandResult>.
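
Because the bulk endpoint returns one CommandResult per device, callers usually want to pull out the failures. The sketch below works on the deviceId-to-result map itself and assumes each result carries the commandId, success, and output fields shown in the single-device response; whether the map is wrapped in the same success/data envelope is not stated here, and the sample values are made up.

```python
from typing import Dict


def failed_devices(results: Dict[str, dict]) -> Dict[str, str]:
    """Map deviceId -> output for every target whose CommandResult is not successful."""
    return {
        device_id: result.get("output", "")
        for device_id, result in results.items()
        if not result.get("success")
    }


# Hypothetical bulk response for a two-device group.
bulk_results = {
    "edge-1": {"commandId": "c-1", "success": True, "output": "Force-sync completed."},
    "edge-2": {"commandId": "c-2", "success": False, "output": "Device offline"},
}
failures = failed_devices(bulk_results)
print(failures)  # {'edge-2': 'Device offline'}
```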

Command history

  • GET /api/v1/devices/{id}/commands?limit=50&offset=0
  • GET /api/v1/devices/{id}/commands/{cmdId} — single record

4. Firmware workflow

  1. Upload

    POST /api/v1/firmware/releases?version=1.2.3&channel=stable&filename=edge-service.jar
    Authorization: Bearer <userJwt>
    Content-Type: application/octet-stream
    
    <binary artifact>
    
    Server stores the artifact in data/firmware/<safeVersion>-<filename>, computes SHA-256, and persists a FirmwareRelease row.

  2. Deploy

    POST /api/v1/firmware/deployments
    { "releaseId": "r1", "deviceIds": ["edge-1", "edge-2"] }
    
    Server creates a FirmwareDeployment per device (status PENDING → IN_PROGRESS → SUCCESS/FAILED) and dispatches an UpdateFirmwareCmd with the artifact URL and sha256.

  3. Device behaviour

    • UpdateFirmwareExecutor downloads the artifact, verifies sha256, and stages it at <okto.firmware.staging.dir>/edge-service-<version>.jar.
    • The supervisor (systemd or Docker) is expected to pick up the staged JAR during its next restart.

  4. Browse

    • GET /api/v1/firmware/releases
    • GET /api/v1/firmware/releases/{id}/artifact (binary download, used by the device during deployment)
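
The verify-then-stage step on the device can be sketched as follows. This is not the UpdateFirmwareExecutor code: the function names are hypothetical, the download itself is omitted, and the demo uses a temporary directory in place of <okto.firmware.staging.dir>.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large artifacts are never loaded whole into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_firmware(downloaded: Path, expected_sha256: str,
                   staging_dir: Path, version: str) -> Path:
    """Move the artifact into the staging directory only if its SHA-256 matches."""
    actual = sha256_of(downloaded)
    if actual.lower() != expected_sha256.lower():
        downloaded.unlink()  # never leave an unverified artifact behind
        raise ValueError(f"sha256 mismatch: expected {expected_sha256}, got {actual}")
    staging_dir.mkdir(parents=True, exist_ok=True)
    target = staging_dir / f"edge-service-{version}.jar"
    os.replace(downloaded, target)  # atomic when source and target share a filesystem
    return target


# Demo with fake bytes standing in for a downloaded JAR.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "download.tmp"
    artifact.write_bytes(b"fake jar bytes")
    expected = hashlib.sha256(b"fake jar bytes").hexdigest()
    staged = stage_firmware(artifact, expected, Path(tmp) / "staging", "1.2.3")
    staged_name, staged_ok = staged.name, staged.exists()
```

Verifying before the move keeps a corrupt or tampered download out of the directory the supervisor reads from on restart.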

5. Device groups

Groups enable bulk operations:

POST /api/v1/device-groups               — create
GET  /api/v1/device-groups               — list
PUT  /api/v1/device-groups/{id}          — update
DELETE /api/v1/device-groups/{id}        — delete
POST /api/v1/device-groups/{id}/members  — add devices
DELETE /api/v1/device-groups/{id}/members — remove devices
POST /api/v1/device-groups/{id}/commands — bulk dispatch

6. Audit log

Every privileged REST call is recorded in audit_log. Query via:

GET /api/v1/audit-log?userId=...&entityId=...&since=2026-01-01T00:00:00Z&limit=200
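
A small helper for building that query, shown as a sketch. It assumes every filter is optional and that since is sent as a UTC timestamp in the format of the example above; the host name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode


def audit_log_url(base_url: str, user_id: Optional[str] = None,
                  entity_id: Optional[str] = None,
                  since: Optional[datetime] = None, limit: int = 200) -> str:
    """Build the audit-log query URL, skipping filters that were not supplied."""
    params = {}
    if user_id is not None:
        params["userId"] = user_id
    if entity_id is not None:
        params["entityId"] = entity_id
    if since is not None:
        # Expects a timezone-aware datetime; it is rendered in UTC with a Z suffix.
        params["since"] = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    params["limit"] = limit
    return f"{base_url}/api/v1/audit-log?{urlencode(params)}"


url = audit_log_url("https://factory.example", user_id="u-1",
                    since=datetime(2026, 1, 1, tzinfo=timezone.utc))
print(url)
# https://factory.example/api/v1/audit-log?userId=u-1&since=2026-01-01T00%3A00%3A00Z&limit=200
```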

7. RBAC

Roles and default permissions:

Role       Can do
ADMIN      Everything, including user/terminal CRUD, firmware upload, OS shutdown
MANAGER    Device config + safe commands (force_sync, pull_logs, clear_queue); user listing
OPERATOR   Dispatch force_sync, pull_logs, enable_device
VIEWER     Read-only: list devices, view commands, view audit log

The Ktor JWT plugin enforces authentication; role-based gatekeeping is applied in individual route handlers (see AuthRoutes.kt and ServerManagementRoutes.kt).
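
The command-dispatch part of the role table can be expressed as a lookup. This is a sketch of the documented defaults, not the checks in the route handlers. It covers only the commands the table names explicitly; whether MANAGER's "device config" permission includes push_config is not stated, so it is left out.

```python
ALL_COMMANDS = frozenset({
    "force_sync", "clear_queue", "pull_logs", "restart_service", "reboot_os",
    "shutdown_os", "push_config", "update_firmware", "exec_shell",
    "enable_device", "disable_device",
})

# Commands each role may dispatch, as listed in the role table above.
ROLE_COMMANDS = {
    "ADMIN": ALL_COMMANDS,
    "MANAGER": frozenset({"force_sync", "pull_logs", "clear_queue"}),
    "OPERATOR": frozenset({"force_sync", "pull_logs", "enable_device"}),
    "VIEWER": frozenset(),
}


def can_dispatch(role: str, command_type: str) -> bool:
    """Unknown roles and unknown commands are denied by default."""
    return command_type in ROLE_COMMANDS.get(role, frozenset())


print(can_dispatch("OPERATOR", "enable_device"))  # True
print(can_dispatch("OPERATOR", "reboot_os"))      # False
```

Denying by default means a newly added command stays restricted to ADMIN until someone explicitly grants it to another role.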


8. Security model

  • Device JWTs default to a 1-year expiry. Rotate by calling POST /api/v1/devices/{id}/token and re-provisioning the edge service.
  • User JWTs default to 24 hours (auth.tokenExpirationMs).
  • exec_shell is strictly allow-listed — see ShellTemplates in CommandHandlerService.kt. Arbitrary shell is NOT accepted.
  • Firmware artifacts must pass SHA-256 verification before being staged. An optional signatureBase64 field is reserved for Ed25519 supply-chain signatures — hook it up in production with a keyring bundled in the edge service JAR.
  • Dangerous commands (reboot_os, shutdown_os, disable_device, clear_queue) surface a confirmation dialog in the dashboard UI and are recorded in audit_log before dispatch.
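
To illustrate what "strictly allow-listed" means for exec_shell, the sketch below renders a command line from a fixed template table and validated parameters. The template ids, their contents, and the parameter rules are all hypothetical; the real entries live in ShellTemplates and are not reproduced here. The function only builds an argument list and executes nothing.

```python
import re
from typing import Dict, List

# Hypothetical templates: each is a fixed argv with named placeholders.
TEMPLATES: Dict[str, List[str]] = {
    "disk_usage": ["df", "-h"],
    "tail_journal": ["journalctl", "-u", "{unit}", "-n", "{lines}"],
}

# Hypothetical per-parameter validation; anything not matching is rejected.
PARAM_RULES = {
    "unit": re.compile(r"[A-Za-z0-9][A-Za-z0-9_.@-]{0,63}"),
    "lines": re.compile(r"[0-9]{1,4}"),
}


def render_argv(template_id: str, params: Dict[str, str]) -> List[str]:
    """Return an argv list for an allow-listed template, or raise."""
    if template_id not in TEMPLATES:
        raise PermissionError(f"template not allow-listed: {template_id!r}")
    argv = []
    for token in TEMPLATES[template_id]:
        if token.startswith("{") and token.endswith("}"):
            name = token[1:-1]
            value = params.get(name, "")
            if not PARAM_RULES[name].fullmatch(value):
                raise ValueError(f"invalid value for {name!r}")
            argv.append(value)
        else:
            argv.append(token)
    return argv


argv = render_argv("tail_journal", {"unit": "edge-service", "lines": "100"})
print(argv)  # ['journalctl', '-u', 'edge-service', '-n', '100']
```

Passing the result as an argument list, never as a string through a shell, keeps metacharacters in a parameter from being interpreted even if a validation rule is too loose.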

9. Dashboard WebSocket protocol

Connect:

wss://<factory>/ws/dashboard?token=<userJwt>

Send a subscription filter:

{ "type": "subscribe", "deviceIds": ["edge-1"], "eventTypes": ["status","scan","alert"] }

Receive typed events:

{ "type": "status", "deviceId": "edge-1", "status": "ONLINE", "ts": "...", "metrics": { ... } }
{ "type": "scan",   "deviceId": "edge-1", "code": "010...", "valid": true }
{ "type": "log_line", "deviceId": "edge-1", "line": "...", "level": "ERROR", "ts": "..." }
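
The filtering implied by the subscribe message can be sketched as a predicate over events. This is an assumption about the semantics rather than the server code: in particular, treating a missing or empty list as "no restriction" is a guess.

```python
def matches(subscription: dict, event: dict) -> bool:
    """True if the event passes both the device filter and the event-type filter."""
    device_ids = subscription.get("deviceIds")
    event_types = subscription.get("eventTypes")
    if device_ids and event.get("deviceId") not in device_ids:
        return False
    if event_types and event.get("type") not in event_types:
        return False
    return True


subscription = {"type": "subscribe", "deviceIds": ["edge-1"],
                "eventTypes": ["status", "scan", "alert"]}
events = [
    {"type": "status", "deviceId": "edge-1", "status": "ONLINE"},
    {"type": "scan", "deviceId": "edge-2", "code": "010...", "valid": True},
    {"type": "log_line", "deviceId": "edge-1", "line": "...", "level": "ERROR"},
]
delivered = [e["type"] for e in events if matches(subscription, e)]
print(delivered)  # ['status']
```

With the sample subscription above, the log_line event is dropped because "log_line" is not among the requested event types; a client that wants log lines has to list that type explicitly.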

10. Known limitations

The implementation is intentionally scoped to a single factory-server instance per site and a trusted LAN between the server and its edge devices. The following limits are known and documented so you can plan around them:

  • In-memory WebSocket registry: DeviceConnectionRegistry is not replicated. Running two factory-server replicas will split the device-session set — a command issued against replica A for a device connected to replica B will fail with Device offline. Fix: put a shared state layer (e.g. Redis / PostgreSQL LISTEN/NOTIFY) in front of the registry before horizontal scaling.
  • In-memory device config store on the edge: InMemoryDeviceConfigStore keeps push_config changes for the lifetime of the JVM only. A subsequent restart_service discards them unless you also write them through to /etc/okto/application.yaml. For durable config, either edit the YAML and restart the service (so the file is re-read) or write a PersistentDeviceConfigStore that maps the patch to the SQLite config table.
  • Config hot-reload: most running services (scanner, printer, modbus, cloud client) capture their configuration at startup. push_config without a follow-up restart_service mostly just records intent; it does not reconfigure the hardware stack in place.
  • WebSocket device JWT in query string: device tokens are passed as a URL query parameter. Production deployments should terminate TLS on a reverse proxy and strip the query parameter from its access logs, or switch the edge service to a first-message-authentication scheme (open the socket, then send the token as the first text frame before the server registers it).
  • No JWT revocation: user logout invalidates the session row but the JWT remains valid until its exp claim. Rotate auth.jwtSecret to force-revoke all tokens; per-user revocation requires a blacklist you'd need to add.
  • Cloud auth token is static: cloudSync.authToken is a long-lived bearer. If your OKTO cloud tenant moves to OAuth / short-lived tokens, wrap FactoryCloudClient with an auth refresher.
  • Log retention: device_logs grows unbounded. Add a periodic cleanup job (e.g. DELETE FROM device_logs WHERE ts < NOW() - INTERVAL '30 days').
  • Firmware signatures: the protocol carries an optional signatureBase64 field but verification isn't wired up by default. If you enable Ed25519 signing, bundle the trusted public key in the edge service JAR and add the check in UpdateFirmwareExecutor before the swap.
  • Privilege escalation: reboot/shutdown/firmware swap require the sudoers file at packaging/sudoers/okto. Without it, those commands will return non-zero exit codes. Docker containers without systemd should rely on restart_service + supervisor restart instead.
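
The log-retention cleanup suggested above can be sketched as a small job. The factory server uses Postgres; this example uses an in-memory SQLite database purely as a stand-in so it runs anywhere. Only the table name and the ts column come from this document; the other columns and the ISO-8601 text timestamps are assumptions.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def purge_old_logs(conn: sqlite3.Connection, now: datetime, retention_days: int = 30) -> int:
    """Delete rows older than the retention window and return how many were removed."""
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute("DELETE FROM device_logs WHERE ts < ?", (cutoff,))
    return cursor.rowcount


# Stand-in schema and data; timestamps are UTC ISO-8601 strings, which sort correctly as text.
now = datetime(2026, 3, 1, tzinfo=timezone.utc)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device_logs (device_id TEXT, line TEXT, ts TEXT)")
conn.executemany("INSERT INTO device_logs VALUES (?, ?, ?)", [
    ("edge-1", "old line", (now - timedelta(days=45)).isoformat()),
    ("edge-1", "recent line", (now - timedelta(days=5)).isoformat()),
])
deleted = purge_old_logs(conn, now)
remaining = conn.execute("SELECT COUNT(*) FROM device_logs").fetchone()[0]
print(deleted, remaining)  # 1 1
```

On the real server the same statement would run on a schedule against Postgres, ideally with an index on ts so the delete does not scan the whole table.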

11. Troubleshooting

  • Device shows OFFLINE despite being powered on
    • Confirm its connection mode is VIA_LOCAL_SERVER.
    • Check GET /api/v1/devices/connected: does the identifier appear?
    • Tail the edge-service log for "Connecting to factory server WS".
  • Commands always TIMEOUT
    • Either the edge-service is not online, or its CommandHandlerService raised an uncaught exception. Inspect device_logs via the Logs page.
  • Firmware deploy shows SUCCESS but device still reports old version
    • The device only stages the artifact; the supervisor must swap the JAR on its next restart. Trigger a restart_service command to force the swap on the next process start.

See also:

  • API_REFERENCE.md for the complete endpoint list.
  • DEPLOYMENT.md for production hardening guidance.