Loka is in Developer Preview. Core engine, SPARQL+, vector indexing, and HTTP server are fully functional. Here's what's done and what's next.
The loka install-agent command sets up a database interactively for AI agents. Currently debugging path handling, HNSW index rebuild on startup, and error messaging for non-interactive environments. Goal: any AI agent says "set up a database" and it just works.
Cardinality estimation and predicate pushdown are done. Remaining: planner should choose HNSW index scan vs SPO triple scan based on cost, and adaptive execution should observe intermediate result sizes at runtime and reorder mid-query.
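The planner decision described above boils down to comparing estimated distance computations per access path. A minimal sketch of that cost comparison, with illustrative cost formulas and parameter values (`ef_search`, `dim`) that are assumptions, not Loka's actual cost model:

```python
# Hypothetical cost model: all names and constants are illustrative.
def estimate_hnsw_cost(ef_search, dim):
    # ANN search cost scales roughly with beam width * vector dimension.
    return ef_search * dim

def estimate_spo_scan_cost(triple_count, selectivity, dim):
    # A triple scan computes a distance for every candidate that survives
    # the pattern filter.
    return triple_count * selectivity * dim

def choose_access_path(triple_count, selectivity, ef_search=100, dim=384):
    hnsw = estimate_hnsw_cost(ef_search, dim)
    scan = estimate_spo_scan_cost(triple_count, selectivity, dim)
    return "hnsw" if hnsw < scan else "spo_scan"
```

On a large graph the HNSW index wins; on a tiny or highly filtered one, scanning the surviving triples directly is cheaper.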
HNSW topology is exposed as virtual RDF triples (loka:hnswNeighbor). Remaining: make property path evaluation produce correct ANN results by letting greedy descent + beam search emerge from the graph structure.
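The intended emergent behavior can be sketched as plain best-first beam search over a neighbor map standing in for the `loka:hnswNeighbor` edges. This is a toy model of the search, not Loka's property-path evaluator:

```python
import heapq

def beam_search(neighbors, dist, entry, query, ef):
    """Best-first search over an HNSW-style neighbor graph.
    `neighbors` models the loka:hnswNeighbor edges as a plain dict."""
    visited = {entry}
    candidates = [(dist(entry, query), entry)]   # min-heap by distance
    results = [(-dist(entry, query), entry)]     # max-heap of the best `ef`
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break  # frontier can no longer improve the result set
        for nb in neighbors.get(node, []):
            if nb in visited:
                continue
            visited.add(nb)
            nd = dist(nb, query)
            if len(results) < ef or nd < -results[0][0]:
                heapq.heappush(candidates, (nd, nb))
                heapq.heappush(results, (-nd, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return sorted((-d, n) for d, n in results)

# Tiny 1-D example: nodes are points on a line, edges form a chain.
vectors = {"a": 0.0, "b": 2.0, "c": 5.0, "d": 6.0}
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
hits = beam_search(graph, lambda n, q: abs(vectors[n] - q), "a", 6.0, ef=2)
```

Greedy descent with a wider beam is exactly this loop; the open question in the roadmap item is making property path evaluation produce the same visit order.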
Standard SPARQL property paths can't express "traverse and stop when a condition is met." The UNTIL extension adds per-step predicate evaluation during traversal with backtracking support and ordered traversal semantics.
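A toy model of the proposed semantics, assuming UNTIL means "stop expanding a branch once the condition holds at a node" (the function and argument names are illustrative, not Loka's evaluator):

```python
from collections import deque

def traverse_until(edges, start, stop_cond):
    """Ordered BFS that stops expanding past any node where stop_cond
    holds -- a sketch of one plausible UNTIL semantics."""
    seen, order = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        if stop_cond(node):
            continue  # condition met: don't traverse beyond this node
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

edges = {"a": ["b"], "b": ["c"], "c": ["d"]}
reached = traverse_until(edges, "a", lambda n: n == "c")
```

A plain `+` or `*` path would reach `d`; with the stop condition at `c`, traversal halts there, which is what standard property paths cannot express.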
During low-usage periods, rebuild HNSW indexes and rediscover pseudo-tables in the background. Old indexes stay live until the new ones are ready, then atomic swap.
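The keep-old-live-then-swap pattern can be sketched with a holder that publishes the rebuilt index via a single reference assignment (names are illustrative; Loka's implementation is in Rust, not Python):

```python
import threading

class IndexHolder:
    """Readers always see a complete index; a background rebuild
    publishes the new one with a single reference swap."""
    def __init__(self, index):
        self._index = index
        self._lock = threading.Lock()

    def current(self):
        return self._index  # reads are a single reference load

    def rebuild_and_swap(self, build_fn):
        new_index = build_fn()       # slow part: old index stays live
        with self._lock:
            self._index = new_index  # atomic publish

holder = IndexHolder({"version": 1})
holder.rebuild_and_swap(lambda: {"version": 2})
```

Queries running during the rebuild keep their reference to the old index; only new queries observe the swap.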
The semantic web ecosystem runs on Java. The SDK code is complete; publishing to Maven Central requires GPG signing configuration, Sonatype account setup, and pom.xml updates for distribution management. This is the highest-priority SDK to publish.
Python (PyPI), TypeScript (npm), Rust (crates.io), Go (pkg.go.dev), and .NET (NuGet). All SDK code is complete and tested. Publishing requires registry account setup for each platform. See SDK Accounts Setup.
SPO/POS/OSP indexes over interned u64 IDs. sled LSM-tree persistence with crash recovery. RDF-star quoted triple support. Bulk insert at 20K triples/sec.
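The interning-plus-three-orderings design can be illustrated with a toy in-memory store (this is a sketch of the idea, not Loka's sled-backed storage):

```python
class TripleStore:
    """Toy model: terms interned to integer IDs, triples kept in
    SPO, POS, and OSP orderings so any bound-variable pattern has
    a matching prefix order."""
    def __init__(self):
        self.ids, self.terms = {}, []
        self.spo, self.pos, self.osp = set(), set(), set()

    def intern(self, term):
        if term not in self.ids:
            self.ids[term] = len(self.terms)
            self.terms.append(term)
        return self.ids[term]

    def insert(self, s, p, o):
        si, pi, oi = map(self.intern, (s, p, o))
        self.spo.add((si, pi, oi))
        self.pos.add((pi, oi, si))
        self.osp.add((oi, si, pi))

    def objects(self, s, p):
        si, pi = self.ids[s], self.ids[p]
        return [self.terms[oi] for (s2, p2, oi) in sorted(self.spo)
                if (s2, p2) == (si, pi)]

store = TripleStore()
store.insert("ex:alice", "ex:knows", "ex:bob")
store.insert("ex:alice", "ex:knows", "ex:carol")
```

Interning means each term string is stored once; the indexes hold only fixed-width integer triples, which is what makes the ordered scans cheap.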
Per-predicate HNSW indexes with cosine, Euclidean, and dot product distance metrics. SIMD acceleration (AVX2/SSE). Tombstone-based deletion. Multiple entry points to improve search quality.
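The three distance metrics, written out in plain Python for reference (the SIMD versions compute the same quantities over vector lanes):

```python
import math

def dot(a, b):
    # Dot product: larger means more similar (used as a similarity score).
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    # Straight-line distance: smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # 1 - cosine similarity: 0 for parallel vectors, 1 for orthogonal.
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```

Cosine ignores magnitude, Euclidean does not, and dot product rewards both alignment and magnitude; which one is right depends on how the embeddings were trained.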
Full SPARQL 1.1 parser, cost-based query planner, and iterator-based executor. Extensions: VECTOR_SIMILAR, VECTOR_SCORE, and metric-specific search operators. Property paths, subqueries, and aggregates all work.
SPARQL endpoint, Graph Store Protocol, REST API for triples and vectors. Content negotiation (JSON, XML, CSV, TSV, Turtle). Optional passcode auth, rate limiting, query timeouts, periodic backups.
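A client talks to the endpoint via the standard SPARQL 1.1 Protocol: the query goes in a `query` parameter and the `Accept` header drives content negotiation. A sketch that builds such a request (the port and path are illustrative, not Loka's documented defaults):

```python
from urllib.parse import urlencode

def sparql_request(endpoint, query,
                   accept="application/sparql-results+json"):
    """Build a SPARQL 1.1 Protocol GET request.
    The Accept header selects JSON, XML, CSV, or TSV results."""
    url = endpoint + "?" + urlencode({"query": query})
    headers = {"Accept": accept}
    return url, headers

# Hypothetical local endpoint; substitute the server's actual address.
url, headers = sparql_request(
    "http://localhost:7878/sparql",
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10",
)
```

Any SPARQL-protocol-aware client library can then issue the request as-is.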
Automatic discovery of columnar indexes from graph structure. SIMD-accelerated column scans, zonemap pruning, segment-level storage. Deep subgraph pattern mining with fan-in detection.
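Zonemap pruning keeps a min/max per segment so whole segments can be skipped without reading them. A minimal sketch of the idea (illustrative names, not Loka's pseudo-table implementation):

```python
class Segment:
    """A column segment with its zonemap (min and max of the values)."""
    def __init__(self, values):
        self.values = values
        self.lo, self.hi = min(values), max(values)

def scan_gt(segments, threshold):
    """Return values > threshold, skipping segments whose zonemap
    proves no value inside can match."""
    out = []
    for seg in segments:
        if seg.hi <= threshold:
            continue  # pruned without touching the column data
        out.extend(v for v in seg.values if v > threshold)
    return out

result = scan_gt([Segment([1, 2, 3]), Segment([10, 20])], 5)
```

The SIMD-accelerated scans only ever run over segments that survive this pruning step.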
Python, TypeScript, Go, Rust, Java, .NET. All with client-side OWL validation, typed error handling, and vector search support. Code complete, awaiting registry publishing.
Desktop and web client. Graph visualization, SPARQL editor, HNSW health dashboard, triple editor, OWL ontology viewer, backup management. 90% feature complete.
6 GitHub Actions workflows. Clippy + rustfmt enforcement. 256 tests, 50+ Criterion benchmarks tracked historically. Docker image. Cross-platform install scripts.
loka update command, --version flag, startup version check, and HNSW rebuild endpoint. The CLI can check for new releases, self-update in place, and trigger index rebuilds after version upgrades.
Dual-mode MCP server supporting both serverless and server configurations. Provides 8 maintenance tools for AI agents to manage Loka instances through the Model Context Protocol.
Atomic transactions, startup verification, durability guarantees, and isolation. All write operations are fully transactional with crash recovery support.
Cypher and GQL (ISO/IEC 39075) as translation layers over SPARQL. The database still speaks SPARQL+ internally; these wrappers let users query with familiar graph query syntax. SQL and MQL are deliberately excluded.
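The translation-layer idea, reduced to a toy: rewrite one simple Cypher pattern into the equivalent SPARQL. A real translator needs a full parser; the `urn:rel:` IRI prefix here is an illustrative mapping choice, not Loka's:

```python
import re

def cypher_to_sparql(cypher):
    """Translate 'MATCH (a)-[:REL]->(b) RETURN a, b' into SPARQL.
    A sketch of the wrapper concept, far from a complete translator."""
    m = re.match(
        r"MATCH \((\w+)\)-\[:(\w+)\]->\((\w+)\) RETURN (\w+), (\w+)",
        cypher,
    )
    a, rel, b, r1, r2 = m.groups()
    # Map the relationship type onto an (assumed) IRI namespace.
    return f"SELECT ?{r1} ?{r2} WHERE {{ ?{a} <urn:rel:{rel}> ?{b} . }}"

translated = cypher_to_sparql("MATCH (a)-[:KNOWS]->(b) RETURN a, b")
```

Because the target is plain SPARQL, the wrapper needs no access to storage internals, which is what keeps the core engine SPARQL-only.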
RBAC, encryption at rest, TLS, audit logging, replication, clustering, multi-tenancy. These will be shaped by customer feedback. The open-source core is designed to be sufficient for most use cases.