Oxibase Roadmap

The roadmap outlines the journey from a single-node into scalable distributed autonomous system. The goal is not to become the fastest, the most scalable solution but to learn and explore how the computation and data processing paradigms can be rethought.

Figure 1: Current Priorities Dependency Flow

classDiagram
    %% --- RELATIONSHIPS ---
    %% Use the short class names here, not "Layer.Class"
    Validation ..> External_Gateway
    Validation ..> Workstation
    Validation ..> Performance

    Workstation ..> Horizontal_Architecture
    External_Gateway ..> Horizontal_Architecture
    Performance ..> Horizontal_Architecture
    Unikernel_Compiler ..> Horizontal_Architecture

    %% --- LAYERS & CLASS DEFINITIONS ---

    class Validation {
        - Embedded Scripting Languages
        - Stored functions
        - Triggers
        - Queues
        - Debugger support
        - FaaS-like DevEx
        - Out-of-core processing
    }
    class External_Gateway {
        - DML routes
        - REST / GraphQL endpoints
        - HTML Render
        - Postgres Wire Protocol Server
        - Role based authorization control
        - Authentication
    }
    class Workstation {
        - Manage any object
        - Develop any object
        - Debug Queries step-by-step
        - Debug stored procedures
        
    }

    class Performance {
        - Vertical scaling support
        - Out-of-core processing
        - Self-monitoring
        - Undo-log based MVCC
        - Copy-on-write checkpoint
        - Row level replication
        - Storage backends
    }

    class Horizontal_Architecture {
        - Deterministic Simulator
        - Failure simulation
        - Dedicated Clustering
        - Sharding
        - Geolocation
        - Distributed Computation
        - Data / Computation rebalancing
    }

    class Unikernel_Compiler {
        - Unikernel Compilation
        - OS-free images
        - Bootable image generation
        - SQL based OS management
    }

    class Kernel_Integration {
        - Zero-copy data paths
        - Privileged hardware access
        - Lock-free page-table walks
        - Active virtual memory
        - Elastic resource allocation
    }

    Unikernel_Compiler ..> Kernel_Integration
    Kernel_Integration ..> Horizontal_Architecture

Phases
Current Goals
1. Working areas
2. Supported objects
Other ideas

Phases

Phase 0: Validation

No idea survives alone. No open-source project can thrive without community support. We need to validate our vision and gather feedback from the community. This phase will focus on building a strong developer experience and validating through community engagement, documentation, and user testing.

Starting with the initial release of the core experience as an small embedded system providing a self-contained computational experience. The system needs to provide support to store relational data in an MVCC (Multi-Version Concurrency Control) system, store and execute procedures, debug them, monitor and trace them, invoke them and finally provide a dev-friendly interface for interacting with the system. This should serve as a demo of the system’s capabilities.

Goal

Validate the development experience and build a community around the project.

Phase 1: Foundation

Parallel efforts to establish Oxibase’s core capabilities and prepare for scaling. They focus on single-node enhancements that enable seamless development, connectivity, and deployment without relying on external systems.

Optimize the Wire Protocols: Implement FlightSQL for high-performance client connections and PostgresSQL wire protocol for seamless integration.
Query engine optimization: Add DataFusion for query parsing, planning, and optimization of executor/. Adopting Apache Arrow’s memory management and data structures to optimize data loading and spill-to-disk operations.
MVCC storage optimization: Add Vortex for compressed columnar storage that supports random access for efficient data retrieval for the mvcc layer.
Catalog optimization: Use Iceberg for efficient metadata management and leverage the DataFusion interface.
Implement Hermit Unikernel: Add unikernel compilation support for bare-metal performance.

Goal

Leverage the open-source data community to build a reliable system that can handle large-scale data processing and storage.

Phase 2a: Scale

Explore distributing the system across multiple nodes, with a focus on multi-node parallelism, and distributed storage. Explore having dedicated nodes for metadata management, compute, and data storage.

Support for distributed execution: Implement Ballista for distributed execution and scaling across nodes.
Metadata layer: Implement a consensus protocol to manage metadata between nodes.
Use a deterministic simulator: Simulate failure points in the system to prepare for horizontal scaling by testing most scenarios without real-world risks.
Distributed Storage: Distribute storage for fault tolerance.
- Geo-replication: Implement geo-replication for data redundancy and availability.
- Data Partitioning: Implement data partitioning for efficient data distribution.
- Data locality: Implement data locality for efficient and/or compliant data access and computation.
- Sharding: Implement sharding for horizontal scaling.

Goal

Scale horizontally.

Phase 2b: Workstation

Create an Integrated Development Environment (IDE) for local and remote management of the system. It should serve as an object manager and debugger at the same time, providing a seamless experience for developers. Draw inspiration from IDEs like Zed for code editing; Chrome DevTools for the debugging experience; and DataGrip for the Database Management.

Goal

Create a workstation that serves developers of all levels to create, test, and deploy applications.

Phase 3: Auto Scaling

Explore how the system can autonomously scale resources based on demand. Explore distributing data and computation based on the workload as well as the availability of adding resources on demand for elastic scaling.

Load Balancing: Use the internal resource monitoring to implement load balancing for efficient resource utilization.
Third-party resource allocation: Integrate with cloud providers and third-party resource allocation services for dynamic resource management.
Data lifecycle management: Implement a comprehensive data lifecycle management system to ensure data is properly managed throughout its entire lifecycle. Use different policies for data retention, backup, and deletion.

Goal

Global autonomous computing environment.

Current Goals

We are currently at Phase 0.

Working areas

Supported objects

Object Type	Status	Notes
Schemas	Available	No CREATE SCHEMA/DROP SCHEMA support
User-defined Functions	Available	CREATE FUNCTION/DROP FUNCTION supported
Stored Procedures	Missing	No CREATE PROCEDURE/DROP PROCEDURE
Materialized Views	Missing	No CREATE MATERIALIZED VIEW
Custom Types/Domains	Missing	No CREATE TYPE/CREATE DOMAIN
Rules	Missing	No CREATE RULE/DROP RULE
Extensions	Missing	No CREATE EXTENSION
Foreign Data Wrappers	Missing	No foreign table support
Aggregates	Missing	No custom aggregate functions
Operators	Missing	No custom operator definitions
Event Triggers	Missing	No DDL event triggers
Tablespaces	Missing	No CREATE TABLESPACE
Roles/Users	Missing	No user/role management
Publications/Subscriptions	Missing	No logical replication support

Other ideas

Declarative Schema Migration

Modify the schema by applying DDL in static create files stored in the database version management system. (oxigration) or

Automatic blue-green migrations

Any schema changes triggers an automatic blue-green migration.

DML based schema migrations

Setup an special internal schema that let’s the user use DML to manipulate the schema of the database with DML in stead of DDL, bringing the DevEx of aquameta.

Oxibase Roadmap

Table of contents

Phases

Phase 0: Validation

Goal

Phase 1: Foundation

Goal

Phase 2a: Scale

Goal

Phase 2b: Workstation

Goal

Phase 3: Auto Scaling

Goal

Current Goals

Working areas

Supported objects

Other ideas

Declarative Schema Migration

Automatic blue-green migrations

DML based schema migrations