Performance Optimization in Oxibase

This document provides guidelines and best practices for optimizing performance in Oxibase, including database design, query optimization, and system configuration.

Database Design Optimization

Table Design

  • Choose appropriate data types - Use the smallest data type that can reliably store your data
  • Normalize when appropriate - Balance normalization for data integrity with denormalization for query performance
  • Use primary keys - Every table should have a primary key, preferably a simple integer
  • Consider column order - Place frequently used columns first
  • Limit column count - Tables with fewer columns generally perform better

Indexing Strategy

  • Index selection - Create indexes on columns used in WHERE, JOIN, and ORDER BY clauses
  • Multi-column indexes - Create multi-column indexes for frequently combined filters
  • Index order matters - For multi-column indexes, place high-selectivity columns first
  • Avoid over-indexing - Each index increases write overhead and storage requirements
  • Monitor index usage - Periodically review which indexes are being used
-- Create an index on a frequently filtered column
CREATE INDEX idx_user_email ON users (email);

-- Create a multi-column index for common query patterns
CREATE INDEX idx_product_category_price ON products (category_id, price);

Query Optimization

Recent Performance Improvements

Oxibase includes significant performance optimizations:

  • Hash-Based IN Subqueries - Up to 2048x faster for large IN lists
  • Array-Based Row Storage - Eliminated map allocations throughout query execution
  • Efficient Aggregations - Optimized operations for COUNT, SUM, AVG, MIN, MAX
  • Parallel Execution - Rayon-based parallelism for filter, join, sort, and distinct operations

SELECT Statement Optimization

  • Select only needed columns - Avoid SELECT * when possible
  • Use WHERE clauses effectively - Apply filters early to reduce the result set
  • Leverage indexes - Ensure queries can use available indexes
  • Minimize function calls - Avoid functions on indexed columns in WHERE clauses
  • Use LIMIT for large result sets - Apply LIMIT to prevent excessive memory usage
-- Instead of this:
SELECT * FROM large_table WHERE status = 'active';

-- Do this:
SELECT id, name, created_at FROM large_table WHERE status = 'active' LIMIT 1000;

JOIN Optimization

  • Join order - Join smaller tables first when possible
  • Use appropriate join types - Choose INNER, LEFT, RIGHT joins as needed
  • Index join columns - Ensure columns used in join conditions are indexed
  • Consider denormalization - For critical queries, strategic denormalization may help
-- Ensure both user_id in orders and id in users are indexed
SELECT u.name, o.order_date FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'completed';

Aggregate Query Optimization

  • Filter before aggregating - Apply WHERE clauses before GROUP BY
  • Index GROUP BY columns - Ensure columns used in GROUP BY are indexed
  • Use HAVING efficiently - Apply HAVING only for conditions on aggregated results
-- Filter first, then aggregate
SELECT category_id, COUNT(*) FROM products
WHERE price > 100
GROUP BY category_id
HAVING COUNT(*) > 10;

Prepared Statements

Use prepared statements for repeated queries to leverage query caching:

// Prepare once, execute many times
let stmt = db.prepare("SELECT * FROM users WHERE id = $1")?;

for id in 1..=100 {
    for row in stmt.query((id,))? {
        let row = row?;
        // Process row...
    }
}

Parallel Execution

Oxibase’s parallel execution engine processes data using Rayon’s work-stealing scheduler:

  • Automatic Parallelization - Operations exceeding thresholds are automatically parallelized
  • Work-Stealing - Optimal load balancing across CPU cores
  • Configurable Thresholds - Filter (10K rows), Hash Join (5K), ORDER BY (50K), DISTINCT (10K)

Transaction Management

  • Keep transactions short - Long-running transactions can impact concurrency
  • Choose appropriate isolation level - Use the minimum isolation level needed
  • Batch operations - Group related operations within a single transaction
  • Handle conflicts - Implement retry logic for optimistic concurrency conflicts
-- Example of a focused transaction
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

Memory Management

  • Buffer pool sizing - Configure buffer pool size based on available memory
  • Query memory limits - Set appropriate memory limits for complex queries
  • Monitor memory usage - Watch for excessive memory consumption

Bulk Operations

Use bulk operations for better performance:

-- Bulk insert example
INSERT INTO products (name, price, category_id) VALUES
  ('Product A', 10.99, 1),
  ('Product B', 15.99, 1),
  ('Product C', 20.99, 2);

Performance Monitoring

Oxibase provides several performance monitoring features:

  • EXPLAIN - Shows the query execution plan
  • EXPLAIN ANALYZE - Shows the plan with actual runtime statistics
  • Query timing - The CLI displays execution time for each query
  • Row counts - Result sets show the number of rows affected/returned
-- View query plan
EXPLAIN SELECT * FROM orders WHERE amount > 1000;

-- View plan with actual execution statistics
EXPLAIN ANALYZE SELECT * FROM products WHERE category = 'Electronics';

See EXPLAIN for detailed documentation.

Implementation-Specific Optimizations

Oxibase includes several specialized optimizations:

Parallel Processing

Oxibase uses Rayon for parallel query execution:

  • Parallel Filtering - Predicate evaluation across multiple threads
  • Parallel Aggregation - Concurrent aggregation operations with DashMap
  • Parallel Sorting - Multi-threaded sorting via par_sort_by()

Custom Data Structures

Oxibase uses specialized data structures for better performance:

  • DashMap - Concurrent hash tables for parallel operations
  • Efficient Index Structures - B-tree, Hash, and Bitmap indexes
  • Row Version Chains - MVCC implementation with version chains

Advanced Optimization Techniques

Expression Pushdown

Oxibase pushes down expressions to minimize data processing:

-- Filter and projection will be pushed down to the storage layer
SELECT name, price FROM products WHERE price > 100;

Join Algorithms

Oxibase selects among several join algorithms:

  • Hash Join - For equality joins with large tables
  • Merge Join - For pre-sorted data
  • Nested Loop Join - For small tables or when using indexes

Parallel Execution

Oxibase can execute some operations in parallel:

  • Parallel scans - Multiple segments scanned concurrently
  • Parallel aggregations - Divided work for faster aggregations
  • Concurrent index operations - Background index building

CTE and Subquery Optimization

Oxibase includes advanced optimizations for CTEs and subqueries:

Common Table Expressions (CTEs)

  • Single Materialization: CTEs are evaluated once and reused throughout the query
  • Efficient Storage: CTE results are stored efficiently in memory
  • Optimized Aggregations: Fast operations for COUNT, SUM, AVG, MIN, MAX
-- Efficient: CTE materializes once
WITH summary AS (
    SELECT region, product_id, SUM(amount) as total
    FROM sales
    WHERE year = 2024
    GROUP BY region, product_id
)
SELECT
    s1.region,
    COUNT(*) as product_count,
    SUM(s1.total) as region_total
FROM summary s1
GROUP BY s1.region;

Subquery Optimization

  • Hash-Based IN/NOT IN: Automatic conversion to hash lookups
  • Performance Gains: Up to 2048x faster for large IN lists
  • Memory Efficiency: O(1) lookups instead of O(n×m) comparisons
-- Automatically optimized with hash table
DELETE FROM orders
WHERE customer_id IN (
    SELECT id FROM customers
    WHERE last_order < '2024-01-01'
);

-- Scalar subqueries are optimized
SELECT name, salary,
    (SELECT AVG(salary) FROM employees) as company_avg
FROM employees;

Best Practices for CTEs and Subqueries

  1. Use CTEs for Complex Queries: Break down complex logic into readable CTEs
  2. Filter Early: Apply WHERE clauses in CTE definitions
  3. Avoid Redundant Subqueries: Use CTEs when the same subquery is needed multiple times

Best Practices Summary

  1. Design schema carefully - Choose appropriate data types and normalization level
  2. Create targeted indexes - Index columns used in filters, joins, and sorts
  3. Write optimized queries - Select only needed columns and filter early
  4. Use prepared statements - Leverage the query cache for repeated queries
  5. Manage transactions efficiently - Keep transactions short and focused
  6. Monitor performance - Use query timing and row counts to identify bottlenecks
  7. Rebuild indexes - Recreate indexes periodically for optimal performance
  8. Consider bulk operations - Use bulk inserts and updates for better throughput
  9. Leverage parallel execution - Structure operations to benefit from multi-threaded processing
  10. Configure for your workload - Adjust memory settings based on your specific needs
  11. Use CTEs for complex queries - Benefit from single materialization
  12. Optimize subqueries - Leverage automatic hash-based IN/NOT IN optimization

Copyright © 2025-2026 Oxibase Contributors. Gabriel Maeztu.