Performance Optimization in Oxibase
This document provides guidelines and best practices for optimizing performance in Oxibase, including database design, query optimization, and system configuration.
Database Design Optimization
Table Design
- Choose appropriate data types - Use the smallest data type that can reliably store your data
- Normalize when appropriate - Balance normalization for data integrity with denormalization for query performance
- Use primary keys - Every table should have a primary key, preferably a simple integer
- Consider column order - Place frequently used columns first
- Limit column count - Tables with fewer columns generally perform better
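As a rough sketch, a table following these guidelines might look like the one below (the table, columns, and types are illustrative and assume standard SQL types are available):
-- Compact data types, a simple integer primary key,
-- and frequently used columns listed first
CREATE TABLE users (
    id INTEGER PRIMARY KEY,        -- simple integer primary key
    email VARCHAR(255) NOT NULL,   -- frequently used columns come first
    status SMALLINT NOT NULL,      -- smallest type that reliably fits the data
    created_at TIMESTAMP
);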
Indexing Strategy
- Index selection - Create indexes on columns used in WHERE, JOIN, and ORDER BY clauses
- Multi-column indexes - Create multi-column indexes for frequently combined filters
- Index order matters - For multi-column indexes, place high-selectivity columns first
- Avoid over-indexing - Each index increases write overhead and storage requirements
- Monitor index usage - Periodically review which indexes are being used
-- Create an index on a frequently filtered column
CREATE INDEX idx_user_email ON users (email);
-- Create a multi-column index for common query patterns
CREATE INDEX idx_product_category_price ON products (category_id, price);
Query Optimization
Recent Performance Improvements
Oxibase includes significant performance optimizations:
- Hash-Based IN Subqueries - Up to 2048x faster for large IN lists
- Array-Based Row Storage - Eliminated map allocations throughout query execution
- Efficient Aggregations - Optimized operations for COUNT, SUM, AVG, MIN, MAX
- Parallel Execution - Rayon-based parallelism for filter, join, sort, and distinct operations
SELECT Statement Optimization
- Select only needed columns - Avoid SELECT * when possible
- Use WHERE clauses effectively - Apply filters early to reduce the result set
- Leverage indexes - Ensure queries can use available indexes
- Minimize function calls - Avoid functions on indexed columns in WHERE clauses (illustrated in the second example below)
- Use LIMIT for large result sets - Apply LIMIT to prevent excessive memory usage
-- Instead of this:
SELECT * FROM large_table WHERE status = 'active';
-- Do this:
SELECT id, name, created_at FROM large_table WHERE status = 'active' LIMIT 1000;
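Wrapping an indexed column in a function usually prevents index use, while comparing the column directly keeps the index usable. The sketch below assumes the idx_user_email index from the indexing section and a standard LOWER function:
-- Avoid: the function call on email can prevent use of idx_user_email
SELECT id FROM users WHERE LOWER(email) = 'alice@example.com';
-- Prefer: compare the column directly so the index can be used
SELECT id FROM users WHERE email = 'alice@example.com';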
JOIN Optimization
- Join order - Join smaller tables first when possible
- Use appropriate join types - Choose INNER, LEFT, or RIGHT joins as needed
- Index join columns - Ensure columns used in join conditions are indexed
- Consider denormalization - For critical queries, strategic denormalization may help
-- Ensure both user_id in orders and id in users are indexed
SELECT u.name, o.order_date FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'completed';
Aggregate Query Optimization
- Filter before aggregating - Use WHERE to reduce rows before GROUP BY rather than filtering afterwards with HAVING
- Index GROUP BY columns - Ensure columns used in GROUP BY are indexed
- Use HAVING efficiently - Apply HAVING only for conditions on aggregated results
-- Filter first, then aggregate
SELECT category_id, COUNT(*) FROM products
WHERE price > 100
GROUP BY category_id
HAVING COUNT(*) > 10;
Prepared Statements
Use prepared statements for repeated queries to leverage query caching:
// Prepare once, execute many times
let stmt = db.prepare("SELECT * FROM users WHERE id = $1")?;
for id in 1..=100 {
    for row in stmt.query((id,))? {
        let row = row?;
        // Process row...
    }
}
Parallel Execution
Oxibase’s parallel execution engine processes data using Rayon’s work-stealing scheduler:
- Automatic Parallelization - Operations exceeding thresholds are automatically parallelized
- Work-Stealing - Optimal load balancing across CPU cores
- Configurable Thresholds - Filter (10K rows), Hash Join (5K rows), ORDER BY (50K rows), DISTINCT (10K rows)
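No query changes are required to benefit from this; for example, a filter over a table larger than the 10K-row threshold is evaluated across multiple threads automatically (the table below is illustrative):
-- With more than 10K rows in events, this filter runs in parallel automatically
SELECT id, message FROM events WHERE severity = 'error';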
Transaction Management
- Keep transactions short - Long-running transactions can impact concurrency
- Choose appropriate isolation level - Use the minimum isolation level needed
- Batch operations - Group related operations within a single transaction
- Handle conflicts - Implement retry logic for optimistic concurrency conflicts
-- Example of a focused transaction
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
Memory Management
- Buffer pool sizing - Configure buffer pool size based on available memory
- Query memory limits - Set appropriate memory limits for complex queries
- Monitor memory usage - Watch for excessive memory consumption
Bulk Operations
Use bulk operations for better performance:
-- Bulk insert example
INSERT INTO products (name, price, category_id) VALUES
('Product A', 10.99, 1),
('Product B', 15.99, 1),
('Product C', 20.99, 2);
Performance Monitoring
Oxibase provides several performance monitoring features:
- EXPLAIN - Shows the query execution plan
- EXPLAIN ANALYZE - Shows the plan with actual runtime statistics
- Query timing - The CLI displays execution time for each query
- Row counts - Result sets show the number of rows affected/returned
-- View query plan
EXPLAIN SELECT * FROM orders WHERE amount > 1000;
-- View plan with actual execution statistics
EXPLAIN ANALYZE SELECT * FROM products WHERE category = 'Electronics';
See the EXPLAIN documentation for more detail.
Implementation-Specific Optimizations
Oxibase includes several specialized optimizations:
Parallel Processing
Oxibase uses Rayon for parallel query execution:
- Parallel Filtering - Predicate evaluation across multiple threads
- Parallel Aggregation - Concurrent aggregation operations with DashMap
- Parallel Sorting - Multi-threaded sorting via par_sort_by()
Custom Data Structures
Oxibase uses specialized data structures for better performance:
- DashMap - Concurrent hash tables for parallel operations
- Efficient Index Structures - B-tree, Hash, and Bitmap indexes
- Row Version Chains - MVCC implementation with version chains
Advanced Optimization Techniques
Expression Pushdown
Oxibase pushes down expressions to minimize data processing:
-- Filter and projection will be pushed down to the storage layer
SELECT name, price FROM products WHERE price > 100;
Join Algorithms
Oxibase selects among several join algorithms:
- Hash Join - For equality joins with large tables
- Merge Join - For pre-sorted data
- Nested Loop Join - For small tables or when using indexes
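The queries below sketch the typical cases described above; the actual algorithm choice depends on the planner and table statistics (table names are illustrative):
-- Equality join between two large tables: a typical hash join candidate
SELECT o.id, c.name FROM orders o JOIN customers c ON o.customer_id = c.id;
-- Join against a small, indexed lookup table: a typical nested loop candidate
SELECT o.id, s.label FROM orders o JOIN statuses s ON o.status_id = s.id;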
Parallel Execution
Oxibase can execute some operations in parallel:
- Parallel scans - Multiple segments scanned concurrently
- Parallel aggregations - Divided work for faster aggregations
- Concurrent index operations - Background index building
CTE and Subquery Optimization
Oxibase includes advanced optimizations for CTEs and subqueries:
Common Table Expressions (CTEs)
- Single Materialization: CTEs are evaluated once and reused throughout the query
- Efficient Storage: CTE results are stored efficiently in memory
- Optimized Aggregations: Fast operations for COUNT, SUM, AVG, MIN, MAX
-- Efficient: CTE materializes once
WITH summary AS (
    SELECT region, product_id, SUM(amount) AS total
    FROM sales
    WHERE year = 2024
    GROUP BY region, product_id
)
SELECT
    s1.region,
    COUNT(*) AS product_count,
    SUM(s1.total) AS region_total
FROM summary s1
GROUP BY s1.region;
Subquery Optimization
- Hash-Based IN/NOT IN: Automatic conversion to hash lookups
- Performance Gains: Up to 2048x faster for large IN lists
- Memory Efficiency: O(1) lookups instead of O(n×m) comparisons
-- Automatically optimized with hash table
DELETE FROM orders
WHERE customer_id IN (
    SELECT id FROM customers
    WHERE last_order < '2024-01-01'
);
-- Scalar subqueries are optimized
SELECT name, salary,
(SELECT AVG(salary) FROM employees) as company_avg
FROM employees;
Best Practices for CTEs and Subqueries
- Use CTEs for Complex Queries: Break down complex logic into readable CTEs
- Filter Early: Apply WHERE clauses in CTE definitions
- Avoid Redundant Subqueries: Use CTEs when the same subquery is needed multiple times
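For example, instead of repeating the same subquery in several places, define it once as a CTE and reference it wherever it is needed (table and column names are illustrative; CTE references inside subqueries are assumed to work as in standard SQL):
-- active_customers is materialized once and reused by both references
WITH active_customers AS (
    SELECT id FROM customers WHERE last_order >= '2024-01-01'
)
SELECT
    (SELECT COUNT(*) FROM active_customers) AS active_count,
    COUNT(*) AS recent_orders
FROM orders
WHERE customer_id IN (SELECT id FROM active_customers);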
Best Practices Summary
- Design schema carefully - Choose appropriate data types and normalization level
- Create targeted indexes - Index columns used in filters, joins, and sorts
- Write optimized queries - Select only needed columns and filter early
- Use prepared statements - Leverage the query cache for repeated queries
- Manage transactions efficiently - Keep transactions short and focused
- Monitor performance - Use query timing and row counts to identify bottlenecks
- Rebuild indexes - Recreate indexes periodically for optimal performance (see the sketch after this list)
- Consider bulk operations - Use bulk inserts and updates for better throughput
- Leverage parallel execution - Structure operations to benefit from multi-threaded processing
- Configure for your workload - Adjust memory settings based on your specific needs
- Use CTEs for complex queries - Benefit from single materialization
- Optimize subqueries - Leverage automatic hash-based IN/NOT IN optimization
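As a sketch of the index rebuild step above, an index can be refreshed by dropping and recreating it during a low-traffic window (this assumes Oxibase supports a standard DROP INDEX statement; the index matches the earlier indexing example):
-- Rebuild an index by dropping and recreating it
DROP INDEX idx_user_email;
CREATE INDEX idx_user_email ON users (email);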