Performance Optimization in Oxibase
This document provides guidelines and best practices for optimizing performance in Oxibase, including database design, query optimization, and system configuration.
Database Design Optimization
Table Design
- Choose appropriate data types - Use the smallest data type that can reliably store your data
- Normalize when appropriate - Balance normalization for data integrity with denormalization for query performance
- Use primary keys - Every table should have a primary key, preferably a simple integer
- Consider column order - Place frequently used columns first
- Limit column count - Tables with fewer columns generally perform better
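As a rough sketch, a table following these guidelines might look like the one below (the table, columns, and types are illustrative and assume standard SQL types are available):
-- Compact data types, a simple integer primary key,
-- and frequently used columns listed first
CREATE TABLE users (
    id INTEGER PRIMARY KEY,        -- simple integer primary key
    email VARCHAR(255) NOT NULL,   -- frequently used columns come first
    status SMALLINT NOT NULL,      -- smallest type that reliably fits the data
    created_at TIMESTAMP
);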
Indexing Strategy
- Index selection - Create indexes on columns used in WHERE, JOIN, and ORDER BY clauses
- Multi-column indexes - Create multi-column indexes for frequently combined filters
- Index order matters - For multi-column indexes, place high-selectivity columns first
- Avoid over-indexing - Each index increases write overhead and storage requirements
- Monitor index usage - Periodically review which indexes are being used
-- Create an index on a frequently filtered column
CREATE INDEX idx_user_email ON users (email);
-- Create a multi-column index for common query patterns
CREATE INDEX idx_product_category_price ON products (category_id, price);
Query Optimization
Recent Performance Improvements
Oxibase includes significant performance optimizations:
- Hash-Based IN Subqueries - Up to 2048x faster for large IN lists
- Array-Based Row Storage - Eliminated map allocations throughout query execution
- Efficient Aggregations - Optimized operations for COUNT, SUM, AVG, MIN, MAX
- Parallel Execution - Rayon-based parallelism for filter, join, sort, and distinct operations
SELECT Statement Optimization
- Select only needed columns - Avoid SELECT * when possible
- Use WHERE clauses effectively - Apply filters early to reduce the result set
- Leverage indexes - Ensure queries can use available indexes
- Minimize function calls - Avoid functions on indexed columns in WHERE clauses (illustrated in the second example below)
- Use LIMIT for large result sets - Apply LIMIT to prevent excessive memory usage
-- Instead of this:
SELECT * FROM large_table WHERE status = 'active';
-- Do this:
SELECT id, name, created_at FROM large_table WHERE status = 'active' LIMIT 1000;
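Wrapping an indexed column in a function usually prevents index use, while comparing the column directly keeps the index usable. The sketch below assumes the idx_user_email index from the indexing section and a standard LOWER function:
-- Avoid: the function call on email can prevent use of idx_user_email
SELECT id FROM users WHERE LOWER(email) = 'alice@example.com';
-- Prefer: compare the column directly so the index can be used
SELECT id FROM users WHERE email = 'alice@example.com';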
JOIN Optimization
- Join order - Join smaller tables first when possible
- Use appropriate join types - Choose INNER, LEFT, or RIGHT joins as needed
- Index join columns - Ensure columns used in join conditions are indexed
- Consider denormalization - For critical queries, strategic denormalization may help
-- Ensure both user_id in orders and id in users are indexed
SELECT u.name, o.order_date FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'completed';
Aggregate Query Optimization
- Filter before aggregating - Use WHERE to reduce rows before GROUP BY rather than filtering afterwards with HAVING
- Index GROUP BY columns - Ensure columns used in GROUP BY are indexed
- Use HAVING efficiently - Apply HAVING only for conditions on aggregated results
-- Filter first, then aggregate
SELECT category_id, COUNT(*) FROM products
WHERE price > 100
GROUP BY category_id
HAVING COUNT(*) > 10;
Prepared Statements
Use prepared statements for repeated queries to leverage query caching:
// Prepare once, execute many times
let stmt = db.prepare("SELECT * FROM users WHERE id = $1")?;
for id in 1..=100 {
    for row in stmt.query((id,))? {
        let row = row?;
        // Process row...
    }
}
Parallel Execution
Oxibase’s parallel execution engine processes data using Rayon’s work-stealing scheduler:
- Automatic Parallelization - Operations exceeding thresholds are automatically parallelized
- Work-Stealing - Optimal load balancing across CPU cores
- Configurable Thresholds - Filter (10K rows), Hash Join (5K rows), ORDER BY (50K rows), DISTINCT (10K rows)
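No query changes are required to benefit from this; for example, a filter over a table larger than the 10K-row threshold is evaluated across multiple threads automatically (the table below is illustrative):
-- With more than 10K rows in events, this filter runs in parallel automatically
SELECT id, message FROM events WHERE severity = 'error';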
Transaction Management
- Keep transactions short - Long-running transactions can impact concurrency
- Choose appropriate isolation level - Use the minimum isolation level needed
- Batch operations - Group related operations within a single transaction
- Handle conflicts - Implement retry logic for optimistic concurrency conflicts
-- Example of a focused transaction
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
Memory Management
- Buffer pool sizing - Configure buffer pool size based on available memory
- Query memory limits - Set appropriate memory limits for complex queries
- Monitor memory usage - Watch for excessive memory consumption
Bulk Operations
Use bulk operations for better performance:
-- Bulk insert example
INSERT INTO products (name, price, category_id) VALUES
('Product A', 10.99, 1),
('Product B', 15.99, 1),
('Product C', 20.99, 2);
Performance Monitoring
Oxibase provides several performance monitoring features:
- EXPLAIN - Shows the query execution plan
- EXPLAIN ANALYZE - Shows the plan with actual runtime statistics
- Query timing - The CLI displays execution time for each query
- Row counts - Result sets show the number of rows affected/returned
-- View query plan
EXPLAIN SELECT * FROM orders WHERE amount > 1000;
-- View plan with actual execution statistics
EXPLAIN ANALYZE SELECT * FROM products WHERE category = 'Electronics';
See the EXPLAIN documentation for more detail.
Implementation-Specific Optimizations
Oxibase includes several specialized optimizations:
Parallel Processing
Oxibase uses Rayon for parallel query execution:
- Parallel Filtering - Predicate evaluation across multiple threads
- Parallel Aggregation - Concurrent aggregation operations with DashMap
- Parallel Sorting - Multi-threaded sorting via par_sort_by()
Custom Data Structures
Oxibase uses specialized data structures for better performance:
- DashMap - Concurrent hash tables for parallel operations
- Efficient Index Structures - B-tree, Hash, and Bitmap indexes
- Row Version Chains - MVCC implementation with version chains
Advanced Optimization Techniques
Expression Pushdown
Oxibase pushes down expressions to minimize data processing:
-- Filter and projection will be pushed down to the storage layer
SELECT name, price FROM products WHERE price > 100;
Join Algorithms
Oxibase selects among several join algorithms:
- Hash Join - For equality joins with large tables
- Merge Join - For pre-sorted data
- Nested Loop Join - For small tables or when using indexes
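The queries below sketch the typical cases described above; the actual algorithm choice depends on the planner and table statistics (table names are illustrative):
-- Equality join between two large tables: a typical hash join candidate
SELECT o.id, c.name FROM orders o JOIN customers c ON o.customer_id = c.id;
-- Join against a small, indexed lookup table: a typical nested loop candidate
SELECT o.id, s.label FROM orders o JOIN statuses s ON o.status_id = s.id;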
Parallel Execution
Oxibase can execute some operations in parallel:
- Parallel scans - Multiple segments scanned concurrently
- Parallel aggregations - Divided work for faster aggregations
- Concurrent index operations - Background index building
CTE and Subquery Optimization
Oxibase includes advanced optimizations for CTEs and subqueries:
Common Table Expressions (CTEs)
- Single Materialization: CTEs are evaluated once and reused throughout the query
- Efficient Storage: CTE results are stored efficiently in memory
- Optimized Aggregations: Fast operations for COUNT, SUM, AVG, MIN, MAX
-- Efficient: CTE materializes once
WITH summary AS (
    SELECT region, product_id, SUM(amount) AS total
    FROM sales
    WHERE year = 2024
    GROUP BY region, product_id
)
SELECT
    s1.region,
    COUNT(*) AS product_count,
    SUM(s1.total) AS region_total
FROM summary s1
GROUP BY s1.region;
Subquery Optimization
- Hash-Based IN/NOT IN: Automatic conversion to hash lookups
- Performance Gains: Up to 2048x faster for large IN lists
- Memory Efficiency: O(1) lookups instead of O(n×m) comparisons
-- Automatically optimized with hash table
DELETE FROM orders
WHERE customer_id IN (
    SELECT id FROM customers
    WHERE last_order < '2024-01-01'
);
-- Scalar subqueries are optimized
SELECT name, salary,
(SELECT AVG(salary) FROM employees) as company_avg
FROM employees;
Best Practices for CTEs and Subqueries
- Use CTEs for Complex Queries: Break down complex logic into readable CTEs
- Filter Early: Apply WHERE clauses in CTE definitions
- Avoid Redundant Subqueries: Use CTEs when the same subquery is needed multiple times
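For example, instead of repeating the same subquery in several places, define it once as a CTE and reference it wherever it is needed (table and column names are illustrative; CTE references inside subqueries are assumed to work as in standard SQL):
-- active_customers is materialized once and reused by both references
WITH active_customers AS (
    SELECT id FROM customers WHERE last_order >= '2024-01-01'
)
SELECT
    (SELECT COUNT(*) FROM active_customers) AS active_count,
    COUNT(*) AS recent_orders
FROM orders
WHERE customer_id IN (SELECT id FROM active_customers);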
Best Practices Summary
- Design schema carefully - Choose appropriate data types and normalization level
- Create targeted indexes - Index columns used in filters, joins, and sorts
- Write optimized queries - Select only needed columns and filter early
- Use prepared statements - Leverage the query cache for repeated queries
- Manage transactions efficiently - Keep transactions short and focused
- Monitor performance - Use query timing and row counts to identify bottlenecks
- Rebuild indexes - Recreate indexes periodically for optimal performance (see the sketch after this list)
- Consider bulk operations - Use bulk inserts and updates for better throughput
- Leverage parallel execution - Structure operations to benefit from multi-threaded processing
- Configure for your workload - Adjust memory settings based on your specific needs
- Use CTEs for complex queries - Benefit from single materialization
- Optimize subqueries - Leverage automatic hash-based IN/NOT IN optimization
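As a sketch of the index rebuild step above, an index can be refreshed by dropping and recreating it during a low-traffic window (this assumes Oxibase supports a standard DROP INDEX statement; the index matches the earlier indexing example):
-- Rebuild an index by dropping and recreating it
DROP INDEX idx_user_email;
CREATE INDEX idx_user_email ON users (email);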