# Query Optimizer
Oxibase includes a sophisticated cost-based query optimizer that analyzes queries and chooses efficient execution plans. The optimizer uses table statistics, index information, and cost models to make intelligent decisions.
## Overview
The query optimizer performs several tasks:
- Cost estimation: Calculates the expected cost of different execution strategies
- Access method selection: Chooses between table scans, index lookups, and primary key access
- Join ordering: Determines the optimal order to join multiple tables
- Join algorithm selection: Chooses Hash Join, Merge Join, or Nested Loop Join
- Predicate pushdown: Moves filters close to data sources
- Expression simplification: Optimizes boolean and arithmetic expressions
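
A single query can exercise several of these at once; the annotated example below is illustrative (table and column names are hypothetical):

```sql
SELECT o.id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id  -- join ordering + algorithm selection
WHERE c.region = 'US'                     -- predicate pushdown candidate
  AND o.amount > 100 + 400;               -- expression simplification: folds to amount > 500
```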
## Optimization Phases

```mermaid
graph TD
SQL["SQL Query"]
Parse["Parser<br/>AST Generation"]
Analyze["Query Analysis<br/>• Collect predicates<br/>• Detect aggregations<br/>• Identify subqueries"]
subgraph "Optimization Phases"
Simplify["Expression Simplification<br/>ExpressionSimplifier"]
SubqueryOpt["Subquery Optimization<br/>• Cache non-correlated<br/>• Semi-join conversion<br/>• EXISTS to index probe"]
PredicateOpt["Predicate Optimization<br/>• Pushdown to storage<br/>• Partition for JOINs<br/>• Zone map pruning"]
IndexOpt["Index Selection<br/>• PK lookup detection<br/>• Single/multi-column<br/>• Type-based selection"]
JoinOpt["Join Optimization<br/>• AQE algorithm selection<br/>• Build side selection<br/>• Filter pushdown"]
LimitOpt["LIMIT Optimization<br/>• Storage pushdown<br/>• TOP-N heap<br/>• Early termination"]
end
Physical["Physical Plan<br/>ScannerResult pipeline"]
Execute["Execution<br/>Storage Layer"]
SQL --> Parse
Parse --> Analyze
Analyze --> Simplify
Simplify --> SubqueryOpt
SubqueryOpt --> PredicateOpt
PredicateOpt --> IndexOpt
IndexOpt --> JoinOpt
JoinOpt --> LimitOpt
LimitOpt --> Physical
Physical --> Execute
```
## Cost Model

### Cost Components

The optimizer considers two main cost components:

- I/O Cost: Reading data from storage
  - Sequential page reads (table scans)
  - Random page reads (index access)
- CPU Cost: Processing data
  - Tuple processing
  - Predicate evaluation
  - Join operations
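
As a back-of-envelope illustration (row and page counts are hypothetical), these components combine with the default constants listed under Cost Constants below:

```sql
-- Sequential scan of a 100,000-row table stored on 1,000 pages,
-- evaluating one predicate per row:
--
--   I/O:   1,000 pages        * seq_page_cost (1.0)         = 1,000
--   CPU:   100,000 rows       * cpu_tuple_cost (0.01)       = 1,000
--        + 100,000 predicates * cpu_operator_cost (0.0025)  =   250
--   Estimated total cost                                    = 2,250
SELECT * FROM orders WHERE amount > 1000;
```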
### Access Methods
| Method | Use Case | Cost |
|---|---|---|
| Primary Key Lookup | WHERE pk = value | Lowest (O(1)) |
| Index Scan | WHERE indexed_col = value | Low (O(log n)) |
| Index Range Scan | WHERE indexed_col > value | Medium |
| Sequential Scan | No usable index | Highest (O(n)) |
### Primary Key Lookup Flow

```mermaid
graph LR
Query["WHERE id = 42"]
Detect["try_pk_lookup<br/>Extract PK value"]
Storage["get_visible_version<br/>O(1) HashMap lookup"]
Result["Single Row"]
Query --> Detect
Detect -->|"pk_id = 42"| Storage
Storage --> Result
```
### Single-Column Index Usage Flow

```mermaid
graph TD
Predicate["WHERE Predicate"]
Extract["collect_comparisons<br/>Extract column comparisons"]
subgraph "Index Selection"
CheckEq["Equality?<br/>col = value"]
CheckRange["Range?<br/>col > value"]
CheckIn["IN List?<br/>col IN (...)"]
CheckLike["Prefix LIKE?<br/>col LIKE 'abc%'"]
end
IndexLookup["Index Lookup"]
Results["Filtered row_ids"]
Predicate --> Extract
Extract --> CheckEq
Extract --> CheckRange
Extract --> CheckIn
Extract --> CheckLike
CheckEq -->|"get_row_ids_equal"| IndexLookup
CheckRange -->|"get_row_ids_in_range"| IndexLookup
CheckIn -->|"get_row_ids_in"| IndexLookup
CheckLike -->|"find_range (prefix scan)"| IndexLookup
IndexLookup --> Results
```
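
As a sketch, these predicate shapes map to the lookup entry points named in the diagram (an index on users(name) is assumed):

```sql
SELECT * FROM users WHERE name = 'Ada';            -- get_row_ids_equal
SELECT * FROM users WHERE name > 'M';              -- get_row_ids_in_range
SELECT * FROM users WHERE name IN ('Ada', 'Alan'); -- get_row_ids_in
SELECT * FROM users WHERE name LIKE 'Al%';         -- find_range (prefix scan)
```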
### Example

```sql
-- The optimizer evaluates multiple strategies
SELECT * FROM users WHERE id = 123;
-- Strategy 1: Sequential scan - O(n) rows
-- Strategy 2: PK lookup - O(1)
-- Winner: PK lookup (much lower cost)
```
## Table Statistics

The optimizer uses statistics collected by ANALYZE to make better decisions.

### ANALYZE Command

```sql
-- Collect statistics for a table
ANALYZE users;
-- Statistics are stored in system tables
SELECT * FROM _sys_table_stats WHERE table_name = 'users';
SELECT * FROM _sys_column_stats WHERE table_name = 'users';
```
### Statistics Collected
| Statistic | Description |
|---|---|
| Row count | Total number of rows in the table |
| Distinct count | Number of unique values per column |
| NULL count | Number of NULL values per column |
| Min/Max values | Range of values per column |
| Histograms | Distribution of values for selectivity estimation |
### Selectivity Estimation

Statistics help estimate how selective predicates are:

```sql
-- Without statistics: assume 10% selectivity
-- With statistics: use actual data distribution
SELECT * FROM orders WHERE status = 'pending';
```
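
A hypothetical worked example of the difference:

```sql
-- Suppose ANALYZE recorded 1,000,000 rows and 4 distinct status values.
-- A common default estimate without a histogram is 1/distinct_count,
-- i.e. 25% selectivity (~250,000 rows). If a histogram shows that only
-- 2% of rows are 'pending', the estimate drops to ~20,000 rows, which
-- can flip the plan from a sequential scan to an index scan.
SELECT * FROM orders WHERE status = 'pending';
```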
## Join Optimization

### Join Ordering

For queries joining multiple tables, the optimizer uses dynamic programming to find the optimal join order:

```sql
-- The optimizer considers all possible join orders
SELECT *
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id;
-- Possible orders:
-- 1. (orders JOIN customers) JOIN products
-- 2. (orders JOIN products) JOIN customers
-- 3. (customers JOIN orders) JOIN products
-- etc.
```
### Join Algorithms
The optimizer selects the best join algorithm:
| Algorithm | Best For | Requirements |
|---|---|---|
| Hash Join | Large tables, equality joins | Memory for hash table |
| Merge Join | Pre-sorted inputs | Sorted data |
| Nested Loop | Small tables, any join condition | None |
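
Hedged examples of conditions that tend to favor each algorithm (table names are illustrative):

```sql
-- Equality join between large tables: typically a Hash Join,
-- with the hash table built on the smaller input.
SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id;

-- Inputs already sorted on the join key (e.g. both sides read
-- from ordered indexes): a Merge Join candidate.
SELECT * FROM shipments s JOIN invoices i ON s.order_id = i.order_id;

-- Non-equality join condition: only Nested Loop applies.
SELECT * FROM events e JOIN windows w
  ON e.ts BETWEEN w.start_ts AND w.end_ts;
```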
### Semi-Join Optimization

For EXISTS and IN subqueries, the optimizer may use a semi-join:

```sql
-- Can use semi-join optimization
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
```
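
The equivalent IN form qualifies for the same rewrite:

```sql
SELECT * FROM customers
WHERE id IN (SELECT customer_id FROM orders);
```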
## Adaptive Query Execution
Oxibase implements Adaptive Query Execution (AQE), which can adjust the execution plan at runtime based on actual data characteristics.
### How AQE Works

1. Initial Plan: The optimizer creates an initial plan using statistics
2. Runtime Monitoring: Actual row counts are tracked during execution
3. Plan Adjustment: If actual counts differ significantly from the estimates, the plan may be adjusted
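
A sketch of the loop with illustrative (not measured) numbers:

```sql
-- Planned: the filter on s.country is estimated to match ~10 suppliers,
-- so a Nested Loop join looks cheapest.
-- Observed: the filter actually matches 50,000 rows; AQE can switch the
-- join to a Hash Join before the bulk of the probing work is done.
SELECT * FROM parts p
JOIN suppliers s ON p.supplier_id = s.id
WHERE s.country = 'DE';
```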
### Cardinality Feedback

The optimizer learns from query execution:

```sql
-- First execution: uses estimated cardinality
SELECT * FROM orders WHERE amount > 1000;
-- Subsequent executions: uses feedback from actual row counts
-- This improves future estimates for similar queries
```
### Bloom Filter Propagation

For join-heavy queries, the optimizer can use Bloom filters to reduce data movement:

```sql
-- Bloom filter can pre-filter orders before join
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.region = 'US';
```

The predicate on customers.region builds a Bloom filter over the matching customer keys, which prunes non-matching orders rows before the join.
## Expression Simplification

The optimizer simplifies expressions when possible:

### Constant Folding

```sql
-- Before: SELECT * FROM t WHERE x > 1 + 1
-- After: SELECT * FROM t WHERE x > 2
```
### Boolean Optimization

```sql
-- Before: SELECT * FROM t WHERE true AND x > 5
-- After: SELECT * FROM t WHERE x > 5
-- Before: SELECT * FROM t WHERE false OR x > 5
-- After: SELECT * FROM t WHERE x > 5
```
### Predicate Merging

```sql
-- Before: WHERE x > 5 AND x > 10
-- After: WHERE x > 10
-- Before: WHERE x = 5 AND x = 5
-- After: WHERE x = 5
```
## Filter Pushdown to Storage

```mermaid
graph TD
WHERE["WHERE age > 30<br/>AND status = 'active'"]
subgraph "Compilation"
Parse["Parse to Expression AST"]
Compile["RowFilter::new<br/>Compile to bytecode"]
Program["Arc<Program><br/>Shared bytecode"]
end
subgraph "Storage Layer"
Arena["Arena Read<br/>get_all_visible_rows_filtered"]
Filter["CompiledFilter::matches<br/>Eliminate virtual dispatch"]
Collect["Collect matching rows"]
end
WHERE --> Parse
Parse --> Compile
Compile --> Program
Program --> Arena
Arena --> Filter
Filter --> Collect
Collect --> Results["Filtered Results<br/>3-5x faster"]
```
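
For example, both predicates below would be compiled into a single shared bytecode program and evaluated inside the scan itself, so non-matching rows are never handed up to the executor:

```sql
SELECT * FROM users WHERE age > 30 AND status = 'active';
```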
## LIMIT and TOP-N Optimization

### LIMIT Pushdown

For queries with LIMIT but no ORDER BY, the optimizer enables true early termination by pushing LIMIT directly to the storage layer:

```mermaid
graph TD
Query["SELECT * FROM users<br/>LIMIT 100"]
Check["Has ORDER BY?"]
Ordered["Sorted LIMIT<br/>get_visible_rows_with_limit<br/>• Scan all rows<br/>• Sort by row_id<br/>• Take first 100"]
Unordered["Unordered LIMIT<br/>get_visible_rows_with_limit_unordered<br/>• Early termination<br/>• Stop after 100 rows<br/>• 50x faster"]
Query --> Check
Check -->|"No"| Unordered
Check -->|"Yes"| Ordered
```
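
Concretely (created_at is a hypothetical column):

```sql
-- No ORDER BY: the scan stops as soon as 100 visible rows are found
SELECT * FROM users LIMIT 100;

-- With ORDER BY, all qualifying rows must be considered before the
-- limit applies (see the TOP-N heap optimization below)
SELECT * FROM users ORDER BY created_at LIMIT 100;
```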
### TOP-N Heap Optimization

For ORDER BY ... LIMIT N queries, the optimizer uses a bounded heap instead of full sorting:

```mermaid
graph TD
Query["SELECT * FROM users<br/>ORDER BY age DESC<br/>LIMIT 10"]
Detect["Detect ORDER BY + LIMIT"]
TopN["TopNResult<br/>Bounded heap of size 10"]
subgraph "Processing"
Scan["Scan rows"]
Compare["Compare with heap top"]
Insert["Insert row<br/>Evict current worst"]
end
Query --> Detect
Detect --> TopN
TopN --> Scan
Scan --> Compare
Compare -->|"Better than top"| Insert
Insert --> Scan
Compare -->|"Worse than top"| Skip["Skip row"]
Skip --> Scan
Scan --> Results["10 best rows<br/>O(n log k) vs O(n log n)<br/>5-50x faster"]
```
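
Back-of-envelope arithmetic behind the complexity claim:

```sql
-- For n = 1,000,000 rows and k = 10:
--   bounded heap: ~n * log2(k) ≈ 1,000,000 * 3.3  ≈  3.3M comparisons
--   full sort:    ~n * log2(n) ≈ 1,000,000 * 19.9 ≈ 19.9M comparisons
SELECT * FROM users ORDER BY age DESC LIMIT 10;
```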
## Viewing Query Plans

### EXPLAIN

Shows the planned execution strategy:

```sql
EXPLAIN SELECT * FROM users WHERE id = 1;
```
Output:

```
plan
----
SELECT
  Columns: *
  -> PK Lookup on users
       id = 1
```
### EXPLAIN ANALYZE

Shows the plan with actual runtime statistics:

```sql
EXPLAIN ANALYZE SELECT * FROM users WHERE status = 'active';
```
Output:

```
plan
----
SELECT (actual time=2.5ms, rows=1500)
  Columns: *
  -> Seq Scan on users (actual rows=1500)
       Filter: (status = 'active')
```
## Best Practices

### Keep Statistics Updated

Run ANALYZE after significant data changes:

```sql
-- After bulk insert
INSERT INTO orders SELECT * FROM staging_orders;
ANALYZE orders;
```
### Create Appropriate Indexes

The optimizer can only use indexes that exist:

```sql
-- Create index for common query patterns
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
```
### Use EXPLAIN to Understand Plans

Before optimizing, understand the current behavior:

```sql
-- Check what plan is being used
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 100;
```
### Consider Query Patterns

Design indexes based on how queries filter and join:

```sql
-- If queries often filter by status and date together
CREATE INDEX idx_orders_status_date ON orders(status, order_date);
```
## Cost Constants
The optimizer uses these tunable constants:
| Constant | Description | Default |
|---|---|---|
| cpu_tuple_cost | Cost to process one row | 0.01 |
| cpu_operator_cost | Cost to evaluate one predicate | 0.0025 |
| seq_page_cost | Cost for sequential I/O | 1.0 |
| random_page_cost | Cost for random I/O | 2.0 |
| pk_lookup_cost | Cost for primary key lookup | 0.1 |
These values are tuned for in-memory operation and may differ from disk-based databases.
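
One consequence of these defaults, sketched with hypothetical numbers: since random_page_cost (2.0) is twice seq_page_cost (1.0), an index access that would touch more than roughly half of a table's pages at random is costed higher than a single sequential pass, so the optimizer still chooses a scan for low-selectivity predicates:

```sql
-- Matches most rows: one sequential pass beats driving nearly the
-- whole table through random index I/O.
SELECT * FROM orders WHERE status <> 'archived';
```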