# Query Optimizer
Oxibase includes a sophisticated cost-based query optimizer that analyzes queries and chooses efficient execution plans. The optimizer uses table statistics, index information, and cost models to make intelligent decisions.
## Overview
The query optimizer performs several tasks:
- Cost estimation: Calculates the expected cost of different execution strategies
- Access method selection: Chooses between table scans, index lookups, and primary key access
- Join ordering: Determines the optimal order to join multiple tables
- Join algorithm selection: Chooses Hash Join, Merge Join, or Nested Loop Join
- Predicate pushdown: Moves filters close to data sources
- Expression simplification: Optimizes boolean and arithmetic expressions
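
A single query can exercise several of these at once; the annotated example below is illustrative (table and column names are hypothetical):

```sql
SELECT o.id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id  -- join ordering + algorithm selection
WHERE c.region = 'US'                     -- predicate pushdown candidate
  AND o.amount > 100 + 400;               -- expression simplification: folds to amount > 500
```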
## Optimization Phases

```mermaid
graph TD
SQL["SQL Query"]
Parse["Parser<br/>AST Generation"]
Analyze["Query Analysis<br/>• Collect predicates<br/>• Detect aggregations<br/>• Identify subqueries"]
subgraph "Optimization Phases"
Simplify["Expression Simplification<br/>ExpressionSimplifier"]
SubqueryOpt["Subquery Optimization<br/>• Cache non-correlated<br/>• Semi-join conversion<br/>• EXISTS to index probe"]
PredicateOpt["Predicate Optimization<br/>• Pushdown to storage<br/>• Partition for JOINs<br/>• Zone map pruning"]
IndexOpt["Index Selection<br/>• PK lookup detection<br/>• Single/multi-column<br/>• Type-based selection"]
JoinOpt["Join Optimization<br/>• AQE algorithm selection<br/>• Build side selection<br/>• Filter pushdown"]
LimitOpt["LIMIT Optimization<br/>• Storage pushdown<br/>• TOP-N heap<br/>• Early termination"]
end
Physical["Physical Plan<br/>ScannerResult pipeline"]
Execute["Execution<br/>Storage Layer"]
SQL --> Parse
Parse --> Analyze
Analyze --> Simplify
Simplify --> SubqueryOpt
SubqueryOpt --> PredicateOpt
PredicateOpt --> IndexOpt
IndexOpt --> JoinOpt
JoinOpt --> LimitOpt
LimitOpt --> Physical
Physical --> Execute
```
## Cost Model

### Cost Components

The optimizer considers two main cost components:

- I/O Cost: Reading data from storage
  - Sequential page reads (table scans)
  - Random page reads (index access)
- CPU Cost: Processing data
  - Tuple processing
  - Predicate evaluation
  - Join operations
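
As a back-of-envelope illustration (row and page counts are hypothetical), these components combine with the default constants listed under Cost Constants below:

```sql
-- Sequential scan of a 100,000-row table stored on 1,000 pages,
-- evaluating one predicate per row:
--
--   I/O:   1,000 pages        * seq_page_cost (1.0)         = 1,000
--   CPU:   100,000 rows       * cpu_tuple_cost (0.01)       = 1,000
--        + 100,000 predicates * cpu_operator_cost (0.0025)  =   250
--   Estimated total cost                                    = 2,250
SELECT * FROM orders WHERE amount > 1000;
```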
### Access Methods
| Method | Use Case | Cost |
|---|---|---|
| Primary Key Lookup | WHERE pk = value | Lowest (O(1)) |
| Index Scan | WHERE indexed_col = value | Low (O(log n)) |
| Index Range Scan | WHERE indexed_col > value | Medium |
| Sequential Scan | No usable index | Highest (O(n)) |
### Primary Key Lookup Flow

```mermaid
graph LR
Query["WHERE id = 42"]
Detect["try_pk_lookup<br/>Extract PK value"]
Storage["get_visible_version<br/>O(1) HashMap lookup"]
Result["Single Row"]
Query --> Detect
Detect -->|"pk_id = 42"| Storage
Storage --> Result
```
### Single-Column Index Usage Flow

```mermaid
graph TD
Predicate["WHERE Predicate"]
Extract["collect_comparisons<br/>Extract column comparisons"]
subgraph "Index Selection"
CheckEq["Equality?<br/>col = value"]
CheckRange["Range?<br/>col > value"]
CheckIn["IN List?<br/>col IN (...)"]
CheckLike["Prefix LIKE?<br/>col LIKE 'abc%'"]
end
IndexLookup["Index Lookup"]
Results["Filtered row_ids"]
Predicate --> Extract
Extract --> CheckEq
Extract --> CheckRange
Extract --> CheckIn
Extract --> CheckLike
CheckEq -->|"get_row_ids_equal"| IndexLookup
CheckRange -->|"get_row_ids_in_range"| IndexLookup
CheckIn -->|"get_row_ids_in"| IndexLookup
CheckLike -->|"find_range (prefix scan)"| IndexLookup
IndexLookup --> Results
```
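
As a sketch, these predicate shapes map to the lookup entry points named in the diagram (an index on users(name) is assumed):

```sql
SELECT * FROM users WHERE name = 'Ada';            -- get_row_ids_equal
SELECT * FROM users WHERE name > 'M';              -- get_row_ids_in_range
SELECT * FROM users WHERE name IN ('Ada', 'Alan'); -- get_row_ids_in
SELECT * FROM users WHERE name LIKE 'Al%';         -- find_range (prefix scan)
```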
### Example

```sql
-- The optimizer evaluates multiple strategies
SELECT * FROM users WHERE id = 123;
-- Strategy 1: Sequential scan - O(n) rows
-- Strategy 2: PK lookup - O(1)
-- Winner: PK lookup (much lower cost)
```
## Table Statistics

The optimizer uses statistics collected by ANALYZE to make better decisions.

### ANALYZE Command

```sql
-- Collect statistics for a table
ANALYZE users;
-- Statistics are stored in system tables
SELECT * FROM _sys_table_stats WHERE table_name = 'users';
SELECT * FROM _sys_column_stats WHERE table_name = 'users';
```
### Statistics Collected
| Statistic | Description |
|---|---|
| Row count | Total number of rows in the table |
| Distinct count | Number of unique values per column |
| NULL count | Number of NULL values per column |
| Min/Max values | Range of values per column |
| Histograms | Distribution of values for selectivity estimation |
### Selectivity Estimation

Statistics help estimate how selective predicates are:

```sql
-- Without statistics: assume 10% selectivity
-- With statistics: use actual data distribution
SELECT * FROM orders WHERE status = 'pending';
```
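
A hypothetical worked example of the difference:

```sql
-- Suppose ANALYZE recorded 1,000,000 rows and 4 distinct status values.
-- A common default estimate without a histogram is 1/distinct_count,
-- i.e. 25% selectivity (~250,000 rows). If a histogram shows that only
-- 2% of rows are 'pending', the estimate drops to ~20,000 rows, which
-- can flip the plan from a sequential scan to an index scan.
SELECT * FROM orders WHERE status = 'pending';
```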
## Join Optimization

### Join Ordering

For queries joining multiple tables, the optimizer uses dynamic programming to find the optimal join order:

```sql
-- The optimizer considers all possible join orders
SELECT *
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id;
-- Possible orders:
-- 1. (orders JOIN customers) JOIN products
-- 2. (orders JOIN products) JOIN customers
-- 3. (customers JOIN orders) JOIN products
-- etc.
```
### Join Algorithms
The optimizer selects the best join algorithm:
| Algorithm | Best For | Requirements |
|---|---|---|
| Hash Join | Large tables, equality joins | Memory for hash table |
| Merge Join | Pre-sorted inputs | Sorted data |
| Nested Loop | Small tables, any join condition | None |
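
Hedged examples of conditions that tend to favor each algorithm (table names are illustrative):

```sql
-- Equality join between large tables: typically a Hash Join,
-- with the hash table built on the smaller input.
SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id;

-- Inputs already sorted on the join key (e.g. both sides read
-- from ordered indexes): a Merge Join candidate.
SELECT * FROM shipments s JOIN invoices i ON s.order_id = i.order_id;

-- Non-equality join condition: only Nested Loop applies.
SELECT * FROM events e JOIN windows w
  ON e.ts BETWEEN w.start_ts AND w.end_ts;
```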
### Semi-Join Optimization

For EXISTS and IN subqueries, the optimizer may use a semi-join:

```sql
-- Can use semi-join optimization
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
```
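
The equivalent IN form qualifies for the same rewrite:

```sql
SELECT * FROM customers
WHERE id IN (SELECT customer_id FROM orders);
```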
## Adaptive Query Execution
Oxibase implements Adaptive Query Execution (AQE), which can adjust the execution plan at runtime based on actual data characteristics.
### How AQE Works

1. Initial Plan: The optimizer creates an initial plan using statistics
2. Runtime Monitoring: Actual row counts are tracked during execution
3. Plan Adjustment: If actual counts differ significantly from the estimates, the plan may be adjusted
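
A sketch of the loop with illustrative (not measured) numbers:

```sql
-- Planned: the filter on s.country is estimated to match ~10 suppliers,
-- so a Nested Loop join looks cheapest.
-- Observed: the filter actually matches 50,000 rows; AQE can switch the
-- join to a Hash Join before the bulk of the probing work is done.
SELECT * FROM parts p
JOIN suppliers s ON p.supplier_id = s.id
WHERE s.country = 'DE';
```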
### Cardinality Feedback

The optimizer learns from query execution:

```sql
-- First execution: uses estimated cardinality
SELECT * FROM orders WHERE amount > 1000;
-- Subsequent executions: uses feedback from actual row counts
-- This improves future estimates for similar queries
```
### Bloom Filter Propagation

For join-heavy queries, the optimizer can use Bloom filters to reduce data movement:

```sql
-- Bloom filter can pre-filter orders before join
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.region = 'US';
```

The predicate on customers.region builds a Bloom filter over the matching customer keys, which prunes non-matching orders rows before the join.
## Expression Simplification

The optimizer simplifies expressions when possible:

### Constant Folding

```sql
-- Before: SELECT * FROM t WHERE x > 1 + 1
-- After: SELECT * FROM t WHERE x > 2
```
### Boolean Optimization

```sql
-- Before: SELECT * FROM t WHERE true AND x > 5
-- After: SELECT * FROM t WHERE x > 5
-- Before: SELECT * FROM t WHERE false OR x > 5
-- After: SELECT * FROM t WHERE x > 5
```
### Predicate Merging

```sql
-- Before: WHERE x > 5 AND x > 10
-- After: WHERE x > 10
-- Before: WHERE x = 5 AND x = 5
-- After: WHERE x = 5
```
## Filter Pushdown to Storage

```mermaid
graph TD
WHERE["WHERE age > 30<br/>AND status = 'active'"]
subgraph "Compilation"
Parse["Parse to Expression AST"]
Compile["RowFilter::new<br/>Compile to bytecode"]
Program["Arc<Program><br/>Shared bytecode"]
end
subgraph "Storage Layer"
Arena["Arena Read<br/>get_all_visible_rows_filtered"]
Filter["CompiledFilter::matches<br/>Eliminate virtual dispatch"]
Collect["Collect matching rows"]
end
WHERE --> Parse
Parse --> Compile
Compile --> Program
Program --> Arena
Arena --> Filter
Filter --> Collect
Collect --> Results["Filtered Results<br/>3-5x faster"]
```
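
For example, both predicates below would be compiled into a single shared bytecode program and evaluated inside the scan itself, so non-matching rows are never handed up to the executor:

```sql
SELECT * FROM users WHERE age > 30 AND status = 'active';
```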
## LIMIT and TOP-N Optimization

### LIMIT Pushdown

For queries with LIMIT but no ORDER BY, the optimizer enables true early termination by pushing LIMIT directly to the storage layer:

```mermaid
graph TD
Query["SELECT * FROM users<br/>LIMIT 100"]
Check["Has ORDER BY?"]
Ordered["Sorted LIMIT<br/>get_visible_rows_with_limit<br/>• Scan all rows<br/>• Sort by row_id<br/>• Take first 100"]
Unordered["Unordered LIMIT<br/>get_visible_rows_with_limit_unordered<br/>• Early termination<br/>• Stop after 100 rows<br/>• 50x faster"]
Query --> Check
Check -->|"No"| Unordered
Check -->|"Yes"| Ordered
```
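
Concretely (created_at is a hypothetical column):

```sql
-- No ORDER BY: the scan stops as soon as 100 visible rows are found
SELECT * FROM users LIMIT 100;

-- With ORDER BY, all qualifying rows must be considered before the
-- limit applies (see the TOP-N heap optimization below)
SELECT * FROM users ORDER BY created_at LIMIT 100;
```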
### TOP-N Heap Optimization

For ORDER BY ... LIMIT N queries, the optimizer uses a bounded heap instead of full sorting:

```mermaid
graph TD
Query["SELECT * FROM users<br/>ORDER BY age DESC<br/>LIMIT 10"]
Detect["Detect ORDER BY + LIMIT"]
TopN["TopNResult<br/>Bounded heap of size 10"]
subgraph "Processing"
Scan["Scan rows"]
Compare["Compare with heap top"]
Insert["Insert row<br/>Evict current worst"]
end
Query --> Detect
Detect --> TopN
TopN --> Scan
Scan --> Compare
Compare -->|"Better than top"| Insert
Insert --> Scan
Compare -->|"Worse than top"| Skip["Skip row"]
Skip --> Scan
Scan --> Results["10 best rows<br/>O(n log k) vs O(n log n)<br/>5-50x faster"]
```
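
Back-of-envelope arithmetic behind the complexity claim:

```sql
-- For n = 1,000,000 rows and k = 10:
--   bounded heap: ~n * log2(k) ≈ 1,000,000 * 3.3  ≈  3.3M comparisons
--   full sort:    ~n * log2(n) ≈ 1,000,000 * 19.9 ≈ 19.9M comparisons
SELECT * FROM users ORDER BY age DESC LIMIT 10;
```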
## Viewing Query Plans

### EXPLAIN

Shows the planned execution strategy:

```sql
EXPLAIN SELECT * FROM users WHERE id = 1;
```
Output:

```
plan
----
SELECT
  Columns: *
  -> PK Lookup on users
       id = 1
```
### EXPLAIN ANALYZE

Shows the plan with actual runtime statistics:

```sql
EXPLAIN ANALYZE SELECT * FROM users WHERE status = 'active';
```
Output:

```
plan
----
SELECT (actual time=2.5ms, rows=1500)
  Columns: *
  -> Seq Scan on users (actual rows=1500)
       Filter: (status = 'active')
```
## Best Practices

### Keep Statistics Updated

Run ANALYZE after significant data changes:

```sql
-- After bulk insert
INSERT INTO orders SELECT * FROM staging_orders;
ANALYZE orders;
```
### Create Appropriate Indexes

The optimizer can only use indexes that exist:

```sql
-- Create index for common query patterns
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
```
### Use EXPLAIN to Understand Plans

Before optimizing, understand the current behavior:

```sql
-- Check what plan is being used
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 100;
```
### Consider Query Patterns

Design indexes based on how queries filter and join:

```sql
-- If queries often filter by status and date together
CREATE INDEX idx_orders_status_date ON orders(status, order_date);
```
## Cost Constants
The optimizer uses these tunable constants:
| Constant | Description | Default |
|---|---|---|
| cpu_tuple_cost | Cost to process one row | 0.01 |
| cpu_operator_cost | Cost to evaluate one predicate | 0.0025 |
| seq_page_cost | Cost for sequential I/O | 1.0 |
| random_page_cost | Cost for random I/O | 2.0 |
| pk_lookup_cost | Cost for primary key lookup | 0.1 |
These values are tuned for in-memory operation and may differ from disk-based databases.
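
One consequence of these defaults, sketched with hypothetical numbers: since random_page_cost (2.0) is twice seq_page_cost (1.0), an index access that would touch more than roughly half of a table's pages at random is costed higher than a single sequential pass, so the optimizer still chooses a scan for low-selectivity predicates:

```sql
-- Matches most rows: one sequential pass beats driving nearly the
-- whole table through random index I/O.
SELECT * FROM orders WHERE status <> 'archived';
```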