Expression Pushdown
This document explains Oxibase’s expression pushdown optimization, which moves filtering operations closer to the data to improve query performance.
What is Expression Pushdown?
Expression pushdown is a query optimization technique that “pushes” filter expressions (typically from WHERE clauses) down to the lowest possible level in the query execution stack. This minimizes the amount of data that needs to be processed at higher levels of the execution engine.
Benefits of Expression Pushdown
Performance Improvements
- Reduced Data Processing - Filter data early to avoid unnecessary processing
- Reduced Memory Usage - Process only relevant data
- Improved Cache Efficiency - Better CPU cache utilization with smaller data sets
- Optimized I/O - Less data loaded from storage
- Improved Parallelism - Filter operations can be parallelized at lower levels
Measurable Impacts
Depending on the query, expression pushdown can provide:
- 10-100x reduction in processed data volume
- 2-10x improvement in query execution time
- Significant memory usage reduction
How Expression Pushdown Works in Oxibase
Pushdown Levels
Oxibase implements expression pushdown at multiple levels:
- Storage Level Pushdown - Expressions pushed directly to storage
- Index Level Pushdown - Expressions leveraging indexes (B-tree, Hash, Bitmap)
- Scan Level Pushdown - Expressions applied during table scanning
- Join Level Pushdown - Expressions pushed before or into joins
Note: Subquery pushdown is not yet implemented.
The Pushdown Process
When processing a query with filtering conditions:
- The SQL parser creates an abstract syntax tree (AST)
- The query analyzer examines filter expressions
- Pushdown eligibility is determined for each expression
- Expressions are transformed into a format suitable for lower levels
- Execution plans are modified to incorporate pushed-down expressions
- The storage engine applies filters directly when possible
Expression Types
Not all expressions can be pushed down. Oxibase categorizes expressions by pushdown eligibility:
Fully Pushable Expressions
These expressions can be pushed all the way to the storage layer:
- Simple Comparisons - Equality (=), inequality (!=, <>), greater/less than (>, <, >=, <=)
- Range Conditions - BETWEEN, IN
- Null Checks - IS NULL, IS NOT NULL
- String Patterns - LIKE (with limitations)
- Logical Operations - AND, OR, NOT
-- All filters can be pushed down to storage
SELECT * FROM orders WHERE
status = 'shipped' AND
order_date BETWEEN '2022-01-01' AND '2022-12-31' AND
customer_id IN (1, 2, 3);
Partially Pushable Expressions
These expressions can be partially pushed down:
- Simple Functions - ABS(), LOWER(), UPPER()
- Date/Time Functions - DATE_TRUNC(), TIME_TRUNC()
- Type Casts - CAST()
- Simple Arithmetic - +, -, *, /
-- The date_trunc can be pushed down, making this more efficient
SELECT * FROM orders WHERE
DATE_TRUNC('month', order_date) = DATE_TRUNC('month', NOW());
Non-Pushable Expressions
These typically cannot be pushed down:
- Complex Functions - User-defined functions, complex built-in functions
- Aggregates - SUM(), COUNT(), AVG() (though parts of the expression may be pushed)
- Window Functions - ROW_NUMBER(), RANK()
- Subquery References - Correlated subqueries
-- The aggregate function prevents full pushdown
SELECT * FROM orders WHERE
total > (SELECT AVG(total) FROM orders);
Implementation Details
Expression Translation
Expressions are translated from SQL syntax to storage-level predicates:
- Parser - Parses SQL into an abstract syntax tree
- Analyzer - Identifies pushable expressions
- Translator - Converts SQL expressions to storage predicates
- Optimizer - Determines optimal pushdown strategy
- Executor - Applies the pushdown during execution
Expression Evaluation System
Oxibase uses a compile-once, execute-many bytecode VM for expression evaluation:
graph LR
subgraph "Compilation Phase"
AST["Expression AST<br/>(parser/ast.rs)"]
Compiler["ExprCompiler<br/>(executor/expression/compiler.rs)"]
Program["Program<br/>(bytecode instructions)"]
end
subgraph "Execution Phase"
VM["ExprVM<br/>(executor/expression/vm.rs)"]
Stack["Value Stack"]
ExecuteCtx["ExecuteContext<br/>(row data, params)"]
end
subgraph "High-Level APIs"
ExprEval["ExpressionEval<br/>(owns Program + VM)"]
RowFilter["RowFilter<br/>(Arc Program + thread_local VM)"]
MultiExpr["MultiExpressionEval<br/>(Vec Program + shared VM)"]
end
subgraph "Function System"
FuncRegistry["FunctionRegistry<br/>(functions/registry.rs)"]
ScalarFuncs["ScalarFunction<br/>(101+ functions)"]
AggFuncs["AggregateFunction<br/>(COUNT, SUM, AVG, etc.)"]
WindowFuncs["WindowFunction<br/>(ROW_NUMBER, RANK, etc.)"]
end
AST -->|"compile"| Compiler
Compiler -->|"generates"| Program
Program -->|"Arc clone"| ExprEval
Program -->|"Arc clone"| RowFilter
Program -->|"clone per expr"| MultiExpr
ExprEval -->|"owns"| VM
RowFilter -->|"thread_local"| VM
MultiExpr -->|"owns"| VM
VM -->|"executes"| Program
ExecuteCtx -->|"provides"| VM
VM -->|"manipulates"| Stack
VM -->|"FunctionCall instruction"| FuncRegistry
FuncRegistry --> ScalarFuncs
FuncRegistry --> AggFuncs
FuncRegistry --> WindowFuncs
Key Design Decisions:
- Zero Recursion: Linear instruction sequences eliminate stack overflow risks
- Immutable Programs: Bytecode is
Arc-shared across threads for parallel execution - Thread-Safe Filtering:
RowFilteruses thread-local VMs for parallel table scans - Context Separation: Compilation context (schema) vs execution context (row data) enables pre-compilation
Instruction Set:
Load(index)- Load column value onto stackLoadConst(Value)- Load constant valueBinOp(Operator)- Execute binary operation (Add, Sub, Mul, Div, etc.)UnaryOp(Operator)- Execute unary operation (Neg, Not)FunctionCall(id, argc)- Call function from registryJumpIf(offset)- Conditional jump for CASE expressionsReturn- End execution, return top of stack
Specialized Expression Types
Oxibase implements specialized expression types for efficient pushdown:
- SimpleExpression - Basic comparison operations
- BetweenExpression - Range checks
- InListExpression - Set membership tests
- LogicalExpression - Combining expressions with AND/OR
- NullCheckExpression - NULL checks
- RangeExpression - Optimized range queries
- NotExpression - Logical negation
These expression types are implemented in src/storage/expression/.
Storage-Level Implementation
At the storage level, Oxibase implements optimized filtering:
- Index-Based Filtering - Filter operations using B-tree, Hash, and Bitmap indexes
- Parallel Evaluation - Multi-threaded predicate evaluation using Rayon
- Bitmap Results - Bitmap representation of matching positions
- Efficient Traversal - Row-based filtering with MVCC visibility checks
Pushdown Strategies
Filter Pushdown Levels
Oxibase implements pushdown at multiple architectural levels:
- Storage Level Pushdown - Expressions pushed directly to storage with MVCC visibility checks
- Index Level Pushdown - B-tree, Hash, and Bitmap indexes filter data during index scans
- Arena Level Pushdown - Zero-copy filtering during arena-based scans
- Join Level Pushdown - Filters applied before expensive join operations
Filter Pushdown
Basic filter pushdown moves WHERE conditions down:
-- Before pushdown (conceptual)
Table Scan (orders) → Filter (status = 'shipped') → Project
-- After pushdown (conceptual)
Table Scan with Filter (orders, status = 'shipped') → Project
Join Pushdown
Filters are pushed before or into joins:
-- Before pushdown (conceptual)
Table Scan (orders) → Join with customers → Filter (status = 'shipped')
-- After pushdown (conceptual)
Table Scan with Filter (orders, status = 'shipped') → Join with customers
Index Utilization
Pushed-down expressions leverage indexes:
-- When an index exists on status, this becomes:
Index Scan (orders_status_idx, 'shipped') → Project
Predicate Simplification
Expressions are simplified when possible:
-- Before simplification
WHERE age >= 18 AND age <= 65 AND age != 30
-- After simplification
WHERE age BETWEEN 18 AND 65 AND age != 30
LIMIT Pushdown
For queries with LIMIT clauses, pushdown can significantly reduce processing:
-- Without LIMIT pushdown: scan all rows, then limit
SELECT * FROM products WHERE category = 'electronics' LIMIT 10;
-- With LIMIT pushdown: scan only until 10 matching rows found
-- Uses index to find first 10 matches efficiently
TOP-N Pushdown
For ORDER BY + LIMIT queries, pushdown optimizes sorting:
-- Without TOP-N pushdown: scan all rows, sort, then limit
SELECT * FROM orders ORDER BY total DESC LIMIT 5;
-- With TOP-N pushdown: use heap to track top 5, scan efficiently
-- 50x+ performance improvement for large datasets
Performance Optimizations
Zero-Copy Scanning
The RowArena provides contiguous memory storage for efficient pushdown:
Performance characteristics:
- 50x+ speedup for full table scans vs. per-row cloning
- Pre-acquire locks once per scan (O(1) instead of O(N))
- Direct slice access via bounds-checked reads
- Cache locality from contiguous layout
Parallel Predicate Evaluation
Pushdown expressions are evaluated in parallel using Rayon:
Batch Operations:
| Operation | Traditional (Per-Row) | Batch API | Improvement |
|---|---|---|---|
| Filter 1000 rows | 1000 evaluations | 1 batch evaluation | ~10x reduction |
| Index scans | Sequential evaluation | Parallel chunks | CPU-bound speedup |
| Complex predicates | N evaluations | Parallel workers | Multi-core scaling |
Thread-Safe Row Filtering
RowFilter uses thread-local VMs for parallel table scans:
// Thread-local VM ensures no contention
let filter = RowFilter::new(predicate_program);
let matching_rows: Vec<Row> = rows.par_iter()
.filter(|row| filter.eval(row))
.collect();
Predicate Pushdown in Practice
Example: Complex WHERE clause pushdown:
SELECT * FROM orders
WHERE status IN ('shipped', 'delivered')
AND total > 100
AND customer_id BETWEEN 1000 AND 2000
AND created_at >= '2023-01-01';
Pushdown execution:
- Index intersection: Status IN uses Hash/Bitmap index, customer_id BETWEEN uses B-tree
- Predicate combination: AND conditions combined at storage level
- Parallel evaluation: Multiple threads evaluate the complex predicate
- Early termination: Results streamed as soon as available
Performance Considerations
When Pushdown Helps Most
Expression pushdown provides the greatest benefits when:
- High Selectivity - Filters eliminate a large percentage of rows
- Early Filtering - Filters can be applied before expensive operations
- Index Availability - Filters can use available indexes
- Complex Queries - Queries with joins, subqueries, or aggregations
Potential Limitations
Some scenarios may limit pushdown benefits:
- Low Selectivity Filters - If most rows match, pushdown may not help much
- Complex Expressions - Not all expressions can be pushed down
- Function-Based Filters - Functions on columns may prevent index usage
Query Examples
Simple Pushdown
-- Highly efficient: Pushes filter to storage and uses index if available
SELECT * FROM customers WHERE country = 'US';
Multiple Filter Pushdown
-- All conditions are pushed down and combined at the storage level
SELECT * FROM orders
WHERE status = 'shipped'
AND order_date > '2022-01-01'
AND total > 100;
Join Pushdown
-- Filters are pushed before the join
SELECT c.name, o.order_date
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE c.country = 'US' AND o.status = 'shipped';
Function Pushdown
-- LOWER function can be pushed down
SELECT * FROM products
WHERE LOWER(name) LIKE '%organic%';
Implementation Details
Oxibase’s expression pushdown is implemented in several components:
- src/executor/query.rs - High-level pushdown decisions
- src/executor/planner.rs - Query planning with pushdown
- src/storage/expression/ - Specialized expression types
- src/storage/mvcc/table.rs - Storage-level filtering
Expression Compilation Architecture
Expressions are compiled once and executed many times:
graph LR
subgraph "SQL Parser"
SQL["SELECT * FROM t WHERE x > 5"]
AST["Abstract Syntax Tree"]
end
subgraph "Expression Compiler"
AST --> Compiler["ExprCompiler"]
Compiler --> Program["Bytecode Program<br/>(Arc-shared)"]
end
subgraph "Execution Contexts"
Single["ExpressionEval<br/>(single evaluation)"]
Filter["RowFilter<br/>(thread-local VM)"]
Multi["MultiExpressionEval<br/>(batch evaluation)"]
end
subgraph "VM Execution"
VM["ExprVM<br/>(stack-based)"]
Stack["Value Stack"]
Context["ExecuteContext<br/>(row + params)"]
end
Program --> Single
Program --> Filter
Program --> Multi
Single --> VM
Filter --> VM
Multi --> VM
VM --> Stack
Context --> VM
Pushdown Decision Logic
The optimizer determines pushdown eligibility:
fn can_pushdown(expr: &Expression) -> PushdownLevel {
match expr {
// Storage level - direct to MVCC
Expression::BinaryOp { op: Eq, .. } => PushdownLevel::Storage,
Expression::Between { .. } => PushdownLevel::Storage,
// Index level - leverages indexes
Expression::InList { .. } if has_index => PushdownLevel::Index,
// Scan level - during table iteration
Expression::FunctionCall { name: "lower", .. } => PushdownLevel::Scan,
// Cannot push down
Expression::Subquery { .. } => PushdownLevel::None,
_ => PushdownLevel::None,
}
}
Filter Pushdown Examples
Primary Key Lookup Pushdown:
-- Direct index lookup, O(1)
SELECT * FROM users WHERE id = 12345;
Execution: VersionStore::get_visible_version(12345, txn_id)
Index Scan Pushdown:
-- B-tree range scan with pushdown
SELECT * FROM orders WHERE total > 1000;
Execution: BTreeIndex::scan_range(1000.., txn_id)
Bitmap Index Pushdown:
-- Bitmap operations for low-cardinality filters
SELECT * FROM products WHERE category IN ('electronics', 'books');
Execution: BitmapIndex::bitmap_or(['electronics', 'books'])
Monitoring Pushdown Effectiveness
To verify pushdown is working:
-- View query execution plan
EXPLAIN SELECT * FROM orders WHERE status = 'shipped' AND total > 100;
-- Expected output shows pushdown usage
-- "Filter Pushdown: status = 'shipped' AND total > 100"
-- "Index Utilization: orders_status_idx (Hash), orders_total_idx (B-tree)"
Pushdown Limitations
Current Limitations:
- Subquery pushdown - Not yet implemented
- Complex aggregations - Limited pushdown support
- Window functions - No pushdown currently
- User-defined functions - Cannot be pushed down
Workarounds:
- Materialized views - Pre-compute complex expressions
- Query restructuring - Rewrite queries to enable pushdown
- Index design - Create indexes on computed columns
Best Practices
To maximize the benefits of expression pushdown:
- Use direct column references - Avoid functions on indexed columns in WHERE clauses
- Create appropriate indexes - Indexes enable better pushdown optimizations
- Use simple predicates - Simple expressions are more likely to be pushed down
- Monitor query performance - Use execution timing to verify optimization effectiveness
- Combine multiple conditions - AND conditions can be pushed down effectively
Advanced Techniques
Custom Expressions
Oxibase allows for specialized expression types:
-- Range expressions are highly optimized
SELECT * FROM time_series
WHERE timestamp BETWEEN '2022-01-01' AND '2022-01-31';
Combined Index and Expression Pushdown
When expressions match available indexes:
-- With a composite index on (status, order_date), this is very efficient
SELECT * FROM orders
WHERE status = 'shipped' AND order_date > '2022-01-01';
Function Index Pushdown
When functions are used in filtering:
-- If an index exists on LOWER(email), this pushdown is efficient
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';