Kinesis DB: Technical Documentation
This document provides a deep technical overview of the Kinesis database layer, focusing on its architecture, core modules, transaction management, persistence, and extensibility. It is intended for developers working on or extending the database internals.
Architecture Overview
The Kinesis database layer is designed as a modular, transactional, and persistent storage engine supporting multiple backends:
- InMemory: Purely in-memory, non-persistent.
- OnDisk: Fully persistent, disk-based.
- Hybrid: Combines in-memory caching with disk persistence for performance.
The main entry point is the `DBEngine`, which orchestrates all database operations, transaction management, and persistence.
Core Modules
1. `database/engine.rs` (`DBEngine`)
- Central orchestrator for all database operations.
- Manages:
  - In-memory state (`Database`)
  - Disk persistence (via `PageStore`, `BufferPool`, `BlobStore`)
  - Transaction lifecycle (`TransactionManager`)
  - Write-Ahead Logging (`WriteAheadLog`)
  - Table schemas and schema migrations
  - Optional secondary indexes (`IndexManager`)
Key Responsibilities
- Transaction Management: Begin, commit, rollback, and validate transactions.
- Record Operations: Insert, update, delete, and search records with schema validation.
- Persistence: Save/load state to/from disk, manage page allocation, and handle large data via blobs.
- Crash Recovery: Replay WAL and restore consistent state after failures.
- Schema Evolution: Support for schema updates with migration and validation.
2. `database/database.rs` (`Database`)
- Represents the in-memory state of all tables.
- Each table is a `Table` with its own schema, records, and next record ID.
3. `database/table.rs` (`Table`)
- Stores records as a map of record ID to `Record`.
- Enforces schema constraints and manages record versioning.
4. `database/record.rs` (`Record`)
- Represents a single row in a table.
- Contains:
  - `id`: Unique identifier
  - `values`: Map of field name to `ValueType`
  - `version` and `timestamp` for MVCC and concurrency control
5. `database/schema.rs` (`TableSchema`, `FieldConstraint`)
- Defines table structure, field types, constraints (required, unique, min/max, pattern), and default values.
- Supports schema validation and migration logic.
6. `database/value_type.rs` (`ValueType`, `StringValue`)
- Strongly-typed representation of all supported field types (Int, Float, String, Bool, etc.).
- `StringValue` supports both inline and blob-referenced storage for large strings.
7. `storage/page_store.rs`, `storage/buffer_pool.rs`, `storage/blob_store.rs`
- PageStore: Manages allocation, reading, and writing of fixed-size pages on disk.
- BufferPool: In-memory cache for pages, with LRU eviction.
- BlobStore: Efficient storage for large binary/string data, with reference counting and garbage collection.
8. `storage/wal.rs` (`WriteAheadLog`)
- Ensures durability and crash recovery by logging all transactional changes before commit.
9. `transaction/manager.rs`, `transaction/transaction.rs`
- TransactionManager: Tracks active transactions, locks, timeouts, and deadlock detection.
- Transaction: Encapsulates all pending changes, isolation level, and MVCC snapshot.
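To make the data model described in modules 4 and 6 concrete, here is a minimal sketch of the shapes of `Record`, `ValueType`, and `StringValue` as documented above. Field names and variants are illustrative assumptions; the authoritative definitions live in `database/record.rs` and `database/value_type.rs`.

```rust
use std::collections::HashMap;

// Illustrative shapes only; consult database/record.rs and database/value_type.rs
// for the real definitions.
pub enum StringValue {
    Inline(String),  // small strings stored directly in the record
    BlobRef(String), // large strings stored in the BlobStore, referenced by key
}

pub enum ValueType {
    Int(i64),
    Float(f64),
    String(StringValue),
    Bool(bool),
    // ...other supported field types
}

pub struct Record {
    pub id: u64,                            // unique identifier within the table
    pub values: HashMap<String, ValueType>, // field name -> typed value
    pub version: u64,                       // bumped on update, used for MVCC validation
    pub timestamp: u64,                     // last-modified time for concurrency control
}
```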
Key Functions and Internal Workings
Loading the Database (`DBEngine::load_from_disk`)
- Purpose: Loads the database state from disk into memory.
- How it works:
- Reads the Table of Contents (TOC) from the first page using the buffer pool and page store.
- Deserializes table locations, schemas, and next record IDs.
- For each table, loads all records from their page chains (handling both old and new page formats, including overflow pages).
- Reconstructs in-memory `Table` objects and inserts them into the database.
- Skips loading for `InMemory` databases.
Saving the Database (`DBEngine::save_to_disk`)
- Purpose: Persists the current in-memory state to disk.
- How it works:
- Iterates over all tables, serializes their records in batches.
- Writes each batch to disk using the page store, allocating new pages as needed.
- Updates the TOC with new page locations, schemas, and next IDs.
- Flushes all pages via the buffer pool and syncs the page store to ensure durability.
- Returns a checksum of the serialized state for verification.
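The checksum returned by `save_to_disk` is conceptually just a hash over the serialized state. The sketch below uses std's `DefaultHasher` purely for illustration; the engine's actual hashing algorithm is not specified here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Illustration only: the engine may use a different hash for its checksums.
fn state_checksum(serialized_state: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(serialized_state);
    hasher.finish()
}
```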
Committing Transactions (`DBEngine::commit` and `commit_internal`)
- Purpose: Atomically applies all changes in a transaction, ensuring ACID properties.
- How it works:
- Validates the transaction for isolation level conflicts (e.g., repeatable read, serializable).
- Acquires necessary locks for all records being written.
- Applies all staged changes (inserts, updates, deletes, schema changes) to the in-memory database.
- Logs the transaction to the WAL (unless it's a blob operation or in-memory DB).
- Persists changes to disk (if not in-memory).
- Releases locks and ends the transaction.
- On failure, rolls back all changes using the transaction's snapshot.
Rolling Back Transactions (`DBEngine::rollback`)
- Purpose: Reverts all changes made by a transaction if commit fails or is aborted.
- How it works:
- Restores original state for all modified records using the transaction's snapshot.
- Releases any acquired locks.
- Cleans up any staged blob references.
Transaction Lifecycle
- Begin: `begin_transaction` creates a new transaction, optionally with a snapshot for MVCC.
- Read/Write: All record operations are staged in the transaction's pending sets.
- Commit: See above.
- Rollback: See above.
Table and Record Operations
- Creating Tables: `create_table` or `create_table_with_schema` adds a new table definition to the transaction's pending creates.
- Inserting Records: `insert_record` validates the record against the schema, handles large strings (blobs), and stages the insert.
- Updating Records: `update_record` validates updates, manages blob references, and stages the update.
- Deleting Records: `delete_record` stages the deletion and tracks any blob references for cleanup.
Blob Storage
- Large strings are stored in the `BlobStore` if they exceed a threshold.
- Records store a reference to the blob key.
- On update or delete, old blob references are cleaned up.
- Blob store is synced to disk after changes.
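The inline-versus-blob decision can be pictured as the following sketch. The threshold constant, key scheme, and `BlobStoreLike` trait here are assumptions for illustration, not the engine's actual API.

```rust
// Illustrative sketch; the real logic lives in the engine's insert/update paths.
const BLOB_THRESHOLD: usize = 4096; // assumed value, not the engine's actual threshold

enum StoredString {
    Inline(String),
    BlobKey(String),
}

trait BlobStoreLike {
    fn put(&mut self, bytes: Vec<u8>) -> String; // returns the blob key
}

fn store_string(s: String, blobs: &mut impl BlobStoreLike) -> StoredString {
    if s.len() <= BLOB_THRESHOLD {
        StoredString::Inline(s)
    } else {
        StoredString::BlobKey(blobs.put(s.into_bytes()))
    }
}
```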
Write-Ahead Log (WAL)
- Purpose: Ensures durability and crash recovery.
- How it works:
- All transactional changes are serialized and appended to the WAL before being applied.
- On startup, the WAL is replayed to recover any committed but not yet persisted transactions.
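The core durability rule ("log it before you apply it") can be illustrated with a minimal append routine. The length-prefixed entry format and file handling here are assumptions, not the engine's actual WAL layout.

```rust
use std::fs::OpenOptions;
use std::io::Write;

// Minimal sketch of write-ahead ordering; real entries are serialized transactions.
fn append_wal_entry(wal_path: &str, entry: &[u8]) -> std::io::Result<()> {
    let mut wal = OpenOptions::new().create(true).append(true).open(wal_path)?;
    wal.write_all(&(entry.len() as u32).to_le_bytes())?; // length prefix for replay
    wal.write_all(entry)?;
    wal.sync_all()?; // durable on disk *before* the change is applied anywhere else
    Ok(())
}
```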
Crash Recovery (`DBEngine::recover_from_crash`)
- Purpose: Restores a consistent state after a crash.
- How it works:
- Loads the database from disk.
- Loads and replays all valid transactions from the WAL.
- Applies each transaction and persists the state after each.
Schema Management
- Schema Validation: All record operations are validated against the current table schema.
- Schema Migration: `update_table_schema` stages a schema update, which is validated for compatibility and applied atomically.
Transaction Types and Isolation Levels
Kinesis supports four transaction isolation levels, each with distinct semantics and internal handling:
1. ReadUncommitted
- Behavior: Transactions can see uncommitted changes from other transactions ("dirty reads").
- Implementation: No snapshot is taken. Reads always reflect the latest in-memory state, regardless of commit status.
- Use Case: Maximum performance, minimal isolation. Rarely recommended except for analytics or non-critical reads.
2. ReadCommitted
- Behavior: Transactions only see data that has been committed by other transactions.
- Implementation:
- No snapshot is taken.
- On each read, the engine checks the committed state of the record.
- During commit, the engine validates that records read have not changed since they were read (prevents "non-repeatable reads").
- Use Case: Standard for many OLTP systems, balances consistency and concurrency.
3. RepeatableRead
- Behavior: All reads within a transaction see a consistent snapshot of the database as of the transaction's start.
- Implementation:
- A full snapshot of the database is taken at transaction start.
- Reads are served from the snapshot.
- On commit, the engine validates that no records in the read set have changed since the snapshot.
- Prevents non-repeatable reads and phantom reads (to the extent possible without full serializability).
- Use Case: Applications requiring strong consistency for reads within a transaction.
4. Serializable
- Behavior: Transactions are fully isolated as if executed serially.
- Implementation:
- Like RepeatableRead, but with additional validation:
  - Checks for write-write conflicts.
  - Ensures no new records (phantoms) have appeared in the range of interest.
  - Validates that the snapshot and current state are equivalent for all read and written data.
- May abort transactions if conflicts are detected.
- Use Case: Highest level of isolation, required for financial or critical systems.
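The four levels above boil down to two questions per level: is a snapshot taken at begin, and what does commit-time validation check? The sketch below summarizes that mapping; the engine's actual checks live in `validate_transaction` and are more detailed.

```rust
// Summary sketch of the documented isolation behavior; see validate_transaction
// for the engine's actual checks.
#[derive(Clone, Copy)]
enum IsolationLevel { ReadUncommitted, ReadCommitted, RepeatableRead, Serializable }

fn takes_snapshot(level: IsolationLevel) -> bool {
    matches!(level, IsolationLevel::RepeatableRead | IsolationLevel::Serializable)
}

fn commit_validation(level: IsolationLevel) -> &'static str {
    match level {
        IsolationLevel::ReadUncommitted => "none",
        IsolationLevel::ReadCommitted => "read set unchanged since each read",
        IsolationLevel::RepeatableRead => "read set unchanged since the snapshot",
        IsolationLevel::Serializable => "read set, write-write conflicts, and phantom checks",
    }
}
```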
Transaction Internals
Transaction Structure
Each transaction (`Transaction`) tracks:
- ID: Unique identifier.
- Isolation Level: Determines snapshot and validation logic.
- Snapshot: Optional, for higher isolation levels.
- Read Set: Records read (table, id, version).
- Write Set: Records written (table, id).
- Pending Operations: Inserts, updates, deletes, schema changes.
- Metadata: For tracking blob references and other auxiliary data.
Transaction Flow
- Begin: `DBEngine::begin_transaction` creates a new transaction, possibly with a snapshot.
- Read: Reads are tracked in the read set. For RepeatableRead/Serializable, reads come from the snapshot.
- Write: Writes are staged in the write set and pending operations.
- Validation: On commit, the engine validates the transaction according to its isolation level.
- Commit: If validation passes, changes are applied atomically.
- Rollback: If validation fails or an error occurs, all changes are reverted.
Locking and Deadlock Detection
- Locks: Acquired at the record level for writes. Managed by `TransactionManager`.
- Deadlock Detection: Periodically checks for cycles in the lock graph. If detected, aborts one or more transactions.
- Timeouts: Transactions can be configured to expire after a set duration.
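Cycle detection over the lock graph, as described above, is essentially a depth-first search on a wait-for graph. The sketch below is generic and self-contained; the real `TransactionManager` tracks more state (timeouts, retries) around the same idea.

```rust
use std::collections::{HashMap, HashSet};

// Wait-for graph: tx id -> tx ids it is waiting on. A cycle means a deadlock.
fn has_deadlock(waits_for: &HashMap<u64, Vec<u64>>) -> bool {
    fn visit(node: u64, g: &HashMap<u64, Vec<u64>>,
             visiting: &mut HashSet<u64>, done: &mut HashSet<u64>) -> bool {
        if done.contains(&node) { return false; }
        if !visiting.insert(node) { return true; } // back edge => cycle
        for &next in g.get(&node).map(|v| v.as_slice()).unwrap_or(&[]) {
            if visit(next, g, visiting, done) { return true; }
        }
        visiting.remove(&node);
        done.insert(node);
        false
    }
    let (mut visiting, mut done) = (HashSet::new(), HashSet::new());
    waits_for.keys().any(|&n| visit(n, waits_for, &mut visiting, &mut done))
}
```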
Detailed Function Descriptions
DBEngine::begin_transaction
- Purpose: Starts a new transaction.
- Details: Assigns a unique ID, sets the isolation level, and optionally takes a snapshot of the database for higher isolation.
DBEngine::commit
- Purpose: Commits a transaction, applying all staged changes.
- Details:
- Validates the transaction (see above).
- Acquires all necessary locks.
- Applies inserts, updates, deletes, and schema changes.
- Logs the transaction to the WAL (unless skipped for blob ops).
- Persists changes to disk.
- Releases locks and ends the transaction.
DBEngine::rollback
- Purpose: Rolls back a transaction, reverting all staged changes.
- Details: Uses the transaction's snapshot and pending operations to restore the previous state.
DBEngine::insert_record
- Purpose: Stages a new record for insertion.
- Details: Validates against schema, handles large strings (blobs), and adds to the transaction's pending inserts.
DBEngine::update_record
- Purpose: Stages updates to an existing record.
- Details: Validates updates, manages blob references, and adds to pending updates.
DBEngine::delete_record
- Purpose: Stages a record for deletion.
- Details: Tracks blob references for cleanup and adds to pending deletes.
DBEngine::search_records
- Purpose: Searches for records matching a query string.
- Details: Supports case-insensitive search, uses the appropriate snapshot or committed state based on isolation level.
DBEngine::create_table_with_schema
- Purpose: Stages creation of a new table with a specified schema.
- Details: Adds to the transaction's pending table creates.
DBEngine::update_table_schema
- Purpose: Stages a schema update for a table.
- Details: Validates that the migration is safe and adds to pending schema updates.
DBEngine::load_from_disk
- Purpose: Loads the database state from disk.
- Details: Reads the TOC, loads all tables and records, reconstructs in-memory state.
DBEngine::save_to_disk
- Purpose: Persists the current state to disk.
- Details: Serializes all tables and records, writes to page store, updates TOC, flushes and syncs.
DBEngine::recover_from_crash
- Purpose: Restores the database to a consistent state after a crash.
- Details: Loads from disk, replays WAL, applies all valid transactions, persists state.
DBEngine::compact_database
- Purpose: Compacts the database file, removing unused pages and reorganizing data.
- Details: Writes all live data to a new file, replaces the old file, and reopens the page store.
REPL Shell (`REPL`)
- Purpose: Interactive shell for database commands.
- Supported Commands:
  - `CREATE_TABLE <name> ...`
  - `INSERT INTO <table> ...`
  - `UPDATE <table> ...`
  - `DELETE FROM <table> ...`
  - `SEARCH_RECORDS FROM <table> MATCH <query>`
  - `GET_RECORDS FROM <table>`
  - `DROP_TABLE <name>`
  - `ALTER_TABLE <name> ...`
- Features:
- Supports output formats: standard, table, JSON, etc.
- Handles errors and transaction boundaries per command.
- Useful for development, testing, and demos.
Error Handling & Recovery
- All disk and transactional errors are surfaced as `Result<T, String>`.
- On commit failure, a rollback is automatically attempted.
- On startup, the engine attempts recovery using the WAL and disk state.
Extensibility
- Indexes: Optional secondary indexes can be enabled via environment variable.
- Custom Field Types: Extend `ValueType` and update schema validation logic.
- Storage Backends: Implement new `DatabaseType` variants and adapt `DBEngine` initialization.
Testing
- Extensive test suite under `src/database/tests/` covers:
  - Basic operations
  - Transactional semantics
  - Persistence and recovery
  - Schema validation and migration
  - Large data and overflow page handling
  - Concurrency and rollback
Developer Notes
- CommitGuard: RAII pattern to ensure rollback on drop if commit fails.
- Isolation Levels: Each level has custom validation logic; review `validate_transaction`.
- BlobStore: Always clean up blob references on update/delete.
- Buffer Pool: Tune via the `DB_BUFFER_POOL_SIZE` env variable.
- Logging: Enable for debugging concurrency, persistence, or recovery issues.
- Schema Evolution: Use `can_migrate_from` to validate safe schema changes.
Advanced Topics and Additional Details
Buffer Pool and Page Management
- BufferPool: Implements an LRU cache for disk pages, reducing disk I/O and improving performance.
- Pages are pinned/unpinned as they are accessed and modified.
- Eviction policy ensures hot pages remain in memory.
- Buffer pool size is configurable via environment variable.
- PageStore: Handles allocation, reading, writing, and freeing of fixed-size pages on disk.
- Supports overflow pages for large records or batches.
- Ensures atomicity and durability of page writes.
Large Data and Blob Handling
- BlobStore: Used for storing large strings or binary data that exceed the inline threshold.
- Data is stored in separate files with reference counting.
- Blob references are cleaned up on record deletion or update.
- BlobStore is memory-mapped for efficient access and syncs to disk after changes.
- Blob index file tracks all blob keys and their locations.
Indexing (Optional)
- IndexManager: If enabled, maintains secondary indexes for fast lookups.
- Indexes are updated on insert, update, and delete.
- Can be extended to support custom index types or full-text search.
Schema Evolution and Migration
- Schema Versioning: Each table schema has a version number.
- Migration Safety: `can_migrate_from` checks for safe migrations (e.g., prevents dropping required fields without defaults, or changing types incompatibly).
- Default Values: New required fields can be added if a default is provided; existing records are backfilled.
WAL Rotation and Compaction
- WAL Rotation: WAL files are rotated after reaching a configurable threshold to prevent unbounded growth.
- Database Compaction: Periodically, the database file can be compacted to reclaim space from deleted or obsolete pages.
Error Handling and Diagnostics
- Error Propagation: All major operations return `Result<T, String>` for robust error handling.
- Diagnostics: Warnings and errors are logged to stderr for troubleshooting.
- Assertions and Invariants: Internal checks ensure data consistency and integrity.
Testing and Validation
- Test Coverage: Unit, integration, and property-based tests cover all major features.
- Test Utilities: Helpers for setting up test databases, cleaning up files, and simulating crash/recovery scenarios.
- Performance Testing: Benchmark modules and stress tests for buffer pool, WAL, and blob store.
Extending the Engine
- Adding New Field Types: Extend `ValueType` and update schema validation and serialization logic.
- Custom Storage Backends: Implement new `DatabaseType` variants and adapt `DBEngine` initialization.
- Custom REPL Commands: Extend the REPL parser and executor for new administrative or diagnostic commands.
Security and Data Integrity
- Checksums: Data is checksummed before and after disk writes to detect corruption.
- Atomicity: All disk writes are atomic at the page level; WAL ensures atomicity at the transaction level.
- Crash Consistency: WAL and careful ordering of disk writes ensure no partial transactions are visible after a crash.
Performance Considerations
- Batching: Record serialization and disk writes are batched for efficiency.
- Parallelism: The engine is designed to allow concurrent transactions, subject to isolation and locking.
- Tuning: Buffer pool size, WAL rotation threshold, and blob threshold can be tuned for workload characteristics.
Additional Considerations
Multi-threading and Concurrency
- Thread Safety: The engine uses `Arc`, `Mutex`, and `RwLock` to ensure safe concurrent access to shared data structures.
- Concurrent Transactions: Multiple transactions can be processed in parallel, subject to locking and isolation constraints.
- Lock Granularity: Record-level locks minimize contention, but schema/table-level locks may be used for DDL operations.
Serialization and Deserialization
- Bincode: All on-disk data (records, schemas, TOC) is serialized using `bincode` for compactness and speed.
- Version Compatibility: Care is taken to support both old and new page formats for forward/backward compatibility.
Data Integrity and Consistency
- Checksums: Used to verify data integrity after disk writes and during recovery.
- Atomic Operations: Disk writes and WAL appends are performed atomically to prevent partial updates.
- Consistency Checks: On startup and after recovery, the engine verifies that all tables and records are consistent with their schemas.
Maintenance and Operations
- Backup and Restore: The engine can be stopped and files copied for backup; restore is as simple as replacing the files.
- Monitoring: Logging can be enabled for monitoring transaction throughput, errors, and performance metrics.
- Upgrades: Schema versioning and migration logic allow for safe upgrades without data loss.
Limitations and Future Work
- Distributed Transactions: Currently, transactions are local to a single engine instance.
- Query Language: The REPL supports a simple command language; SQL compatibility is a possible future enhancement.
- Advanced Indexing: Only basic secondary indexes are supported; more advanced indexing (e.g., B-trees, full-text) can be added.
- Encryption: Data-at-rest encryption is not yet implemented.
Storage Layer Deep Dive
Page Format and Layout
Kinesis uses a structured page format for optimal storage and retrieval:
```
┌─────────────────────────────────────────────┬──────────────────────┐
│ Header (40B)                                │ Data (16344B)        │
├──────┬──────────┬────────┬──────────┬───────┼──────────────────────┤
│ Type │ Next     │ Length │ Checksum │ Rsrvd │ Records/Data         │
│  1B  │    8B    │   4B   │    8B    │  19B  │                      │
└──────┴──────────┴────────┴──────────┴───────┴──────────────────────┘
```
Page Constants:
- PAGE_SIZE: 16,384 bytes (16KB)
- PAGE_HEADER_SIZE: 40 bytes
- MAX_DATA_SIZE: 16,344 bytes (PAGE_SIZE - PAGE_HEADER_SIZE)
Header Fields:
- Type (1 byte): Page type (0=regular, 1=overflow)
- Next (8 bytes): Next page ID in chain (0 if last page)
- Length (4 bytes): Valid data length in this page
- Checksum (8 bytes): Data integrity verification hash
- Reserved (19 bytes): Future use, zero-filled
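Given the layout above, decoding a header from a raw page is fixed-offset slicing. The sketch below assumes little-endian encoding for the multi-byte fields, which the table above does not specify.

```rust
const PAGE_HEADER_SIZE: usize = 40;

// Decoded view of the 40-byte header described above.
struct PageHeader {
    page_type: u8, // 0 = regular, 1 = overflow
    next: u64,     // next page id in the chain, 0 if last
    length: u32,   // valid data bytes in this page
    checksum: u64, // integrity hash over the page data
}

// Byte order is an assumption; adjust if the engine stores big-endian values.
fn parse_header(page: &[u8]) -> PageHeader {
    assert!(page.len() >= PAGE_HEADER_SIZE);
    PageHeader {
        page_type: page[0],
        next: u64::from_le_bytes(page[1..9].try_into().unwrap()),
        length: u32::from_le_bytes(page[9..13].try_into().unwrap()),
        checksum: u64::from_le_bytes(page[13..21].try_into().unwrap()),
        // bytes 21..40 are the reserved, zero-filled region
    }
}
```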
Overflow Page Chains
Large records spanning multiple pages use linked chains:
```
┌─────────┐    ┌──────────┐    ┌─────────┐
│ Page 1  │───▶│ Page 2   │───▶│ Page 3  │
│(Header) │    │(Overflow)│    │ (Last)  │
│ Data... │    │ Data...  │    │ Data... │
└─────────┘    └──────────┘    └─────────┘
```
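Reassembling data that spans such a chain is then a loop over `Next` pointers until a page reports no successor. The sketch below reuses the `parse_header` sketch from the previous section; `read_page` is a placeholder for the buffer-pool/page-store lookup.

```rust
// Illustrative chain walk; assumes the first page id of a chain is never 0.
fn read_chain(first_page: u64, read_page: impl Fn(u64) -> Vec<u8>) -> Vec<u8> {
    let mut data = Vec::new();
    let mut page_id = first_page;
    while page_id != 0 {
        let page = read_page(page_id);
        let header = parse_header(&page);
        let start = PAGE_HEADER_SIZE;
        data.extend_from_slice(&page[start..start + header.length as usize]);
        page_id = header.next; // 0 terminates the chain
    }
    data
}
```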
Table of Contents (TOC) Structure
```rust
struct TOC {
    table_locations: HashMap<String, Vec<u64>>,  // Table → Page IDs
    table_schemas: HashMap<String, TableSchema>, // Table → Schema
    table_next_ids: HashMap<String, u64>,        // Table → Next Record ID
}
```
TOC Handling:
- Small TOCs: Stored directly in page 0
- Large TOCs: Stored in overflow chain with reference in page 0
- Format: `TOC_REF:<page_id>` for overflow references
Memory Management and Performance
Buffer Pool Architecture
The buffer pool implements a sophisticated caching strategy:
```rust
// Default buffer pool size (in pages), based on database type
DatabaseType::InMemory => 10_000, // No disk I/O overhead
DatabaseType::Hybrid   => 2_500,  // Balanced performance
DatabaseType::OnDisk   => 100,    // Minimal memory usage
```
LRU Eviction Policy:
- Pages are ranked by access time
- Dirty pages are flushed before eviction
- Pin/unpin semantics prevent premature eviction
- Configurable via the `DB_BUFFER_POOL_SIZE` environment variable
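A stripped-down version of the pin-aware LRU policy described above is sketched here; the real `BufferPool` also tracks dirty pages and flushes them before eviction.

```rust
use std::collections::HashMap;

// Minimal pin-aware LRU sketch; the real buffer pool also handles dirty-page flushing.
struct CachedPage { data: Vec<u8>, pins: u32, last_access: u64 }

struct LruCache {
    capacity: usize,
    clock: u64,
    pages: HashMap<u64, CachedPage>,
}

impl LruCache {
    // Record an access so the page ranks as recently used.
    fn touch(&mut self, page_id: u64) {
        self.clock += 1;
        if let Some(p) = self.pages.get_mut(&page_id) {
            p.last_access = self.clock;
        }
    }

    // Evict the least recently used *unpinned* page once the pool is over capacity.
    fn evict_if_needed(&mut self) {
        if self.pages.len() <= self.capacity { return; }
        if let Some((&victim, _)) = self.pages.iter()
            .filter(|(_, p)| p.pins == 0)
            .min_by_key(|(_, p)| p.last_access)
        {
            // A real pool would flush the page here if it is dirty.
            self.pages.remove(&victim);
        }
    }
}
```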
Performance Characteristics
Operation Complexity
| Operation          | Time Complexity | Notes                              |
|--------------------|-----------------|------------------------------------|
| Insert             | O(1) + schema   | Staging only, validation overhead  |
| Update             | O(1) + schema   | In-place updates when possible     |
| Delete             | O(1)            | Lazy deletion, cleanup on commit   |
| Point Query        | O(1)            | Hash-based record lookup           |
| Table Scan         | O(n)            | Linear scan through all records    |
| Schema Migration   | O(n)            | All records validated/migrated     |
| Transaction Commit | O(k)            | k = number of operations in tx     |
Memory Usage Patterns
- Record Storage: ~50-100 bytes overhead per record
- Transaction Tracking: ~200 bytes per active transaction
- Page Cache: 16KB per cached page (one `PAGE_SIZE` page)
- Blob References: ~50 bytes per large string reference
Disk I/O Patterns
- Sequential Writes: WAL, batch serialization, compaction
- Random Reads: Buffer pool minimizes seeks for hot data
- Bulk Operations: Chunked to prevent memory pressure
- Checkpointing: Periodic full database sync
Advanced Configuration and Tuning
Environment Variables
# Core Performance Settings
export DB_BUFFER_POOL_SIZE=5000 # Pages in memory cache (default: varies by DB type)
# Database Engine Configuration
export DB_NAME=main_db # Database filename (default: main_db)
export DB_STORAGE_ENGINE=hybrid # Storage type: memory, disk, hybrid (default: hybrid)
export DB_ISOLATION_LEVEL=serializable # Transaction isolation level (default: serializable)
export DB_AUTO_COMPACT=true # Automatic database compaction (default: true)
export DB_RESTORE_POLICY=recover_pending # WAL recovery policy (default: recover_pending)
# Feature Toggles
export DB_INDEXING=true # Enable secondary indexes (if supported)
# Logging and Debugging
export RUST_LOG=kinesis_db=debug # Enable debug logging
Database Storage Engines:
- `memory`: Purely in-memory, non-persistent
- `disk`: Fully persistent, disk-based
- `hybrid`: In-memory caching with disk persistence (recommended)
Isolation Levels:
- `read_uncommitted`: No isolation, maximum performance
- `read_committed`: Prevents dirty reads
- `repeatable_read`: Consistent snapshot within transaction
- `serializable`: Full isolation with conflict detection
Restore Policies:
- `discard`: Ignore WAL on startup
- `recover_pending`: Recover only uncommitted transactions (default)
- `recover_all`: Recover all transactions from WAL
Performance Tuning Guidelines
For High-Throughput Workloads
# Optimize for write performance
export DB_STORAGE_ENGINE=hybrid
export DB_BUFFER_POOL_SIZE=10000
export DB_ISOLATION_LEVEL=read_committed # Lower isolation for better concurrency
export DB_AUTO_COMPACT=false # Manual compaction only
For Memory-Constrained Environments
# Minimize memory usage
export DB_STORAGE_ENGINE=disk
export DB_BUFFER_POOL_SIZE=500
export DB_AUTO_COMPACT=true
For Read-Heavy Workloads
# Optimize for read performance
export DB_STORAGE_ENGINE=hybrid
export DB_BUFFER_POOL_SIZE=20000
export DB_INDEXING=true # If available
export DB_ISOLATION_LEVEL=repeatable_read # Good balance of consistency and performance
For Development and Testing
# Fast, non-persistent setup
export DB_STORAGE_ENGINE=memory
export DB_BUFFER_POOL_SIZE=1000
export DB_ISOLATION_LEVEL=read_uncommitted
export RUST_LOG=kinesis_db=trace
Debugging and Diagnostics
Logging and Monitoring
Enable comprehensive logging for troubleshooting:
# Full debug logging
export RUST_LOG=kinesis_db=trace,kinesis_db::storage=debug
# Component-specific logging
export RUST_LOG=kinesis_db::transaction=debug,kinesis_db::wal=info
Log Categories:
- `kinesis_db::transaction`: Transaction lifecycle events
- `kinesis_db::storage`: Page I/O and buffer pool activity
- `kinesis_db::wal`: Write-ahead log operations
- `kinesis_db::blob`: Large data storage operations
Common Issues and Solutions
Transaction Timeouts
Symptoms: `Transaction timeout` errors during commit
Causes: Long-running transactions, deadlocks, excessive lock contention
Solutions:
```rust
// Increase timeout in the transaction config
let tx_config = TransactionConfig {
    timeout_secs: 300, // 5 minutes
    max_retries: 10,
    deadlock_detection_interval_ms: 100,
};
```
Memory Pressure
Symptoms: Slow performance, frequent page evictions, OOM errors
Causes: Insufficient buffer pool, large transactions, memory leaks
Solutions:
- Increase `DB_BUFFER_POOL_SIZE`
- Batch large operations
- Use streaming for bulk imports
- Monitor RSS with `ps` or system monitoring
Disk Space Issues
Symptoms: `No space left on device` errors, WAL growth
Causes: WAL accumulation, blob storage growth, failed compaction
Solutions:
# Manual WAL cleanup (when safe)
find /path/to/wal -name "*.log.*" -delete
# Force database compaction
echo "COMPACT DATABASE;" | your_repl_tool
# Monitor disk usage
du -sh /path/to/database/
Data Corruption
Symptoms: Checksum mismatch warnings, deserialization errors
Causes: Hardware issues, incomplete writes, software bugs
Solutions:
- Restore from backup
- Check hardware (disk, memory)
- Enable more frequent syncing
- Verify file permissions
Diagnostic Tools and Commands
Built-in REPL Diagnostics
-- Check database statistics
GET_TABLES;
-- Verify table schema
GET_TABLE users;
-- Examine specific records
GET_RECORD FROM users 1;
-- Search for patterns
SEARCH_RECORDS FROM users MATCH "corrupted";
File System Analysis
# Check file sizes and growth
ls -lh data/*.pages data/*.log data/*.blobs*
# Verify file integrity
file data/*.pages # Should show "data"
Memory Analysis
# Monitor memory usage during operations
watch 'ps aux | grep your_process'
# Check for memory leaks
valgrind --tool=memcheck --leak-check=full your_binary
# Analyze heap usage
heaptrack your_binary
Development Workflow and Best Practices
Setting Up Development Environment
# Clone and setup
git clone <repository>
cd kinesis-api
# Install dependencies
cargo build
# Setup test environment
mkdir -p data/tests
export RUST_LOG=debug
export DB_BUFFER_POOL_SIZE=1000
Running Tests
# Full test suite
cargo test database::tests
# Specific test categories
cargo test database::tests::overflow_pages # Storage tests
cargo test database::tests::concurrency # Transaction tests
cargo test database::tests::schema # Schema validation tests
cargo test database::tests::benchmark # Performance tests
# Run with logging
RUST_LOG=debug cargo test database::tests::basic_operations
Adding New Features
1. Extending ValueType for New Data Types
```rust
// 1. Add to ValueType enum
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum ValueType {
    // ...existing variants...
    Decimal(rust_decimal::Decimal),
    Json(serde_json::Value),
}

// 2. Update FieldType enum
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum FieldType {
    // ...existing variants...
    Decimal,
    Json,
}

// 3. Add validation logic
impl FieldConstraint {
    pub fn validate_value(&self, value: &ValueType) -> Result<(), String> {
        match (&self.field_type, value) {
            // ...existing cases...
            (FieldType::Decimal, ValueType::Decimal(_)) => Ok(()),
            (FieldType::Json, ValueType::Json(_)) => Ok(()),
            _ => Err("Type mismatch".to_string()),
        }
    }
}

// 4. Add display formatting
impl fmt::Display for ValueType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // ...existing cases...
            ValueType::Decimal(d) => write!(f, "{}", d),
            ValueType::Json(j) => write!(f, "{}", j),
        }
    }
}
```
2. Adding New REPL Commands
```rust
// 1. Add command variant
#[derive(Debug, Clone)]
pub enum Command {
    // ...existing commands...
    ExplainQuery { table: String, query: String },
    ShowIndexes { table: Option<String> },
}

// 2. Add parser case
fn parse_commands(&self, input: &str) -> Result<Vec<Command>, String> {
    // ...existing parsing...
    match tokens[0].to_uppercase().as_str() {
        // ...existing cases...
        "EXPLAIN" => self.parse_explain(&tokens[1..])?,
        "SHOW" if tokens.len() > 1 && tokens[1].to_uppercase() == "INDEXES" => {
            self.parse_show_indexes(&tokens[2..])?
        },
        _ => return Err(format!("Unknown command: {}", tokens[0])),
    }
}

// 3. Add executor case
fn execute(&mut self, input: &str, format: Option<OutputFormat>) -> Result<String, String> {
    // ...existing execution...
    match command {
        // ...existing cases...
        Command::ExplainQuery { table, query } => {
            self.explain_query(&table, &query)
        },
        Command::ShowIndexes { table } => {
            self.show_indexes(table.as_deref())
        },
    }
}
```
3. Implementing New Storage Backends
```rust
// 1. Add database type
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum DatabaseType {
    // ...existing variants...
    Distributed { nodes: Vec<String> },
    Compressed { algorithm: CompressionType },
}

// 2. Update engine initialization
impl DBEngine {
    pub fn new(db_type: DatabaseType, /* other params */) -> Self {
        let (page_store, blob_store) = match &db_type {
            // ...existing cases...
            DatabaseType::Distributed { nodes } => {
                (Some(DistributedPageStore::new(nodes)?),
                 Some(DistributedBlobStore::new(nodes)?))
            },
            DatabaseType::Compressed { algorithm } => {
                (Some(CompressedPageStore::new(algorithm)?),
                 Some(CompressedBlobStore::new(algorithm)?))
            },
        };
        // ...rest of initialization...
    }
}
```
Code Style and Standards
````rust
// Use descriptive names and comprehensive documentation

/// Commits a transaction, applying all staged changes atomically.
///
/// This method validates the transaction according to its isolation level,
/// acquires necessary locks, applies changes to the in-memory database,
/// logs to WAL (if applicable), and persists to disk.
///
/// # Arguments
/// * `tx` - The transaction to commit
///
/// # Returns
/// * `Ok(())` if the transaction was committed successfully
/// * `Err(String)` describing the failure reason
///
/// # Examples
/// ```rust
/// let mut tx = engine.begin_transaction();
/// engine.insert_record(&mut tx, "users", record)?;
/// engine.commit(tx)?;
/// ```
pub fn commit(&mut self, tx: Transaction) -> Result<(), String> {
    CommitGuard::new(self, tx).commit()
}

// Use proper error handling with context
match self.validate_transaction(&tx) {
    Ok(()) => { /* continue */ }
    Err(e) => return Err(format!("Transaction validation failed: {}", e)),
}

// Prefer explicit types for complex generics
let buffer_pool: Arc<Mutex<BufferPool>> = Arc::new(Mutex::new(
    BufferPool::new(buffer_pool_size, db_type)
));
````
Testing Guidelines
Unit Tests
```rust
#[test]
fn test_transaction_isolation() -> Result<(), String> {
    let mut engine = setup_test_db("isolation_test", IsolationLevel::Serializable);

    // Test specific isolation behavior
    let mut tx1 = engine.begin_transaction();
    let mut tx2 = engine.begin_transaction();

    // Simulate concurrent operations (record1/record2 prepared by a test helper)
    engine.insert_record(&mut tx1, "test", record1)?;
    engine.insert_record(&mut tx2, "test", record2)?;

    // Verify isolation guarantees
    assert!(engine.commit(tx1).is_ok());
    assert!(engine.commit(tx2).is_err()); // Should conflict
    Ok(())
}
```
Integration Tests
```rust
#[test]
fn test_crash_recovery_scenario() {
    let db_path = "test_crash_recovery";

    // Phase 1: Create initial state
    {
        let mut engine = create_engine(db_path);
        perform_operations(&mut engine);
        // Simulate crash - no clean shutdown
    }

    // Phase 2: Recovery
    {
        let mut engine = create_engine(db_path); // Should recover automatically
        verify_recovered_state(&engine);
    }
}
```
Performance Tests
```rust
use std::time::{Duration, Instant};

#[test]
fn test_bulk_insert_performance() -> Result<(), String> {
    let mut engine = setup_test_db("perf_test", IsolationLevel::ReadCommitted);

    let start = Instant::now();
    let mut tx = engine.begin_transaction();

    for i in 0..10_000 {
        let record = create_test_record(i, &format!("Record {}", i));
        engine.insert_record(&mut tx, "perf_test", record)?;
    }

    engine.commit(tx)?;
    let duration = start.elapsed();

    println!("Bulk insert of 10k records: {:?}", duration);
    assert!(duration < Duration::from_secs(10)); // Performance threshold
    Ok(())
}
```
FAQ and Troubleshooting
- Q: Why is my data not persisted?
  - A: Ensure you are not using the `InMemory` backend. Only `OnDisk` and `Hybrid` persist data.
- Q: How do I recover from a crash?
  - A: On startup, the engine automatically loads from disk and replays the WAL.
- Q: How do I enable indexes?
  - A: Set the `DB_INDEXING` environment variable to `true` before starting the engine.
- Q: How do I tune performance?
  - A: Adjust `DB_BUFFER_POOL_SIZE` and the WAL rotation threshold, and use the appropriate backend for your workload.
Glossary
- MVCC: Multi-Version Concurrency Control, enables snapshot isolation.
- WAL: Write-Ahead Log, ensures durability and crash recovery.
- TOC: Table of Contents, metadata page mapping tables to page chains.
- Blob: Large binary or string data stored outside the main page store.
For further details, refer to the source code and inline documentation in each module.
References
- See the `src/database/` directory for implementation details.
- Consult module-level Rust docs for API usage and extension points.
- For design discussions and roadmap, refer to the project wiki or issue tracker.
This document is intended to be comprehensive. If you find any missing details or have suggestions for improvement, please update this file or open a new ticket.