Kinesis DB: Technical Documentation

This document provides a deep technical overview of the Kinesis database layer, focusing on its architecture, core modules, transaction management, persistence, and extensibility. It is intended for developers working on or extending the database internals.


Architecture Overview

The Kinesis database layer is designed as a modular, transactional, and persistent storage engine supporting multiple backends:

  • InMemory: Purely in-memory, non-persistent.
  • OnDisk: Fully persistent, disk-based.
  • Hybrid: Combines in-memory caching with disk persistence for performance.

The main entry point is the DBEngine, which orchestrates all database operations, transaction management, and persistence.


Core Modules

1. database/engine.rs (DBEngine)

  • Central orchestrator for all database operations.
  • Manages:
    • In-memory state (Database)
    • Disk persistence (via PageStore, BufferPool, BlobStore)
    • Transaction lifecycle (TransactionManager)
    • Write-Ahead Logging (WriteAheadLog)
    • Table schemas and schema migrations
    • Optional secondary indexes (IndexManager)

Key Responsibilities

  • Transaction Management: Begin, commit, rollback, and validate transactions.
  • Record Operations: Insert, update, delete, and search records with schema validation.
  • Persistence: Save/load state to/from disk, manage page allocation, and handle large data via blobs.
  • Crash Recovery: Replay WAL and restore consistent state after failures.
  • Schema Evolution: Support for schema updates with migration and validation.

2. database/database.rs (Database)

  • Represents the in-memory state of all tables.
  • Each table is a Table with its own schema, records, and next record ID.

3. database/table.rs (Table)

  • Stores records as a map of record ID to Record.
  • Enforces schema constraints and manages record versioning.

4. database/record.rs (Record)

  • Represents a single row in a table.
  • Contains:
    • id: Unique identifier
    • values: Map of field name to ValueType
    • version and timestamp for MVCC and concurrency control
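
A minimal sketch of how these fields might look in code (types are illustrative; see record.rs for the actual definition):

use std::collections::HashMap;

pub struct Record {
    pub id: u64,                            // unique identifier within the table
    pub values: HashMap<String, ValueType>, // field name -> typed value
    pub version: u64,                       // bumped on every update (MVCC)
    pub timestamp: u64,                     // used for concurrency control
}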

5. database/schema.rs (TableSchema, FieldConstraint)

  • Defines table structure, field types, constraints (required, unique, min/max, pattern), and default values.
  • Supports schema validation and migration logic.

6. database/value_type.rs (ValueType, StringValue)

  • Strongly-typed representation of all supported field types (Int, Float, String, Bool, etc.).
  • StringValue supports both inline and blob-referenced storage for large strings.
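
A hedged sketch of the two storage forms (variant names are illustrative):

pub enum StringValue {
    Inline(String),   // short strings stored directly inside the record
    BlobRef(String),  // key into the BlobStore for large strings
}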

7. storage/page_store.rs, storage/buffer_pool.rs, storage/blob_store.rs

  • PageStore: Manages allocation, reading, and writing of fixed-size pages on disk.
  • BufferPool: In-memory cache for pages, with LRU eviction.
  • BlobStore: Efficient storage for large binary/string data, with reference counting and garbage collection.

8. storage/wal.rs (WriteAheadLog)

  • Ensures durability and crash recovery by logging all transactional changes before commit.

9. transaction/manager.rs, transaction/transaction.rs

  • TransactionManager: Tracks active transactions, locks, timeouts, and deadlock detection.
  • Transaction: Encapsulates all pending changes, isolation level, and MVCC snapshot.

Key Functions and Internal Workings

Loading the Database (DBEngine::load_from_disk)

  • Purpose: Loads the database state from disk into memory.
  • How it works:
    • Reads the Table of Contents (TOC) from the first page using the buffer pool and page store.
    • Deserializes table locations, schemas, and next record IDs.
    • For each table, loads all records from their page chains (handling both old and new page formats, including overflow pages).
    • Reconstructs in-memory Table objects and inserts them into the database.
    • Skips loading for InMemory databases.

Saving the Database (DBEngine::save_to_disk)

  • Purpose: Persists the current in-memory state to disk.
  • How it works:
    • Iterates over all tables, serializes their records in batches.
    • Writes each batch to disk using the page store, allocating new pages as needed.
    • Updates the TOC with new page locations, schemas, and next IDs.
    • Flushes all pages via the buffer pool and syncs the page store to ensure durability.
    • Returns a checksum of the serialized state for verification.

Committing Transactions (DBEngine::commit and commit_internal)

  • Purpose: Atomically applies all changes in a transaction, ensuring ACID properties.
  • How it works:
    • Validates the transaction for isolation level conflicts (e.g., repeatable read, serializable).
    • Acquires necessary locks for all records being written.
    • Applies all staged changes (inserts, updates, deletes, schema changes) to the in-memory database.
    • Logs the transaction to the WAL (unless it's a blob operation or in-memory DB).
    • Persists changes to disk (if not in-memory).
    • Releases locks and ends the transaction.
    • On failure, rolls back all changes using the transaction's snapshot.

Rolling Back Transactions (DBEngine::rollback)

  • Purpose: Reverts all changes made by a transaction if commit fails or is aborted.
  • How it works:
    • Restores original state for all modified records using the transaction's snapshot.
    • Releases any acquired locks.
    • Cleans up any staged blob references.

Transaction Lifecycle

  • Begin: begin_transaction creates a new transaction, optionally with a snapshot for MVCC.
  • Read/Write: All record operations are staged in the transaction's pending sets.
  • Commit: See above.
  • Rollback: See above.
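
A usage sketch of this lifecycle, mirroring the commit example shown later in this document; setup_engine() and make_record() are hypothetical helpers, not part of the engine API:

fn lifecycle_example() -> Result<(), String> {
    let mut engine = setup_engine();                         // construct a DBEngine for your backend
    let mut tx = engine.begin_transaction();                 // Begin (snapshot taken if the isolation level requires it)
    engine.insert_record(&mut tx, "users", make_record())?;  // Read/Write: staged in the transaction's pending sets
    engine.commit(tx)                                        // Commit: validate, apply, persist; rolls back on failure
}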

Table and Record Operations

  • Creating Tables: create_table or create_table_with_schema adds a new table definition to the transaction's pending creates.
  • Inserting Records: insert_record validates the record against the schema, handles large strings (blobs), and stages the insert.
  • Updating Records: update_record validates updates, manages blob references, and stages the update.
  • Deleting Records: delete_record stages the deletion and tracks any blob references for cleanup.

Blob Storage

  • Large strings are stored in the BlobStore if they exceed a threshold.
  • Records store a reference to the blob key.
  • On update or delete, old blob references are cleaned up.
  • Blob store is synced to disk after changes.
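
A sketch of the inline-vs-blob decision described above; the threshold value and the BlobStore::put call are assumptions, not the actual API:

fn store_string_sketch(blob_store: &mut BlobStore, s: String) -> StringValue {
    const BLOB_THRESHOLD: usize = 4096; // illustrative threshold
    if s.len() > BLOB_THRESHOLD {
        let key = blob_store.put(s.as_bytes()); // hypothetical call: stores the data, returns a blob key
        StringValue::BlobRef(key)               // the record keeps only the reference
    } else {
        StringValue::Inline(s)
    }
}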

Write-Ahead Log (WAL)

  • Purpose: Ensures durability and crash recovery.
  • How it works:
    • All transactional changes are serialized and appended to the WAL before being applied.
    • On startup, the WAL is replayed to recover any committed but not yet persisted transactions.

Crash Recovery (DBEngine::recover_from_crash)

  • Purpose: Restores a consistent state after a crash.
  • How it works:
    • Loads the database from disk.
    • Loads and replays all valid transactions from the WAL.
    • Applies each transaction and persists the state after each.

Schema Management

  • Schema Validation: All record operations are validated against the current table schema.
  • Schema Migration: update_table_schema stages a schema update, which is validated for compatibility and applied atomically.

Transaction Types and Isolation Levels

Kinesis supports four transaction isolation levels, each with distinct semantics and internal handling:

1. ReadUncommitted

  • Behavior: Transactions can see uncommitted changes from other transactions ("dirty reads").
  • Implementation: No snapshot is taken. Reads always reflect the latest in-memory state, regardless of commit status.
  • Use Case: Maximum performance, minimal isolation. Rarely recommended except for analytics or non-critical reads.

2. ReadCommitted

  • Behavior: Transactions only see data that has been committed by other transactions.
  • Implementation:
    • No snapshot is taken.
    • On each read, the engine checks the committed state of the record.
    • During commit, the engine validates that records read have not changed since they were read (prevents "non-repeatable reads").
  • Use Case: Standard for many OLTP systems, balances consistency and concurrency.

3. RepeatableRead

  • Behavior: All reads within a transaction see a consistent snapshot of the database as of the transaction's start.
  • Implementation:
    • A full snapshot of the database is taken at transaction start.
    • Reads are served from the snapshot.
    • On commit, the engine validates that no records in the read set have changed since the snapshot.
    • Prevents non-repeatable reads and phantom reads (to the extent possible without full serializability).
  • Use Case: Applications requiring strong consistency for reads within a transaction.

4. Serializable

  • Behavior: Transactions are fully isolated as if executed serially.
  • Implementation:
    • Like RepeatableRead, but with additional validation:
      • Checks for write-write conflicts.
      • Ensures no new records (phantoms) have appeared in the range of interest.
      • Validates that the snapshot and current state are equivalent for all read and written data.
    • May abort transactions if conflicts are detected.
  • Use Case: Highest level of isolation, required for financial or critical systems.

Transaction Internals

Transaction Structure

Each transaction (Transaction) tracks:

  • ID: Unique identifier.
  • Isolation Level: Determines snapshot and validation logic.
  • Snapshot: Optional, for higher isolation levels.
  • Read Set: Records read (table, id, version).
  • Write Set: Records written (table, id).
  • Pending Operations: Inserts, updates, deletes, schema changes.
  • Metadata: For tracking blob references and other auxiliary data.
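
The fields above might map onto a struct roughly like this (a hedged sketch; names and exact types are illustrative):

use std::collections::{HashMap, HashSet};

struct TransactionSketch {
    id: u64,
    isolation_level: IsolationLevel,
    snapshot: Option<Database>,                     // only for RepeatableRead/Serializable
    read_set: HashSet<(String, u64, u64)>,          // (table, record id, version)
    write_set: HashSet<(String, u64)>,              // (table, record id)
    pending_inserts: HashMap<String, Vec<Record>>,  // plus pending updates/deletes/schema changes
    metadata: HashMap<String, String>,              // e.g. staged blob references
}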

Transaction Flow

  1. Begin: DBEngine::begin_transaction creates a new transaction, possibly with a snapshot.
  2. Read: Reads are tracked in the read set. For RepeatableRead/Serializable, reads come from the snapshot.
  3. Write: Writes are staged in the write set and pending operations.
  4. Validation: On commit, the engine validates the transaction according to its isolation level.
  5. Commit: If validation passes, changes are applied atomically.
  6. Rollback: If validation fails or an error occurs, all changes are reverted.

Locking and Deadlock Detection

  • Locks: Acquired at the record level for writes. Managed by TransactionManager.
  • Deadlock Detection: Periodically checks for cycles in the lock graph. If detected, aborts one or more transactions.
  • Timeouts: Transactions can be configured to expire after a set duration.
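
Timeouts, retries, and deadlock detection are driven by TransactionConfig (the same struct appears in the troubleshooting section below); how the config is wired into the TransactionManager is not shown here:

let tx_config = TransactionConfig {
    timeout_secs: 60,                     // expire long-running transactions
    max_retries: 10,                      // maximum retry attempts
    deadlock_detection_interval_ms: 100,  // how often the lock graph is scanned for cycles
};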

Detailed Function Descriptions

DBEngine::begin_transaction

  • Purpose: Starts a new transaction.
  • Details: Assigns a unique ID, sets the isolation level, and optionally takes a snapshot of the database for higher isolation.

DBEngine::commit

  • Purpose: Commits a transaction, applying all staged changes.
  • Details:
    • Validates the transaction (see above).
    • Acquires all necessary locks.
    • Applies inserts, updates, deletes, and schema changes.
    • Logs the transaction to the WAL (unless skipped for blob ops).
    • Persists changes to disk.
    • Releases locks and ends the transaction.

DBEngine::rollback

  • Purpose: Rolls back a transaction, reverting all staged changes.
  • Details: Uses the transaction's snapshot and pending operations to restore the previous state.

DBEngine::insert_record

  • Purpose: Stages a new record for insertion.
  • Details: Validates against schema, handles large strings (blobs), and adds to the transaction's pending inserts.

DBEngine::update_record

  • Purpose: Stages updates to an existing record.
  • Details: Validates updates, manages blob references, and adds to pending updates.

DBEngine::delete_record

  • Purpose: Stages a record for deletion.
  • Details: Tracks blob references for cleanup and adds to pending deletes.

DBEngine::search_records

  • Purpose: Searches for records matching a query string.
  • Details: Supports case-insensitive search, uses the appropriate snapshot or committed state based on isolation level.

DBEngine::create_table_with_schema

  • Purpose: Stages creation of a new table with a specified schema.
  • Details: Adds to the transaction's pending table creates.

DBEngine::update_table_schema

  • Purpose: Stages a schema update for a table.
  • Details: Validates that the migration is safe and adds to pending schema updates.

DBEngine::load_from_disk

  • Purpose: Loads the database state from disk.
  • Details: Reads the TOC, loads all tables and records, reconstructs in-memory state.

DBEngine::save_to_disk

  • Purpose: Persists the current state to disk.
  • Details: Serializes all tables and records, writes to page store, updates TOC, flushes and syncs.

DBEngine::recover_from_crash

  • Purpose: Restores the database to a consistent state after a crash.
  • Details: Loads from disk, replays WAL, applies all valid transactions, persists state.

DBEngine::compact_database

  • Purpose: Compacts the database file, removing unused pages and reorganizing data.
  • Details: Writes all live data to a new file, replaces the old file, and reopens the page store.

REPL Shell (REPL)

  • Purpose: Interactive shell for database commands.
  • Supported Commands:
    • CREATE_TABLE <name> ...
    • INSERT INTO <table> ...
    • UPDATE <table> ...
    • DELETE FROM <table> ...
    • SEARCH_RECORDS FROM <table> MATCH <query>
    • GET_RECORDS FROM <table>
    • DROP_TABLE <name>
    • ALTER_TABLE <name> ...
  • Features:
    • Supports output formats: standard, table, JSON, etc.
    • Handles errors and transaction boundaries per command.
    • Useful for development, testing, and demos.

Error Handling & Recovery

  • All disk and transactional errors are surfaced as Result<T, String>.
  • On commit failure, a rollback is automatically attempted.
  • On startup, the engine attempts recovery using the WAL and disk state.
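
A small sketch of consuming the Result<T, String> convention at a call site:

let mut tx = engine.begin_transaction();
// ...stage inserts/updates/deletes...
if let Err(e) = engine.commit(tx) {
    // The engine has already attempted a rollback; surface the reason to the caller.
    eprintln!("transaction aborted: {e}");
}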

Extensibility

  • Indexes: Optional secondary indexes can be enabled via environment variable.
  • Custom Field Types: Extend ValueType and update schema validation logic.
  • Storage Backends: Implement new DatabaseType variants and adapt DBEngine initialization.

Testing

  • Extensive test suite under src/database/tests/ covers:
    • Basic operations
    • Transactional semantics
    • Persistence and recovery
    • Schema validation and migration
    • Large data and overflow page handling
    • Concurrency and rollback

Developer Notes

  • CommitGuard: RAII pattern to ensure rollback on drop if commit fails.
  • Isolation Levels: Each level has custom validation logic; review validate_transaction.
  • BlobStore: Always clean up blob references on update/delete.
  • Buffer Pool: Tune via DB_BUFFER_POOL_SIZE env variable.
  • Logging: Enable for debugging concurrency, persistence, or recovery issues.
  • Schema Evolution: Use can_migrate_from to validate safe schema changes.

Advanced Topics and Additional Details

Buffer Pool and Page Management

  • BufferPool: Implements an LRU cache for disk pages, reducing disk I/O and improving performance.
    • Pages are pinned/unpinned as they are accessed and modified.
    • Eviction policy ensures hot pages remain in memory.
    • Buffer pool size is configurable via environment variable.
  • PageStore: Handles allocation, reading, writing, and freeing of fixed-size pages on disk.
    • Supports overflow pages for large records or batches.
    • Ensures atomicity and durability of page writes.

Large Data and Blob Handling

  • BlobStore: Used for storing large strings or binary data that exceed the inline threshold.
    • Data is stored in separate files with reference counting.
    • Blob references are cleaned up on record deletion or update.
    • BlobStore is memory-mapped for efficient access and syncs to disk after changes.
    • Blob index file tracks all blob keys and their locations.

Indexing (Optional)

  • IndexManager: If enabled, maintains secondary indexes for fast lookups.
    • Indexes are updated on insert, update, and delete.
    • Can be extended to support custom index types or full-text search.

Schema Evolution and Migration

  • Schema Versioning: Each table schema has a version number.
  • Migration Safety: can_migrate_from checks for safe migrations (e.g., prevents dropping required fields without defaults, or changing types incompatibly).
  • Default Values: New required fields can be added if a default is provided; existing records are backfilled.
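
A hedged sketch of the "new required field needs a default" rule; the real check lives in can_migrate_from, and the FieldConstraint fields used below (required, default) are assumptions:

use std::collections::HashMap;

fn migration_is_safe(old: &HashMap<String, FieldConstraint>,
                     new: &HashMap<String, FieldConstraint>) -> bool {
    new.iter().all(|(name, constraint)| match old.get(name) {
        Some(_) => true, // existing field: type compatibility is checked separately
        // a brand-new required field is only safe if it carries a default to backfill records
        None => !constraint.required || constraint.default.is_some(),
    })
}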

WAL Rotation and Compaction

  • WAL Rotation: WAL files are rotated after reaching a configurable threshold to prevent unbounded growth.
  • Database Compaction: Periodically, the database file can be compacted to reclaim space from deleted or obsolete pages.

Error Handling and Diagnostics

  • Error Propagation: All major operations return Result<T, String> for robust error handling.
  • Diagnostics: Warnings and errors are logged to stderr for troubleshooting.
  • Assertions and Invariants: Internal checks ensure data consistency and integrity.

Testing and Validation

  • Test Coverage: Unit, integration, and property-based tests cover all major features.
  • Test Utilities: Helpers for setting up test databases, cleaning up files, and simulating crash/recovery scenarios.
  • Performance Testing: Benchmark modules and stress tests for buffer pool, WAL, and blob store.

Extending the Engine

  • Adding New Field Types: Extend ValueType and update schema validation and serialization logic.
  • Custom Storage Backends: Implement new DatabaseType variants and adapt DBEngine initialization.
  • Custom REPL Commands: Extend the REPL parser and executor for new administrative or diagnostic commands.

Security and Data Integrity

  • Checksums: Data is checksummed before and after disk writes to detect corruption.
  • Atomicity: All disk writes are atomic at the page level; WAL ensures atomicity at the transaction level.
  • Crash Consistency: WAL and careful ordering of disk writes ensure no partial transactions are visible after a crash.

Performance Considerations

  • Batching: Record serialization and disk writes are batched for efficiency.
  • Parallelism: The engine is designed to allow concurrent transactions, subject to isolation and locking.
  • Tuning: Buffer pool size, WAL rotation threshold, and blob threshold can be tuned for workload characteristics.

Additional Considerations

Multi-threading and Concurrency

  • Thread Safety: The engine uses Arc, Mutex, and RwLock to ensure safe concurrent access to shared data structures.
  • Concurrent Transactions: Multiple transactions can be processed in parallel, subject to locking and isolation constraints.
  • Lock Granularity: Record-level locks minimize contention, but schema/table-level locks may be used for DDL operations.

Serialization and Deserialization

  • Bincode: All on-disk data (records, schemas, TOC) is serialized using bincode for compactness and speed.
  • Version Compatibility: Care is taken to support both old and new page formats for forward/backward compatibility.

Data Integrity and Consistency

  • Checksums: Used to verify data integrity after disk writes and during recovery.
  • Atomic Operations: Disk writes and WAL appends are performed atomically to prevent partial updates.
  • Consistency Checks: On startup and after recovery, the engine verifies that all tables and records are consistent with their schemas.

Maintenance and Operations

  • Backup and Restore: The engine can be stopped and files copied for backup; restore is as simple as replacing the files.
  • Monitoring: Logging can be enabled for monitoring transaction throughput, errors, and performance metrics.
  • Upgrades: Schema versioning and migration logic allow for safe upgrades without data loss.

Limitations and Future Work

  • Distributed Transactions: Currently, transactions are local to a single engine instance.
  • Query Language: The REPL supports a simple command language; SQL compatibility is a possible future enhancement.
  • Advanced Indexing: Only basic secondary indexes are supported; more advanced indexing (e.g., B-trees, full-text) can be added.
  • Encryption: Data-at-rest encryption is not yet implemented.

Storage Layer Deep Dive

Page Format and Layout

Kinesis uses a structured page format for optimal storage and retrieval:

┌─────────────────────────────────────────────┬──────────────────────┐
│                Header (40B)                 │     Data (16344B)    │
├──────┬──────────┬────────┬──────────┬───────┼──────────────────────┤
│ Type │   Next   │ Length │ Checksum │ Rsrvd │    Records/Data      │
│  1B  │    8B    │   4B   │    8B    │  19B  │                      │
└──────┴──────────┴────────┴──────────┴───────┴──────────────────────┘

Page Constants:

  • PAGE_SIZE: 16,384 bytes (16KB)
  • PAGE_HEADER_SIZE: 40 bytes
  • MAX_DATA_SIZE: 16,344 bytes (PAGE_SIZE - PAGE_HEADER_SIZE)

Header Fields:

  • Type (1 byte): Page type (0=regular, 1=overflow)
  • Next (8 bytes): Next page ID in chain (0 if last page)
  • Length (4 bytes): Valid data length in this page
  • Checksum (8 bytes): Data integrity verification hash
  • Reserved (19 bytes): Future use, zero-filled
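
A sketch of serializing this header; the field widths come from the table above, while the little-endian encoding is an assumption:

struct PageHeader {
    page_type: u8, // 0 = regular, 1 = overflow
    next: u64,     // next page ID in chain, 0 if last
    length: u32,   // valid data bytes in this page
    checksum: u64, // integrity hash over the data region
}

impl PageHeader {
    fn to_bytes(&self) -> [u8; 40] {
        let mut buf = [0u8; 40];
        buf[0] = self.page_type;
        buf[1..9].copy_from_slice(&self.next.to_le_bytes());
        buf[9..13].copy_from_slice(&self.length.to_le_bytes());
        buf[13..21].copy_from_slice(&self.checksum.to_le_bytes());
        // bytes 21..40 are the reserved region, left zero-filled
        buf
    }
}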

Overflow Page Chains

Large records spanning multiple pages use linked chains:

┌─────────┐    ┌──────────┐    ┌─────────┐
│ Page 1  │───▶│ Page 2   │───▶│ Page 3  │
│(Header) │    │(Overflow)│    │ (Last)  │
│ Data... │    │ Data...  │    │ Data... │
└─────────┘    └──────────┘    └─────────┘

Table of Contents (TOC) Structure

struct TOC {
    table_locations: HashMap<String, Vec<u64>>,     // Table → Page IDs
    table_schemas: HashMap<String, TableSchema>,    // Table → Schema
    table_next_ids: HashMap<String, u64>,          // Table → Next Record ID
}

TOC Handling:

  • Small TOCs: Stored directly in page 0
  • Large TOCs: Stored in overflow chain with reference in page 0
  • Format: TOC_REF:<page_id> for overflow references

Memory Management and Performance

Buffer Pool Architecture

The buffer pool implements a sophisticated caching strategy:

// Buffer pool configuration based on database type (pages)
DatabaseType::InMemory => 10_000,  // no disk I/O overhead
DatabaseType::Hybrid   => 2_500,   // balanced performance
DatabaseType::OnDisk   => 100,     // minimal memory usage

LRU Eviction Policy:

  • Pages are ranked by access time
  • Dirty pages are flushed before eviction
  • Pin/unpin semantics prevent premature eviction
  • Configurable via DB_BUFFER_POOL_SIZE environment variable
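
A compact illustration of the ranking-by-access-time policy (not the actual BufferPool, which also tracks dirty flags and pin counts):

use std::collections::HashMap;

struct LruSketch {
    capacity: usize,
    clock: u64,
    pages: HashMap<u64, (Vec<u8>, u64)>, // page id -> (data, last access tick)
}

impl LruSketch {
    fn get(&mut self, id: u64) -> Option<&[u8]> {
        self.clock += 1;
        if let Some(entry) = self.pages.get_mut(&id) {
            entry.1 = self.clock; // touch: mark as most recently used
        }
        self.pages.get(&id).map(|(data, _)| data.as_slice())
    }

    fn put(&mut self, id: u64, data: Vec<u8>) {
        if self.pages.len() >= self.capacity && !self.pages.contains_key(&id) {
            // Evict the page with the oldest access tick (the real pool flushes dirty pages first).
            let victim = self.pages.iter().min_by_key(|(_, (_, t))| *t).map(|(k, _)| *k);
            if let Some(victim) = victim {
                self.pages.remove(&victim);
            }
        }
        self.clock += 1;
        self.pages.insert(id, (data, self.clock));
    }
}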

Performance Characteristics

Operation Complexity

Operation          | Time Complexity | Notes
-------------------|-----------------|----------------------------------
Insert             | O(1) + schema   | Staging only, validation overhead
Update             | O(1) + schema   | In-place updates when possible
Delete             | O(1)            | Lazy deletion, cleanup on commit
Point Query        | O(1)            | Hash-based record lookup
Table Scan         | O(n)            | Linear scan through all records
Schema Migration   | O(n)            | All records validated/migrated
Transaction Commit | O(k)            | k = number of operations in tx

Memory Usage Patterns

  • Record Storage: ~50-100 bytes overhead per record
  • Transaction Tracking: ~200 bytes per active transaction
  • Page Cache: 16KB per cached page (one PAGE_SIZE buffer each)
  • Blob References: ~50 bytes per large string reference

Disk I/O Patterns

  • Sequential Writes: WAL, batch serialization, compaction
  • Random Reads: Buffer pool minimizes seeks for hot data
  • Bulk Operations: Chunked to prevent memory pressure
  • Checkpointing: Periodic full database sync

Advanced Configuration and Tuning

Environment Variables

# Core Performance Settings
export DB_BUFFER_POOL_SIZE=5000           # Pages in memory cache (default: varies by DB type)

# Database Engine Configuration
export DB_NAME=main_db                    # Database filename (default: main_db)
export DB_STORAGE_ENGINE=hybrid           # Storage type: memory, disk, hybrid (default: hybrid)
export DB_ISOLATION_LEVEL=serializable    # Transaction isolation level (default: serializable)
export DB_AUTO_COMPACT=true               # Automatic database compaction (default: true)
export DB_RESTORE_POLICY=recover_pending  # WAL recovery policy (default: recover_pending)

# Feature Toggles
export DB_INDEXING=true                   # Enable secondary indexes (if supported)

# Logging and Debugging
export RUST_LOG=kinesis_db=debug          # Enable debug logging

Database Storage Engines:

  • memory: Purely in-memory, non-persistent
  • disk: Fully persistent, disk-based
  • hybrid: In-memory caching with disk persistence (recommended)

Isolation Levels:

  • read_uncommitted: No isolation, maximum performance
  • read_committed: Prevents dirty reads
  • repeatable_read: Consistent snapshot within transaction
  • serializable: Full isolation with conflict detection

Restore Policies:

  • discard: Ignore WAL on startup
  • recover_pending: Recover only uncommitted transactions (default)
  • recover_all: Recover all transactions from WAL
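
A sketch of how these settings might be read at startup; the variable names and defaults come from this section, while the parsing itself is illustrative:

use std::env;

fn buffer_pool_size_from_env(default_for_db_type: usize) -> usize {
    env::var("DB_BUFFER_POOL_SIZE")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(default_for_db_type)
}

fn storage_engine_from_env() -> DatabaseType {
    match env::var("DB_STORAGE_ENGINE").as_deref() {
        Ok("memory") => DatabaseType::InMemory,
        Ok("disk") => DatabaseType::OnDisk,
        _ => DatabaseType::Hybrid, // documented default
    }
}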

Performance Tuning Guidelines

For High-Throughput Workloads

# Optimize for write performance
export DB_STORAGE_ENGINE=hybrid
export DB_BUFFER_POOL_SIZE=10000
export DB_ISOLATION_LEVEL=read_committed  # Lower isolation for better concurrency
export DB_AUTO_COMPACT=false              # Manual compaction only

For Memory-Constrained Environments

# Minimize memory usage
export DB_STORAGE_ENGINE=disk
export DB_BUFFER_POOL_SIZE=500
export DB_AUTO_COMPACT=true

For Read-Heavy Workloads

# Optimize for read performance
export DB_STORAGE_ENGINE=hybrid
export DB_BUFFER_POOL_SIZE=20000
export DB_INDEXING=true                   # If available
export DB_ISOLATION_LEVEL=repeatable_read # Good balance of consistency and performance

For Development and Testing

# Fast, non-persistent setup
export DB_STORAGE_ENGINE=memory
export DB_BUFFER_POOL_SIZE=1000
export DB_ISOLATION_LEVEL=read_uncommitted
export RUST_LOG=kinesis_db=trace

Debugging and Diagnostics

Logging and Monitoring

Enable comprehensive logging for troubleshooting:

# Full debug logging
export RUST_LOG=kinesis_db=trace,kinesis_db::storage=debug

# Component-specific logging
export RUST_LOG=kinesis_db::transaction=debug,kinesis_db::wal=info

Log Categories:

  • kinesis_db::transaction: Transaction lifecycle events
  • kinesis_db::storage: Page I/O and buffer pool activity
  • kinesis_db::wal: Write-ahead log operations
  • kinesis_db::blob: Large data storage operations

Common Issues and Solutions

Transaction Timeouts

Symptoms: Transaction timeout errors during commit.
Causes: Long-running transactions, deadlocks, excessive lock contention.
Solutions:

// Increase timeout in transaction config
let tx_config = TransactionConfig {
    timeout_secs: 300,     // 5 minutes
    max_retries: 10,
    deadlock_detection_interval_ms: 100,
};

Memory Pressure

Symptoms: Slow performance, frequent page evictions, OOM errors.
Causes: Insufficient buffer pool, large transactions, memory leaks.
Solutions:

  • Increase DB_BUFFER_POOL_SIZE
  • Batch large operations
  • Use streaming for bulk imports
  • Monitor RSS with ps or system monitoring

Disk Space Issues

Symptoms: No space left on device, WAL growth.
Causes: WAL accumulation, blob storage growth, failed compaction.
Solutions:

# Manual WAL cleanup (when safe)
find /path/to/wal -name "*.log.*" -delete

# Force database compaction
echo "COMPACT DATABASE;" | your_repl_tool

# Monitor disk usage
du -sh /path/to/database/

Data Corruption

Symptoms: Checksum mismatch warnings, deserialization errors.
Causes: Hardware issues, incomplete writes, software bugs.
Solutions:

  • Restore from backup
  • Check hardware (disk, memory)
  • Enable more frequent syncing
  • Verify file permissions

Diagnostic Tools and Commands

Built-in REPL Diagnostics

-- Check database statistics
GET_TABLES;

-- Verify table schema
GET_TABLE users;

-- Examine specific records
GET_RECORD FROM users 1;

-- Search for patterns
SEARCH_RECORDS FROM users MATCH "corrupted";

File System Analysis

# Check file sizes and growth
ls -lh data/*.pages data/*.log data/*.blobs*

# Verify file integrity
file data/*.pages  # Should show "data"

Memory Analysis

# Monitor memory usage during operations
watch 'ps aux | grep your_process'

# Check for memory leaks
valgrind --tool=memcheck --leak-check=full your_binary

# Analyze heap usage
heaptrack your_binary

Development Workflow and Best Practices

Setting Up Development Environment

# Clone and setup
git clone <repository>
cd kinesis-api

# Install dependencies
cargo build

# Setup test environment
mkdir -p data/tests
export RUST_LOG=debug
export DB_BUFFER_POOL_SIZE=1000

Running Tests

# Full test suite
cargo test database::tests

# Specific test categories
cargo test database::tests::overflow_pages    # Storage tests
cargo test database::tests::concurrency       # Transaction tests
cargo test database::tests::schema           # Schema validation tests
cargo test database::tests::benchmark        # Performance tests

# Run with logging
RUST_LOG=debug cargo test database::tests::basic_operations

Adding New Features

1. Extending ValueType for New Data Types

// 1. Add to ValueType enum
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum ValueType {
    // ...existing variants...
    Decimal(rust_decimal::Decimal),
    Json(serde_json::Value),
}

// 2. Update FieldType enum
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum FieldType {
    // ...existing variants...
    Decimal,
    Json,
}

// 3. Add validation logic
impl FieldConstraint {
    pub fn validate_value(&self, value: &ValueType) -> Result<(), String> {
        match (&self.field_type, value) {
            // ...existing cases...
            (FieldType::Decimal, ValueType::Decimal(_)) => Ok(()),
            (FieldType::Json, ValueType::Json(_)) => Ok(()),
            _ => Err("Type mismatch".to_string()),
        }
    }
}

// 4. Add display formatting
impl fmt::Display for ValueType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // ...existing cases...
            ValueType::Decimal(d) => write!(f, "{}", d),
            ValueType::Json(j) => write!(f, "{}", j),
        }
    }
}

2. Adding New REPL Commands

// 1. Add command variant
#[derive(Debug, Clone)]
pub enum Command {
    // ...existing commands...
    ExplainQuery { table: String, query: String },
    ShowIndexes { table: Option<String> },
}

// 2. Add parser case
fn parse_commands(&self, input: &str) -> Result<Vec<Command>, String> {
    // ...existing parsing...
    match tokens[0].to_uppercase().as_str() {
        // ...existing cases...
        "EXPLAIN" => self.parse_explain(&tokens[1..])?,
        "SHOW" if tokens.len() > 1 && tokens[1].to_uppercase() == "INDEXES" => {
            self.parse_show_indexes(&tokens[2..])?
        },
        _ => return Err(format!("Unknown command: {}", tokens[0])),
    }
}

// 3. Add executor case
fn execute(&mut self, input: &str, format: Option<OutputFormat>) -> Result<String, String> {
    // ...existing execution...
    match command {
        // ...existing cases...
        Command::ExplainQuery { table, query } => {
            self.explain_query(&table, &query)
        },
        Command::ShowIndexes { table } => {
            self.show_indexes(table.as_deref())
        },
    }
}

3. Implementing New Storage Backends

// 1. Add database type
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub enum DatabaseType {
    // ...existing variants...
    Distributed { nodes: Vec<String> },
    Compressed { algorithm: CompressionType },
}

// 2. Update engine initialization
impl DBEngine {
    pub fn new(db_type: DatabaseType, /* other params */) -> Self {
        let (page_store, blob_store) = match &db_type {
            // ...existing cases...
            DatabaseType::Distributed { nodes } => {
                (Some(DistributedPageStore::new(nodes)?),
                 Some(DistributedBlobStore::new(nodes)?))
            },
            DatabaseType::Compressed { algorithm } => {
                (Some(CompressedPageStore::new(algorithm)?),
                 Some(CompressedBlobStore::new(algorithm)?))
            },
        };

        // ...rest of initialization...
    }
}

Code Style and Standards

// Use descriptive names and comprehensive documentation
/// Commits a transaction, applying all staged changes atomically.
///
/// This method validates the transaction according to its isolation level,
/// acquires necessary locks, applies changes to the in-memory database,
/// logs to WAL (if applicable), and persists to disk.
///
/// # Arguments
/// * `tx` - The transaction to commit
///
/// # Returns
/// * `Ok(())` if the transaction was committed successfully
/// * `Err(String)` describing the failure reason
///
/// # Examples
/// ```rust
/// let mut tx = engine.begin_transaction();
/// engine.insert_record(&mut tx, "users", record)?;
/// engine.commit(tx)?;
/// ```
pub fn commit(&mut self, tx: Transaction) -> Result<(), String> {
    CommitGuard::new(self, tx).commit()
}

// Use proper error handling with context
match self.validate_transaction(&tx) {
    Ok(()) => { /* continue */ }
    Err(e) => return Err(format!("Transaction validation failed: {}", e)),
}

// Prefer explicit types for complex generics
let buffer_pool: Arc<Mutex<BufferPool>> = Arc::new(Mutex::new(
    BufferPool::new(buffer_pool_size, db_type)
));

Testing Guidelines

Unit Tests

#[test]
fn test_transaction_isolation() {
    let mut engine = setup_test_db("isolation_test", IsolationLevel::Serializable);

    // Test specific isolation behavior
    let mut tx1 = engine.begin_transaction();
    let mut tx2 = engine.begin_transaction();

    // Simulate concurrent operations
    engine.insert_record(&mut tx1, "test", record1)?;
    engine.insert_record(&mut tx2, "test", record2)?;

    // Verify isolation guarantees
    assert!(engine.commit(tx1).is_ok());
    assert!(engine.commit(tx2).is_err()); // Should conflict
}

Integration Tests

#[test]
fn test_crash_recovery_scenario() {
    let db_path = "test_crash_recovery";

    // Phase 1: Create initial state
    {
        let mut engine = create_engine(db_path);
        perform_operations(&mut engine);
        // Simulate crash - don't clean shutdown
    }

    // Phase 2: Recovery
    {
        let mut engine = create_engine(db_path); // Should recover automatically
        verify_recovered_state(&engine);
    }
}

Performance Tests

#[test]
fn test_bulk_insert_performance() {
    let mut engine = setup_test_db("perf_test", IsolationLevel::ReadCommitted);

    let start = Instant::now();
    let mut tx = engine.begin_transaction();

    for i in 0..10_000 {
        let record = create_test_record(i, &format!("Record {}", i));
        engine.insert_record(&mut tx, "perf_test", record)?;
    }

    engine.commit(tx)?;
    let duration = start.elapsed();

    println!("Bulk insert of 10k records: {:?}", duration);
    assert!(duration < Duration::from_secs(10)); // Performance threshold
}

FAQ and Troubleshooting

  • Q: Why is my data not persisted?
    • A: Ensure you are not using the InMemory backend. Only OnDisk and Hybrid persist data.
  • Q: How do I recover from a crash?
    • A: On startup, the engine automatically loads from disk and replays the WAL.
  • Q: How do I enable indexes?
    • A: Set the DB_INDEXING environment variable to true before starting the engine.
  • Q: How do I tune performance?
    • A: Adjust DB_BUFFER_POOL_SIZE, WAL rotation threshold, and use the appropriate backend for your workload.

Glossary

  • MVCC: Multi-Version Concurrency Control, enables snapshot isolation.
  • WAL: Write-Ahead Log, ensures durability and crash recovery.
  • TOC: Table of Contents, metadata page mapping tables to page chains.
  • Blob: Large binary or string data stored outside the main page store.

For further details, refer to the source code and inline documentation in each module.


References

  • See the src/database/ directory for implementation details.
  • Consult module-level Rust docs for API usage and extension points.
  • For design discussions and roadmap, refer to the project wiki or issue tracker.

This document is intended to be comprehensive. If you find any missing details or have suggestions for improvement, please update this file or open a new ticket.