Architecture

OxiDex is built on a clean, modular architecture that separates concerns, enables testability, and allows multiple access patterns (CLI, library API, FFI).

Design Philosophy

OxiDex follows Hexagonal Architecture (Ports and Adapters pattern) with three main layers:

Application Layer - External interfaces (CLI, library API, FFI bindings)
Domain Layer - Core business logic and format-agnostic metadata models
Infrastructure Layer - Format-specific parsers, I/O operations, platform-specific code

This architecture ensures:

Clean separation of concerns - Business logic independent of I/O
Testability - Core logic testable without filesystem dependencies
Extensibility - Easy to add new file formats or access patterns
Multiple interfaces - Same core logic powers CLI, library API, and FFI
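
The separation described above can be illustrated with a minimal ports-and-adapters sketch. The names `ByteSource`, `InMemorySource`, and `summarize` are hypothetical and are not taken from the OxiDex codebase; the point is only that the domain function depends on a trait (the port) and can be exercised with an in-memory adapter instead of the filesystem.

```rust
use std::collections::HashMap;

/// Port: the only view of storage that the domain layer sees.
/// (Hypothetical name; the real trait may differ.)
pub trait ByteSource {
    fn bytes(&self) -> &[u8];
}

/// Adapter used in tests: serves bytes straight from memory.
pub struct InMemorySource(pub Vec<u8>);

impl ByteSource for InMemorySource {
    fn bytes(&self) -> &[u8] {
        &self.0
    }
}

/// Domain logic: depends on the port, never on the filesystem.
pub fn summarize(source: &dyn ByteSource) -> HashMap<String, String> {
    let data = source.bytes();
    let mut summary = HashMap::new();
    summary.insert("FileSize".to_string(), data.len().to_string());
    // JPEG files start with the SOI marker followed by another marker byte.
    let kind = if data.starts_with(&[0xFF, 0xD8, 0xFF]) {
        "JPEG"
    } else {
        "Unknown"
    };
    summary.insert("FileType".to_string(), kind.to_string());
    summary
}
```

A file-backed adapter would implement the same trait, so the CLI, the library API, and the FFI layer could all call `summarize` unchanged.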

Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                    APPLICATION LAYER                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │  CLI Binary  │  │  C FFI API   │  │  Library API │       │
│  │  (oxidex)    │  │  (oxidex.h)  │  │  (crates.io) │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
│         │                  │                  │             │
└─────────┼──────────────────┼──────────────────┼─────────────┘
          │                  │                  │
          ▼                  ▼                  ▼
┌─────────────────────────────────────────────────────────────┐
│                      DOMAIN LAYER                           │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  Metadata Models (Tag, TagGroup, MetadataMap)       │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  Operations (Extract, Write, Format Detection)      │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  Tag Database (32,677 tags from ExifTool source)    │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
└─────────────────────────────────────────────────────────────┘
          │                  │                  │
          ▼                  ▼                  ▼
┌─────────────────────────────────────────────────────────────┐
│                   INFRASTRUCTURE LAYER                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  Format Parsers (JPEG, TIFF, PNG, MP4, etc.)        │    │
│  │  - Binary parsers (nom combinators)                 │    │
│  │  - Segment extraction                               │    │
│  │  - Tag mapping to domain models                     │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  I/O Abstraction (FileReader, MemoryMap)            │    │
│  │  - Memory-mapped I/O (memmap2)                      │    │
│  │  - Buffered reading                                 │    │
│  │  - Atomic writes                                    │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  Platform Layer (File system, OS attributes)        │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Core Components

1. Application Layer

CLI Binary (`src/bin/oxidex.rs`)

Command-line interface providing ExifTool-compatible syntax:

bash

# Extract metadata
oxidex photo.jpg

# Write metadata
oxidex -Artist="Jane Doe" photo.jpg

# Batch processing
oxidex -r /path/to/photos/

Features:

Argument parsing with clap
Output formatting (text, JSON, CSV)
Progress reporting
Error handling and user-friendly messages

Library API (`src/lib.rs`)

Rust library for embedding metadata operations:

rust

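use oxidex::{MetadataReader, MetadataWriter};

// `?` needs an enclosing function that returns `Result`; this assumes the
// crate's error types implement `std::error::Error`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read metadata
    let reader = MetadataReader::from_file("photo.jpg")?;
    let _metadata = reader.extract_all()?;

    // Write metadata
    let mut writer = MetadataWriter::from_file("photo.jpg")?;
    writer.set_tag("Artist", "Jane Doe")?;
    writer.write()?;
    Ok(())
}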

Features:

Type-safe API
Zero-cost abstractions
Iterator-based access
Error handling with Result types

C FFI API (`src/ffi/mod.rs`)

C-compatible interface for cross-language integration:

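c

#include <stdio.h>
#include "oxidex.h"

int main(void) {
    // Read metadata (assumes NULL is returned on failure)
    OxidexReader* reader = oxidex_reader_new("photo.jpg");
    if (reader == NULL) return 1;
    OxidexMetadata* metadata = oxidex_reader_extract(reader);
    if (metadata == NULL) {
        oxidex_reader_free(reader);
        return 1;
    }

    // Access tags (the string is assumed to be owned by `metadata`)
    const char* artist = oxidex_metadata_get(metadata, "Artist");
    if (artist != NULL) printf("Artist: %s\n", artist);

    // Cleanup
    oxidex_metadata_free(metadata);
    oxidex_reader_free(reader);
    return 0;
}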

Features:

C ABI compatibility
Manual memory management
Error codes and null checks
Cross-language support (Python, Ruby, JavaScript, etc.)
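
On the Rust side, the manual memory management and null checks listed above might look roughly like the following sketch. The function names mirror the C example, but the struct layout, the `metadata_into_raw` helper, and the ownership rules are assumptions made for illustration, not the actual OxiDex implementation. A real export would also mark the functions `#[no_mangle]` (spelled `#[unsafe(no_mangle)]` in the 2024 edition); the attribute is omitted here so the sketch compiles on any edition.

```rust
use std::collections::HashMap;
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;

/// Opaque handle returned to C. The field is an assumption for this sketch.
pub struct OxidexMetadata {
    tags: HashMap<String, CString>,
}

/// Rust-side helper (hypothetical) that builds a heap-allocated handle.
pub fn metadata_into_raw(pairs: &[(&str, &str)]) -> *mut OxidexMetadata {
    let tags = pairs
        .iter()
        .filter_map(|(k, v)| CString::new(*v).ok().map(|c| (k.to_string(), c)))
        .collect();
    Box::into_raw(Box::new(OxidexMetadata { tags }))
}

/// Looks up a tag. Returns NULL when the handle, the name, or the tag is
/// missing. The returned pointer borrows from the handle and stays valid
/// until the handle is freed.
pub unsafe extern "C" fn oxidex_metadata_get(
    metadata: *const OxidexMetadata,
    name: *const c_char,
) -> *const c_char {
    if metadata.is_null() || name.is_null() {
        return ptr::null();
    }
    // SAFETY: both pointers were checked for NULL; the caller guarantees
    // that they point to a live handle and a NUL-terminated string.
    let (metadata, name) = unsafe { (&*metadata, CStr::from_ptr(name)) };
    match name.to_str().ok().and_then(|n| metadata.tags.get(n)) {
        Some(value) => value.as_ptr(),
        None => ptr::null(),
    }
}

/// Releases a handle created by `metadata_into_raw`. NULL is ignored.
pub unsafe extern "C" fn oxidex_metadata_free(metadata: *mut OxidexMetadata) {
    if !metadata.is_null() {
        // SAFETY: the pointer came from `Box::into_raw` and is freed once.
        drop(unsafe { Box::from_raw(metadata) });
    }
}
```

Handing out a raw pointer from `Box::into_raw` and reclaiming it with `Box::from_raw` keeps allocation and deallocation inside Rust, which is why the C caller must use the matching `*_free` function rather than `free()`.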

2. Domain Layer

Metadata Models

Tag - Single metadata element:

rust

pub struct Tag {
    pub name: String,
    pub value: TagValue,
    pub group: TagGroup,
}

pub enum TagValue {
    String(String),
    Integer(i64),
    Float(f64),
    DateTime(DateTime),
    Binary(Vec<u8>),
    Array(Vec<TagValue>),
}

MetadataMap - Collection of tags organized by group:

rust

use std::collections::HashMap;

pub struct MetadataMap {
    tags: HashMap<String, Tag>,           // lookup by tag name
    groups: HashMap<TagGroup, Vec<Tag>>,  // tags bucketed by group
}

TagGroup - Logical groupings (EXIF, XMP, IPTC, etc.):

rust

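// Hash and Eq are required because TagGroup is used as a HashMap key
// in MetadataMap.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum TagGroup {
    EXIF,
    XMP,
    IPTC,
    GPS,
    MakerNotes,
    FileSystem,
    // ... 140+ format families
}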
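
To show how the three models fit together, here is a small sketch of a `MetadataMap` that keeps both indexes in step. The types are trimmed-down copies of the ones above, and the `insert`, `get`, and `in_group` methods are assumed for illustration; the real API may expose different names.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum TagGroup {
    EXIF,
    XMP,
    IPTC,
}

#[derive(Debug, Clone, PartialEq)]
pub enum TagValue {
    String(String),
    Integer(i64),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Tag {
    pub name: String,
    pub value: TagValue,
    pub group: TagGroup,
}

#[derive(Default)]
pub struct MetadataMap {
    tags: HashMap<String, Tag>,
    groups: HashMap<TagGroup, Vec<Tag>>,
}

impl MetadataMap {
    /// Adds a tag to both indexes. A later tag with the same name replaces
    /// the earlier one in the by-name index; the group index keeps both.
    pub fn insert(&mut self, tag: Tag) {
        self.groups.entry(tag.group).or_default().push(tag.clone());
        self.tags.insert(tag.name.clone(), tag);
    }

    /// Looks up a tag by name.
    pub fn get(&self, name: &str) -> Option<&Tag> {
        self.tags.get(name)
    }

    /// Returns every tag recorded for a group, or an empty slice.
    pub fn in_group(&self, group: TagGroup) -> &[Tag] {
        self.groups.get(&group).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

Storing each tag twice trades memory for constant-time lookups by name and by group; a real implementation could store indexes into a single vector instead.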

Operations

Extract - Read metadata from files:

Format detection via magic bytes
Parser dispatch based on format
Tag aggregation across multiple segments/IFDs
Value normalization and type conversion

Write - Modify metadata in files:

Atomic file operations (write to temp, then rename)
In-place updates where possible
Preserve unmodified metadata
Maintain file integrity

Format Detection - Identify file types:

Magic byte matching (first 16 bytes)
Extension fallback
MIME type detection
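
Magic byte matching with an extension fallback can be sketched as follows. The `FileFormat` enum and the function names are hypothetical; the signatures themselves (JPEG `FF D8 FF`, the 8-byte PNG signature, TIFF `II*\0` / `MM\0*`, and `%PDF-`) are the standard ones for those formats.

```rust
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileFormat {
    Jpeg,
    Png,
    Tiff,
    Pdf,
}

/// Matches well-known signatures against the start of the file.
pub fn detect_by_magic(header: &[u8]) -> Option<FileFormat> {
    if header.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Some(FileFormat::Jpeg)
    } else if header.starts_with(b"\x89PNG\r\n\x1a\n") {
        Some(FileFormat::Png)
    } else if header.starts_with(b"II*\0") || header.starts_with(b"MM\0*") {
        Some(FileFormat::Tiff)
    } else if header.starts_with(b"%PDF-") {
        Some(FileFormat::Pdf)
    } else {
        None
    }
}

/// Extension fallback for files whose signature is not recognized.
pub fn detect_by_extension(path: &str) -> Option<FileFormat> {
    let ext = Path::new(path).extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "jpg" | "jpeg" => Some(FileFormat::Jpeg),
        "png" => Some(FileFormat::Png),
        "tif" | "tiff" => Some(FileFormat::Tiff),
        "pdf" => Some(FileFormat::Pdf),
        _ => None,
    }
}

/// Signature first, extension second.
pub fn detect(path: &str, header: &[u8]) -> Option<FileFormat> {
    detect_by_magic(header).or_else(|| detect_by_extension(path))
}
```

Checking the signature before the extension means a mislabeled file (for example a PNG saved as `.jpg`) is still routed to the right parser.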

Tag Database

The tag database holds 32,677 metadata tags, generated automatically from the ExifTool source:

rust

use std::collections::HashMap;

pub struct TagDatabase {
    tags: HashMap<u16, TagInfo>,  // Tag ID -> Info
    names: HashMap<String, u16>,  // Tag name -> ID
}

pub struct TagInfo {
    pub id: u16,
    pub name: &'static str,
    pub group: TagGroup,
    pub writable: bool,
    pub format: TagFormat,
}

Generation:

Automated extraction from ExifTool Perl source
Build-time code generation
Static data for zero runtime overhead
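
One way to get static data with no runtime construction cost is for the generator to emit a sorted `static` table. The sketch below is a hand-written stand-in for such generated output and uses binary search instead of the `HashMap` shown above; the table layout and the lookup function names are assumptions. The four EXIF tag IDs are the standard ones.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TagGroup {
    EXIF,
    GPS,
}

#[derive(Debug, PartialEq)]
pub struct TagInfo {
    pub id: u16,
    pub name: &'static str,
    pub group: TagGroup,
    pub writable: bool,
}

/// Stand-in for the generated table, sorted by `id` so that lookups can
/// binary-search without building any structure at startup.
pub static TAGS: &[TagInfo] = &[
    TagInfo { id: 0x010F, name: "Make", group: TagGroup::EXIF, writable: true },
    TagInfo { id: 0x0110, name: "Model", group: TagGroup::EXIF, writable: true },
    TagInfo { id: 0x013B, name: "Artist", group: TagGroup::EXIF, writable: true },
    TagInfo { id: 0x8827, name: "ISO", group: TagGroup::EXIF, writable: true },
];

/// O(log n) lookup by numeric tag ID.
pub fn lookup_by_id(id: u16) -> Option<&'static TagInfo> {
    TAGS.binary_search_by_key(&id, |info| info.id)
        .ok()
        .map(|index| &TAGS[index])
}

/// Linear lookup by name; a generated name-sorted index would avoid the scan.
pub fn lookup_by_name(name: &str) -> Option<&'static TagInfo> {
    TAGS.iter().find(|info| info.name == name)
}
```

Because the table lives in the binary's read-only data, it can be shared across threads without synchronization.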

3. Infrastructure Layer

Format Parsers

Each format has a dedicated parser module:

JPEG Parser (src/parsers/jpeg/):

Segment extraction (APP0, APP1, APP13, etc.)
EXIF IFD parsing (IFD0, IFD1, ExifIFD, GPS)
XMP parsing (XML → tag map)
IPTC parsing (Photoshop IRB → IIM records)
JFIF metadata
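
Segment extraction can be sketched with plain slice arithmetic. This is an illustrative walk over the marker segments, not the OxiDex parser (which is described as using nom); `Segment` and `jpeg_segments` are hypothetical names, and the sketch ignores fill bytes and standalone markers for brevity.

```rust
/// One marker segment: the marker byte (for example 0xE1 for APP1) and
/// its payload, borrowed from the input without copying.
#[derive(Debug, PartialEq)]
pub struct Segment<'a> {
    pub marker: u8,
    pub payload: &'a [u8],
}

/// Walks marker segments from SOI until SOS or EOI, or until the data
/// runs out. Returns an empty list when the input is not a JPEG.
pub fn jpeg_segments<'a>(data: &'a [u8]) -> Vec<Segment<'a>> {
    let mut segments = Vec::new();
    if !data.starts_with(&[0xFF, 0xD8]) {
        return segments;
    }
    let mut pos = 2;
    while pos + 4 <= data.len() && data[pos] == 0xFF {
        let marker = data[pos + 1];
        // SOS (0xDA) starts entropy-coded data; EOI (0xD9) ends the image.
        if marker == 0xDA || marker == 0xD9 {
            break;
        }
        // The big-endian length counts its own two bytes.
        let len = u16::from_be_bytes([data[pos + 2], data[pos + 3]]) as usize;
        let end = pos + 2 + len;
        if len < 2 || end > data.len() {
            break; // malformed or truncated segment
        }
        segments.push(Segment { marker, payload: &data[pos + 4..end] });
        pos = end;
    }
    segments
}
```

A caller would then pick out APP1 (`0xE1`) payloads beginning with `Exif\0\0` for EXIF data, and APP13 (`0xED`) payloads for Photoshop IRB / IPTC data.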

TIFF Parser (src/parsers/tiff/):

IFD (Image File Directory) traversal
Byte order detection (little/big endian)
Tag value extraction with type conversion
SubIFD handling (recursive)
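
Byte order detection and IFD traversal follow directly from the TIFF header layout: two order bytes, the number 42, and the offset of the first IFD, whose entries are 12 bytes each. The sketch below reads only the first IFD and leaves values unresolved; `ByteOrder`, `IfdEntry`, and `parse_first_ifd` are names chosen for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ByteOrder {
    Little,
    Big,
}

impl ByteOrder {
    fn read_u16(self, b: &[u8]) -> u16 {
        let raw = [b[0], b[1]];
        match self {
            ByteOrder::Little => u16::from_le_bytes(raw),
            ByteOrder::Big => u16::from_be_bytes(raw),
        }
    }

    fn read_u32(self, b: &[u8]) -> u32 {
        let raw = [b[0], b[1], b[2], b[3]];
        match self {
            ByteOrder::Little => u32::from_le_bytes(raw),
            ByteOrder::Big => u32::from_be_bytes(raw),
        }
    }
}

/// Raw 12-byte IFD entry: tag ID, field type, value count, and the
/// value-or-offset field (left unresolved in this sketch).
#[derive(Debug, PartialEq)]
pub struct IfdEntry {
    pub tag: u16,
    pub field_type: u16,
    pub count: u32,
    pub value_or_offset: u32,
}

/// Parses the TIFF header and the entries of the first IFD.
/// Returns `None` on a bad header or truncated data.
pub fn parse_first_ifd(data: &[u8]) -> Option<(ByteOrder, Vec<IfdEntry>)> {
    let order = match data.get(..2)? {
        b"II" => ByteOrder::Little,
        b"MM" => ByteOrder::Big,
        _ => return None,
    };
    if order.read_u16(data.get(2..4)?) != 42 {
        return None;
    }
    let ifd = order.read_u32(data.get(4..8)?) as usize;
    let count = order.read_u16(data.get(ifd..ifd.checked_add(2)?)?) as usize;
    let mut entries = Vec::with_capacity(count);
    for i in 0..count {
        let start = ifd + 2 + i * 12;
        let e = data.get(start..start + 12)?;
        entries.push(IfdEntry {
            tag: order.read_u16(&e[0..2]),
            field_type: order.read_u16(&e[2..4]),
            count: order.read_u32(&e[4..8]),
            value_or_offset: order.read_u32(&e[8..12]),
        });
    }
    Some((order, entries))
}
```

SubIFDs such as ExifIFD and GPS are reached by treating the `value_or_offset` of their pointer tags as the offset of another IFD and repeating the same entry loop.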

PNG Parser (src/parsers/png/):

Chunk parsing (tEXt, iTXt, zTXt, etc.)
CRC validation
ICC profile extraction
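
Chunk parsing and CRC validation can be sketched without dependencies. Each PNG chunk is a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 over the type and data. The `Chunk` struct and `png_chunks` function are illustrative names, and the bitwise CRC is chosen for brevity rather than speed (a table-driven version or a crate would normally be used).

```rust
/// Bitwise CRC-32 (reflected polynomial 0xEDB88320), as used by PNG.
pub fn crc32(bytes: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in bytes {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

#[derive(Debug, PartialEq)]
pub struct Chunk<'a> {
    pub kind: [u8; 4],
    pub data: &'a [u8],
    pub crc_ok: bool,
}

/// Splits a PNG into chunks and checks each CRC. Returns `None` when the
/// signature is missing; stops early at a truncated chunk.
pub fn png_chunks<'a>(png: &'a [u8]) -> Option<Vec<Chunk<'a>>> {
    const SIGNATURE: &[u8] = b"\x89PNG\r\n\x1a\n";
    if !png.starts_with(SIGNATURE) {
        return None;
    }
    let mut rest = &png[SIGNATURE.len()..];
    let mut chunks = Vec::new();
    while rest.len() >= 12 {
        let len = u32::from_be_bytes([rest[0], rest[1], rest[2], rest[3]]) as usize;
        let end = match len.checked_add(12) {
            Some(end) if end <= rest.len() => end,
            _ => break, // declared length runs past the available data
        };
        let kind = [rest[4], rest[5], rest[6], rest[7]];
        let crc_at = 8 + len;
        let stored = u32::from_be_bytes([
            rest[crc_at],
            rest[crc_at + 1],
            rest[crc_at + 2],
            rest[crc_at + 3],
        ]);
        // The CRC covers the chunk type and data, not the length field.
        let crc_ok = crc32(&rest[4..crc_at]) == stored;
        chunks.push(Chunk { kind, data: &rest[8..crc_at], crc_ok });
        rest = &rest[end..];
    }
    Some(chunks)
}
```

Reporting `crc_ok` per chunk, rather than failing the whole file, leaves the decision about damaged chunks to the caller.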

MP4/QuickTime Parser (src/parsers/mp4/):

Atom tree traversal
ItemList metadata (©nam, ©ART, etc.)
Timecode handling
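
Atom tree traversal rests on one rule: every atom starts with a 4-byte big-endian size and a 4-byte type, where a size of 1 means a 64-bit size follows and a size of 0 means the atom runs to the end of the data. The sketch below lists sibling atoms and follows a path through containers; `Atom`, `atoms`, and `find` are hypothetical names, and the version/flags header that precedes the children of a `meta` atom is not handled.

```rust
#[derive(Debug, PartialEq)]
pub struct Atom<'a> {
    pub kind: [u8; 4],
    pub body: &'a [u8],
}

/// Lists the sibling atoms found in `data`. Stops at the first atom whose
/// declared size does not fit.
pub fn atoms<'a>(mut data: &'a [u8]) -> Vec<Atom<'a>> {
    let mut out = Vec::new();
    while data.len() >= 8 {
        let size32 = u32::from_be_bytes([data[0], data[1], data[2], data[3]]) as u64;
        let kind = [data[4], data[5], data[6], data[7]];
        let (header, size) = match size32 {
            // Size 0: the atom extends to the end of the enclosing data.
            0 => (8, data.len() as u64),
            // Size 1: a 64-bit size follows the type field.
            1 if data.len() >= 16 => {
                let mut wide = [0u8; 8];
                wide.copy_from_slice(&data[8..16]);
                (16, u64::from_be_bytes(wide))
            }
            1 => break,
            n => (8, n),
        };
        if size < header as u64 || size > data.len() as u64 {
            break;
        }
        let size = size as usize;
        out.push(Atom { kind, body: &data[header..size] });
        data = &data[size..];
    }
    out
}

/// Follows a path such as moov/udta through nested container atoms and
/// returns the body of the last atom on the path.
pub fn find<'a>(data: &'a [u8], path: &[&[u8; 4]]) -> Option<&'a [u8]> {
    let mut current = data;
    for kind in path {
        current = atoms(current).into_iter().find(|a| &a.kind == *kind)?.body;
    }
    Some(current)
}
```

Because every body is a borrowed sub-slice, descending the tree allocates only the small per-level vectors and never copies payload bytes.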

PDF Parser (src/parsers/pdf/):

Info dictionary extraction
XMP metadata stream
ICC profiles

Binary Parsing:

Uses nom parser combinator library
Type-safe parsing with compile-time guarantees
Zero-copy where possible
Error recovery for malformed files

I/O Abstraction

FileReader - Abstract file access:

rust

pub trait FileReader {
    fn read(&self, offset: u64, size: usize) -> Result<Vec<u8>>;
    fn size(&self) -> u64;
}

Implementations:

MemoryMappedReader - Uses memmap2 for large files
BufferedReader - Standard buffered I/O
InMemoryReader - For testing with byte slices
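
As an illustration of how small a test implementation can be, here is a sketch of an in-memory reader. The trait is copied from above; using `std::io::Result` as the `Result` alias and `UnexpectedEof` for out-of-range reads are assumptions, since the crate may define its own error type.

```rust
use std::io::{Error, ErrorKind, Result};

pub trait FileReader {
    fn read(&self, offset: u64, size: usize) -> Result<Vec<u8>>;
    fn size(&self) -> u64;
}

/// Serves reads from an owned byte buffer; intended for tests.
pub struct InMemoryReader {
    data: Vec<u8>,
}

impl InMemoryReader {
    pub fn new(data: Vec<u8>) -> Self {
        InMemoryReader { data }
    }
}

impl FileReader for InMemoryReader {
    fn read(&self, offset: u64, size: usize) -> Result<Vec<u8>> {
        let len = self.data.len() as u64;
        // Reject ranges that overflow or run past the end of the buffer.
        let end = offset
            .checked_add(size as u64)
            .filter(|&end| end <= len)
            .ok_or_else(|| Error::new(ErrorKind::UnexpectedEof, "read past end of data"))?;
        Ok(self.data[offset as usize..end as usize].to_vec())
    }

    fn size(&self) -> u64 {
        self.data.len() as u64
    }
}
```

A parser written against `&dyn FileReader` can then be tested with hand-built byte arrays, including deliberately truncated ones.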

FileWriter - Safe file modification:

rust

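// Assumes `Result` is `std::io::Result`; the crate may define its own alias.
use std::fs::{self, File};
use std::io::{Result, Write};
use std::path::PathBuf;

pub struct FileWriter {
    path: PathBuf,
    temp_path: PathBuf,
}

impl FileWriter {
    pub fn write_atomic(&mut self, data: &[u8]) -> Result<()> {
        // Write to temp file (must be on the same filesystem as `path`)
        let mut temp = File::create(&self.temp_path)?;
        temp.write_all(data)?;
        // Sync to disk so the rename never exposes unflushed data
        temp.sync_all()?;
        // Atomic rename over the original
        fs::rename(&self.temp_path, &self.path)
    }
}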

Platform Layer

File System Operations:

File attributes (size, permissions, timestamps)
Directory traversal
Recursive scanning
Symbolic link handling

OS-specific:

Unix permissions and ownership
Windows file attributes
macOS extended attributes

Data Flow

Read Operation

CLI/API Entry - User requests metadata extraction
Format Detection - Identify file type via magic bytes
Parser Selection - Dispatch to format-specific parser
Binary Parsing - Extract raw metadata from file
Tag Mapping - Convert binary data to domain Tag objects
Aggregation - Combine tags from multiple sources (EXIF + XMP + IPTC)
Return - Deliver MetadataMap to caller

Write Operation

CLI/API Entry - User requests metadata modification
Read Current Metadata - Load existing tags
Merge Changes - Apply user modifications to metadata map
Format Serialization - Convert tags back to binary format
Atomic Write - Write to temp file, then atomic rename
Verification - Re-read to confirm changes

Performance Optimizations

Zero-Cost Abstractions

Static dispatch - Trait objects avoided in hot paths
Inline functions - Critical path functions marked #[inline]
Const generics - Compile-time specialization where applicable

Memory Efficiency

Memory-mapped I/O - Large files accessed without full buffering
Zero-copy parsing - Borrow from memory map where possible
String interning - Common tag names stored as static strings

Parallelization

Batch processing - Uses Rayon for parallel file processing
Work stealing - Efficient load balancing across cores
Lock-free - Metadata operations are read-only or use atomic operations
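
The batch-processing idea can be sketched with the standard library alone. Note the substitution: this uses `std::thread::scope` with static chunking, not Rayon and not work stealing, so it only illustrates splitting read-only work across cores. `process_batch` and its parameters are hypothetical.

```rust
use std::thread;

/// Applies `process` to every path using up to `workers` threads.
/// Results are returned in input order.
pub fn process_batch<T, F>(paths: &[String], workers: usize, process: F) -> Vec<T>
where
    T: Send,
    F: Fn(&str) -> T + Sync,
{
    let workers = workers.max(1);
    // Ceiling division; at least 1 so `chunks` never receives zero.
    let chunk_size = ((paths.len() + workers - 1) / workers).max(1);
    let process = &process;
    thread::scope(|scope| {
        // Spawn one scoped thread per chunk; each borrows its slice of paths.
        let handles: Vec<_> = paths
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|path| process(path.as_str()))
                        .collect::<Vec<T>>()
                })
            })
            .collect();
        // Join in spawn order so the output order matches the input order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker panicked"))
            .collect()
    })
}
```

With Rayon the same operation is usually a parallel iterator over the paths, and its scheduler rebalances work when some files take much longer than others, which static chunks cannot do.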

Caching

Tag database - Static data embedded at compile time
Parser results - Reuse parsed structures within file
Format detection - Cache magic byte results

Extensibility

Adding a New Format

Create parser module - src/parsers/new_format/mod.rs

Implement FileParser trait:

rust

pub trait FileParser {
    fn detect(data: &[u8]) -> bool;
    fn parse(&self, reader: &dyn FileReader) -> Result<MetadataMap>;
}

Register in format registry - Add to src/formats/mod.rs
Add tests - Unit tests + integration tests
Update tag database - If new tags are needed
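
The steps above can be sketched end to end for an invented toy format. Everything specific here is hypothetical: the `KVT1` format, the `KvtParser` and `SliceReader` types, and the use of a `HashMap<String, String>` as a stand-in for `MetadataMap`. The two traits are copied from this page, with `std::io::Result` assumed as the `Result` alias.

```rust
use std::collections::HashMap;
use std::io::Result;

/// Stand-in for the domain type, to keep the sketch self-contained.
pub type MetadataMap = HashMap<String, String>;

pub trait FileReader {
    fn read(&self, offset: u64, size: usize) -> Result<Vec<u8>>;
    fn size(&self) -> u64;
}

pub trait FileParser {
    fn detect(data: &[u8]) -> bool;
    fn parse(&self, reader: &dyn FileReader) -> Result<MetadataMap>;
}

/// Hypothetical toy format: magic `KVT1`, then UTF-8 `key=value` lines.
pub struct KvtParser;

impl FileParser for KvtParser {
    fn detect(data: &[u8]) -> bool {
        data.starts_with(b"KVT1")
    }

    fn parse(&self, reader: &dyn FileReader) -> Result<MetadataMap> {
        let bytes = reader.read(0, reader.size() as usize)?;
        // Skip the 4-byte magic; tolerate invalid UTF-8 in the body.
        let body = String::from_utf8_lossy(bytes.get(4..).unwrap_or(&[]));
        Ok(body
            .lines()
            .filter_map(|line| line.split_once('='))
            .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
            .collect())
    }
}

/// In-memory reader so the parser can be unit-tested without touching disk.
pub struct SliceReader<'a>(pub &'a [u8]);

impl<'a> FileReader for SliceReader<'a> {
    fn read(&self, offset: u64, size: usize) -> Result<Vec<u8>> {
        // Clamps the range to the available bytes instead of failing.
        let start = (offset as usize).min(self.0.len());
        let end = start.saturating_add(size).min(self.0.len());
        Ok(self.0[start..end].to_vec())
    }

    fn size(&self) -> u64 {
        self.0.len() as u64
    }
}
```

Registering the parser would then amount to adding its `detect` function and a `KvtParser` instance to whatever dispatch structure `src/formats/mod.rs` uses, which is not shown on this page.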

Adding New Tags

Update tag database source - build/tag_database_generator.rs
Regenerate - Run cargo build to regenerate tags.rs
Parser support - Add parsing logic in relevant parser

Testing Strategy

Unit Tests

Individual parser components
Tag database lookups
Binary parsing functions

Integration Tests

Complete file read/write cycles
ExifTool comparison tests
Format validation tests

Fuzzing

Continuous fuzzing with cargo-fuzz
Format-specific fuzz targets
Crash reproduction and regression tests

Security Considerations

Memory Safety

No unsafe code in critical paths
Bounds checking on all array accesses
UTF-8 validation for strings

Input Validation

Magic byte verification
Size limits on allocations
CRC checks where available
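
A size limit on allocations typically means checking a length field from the file before trusting it. The following sketch shows one such check; the 64 MiB cap, the `ReadError` type, and the `read_declared` function are illustrative assumptions rather than OxiDex's actual limits or API.

```rust
/// Upper bound on any single metadata allocation (an illustrative value).
pub const MAX_ALLOC: usize = 64 * 1024 * 1024;

#[derive(Debug, PartialEq)]
pub enum ReadError {
    /// The file declares a payload larger than the configured limit.
    TooLarge { requested: u64 },
    /// The declared range does not fit inside the available data.
    OutOfBounds,
}

/// Copies `declared_len` bytes starting at `offset`, but only after the
/// length has been checked against the limit and the real data size.
pub fn read_declared(
    data: &[u8],
    offset: usize,
    declared_len: u64,
) -> Result<Vec<u8>, ReadError> {
    if declared_len > MAX_ALLOC as u64 {
        return Err(ReadError::TooLarge { requested: declared_len });
    }
    let len = declared_len as usize;
    // `checked_add` guards against offset + len wrapping around.
    let end = offset.checked_add(len).ok_or(ReadError::OutOfBounds)?;
    data.get(offset..end)
        .map(|slice| slice.to_vec())
        .ok_or(ReadError::OutOfBounds)
}
```

Validating before allocating means a corrupt 4 GiB length field in a 2 KiB file yields an error instead of an out-of-memory abort.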

Atomic Operations

File writes are atomic (temp + rename)
No partial updates visible to other processes
Original file backed up on modification

Future Architecture

Planned Enhancements

Async I/O - Non-blocking file operations for GUI integration
Plugin System - Loadable parsers for proprietary formats
Network Streaming - Process files from HTTP/S3 without download
GPU Acceleration - Parallel processing of large image batches

API Stability

Public API - Semver guarantees for library users
Internal APIs - May change between minor versions
FFI API - C ABI stability for cross-language bindings
