Architecture
OxiDex is built on a clean, modular architecture that separates concerns, enables testability, and allows multiple access patterns (CLI, library API, FFI).
Design Philosophy
OxiDex follows Hexagonal Architecture (Ports and Adapters pattern) with three main layers:
- Application Layer - External interfaces (CLI, FFI bindings)
- Domain Layer - Core business logic and format-agnostic metadata models
- Infrastructure Layer - Format-specific parsers, I/O operations, platform-specific code
This architecture ensures:
- Clean separation of concerns - Business logic independent of I/O
- Testability - Core logic testable without filesystem dependencies
- Extensibility - Easy to add new file formats or access patterns
- Multiple interfaces - Same core logic powers CLI, library API, and FFI
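The layering above can be sketched in a few lines of Rust. This is only an illustration of the ports-and-adapters idea, not OxiDex's actual code: the names `ByteSource`, `FileSource`, `InMemorySource`, and `starts_with_jpeg_soi` are hypothetical.

```rust
use std::io;
use std::path::PathBuf;

/// Port: the only view of storage that domain logic is allowed to see.
/// (Hypothetical trait for illustration; not part of the OxiDex API.)
pub trait ByteSource {
    fn read_all(&self) -> io::Result<Vec<u8>>;
}

/// Infrastructure adapter backed by the real filesystem.
pub struct FileSource(pub PathBuf);

impl ByteSource for FileSource {
    fn read_all(&self) -> io::Result<Vec<u8>> {
        std::fs::read(&self.0)
    }
}

/// Test adapter: serves bytes from memory, so no file is needed.
pub struct InMemorySource(pub Vec<u8>);

impl ByteSource for InMemorySource {
    fn read_all(&self) -> io::Result<Vec<u8>> {
        Ok(self.0.clone())
    }
}

/// Domain logic: depends on the port, never on a concrete adapter.
pub fn starts_with_jpeg_soi(source: &dyn ByteSource) -> io::Result<bool> {
    let bytes = source.read_all()?;
    // 0xFF 0xD8 is the JPEG start-of-image marker.
    Ok(bytes.starts_with(&[0xFF, 0xD8]))
}
```

Because the domain function only sees `ByteSource`, a unit test can pass `InMemorySource` while the CLI passes `FileSource`.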
Architecture Diagram
┌─────────────────────────────────────────────────────────────┐
│                      APPLICATION LAYER                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐    │
│   │  CLI Binary  │   │  C FFI API   │   │ Library API  │    │
│   │   (oxidex)   │   │  (oxidex.h)  │   │ (crates.io)  │    │
│   └──────────────┘   └──────────────┘   └──────────────┘    │
│          │                  │                  │            │
└──────────┼──────────────────┼──────────────────┼────────────┘
           │                  │                  │
           ▼                  ▼                  ▼
┌─────────────────────────────────────────────────────────────┐
│                        DOMAIN LAYER                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  Metadata Models (Tag, TagGroup, MetadataMap)       │   │
│   └─────────────────────────────────────────────────────┘   │
│                                                             │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  Operations (Extract, Write, Format Detection)      │   │
│   └─────────────────────────────────────────────────────┘   │
│                                                             │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  Tag Database (32,677 tags from ExifTool source)    │   │
│   └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
           │                  │                  │
           ▼                  ▼                  ▼
┌─────────────────────────────────────────────────────────────┐
│                    INFRASTRUCTURE LAYER                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  Format Parsers (JPEG, TIFF, PNG, MP4, etc.)        │   │
│   │    - Binary parsers (nom combinators)               │   │
│   │    - Segment extraction                             │   │
│   │    - Tag mapping to domain models                   │   │
│   └─────────────────────────────────────────────────────┘   │
│                                                             │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  I/O Abstraction (FileReader, MemoryMap)            │   │
│   │    - Memory-mapped I/O (memmap2)                    │   │
│   │    - Buffered reading                               │   │
│   │    - Atomic writes                                  │   │
│   └─────────────────────────────────────────────────────┘   │
│                                                             │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  Platform Layer (File system, OS attributes)        │   │
│   └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Core Components
1. Application Layer
CLI Binary (src/bin/oxidex.rs)
Command-line interface providing ExifTool-compatible syntax:
# Extract metadata
oxidex photo.jpg
# Write metadata
oxidex -Artist="Jane Doe" photo.jpg
# Batch processing
oxidex -r /path/to/photos/
Features:
- Argument parsing with clap
- Output formatting (text, JSON, CSV)
- Progress reporting
- Error handling and user-friendly messages
Library API (src/lib.rs)
Rust library for embedding metadata operations:
use oxidex::{MetadataReader, MetadataWriter};
// Read metadata
let reader = MetadataReader::from_file("photo.jpg")?;
let metadata = reader.extract_all()?;
// Write metadata
let mut writer = MetadataWriter::from_file("photo.jpg")?;
writer.set_tag("Artist", "Jane Doe")?;
writer.write()?;
Features:
- Type-safe API
- Zero-cost abstractions
- Iterator-based access
- Error handling with Result types
C FFI API (src/ffi/mod.rs)
C-compatible interface for cross-language integration:
#include "oxidex.h"
// Read metadata
OxidexReader* reader = oxidex_reader_new("photo.jpg");
OxidexMetadata* metadata = oxidex_reader_extract(reader);
// Access tags
const char* artist = oxidex_metadata_get(metadata, "Artist");
// Cleanup
oxidex_metadata_free(metadata);
oxidex_reader_free(reader);
Features:
- C ABI compatibility
- Manual memory management
- Error codes and null checks
- Cross-language support (Python, Ruby, JavaScript, etc.)
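On the Rust side, an opaque-handle FFI with null checks and explicit ownership might look roughly like the sketch below. The `demo_*` functions and `DemoMetadata` are hypothetical stand-ins, not the real symbols behind `oxidex.h`; a real export would also be marked `#[no_mangle]`.

```rust
use std::collections::HashMap;
use std::ffi::{c_char, CStr, CString};
use std::ptr;

/// Opaque handle handed to C callers (hypothetical; not the real `OxidexMetadata`).
pub struct DemoMetadata {
    tags: HashMap<String, CString>,
}

/// Allocates a handle holding one sample tag. The caller owns the pointer
/// and must release it with `demo_metadata_free`.
pub extern "C" fn demo_metadata_new() -> *mut DemoMetadata {
    let mut tags = HashMap::new();
    tags.insert("Artist".to_string(), CString::new("Jane Doe").unwrap());
    Box::into_raw(Box::new(DemoMetadata { tags }))
}

/// Returns a borrowed, NUL-terminated value, or null if the handle, the name,
/// or the tag is missing. The pointer stays valid until the handle is freed.
///
/// # Safety
/// `meta` must be null or come from `demo_metadata_new`; `name` must be null
/// or point to a NUL-terminated string.
pub unsafe extern "C" fn demo_metadata_get(
    meta: *const DemoMetadata,
    name: *const c_char,
) -> *const c_char {
    if meta.is_null() || name.is_null() {
        return ptr::null();
    }
    let meta = unsafe { &*meta };
    let Ok(name) = unsafe { CStr::from_ptr(name) }.to_str() else {
        return ptr::null(); // tag names must be valid UTF-8
    };
    meta.tags.get(name).map_or(ptr::null(), |value| value.as_ptr())
}

/// Releases a handle. Passing null is a no-op.
///
/// # Safety
/// `meta` must be null or an unfreed pointer from `demo_metadata_new`.
pub unsafe extern "C" fn demo_metadata_free(meta: *mut DemoMetadata) {
    if !meta.is_null() {
        // Rebuild the Box so Rust drops the allocation it originally made.
        drop(unsafe { Box::from_raw(meta) });
    }
}
```

Returning null instead of panicking keeps failures visible to C callers, and pairing every `*_new` with a `*_free` makes the ownership rule easy to state in a header.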
2. Domain Layer
Metadata Models
Tag - Single metadata element:
pub struct Tag {
    pub name: String,
    pub value: TagValue,
    pub group: TagGroup,
}
pub enum TagValue {
    String(String),
    Integer(i64),
    Float(f64),
    DateTime(DateTime),
    Binary(Vec<u8>),
    Array(Vec<TagValue>),
}
MetadataMap - Collection of tags organized by group:
pub struct MetadataMap {
    tags: HashMap<String, Tag>,
    groups: HashMap<TagGroup, Vec<Tag>>,
}
TagGroup - Logical groupings (EXIF, XMP, IPTC, etc.):
pub enum TagGroup {
    EXIF,
    XMP,
    IPTC,
    GPS,
    MakerNotes,
    FileSystem,
    // ... 140+ format families
}
Operations
Extract - Read metadata from files:
- Format detection via magic bytes
- Parser dispatch based on format
- Tag aggregation across multiple segments/IFDs
- Value normalization and type conversion
Write - Modify metadata in files:
- Atomic file operations (write to temp, then rename)
- In-place updates where possible
- Preserve unmodified metadata
- Maintain file integrity
Format Detection - Identify file types:
- Magic byte matching (first 16 bytes)
- Extension fallback
- MIME type detection
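A minimal sketch of signature-first detection with an extension fallback follows. The `FileFormat` enum and `detect_format` function are hypothetical names for illustration; the byte signatures themselves are the standard ones for each format.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileFormat {
    Jpeg,
    Png,
    Tiff,
    Pdf,
    Unknown,
}

/// Detects a format from the leading bytes, falling back to the extension.
pub fn detect_format(header: &[u8], extension: Option<&str>) -> FileFormat {
    const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

    if header.starts_with(&[0xFF, 0xD8, 0xFF]) {
        return FileFormat::Jpeg;
    }
    if header.starts_with(&PNG_SIGNATURE) {
        return FileFormat::Png;
    }
    // TIFF: byte-order mark ("II" or "MM") followed by the number 42.
    if header.starts_with(b"II\x2A\x00") || header.starts_with(b"MM\x00\x2A") {
        return FileFormat::Tiff;
    }
    if header.starts_with(b"%PDF-") {
        return FileFormat::Pdf;
    }

    // No signature matched: fall back to the case-insensitive extension.
    match extension.map(str::to_ascii_lowercase).as_deref() {
        Some("jpg") | Some("jpeg") => FileFormat::Jpeg,
        Some("png") => FileFormat::Png,
        Some("tif") | Some("tiff") => FileFormat::Tiff,
        Some("pdf") => FileFormat::Pdf,
        _ => FileFormat::Unknown,
    }
}
```

Checking magic bytes first means a mislabeled file (for example a PDF saved as `.jpg`) is still routed to the right parser.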
Tag Database
32,677 metadata tags automatically synchronized with ExifTool source:
pub struct TagDatabase {
    tags: HashMap<u16, TagInfo>, // Tag ID -> Info
    names: HashMap<String, u16>, // Tag name -> ID
}
pub struct TagInfo {
    pub id: u16,
    pub name: &'static str,
    pub group: TagGroup,
    pub writable: bool,
    pub format: TagFormat,
}
Generation:
- Automated extraction from ExifTool Perl source
- Build-time code generation
- Static data for zero runtime overhead
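One way generated static data can avoid runtime setup is a sorted `static` slice searched with binary search. The sketch below assumes that layout; it is not necessarily what the generator emits. `TagInfo` is trimmed to three fields here, the `writable` flags are illustrative, and the IDs are the standard TIFF/EXIF tag numbers.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TagInfo {
    pub id: u16,
    pub name: &'static str,
    pub writable: bool,
}

/// In a real build this table would be emitted by the build script.
/// It must stay sorted by `id` so that `lookup` can binary-search it.
pub static TAGS: &[TagInfo] = &[
    TagInfo { id: 0x010F, name: "Make", writable: true },
    TagInfo { id: 0x0110, name: "Model", writable: true },
    TagInfo { id: 0x0112, name: "Orientation", writable: true },
    TagInfo { id: 0x013B, name: "Artist", writable: true },
    TagInfo { id: 0x8769, name: "ExifOffset", writable: false },
];

/// Looks up a tag by numeric ID without allocating or hashing.
pub fn lookup(id: u16) -> Option<&'static TagInfo> {
    TAGS.binary_search_by_key(&id, |tag| tag.id)
        .ok()
        .map(|index| &TAGS[index])
}
```

The table lives in the binary's read-only data, so there is nothing to parse or populate at startup.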
3. Infrastructure Layer
Format Parsers
Each format has a dedicated parser module:
JPEG Parser (src/parsers/jpeg/):
- Segment extraction (APP0, APP1, APP13, etc.)
- EXIF IFD parsing (IFD0, IFD1, ExifIFD, GPS)
- XMP parsing (XML → tag map)
- IPTC parsing (Photoshop IRB → IIM records)
- JFIF metadata
TIFF Parser (src/parsers/tiff/):
- IFD (Image File Directory) traversal
- Byte order detection (little/big endian)
- Tag value extraction with type conversion
- SubIFD handling (recursive)
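Byte order detection and IFD traversal can be sketched with the standard library alone. The code below reads the 8-byte TIFF header and the entries of the first IFD; it is a simplified illustration (no SubIFDs, no value decoding), and `ByteOrder`, `IfdEntry`, and `parse_first_ifd` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ByteOrder {
    Little,
    Big,
}

#[derive(Debug, PartialEq, Eq)]
pub struct IfdEntry {
    pub tag: u16,
    pub field_type: u16,
    pub count: u32,
    pub value_or_offset: u32,
}

fn read_u16(data: &[u8], pos: usize, order: ByteOrder) -> Option<u16> {
    let s = data.get(pos..pos.checked_add(2)?)?;
    let bytes = [s[0], s[1]];
    Some(match order {
        ByteOrder::Little => u16::from_le_bytes(bytes),
        ByteOrder::Big => u16::from_be_bytes(bytes),
    })
}

fn read_u32(data: &[u8], pos: usize, order: ByteOrder) -> Option<u32> {
    let s = data.get(pos..pos.checked_add(4)?)?;
    let bytes = [s[0], s[1], s[2], s[3]];
    Some(match order {
        ByteOrder::Little => u32::from_le_bytes(bytes),
        ByteOrder::Big => u32::from_be_bytes(bytes),
    })
}

/// Parses the TIFF header and the first IFD. Returns `None` on any
/// malformed or truncated input instead of panicking.
pub fn parse_first_ifd(data: &[u8]) -> Option<(ByteOrder, Vec<IfdEntry>)> {
    let order = match data.get(0..2)? {
        b"II" => ByteOrder::Little,
        b"MM" => ByteOrder::Big,
        _ => return None,
    };
    if read_u16(data, 2, order)? != 42 {
        return None; // wrong magic number
    }
    let ifd = read_u32(data, 4, order)? as usize;
    let count = read_u16(data, ifd, order)? as usize;

    let mut entries = Vec::with_capacity(count);
    for i in 0..count {
        // Each entry is 12 bytes: tag, type, count, value-or-offset.
        let base = ifd.checked_add(2 + i * 12)?;
        entries.push(IfdEntry {
            tag: read_u16(data, base, order)?,
            field_type: read_u16(data, base.checked_add(2)?, order)?,
            count: read_u32(data, base.checked_add(4)?, order)?,
            value_or_offset: read_u32(data, base.checked_add(8)?, order)?,
        });
    }
    Some((order, entries))
}
```

Every read goes through `get`, so an offset pointing outside the file yields `None` rather than an out-of-bounds panic.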
PNG Parser (src/parsers/png/):
- Chunk parsing (tEXt, iTXt, zTXt, etc.)
- CRC validation
- ICC profile extraction
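Chunk parsing with CRC validation can be illustrated as follows. A PNG chunk is a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 computed over type and data. The sketch uses a bit-by-bit CRC for brevity (a table-driven version would be faster); `Chunk` and `read_chunk` are hypothetical names.

```rust
/// CRC-32 as used by PNG (reflected polynomial 0xEDB88320), computed bit by bit.
pub fn crc32(bytes: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in bytes {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

pub struct Chunk<'a> {
    pub kind: [u8; 4],
    pub data: &'a [u8],
}

/// Reads one chunk from the start of `input` and returns it together with
/// the remaining bytes. Returns `None` if the input is truncated or the
/// stored CRC does not match.
pub fn read_chunk(input: &[u8]) -> Option<(Chunk<'_>, &[u8])> {
    let header = input.get(..8)?;
    let len = u32::from_be_bytes([header[0], header[1], header[2], header[3]]) as usize;
    let end = 8usize.checked_add(len)?;
    let data = input.get(8..end)?;
    let crc_end = end.checked_add(4)?;
    let c = input.get(end..crc_end)?;
    let stored = u32::from_be_bytes([c[0], c[1], c[2], c[3]]);

    // The CRC covers the chunk type and data, but not the length field.
    if crc32(&input[4..end]) != stored {
        return None;
    }
    let kind = [header[4], header[5], header[6], header[7]];
    Some((Chunk { kind, data }, &input[crc_end..]))
}
```

The chunk borrows its data from the input slice, so nothing is copied while walking the file.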
MP4/QuickTime Parser (src/parsers/mp4/):
- Atom tree traversal
- ItemList metadata (©nam, ©ART, etc.)
- Timecode handling
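Atom traversal follows the ISO base media box layout: a 32-bit big-endian size, a 4-byte type, and a payload, where size 1 means a 64-bit size follows and size 0 means the atom runs to the end of the file. Below is a minimal sketch of walking the top level only; `Atom` and `top_level_atoms` are hypothetical names, and recursing into container atoms is left out.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct Atom<'a> {
    pub kind: [u8; 4],
    pub payload: &'a [u8],
}

/// Splits `data` into its top-level atoms. Returns `None` if any atom
/// is truncated or declares an impossible size.
pub fn top_level_atoms(mut data: &[u8]) -> Option<Vec<Atom<'_>>> {
    let mut atoms = Vec::new();
    while !data.is_empty() {
        let header = data.get(..8)?;
        let size32 = u32::from_be_bytes([header[0], header[1], header[2], header[3]]);
        let kind = [header[4], header[5], header[6], header[7]];

        let (header_len, total) = match size32 {
            // Size 0: the atom extends to the end of the file.
            0 => (8, data.len()),
            // Size 1: a 64-bit size follows the type field.
            1 => {
                let ext = data.get(8..16)?;
                let mut bytes = [0u8; 8];
                bytes.copy_from_slice(ext);
                let size64 = u64::from_be_bytes(bytes);
                if size64 > usize::MAX as u64 {
                    return None;
                }
                (16, size64 as usize)
            }
            n => (8, n as usize),
        };
        if total < header_len {
            return None; // size smaller than its own header
        }
        let payload = data.get(header_len..total)?;
        atoms.push(Atom { kind, payload });
        data = &data[total..];
    }
    Some(atoms)
}
```

Because `total` is always at least the header length, each iteration consumes input and the loop terminates even on hostile files.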
PDF Parser (src/parsers/pdf/):
- Info dictionary extraction
- XMP metadata stream
- ICC profiles
Binary Parsing:
- Uses nom parser combinator library
- Type-safe parsing with compile-time guarantees
- Zero-copy where possible
- Error recovery for malformed files
I/O Abstraction
FileReader - Abstract file access:
pub trait FileReader {
    fn read(&self, offset: u64, size: usize) -> Result<Vec<u8>>;
    fn size(&self) -> u64;
}
Implementations:
- MemoryMappedReader - Uses memmap2 for large files
- BufferedReader - Standard buffered I/O
- InMemoryReader - For testing with byte slices
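As an illustration of how small a test implementation can be, here is a sketch of an in-memory reader. It repeats the trait so the block is self-contained and assumes `Result` means `std::io::Result`, which the source does not state; the constructor and error message are likewise illustrative.

```rust
use std::io;

pub trait FileReader {
    fn read(&self, offset: u64, size: usize) -> io::Result<Vec<u8>>;
    fn size(&self) -> u64;
}

/// Serves reads from an owned byte buffer; intended for tests.
pub struct InMemoryReader {
    data: Vec<u8>,
}

impl InMemoryReader {
    pub fn new(data: impl Into<Vec<u8>>) -> Self {
        Self { data: data.into() }
    }
}

impl FileReader for InMemoryReader {
    fn read(&self, offset: u64, size: usize) -> io::Result<Vec<u8>> {
        let eof = || io::Error::new(io::ErrorKind::UnexpectedEof, "read past end of buffer");
        if offset > self.data.len() as u64 {
            return Err(eof());
        }
        let start = offset as usize;
        // checked_add guards against offset + size overflowing usize.
        let end = start.checked_add(size).ok_or_else(eof)?;
        self.data
            .get(start..end)
            .map(|bytes| bytes.to_vec())
            .ok_or_else(eof)
    }

    fn size(&self) -> u64 {
        self.data.len() as u64
    }
}
```

A parser written against `&dyn FileReader` can then be exercised with hand-built byte arrays, with no fixture files on disk.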
FileWriter - Safe file modification:
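use std::fs::{self, File};
use std::io::{Result, Write}; // assumes Result is std::io::Result
use std::path::PathBuf;

pub struct FileWriter {
    path: PathBuf,
    temp_path: PathBuf,
}
impl FileWriter {
    pub fn write_atomic(&mut self, data: &[u8]) -> Result<()> {
        // Write to temp file (it should sit on the same filesystem as `path`)
        let mut temp = File::create(&self.temp_path)?;
        temp.write_all(data)?;
        // Sync to disk so the new contents are durable before the swap
        temp.sync_all()?;
        // Atomic rename over the original
        fs::rename(&self.temp_path, &self.path)
    }
}
Platform Layer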
File System Operations:
- File attributes (size, permissions, timestamps)
- Directory traversal
- Recursive scanning
- Symbolic link handling
OS-specific:
- Unix permissions and ownership
- Windows file attributes
- macOS extended attributes
Data Flow
Read Operation
- CLI/API Entry - User requests metadata extraction
- Format Detection - Identify file type via magic bytes
- Parser Selection - Dispatch to format-specific parser
- Binary Parsing - Extract raw metadata from file
- Tag Mapping - Convert binary data to domain Tag objects
- Aggregation - Combine tags from multiple sources (EXIF + XMP + IPTC)
- Return - Deliver MetadataMap to caller
Write Operation
- CLI/API Entry - User requests metadata modification
- Read Current Metadata - Load existing tags
- Merge Changes - Apply user modifications to metadata map
- Format Serialization - Convert tags back to binary format
- Atomic Write - Write to temp file, then atomic rename
- Verification - Re-read to confirm changes
Performance Optimizations
Zero-Cost Abstractions
- Static dispatch - Trait objects avoided in hot paths
- Inline functions - Critical path functions marked #[inline]
- Const generics - Compile-time specialization where applicable
Memory Efficiency
- Memory-mapped I/O - Large files accessed without full buffering
- Zero-copy parsing - Borrow from memory map where possible
- String interning - Common tag names stored as static strings
Parallelization
- Batch processing - Uses Rayon for parallel file processing
- Work stealing - Efficient load balancing across cores
- Lock-free - Metadata operations are read-only or use atomic operations
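To show the shape of read-only parallel batch processing without pulling in a dependency, the sketch below uses `std::thread::scope` with fixed-size chunks. This is a stand-in technique, not Rayon: it has no work stealing, and `process_batch` is a hypothetical helper.

```rust
use std::thread;

/// Applies `work` to every item using up to `workers` scoped threads and
/// returns the results in input order. Items are only borrowed, so no
/// locks are needed.
pub fn process_batch<T, R, F>(items: &[T], workers: usize, work: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    if items.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division so that every item lands in some chunk.
    let chunk_len = (items.len() + workers - 1) / workers;
    let work = &work;

    thread::scope(|scope| {
        let handles: Vec<_> = items
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().map(work).collect::<Vec<R>>()))
            .collect();

        // Joining in spawn order preserves the original item order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker panicked"))
            .collect()
    })
}
```

Static chunking works well when files cost roughly the same to process; a work-stealing scheduler earns its keep when a few files are much larger than the rest.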
Caching
- Tag database - Static data embedded at compile time
- Parser results - Reuse parsed structures within file
- Format detection - Cache magic byte results
Extensibility
Adding a New Format
- Create parser module - src/parsers/new_format/mod.rs
- Implement FileParser trait:
pub trait FileParser {
    fn detect(data: &[u8]) -> bool;
    fn parse(&self, reader: &dyn FileReader) -> Result<MetadataMap>;
}
- Register in format registry - Add to src/formats/mod.rs
- Add tests - Unit tests + integration tests
- Update tag database - If new tags are needed
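The register-and-dispatch step can be pictured with a simplified registry of function pointers. This sketch does not use the `FileParser` trait or the real contents of `src/formats/mod.rs`: `FormatEntry`, `REGISTRY`, `parse_any`, and the string-based `MetadataMap` alias are all hypothetical. GIF serves as the example "new format"; its header stores width and height as little-endian 16-bit values at offsets 6 and 8.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the domain's metadata map.
pub type MetadataMap = HashMap<String, String>;

pub struct FormatEntry {
    pub name: &'static str,
    pub detect: fn(&[u8]) -> bool,
    pub parse: fn(&[u8]) -> Result<MetadataMap, String>,
}

fn detect_gif(data: &[u8]) -> bool {
    data.starts_with(b"GIF87a") || data.starts_with(b"GIF89a")
}

fn parse_gif(data: &[u8]) -> Result<MetadataMap, String> {
    let dims = data.get(6..10).ok_or("truncated GIF header")?;
    let mut map = MetadataMap::new();
    map.insert(
        "ImageWidth".into(),
        u16::from_le_bytes([dims[0], dims[1]]).to_string(),
    );
    map.insert(
        "ImageHeight".into(),
        u16::from_le_bytes([dims[2], dims[3]]).to_string(),
    );
    Ok(map)
}

/// Adding a format means adding one entry here.
pub static REGISTRY: &[FormatEntry] = &[FormatEntry {
    name: "GIF",
    detect: detect_gif,
    parse: parse_gif,
}];

/// Dispatches to the first registered parser whose `detect` accepts the data.
pub fn parse_any(data: &[u8]) -> Result<MetadataMap, String> {
    let entry = REGISTRY
        .iter()
        .find(|entry| (entry.detect)(data))
        .ok_or("unsupported format")?;
    (entry.parse)(data)
}
```

Keeping detection and parsing side by side in one entry means callers never need to know which formats exist.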
Adding New Tags
- Update tag database source - build/tag_database_generator.rs
- Regenerate - Run cargo build to regenerate tags.rs
- Parser support - Add parsing logic in relevant parser
Testing Strategy
Unit Tests
- Individual parser components
- Tag database lookups
- Binary parsing functions
Integration Tests
- Complete file read/write cycles
- ExifTool comparison tests
- Format validation tests
Fuzzing
- Continuous fuzzing with cargo-fuzz
- Format-specific fuzz targets
- Crash reproduction and regression tests
Security Considerations
Memory Safety
- No unsafe code in critical paths
- Bounds checking on all array accesses
- UTF-8 validation for strings
Input Validation
- Magic byte verification
- Size limits on allocations
- CRC checks where available
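A size limit matters most where a file declares its own lengths. The sketch below reads a length-prefixed payload and rejects oversized or truncated declarations before allocating. The 16 MiB cap, the 4-byte big-endian prefix, and the function name are arbitrary choices for illustration.

```rust
/// Upper bound on a single payload; an arbitrary value for this sketch.
pub const MAX_SEGMENT_LEN: usize = 16 * 1024 * 1024;

/// Reads a payload prefixed by a 4-byte big-endian length.
/// Nothing is allocated until the declared length has been validated.
pub fn read_length_prefixed(input: &[u8]) -> Result<Vec<u8>, &'static str> {
    let prefix = input.get(..4).ok_or("missing length prefix")?;
    let declared = u32::from_be_bytes([prefix[0], prefix[1], prefix[2], prefix[3]]) as usize;

    if declared > MAX_SEGMENT_LEN {
        return Err("declared length exceeds limit");
    }
    // The slice lookup doubles as the bounds check against the real input.
    let payload = input.get(4..4 + declared).ok_or("payload truncated")?;
    Ok(payload.to_vec())
}
```

Without the cap, a four-byte field in a malformed file could request a multi-gigabyte buffer.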
Atomic Operations
- File writes are atomic (temp + rename)
- No partial updates visible to other processes
- Backup original on modification
Future Architecture
Planned Enhancements
- Async I/O - Non-blocking file operations for GUI integration
- Plugin System - Loadable parsers for proprietary formats
- Network Streaming - Process files from HTTP/S3 without download
- GPU Acceleration - Parallel processing of large image batches
API Stability
- Public API - Semver guarantees for library users
- Internal APIs - May change between minor versions
- FFI API - C ABI stability for cross-language bindings