
Storage Providers

Clarive supports pluggable storage providers for topic-attached documents and other binary assets. This allows you to store files outside the MongoDB database while maintaining backward compatibility with existing installations.

Overview

By default, Clarive stores all attachments in MongoDB GridFS. With storage providers, you can:

  • Store files in external storage systems (S3, filesystem, etc.)
  • Reduce database size and improve backup performance
  • Implement custom storage strategies for different file types
  • Maintain full backward compatibility with existing files

Default Behavior

All Clarive installations use GridFS as the default storage provider. This maintains 100% backward compatibility with previous versions.

Files are stored in two MongoDB collections:

  • grid.files - File metadata
  • grid.chunks - File content (in 255KB chunks)
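
As a rough illustration of what 255KB chunking means in practice, the sketch below estimates how many grid.chunks documents a file of a given size would occupy. It assumes "255KB" means 255 × 1024 bytes; chunks_for is a hypothetical helper and not part of Clarive.

```perl
use strict;
use warnings;

# Assumption: "255KB" means 255 * 1024 = 261,120 bytes per chunk.
my $CHUNK_SIZE = 255 * 1024;

# Hypothetical helper: number of chunk documents needed for a file of $length bytes.
sub chunks_for {
    my ($length) = @_;
    return 0 if $length <= 0;
    return int( ( $length + $CHUNK_SIZE - 1 ) / $CHUNK_SIZE );    # ceiling division
}

printf "%d bytes -> %d chunk(s)\n", $_, chunks_for($_)
    for 1, $CHUNK_SIZE, $CHUNK_SIZE + 1, 10 * 1024 * 1024;
```

Under that assumption, a 10 MiB attachment needs 41 chunk documents plus one grid.files entry.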

Architecture

The storage provider system consists of three main components:

  1. Baseliner::StorageProvider - Role defining the provider interface
  2. Baseliner::Storage - Factory and manager for providers
  3. Provider Implementations - Concrete storage backends (GridFS, etc.)

Provider Interface

All storage providers must implement the Baseliner::StorageProvider role, which requires these methods:

  • put(filehandle => $fh, ...) - Store a file, returns storage_id
  • get(storage_id => $id) - Retrieve a file object
  • remove(storage_id => $id) - Delete a file
  • info(storage_id => $id) - Get file metadata
  • exists(storage_id => $id) - Check if file exists
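
To make the contract concrete, here is a minimal sketch of an in-memory provider that offers the five methods listed above. It is plain Perl rather than a Moose class consuming Baseliner::StorageProvider, so it runs standalone. The package name My::MemoryProvider, the counter-based storage IDs, and the shape of the info hashref are assumptions for illustration only.

```perl
package My::MemoryProvider;    # hypothetical name, not shipped with Clarive
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

sub new {
    my ($class) = @_;
    return bless { files => {}, next_id => 1 }, $class;
}

# Store the content read from a filehandle; returns the new storage_id.
sub put {
    my ( $self, %args ) = @_;
    my $fh = $args{filehandle} or die "filehandle required";
    my $content = do { local $/; <$fh> };
    $content = '' unless defined $content;
    my $storage_id = 'mem-' . $self->{next_id}++;
    $self->{files}{$storage_id} = {
        content  => $content,
        filename => $args{filename},
        metadata => $args{metadata} || {},
    };
    return $storage_id;
}

# Return the stored record (a real provider would return a file object).
sub get {
    my ( $self, %args ) = @_;
    my $storage_id = $args{storage_id} or die "storage_id required";
    return $self->{files}{$storage_id};
}

sub remove {
    my ( $self, %args ) = @_;
    my $storage_id = $args{storage_id} or die "storage_id required";
    delete $self->{files}{$storage_id};
    return 1;
}

sub info {
    my ( $self, %args ) = @_;
    my $file = $self->get(%args) or return {};
    return {
        length   => length( $file->{content} ),
        md5      => md5_hex( $file->{content} ),
        filename => $file->{filename},
        metadata => $file->{metadata},
    };
}

sub exists {
    my ( $self, %args ) = @_;
    my $storage_id = $args{storage_id} or return 0;
    return defined $self->{files}{$storage_id} ? 1 : 0;
}

package main;

# Usage: store a string through an in-memory filehandle.
open my $fh, '<', \"hello world" or die "Cannot open in-memory handle: $!";
my $provider   = My::MemoryProvider->new;
my $storage_id = $provider->put( filehandle => $fh, filename => 'hello.txt' );
my $info       = $provider->info( storage_id => $storage_id );
print "stored as $storage_id, length $info->{length}\n";
```

A stub like this can be handy in unit tests, since it exercises calling code without touching MongoDB or the filesystem.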

Using Storage Providers

Getting a Provider

use Baseliner::Storage;

# Get the default provider (GridFS)
my $provider = Baseliner::Storage->provider();

# Get a specific provider by name
my $gridfs = Baseliner::Storage->provider('GridFS');

Storing Files

# Store a file
my $storage_id = $provider->put(
    filehandle => $fh,
    filename   => 'document.pdf',
    metadata   => { custom_field => 'value' }
);

Retrieving Files

# Get a file object
my $file = $provider->get( storage_id => $storage_id );

# Read content
my $content = $file->slurp();

# Write to filehandle
$file->print($output_fh);

# Get metadata
my $info = $file->info();  # { length, md5, uploadDate, filename, metadata }

Deleting Files

# Remove a file
$provider->remove( storage_id => $storage_id );

# Check if file exists
if ($provider->exists( storage_id => $storage_id )) {
    # File exists
}

Convenience Methods

You can use the Baseliner::Storage class methods directly:

use Baseliner::Storage;

# Store
my $storage_id = Baseliner::Storage->put( filehandle => $fh );

# Retrieve
my $file = Baseliner::Storage->get( storage_id => $storage_id );

# Check existence
if (Baseliner::Storage->exists( storage_id => $storage_id )) { ... }

# Get metadata
my $info = Baseliner::Storage->info( storage_id => $storage_id );

# Remove
Baseliner::Storage->remove( storage_id => $storage_id );

Implementing Custom Providers

To create a custom storage provider:

  1. Create a new module in lib/Baseliner/StorageProvider/
  2. Implement the Baseliner::StorageProvider role
  3. Implement all required methods

Example: Filesystem Provider

package Baseliner::StorageProvider::Filesystem;
use Moose;
use Baseliner::Utils;
use Path::Class;

with 'Baseliner::StorageProvider';

has 'base_path' => (
    is      => 'ro',
    isa     => 'Str',
    default => '/var/clarive/storage'
);

sub put {
    my ($self, %args) = @_;
    my $fh = $args{filehandle} or die "filehandle required";

    # Generate unique storage ID
    my $storage_id = Util->_uuid();

    # Determine storage path
    my $file_path = file($self->base_path, $storage_id);
    $file_path->dir->mkpath;

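    # Write file in binary mode so the content is copied byte for byte
    my $out = $file_path->openw() or die "Cannot write: $!";
    binmode $_ for $fh, $out;
    while (read $fh, my $buffer, 8192) {
        print $out $buffer;
    }
    close $out or die "Cannot close $file_path: $!";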

    return $storage_id;
}

sub get {
    my ($self, %args) = @_;
    my $storage_id = $args{storage_id} or die "storage_id required";

    my $file_path = file($self->base_path, $storage_id);

    # Return file object
    return Baseliner::StorageProvider::Filesystem::File->new(
        path => $file_path
    );
}

sub remove {
    my ($self, %args) = @_;
    my $storage_id = $args{storage_id} or die "storage_id required";

    my $file_path = file($self->base_path, $storage_id);
    unlink $file_path if -f $file_path;

    return 1;
}

sub info {
    my ($self, %args) = @_;
    my $storage_id = $args{storage_id} or die "storage_id required";

    my $file_path = file($self->base_path, $storage_id);
    return {} unless -f $file_path;

    return {
        length => -s $file_path,
        md5    => Util->_md5($file_path->openr()),
    };
}

sub exists {
    my ($self, %args) = @_;
    my $storage_id = $args{storage_id} or return 0;

    my $file_path = file($self->base_path, $storage_id);
    return -f $file_path ? 1 : 0;
}

no Moose;
__PACKAGE__->meta->make_immutable;
1;

File Object Interface

Custom providers must return file objects from get() that support:

  • slurp() - Return entire file content
  • print($fh) - Write content to filehandle
  • info() - Return metadata hashref
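
The Filesystem example returns a Baseliner::StorageProvider::Filesystem::File object without showing that class. A minimal sketch of such a file object might look like the following. It uses only core modules so it runs standalone, and the package name My::FileObject and its path constructor argument are assumptions, not Clarive's actual implementation.

```perl
package My::FileObject;    # hypothetical stand-in for ...::Filesystem::File
use strict;
use warnings;
use Digest::MD5;

sub new {
    my ( $class, %args ) = @_;
    my $path = $args{path} or die "path required";
    return bless { path => "$path" }, $class;    # stringify Path::Class objects
}

# Open the underlying file for reading in binary mode.
sub _open {
    my ($self) = @_;
    open my $fh, '<', $self->{path} or die "Cannot read $self->{path}: $!";
    binmode $fh;
    return $fh;
}

# Return the entire file content as a single string.
sub slurp {
    my ($self) = @_;
    my $fh = $self->_open;
    local $/;
    my $content = <$fh>;
    return defined $content ? $content : '';
}

# Copy the content to the given filehandle in 8KB blocks.
sub print {
    my ( $self, $out ) = @_;
    my $fh = $self->_open;
    while ( read $fh, my $buffer, 8192 ) {
        print {$out} $buffer;
    }
    return 1;
}

# Return a metadata hashref with the size and MD5 checksum.
sub info {
    my ($self) = @_;
    my $fh = $self->_open;
    return {
        length => -s $self->{path},
        md5    => Digest::MD5->new->addfile($fh)->hexdigest,
    };
}

package main;
use File::Temp qw(tempfile);

# Usage: write a temporary file, then read it back through the file object.
my ( $tmp_fh, $tmp_path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "hello world";
close $tmp_fh or die "Cannot close temp file: $!";

my $file = My::FileObject->new( path => $tmp_path );
$file->print( \*STDOUT );
print "\n";
```

The info() hashref here carries only length and md5; a provider that tracks filenames or custom metadata would need to add those keys itself.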

GridFS Provider Details

The default GridFS provider (Baseliner::StorageProvider::GridFS):

  • Stores files in MongoDB GridFS collections
  • Generates MD5 checksums automatically
  • Supports file metadata and custom fields
  • Maintains full backward compatibility
  • Chunks files at 255KB for efficient storage

GridFS Storage Format

Files are stored with:

  • _id - MongoDB OID (used as storage_id)
  • filename - Original filename
  • length - File size in bytes
  • chunkSize - Chunk size (default 255KB)
  • uploadDate - Upload timestamp
  • md5 - MD5 checksum
  • metadata - Custom metadata hash

Migration and Compatibility

Backward Compatibility

The storage provider abstraction is fully backward compatible:

  • Existing files in GridFS continue to work
  • No migration required
  • Default behavior unchanged
  • All existing code paths supported

Future: Migrating to New Providers

When custom providers are configured (future enhancement):

  1. New uploads will use the configured provider
  2. Existing files remain in GridFS
  3. Files are migrated on access (lazy migration)
  4. Both old and new storage work simultaneously

Testing

Test your custom provider implementation:

use Test::More;
use Baseliner::Storage;

# Register your provider
Baseliner::Storage->set_default_provider('MyProvider');

# Test basic operations
my $provider = Baseliner::Storage->provider();

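# Any readable file works as test input; replace the placeholder path
open my $fh, '<', '/path/to/test-file' or die "Cannot open test file: $!";
binmode $fh;
my $storage_id = $provider->put( filehandle => $fh );
ok $storage_id, 'put returns storage_id';

my $file = $provider->get( storage_id => $storage_id );
ok $file, 'get returns file';

$provider->remove( storage_id => $storage_id );
ok !$provider->exists( storage_id => $storage_id ), 'file removed';

done_testing;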

Microsoft SharePoint Provider

Clarive includes a SharePoint storage provider that stores files in Microsoft SharePoint Online using the Microsoft Graph API.

Configuration

To use SharePoint as a storage provider:

  1. Create a SharePoint Site CI Resource (MSSharePointSite)
  2. Configure the SharePoint connection:
     • Tenant ID: Your Microsoft 365 tenant ID
     • Client ID: Application (client) ID from Azure AD app registration
     • Client Secret: Client secret from Azure AD app registration
     • Site ID: SharePoint site identifier
     • Drive ID: Document library drive ID (default: "root")
     • Active: Enable the storage provider
  3. In the "Attach Files" field configuration, select:
     • Storage Provider: Your SharePoint Site CI
     • Storage Folder: Folder path in SharePoint (e.g., /FromClarive)

Azure AD App Registration

Before using the SharePoint provider, you must register an application in Azure AD:

  1. Go to Azure Portal → Azure Active Directory → App registrations
  2. Create a new registration
  3. Under "Certificates & secrets", create a new client secret
  4. Under "API permissions", add Microsoft Graph → Application permissions:
     • Sites.ReadWrite.All
     • Files.ReadWrite.All
  5. Grant admin consent for the permissions
  6. Copy the Tenant ID, Client ID, and Client Secret

Getting Site and Drive IDs

To find your SharePoint Site ID and Drive ID:

# Get site ID
curl -H "Authorization: Bearer <token>" \
  "https://graph.microsoft.com/v1.0/sites/<tenant>.sharepoint.com:/sites/<site-name>"

# Get drive ID
curl -H "Authorization: Bearer <token>" \
  "https://graph.microsoft.com/v1.0/sites/<site-id>/drives"

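The curl commands above need a bearer token. One way to obtain it is the OAuth2 client-credentials flow against the Microsoft identity platform, using the Tenant ID, Client ID, and Client Secret from the app registration. The following is a sketch only: token_request and fetch_token are hypothetical helper names, the MS_* environment variables are made up for this example, and HTTPS support in HTTP::Tiny additionally requires IO::Socket::SSL to be installed.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP qw(decode_json);

# Build the client-credentials token request for a tenant.
sub token_request {
    my (%args) = @_;
    my $url  = "https://login.microsoftonline.com/$args{tenant_id}/oauth2/v2.0/token";
    my %form = (
        grant_type    => 'client_credentials',
        client_id     => $args{client_id},
        client_secret => $args{client_secret},
        scope         => 'https://graph.microsoft.com/.default',
    );
    return ( $url, \%form );
}

# Perform the request and return the access token string.
sub fetch_token {
    my ( $url, $form ) = token_request(@_);
    my $res = HTTP::Tiny->new->post_form( $url, $form );
    die "Token request failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return decode_json( $res->{content} )->{access_token};
}

# Only contact Microsoft when credentials are supplied via the environment
# (these variable names are assumptions for this example).
if ( $ENV{MS_TENANT_ID} && $ENV{MS_CLIENT_ID} && $ENV{MS_CLIENT_SECRET} ) {
    my $token = fetch_token(
        tenant_id     => $ENV{MS_TENANT_ID},
        client_id     => $ENV{MS_CLIENT_ID},
        client_secret => $ENV{MS_CLIENT_SECRET},
    );
    print "$token\n";
}
```

The printed token can then be substituted for <token> in the curl commands.
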
Features

The SharePoint storage provider:

  • Stores files in SharePoint Online with original filenames
  • Uses SharePoint file IDs for reliable file retrieval
  • Supports configurable storage folders per field
  • Automatically handles OAuth2 authentication
  • Supports chunked uploads for large files (3.2 MB chunks)
  • Replaces existing files on update (conflict behavior: replace)
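
Chunked uploads through Microsoft Graph upload sessions send each piece with a Content-Range header, and Graph expects chunk sizes to be multiples of 320 KiB (327,680 bytes). The sketch below assumes the "3.2 MB" figure corresponds to 10 × 320 KiB = 3,276,800 bytes, which is a guess rather than something stated here. chunk_ranges is a hypothetical helper, not the provider's actual code.

```perl
use strict;
use warnings;

# Assumption: a "3.2 MB" chunk is 10 * 320 KiB = 3,276,800 bytes.
my $CHUNK_SIZE = 10 * 320 * 1024;

# Hypothetical helper: Content-Range header values for a file of $total bytes.
sub chunk_ranges {
    my ($total) = @_;
    my @ranges;
    for ( my $start = 0 ; $start < $total ; $start += $CHUNK_SIZE ) {
        my $end = $start + $CHUNK_SIZE - 1;
        $end = $total - 1 if $end > $total - 1;    # last chunk may be shorter
        push @ranges, "bytes $start-$end/$total";
    }
    return @ranges;
}

print "$_\n" for chunk_ranges(8_000_000);
```

For an 8,000,000-byte file this yields three ranges, the last one shorter than the others.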

File Organization

Files are stored in SharePoint exactly as specified in the field configuration:

  • Storage Folder: Configured at the field level (e.g., /FromClarive)
  • Filename: Original filename is preserved (e.g., document.pdf)
  • Full Path: /FromClarive/document.pdf

The storage provider stores SharePoint's unique file ID in Clarive's database, which is used for all file operations (download, delete, info).

Field-Level Configuration

Each "Attach Files" field can specify its own storage folder:

{
    xtype: 'textfield',
    name: 'storage_folder',
    fieldLabel: 'Storage Folder',
    value: '/FromClarive'  // Files will be stored here
}

This allows using the same SharePoint Site CI for multiple fields, each storing files in different folders.

Variable Substitution in Storage Folders

The storage folder path supports variable substitution using the ${variable} syntax. Variables are replaced with topic data at upload time.

Examples:

// Use topic title
storage_folder: '/Documents/${title}'

// Use topic category
storage_folder: '/Projects/${category}'

// Use related CI names (automatically resolved)
storage_folder: '/Environments/${environment}/Projects/${project}'

Variable Resolution:

  • Topic fields are available directly (e.g., ${title}, ${status})
  • Related CI fields (MIDs) are automatically replaced with CI names
  • Array fields with a single value are converted to scalars
  • Array fields with multiple values remain arrays; only the first value is used in the path

Example:

If a topic has:

  • Title: "Feature Implementation"
  • Project CI: "MyProject" (MID: cla-default-project-123)
  • Environment CI: "Production" (MID: cla-default-environment-456)

And storage_folder is configured as: /Projects/${project}/${environment}

The final path will be: /Projects/MyProject/Production
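
The substitution in this example can be sketched in a few lines of Perl. This is an illustration under stated assumptions, not Clarive's implementation: expand_storage_folder and _lookup are hypothetical names, CI MIDs are assumed to have been resolved to CI names already, and leaving unknown placeholders untouched is a choice made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: look up one variable in the topic data.
sub _lookup {
    my ( $data, $name ) = @_;
    my $value = $data->{$name};
    $value = $value->[0] if ref($value) eq 'ARRAY';    # first value of an array field
    # Assumption: unknown or empty variables are left as written.
    return defined $value ? $value : '${' . $name . '}';
}

# Hypothetical helper: expand ${name} placeholders in a storage folder template.
sub expand_storage_folder {
    my ( $template, $data ) = @_;
    ( my $path = $template ) =~ s/\$\{(\w+)\}/_lookup( $data, $1 )/ge;
    return $path;
}

# Topic data with CI MIDs already replaced by CI names (assumed).
my %topic = (
    title       => 'Feature Implementation',
    project     => ['MyProject'],    # single-value array field
    environment => 'Production',
);

print expand_storage_folder( '/Projects/${project}/${environment}', \%topic ), "\n";
```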

Limitations

  • Files are stored with their original names; versioning is handled by Clarive asset versioning
  • Invalid SharePoint characters (" * : < > ? / \ |) in folder paths and filenames are automatically replaced with underscores
  • Special characters like #, %, &, etc. are allowed and automatically URL-encoded
  • Requires network connectivity to Microsoft Graph API
  • Requires valid Azure AD credentials and permissions
  • Download performance depends on SharePoint/Microsoft Graph API response times
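
The character handling described in the list above might be sketched as follows. The helper names are hypothetical, and the sketch assumes that "/" acts as the separator in a folder path and is only replaced when it appears inside a single name.

```perl
use strict;
use warnings;

# Hypothetical helper: replace characters SharePoint rejects with underscores.
sub sanitize_name {
    my ($name) = @_;
    ( my $clean = $name ) =~ tr{"*:<>?/\\|}{_};
    return $clean;
}

# Hypothetical helper: percent-encode everything except RFC 3986 unreserved characters.
sub encode_segment {
    my ($segment) = @_;
    my $bytes = $segment;
    utf8::encode($bytes);    # encode UTF-8 bytes rather than characters
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
    return $bytes;
}

# Sanitize and encode each segment of a folder path plus filename.
sub sharepoint_path {
    my ($path) = @_;
    my @segments = grep { $_ ne '' } split m{/}, $path;
    return '/' . join '/', map { encode_segment( sanitize_name($_) ) } @segments;
}

print sharepoint_path('/From Clarive/R&D: plan #1?.pdf'), "\n";
```

Here the colon and question mark become underscores, while the space, ampersand, and hash survive in percent-encoded form.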

Creating a Storage Provider Resource

Storage providers are created as CI Resources in Clarive. To create a new SharePoint storage provider:

Note: The mssharepoint feature must be installed for the MSSharePointSite provider type to be available.

  1. Navigate to Resources in the Clarive interface
  2. Click Create New Resource
  3. Select Storage Provider from the resource types
  4. Choose MSSharePointSite as the provider type
  5. Fill in the required fields:
     • Name: A descriptive name for the provider
     • Moniker: Unique identifier
     • Active: Check to enable the provider
     • Tenant ID, Client ID, Client Secret: Azure AD credentials
     • Site ID, Drive ID: SharePoint identifiers
  6. Click Save

Once created, the SharePoint provider will be available for selection in "Attach Files" field configurations throughout Clarive.

See Also