Storage Providers
Clarive supports pluggable storage providers for topic-attached documents and other binary assets. This allows you to store files outside the MongoDB database while maintaining backward compatibility with existing installations.
Overview
By default, Clarive stores all attachments in MongoDB GridFS. With storage providers, you can:
- Store files in external storage systems (S3, filesystem, etc.)
- Reduce database size and improve backup performance
- Implement custom storage strategies for different file types
- Maintain full backward compatibility with existing files
Default Behavior
All Clarive installations use GridFS as the default storage provider. This maintains 100% backward compatibility with previous versions.
Files are stored in the MongoDB collections:
- grid.files - File metadata
- grid.chunks - File content (in 255KB chunks)
Architecture
The storage provider system consists of three main components:
- Baseliner::StorageProvider - Role defining the provider interface
- Baseliner::Storage - Factory and manager for providers
- Provider Implementations - Concrete storage backends (GridFS, etc.)
Provider Interface
All storage providers must implement the Baseliner::StorageProvider role, which requires these methods:
- put(filehandle => $fh, ...) - Store a file, returns storage_id
- get(storage_id => $id) - Retrieve a file object
- remove(storage_id => $id) - Delete a file
- info(storage_id => $id) - Get file metadata
- exists(storage_id => $id) - Check if file exists
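The contract is easiest to see in a provider with no external backend. The following is a minimal sketch and is not part of Clarive. My::MemoryProvider and My::MemoryFile are hypothetical names, and both are plain Perl classes rather than Moose classes composing Baseliner::StorageProvider. Content is kept in a hash, so the sketch is only useful for experiments and tests.

```perl
use strict;
use warnings;

# Hypothetical in-memory provider following the put/get/remove/info/exists
# contract. Plain Perl, so it does not compose Baseliner::StorageProvider.
package My::MemoryProvider;

sub new {
    my ($class) = @_;
    return bless { files => {}, next_id => 1 }, $class;
}

sub put {
    my ( $self, %args ) = @_;
    my $fh = $args{filehandle} or die "filehandle required";
    binmode $fh;
    my $content = do { local $/; <$fh> };    # read the whole stream
    my $id = 'mem-' . $self->{next_id}++;
    $self->{files}{$id} = {
        content  => defined $content ? $content : '',
        filename => $args{filename},
        metadata => $args{metadata} || {},
    };
    return $id;
}

sub get {
    my ( $self, %args ) = @_;
    my $entry = $self->{files}{ $args{storage_id} // '' } or return;
    return My::MemoryFile->new(%$entry);
}

sub remove {
    my ( $self, %args ) = @_;
    delete $self->{files}{ $args{storage_id} // '' };
    return 1;
}

sub info {
    my ( $self, %args ) = @_;
    my $file = $self->get(%args) or return {};
    return $file->info;
}

sub exists {
    my ( $self, %args ) = @_;
    # CORE:: makes it explicit that this is the builtin, not a recursive call
    return CORE::exists( $self->{files}{ $args{storage_id} // '' } ) ? 1 : 0;
}

# Minimal file object with the slurp/print/info interface
package My::MemoryFile;
use Digest::MD5 qw(md5_hex);

sub new { my ( $class, %args ) = @_; return bless {%args}, $class }

sub slurp { return $_[0]{content} }

sub print {
    my ( $self, $out ) = @_;
    print {$out} $self->{content} or die "Write failed: $!";
    return 1;
}

sub info {
    my ($self) = @_;
    return {
        length   => length $self->{content},
        md5      => md5_hex( $self->{content} ),
        filename => $self->{filename},
        metadata => $self->{metadata},
    };
}

package main;
```

Because the sketch needs no database or network, it can stand in for a real backend when you are exercising code written against the interface.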
Using Storage Providers
Getting a Provider
use Baseliner::Storage;
# Get the default provider (GridFS)
my $provider = Baseliner::Storage->provider();
# Get a specific provider by name
my $gridfs = Baseliner::Storage->provider('GridFS');
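Storing Files
# Open the source file in binary mode
open my $fh, '<:raw', 'document.pdf' or die "Cannot open document.pdf: $!";
# Store the file
my $storage_id = $provider->put(
    filehandle => $fh,
    filename   => 'document.pdf',
    metadata   => { custom_field => 'value' }
);
close $fh;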
Retrieving Files
# Get a file object
my $file = $provider->get( storage_id => $storage_id );
# Read content
my $content = $file->slurp();
# Write to filehandle
$file->print($output_fh);
# Get metadata
my $info = $file->info(); # { length, md5, uploadDate, filename, metadata }
Deleting Files
# Remove a file
$provider->remove( storage_id => $storage_id );
# Check if file exists
if ($provider->exists( storage_id => $storage_id )) {
    # File exists
}
Convenience Methods
You can use the Baseliner::Storage class methods directly:
use Baseliner::Storage;
# Store
my $storage_id = Baseliner::Storage->put( filehandle => $fh );
# Retrieve
my $file = Baseliner::Storage->get( storage_id => $storage_id );
# Check existence
if (Baseliner::Storage->exists( storage_id => $storage_id )) { ... }
# Get metadata
my $info = Baseliner::Storage->info( storage_id => $storage_id );
# Remove
Baseliner::Storage->remove( storage_id => $storage_id );
Implementing Custom Providers
To create a custom storage provider:
- Create a new module in lib/Baseliner/StorageProvider/
- Implement the Baseliner::StorageProvider role
- Implement all required methods
Example: Filesystem Provider
package Baseliner::StorageProvider::Filesystem;
use Moose;
use Baseliner::Utils;
use Path::Class;
with 'Baseliner::StorageProvider';
has 'base_path' => (
    is      => 'ro',
    isa     => 'Str',
    default => '/var/clarive/storage'
);
# Map a storage_id to its path, rejecting IDs such as "../x" that could
# escape base_path (assumes Util->_uuid() yields word characters and hyphens)
sub _path_for {
    my ($self, $storage_id) = @_;
    die "invalid storage_id" unless $storage_id =~ /\A[\w-]+\z/;
    return file($self->base_path, $storage_id);
}
sub put {
    my ($self, %args) = @_;
    my $fh = $args{filehandle} or die "filehandle required";
    # Generate unique storage ID
    my $storage_id = Util->_uuid();
    # Determine storage path and create its directory
    my $file_path = $self->_path_for($storage_id);
    $file_path->dir->mkpath;
    # Copy the content in binary mode, 8 KB at a time
    binmode $fh;
    my $out = $file_path->openw();    # Path::Class croaks if the open fails
    binmode $out;
    while (1) {
        my $n = read($fh, my $buffer, 8192);
        die "Cannot read input: $!" unless defined $n;
        last unless $n;
        print {$out} $buffer or die "Cannot write $file_path: $!";
    }
    close $out or die "Cannot close $file_path: $!";
    return $storage_id;
}
sub get {
    my ($self, %args) = @_;
    my $storage_id = $args{storage_id} or die "storage_id required";
    my $file_path = $self->_path_for($storage_id);
    return unless -f $file_path;
    # Return file object
    return Baseliner::StorageProvider::Filesystem::File->new(
        path => $file_path
    );
}
sub remove {
    my ($self, %args) = @_;
    my $storage_id = $args{storage_id} or die "storage_id required";
    my $file_path = $self->_path_for($storage_id);
    unlink $file_path if -f $file_path;
    return 1;
}
sub info {
    my ($self, %args) = @_;
    my $storage_id = $args{storage_id} or die "storage_id required";
    my $file_path = $self->_path_for($storage_id);
    return {} unless -f $file_path;
    return {
        length => -s $file_path,
        md5    => Util->_md5($file_path->openr()),
    };
}
sub exists {
    my ($self, %args) = @_;
    my $storage_id = $args{storage_id} or return 0;
    # An invalid ID cannot exist, so report 0 instead of dying
    my $file_path = eval { $self->_path_for($storage_id) } or return 0;
    return -f $file_path ? 1 : 0;
}
no Moose;
__PACKAGE__->meta->make_immutable;
1;
File Object Interface
Custom providers must return file objects from get() that support:
- slurp() - Return entire file content
- print($fh) - Write content to filehandle
- info() - Return metadata hashref
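The Filesystem example returns a Baseliner::StorageProvider::Filesystem::File object but does not define that class. Below is a minimal sketch of it. It assumes plain Perl (no Moose) and accepts path as either a string or a Path::Class::File, since the value is stringified on construction. print() streams the file instead of loading it into memory, and info() returns the same length and md5 keys as the provider's info().

```perl
package Baseliner::StorageProvider::Filesystem::File;
use strict;
use warnings;
use Digest::MD5;

# Sketch only: a plain-Perl file object for the Filesystem provider
sub new {
    my ( $class, %args ) = @_;
    defined $args{path} or die "path required";
    # Stringify so a Path::Class::File and a plain string both work
    return bless { path => "$args{path}" }, $class;
}

# Open the stored file for binary reading
sub _open {
    my ($self) = @_;
    open my $fh, '<:raw', $self->{path}
        or die "Cannot read $self->{path}: $!";
    return $fh;
}

# Return the entire content as a byte string
sub slurp {
    my ($self) = @_;
    my $fh = $self->_open;
    my $content = do { local $/; <$fh> };
    return defined $content ? $content : '';
}

# Copy the content to $out in 8 KB blocks; the caller should binmode $out
sub print {
    my ( $self, $out ) = @_;
    my $fh = $self->_open;
    while (1) {
        my $n = read( $fh, my $buffer, 8192 );
        die "Cannot read $self->{path}: $!" unless defined $n;
        last if $n == 0;
        print {$out} $buffer or die "Write failed: $!";
    }
    return 1;
}

# Same keys as the provider's info(): size in bytes and MD5 hex digest
sub info {
    my ($self) = @_;
    my $fh = $self->_open;
    return {
        length => ( stat $fh )[7],
        md5    => Digest::MD5->new->addfile($fh)->hexdigest,
    };
}

1;
```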
GridFS Provider Details
The default GridFS provider (Baseliner::StorageProvider::GridFS):
- Stores files in MongoDB GridFS collections
- Generates MD5 checksums automatically
- Supports file metadata and custom fields
- Maintains full backward compatibility
- Chunks files at 255KB for efficient storage
GridFS Storage Format
Files are stored with:
- _id - MongoDB OID (used as storage_id)
- filename - Original filename
- length - File size in bytes
- chunkSize - Chunk size (default 255KB)
- uploadDate - Upload timestamp
- md5 - MD5 checksum
- metadata - Custom metadata hash
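Each chunk is a separate document in grid.chunks, so the number of chunks for a file is its length divided by chunkSize, rounded up. The sketch below assumes the 255 KiB (261,120-byte) default. gridfs_chunk_count is a hypothetical helper, not a Clarive function.

```perl
use strict;
use warnings;

# GridFS default chunk size: 255 KiB
use constant GRIDFS_CHUNK_SIZE => 255 * 1024;    # 261120 bytes

# Ceiling division: how many grid.chunks documents a file of $length bytes needs
sub gridfs_chunk_count {
    my ( $length, $chunk_size ) = @_;
    $chunk_size //= GRIDFS_CHUNK_SIZE;
    return int( ( $length + $chunk_size - 1 ) / $chunk_size );
}

printf "%d chunks\n", gridfs_chunk_count( 1024 * 1024 );    # prints "5 chunks"
```

A 1 MiB file therefore occupies four full chunks and one partial chunk.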
Migration and Compatibility
Backward Compatibility
The storage provider abstraction is fully backward compatible:
- Existing files in GridFS continue to work
- No migration required
- Default behavior unchanged
- All existing code paths supported
Future: Migrating to New Providers
When custom providers are configured (future enhancement):
- New uploads will use the configured provider
- Existing files remain in GridFS until they are accessed
- Files are migrated when accessed (lazy migration)
- Both old and new storage work simultaneously
Testing
Test your custom provider implementation:
use Test::More;
use Baseliner::Storage;
# Register your provider
Baseliner::Storage->set_default_provider('MyProvider');
# Test basic operations against an in-memory filehandle
my $provider = Baseliner::Storage->provider();
open my $fh, '<', \"test content" or die "Cannot open in-memory file: $!";
my $storage_id = $provider->put( filehandle => $fh, filename => 'test.txt' );
ok $storage_id, 'put returns storage_id';
ok $provider->exists( storage_id => $storage_id ), 'file exists after put';
my $file = $provider->get( storage_id => $storage_id );
ok $file, 'get returns file';
is $file->slurp(), 'test content', 'content round-trips';
$provider->remove( storage_id => $storage_id );
ok !$provider->exists( storage_id => $storage_id ), 'file removed';
done_testing();
Microsoft SharePoint Provider
Clarive includes a SharePoint storage provider that stores files in Microsoft SharePoint Online using the Microsoft Graph API.
Configuration
To use SharePoint as a storage provider:
- Create a SharePoint Site CI Resource (MSSharePointSite)
- Configure the SharePoint connection:
  - Tenant ID: Your Microsoft 365 tenant ID
  - Client ID: Application (client) ID from Azure AD app registration
  - Client Secret: Client secret from Azure AD app registration
  - Site ID: SharePoint site identifier
  - Drive ID: Document library drive ID (default: "root")
  - Active: Enable the storage provider
- In the "Attach Files" field configuration, select:
  - Storage Provider: Your SharePoint Site CI
  - Storage Folder: Folder path in SharePoint (e.g., /FromClarive)
Azure AD App Registration
Before using the SharePoint provider, you must register an application in Azure AD:
- Go to Azure Portal → Azure Active Directory → App registrations
- Create a new registration
- Under "Certificates & secrets", create a new client secret
- Under "API permissions", add Microsoft Graph → Application permissions:
  - Sites.ReadWrite.All
  - Files.ReadWrite.All
- Grant admin consent for the permissions
- Copy the Tenant ID, Client ID, and Client Secret
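The curl commands in the next section need a Microsoft Graph access token. Clarive handles authentication itself at runtime, so you only need a token for that manual lookup. The sketch below requests one with the OAuth2 client-credentials flow against the Microsoft identity platform token endpoint, using the three values copied above. token_request and fetch_token are hypothetical helper names. HTTP::Tiny and JSON::PP ship with Perl, but HTTPS also requires IO::Socket::SSL and Net::SSLeay to be installed.

```perl
use strict;
use warnings;
use HTTP::Tiny;                 # HTTPS needs IO::Socket::SSL and Net::SSLeay
use JSON::PP qw(decode_json);

# Build the client-credentials token request for the Microsoft identity platform
sub token_request {
    my (%cfg) = @_;
    my $url  = "https://login.microsoftonline.com/$cfg{tenant_id}/oauth2/v2.0/token";
    my $form = {
        grant_type    => 'client_credentials',
        client_id     => $cfg{client_id},
        client_secret => $cfg{client_secret},
        scope         => 'https://graph.microsoft.com/.default',
    };
    return ( $url, $form );
}

# POST the form and return the token to send as "Authorization: Bearer <token>"
sub fetch_token {
    my ( $url, $form ) = token_request(@_);
    my $res = HTTP::Tiny->new( timeout => 30 )->post_form( $url, $form );
    die "Token request failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return decode_json( $res->{content} )->{access_token};
}

# Usage (values come from your app registration):
# my $token = fetch_token( tenant_id => '...', client_id => '...', client_secret => '...' );
```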
Getting Site and Drive IDs
To find your SharePoint Site ID and Drive ID:
# Get site ID
curl -H "Authorization: Bearer <token>" \
"https://graph.microsoft.com/v1.0/sites/<tenant>.sharepoint.com:/sites/<site-name>"
# Get drive ID
curl -H "Authorization: Bearer <token>" \
"https://graph.microsoft.com/v1.0/sites/<site-id>/drives"
Features
The SharePoint storage provider:
- Stores files in SharePoint Online with original filenames
- Uses SharePoint file IDs for reliable file retrieval
- Supports configurable storage folders per field
- Automatically handles OAuth2 authentication
- Supports chunked uploads for large files (3.2 MB chunks)
- Replaces existing files on update (conflict behavior: replace)
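Large files are uploaded through a Microsoft Graph upload session, where each request sends one byte range and declares it in a Content-Range header. Graph requires each range to be a multiple of 320 KiB. This sketch therefore assumes that the "3.2 MB" chunk size above means 10 × 320 KiB = 3,276,800 bytes. content_ranges is a hypothetical helper that only computes the header values. It does not perform the upload.

```perl
use strict;
use warnings;

# Assumed fragment size: 10 x 320 KiB = 3,276,800 bytes (~3.2 MB)
use constant FRAGMENT_SIZE => 10 * 320 * 1024;

# Content-Range header value for each fragment of a $total-byte file
sub content_ranges {
    my ( $total, $fragment ) = @_;
    $fragment //= FRAGMENT_SIZE;
    die "fragment size must be a multiple of 320 KiB\n" if $fragment % ( 320 * 1024 );
    my @ranges;
    for ( my $start = 0 ; $start < $total ; $start += $fragment ) {
        my $end = $start + $fragment - 1;
        $end = $total - 1 if $end > $total - 1;    # last fragment is the remainder
        push @ranges, "bytes $start-$end/$total";
    }
    return @ranges;
}

print "$_\n" for content_ranges(7_000_000);
# bytes 0-3276799/7000000
# bytes 3276800-6553599/7000000
# bytes 6553600-6999999/7000000
```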
File Organization
Files are stored in SharePoint under the folder specified in the field configuration:
- Storage Folder: Configured at the field level (e.g., /FromClarive)
- Filename: Original filename is preserved (e.g., document.pdf)
- Full Path: /FromClarive/document.pdf
The storage provider stores SharePoint's unique file ID in Clarive's database, which is used for all file operations (download, delete, info).
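The stored ID is a SharePoint (driveItem) ID, so each operation maps to a Microsoft Graph request addressed by that ID. The sketch below shows one plausible mapping. graph_request is a hypothetical helper. Treating a Drive ID of "root" as the site's default document library (/sites/{site-id}/drive) is an assumption about how that default is interpreted.

```perl
use strict;
use warnings;

# Map a storage operation on a SharePoint item ID to a Graph method and URL.
# Assumption: drive_id 'root' (the documented default) means the site's
# default document library; any other value addresses /drives/{drive-id}.
sub graph_request {
    my (%args) = @_;
    my $drive_id = $args{drive_id} // 'root';
    my $drive = $drive_id eq 'root'
        ? "https://graph.microsoft.com/v1.0/sites/$args{site_id}/drive"
        : "https://graph.microsoft.com/v1.0/drives/$drive_id";
    my $item = "$drive/items/$args{item_id}";
    my %ops = (
        download => [ GET    => "$item/content" ],
        info     => [ GET    => $item ],
        delete   => [ DELETE => $item ],
    );
    my $req = $ops{ $args{op} } or die "unknown operation '$args{op}'\n";
    return @$req;
}
```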
Field-Level Configuration
Each "Attach Files" field can specify its own storage folder:
{
xtype: 'textfield',
name: 'storage_folder',
fieldLabel: 'Storage Folder',
value: '/FromClarive' // Files will be stored here
}
This allows using the same SharePoint Site CI for multiple fields, each storing files in different folders.
Variable Substitution in Storage Folders
The storage folder path supports variable substitution using the ${variable} syntax. Variables are replaced with topic data at upload time.
Examples:
// Use topic title
storage_folder: '/Documents/${title}'
// Use topic category
storage_folder: '/Projects/${category}'
// Use related CI names (automatically resolved)
storage_folder: '/Environments/${environment}/Projects/${project}'
Variable Resolution:
- Topic fields are available directly (e.g., ${title}, ${status})
- Related CI fields (MIDs) are automatically replaced with CI names
- Array fields with a single value are converted to scalars
- Fields with multiple values remain arrays, and only the first value is used in the path
Example:
If a topic has:
- Title: "Feature Implementation"
- Project CI: "MyProject" (MID: cla-default-project-123)
- Environment CI: "Production" (MID: cla-default-environment-456)
And storage_folder is configured as: /Projects/${project}/${environment}
The final path will be: /Projects/MyProject/Production
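The resolution rules above can be sketched as a single substitution. This illustrates the rules and is not Clarive's implementation. expand_folder is a hypothetical name, the %ci_names hash stands in for Clarive's CI lookup, and leaving unknown variables untouched is an assumption. The data reproduces the example above.

```perl
use strict;
use warnings;

# Replace ${name} placeholders with topic values.
# $ci_names is a hypothetical MID => CI name lookup standing in for Clarive's.
sub expand_folder {
    my ( $folder, $topic, $ci_names ) = @_;
    my $resolve = sub {
        my ($name) = @_;
        my $value = $topic->{$name};
        $value = $value->[0] if ref $value eq 'ARRAY';      # arrays: first value
        return '${' . $name . '}' unless defined $value;    # unknown: keep as-is
        return exists $ci_names->{$value} ? $ci_names->{$value} : $value;
    };
    $folder =~ s/\$\{(\w+)\}/$resolve->($1)/ge;
    return $folder;
}

my %topic = (
    title       => 'Feature Implementation',
    project     => ['cla-default-project-123'],
    environment => ['cla-default-environment-456'],
);
my %ci_names = (
    'cla-default-project-123'     => 'MyProject',
    'cla-default-environment-456' => 'Production',
);
print expand_folder( '/Projects/${project}/${environment}', \%topic, \%ci_names ), "\n";
# prints /Projects/MyProject/Production
```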
Limitations
- Files are stored with their original names; versioning is handled by Clarive asset versioning
- Characters that SharePoint does not allow (" * : < > ? / \ |) in folder names and filenames are automatically replaced with underscores
- Special characters such as #, %, and & are allowed and are automatically URL-encoded
- Requires network connectivity to the Microsoft Graph API
- Requires valid Azure AD credentials and permissions
- Download performance depends on SharePoint/Microsoft Graph API response times
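The character handling above can be sketched as a two-step transform applied to each path segment. First the disallowed characters are replaced, then the remaining text is percent-encoded for the Graph URL. Working per segment keeps the "/" separators of the folder intact. The helper names are hypothetical, and the encoder keeps only RFC 3986 unreserved characters, which may be stricter than what Clarive actually sends.

```perl
use strict;
use warnings;

# Replace characters SharePoint rejects (" * : < > ? / \ |) with "_"
sub sanitize_segment {
    my ($name) = @_;
    ( my $clean = $name ) =~ tr{"*:<>?/\\|}{_};
    return $clean;
}

# Percent-encode everything except RFC 3986 unreserved characters.
# Expects decoded text: it is converted to UTF-8 bytes first.
sub encode_segment {
    my ($name) = @_;
    utf8::encode($name);
    $name =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $name;
}

# Join folder and filename, sanitizing and encoding each segment
sub sharepoint_path {
    my ( $folder, $filename ) = @_;
    my @segments = ( ( grep { length } split m{/}, $folder ), $filename );
    return join '/', '', map { encode_segment( sanitize_segment($_) ) } @segments;
}

print sharepoint_path( '/FromClarive/Q3: R&D', 'plan #1?.pdf' ), "\n";
# prints /FromClarive/Q3_%20R%26D/plan%20%231_.pdf
```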
Creating a Storage Provider Resource
Storage providers are created as CI Resources in Clarive. To create a new SharePoint storage provider:
Note: The mssharepoint feature must be installed for the MSSharePointSite provider type to be available.
- Navigate to Resources in the Clarive interface
- Click Create New Resource
- Select Storage Provider from the resource types
- Choose MSSharePointSite as the provider type
- Fill in the required fields:
  - Name: A descriptive name for the provider
  - Moniker: Unique identifier
  - Active: Check to enable the provider
  - Tenant ID, Client ID, Client Secret: Azure AD credentials
  - Site ID, Drive ID: SharePoint identifiers
- Click Save
Once created, the SharePoint provider will be available for selection in "Attach Files" field configurations throughout Clarive.