Cloud to Local Data Migration in Speckle

Good afternoon!

I have a question regarding the “migration” process. We’ve successfully set up our local Speckle server, but a lot of our data is still stored on the cloud servers. I’d like to know if there’s a way to migrate data from the cloud to our local servers, either through the API or by other means. Are there any convenient tools or solutions for transferring data from the Speckle cloud to our local setup?

Something I’ve been using a fair amount for this purpose is a Jupyter notebook. It should be easy to follow if you are comfortable with Python. It is fair to say that it is a long-running process on big projects. It also doesn’t account for projects with thousands of models or versions, but it should cover most typical cases.

It is worth noting that the explanatory text has been updated to the new nomenclature of Project, Model, and Version. However, I haven’t yet updated the code for the latest version of specklepy, which deprecates the stream, branch, and commit methods in favour of newer equivalents.

Speckle Project Migration

This notebook implements a memory-optimized approach to migrating Speckle projects between servers. It uses a sequential read-write pattern to handle large projects efficiently while preserving commit history and branch structure.

Key Features

  • Sequential version processing to minimize memory usage
  • Preservation of commit history and branch structure
  • Error isolation for individual version failures
  • Real-time progress tracking
  • Optional memory usage monitoring (a sketch follows the technical notes below)

Caveats

  • It doesn’t paginate requests for long model lists or long version histories, so projects exceeding the query limits will have models or versions silently skipped (see the detection sketch below)
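
A partial safeguard is to detect truncation before migrating. This sketch, which relies only on the same totalCount and items fields used in the main loop, assumes the source_client and source_project_id defined in the cells below:

# Warn when the fetched version list is shorter than the server-side count,
# rather than silently migrating a subset of a model's history.
models = source_client.branch.list(source_project_id, branches_limit=100)
for model in models:
    fetched = len(model.commits.items)
    if model.commits.totalCount > fetched:
        print(
            f"Warning: model '{model.name}' reports "
            f"{model.commits.totalCount} versions, but only "
            f"{fetched} were fetched"
        )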

Setup and Dependencies

%%capture
# capture turns off the output for this cell, which would be the pip install log
%pip install -U pip
%pip install -U specklepy python-dotenv

%reload_ext dotenv
%dotenv
from specklepy.api.client import SpeckleClient
from specklepy.api import operations
from specklepy.transports.server import ServerTransport
import os

# Verify specklepy version
!pip show specklepy
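
As noted above, recent specklepy releases emit DeprecationWarnings for the stream, branch, and commit resources this notebook relies on. If those warnings get noisy in the output, they can optionally be silenced:

# Optional: suppress DeprecationWarnings from the legacy resource methods
# (stream/branch/commit) on newer specklepy releases.
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)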

Environment and Transport Configuration

Configure the source and target projects with authentication tokens. We use environment variables for security, keeping tokens out of the notebook.
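
For reference, the .env file that %dotenv reads might look something like this; the URLs and token values are placeholders, not real endpoints:

# .env (placeholder values - substitute your own servers and tokens)
LATEST_NEW_SERVER=https://source.speckle.example.com
LATEST_NEW_ACCESS_TOKEN=your-source-personal-access-token
NEW_WEB_APP_SERVER_URL=https://target.speckle.example.com
NEW_WEB_APP_ACCESS_TOKEN=your-target-personal-access-token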

# Server configuration
SOURCE_SERVER_URL = os.getenv("LATEST_NEW_SERVER")
SOURCE_ACCESS_TOKEN = os.getenv("LATEST_NEW_ACCESS_TOKEN")
TARGET_SERVER_URL = os.getenv("NEW_WEB_APP_SERVER_URL")
TARGET_ACCESS_TOKEN = os.getenv("NEW_WEB_APP_ACCESS_TOKEN")

# Project (stream) IDs
source_project_id = "9fc43fe003"
target_project_id = "c7bfa8d65f"

# Initialize clients with authentication
source_client = SpeckleClient(host=SOURCE_SERVER_URL) 
source_client.authenticate_with_token(SOURCE_ACCESS_TOKEN)
target_client = SpeckleClient(host=TARGET_SERVER_URL) 
target_client.authenticate_with_token(TARGET_ACCESS_TOKEN)

# Set up transport channels
source_transport = ServerTransport(
    client=source_client, stream_id=source_project_id
)
target_transport = ServerTransport(
    client=target_client, stream_id=target_project_id
)
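
Before kicking off a long migration, it is worth a quick sanity check that both tokens and project IDs resolve. This optional snippet just fetches each project's name and will error out early if authentication failed:

# Optional sanity check: fail fast on a bad token or project id.
print("Source project:", source_client.stream.get(source_project_id).name)
print("Target project:", target_client.stream.get(target_project_id).name)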

Core Processing Functions

These functions handle the version migration process with memory efficiency in mind.

def process_single_version(model_name, referenced_object, 
                           source_transport, target_transport):
    """
    Implements immediate read-write cycle for each version.
    
    Args:
        model_name: Name of the model branch
        referenced_object: Version object ID
        source_transport: Transport for reading from source
        target_transport: Transport for writing to the target
        
    Returns: target_object_id or None if processing fails
    """
    try:
        # Immediate read-write cycle
        version_data = operations.receive(referenced_object, source_transport)
        target_object = operations.send(
            version_data, transports=[target_transport]
        )
        return target_object
    except Exception as e:
        print(f"Error processing version {referenced_object}: {str(e)}")
        return None


def ensure_model_exists(client, project_id, model_name):
    """
    Creates a model if it is not present in the target stream.
    Maintains model structure while preserving version history.
    """
    models = client.branch.list(project_id)
    if not any(model.name == model_name for model in models):
        client.branch.create(project_id, model_name)
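
To smoke-test the helper in isolation before running the full migration, you can create a throwaway model on the target; the model name here is purely illustrative:

# Hypothetical smoke test: create (or confirm) one model, then list models.
ensure_model_exists(target_client, target_project_id, "migration-smoke-test")
print([model.name for model in target_client.branch.list(target_project_id)])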

Migration Process

The main migration loop preserves commit order and processes each version sequentially, oldest first.

# Get all models from source
models = source_client.branch.list(
    stream_id=source_project_id, branches_limit=100
)
models_with_commits = [
    model for model in models if model.commits.totalCount > 0
]

# Process each model
for model in models_with_commits:
    model_name = model.name
    print(f"Processing model: {model_name}")
    
    model_versions = model.commits.items[::-1]  # oldest first, to preserve order
    
    ensure_model_exists(target_client, target_project_id, model_name)
    
    # Sequential version processing
    for version in model_versions:
        referenced_object = version.referencedObject
        print(f"Processing version: {referenced_object}")
        
        target_object = process_single_version(
            model_name, referenced_object, 
            source_transport, target_transport
        )
        
        if target_object:
            # Create a version while preserving metadata
            target_client.commit.create(
                stream_id=target_project_id,
                branch_name=model_name,
                object_id=target_object,
                message=(
                    f"Imported from source server: {SOURCE_SERVER_URL}, "
                    f"project id:{source_project_id}, model name: "
                    f"{model_name}, id:{referenced_object}"
                )
            )
            print(f"Successfully migrated version {referenced_object}")
        else:
            print(
                f"Skipped version {referenced_object} due to errors"
            )

Technical Notes

  1. Memory Management

    • Peak memory usage limited to single-version data
    • Immediate release of processed version memory
    • Sequential processing prevents memory accumulation
  2. Error Handling

    • Version-level error isolation
    • Continued processing despite individual failures
    • Detailed error logging for failed versions
  3. Data Integrity

    • Preservation of commit chronology
    • Maintenance of branch structure
    • Version metadata retention
  4. Performance Considerations

    • Network bandwidth is the primary bottleneck
    • CPU usage is generally minimal, though enormous models may still be slow
    • Memory usage stays consistent regardless of project size
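
The optional memory usage monitoring mentioned in the key features isn't shown in the cells above. One minimal way to add it is with psutil (an extra dependency, installed with %pip install psutil), calling the helper before and after each process_single_version call to confirm that memory stays flat across versions:

# Minimal memory probe using psutil (assumed extra dependency).
import os
import psutil

def log_memory(label=""):
    # Resident set size of the current process, in megabytes.
    rss_mb = psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2
    print(f"[mem] {label}: {rss_mb:.1f} MB")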

Also, as this is pretty brute force, versions lose their original dates and source application data, and comments are not currently copied.
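
One partial improvement, sketched here untested: specklepy's commit.create accepts a source_application argument, so the original source application can be carried over, and the original timestamp can at least be recorded in the message. The creation date itself is always set by the target server:

# Partial metadata retention when creating the target version.
target_client.commit.create(
    stream_id=target_project_id,
    branch_name=model_name,
    object_id=target_object,
    source_application=version.sourceApplication,
    message=(
        f"Imported from {SOURCE_SERVER_URL}, project {source_project_id}, "
        f"originally created {version.createdAt}"
    ),
)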

The Speckle server team can perform migrations at the database level, which retains this information, but that is a paid service or part of a business/team workspace plan.

For anyone who doesn’t like cut & paste…

speckle-migration-notebook.ipynb (8.3 KB)

Thank you so much! I didn’t expect such a detailed and thorough response! I’ll definitely post in the thread once I try the solution!
