Version Diffing with Python: Part 2 - Automation 🚀

jonathon · 19 May 2023 16:54

In Part 1 , I introduced programmatic Version diffing of Speckle data . In this part, I will show how this simple example can be deployed to respond to new commits on a branch. This uses the Speckle webhook functionality .

In the example of Rhino models being incrementally amended and Versions committed to Speckle, the comparison is between the latest and the immediately previous Version. In a webhook scenario, the comparison is between the latest and current versions when the webhook was triggered. This is because the webhook is created against a specific commit, and the webhook is triggered when a new commit is made .

NB: With automation, we could trigger the program with each new commit, making our system more efficient. Of course, if you’re making more commits than a barista makes lattes during the morning rush, you might end up with duplicate results. No one needs two lattes at once… unless it’s a Monday.

Recapping the basic functionality :

# boilerplate user credentials and server
import os
HOST_SERVER = os.getenv('HOST_SERVER')
ACCESS_TOKEN = os.getenv('ACCESS_TOKEN')
from specklepy.api.client import SpeckleClient

client = SpeckleClient(host=HOST_SERVER)  # or whatever your host is
client.authenticate_with_token(ACCESS_TOKEN)  # or whatever your token is
stream_id = "20e76a799c"  # or whatever your stream id is

from specklepy.transports.server import ServerTransport
from specklepy.api.wrapper import StreamWrapper

transport = ServerTransport(client=client, stream_id=stream_id)

stream = client.stream.get(stream_id)
commits = client.commit.list(stream_id=stream_id, limit=2)

from specklepy.api import operations

# get obj id from latest commit
latest = commits[0].referencedObject
previous = commits[1].referencedObject

# receive objects from speckle
latest_data = operations.receive(obj_id=latest, remote_transport=transport)
previous_data = operations.receive(obj_id=previous, remote_transport=transport)

But wait! The commit.list is getting us the last two commits. But in Part 1 of this tutorial we made a “Diff!” commit.

Sensibly, however, the part 1 Diff commit was made on a separate Model branch. This is one of the advantages of using branches in general, partitioning versions for later recall.

To query a specific branch for commits, we use a different part of the Client API :

# Add the branch name as a query filter,
# limit changes to commits_limit for this call.
main_branch_commits = client.branch.get(
    stream_id=stream_id, name="main", commits_limit=2
)

main_branch_commits.commits.items

There are our two original commits.

[
  Commit(
    id: 1ebb510278, 
    message: Second Commit, 
    referencedObject: e44034f90b817573270cc3ec2534c74f, 
    ....
  ),
  Commit(
    id: 2bf8b491ce, 
    message: First Commit, 
    referencedObject: 9e18b762dd52788a457de6a7b961d428, 
    ....
  )
]

NB: Notice the order they are received, newest first. A bit like your social media feed, but with less cat videos and more productive outcomes!

But we need to go further – If responding to a commit event, we must request that specific commit and the immediately preceding one.

We can inspect the properties of the webhook to determine how to do that.

Webhooks

Webhooks can be thought of as our program’s very own personal assistant. They’ll let us know when something we care about, like a new commit, happens. Just be warned, they won’t fetch you coffee or laugh at your jokes. Believe me, I’ve tried.

We write about Speckle Webhooks in the docs, so I won’t go into great detail here except to show what would be the setup in this case .

The webhook setup: url, description(optional), secret(optional) and events

and the resultant payload from a new commit is added:

{
  "payload": {
    ...
    "event": {
      "event_name": "commit_create",
      "data": {
          "id": "1ebb510278",
          "commit": {
            "message": "Second Commit",
            "objectId": "e44034f90b817573270cc3ec2534c74f",
            "sourceApplication": "Rhino7",
            "branchName": "main",
            "authorName": "Jonathon",
          }
      }
    },
    "server": {
      ...
    },
    "stream": {
      "id": "20e76a799c",
      "name": "Diffing Demo",
      "createdAt": "2023-04-16T17:17:05.058Z",
      ...
    },
    "user": {
      ...
      "name": "Jonathon",
      ...
      },
    "webhook": {
      "id": "a77eb6c427",
      "streamId": "20e76a799c",
      "url": "https://your.code/endpoint",
      "triggers": [
        "commit_create"
      ]
    }
  }
}

Critical parts of this worth noting are the event_name and the data object. The event_name is the event that triggered the webhook. In this case, it is commit_create. The data object contains the commit object, which contains the id of the created commit.

The flatten function we can reuse from before.

# Flatten the objects into a list of objects
def flatten(obj, visited=None):
    ....

And the comparison function :

# Compare two Speckle commits and populate a List of tuples
def compare_speckle_commits(
    ....

To vary the example to a different use case, I have amended the store_speckle_commit_diff function only to report changed elements grouped by their applicationId. This could be useful if you are updating a separate database. In this case, the two things worth noting would be DELETE representing an element that was removed, UPDATE representing an element that was changed and INSERT representing an added element.

def report_speckle_commit_diff(commit1_objects, commit2_objects):
    diff_report = []

    for obj in compare_speckle_commits(commit1_objects, commit2_objects):
        object_report = {"applicationId": obj[1].applicationId,
                         "data": obj[1].to_dict()}

        if getattr(obj[0], "id", None) == getattr(obj[1], "id", None):
            continue  # Skip unchanged elements

        if obj[0] is not None and obj[1] is not None:
            object_report["verb"] = "UPDATE"
        elif obj[0] is None and obj[1] is not None:
            object_report["verb"] = "INSERT"
        elif obj[0] is not None and obj[1] is None:
            object_report["verb"] = "DELETE"

        diff_report.append(object_report)

    return diff_report

That diff_report would likely suffice for any follow-on script you might want to take using those database instructions. But for storing this report in Speckle itself for posterity, we can convert it into a Speckle object and commit it to the stream.

# convert the diff report to a speckle base object

from specklepy.objects import Base
from specklepy.objects.other import Collection

def diff_report_to_speckle_base(diff_report):
    commit = Collection(collectionType="diff_report",
                        name="diff report",
                        elements=[])

    deleted = Collection(collectionType="deleted objects",
                         name="deleted objects",
                         elements=[])
    updated = Collection(collectionType="updated objects",
                         name="updated objects",
                         elements=[])
    inserted = Collection(collectionType="inserted objects",
                          name="inserted objects",
                          elements=[])

    for obj in diff_report:
        target_collection = None
        if obj["verb"] == "DELETE":
            target_collection = deleted
        elif obj["verb"] == "UPDATE":
            target_collection = updated
        elif obj["verb"] == "INSERT":
            target_collection = inserted
        
        target_collection.elements.append(Base.from_dict(obj["data"]))

    commit.elements.extend([deleted, updated, inserted])

    return commit

For this example, I’ll deploy this webhook handler as a Google Cloud Function (other methods exist), so we’ll import the functions_framwork package. And load in the secrets from environment variables.

#GCP SDK for cloud functions
import functions_framework
import os

HOST_SERVER = os.getenv('HOST_SERVER')
ACCESS_TOKEN = os.getenv('ACCESS_TOKEN')
GCP_API_USER = os.getenv('GCP_API_USER')
GCP_API_KEY = os.getenv('GCP_API_KEY')
GCP_PROJEC = os.getenv('GCP_PROJECT')

@functions_framework.http is the function decorator that will allow us to deploy this as a Google Cloud Function triggered by an HTTP request.

My webhook handler looks like this:

from specklepy.api import operations

@functions_framework.http
def diffing_handler(request):
    # Retrieve payload from the request
    payload = request.get_json()

    # Check if the event is a commit creation event
    event = payload.get('event')
    if not event or event.get('event_name') != 'commit_create':
        return 'Not a commit created event', 204

NB: It’s a good idea to check that the event that triggered the webhook is what we expect. We wouldn’t want a surprise party for commit_create when we were expecting commit_delete , would we?

    # Retrieve branch name from the payload
    branch_name = payload['data']['commit'].get('branch_name')
    if not branch_name or branch_name != 'main':
        return 'Commit not on the main branch', 400
    ....

NB: To be defensive, we should check that the event that triggered the webhook is what we expect. In this case, we are expecting commit_create. In addition, we want to limit the webhook to only responding to commits on the main branch. This is because we are only interested in commits made to the main branch. We can do this by checking the branchName property of the commit object. The handler could be protected further by using the webhook secret to sign the payload and verify the signature. This is not covered here. But this is a good idea if you are deploying a webhook handler to a public endpoint. But in particular, it can distinguish between different workloads triggered by the same webhook event to different endpoints.

    ....

    # Retrieve stream ID and commit ID from the payload
    stream_id = payload['stream']['id']
    commit_id = payload['data']['id']

    # Get a list of recent commits from the main branch
    commits = client.branch.get(branch_name, stream_id=stream_id,
                                commits_limit=10).commits.items

    # Find the consecutive commits that match the commit ID
    consecutive_commits = [
        commits[i:i+2]
        for i in range(len(commits)-1)
        if commits[i].id == commit_id
    ]
    if consecutive_commits:
        latest, previous = consecutive_commits[0]
    else:
        return 'Commit not found', 404

    # Get the referenced objects from the consecutive commits
    latest = commits[0].referencedObject
    previous = commits[1].referencedObject

    # Receive the object data from Speckle
    latest_data = operations.receive(obj_id=latest,
                                     remote_transport=transport)
    previous_data = operations.receive(obj_id=previous,
                                       remote_transport=transport)

    # Flatten the object data lists
    latest_objects = list(flatten(latest_data))
    previous_objects = list(flatten(previous_data))

    # Generate the diff report between the previous and latest objects
    diff_report = report_speckle_commit_diff(
        previous_objects,
        latest_objects
    )

    # Convert the diff report to a Speckle base object
    diff_commit = diff_report_to_speckle_base(diff_report)

    # Send the diff report object to the Speckle server
    obj_id = operations.send(base=diff_commit, transports=[transport])

    # Create a new commit with the diff report
    client.authenticate_with_token(ACCESS_TOKEN)  # or whatever your token is
    diff_commit_id = client.commit.create(
        stream_id=stream_id,
        branch_name="diffs",
        message=f"diff report for commit {commit_id}",
        source_application="diffing",
        object_id=obj_id,
    )

    # Return success response with the diff commit ID
    return f"diff report created with id {diff_commit_id}", 200

NB: In an ideal world, we’d separate the webhook handler from the deployment code, but let’s be honest, this isn’t a cooking show where everything is pre-prepared, and we pull the finished dish out of the oven. We’re keeping things real and simple for this tutorial. Separating the two would allow us to deploy the webhook handler to other platforms and allow for testing of the webhook handler without having to deploy it .

I won’t go into detail about deploying a Google Cloud Function, but the code is available in the GCP docs. The important thing to note is that the function is deployed to a specific endpoint. In this case, it is https://{{CLOUD_REGION_GCP_PROJECT}}.cloudfunctions.net/diffing_handler. This is the endpoint that we will use when creating the webhook.

Summing Up

And there we have it – a look under the hood of automating version diffing with Python. In the span of this tutorial, we’ve set up triggers, tamed webhooks, and sidestepped surprise parties – all in the name of efficient computing. The moral of the story? With a little Python magic, we can coax our machines into being even more helpful without necessarily resorting to bribery with coffee or robotic laughter.

But this is just the tip of the proverbial iceberg. As with any journey, there’s always room for improvement and evolution. Your feedback, questions, and even your wild brainstorming ideas are the fuel that drives this community. So don’t hold back – share your thoughts, experiences, and even your ‘aha’ moments as you’ve followed along with this tutorial. Remember, it’s in our collective tinkering and mutual curiosity that the best innovations are born.

If you thought this was fun, you’re in for a treat. Hold onto your hats for Part 3, where we’ll kick things up a notch and delve into the world of multithreading and other power-processing methods. This next adventure promises to take your Python prowess to new heights, or at the very least, it’ll keep your CPU cores busy. So, stay tuned and keep those coding fingers limber.

Until next time, happy coding!

jonathon · 19 May 2023 17:04

Troubleshooting webhooks

I thought I’d add a quick list of things to check when troubleshooting webhook deployments.

Warning, here comes a big scary paragraph on a potentially disastrous situation! When designing webhooks, avoiding a haunting scenario of infinite recursion is crucial. Imagine a webhook that’s triggered by a create_commit event. It’s like having a loyal dog that fetches a ball each time you throw it. Now, what if this dog starts throwing the ball and fetching it endlessly? This is exactly the situation we can face with a self-triggering webhook.

When our tutorial listens to a create_commit event but only acts on the main branch, while it commits to the diff branch, it’s like teaching the dog to fetch the ball only when thrown by you and not thrown by itself. This clever configuration prevents the troublesome infinite recursion.

Without such precautions, you can find yourself in a wild loop, where each action triggers another identical action, spiralling out of control. It’s like a perpetual motion machine on steroids - a real nightmare if you can’t simply reach the wall socket to cut the power because your service lives in the cloud. The story’s moral is: always design your webhooks carefully to prevent them from running amok!

Aside from that, 10 simple gotchas:

Feedback : Is your webhook URL accurate? Make sure it’s not taking a detour, ensure you’ve got the right address.
Security : Guard your secret key, it’s a crucial piece in the puzzle of safety. Handle it with care, like it’s a treasure map to your valuable data.
Rogue IP : Has your server or cloud provider decided to switch its IP/URL schema without your knowledge? Track it down, it’s like finding a missing piece in a jigsaw puzzle.
Response : If you’re not seeing any responses, the problem might reside on the receiver’s end. It’s time to scrutinize your response handling mechanism.
Execution time : Webhooks aren’t big fans of delays. If your process is slow, your webhook might decide it’s had enough. or depending on how your response is deployed, it may also hit execution limits silently
Data Deluge : Facing a data overflow? It’s like trying to read a whole library in a day. You need to streamline your data management process.
Persistence of Retries : Experiencing retries? It’s a sign that your service is encountering difficulties. Time to investigate and resolve the issues!
Guard Against Spoofing : Be vigilant for potential spoofing attempts. Regularly check and safeguard your secret keys - a secured webhook is a content webhook.