Visualizing Data → Process → Data → ... → Data → Process → Data Pipelines

Setting the stage

AEC projects can be mapped out as “pipelines”:
Input data → process → data → … → data → process → Deliverables

  • A pipeline starts with input data (project requirements, drawings, GIS data, Ground Investigation data, etc.)
  • A pipeline ends with the deliverables (BIMs, drawings, reports, presentations, etc.)
  • Processes can have multiple Datasets as inputs and/or outputs.
  • A pipeline CANNOT start or end with a Process.
  • Data CANNOT directly flow into Data.
  • Processes CANNOT flow into other Processes.
  • The whole project can be seen as one big process, but this huge process can be mapped out into smaller and smaller
    data → process → data → … → data → process → data
    pipelines (see the plain-Python sketch after this list). In software engineering this is called refactoring.
  • Of course, some parts of a pipeline are iterative, and others might be “messy”. AEC projects don’t have “unidirectional data flow”, and never will, but that is fine.
  • Of course, we’re going to put all our datasets on Speckle :speckle:, which allows us to swap out cumbersome manual processes for scripts incrementally, one process at a time.
  • A script will then mature over the course of one or more projects, and will be split (i.e. refactored) into multiple smaller scripts such that each script serves only a single purpose (this is called the Single Responsibility Principle in software engineering).
  • Because our scripts each serve only a single purpose, they are more maintainable and become reusable on other projects.
  • Once a script is mature, it can become a Speckle automation.
  • Once we have a library of modular, single-purpose scripts on Speckle Automate, we can start chaining them into automated pipelines.
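
To make this concrete, here is a minimal sketch of one such small pipeline in plain Python. All of the names (Ground Investigation data, ground model, pile layout, piles_per_cap) are made up purely for illustration; the point is only the shape: every process is a single-purpose function, and everything flowing into or out of it is a dataset.

```python
# A minimal data -> process -> data -> process -> data pipeline.
# Every process is a single-purpose function; everything flowing into or out
# of a process is a dataset (plain dicts here; on a real project these would
# live on Speckle).

def build_ground_model(gi_data: dict) -> dict:
    """Process: interpret raw Ground Investigation data into a ground model."""
    return {"layers": sorted(gi_data["boreholes"], key=lambda bh: bh["depth"])}

def generate_pile_layout(ground_model: dict, requirements: dict) -> dict:
    """Process: combine two input datasets into a pile layout dataset."""
    return {"pile_count": len(ground_model["layers"]) * requirements["piles_per_cap"]}

# Input data: a pipeline always starts with data, never with a process
gi_data = {"boreholes": [{"id": "BH01", "depth": 12.0}, {"id": "BH02", "depth": 8.5}]}
requirements = {"piles_per_cap": 4}

# data -> process -> data -> process -> data (deliverable)
ground_model = build_ground_model(gi_data)
pile_layout = generate_pile_layout(ground_model, requirements)
print(pile_layout)  # {'pile_count': 8}
```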

Data engineering / science pipelines

In data engineering and data science it is common to talk about data processing pipelines. One open-source Python library I like for building reproducible, maintainable, and modular pipelines is kedro [2]:

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
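
For anyone who hasn’t used kedro, a pipeline definition looks roughly like this. This is only a sketch, reusing the made-up AEC names from the example above; a real kedro project would also register these datasets in a data catalog.

```python
from kedro.pipeline import node, pipeline

# The same two made-up single-purpose processes as in the sketch above
def build_ground_model(gi_data: dict) -> dict:
    return {"layers": gi_data["boreholes"]}

def generate_pile_layout(ground_model: dict, requirements: dict) -> dict:
    return {"pile_count": 4 * len(ground_model["layers"])}

# Each node wraps one process; its inputs and outputs are *named* datasets,
# which kedro wires together into a data -> process -> data graph.
aec_pipeline = pipeline([
    node(build_ground_model, inputs="gi_data", outputs="ground_model",
         name="build_ground_model"),
    node(generate_pile_layout, inputs=["ground_model", "requirements"],
         outputs="pile_layout", name="generate_pile_layout"),
])
```

Because inputs and outputs are referenced by dataset name rather than passed directly between functions, each node stays reusable on its own, and the whole definition can be drawn as a graph of datasets and processes.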

I especially like kedro because of kedro-viz, which is fantastic for visualizing such data processing pipelines. See this demo:

What do you think?

I think that it could be very useful to be able to visualize AEC project pipelines, and the parts of such pipelines that are automated with Speckle Automate.

I discussed this stuff with @KatherineC earlier this week at SpeckleCon, and thought it would be worth bringing to the attention of the Speckle Community and @Automatons.

Credit where credit is due

Apart from kedro, @RamonvanderHeijden, Evan Levelle and Martin Riese (2015) must also be credited for proposing this concept under the name “Building Information Generation” [1].

Sources:

  1. Van Der Heijden, R., Levelle, E., & Riese, M. (2015). Parametric building information generation for design and construction. In Computational Ecologies: Design in the Anthropocene – 35th Annual Conference of the Association for Computer Aided Design in Architecture (ACADIA) (pp. 417–429).
  2. Alam, S., Chan, N. L., Couto, L., Dada, Y., Danov, I., Datta, D., DeBold, T., Gundaniya, J., Honoré-Rougé, Y., Kaiser, S., Kanchwala, R., Katiyar, A., Pilla, R. K., Nguyen, H., Cano Rodríguez, J. L., Schwarzmann, J., Sorokin, D., Theisen, M., Zabłocki, M., & Brugman, S. (2024). Kedro (Version 0.19.9) [Computer software]. https://github.com/kedro-org/kedro

this deserves a more thorough response, but in short - yes, 1000%

“chaining functions” and visualizing your pipelines are very much in the mindshare, ty for the kedro ref. On a more sentimental note, I really hope that this is the place where we finally pull off a public library of these “modular, single-purpose scripts,” but we’re all well aware of the hurdles there. A damn good set of tools for doing this internally is a good step, imo, and would fill my heart a little
