Dagster & Sigma
This feature is in beta. It is still being tested and may change. For more information, see the API lifecycle stages documentation.
Your Sigma assets, including datasets and workbooks, can be represented in the Dagster asset graph, allowing you to track lineage and dependencies between Sigma assets and upstream data assets you are already modeling in Dagster.
What you'll learn
- How to represent Sigma assets in the Dagster asset graph, including lineage to other Dagster assets.
- How to customize asset definition metadata for these Sigma assets.
Prerequisites
- The dagster-sigma library installed in your environment
- Familiarity with asset definitions and the Dagster asset graph
- Familiarity with Dagster resources
- Familiarity with Sigma concepts, like datasets and workbooks
- A Sigma organization
- A Sigma client ID and client secret. For more information, see Generate API client credentials in the Sigma documentation.
Set up your environment
To get started, you'll need to install the dagster and dagster-sigma Python packages:
- uv: uv add dagster-sigma
- pip: pip install dagster-sigma
Represent Sigma assets in the asset graph
To load Sigma assets into the Dagster asset graph, you must first construct a SigmaOrganization resource, which allows Dagster to communicate with your Sigma organization. You'll need to supply your client ID and client secret alongside the base URL. See Identify your API request URL in the Sigma documentation for more information on how to find your base URL.
Dagster can automatically load all datasets and workbooks from your Sigma organization as asset specs. Call the load_sigma_asset_specs function, which returns a list of AssetSpec objects representing your Sigma assets. You can then include these asset specs in your Definitions object:
from dagster_sigma import SigmaBaseUrl, SigmaOrganization, load_sigma_asset_specs
import dagster as dg
sigma_organization = SigmaOrganization(
    base_url=SigmaBaseUrl.AWS_US,
    client_id=dg.EnvVar("SIGMA_CLIENT_ID"),
    client_secret=dg.EnvVar("SIGMA_CLIENT_SECRET"),
)
sigma_specs = load_sigma_asset_specs(sigma_organization)
defs = dg.Definitions(assets=[*sigma_specs], resources={"sigma": sigma_organization})
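Because the example above reads its credentials from environment variables through dg.EnvVar, an unset variable only surfaces once Dagster tries to talk to Sigma. As an optional safeguard, a small preflight check can report missing variables early. This is a minimal sketch, not part of dagster-sigma: the helper name missing_sigma_env_vars is hypothetical, and the variable names are simply the ones used in the example above.

```python
import os
from typing import Mapping

# Variable names taken from the example above; adjust them if yours differ.
REQUIRED_SIGMA_ENV_VARS = ("SIGMA_CLIENT_ID", "SIGMA_CLIENT_SECRET")


def missing_sigma_env_vars(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; it is not a dagster-sigma API.
    """
    return [name for name in REQUIRED_SIGMA_ENV_VARS if not environ.get(name)]
```

You could call missing_sigma_env_vars() in a deployment script or CI step and stop with a clear message if the returned list is not empty.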
Load Sigma assets from filtered workbooks
You can load a subset of your Sigma assets by passing a SigmaFilter to the load_sigma_asset_specs function. The SigmaFilter object lets you specify the folders from which to load Sigma workbooks and configure which datasets are represented as assets.
Note that the content and size of your Sigma organization may affect the performance of your Dagster deployments. Filtering the set of workbooks from which your Sigma assets are loaded is particularly useful for improving loading times.
from dagster_sigma import (
    SigmaBaseUrl,
    SigmaFilter,
    SigmaOrganization,
    load_sigma_asset_specs,
)
import dagster as dg
sigma_organization = SigmaOrganization(
    base_url=SigmaBaseUrl.AWS_US,
    client_id=dg.EnvVar("SIGMA_CLIENT_ID"),
    client_secret=dg.EnvVar("SIGMA_CLIENT_SECRET"),
)
sigma_specs = load_sigma_asset_specs(
    organization=sigma_organization,
    sigma_filter=SigmaFilter(
        # Filter down to only the workbooks in these folders
        workbook_folders=[
            ("my_folder", "my_subfolder"),
            ("my_folder", "my_other_subfolder"),
        ],
        # Specify whether to include datasets that are not used in any workbooks
        # default is True
        include_unused_datasets=False,
    ),
)
defs = dg.Definitions(assets=[*sigma_specs], resources={"sigma": sigma_organization})
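Each entry in workbook_folders above is a tuple of folder names read from the top level downward. As a mental model only, the selection can be thought of as a path-prefix test. The sketch below assumes that a listed folder also selects everything nested beneath it and that names are compared exactly; the matching logic inside dagster-sigma may differ, and in_selected_folder is a hypothetical helper, not a library function.

```python
from typing import Sequence


def in_selected_folder(
    workbook_path: Sequence[str],
    workbook_folders: Sequence[Sequence[str]],
) -> bool:
    """Return True if workbook_path starts with any of the selected folder paths.

    Illustrative model only: assumes prefix matching with exact name comparison.
    """
    return any(
        tuple(workbook_path[: len(folder)]) == tuple(folder)
        for folder in workbook_folders
    )


# The same folder selection as in the example above
selected = [("my_folder", "my_subfolder"), ("my_folder", "my_other_subfolder")]
```

Under these assumptions, a workbook stored under my_folder/my_subfolder would be selected, while one stored directly in my_folder would not.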
Load Sigma assets using a snapshot
Sigma assets can be loaded using the snapshot of a Sigma organization, which allows organizations with large amounts of Sigma data to speed up their deployment process.
from dagster_sigma import SigmaBaseUrl, SigmaOrganization, load_sigma_asset_specs
import dagster as dg
sigma_organization = SigmaOrganization(
    base_url=SigmaBaseUrl.AWS_US,
    client_id=dg.EnvVar("SIGMA_CLIENT_ID"),
    client_secret=dg.EnvVar("SIGMA_CLIENT_SECRET"),
)
sigma_specs = load_sigma_asset_specs(
    organization=sigma_organization, snapshot_path=dg.EnvVar("SIGMA_SNAPSHOT_PATH")
)
defs = dg.Definitions(assets=[*sigma_specs], resources={"sigma": sigma_organization})
To capture the snapshot, use the dagster-sigma snapshot CLI:
dagster-sigma snapshot --python-module my_dagster_package --output-path snapshot.snap
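A snapshot is a point-in-time capture, so it presumably needs to be regenerated when your Sigma organization changes. One way to guard against deploying with a stale or missing file is to check its age before deployment. The following is a sketch under that assumption: snapshot_is_fresh and the 24-hour default are hypothetical choices, not part of dagster-sigma.

```python
import time
from pathlib import Path
from typing import Optional, Union


def snapshot_is_fresh(
    snapshot_path: Union[str, Path],
    max_age_hours: float = 24.0,
    now: Optional[float] = None,
) -> bool:
    """Return True if the snapshot file exists and was modified recently enough.

    Hypothetical helper: `now` is a Unix timestamp and defaults to the current time.
    """
    path = Path(snapshot_path)
    if not path.is_file():
        return False
    current = time.time() if now is None else now
    age_seconds = current - path.stat().st_mtime
    return age_seconds <= max_age_hours * 3600
```

A CI job could rerun the snapshot command shown above whenever this check returns False.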
Customize asset definition metadata for Sigma assets
By default, Dagster will generate asset specs for each Sigma asset based on its type, and populate default metadata. You can further customize asset properties by passing a custom DagsterSigmaTranslator subclass to the load_sigma_asset_specs function. This subclass can implement methods to customize the asset specs for each Sigma asset type.
from dagster_sigma import (
    DagsterSigmaTranslator,
    SigmaBaseUrl,
    SigmaOrganization,
    SigmaWorkbookTranslatorData,
    load_sigma_asset_specs,
)
import dagster as dg
sigma_organization = SigmaOrganization(
    base_url=SigmaBaseUrl.AWS_US,
    client_id=dg.EnvVar("SIGMA_CLIENT_ID"),
    client_secret=dg.EnvVar("SIGMA_CLIENT_SECRET"),
)
# A translator class lets us customize properties of the built Sigma assets, such as the owners or asset key
class MyCustomSigmaTranslator(DagsterSigmaTranslator):
    def get_asset_spec(self, data: SigmaWorkbookTranslatorData) -> dg.AssetSpec:  # pyright: ignore[reportIncompatibleMethodOverride]
        # We create the default asset spec using super()
        default_spec = super().get_asset_spec(data)
        # we customize the team owner tag for all Sigma assets
        return default_spec.replace_attributes(owners=["team:my_team"])
sigma_specs = load_sigma_asset_specs(
    sigma_organization, dagster_sigma_translator=MyCustomSigmaTranslator()
)
defs = dg.Definitions(assets=[*sigma_specs], resources={"sigma": sigma_organization})
Note that super() is called in each of the overridden methods to generate the default asset spec. It is best practice to generate the default asset spec before customizing it.
Load Sigma assets from multiple organizations
Definitions from multiple Sigma organizations can be combined by instantiating multiple SigmaOrganization resources and merging their specs. This lets you view all your Sigma assets in a single asset graph:
from dagster_sigma import SigmaBaseUrl, SigmaOrganization, load_sigma_asset_specs
import dagster as dg
sales_team_organization = SigmaOrganization(
    base_url=SigmaBaseUrl.AWS_US,
    client_id=dg.EnvVar("SALES_SIGMA_CLIENT_ID"),
    client_secret=dg.EnvVar("SALES_SIGMA_CLIENT_SECRET"),
)
marketing_team_organization = SigmaOrganization(
    base_url=SigmaBaseUrl.AWS_US,
    client_id=dg.EnvVar("MARKETING_SIGMA_CLIENT_ID"),
    client_secret=dg.EnvVar("MARKETING_SIGMA_CLIENT_SECRET"),
)
sales_team_specs = load_sigma_asset_specs(sales_team_organization)
marketing_team_specs = load_sigma_asset_specs(marketing_team_organization)
# Merge the specs into a single set of definitions
defs = dg.Definitions(
    assets=[*sales_team_specs, *marketing_team_specs],
    resources={
        "marketing_sigma": marketing_team_organization,
        "sales_sigma": sales_team_organization,
    },
)
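Dagster requires asset keys to be unique within a Definitions object. If two organizations happen to produce the same key (an assumption about your data, for example two workbooks with the same name), merging their specs could fail. A small sketch for spotting such overlaps before merging follows; duplicated_keys is a hypothetical helper that works on plain key strings.

```python
from collections import Counter
from typing import Iterable


def duplicated_keys(*key_groups: Iterable[str]) -> list[str]:
    """Return, in sorted order, the keys that appear more than once across all groups.

    Hypothetical helper for illustration; it is not a Dagster or dagster-sigma API.
    """
    counts = Counter(key for group in key_groups for key in group)
    return sorted(key for key, count in counts.items() if count > 1)
```

With real specs, you could pass in [spec.key.to_user_string() for spec in sales_team_specs] and the equivalent list for the marketing organization. If overlaps are reported, a custom DagsterSigmaTranslator that changes the asset key is one possible remedy.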
Customize upstream dependencies
By default, Dagster sets upstream dependencies when generating asset specs for your Sigma assets. To do so, Dagster reads information about the assets upstream of each Sigma asset from the Sigma organization itself. You can customize how upstream dependencies are set on your Sigma assets by passing an instance of a custom DagsterSigmaTranslator subclass to the load_sigma_asset_specs function.
The following example defines my_upstream_asset as an upstream dependency of my_sigma_workbook:
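from typing import Union
from dagster_sigma import (
    DagsterSigmaTranslator,
    SigmaDatasetTranslatorData,
    SigmaWorkbookTranslatorData,
    load_sigma_asset_specs,
)
import dagster as dg
# Assumes sigma_organization is the SigmaOrganization resource constructed in the earlier examples
class MyCustomSigmaTranslator(DagsterSigmaTranslator):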
    def get_asset_spec(
        self, data: Union[SigmaDatasetTranslatorData, SigmaWorkbookTranslatorData]
    ) -> dg.AssetSpec:
        # We create the default asset spec using super()
        default_spec = super().get_asset_spec(data)
        # We customize upstream dependencies for the Sigma workbook named `my_sigma_workbook`
        return default_spec.replace_attributes(
            deps=["my_upstream_asset"]
            if data.properties["name"] == "my_sigma_workbook"
            else ...
        )
sigma_specs = load_sigma_asset_specs(
    sigma_organization, dagster_sigma_translator=MyCustomSigmaTranslator()
)
Note that super() is called in each of the overridden methods to generate the default asset spec. It is best practice to generate the default asset spec before customizing it.