Mage has features that let you build highly dynamic and powerful pipelines, but looking through all of them can feel overwhelming at first. In this post, I’ll show you how to use two basic pipeline features, Global Variables and block variables, to take your Mage project code to the next level.
Note: I am assuming some familiarity with blocks in Mage.
Global Variables
You have probably already seen the Global Variables option on the Edit Pipeline page. It’s simple to understand but also extremely powerful. Here’s a quick example of using these variables to build one pipeline you can reuse in many contexts.
Let’s say you work at a SaaS company that uses Postgres as its operational database. You want to extract some data from this database and stage it in a data warehouse for analytics. Right now, all you need is the `orders` table. You could write a data loader named `load_orders` that looks like this:
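```python
from os import path

from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.postgres import Postgres
from mage_ai.settings.repo import get_repo_path

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data_from_postgres(*args, **kwargs):
    """
    Load orders from a Postgres database.
    """
    query = '''
    SELECT
        id,
        customer_id,
        product_id,
        quantity,
        price,
        created_at
    FROM
        orders
    '''
    # Connection settings come from the 'default' profile in io_config.yaml.
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'
    with Postgres.with_config(ConfigFileLoader(config_path, config_profile)) as loader:
        return loader.load(query)
```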
Later, say you also need to migrate customer data from the same Postgres database. You could create a new loader called `load_customers` that looks like this:
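```python
from os import path

from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.postgres import Postgres
from mage_ai.settings.repo import get_repo_path

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data_from_postgres(*args, **kwargs):
    """
    Load customers from a Postgres database.
    """
    query = '''
    SELECT
        id,
        name,
        address,
        city,
        state,
        zip,
        sales_rep_id
    FROM
        customers
    '''
    # Same connection settings as the orders loader.
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'
    with Postgres.with_config(ConfigFileLoader(config_path, config_profile)) as loader:
        return loader.load(query)
```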
This is fine in a small project, but as you add tables, you end up with many extra blocks that all do essentially the same thing: load data from the same Postgres database.
Here’s how you can use a variable to consolidate these blocks:
1. Set a global variable for the pipeline named `table`.
2. Give it a default value. We’ll use `customers`.
3. Modify the query in the code block to read the table name from `kwargs['table']`.
The new generic code block looks like this:
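```python
from os import path

from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.postgres import Postgres
from mage_ai.settings.repo import get_repo_path

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data_from_postgres(*args, **kwargs):
    """
    Load the table named by the `table` global variable from a Postgres database.
    """
    # Global Variables (and trigger runtime variables) arrive through kwargs.
    # The name is interpolated into the SQL, so only use trusted table names.
    table_name = kwargs['table']
    query = f'''
    SELECT
        *
    FROM
        {table_name}
    '''
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'
    with Postgres.with_config(ConfigFileLoader(config_path, config_profile)) as loader:
        return loader.load(query)
```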
When you create a trigger, you can set the `table` runtime variable to the particular table that trigger should migrate, so a single pipeline can handle several tables.
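Here is a minimal plain-Python sketch of that flow, with no Mage dependency so it runs anywhere. It assumes a trigger’s runtime variables simply override the pipeline’s defaults before they reach the block as keyword arguments; `PIPELINE_DEFAULTS`, `resolve_variables`, and `build_query` are hypothetical names. Because the table name is interpolated into SQL with an f-string, the sketch also rejects anything that isn’t a plain identifier.

```python
import re
from typing import Optional

# Hypothetical pipeline defaults, mirroring the `table` Global Variable set above.
PIPELINE_DEFAULTS = {"table": "customers"}

# Plain SQL identifiers only: a letter or underscore, then letters, digits, underscores.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def resolve_variables(defaults: dict, runtime: Optional[dict] = None) -> dict:
    """Merge a trigger's runtime variables over the pipeline defaults (assumed precedence)."""
    merged = dict(defaults)
    merged.update(runtime or {})
    return merged


def build_query(**kwargs) -> str:
    """Build the generic query from kwargs['table'], rejecting anything but a plain name."""
    table_name = kwargs["table"]
    if not _IDENTIFIER.fullmatch(table_name):
        raise ValueError(f"Refusing to interpolate table name: {table_name!r}")
    return f"SELECT * FROM {table_name}"


# Two hypothetical triggers running the same pipeline.
orders_query = build_query(**resolve_variables(PIPELINE_DEFAULTS, {"table": "orders"}))
default_query = build_query(**resolve_variables(PIPELINE_DEFAULTS))
print(orders_query)   # SELECT * FROM orders
print(default_query)  # SELECT * FROM customers
```

The regular expression is deliberately strict; a schema-qualified name such as `public.orders` would be rejected and would need a looser check.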
Pros and Cons of Using Global Variables
Pros
Simple approach that’s easy to implement.
The block can be reused across multiple pipelines.
Global Variables can be set at the trigger level, allowing a single pipeline to process multiple tables.
Cons
Because the query has to work for any table, a SQL loader like this one must use `SELECT *`, which isn’t good practice: it pulls every column, whether you need it or not.
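One way to soften this limitation, which is our suggestion rather than anything Mage provides, is to keep a hypothetical mapping from each allowed table to its column list and build the query from the `table` variable. The column lists below come from the `load_orders` and `load_customers` queries earlier; inside the loader you would pass the resulting string to `loader.load()`.

```python
# Hypothetical allow-list: each table the pipeline may load, with explicit columns.
# Column lists are copied from the load_orders and load_customers queries above.
TABLE_COLUMNS = {
    "orders": ["id", "customer_id", "product_id", "quantity", "price", "created_at"],
    "customers": ["id", "name", "address", "city", "state", "zip", "sales_rep_id"],
}


def build_explicit_query(table_name: str) -> str:
    """Return a SELECT that names its columns instead of using SELECT *."""
    if table_name not in TABLE_COLUMNS:
        # Unknown tables are rejected, which also keeps arbitrary input out of the SQL.
        raise KeyError(f"No column list registered for table {table_name!r}")
    columns = ",\n    ".join(TABLE_COLUMNS[table_name])
    return f"SELECT\n    {columns}\nFROM\n    {table_name}"


print(build_explicit_query("orders"))
```

The trade-off is that the mapping becomes one more thing to maintain whenever a table’s schema changes, but it also doubles as an allow-list of tables the pipeline is permitted to load.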
Using Block Variables
This approach is similar to Global Variables, except that you set the variable on the block itself rather than at the pipeline level.
To set the variable, you need to edit the block settings. A button in the top left of the block code window opens the block settings.
Here’s what the code will look like:
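```python
from os import path

from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.postgres import Postgres
from mage_ai.settings.repo import get_repo_path

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data_from_postgres(*args, **kwargs):
    """
    Load the table named by the `table` block variable from a Postgres database.
    """
    # Block variables arrive through kwargs['configuration'].
    table_name = kwargs['configuration'].get('table')
    if not table_name:
        raise ValueError("Set the 'table' variable in the block settings.")
    query = f'''
    SELECT
        *
    FROM
        {table_name}
    '''
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'
    with Postgres.with_config(ConfigFileLoader(config_path, config_profile)) as loader:
        return loader.load(query)
```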
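To illustrate the reuse this enables, here is a plain-Python simulation that calls one function twice, each time with its own `configuration` dict, the way two copies of the block would each carry their own block settings. It assumes Mage passes block settings in `kwargs['configuration']`, as the loader above reads them; the block names and `build_block_query` are hypothetical, and the function returns the SQL instead of running it, so no database is needed.

```python
def build_block_query(*args, **kwargs) -> str:
    """Stand-in for the block: read the block variable and return the SQL."""
    table_name = kwargs["configuration"].get("table")
    if not table_name:
        raise ValueError("Block variable 'table' is not set in the block settings")
    return f"SELECT * FROM {table_name}"


# Hypothetical: two copies of the same block, each with its own block variable.
block_configs = {
    "load_orders": {"table": "orders"},
    "load_customers": {"table": "customers"},
}

queries = {
    block_name: build_block_query(configuration=config)
    for block_name, config in block_configs.items()
}
print(queries)
```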
Pros and Cons of Using Block Variables
Pros
As with Global Variables, you can reuse the same block across multiple pipelines, and you can now also reuse it within the same pipeline.
Cons
You still have to use `SELECT *`.
Block variables can’t be set in a trigger, so the pipeline itself is no longer generic: changing the table means editing the block settings.
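If you want both per-block settings and trigger control, one option (an assumption on our part, not a built-in Mage feature) is to read the block variable first and fall back to the pipeline variable when the block leaves it unset. The hypothetical `resolve_table` helper below sketches that precedence in plain Python.

```python
from typing import Optional


def resolve_table(**kwargs) -> Optional[str]:
    """Prefer the block variable; fall back to the pipeline/trigger variable."""
    block_config = kwargs.get("configuration") or {}
    return block_config.get("table") or kwargs.get("table")


# Block variable set: it wins regardless of the trigger.
print(resolve_table(configuration={"table": "orders"}, table="customers"))  # orders
# No block variable: the trigger's runtime variable is used instead.
print(resolve_table(configuration={}, table="customers"))  # customers
```

Blocks without a block variable then stay generic and follow the trigger, while blocks that set one keep their fixed table.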
More to come…
We’re just scratching the surface of what we can do with Mage to simplify your pipeline code. Stay tuned for more tips and techniques for managing Mage project code and how to get the most out of it.