Exploring Mage.ai
I have been exploring Mage.ai, a modern framework for building data pipelines and an alternative to Airflow. It features an intuitive UI that lets even beginners create, orchestrate, and monitor data pipelines.
What I like about Mage.ai is its good documentation and developer support: they have a [Slack channel](https://mageai.slack.com/join/shared_invite/zt-31zaw8y3o-fc5~ZwTgzKg8UjF9hoSBxw#/shared-invite/email) with an AI bot that can help with any questions you may have.
Mage.ai offers several powerful features, including:
- Version control with Git
- Built-in integration with generative AI tools
- Support for various pipeline types, such as batch processing, data integration, and streaming
- dbt integration
Personally, I have been using it mainly for simple pipelines, such as ETL processes and basic task scheduling. Mage.ai is easy to use and can be hosted on your own server or on Kubernetes.
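For context, a Mage pipeline is built from blocks, and each block is just a Python file. A simple extract step looks roughly like the sketch below; the URL is a placeholder, and the scaffold follows the template Mage generates for data loader blocks:

```python
import io

import pandas as pd
import requests

# Mage injects the decorator at runtime; this guard makes the file
# importable on its own as well.
if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data(*args, **kwargs):
    # Placeholder source; swap in your own API or file.
    url = 'https://example.com/data.csv'
    response = requests.get(url, timeout=30)
    # The return value is passed to the next block in the pipeline.
    return pd.read_csv(io.StringIO(response.text))
```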
Below is a sample Dockerfile that you can use to run Mage.ai in your VPC:
```dockerfile
FROM python:3.11.0

LABEL description="Deploy Mage on Linux"

WORKDIR /app

# System packages needed to build some Python dependencies.
RUN apt-get update && apt-get install -y \
        build-essential \
        curl \
    && rm -rf /var/lib/apt/lists/*

# Install Python requirements first so Docker caches this layer
# independently of source changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt && \
    pip install --no-cache-dir mage-ai==0.9.72

COPY . /app

ENV ENV=PROD

# Mage's web UI listens on port 6789 by default.
EXPOSE 6789

ENTRYPOINT ["/bin/bash", "-c"]
CMD ["mage start pipelines"]
```
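From the project root, something like `docker build -t mage-deploy .` followed by `docker run -p 6789:6789 mage-deploy` should bring the UI up at http://localhost:6789 (the image tag here is just an example; adjust the port mapping to your setup).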
One way I have been using Mage.ai is to monitor the uptime of my own projects. Under the Triggers tab, you can set up a cron schedule that runs a particular pipeline at a specific time.
In this case, I am calling my SystemHealthz pipeline every day at 2pm (cron expression `0 14 * * *`).
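The pipeline itself can be a single block that pings each service and fails the run on a bad response. A minimal sketch, assuming a hypothetical list of endpoints:

```python
import requests

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader

# Hypothetical endpoints for my own projects.
ENDPOINTS = [
    'https://example.com/healthz',
]


@data_loader
def check_uptime(*args, **kwargs):
    results = []
    for url in ENDPOINTS:
        response = requests.get(url, timeout=10)
        # A non-2xx response raises here, which fails the pipeline run
        # and surfaces in Mage's monitoring.
        response.raise_for_status()
        results.append({'url': url, 'status': response.status_code})
    return results
```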
Currently, I am building a pipeline that adds a document to an Elasticsearch index whenever a new entry is added to a database. Perhaps I will write a follow-up post on this topic.
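The export step would look roughly like the sketch below, assuming the official `elasticsearch` Python client, a local cluster URL, and a hypothetical `documents` index:

```python
from elasticsearch import Elasticsearch

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_to_elasticsearch(df, *args, **kwargs):
    # Assumed local cluster; point this at your own deployment.
    es = Elasticsearch('http://localhost:9200')
    # df is the output of the upstream block (the new database rows).
    for _, row in df.iterrows():
        es.index(index='documents', document=row.to_dict())
```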
Cheers and thanks for reading this short post!