To reproduce this project you will need:
- Google Cloud account
- Docker with docker-compose
- Git (a GitHub account is optional, for forking)
Note
You can use either your local machine or a virtual machine on Google Cloud; the local machine was chosen here to reduce cloud costs. If you prefer to run it on a virtual machine, please refer to the video below:
- Create an account with your Google email ID
- Set up your first project if you haven't already
- e.g., "truck-logistics"; note down the "Project ID" (we'll use it later when deploying infrastructure with Terraform)
- Create a service account
- Add a service account name and click 'Create and continue'.
- Grant the Viewer role to begin with.
- Create a service account key
- Under 'Actions' click on the 3 dots and 'Manage Keys'
- Click 'Add key' and 'Create new key', choosing the 'JSON' key type. The key will download to your local machine; move it to a safe directory.
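Once downloaded, it's worth locking the key file down. A minimal sketch, assuming the paths below (both `KEY_SRC` and `KEY_DIR` are hypothetical; adjust `KEY_SRC` to wherever your browser saved the file):

```shell
# Hypothetical paths; adjust KEY_SRC to where the key was downloaded.
KEY_SRC="$HOME/Downloads/truck-logistics-key.json"
KEY_DIR="$HOME/.gcp"

mkdir -p "$KEY_DIR"
chmod 700 "$KEY_DIR"                           # only the owner may enter
if [ -f "$KEY_SRC" ]; then
  mv "$KEY_SRC" "$KEY_DIR/"
  chmod 600 "$KEY_DIR/$(basename "$KEY_SRC")"  # owner read/write only
  echo "key moved to $KEY_DIR"
else
  echo "key not found at $KEY_SRC; adjust KEY_SRC"
fi
```

Restricting permissions matters because this key grants programmatic access to your cloud project.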
IAM Roles for the service account:
- Go to the IAM section of IAM & Admin https://console.cloud.google.com/iam-admin/iam
- Click the Edit principal icon for your service account.
- Add these roles in addition to Viewer: Storage Admin, Storage Object Admin, and BigQuery Admin.
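The same grants can also be scripted with gcloud; a hedged sketch (the project ID and service-account email are placeholders, and the commands are skipped with a message if gcloud isn't installed):

```shell
# Placeholder IDs; substitute your own project and service-account email.
PROJECT_ID="truck-logistics"
SA_EMAIL="my-service-account@${PROJECT_ID}.iam.gserviceaccount.com"

# Grant each role from the list above.
for ROLE in roles/storage.admin roles/storage.objectAdmin roles/bigquery.admin; do
  if command -v gcloud >/dev/null 2>&1; then
    gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="serviceAccount:${SA_EMAIL}" --role="$ROLE"
  else
    echo "would grant $ROLE to $SA_EMAIL"
  fi
done
```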
Enable these APIs for your project:
Please ensure the `GOOGLE_APPLICATION_CREDENTIALS` environment variable is set:

```shell
export GOOGLE_APPLICATION_CREDENTIALS="<path/to/your/service-account-authkeys>.json"

# Refresh token/session, and verify authentication
gcloud auth application-default login
```
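After setting the variable, a quick sanity check that it points at a readable JSON file can save a confusing failure later (this helper is not part of the project; python3 is used only because it's widely available):

```shell
# Check that GOOGLE_APPLICATION_CREDENTIALS points at a parseable JSON key.
KEY_FILE="${GOOGLE_APPLICATION_CREDENTIALS:-}"
if [ -n "$KEY_FILE" ] && [ -f "$KEY_FILE" ] \
   && python3 -m json.tool "$KEY_FILE" >/dev/null 2>&1; then
  STATUS="ok"
else
  STATUS="missing"
fi
echo "credentials check: $STATUS"
```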
```shell
git clone https://github.com/dieegogutierrez/Data-Engineering-Capstone-Project.git
cd mage-zoomcamp
```
- Rename the file `dev.env` to simply `.env`.
- Update the variables with your information, especially `LOCAL_PATH_SERVICE_ACCOUNT` (the path to your local service account file) and the `TF_VAR` variables (your cloud project information).
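For reference, the renamed `.env` might look something like the following. The exact variable names beyond the two mentioned above, and all the values, are hypothetical placeholders; use the keys that `dev.env` actually defines:

```shell
# .env — placeholder values only
LOCAL_PATH_SERVICE_ACCOUNT=/home/you/.gcp/service-account-key.json
TF_VAR_project=your-project-id
TF_VAR_region=us-central1
```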
```shell
./start.sh
```
- The script runs Terraform in Docker and creates the infrastructure in Google Cloud: a storage bucket and a BigQuery dataset.
- It then starts the orchestrator, MAGE, which loads local data, transforms it, and exports it to Google Cloud. Afterward, DBT builds models that produce the final table used by the dashboard.
- Access the orchestrator at http://localhost:6789/ and run the pipeline yourself.
- After completion, a table named 'trips_gross_revenue' is created in BigQuery, which can be used in Looker Studio to build a dashboard.
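To spot-check the final table from the command line, something like the following could work. The project and dataset names are placeholders (match them to your `TF_VAR` settings), and the query is skipped with a hint if the `bq` CLI isn't installed:

```shell
# Placeholder project/dataset; match them to your TF_VAR settings.
PROJECT_ID="your-project-id"
DATASET="your_dataset"
TABLE="${PROJECT_ID}.${DATASET}.trips_gross_revenue"

if command -v bq >/dev/null 2>&1; then
  # Preview the first rows of the table DBT built.
  bq query --use_legacy_sql=false "SELECT * FROM \`${TABLE}\` LIMIT 10"
else
  echo "bq CLI not found; preview ${TABLE} in the BigQuery console instead"
fi
```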