173 changes: 173 additions & 0 deletions examples/data-science-analystics/cpu-bokeh-geoviz/make_notebook.py
@@ -0,0 +1,173 @@
import json

# Define the notebook structure with your new standard format
notebook_content = {
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div align=\"left\">\n",
" <img src=\"./bokeh-geo-icon.png\" width=\"80\">\n",
"</div>\n",
"\n",
"# 🌍 Bokeh Notebook — Geo Visualization\n",
"\n",
"### **Template Review**\n",
"This template provides a ready-to-run setup for geospatial data analysis and interactive mapping on **Saturn Cloud**. Sized for **CPU resources**, it shows how to process geographic coordinates and render an interactive map. The goal is to plot incidents on a map with hover tooltips, so users can inspect individual data points.\n",
"\n",
"### **Dataset Overview**\n",
"The template uses a **Geospatial Incident** toy dataset of simulated event coordinates (latitude and longitude), incident types, and severity levels. It is small enough to exercise coordinate reference system (CRS) transformations and interactive glyph rendering quickly.\n",
"\n",
"### **Tech Stack**\n",
"* **Python**: The core language for spatial logic and data processing.\n",
"* **GeoPandas**: Extends Pandas to allow spatial operations on geometric types, handling the transformation of raw coordinates into map-ready shapes.\n",
"* **Bokeh**: A visualization library used here to create an interactive, web-ready map with a custom hover tool.\n",
"\n",
"---\n",
"\n",
"## 🚀 Quick Start\n",
"The Saturn Cloud environment is pre-configured for Jupyter. Run the following cells to install the specialized geospatial libraries and launch the interactive map.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### **Step 1: Install Required Libraries**\n",
"In this step, we install the specific libraries needed for geospatial visualization. This includes **Bokeh** for the interactive mapping engine and **GeoPandas** for handling spatial data structures."
]
},
{
"cell_type": "code",
"execution_count": None,
"metadata": {},
"outputs": [],
"source": [
"# Install geospatial and interactive visualization libraries\n",
"!pip install bokeh geopandas shapely"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### **Step 2: Load and Prepare Geospatial Data**\n",
"We build a GeoDataFrame from the incident coordinates (EPSG:4326 latitude/longitude), then reproject it to Web Mercator (EPSG:3857), the coordinate system used by Bokeh's mercator axes and most web map tiles."
]
},
{
"cell_type": "code",
"execution_count": None,
"metadata": {},
"outputs": [],
"source": [
"import geopandas as gpd\n",
"import pandas as pd\n",
"\n",
"# Create toy incident data\n",
"data = {\n",
" 'Incident_ID': [1, 2, 3, 4],\n",
" 'Type': ['Maintenance', 'Emergency', 'Inquiry', 'Maintenance'],\n",
" 'Severity': ['Low', 'High', 'Medium', 'Low'],\n",
" 'lat': [37.7749, 37.7849, 37.7649, 37.7549],\n",
" 'lon': [-122.4194, -122.4094, -122.4294, -122.4394]\n",
"}\n",
"\n",
"df = pd.DataFrame(data)\n",
"# Convert to GeoDataFrame\n",
"gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.lon, df.lat), crs=\"EPSG:4326\")\n",
"\n",
"# Project to Web Mercator for Bokeh compatibility\n",
"gdf = gdf.to_crs(\"EPSG:3857\")\n",
"gdf['x'] = gdf.geometry.x\n",
"gdf['y'] = gdf.geometry.y\n",
"gdf.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### **Step 3: Build the Interactive Map with Hover Filters**\n",
"Using Bokeh, we render a map background (tile provider) and overlay the incident points. We configure a **HoverTool** to display incident details when the user moves their cursor over a point."
]
},
{
"cell_type": "code",
"execution_count": None,
"metadata": {},
"outputs": [],
"source": [
"from bokeh.plotting import figure, show\n",
"from bokeh.models import HoverTool\n",
"from bokeh.io import output_notebook\n",
"\n",
"output_notebook()\n",
"\n",
"# Initialize the map figure\n",
"p = figure(x_axis_type=\"mercator\", y_axis_type=\"mercator\", \n",
" title=\"Incident Map: Localized Hover Filters\",\n",
" active_scroll=\"wheel_zoom\")\n",
"\n",
"# Add OpenStreetMap background tiles\n",
"p.add_tile(\"OSM\")\n",
"\n",
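"# Plot incidents as circle markers; drop the geometry column because Bokeh cannot serialize shapely objects\n",
"p.scatter(x='x', y='y', size=10, color=\"red\", alpha=0.7, source=gdf.drop(columns='geometry'))\n",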
"\n",
"# Add a hover tool that shows ID, Type, and Severity for each point\n",
"hover = HoverTool()\n",
"hover.tooltips = [\n",
" (\"ID\", \"@Incident_ID\"),\n",
" (\"Type\", \"@Type\"),\n",
" (\"Severity\", \"@Severity\")\n",
"]\n",
"p.add_tools(hover)\n",
"\n",
"show(p)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 🔗 Resources and Support\n",
"For further information on the platform or the libraries used in this template, please refer to the following official links:\n",
"\n",
"* **Platform**: [Saturn Cloud Dashboard](https://saturncloud.io/)\n",
"* **Support**: [Saturn Cloud Documentation](https://saturncloud.io/docs/)\n",
"* **Library**: [Bokeh Documentation](https://docs.bokeh.org/)\n",
"* **Library**: [GeoPandas Documentation](https://geopandas.org/)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

# Save as .ipynb
with open("bokeh_geo_visualization.ipynb", "w", encoding="utf-8") as f:
    json.dump(notebook_content, f, indent=1)

print("SUCCESS: bokeh_geo_visualization.ipynb has been created!")
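Once the script has written the notebook, the file's structure could be sanity-checked before committing it. The following is a minimal sketch using only the standard library; `summarize_notebook` and `summarize_file` are hypothetical helpers that are not part of this PR, and the check covers only the keys visible in the dictionary above, not the full nbformat schema.

```python
import json
from collections import Counter


def summarize_notebook(nb: dict) -> dict:
    """Return basic structural facts about an nbformat-4 style notebook dict."""
    cells = nb.get("cells", [])
    counts = Counter(cell["cell_type"] for cell in cells)
    # The code cells built by the script carry both of these keys;
    # record the index of any code cell that is missing one.
    incomplete = [
        i for i, cell in enumerate(cells)
        if cell["cell_type"] == "code"
        and not {"outputs", "execution_count"} <= cell.keys()
    ]
    return {
        "nbformat": nb.get("nbformat"),
        "markdown_cells": counts["markdown"],
        "code_cells": counts["code"],
        "incomplete_code_cells": incomplete,
    }


def summarize_file(path: str) -> dict:
    """Load a .ipynb file from disk and summarize it."""
    with open(path, encoding="utf-8") as f:
        return summarize_notebook(json.load(f))
```

Calling `summarize_file("bokeh_geo_visualization.ipynb")` after running `make_notebook.py` should report five markdown cells, three code cells, and an empty `incomplete_code_cells` list for the dictionary as written above.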
@@ -0,0 +1,176 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f7a343b1",
"metadata": {},
"source": [
"# 📂 Data Versioning with DVC & S3\n",
"\n",
"<div align=\"center\">\n",
" <img src=\"./asset/DAG-icon.png\" width=\"200\">\n",
"</div>\n",
"\n",
"### **Template Review**\n",
"This template demonstrates a data management workflow using **DVC**. Built for **Saturn Cloud Jupyter Notebooks**, it lets you track large datasets and keep results reproducible without bloating your Git repository.\n",
"\n",
"**Core Workflow:** We will use a **Local Directory** to simulate an S3 bucket for free, while providing the boilerplate code required to switch to a **Live AWS S3** bucket.\n",
"\n",
"### **Tech Stack**\n",
"* **DVC**: The core data versioning engine.\n",
"* **S3 (Boto3)**: Backend support for cloud object storage.\n",
"* **Infrastructure**: [Saturn Cloud](https://saturncloud.io/) CPU Jupyter Instance."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2a1efbac",
"metadata": {},
"outputs": [],
"source": [
"# Wrap package in quotes to prevent 'zsh: no matches found' errors\n",
"!pip install \"dvc[s3]\" boto3 pandas -q"
]
},
{
"cell_type": "markdown",
"id": "c4bc7738",
"metadata": {},
"source": [
"### **Step 1: Initialize DVC**\n",
"Set up the local environment to track data metadata."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c9992ae3",
"metadata": {},
"outputs": [],
"source": [
"!dvc init --no-scm -f\n",
"print(\"DVC initialized successfully.\")"
]
},
{
"cell_type": "markdown",
"id": "d81638aa",
"metadata": {},
"source": [
"### **Step 2: Configure Remotes (Local vs. Remote S3)**\n",
"We use a local folder to simulate S3 for testing, but provide the logic for a real S3 bucket."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "30f75985",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# --- OPTION A: LOCAL SIMULATION (Free) ---\n",
"local_storage = \"/tmp/dvc_simulated_s3\"\n",
"os.makedirs(local_storage, exist_ok=True)\n",
"!dvc remote add -d local_remote {local_storage} -f\n",
"\n",
"# --- OPTION B: REMOTE S3 BUCKET (Production Setup) ---\n",
"# To use a real bucket, uncomment the lines below and provide your URI\n",
"# S3_URI = \"s3://your-real-bucket-name/data-folder\"\n",
"# !dvc remote add -d s3_remote {S3_URI} -f\n",
"\n",
"print(f\"✅ Active Remote set to: {local_storage}\")"
]
},
{
"cell_type": "markdown",
"id": "8006eadb",
"metadata": {},
"source": [
"### **Step 3: Track, Version, and Push Data**\n",
"We create a dummy dataset and 'push' it to our remote simulation."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "21d00790",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# 1. Create dataset\n",
"df = pd.DataFrame({'feature_x': [10, 20, 30], 'label': [1, 0, 1]})\n",
"df.to_csv('my_dataset.csv', index=False)\n",
"\n",
"# 2. Track with DVC\n",
"!dvc add my_dataset.csv\n",
"\n",
"# 3. Push to storage (Local Sim or S3)\n",
"!dvc push\n",
"\n",
"print(\"\\nData pushed to remote storage. Tracking file 'my_dataset.csv.dvc' created.\")"
]
},
{
"cell_type": "markdown",
"id": "59b5e4d7",
"metadata": {},
"source": [
"### **Step 4: The Recovery Test**\n",
"Prove reproducibility by deleting the local data and pulling it back from the remote."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7fd78c57",
"metadata": {},
"outputs": [],
"source": [
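"import shutil\n",
"\n",
"# Delete the working copy and the local DVC cache so the pull has to read from the remote\n",
"os.remove('my_dataset.csv')\n",
"shutil.rmtree('.dvc/cache', ignore_errors=True)\n",
"print(\"Local dataset and DVC cache deleted.\")\n",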
"\n",
"!dvc pull\n",
"print(\"\\n✅ Data successfully recovered from remote storage!\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"## 🏁 Conclusion & Next Steps\n",
"You have successfully versioned a dataset using **DVC**. To move this to a production environment, point your DVC remote at a real S3 URI and make sure your AWS credentials (for example `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`) are set as environment variables.\n",
"\n",
"### **Resources & Backlinks**\n",
"* **Cloud Infrastructure**: [Deploy on Saturn Cloud](https://saturncloud.io/)\n",
"* **DVC Guide**: [S3 Remote Configuration](https://dvc.org/doc/user-guide/data-management/remote-storage/amazon-s3)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "cpu-plotly-env",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
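The recovery test in Step 4 reports success without comparing file contents. One way to make the check explicit is to record a checksum before deleting the file and compare it after the pull. This is a minimal sketch under that assumption; `file_md5` and `is_restored` are hypothetical helpers, and the digest is computed independently here rather than read from the `.dvc` tracking file.

```python
import hashlib
from pathlib import Path


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_restored(path: str, expected_md5: str) -> bool:
    """True if the file exists and its content matches the recorded digest."""
    return Path(path).is_file() and file_md5(path) == expected_md5
```

In the notebook, this would mean storing `before = file_md5("my_dataset.csv")` ahead of the `os.remove` call and checking `is_restored("my_dataset.csv", before)` after `!dvc pull`.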