Commit 9b6b93a

feat: vector embedding demo (#2)
1 parent fd2e691 commit 9b6b93a

15 files changed

+616
-0
lines changed

.vscode/settings.json

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
{
  "editor.defaultFormatter": "esbenp.prettier-vscode",
  "[javascript]": {
    "editor.defaultFormatter": "esbenp.prettier-vscode"
  }
}

pnpm-lock.yaml

Lines changed: 12 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
{
  "scriptId": "1BaRz4XASe09owPsDwClqaX4XowP-M5TmGBvDp3jAVhhpxEgfyBnkNZoz",
  "rootDir": "dist",
  "projectId": "jpoehnelt-internal",
  "scriptExtensions": [
    ".js",
    ".gs"
  ],
  "htmlExtensions": [
    ".html"
  ],
  "jsonExtensions": [
    ".json"
  ],
  "filePushOrder": []
}
Lines changed: 236 additions & 0 deletions
@@ -0,0 +1,236 @@
# Harnessing the Power of Vector Embeddings in Google Apps Script with Vertex AI

## Introduction

Understanding and processing text semantically has become increasingly important. Vector embeddings represent text as numerical vectors, enabling semantic search, content recommendation, and other natural language processing capabilities. This blog post explores how to use Google's Vertex AI to generate vector embeddings directly within Google Apps Script.

## What are Vector Embeddings?

Vector embeddings are numerical representations of text (or other data) in a high-dimensional space. Unlike traditional keyword-based approaches, embeddings capture semantic meaning, so similarity between texts can be measured by what they mean rather than by which keywords they share.

For example, the phrases "I love programming" and "Coding is my passion" would be recognized as similar in an embedding space, despite having no words in common.

## Why Use Vertex AI with Apps Script?

Google Apps Script automates tasks within Google Workspace. Combined with Vertex AI's embedding capabilities, it lets you:

1. Build semantic search functionality in Google Sheets or Docs
2. Create content recommendation systems
3. Implement intelligent document classification
4. Enhance chatbots and virtual assistants
5. Perform sentiment analysis and topic modeling

## Implementation Guide

### Prerequisites

1. A Google Cloud Platform account with Vertex AI API enabled
2. A Google Apps Script project

### Step 1: Set Up Your Project

First, set up your Apps Script project and configure it to use the Vertex AI API. The manifest (`appsscript.json`) needs the `https://www.googleapis.com/auth/cloud-platform` and `https://www.googleapis.com/auth/script.external_request` OAuth scopes. Store your Google Cloud project ID in the script properties under the key `PROJECT_ID`: in the Apps Script editor, open "Project Settings" and add it under "Script properties".
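
If the `PROJECT_ID` property is missing, `getProperty` returns `null` and the request URL ends up containing the literal text `null`. A minimal sketch of a guard follows. `requireProperty_` is a hypothetical helper, not part of the original code; it accepts any object with a `getProperty(key)` method, such as the one returned by `PropertiesService.getScriptProperties()`.

```javascript
/**
 * Hypothetical helper: reads a required script property or fails loudly.
 * @param {{getProperty: function(string): (string|null)}} properties
 * @param {string} key
 * @returns {string} The trimmed property value.
 */
function requireProperty_(properties, key) {
  const value = properties.getProperty(key);
  if (value === null || value === undefined || String(value).trim() === "") {
    throw new Error(
      `Missing script property "${key}". Add it under Project Settings > Script properties.`
    );
  }
  return String(value).trim();
}

// Assumed usage inside Apps Script:
// const PROJECT_ID = requireProperty_(PropertiesService.getScriptProperties(), "PROJECT_ID");
```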

### Step 2: Generate Embeddings

The core functionality is generating embeddings from text. Here's how to implement it:

```javascript
/**
 * Generate embeddings for the given text.
 * @param {string|string[]} text - The text to generate embeddings for.
 * @param {{model?: string}} [options] - Optional settings, such as the embedding model ID.
 * @returns {number[][]} - The generated embeddings, one per input text.
 */
function batchedEmbeddings_(
  text,
  { model = "text-embedding-005" } = {}
) {
  if (!Array.isArray(text)) {
    text = [text];
  }

  const token = ScriptApp.getOAuthToken();
  const PROJECT_ID = PropertiesService.getScriptProperties().getProperty("PROJECT_ID");
  const REGION = "us-central1";

  // One request per input text; fetchAll issues them in a single call.
  const requests = text.map((content) => ({
    url: `https://${REGION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${REGION}/publishers/google/models/${model}:predict`,
    method: "post",
    headers: {
      Authorization: `Bearer ${token}`,
    },
    muteHttpExceptions: true,
    // contentType sets the Content-Type header, so it is not repeated in headers.
    contentType: "application/json",
    payload: JSON.stringify({
      instances: [{ content }],
      parameters: {
        autoTruncate: true,
      },
    }),
  }));

  const responses = UrlFetchApp.fetchAll(requests);
  const results = responses.map((response) => {
    // muteHttpExceptions is set, so non-200 responses are surfaced manually.
    if (response.getResponseCode() !== 200) {
      throw new Error(response.getContentText());
    }
    return JSON.parse(response.getContentText());
  });

  return results.map((result) => result.predictions[0].embeddings.values);
}
```
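
The function above sends one HTTP request per text. The `instances` array of a `:predict` request can hold several texts, so requests could be grouped to reduce the number of calls. The sketch below only builds the JSON payloads. `chunk_` and `buildEmbeddingPayloads_` are hypothetical helpers, the batch size of 5 is an assumed conservative default (per-request limits vary by model and region, so check the API reference), and `outputDimensionality` is included on the assumption that the chosen model supports it.

```javascript
/** Splits an array into consecutive chunks of at most `size` items. */
function chunk_(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

/**
 * Builds one JSON payload string per batch of texts.
 * batchSize and outputDimensionality are assumptions, not values from the original post.
 */
function buildEmbeddingPayloads_(texts, { batchSize = 5, outputDimensionality } = {}) {
  return chunk_(texts, batchSize).map((batch) => {
    const parameters = { autoTruncate: true };
    if (outputDimensionality !== undefined) {
      parameters.outputDimensionality = outputDimensionality;
    }
    return JSON.stringify({
      instances: batch.map((content) => ({ content })),
      parameters,
    });
  });
}
```

Each payload would take the place of the single-instance `payload` in a request. The response should then carry one entry in `predictions` per instance, so the final mapping would need to flatten them, for example with `results.flatMap((r) => r.predictions.map((p) => p.embeddings.values))`.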

### Step 3: Calculate Similarity Between Embeddings

Once you have embeddings, you'll want to compare them to find similar content. Cosine similarity is a common metric for this purpose. It measures the cosine of the angle between two vectors, calculated as the dot product of the vectors divided by the product of their magnitudes: `cosine similarity = dot product / (magnitude of x * magnitude of y)`. Values range from -1 (vectors pointing in opposite directions) through 0 (orthogonal vectors) to 1 (vectors pointing in the same direction).

```javascript
/**
 * Calculates the cosine similarity between two vectors.
 * @param {number[]} x - The first vector.
 * @param {number[]} y - The second vector.
 * @returns {number} The cosine similarity value between -1 and 1.
 */
function similarity_(x, y) {
  return dotProduct_(x, y) / (magnitude_(x) * magnitude_(y));
}

/** Sums the element-wise products over the shorter of the two lengths. */
function dotProduct_(x, y) {
  let result = 0;
  for (let i = 0, l = Math.min(x.length, y.length); i < l; i += 1) {
    result += x[i] * y[i];
  }
  return result;
}

/** Euclidean length of the vector. */
function magnitude_(x) {
  let result = 0;
  for (let i = 0, l = x.length; i < l; i += 1) {
    result += x[i] ** 2;
  }
  return Math.sqrt(result);
}
```
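
As a quick check of the formula: for x = [1, 2, 2] and y = [2, 1, 2], the dot product is 2 + 2 + 4 = 8 and both magnitudes are 3, so the similarity is 8 / 9, roughly 0.889. When the same embeddings are compared many times, one option is to normalize each vector once so that the similarity reduces to a plain dot product. The sketch below assumes that approach. `normalize_` and `dot_` are hypothetical names, and the zero-vector and length checks are additions, since dividing by a zero magnitude would otherwise yield `NaN`.

```javascript
/** Returns a unit-length copy of the vector. */
function normalize_(x) {
  let sumOfSquares = 0;
  for (const value of x) {
    sumOfSquares += value * value;
  }
  const magnitude = Math.sqrt(sumOfSquares);
  if (magnitude === 0) {
    throw new Error("Cannot normalize a zero vector");
  }
  return x.map((value) => value / magnitude);
}

/** Dot product that rejects vectors of different lengths. */
function dot_(x, y) {
  if (x.length !== y.length) {
    throw new Error("Vectors must have the same length");
  }
  let result = 0;
  for (let i = 0; i < x.length; i += 1) {
    result += x[i] * y[i];
  }
  return result;
}

// For unit vectors the dot product is the cosine similarity.
const unitX = normalize_([1, 2, 2]);
const unitY = normalize_([2, 1, 2]);
const cosineXY = dot_(unitX, unitY); // approximately 8 / 9
```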

I tested the code with a small corpus of texts and it worked well.

```sh
🔍 Searching for "Hello world!" ...
🔥 1.00000 - "Hello world!"
✅ 0.71132 - "Lorem ipsum dolor ..."
👍 0.60433 - "Hello Justin"
👍 0.53294 - "I love dogs 🐕"
🤔 0.45529 - "The forecast is ..."
🤔 0.45220 - "Foo bar"
🤔 0.41263 - "Apps Script is a ..."
```

You might be wondering why `Hello Justin` is ranked lower than `Lorem ipsum ...` when searching for `Hello world!`. Although `Hello Justin` shares the word `Hello`, the model places `Hello world!` closer to `Lorem ipsum ...`, likely because both are common placeholder phrases in software development. This will vary by the model used and how it was trained.

The full matrix of similarities is:
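
A pairwise matrix of this kind could be computed along the following lines. This is a sketch rather than the code used to produce the original figure; `cosine_` and `similarityMatrix_` are hypothetical names, and the input is assumed to be an array of equal-length, non-zero embedding vectors.

```javascript
/** Cosine similarity of two equal-length, non-zero vectors. */
function cosine_(x, y) {
  let dot = 0;
  let xx = 0;
  let yy = 0;
  for (let i = 0; i < x.length; i += 1) {
    dot += x[i] * y[i];
    xx += x[i] * x[i];
    yy += y[i] * y[i];
  }
  return dot / (Math.sqrt(xx) * Math.sqrt(yy));
}

/** Returns an n x n matrix where entry [i][j] compares embeddings i and j. */
function similarityMatrix_(embeddings) {
  return embeddings.map((rowVector) =>
    embeddings.map((columnVector) => cosine_(rowVector, columnVector))
  );
}
```

The result is symmetric, with values of approximately 1 on the diagonal. The `embeddings` argument would come from a single `batchedEmbeddings_(corpus)` call.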


### Step 4: Building a Simple Semantic Search

Here's how to create a basic semantic search function:

```javascript
/**
 * Ranks corpus texts by similarity to the query.
 * @param {string} query - The search query.
 * @param {string[]} corpus - The texts to search through.
 * @returns {{text: string, similarity: number}[]} Results, most similar first.
 */
function semanticSearch(query, corpus) {
  // Generate embedding for the query
  const queryEmbedding = batchedEmbeddings_([query])[0];

  // Build the index with one call for the whole corpus
  // (an existing, cached index could be used here instead)
  const corpusEmbeddings = batchedEmbeddings_(corpus);
  const index = corpus.map((text, i) => ({
    text,
    embedding: corpusEmbeddings[i],
  }));

  // Calculate similarities
  const results = index.map(({ text, embedding }) => ({
    text,
    similarity: similarity_(embedding, queryEmbedding),
  }));

  // Sort by similarity (highest first)
  return results.sort((a, b) => b.similarity - a.similarity);
}
```

## Real-World Applications

### Example 1: Semantic Search in Google Sheets

You can build a custom function that lets users search spreadsheet data by meaning rather than exact keyword matches:

```javascript
/**
 * Custom function for Google Sheets to perform semantic search
 * @param {string} query The search query
 * @param {string[][]} dataRange The range containing the text to search through
 * @param {number} limit Optional limit on number of results
 * @return {Array<Array<string|number>>} The search results with similarity scores
 * @customfunction
 */
function SEMANTIC_SEARCH(query, dataRange, limit = 5) {
  // Sheets passes a range argument as a 2D array of cell values (or a single
  // value for one cell), not as a Range object, so getValues() is not available.
  const values = Array.isArray(dataRange) ? dataRange.flat() : [dataRange];
  const corpus = values.filter(Boolean).map(String);

  // Note: custom functions run with limited authorization, so the OAuth token
  // used by batchedEmbeddings_ may not be usable here. Verify in your project.
  const results = semanticSearch(query, corpus);

  return results
    .slice(0, limit)
    .map(({ text, similarity }) => [text, similarity]);
}
```

### Example 2: Document Classification

You can use embeddings to automatically categorize documents:

```javascript
/**
 * Document classification using embeddings
 * @param {string} document The document to classify
 * @param {Array<string>} categories List of possible categories
 * @return {string} The most similar category
 */
function classifyDocument(document = "I love dogs", categories = ["Software", "Animal", "Food"]) {
  const docEmbedding = batchedEmbeddings_([document])[0];
  const categoryEmbeddings = batchedEmbeddings_(categories);

  const similarities = categoryEmbeddings.map((catEmbedding, index) => ({
    category: categories[index],
    similarity: similarity_(docEmbedding, catEmbedding)
  }));

  // Return the most similar category
  const results = similarities.sort((a, b) => b.similarity - a.similarity)[0].category;
  console.log(results);
  return results;
}
```

When this is run with the default parameters, it will return `Animal`.
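
Note that the top-ranked category is returned even when every score is low. In the search output earlier, unrelated phrases still scored above 0.4, so a cutoff may be worth considering. The sketch below assumes a hypothetical `pickCategory_` helper that takes the same `{ category, similarity }` objects; the threshold of 0.5 is arbitrary and would need tuning for the model in use.

```javascript
/**
 * Hypothetical helper: returns the best category, or a fallback when the
 * best score is below the threshold (or the list is empty).
 * @param {{category: string, similarity: number}[]} similarities
 * @param {{minSimilarity?: number, fallback?: string}} [options]
 * @returns {string}
 */
function pickCategory_(similarities, { minSimilarity = 0.5, fallback = "Unknown" } = {}) {
  let best = null;
  // A single pass avoids sorting (and mutating) the input array.
  for (const candidate of similarities) {
    if (best === null || candidate.similarity > best.similarity) {
      best = candidate;
    }
  }
  if (best === null || best.similarity < minSimilarity) {
    return fallback;
  }
  return best.category;
}
```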

## Performance Considerations

When working with vector embeddings in Apps Script, keep these tips in mind:

1. **Batch Processing**: Generate embeddings in batches to reduce API calls.
2. **Caching**: Store embeddings in properties service or a spreadsheet to avoid regenerating them.
3. **Dimensionality**: Consider using a lower dimensionality for faster processing if accuracy is less critical.
4. **Quota Limits**: Be mindful of Apps Script's quotas for UrlFetchApp calls and execution time.

> **Note**: You will likely want to cache embeddings for performance and cost savings.
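
One way to act on this is to wrap the embedding call in a cache lookup. The sketch below is an outline under stated assumptions: `cachedEmbeddings_` is a hypothetical helper; `cache` is any object with `get(key)` returning a string or `null` and `put(key, value, expirationInSeconds)`, which matches the shape of `CacheService.getScriptCache()`; `embedFn` stands in for `batchedEmbeddings_`; and `keyFn` must map a text to a short, unique key, since cache keys and values are size-limited.

```javascript
/**
 * Hypothetical helper: returns embeddings for texts, calling embedFn only
 * for texts that are not already cached.
 * ttlSeconds defaults to 21600 (6 hours), assumed to be the CacheService maximum.
 */
function cachedEmbeddings_(texts, embedFn, cache, keyFn, ttlSeconds = 21600) {
  const keys = texts.map(keyFn);
  const embeddings = keys.map((key) => {
    const hit = cache.get(key);
    return hit === null || hit === undefined ? null : JSON.parse(hit);
  });

  // Collect the distinct texts that were not found in the cache.
  const missing = [...new Set(texts.filter((_, i) => embeddings[i] === null))];
  if (missing.length > 0) {
    const fresh = embedFn(missing);
    const freshByText = new Map(missing.map((text, i) => [text, fresh[i]]));
    texts.forEach((text, i) => {
      if (embeddings[i] === null) {
        embeddings[i] = freshByText.get(text);
        cache.put(keys[i], JSON.stringify(embeddings[i]), ttlSeconds);
      }
    });
  }
  return embeddings;
}
```

In Apps Script, `keyFn` could be built on `Utilities.computeDigest(Utilities.DigestAlgorithm.SHA_256, text)` followed by `Utilities.base64Encode(...)`, which keeps keys short regardless of the text length.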

## Conclusion

Vector embeddings bring advanced natural language processing capabilities to your Google Workspace environment. By combining Google Apps Script with Vertex AI, you can create applications that work with the semantic meaning of text, enabling more sophisticated search, recommendation, and classification systems.

The code examples in this blog post should help you get started with vector embeddings in your own Apps Script projects. As you explore this technology further, you'll find more ways to apply it within your workflows.

## Resources

- [Google Cloud Vertex AI Documentation](https://cloud.google.com/vertex-ai/docs)
- [Apps Script Documentation](https://developers.google.com/apps-script)
- [Text Embeddings API Reference](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings)
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
import fs from "fs";
import esbuild from "esbuild";
import path from "path";

const outdir = "dist";
const sourceRoot = "src";

await esbuild.build({
  entryPoints: ["./src/index.ts"],
  bundle: true,
  outdir,
  sourceRoot,
  platform: "node",
  format: "esm",
  plugins: [],
  inject: ["polyfill.js"],
  minify: true,
  banner: { js: "// Generated code DO NOT EDIT\n" },
  entryNames: "zzz_bundle_[name]",
  chunkNames: "zzz_chunk_[name]",
  // See mocks in https://github.com/mjmlio/mjml/tree/master/packages/mjml-browser
});

const passThroughFiles = ["main.js", "examples.js", "tools.js", "appsscript.json"];

await Promise.all(
  passThroughFiles.map(async (file) =>
    fs.promises.copyFile(path.join(sourceRoot, file), path.join(outdir, file))
  )
);
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
{
  "name": "@repository/vector-embeddings",
  "version": "0.1.0",
  "scripts": {
    "build": "node build.js",
    "check": "tsc --noEmit",
    "push": "DEBUG=clasp:* clasp push -f"
  },
  "author": "Justin Poehnelt <justin.poehnelt@gmail.com>",
  "license": "Apache-2.0",
  "devDependencies": {
    "@google/clasp": "3.0.2-alpha",
    "@types/google-apps-script": "^1.0.97",
    "esbuild": "^0.25.0"
  },
  "type": "module",
  "dependencies": {},
  "private": true
}
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
globalThis.window = globalThis;
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
{
  "timeZone": "America/Denver",
  "dependencies": {},
  "exceptionLogging": "STACKDRIVER",
  "runtimeVersion": "V8",
  "oauthScopes": [
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/script.external_request"
  ]
}