Vector search is everywhere in 2025. Semantic search, image similarity, recommendation engines, RAG applications—they all need vector embeddings. The default answer is usually Pinecone, Weaviate, or Qdrant. But what if you could do vector search in SQLite?
That’s what sqlite-vec gives you. Turn your embedded database into a vector store with zero external dependencies.
What Is sqlite-vec?
sqlite-vec is a SQLite extension that adds vector similarity search to your database. It’s written in pure C with zero dependencies, which means it runs anywhere SQLite runs: servers, laptops, mobile devices, browsers (WASM), Raspberry Pis.
Think of it as bringing pgvector’s capabilities to SQLite, but lighter and more portable.
Key features:
- Multiple vector types: float32, int8, and bitvectors
- Binary quantization for 32x compression
- Pure SQL interface (CREATE, INSERT, SELECT)
- Distance metrics: L2 (Euclidean), cosine, Hamming
- Works with OpenAI, Cohere, Hugging Face embeddings
- Bindings for Python, Node.js, Ruby, Go, Rust
Why Vector Search in SQLite?
Local-first applications: No network latency, no API costs, works offline.
Development and prototyping: Test vector search without setting up infrastructure.
Small to medium datasets: Brute-force search on 10K-100K vectors is fast enough for most use cases.
Privacy: Your embeddings never leave your machine.
Edge deployments: Run vector search on IoT devices, mobile apps, or single-user applications.
You don’t need a dedicated vector database for everything. sqlite-vec is perfect when you want semantic search without the operational overhead.
Installation and Setup
Install the Extension
Python:
pip install sqlite-vec
import sqlite3
import sqlite_vec
db = sqlite3.connect("my_database.db")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)
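To confirm the extension loaded, you can query vec_version() immediately (the same pattern the sqlite-vec Python README uses):
(vec_version,) = db.execute("SELECT vec_version()").fetchone()
print(f"vec_version={vec_version}")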
Node.js (shown with better-sqlite3, which the sqlite-vec docs pair it with):
npm install sqlite-vec better-sqlite3
const sqliteVec = require('sqlite-vec');
const Database = require('better-sqlite3');

const db = new Database(':memory:');
sqliteVec.load(db);
Command-line (macOS/Linux):
# Download pre-compiled extension
curl -L https://github.com/asg017/sqlite-vec/releases/latest/download/vec0.dylib -o vec0.dylib
# Load in sqlite3
sqlite3 my_database.db
.load ./vec0.dylib
SELECT vec_version();
Go, Rust, Ruby: Check the official documentation for language-specific bindings.
Verify Installation
SELECT vec_version();
-- Output: v0.1.6
Part 1: Basic Vector Storage and Search
Creating a Vector Table
Use the vec0 virtual table for efficient vector storage:
CREATE VIRTUAL TABLE vec_documents USING vec0(
document_id INTEGER PRIMARY KEY,
content_embedding float[768], -- 768-dimensional float vector
category TEXT, -- metadata column for filtering
+content TEXT, -- auxiliary column (stores data without indexing)
+url TEXT
);
Column types:
- Vector columns: float[N], int8[N], or bit[N]
- Metadata columns: filterable with WHERE clauses
- Auxiliary columns: prefixed with +, stored but not filterable (for large text)
Inserting Vectors
Vectors can be inserted as JSON text or binary blobs:
-- JSON format (easy but verbose)
INSERT INTO vec_documents(document_id, content_embedding, category, content, url)
VALUES (
1,
'[0.1, 0.2, 0.3, ...]', -- 768 floats
'technology',
'Introduction to vector search',
'https://example.com/doc1'
);
-- Binary format (efficient)
-- Python example:
import struct
def serialize_f32(vector):
return struct.pack(f'{len(vector)}f', *vector)
embedding = [0.1, 0.2, 0.3, ...] # 768 floats
db.execute(
"INSERT INTO vec_documents VALUES (?, ?, ?, ?, ?)",
(doc_id, serialize_f32(embedding), category, content, url)
)
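Reading a stored blob back into floats is the mirror-image unpack. A small sketch (deserialize_f32 is our own helper name, not part of sqlite-vec):
def deserialize_f32(blob):
    # float32 = 4 bytes per element
    return list(struct.unpack(f'{len(blob)//4}f', blob))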
K-Nearest Neighbors Search
Find the top K most similar vectors:
-- Basic KNN search
SELECT
document_id,
content,
distance
FROM vec_documents
WHERE content_embedding MATCH ? -- Query vector
AND k = 10
ORDER BY distance;
-- With metadata filters
SELECT
document_id,
content,
category,
distance
FROM vec_documents
WHERE content_embedding MATCH ?
AND k = 10
AND category = 'technology'
ORDER BY distance;
The MATCH operator performs the KNN search, and the distance column returns the distance to the query vector (lower means more similar). The k = 10 constraint tells vec0 how many neighbors to return.
Distance Metrics
L2 Distance (Euclidean) - Default:
CREATE VIRTUAL TABLE vec_items USING vec0(
embedding float[128]
);
-- Uses L2 distance by default
Cosine Distance - Better for normalized vectors:
CREATE VIRTUAL TABLE vec_items USING vec0(
embedding float[128] distance_metric=cosine
);
Manual distance calculation:
SELECT vec_distance_L2('[1, 2, 3]', '[4, 5, 6]'); -- 5.196152422706632
SELECT vec_distance_cosine('[1, 1]', '[2, 2]'); -- ~0 (same direction)
SELECT vec_distance_cosine('[1, 1]', '[-2, -2]'); -- 2 (opposite direction)
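For the L2 example, that's sqrt((4-1)² + (5-2)² + (6-3)²) = sqrt(27) ≈ 5.196; and because cosine distance is 1 minus cosine similarity, parallel vectors score ~0 while opposite vectors hit the maximum of 2.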
Part 2: Image Similarity Search
Let’s build a practical image search system using CLIP embeddings.
Setup: Generate Image Embeddings
First, generate embeddings for your images using a model like CLIP:
import sqlite3
import sqlite_vec
import struct
from PIL import Image
import torch
from transformers import CLIPProcessor, CLIPModel
# Load CLIP model
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
def get_image_embedding(image_path):
image = Image.open(image_path)
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
embedding = model.get_image_features(**inputs)
return embedding[0].numpy().tolist()
def serialize_f32(vector):
return struct.pack(f'{len(vector)}f', *vector)
# Setup database
db = sqlite3.connect("image_search.db")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)
# Create table
db.execute("""
CREATE VIRTUAL TABLE vec_images USING vec0(
image_id INTEGER PRIMARY KEY,
embedding float[512], -- CLIP produces 512-dim vectors
+filename TEXT,
+image_data BLOB -- Store actual image
)
""")
Index Your Images
import os
image_dir = "./photos"
# Only index image files, so idx counts images rather than all directory entries
image_files = sorted(
    f for f in os.listdir(image_dir)
    if f.lower().endswith(('.jpg', '.jpeg', '.png'))
)
for idx, filename in enumerate(image_files):
    path = os.path.join(image_dir, filename)
    # Generate embedding
    embedding = get_image_embedding(path)
    embedding_blob = serialize_f32(embedding)
    # Read image data
    with open(path, 'rb') as f:
        image_data = f.read()
    # Insert into database
    db.execute(
        "INSERT INTO vec_images(image_id, embedding, filename, image_data) VALUES (?, ?, ?, ?)",
        (idx, embedding_blob, filename, image_data)
    )
db.commit()
print(f"Indexed {len(image_files)} images")
Search by Image
def find_similar_images(query_image_path, k=5):
# Get embedding for query image
query_embedding = get_image_embedding(query_image_path)
query_blob = serialize_f32(query_embedding)
# Search for similar images
results = db.execute("""
SELECT
image_id,
filename,
distance
FROM vec_images
WHERE embedding MATCH ?
AND k = ?
ORDER BY distance
""", (query_blob, k)).fetchall()
return results
# Find images similar to a query
similar = find_similar_images("query_photo.jpg", k=5)
for image_id, filename, distance in similar:
print(f"{filename}: {distance:.4f}")
Search by Text (Cross-Modal)
CLIP embeddings work across images and text:
def search_images_by_text(query_text, k=5):
# Get text embedding
inputs = processor(text=[query_text], return_tensors="pt", padding=True)
with torch.no_grad():
text_embedding = model.get_text_features(**inputs)
query_blob = serialize_f32(text_embedding[0].numpy().tolist())
# Search
results = db.execute("""
SELECT
image_id,
filename,
distance
FROM vec_images
WHERE embedding MATCH ?
AND k = ?
ORDER BY distance
""", (query_blob, k)).fetchall()
return results
# Search images by description
results = search_images_by_text("a dog playing on the beach", k=5)
for _, filename, distance in results:
print(f"{filename}: {distance:.4f}")
Part 3: Binary Quantization for Speed
Binary quantization reduces float32 vectors to 1-bit values, giving you 32x compression with minimal accuracy loss: a 512-dimensional float32 vector shrinks from 2,048 bytes to 64 bytes.
How Binary Quantization Works
- Positive values → 1
- Zero and negative values → 0
- Distance metric changes to Hamming distance
-- Input dimensions must be divisible by 8 (8 elements → 1 byte)
SELECT vec_quantize_binary('[0.5, -0.3, 0.8, -0.1, 0.9, -0.2, 0.7, -0.6]');
-- Returns a 1-byte bit vector: one bit per element (1 for positive, 0 otherwise)
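Bit vectors are compared with Hamming distance (the count of differing bits), which sqlite-vec exposes directly:
SELECT vec_distance_hamming(
  vec_quantize_binary('[0.5, -0.3, 0.8, -0.1, 0.9, -0.2, 0.7, -0.6]'),
  vec_quantize_binary('[0.5, -0.3, 0.8, -0.1, 0.9, -0.2, -0.7, 0.6]')
);
-- 2 (the last two elements flip sign, so two bits differ)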
Store Both Versions
For best results, store both full-precision and quantized vectors:
db.execute("""
CREATE VIRTUAL TABLE vec_images_optimized USING vec0(
image_id INTEGER PRIMARY KEY,
embedding float[512], -- Full precision for re-scoring
embedding_binary bit[512], -- Binary for fast initial search
+filename TEXT
)
""")
# Insert with both versions
embedding_blob = serialize_f32(embedding)
db.execute("""
INSERT INTO vec_images_optimized(image_id, embedding, embedding_binary, filename)
VALUES (?, ?, vec_quantize_binary(?), ?)
""", (idx, embedding_blob, embedding_blob, filename))
Two-Stage Search with Re-scoring
Use binary vectors for fast initial search, then refine with full precision:
import struct
import numpy as np

def cosine_distance(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def search_with_rescoring(query_embedding, k=5, oversample=8):
    query_blob = serialize_f32(query_embedding)
    # Stage 1: fast coarse search over the binary vectors
    coarse_results = db.execute("""
        SELECT image_id, embedding
        FROM vec_images_optimized
        WHERE embedding_binary MATCH vec_quantize_binary(?)
          AND k = ?
        ORDER BY distance
    """, (query_blob, k * oversample)).fetchall()
    # Stage 2: re-score the candidates with their full-precision vectors
    refined = []
    for image_id, embedding_blob in coarse_results:
        embedding = struct.unpack(f'{len(embedding_blob)//4}f', embedding_blob)
        refined.append((image_id, cosine_distance(query_embedding, embedding)))
    # Return top K after re-scoring
    refined.sort(key=lambda x: x[1])
    return refined[:k]
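Usage mirrors the earlier helper, reusing get_image_embedding from Part 2:
query_embedding = get_image_embedding("query_photo.jpg")
for image_id, distance in search_with_rescoring(query_embedding, k=5):
    print(f"image {image_id}: {distance:.4f}")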
Performance gain: Binary search is ~30x faster for the initial pass, with less than 5% accuracy loss after re-scoring.
Part 4: Semantic Text Search with OpenAI
Build a semantic search system using OpenAI embeddings.
Generate and Store Text Embeddings
from openai import OpenAI
client = OpenAI(api_key="your-api-key")
db.execute("""
CREATE VIRTUAL TABLE vec_articles USING vec0(
article_id INTEGER PRIMARY KEY,
title_embedding float[1536] distance_metric=cosine, -- OpenAI text-embedding-3-small
+title TEXT,
+content TEXT,
+url TEXT
)
""")
# Embed articles
articles = [
{"title": "Introduction to AI", "content": "...", "url": "..."},
{"title": "Machine Learning Basics", "content": "...", "url": "..."},
# ... more articles
]
# Batch embed for efficiency
titles = [a["title"] for a in articles]
response = client.embeddings.create(
input=titles,
model="text-embedding-3-small"
)
# Insert embeddings
for idx, (article, embedding_data) in enumerate(zip(articles, response.data)):
embedding = embedding_data.embedding
embedding_blob = serialize_f32(embedding)
db.execute("""
INSERT INTO vec_articles(article_id, title_embedding, title, content, url)
VALUES (?, ?, ?, ?, ?)
""", (idx, embedding_blob, article["title"], article["content"], article["url"]))
db.commit()
Semantic Search
def semantic_search(query, k=10):
# Get query embedding
response = client.embeddings.create(
input=query,
model="text-embedding-3-small"
)
query_embedding = response.data[0].embedding
query_blob = serialize_f32(query_embedding)
# Search
results = db.execute("""
SELECT
article_id,
title,
content,
distance
FROM vec_articles
WHERE title_embedding MATCH ?
AND k = ?
ORDER BY distance
""", (query_blob, k)).fetchall()
return results
# Search
results = semantic_search("machine learning tutorials", k=5)
for _, title, content, distance in results:
print(f"{title} (score: {1 - distance:.3f})")
print(f"{content[:200]}...\n")
Hybrid Search (Keyword + Semantic)
Combine traditional full-text search with semantic search:
db.execute("""
    CREATE VIRTUAL TABLE articles_fts USING fts5(
        article_id UNINDEXED,  -- stored for joins, not full-text indexed
        title,
        content
    )
""")
# Mirror the articles into the full-text index
for idx, article in enumerate(articles):
    db.execute(
        "INSERT INTO articles_fts(article_id, title, content) VALUES (?, ?, ?)",
        (idx, article["title"], article["content"])
    )
db.commit()
from collections import defaultdict

def hybrid_search(query, k=10):
    # Get query embedding
    response = client.embeddings.create(input=query, model="text-embedding-3-small")
    query_blob = serialize_f32(response.data[0].embedding)

    # Semantic search (cosine distance, lower is better)
    semantic_results = db.execute("""
        SELECT article_id, distance
        FROM vec_articles
        WHERE title_embedding MATCH ?
          AND k = ?
        ORDER BY distance
    """, (query_blob, k * 2)).fetchall()

    # Keyword search -- fts5's rank is a negative BM25 score,
    # so ORDER BY rank puts the best matches first
    keyword_results = db.execute("""
        SELECT article_id
        FROM articles_fts
        WHERE articles_fts MATCH ?
        ORDER BY rank
        LIMIT ?
    """, (query, k * 2)).fetchall()

    # Merge and re-rank (simple weighted combination)
    scores = defaultdict(float)
    for article_id, distance in semantic_results:
        scores[article_id] += (1 - distance) * 0.7  # 70% weight to semantic (cosine similarity)
    for position, (article_id,) in enumerate(keyword_results):
        # fts5 returns column values as text, so cast back to int
        scores[int(article_id)] += (1 / (position + 1)) * 0.3  # 30% weight to keyword rank

    # Get top K
    top_ids = sorted(scores, key=scores.get, reverse=True)[:k]

    # Fetch articles
    results = []
    for article_id in top_ids:
        title, content = db.execute(
            "SELECT title, content FROM vec_articles WHERE article_id = ?",
            (article_id,)
        ).fetchone()
        results.append((title, content, scores[article_id]))
    return results
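Try it with the same query as before:
for title, content, score in hybrid_search("machine learning tutorials", k=5):
    print(f"{title} (combined score: {score:.3f})")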
Part 5: Multi-Tenant RAG System
Build a RAG (Retrieval-Augmented Generation) system with per-user document isolation.
Use Partition Keys for Sharding
db.execute("""
CREATE VIRTUAL TABLE vec_documents USING vec0(
user_id INTEGER PARTITION KEY, -- Shard by user
document_id INTEGER,
chunk_embedding float[768],
+chunk_text TEXT,
+chunk_index INTEGER,
+document_title TEXT
)
""")
Partition keys enable fast per-user queries without scanning all data.
Index User Documents
def index_document(user_id, document_id, document_title, text):
    # Split into chunks
    chunks = split_text_into_chunks(text, max_length=500)
    for idx, chunk in enumerate(chunks):
        # Generate embedding
        response = client.embeddings.create(
            input=chunk,
            model="text-embedding-3-small"
        )
        embedding_blob = serialize_f32(response.data[0].embedding)
        # Insert
        db.execute("""
            INSERT INTO vec_documents(user_id, document_id, chunk_embedding, chunk_text, chunk_index, document_title)
            VALUES (?, ?, ?, ?, ?, ?)
        """, (user_id, document_id, embedding_blob, chunk, idx, document_title))
    db.commit()
def split_text_into_chunks(text, max_length=500, overlap=50):
words = text.split()
chunks = []
for i in range(0, len(words), max_length - overlap):
chunk = ' '.join(words[i:i + max_length])
chunks.append(chunk)
return chunks
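A quick sanity check of the chunker:
chunks = split_text_into_chunks("word " * 1200)
print(len(chunks))  # 3 -- steps of 450 words (500 minus the 50-word overlap)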
Query User Documents
def rag_retrieve(user_id, query, k=5):
# Get query embedding
response = client.embeddings.create(input=query, model="text-embedding-3-small")
query_embedding = response.data[0].embedding
query_blob = serialize_f32(query_embedding)
# Search within user's documents only
results = db.execute("""
SELECT
document_id,
document_title,
chunk_text,
chunk_index,
distance
FROM vec_documents
WHERE chunk_embedding MATCH ?
AND user_id = ? -- Fast partition filtering
AND k = ?
ORDER BY distance
""", (query_blob, user_id, k)).fetchall()
return results
# Use with LLM
context_chunks = rag_retrieve(user_id=123, query="How do I deploy my app?", k=3)
context = "\n\n".join(chunk[2] for chunk in context_chunks)
llm_response = client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system", "content": f"Answer based on this context:\n{context}"},
{"role": "user", "content": "How do I deploy my app?"}
]
)
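The generated answer is then available on the response object:
print(llm_response.choices[0].message.content)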
Part 6: Vector Operations and Utilities
sqlite-vec provides utility functions for vector manipulation.
Vector Information
-- Get vector length (dimensions)
SELECT vec_length('[1, 2, 3, 4]'); -- 4
-- Get vector type
SELECT vec_type('[1.0, 2.0]'); -- 'float32'
SELECT vec_type(vec_int8('[1, 2]')); -- 'int8'
Vector Math
-- Add vectors
SELECT vec_to_json(vec_add('[1, 2, 3]', '[4, 5, 6]'));
-- '[5.0, 7.0, 9.0]'
-- Subtract vectors
SELECT vec_to_json(vec_sub('[4, 5, 6]', '[1, 2, 3]'));
-- '[3.0, 3.0, 3.0]'
-- Normalize vector (L2 normalization)
SELECT vec_to_json(vec_normalize('[3, 4]'));
-- '[0.6, 0.8]' (magnitude = 5, so 3/5 and 4/5)
Matryoshka Embeddings (Adaptive Size)
Some models support truncation without losing much quality:
-- Truncate 1024-dim embedding to 256 dimensions
SELECT vec_normalize(vec_slice(embedding, 0, 256)) as embedding_256d
FROM vec_documents;
Supported models:
- OpenAI text-embedding-3-small: 1536 → 512
- OpenAI text-embedding-3-large: 3072 → 1024, 512
- mxbai-embed-large-v1: 1024 → 512, 256, 128, 64
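If you embed with OpenAI's text-embedding-3 models, you can also request the smaller size up front via the API's dimensions parameter instead of slicing in SQL; a minimal sketch:
response = client.embeddings.create(
    input="vector search in SQLite",
    model="text-embedding-3-small",
    dimensions=512  # the API truncates and re-normalizes for you
)
embedding_512 = response.data[0].embedding  # len(embedding_512) == 512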
Iterate Vector Elements
SELECT rowid, value
FROM vec_each('[10, 20, 30]');
/*
┌───────┬───────┐
│ rowid │ value │
├───────┼───────┤
│ 0 │ 10 │
│ 1 │ 20 │
│ 2 │ 30 │
└───────┴───────┘
*/
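Because vec_each is a table-valued function, ordinary SQL aggregates work over a vector's elements:
SELECT AVG(value) AS mean, MAX(value) AS max
FROM vec_each('[10, 20, 30]');
-- mean = 20.0, max = 30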
Performance Tips
1. Use Partition Keys for Multi-Tenant Data
Partition keys create internal shards for fast filtering:
CREATE VIRTUAL TABLE vec_docs USING vec0(
user_id INTEGER PARTITION KEY,
embedding float[768]
);
-- This query only scans user 123's partition
SELECT * FROM vec_docs
WHERE embedding MATCH :query
AND user_id = 123
AND k = 10;
2. Binary Quantization for Large Datasets
32x compression with less than 5% accuracy loss:
# Store both versions
embedding_binary = db.execute(
"SELECT vec_quantize_binary(?)",
(embedding_blob,)
).fetchone()[0]
3. Re-scoring for Better Accuracy
Oversample with binary, refine with full precision:
WITH coarse AS (
SELECT rowid, embedding
FROM vec_items
WHERE embedding_binary MATCH vec_quantize_binary(:query)
AND k = 100
)
SELECT rowid, vec_distance_L2(embedding, :query) as distance
FROM coarse
ORDER BY distance
LIMIT 10;
4. Batch Inserts
Insert vectors in transactions:
db.execute("BEGIN")
for embedding in embeddings:
db.execute("INSERT INTO vec_items VALUES (?)", (embedding,))
db.execute("COMMIT")
Limitations and Considerations
No ANN indexes yet: Currently brute-force search only. Fine for less than 100K vectors, slower beyond that. HNSW support is planned.
Pre-v1 API: Expect breaking changes before v1.0 release.
Metadata column efficiency: Strings longer than 12 chars have slight overhead. Use auxiliary columns (+) for large text.
Binary quantization requirements: Vector dimensions must be divisible by 8.
Loading extensions: Some hosting providers don’t allow loading SQLite extensions.
When to Use sqlite-vec
Use sqlite-vec when:
- Building local-first or offline applications
- Dataset is less than 100K vectors
- You want vector search without operational complexity
- Privacy matters (data stays local)
- Prototyping or MVP development
- Edge deployments (IoT, mobile, single-user apps)
Use a dedicated vector database when:
- Dataset is more than 1M vectors
- Need ANN indexes (HNSW, IVF)
- Require distributed search across multiple nodes
- Need real-time updates at massive scale
Real-World Use Cases
Personal knowledge base: Search your notes, documents, and bookmarks semantically.
Photo library: Find images by description or similar photos.
Customer support: RAG system for answering questions from docs.
Content recommendation: Suggest similar articles, products, or videos.
Code search: Find code snippets by natural language description.
Local AI assistants: Offline RAG for privacy-focused AI apps.
Final Thoughts
sqlite-vec brings vector search to the most deployed database in the world. It’s not trying to replace Pinecone or Weaviate—it’s solving a different problem.
If you need distributed, billion-vector search, use a specialized vector database. But if you need semantic search in a local-first app, RAG for a single-user system, or just want to experiment without infrastructure, sqlite-vec is perfect.
It’s SQLite. It’s portable, embedded, and works everywhere. Now it does vector search too.
Resources:
- sqlite-vec on GitHub: https://github.com/asg017/sqlite-vec
Vector search doesn’t need to be complicated. Sometimes SQLite is all you need.