Bradley Kirton's Blog

Published on June 1, 2026


Semantic search sqlite (C extension)

This is a follow-up to a previous post, which used NumPy to calculate the cosine similarity of two vectors.

I have since discovered that most embedding inference engines return L2-normalized vectors. This means the cosine similarity calculation simplifies to just the dot product of the two vectors.
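To make the simplification concrete, here is a minimal sketch in plain Python. The helper names `dot` and `cosine_similarity` are mine for illustration, and the two vectors are hand-picked so that each already has length 1. With unit-length vectors the denominator of the cosine formula is 1, so both functions agree.

```python
import math


def dot(a, b):
    # Inner product: sum of element-wise products.
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Full formula: dot(a, b) divided by the product of the two L2 norms.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hand-picked unit vectors: 0.6^2 + 0.8^2 = 1 and 1^2 + 0^2 = 1.
u = [0.6, 0.8]
v = [1.0, 0.0]

# Both norms are 1, so the division changes nothing (up to rounding).
assert math.isclose(cosine_similarity(u, v), dot(u, v))
```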

Given this, I thought I would attempt to write an SQLite extension in C that calculates the inner product of two vectors.

The core idea is to store a sequence of floating-point numbers as a binary blob in SQLite, then use C to iterate over two such sequences and calculate their inner product. So far, this approach assumes the vectors are already L2-normalized (if yours are not, you can divide each element by the vector's norm).
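If your vectors are not normalized yet, something like the following could be run before storing them. This is only a sketch: `l2_normalize` is a hypothetical helper, not part of the extension, and raising on a zero vector is my own choice.

```python
import array
import math


def l2_normalize(values):
    # Divide each element by the vector's L2 norm so the result has length 1.
    norm = math.sqrt(sum(x * x for x in values))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in values]


unit = l2_normalize([3.0, 4.0])  # the norm is 5, so this is [0.6, 0.8]

# Serialize the normalized vector as float64 bytes, ready to store as a BLOB.
blob = array.array("d", unit).tobytes()
```

Normalizing once at insert time keeps the query path down to a plain dot product.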

To get data into the database using Python, you can either pack the data with the struct module or use the built-in array module and serialize it as bytes.

import array
import struct

values = [0.1, 0.2, 0.3, 0.4]

# array module — pick 'f' for float32, 'd' for float64
blob = array.array("d", values).tobytes()

# struct module — equivalent
blob = struct.pack(f"{len(values)}d", *values)
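Going the other way works with the same two modules. The sketch below is a round trip for float64 data, showing that both encodings produce identical bytes and that the element count can be recovered from the blob's byte length.

```python
import array
import struct

values = [0.1, 0.2, 0.3, 0.4]
blob = array.array("d", values).tobytes()

# Both encodings produce the same bytes: native byte order, one double per element.
assert blob == struct.pack(f"{len(values)}d", *values)

# Decode with the array module...
decoded = array.array("d")
decoded.frombytes(blob)

# ...or with struct, deriving the element count from the byte length.
count = len(blob) // struct.calcsize("d")
unpacked = struct.unpack(f"{count}d", blob)

assert list(decoded) == values == list(unpacked)
```

With 'f' the blob is half the size, but the values come back rounded to float32 precision, so an exact comparison like the last line would not hold.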

The dot function has the following signature:

| Argument | Type | Description |
| --- | --- | --- |
| a | BLOB | First vector, packed as a contiguous array of dtype. |
| b | BLOB | Second vector, must have the same byte length as a. |
| dtype | TEXT | Element type: 'f' (32-bit float) or 'd' (64-bit double). |

Here is a full example:

import array
import sqlite3

conn = sqlite3.connect(":memory:")
conn.enable_load_extension(True)
try:
    conn.load_extension("./dot.so")
finally:
    conn.enable_load_extension(False)

a = array.array("d", [1.0, 2.0, 3.0]).tobytes()
b = array.array("d", [4.0, 5.0, 6.0]).tobytes()

(dp,) = conn.execute("SELECT dot(?, ?, ?)", (a, b, "d")).fetchone()
# dp == 32.0
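For semantic search, the same function can rank stored rows against a query vector. The sketch below rests on several assumptions. The `docs` table and its columns are hypothetical. The two-dimensional "embeddings" are hand-picked unit vectors rather than real model output. To keep the snippet runnable without the compiled extension, `py_dot` is a pure-Python stand-in registered under the name `dot`, and its error handling is my guess, not necessarily what the C code does. With the real extension you would load `dot.so` as above instead of calling `create_function`.

```python
import array
import sqlite3


def py_dot(a, b, dtype):
    # Pure-Python stand-in with the same three arguments as the extension's dot.
    if dtype not in ("f", "d"):
        raise ValueError("dtype must be 'f' or 'd'")
    if len(a) != len(b):
        raise ValueError("vectors must have the same byte length")
    x, y = array.array(dtype), array.array(dtype)
    x.frombytes(a)
    y.frombytes(b)
    return sum(p * q for p, q in zip(x, y))


conn = sqlite3.connect(":memory:")
conn.create_function("dot", 3, py_dot)  # replace with load_extension for the real thing

# Hypothetical schema: one row per document, embedding stored as a float64 blob.
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT, embedding BLOB)")
docs = [
    ("about cats", [1.0, 0.0]),
    ("about dogs", [0.8, 0.6]),
    ("about cars", [0.0, 1.0]),
]
conn.executemany(
    "INSERT INTO docs (text, embedding) VALUES (?, ?)",
    [(text, array.array("d", vec).tobytes()) for text, vec in docs],
)

# Rank rows by similarity to the query vector and keep the two best matches.
query = array.array("d", [0.6, 0.8]).tobytes()
rows = conn.execute(
    "SELECT text, dot(embedding, ?, 'd') AS score FROM docs ORDER BY score DESC LIMIT 2",
    (query,),
).fetchall()

for text, score in rows:
    print(f"{score:.2f}  {text}")
```

Note that this scans every row, which is fine for small tables but scales linearly with the number of stored vectors.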

The extension code can be found here. I may use this repo for creating a set of extensions in future.