Discrete 2D Convolution
As the banner photo of this post models, we can think of a grayscale digital image as a 2D matrix of squares (pixels). Each pixel maps to an integer from 0 to 255. These values carry no color information, instead expressing lightness, from black (0) to white (255). This specific range of values allows the full range possible to be represented in a computer by unsigned 8-bit integers.
If Python is your language du jour and you want to programmatically blur an image, it's common to use libraries like PIL or cv2 with built-in methods for that purpose. Today, I'm building a simple Gaussian blur from scratch. Here, I define "from scratch" to mean only using numpy to execute the kernel convolution. I will import matplotlib and PIL, but only to set up the initial image and visualize the process.
I use PIL at the beginning of the marimo notebook to load the original image file, then transform it (a 2655 × 2655-pixel JPEG image in RGB color space, converted to grayscale) into a 256x256 numpy array. I chose this size as arbitrarily small to balance visible results with prototyping speed.
But what is a blur, mathematically? With the mental model of a matrix in mind, blurring replaces each entry's value with a weighted average of its neighbors. The weights are determined by a kernel, which is a separate 2D matrix with smaller dimensions than the image it's convolving. The entries in the kernel are multipliers, applied to the "window" of pixels around each pixel; their sum becomes the final weighted average. This kernel "window" slides across the image, summing the products of each element to produce new values for each pixel.
The name for this process is discrete 2D convolution. This term reflects that we're applying a sliding, product-summing operation (convolution) over a finite grid of values (discrete) in both the horizontal and vertical directions (2D).
The blur occurs because the kernel averages the lightness values between a given "center" pixel and its neighbors. By contrast, no blurring would occur if a 3x3 kernel with a center weight of 1 and perimeter weights of 0 were applied to an image.
The Kernel
The kernel I'll use ([[1,2,1],[2,4,2],[1,2,1]]) comes from taking the outer product of Pascal's triangle's second row ([1,2,1]) with itself, which gives a reasonable discrete approximation of the 2D Gaussian. It produces a Gaussian-esque blur because it's center-weighted and tapers to a minimum at the corners.
The kernel weights multiply each value in the window and sum into a dot product of flattened vectors. The weights add to 16, so we divide by 16 to normalize them to sum to 1. This preserves the original image's brightness; skip that step and the output shifts in brightness rather than just sharpness.
Part 2 will derive the kernel analytically from the 2D Gaussian function and introduce sigma and radius as free parameters. The Gaussian function is separable, so the 2D convolution factors into two cheaper 1D passes, one horizontal and one vertical. Part 3 covers the complexity reduction.
Handling Border Pixels
Applying the kernel to border pixels is a problem: some positions extend past the array boundary, causing undefined behavior or tracebacks. Padding is required. Four options:
- Reflect padding: Mirrors the image content across the border. If the rightmost entries in a row are
[10, 20, 30], two pixels of reflect padding on the right would be[20, 10]. - Wrap padding: Treats the image as a tile, so the right edge wraps to the left. Imagine the image wrapped around a cylinder; the padding continues one edge onto the next.
- Zero-padding: Adds a border of zeros around the image, creating a solid black border. Makes the pixels near it darker during convolution.
- Edge padding: Replicates the nearest border pixel outward, which may distort any sharp edges or distinctive content near the border.
I went with edge padding. The original image has no strong border content, doesn't tile or repeat, and the 3x3 kernel means the affected region is only one pixel wide. The surface area for trouble is small. This means we only need to add two rows and two columns of pixels, bringing the shape of our pre-processed matrix to 258x258.
The Convolution Loop
Once padded, we loop over the rows and columns to make the transformation. We initialize a result_array of zeros at 256x256 to store each output. Then, at each entry (r, c):
- Compute boundary indices
- Use
np.ix_to select the kernel window entries without a nested loop - Compute and normalize the dot product
- Write the result to
result_arrayThis is O(nk²) complexity, with n as the number of matrix entries and k as the kernel side length. Clear and correct, but not fast.
Results
The transformation is mild. A small binomial kernel won't produce dramatic blurring, and the output confirms the approach is sound.
There are limitations. O(nk²) is slow on 65k+ pixels. The kernel is fixed, and the binomial approximation is coarse, so the program is inflexible and inexact. Future parts address all three.
Next
In Part 2, I'll derive the Gaussian kernel analytically to parameterize it. In Part 3, we'll prove separability and reduce complexity for a measurable speed improvement.
To view a PDF version of this notebook, click here.