*11-747 Learning-based Image Synthesis* *Manuel Rodriguez Ladron de Guevara*

## Overview

The goal of this assignment is to take the digitized Prokudin-Gorskii
glass plate images and produce a color image using image processing
techniques, aiming to have as few visual artifacts as possible.

The input of the process is a digitized glass plate image that consists of 3
separate channels ordered from top to bottom as BGR as shown in Figure 1. Our
job is
to separate
the 3 channels and align them as better as possible. The question is *how
to align them efficiently?*. The straight-away inefficient approach is
to search over all possible displacements until we find best alignment. This
is prohibitively expensive for large images—images with one of the
dimensions larger than 3000px.

## Search Methods

Each alignment gets a score using some image matching metrics. For this assingment, I explored Sum of Squared Differences (SSD), Normalized Cross Correlation (NCC), and Peak Signal-to-Noise Ration (PSNR) functions. The objective is to maximize or minimize one of these functions by searching which array configuration finds the highest or lowest value.

**Sum of Squared Differences** SSD is calculated based on the following equation:

where and are the two arrays and are pixels corresponding to the height and width dimensions respectively.

**Normalized Cross Correlation** NNC is calculated based on the following equation:

where and are the luminance of each image.

**Peak Signal-to-Noise Ration** PSNR is calculated based on the following equation:

where is the mean of squared error function between the two images and $p_m$ is the maximum pixel intensity.

To make an efficient algorithm, the assignment already suggests to use an
image pyramid—downsampling the image *n* times by a factor
of 2—and do the search over the new set of lower resolution images.
Having the image pyramid algorithm as a backbone for this project, I
propose two different approaches, besides exhaustive searching over the
whole pixel space.

**Exhaustive Search **The exhaustive search is the easy way out solution for small images, since the time complexity of the search spaces is , and consists of iterating over the rows and cols of the image array and computing a metric that compares such two array configurations.

**Image Pyramid Search ** This is an efficient search
algorithm that generally yields good results in less than 60 seconds for
high resolution images—up to 9000 pixels. As mentioned above, this
algorithm scales down the image by a factor of 2 *n* times, being
*n* an automated parameter based on the input image resolution.
Given an initial window search, the algorithm searches from coarse to
fine images, within the reduced window size, which decreases by a factor
of 0.8 as the images become larger. Window size is a hyperparameter.

**Evolutionary Search ** I propose a novel approach to search the best alignment, inspired by the Darwinian theory. This black-box optimization algorithm consists on treating the search space as a multivariate Gaussian, with varying mean and covariance matrix. The algorithm starts with a population , initial , and initial covariance . We select elites by keeping a fraction of the population of at each iteration, and noise is added to covariance at each step: . We optimize the different metrics mentioned above. We let the algorithm run for 10 periods each time and set an initial Gaussian variance of , which decreases by a factor of at each pyramid level. We optimize using the following equations:

where is the elite size and denotes the *i*-th best parameters from the previous iteration and is a small constant.

## Bells & Whistles

We introduce various improvements over our baselines.

**Automated Edge Cropping** Natural to the glass plates is
to have artifacts along the borders. While for the baselines we do not
touch the edges of the images, here we detect edges automatically by
applying an intensity threshold at rows and columns near the image limits. To alleviate a potential til of
the edges, instead of looking at values of single rows or cols, we look at multiple and compute a threshold
within the various rows or cols.

**Pytorch Implementation** We re-implement the algorithm in
Pytorch, this time making use of functions such as MSE loss, and
efficient built-in operations such as convolutions.

**Automatic Contrasting ** We implement automatic contrast to improve the perception of the image
quality.

## Results and Discussion

We show here the results obtained by different algorithms and metrics.

### Pyramid algorithm

The following images are aligned using the pyramid algorithm baseline. Unless otherwise stated, there is no prior automatic cropping—we show how, in some cases, prior cropping helps alignment. Likewise, and unless otherwise stated, the alignment is calculated using SSD metric.

### Evolutionary Approach with Pyramid Algorithm

The following images are aligned using the evolutionary approach described above integrated in our pyramid algorithm. This process treats a space of actions as a gaussian distribution, and the portion of the population tha best fits the problem leave their offsprings through the mean and variance of the next gen vector.

Due to time constraints, we show the not-so-good results taken by this approach, although I’ll work on this to get it done, since the part left is a grid-search of of hyperparameters.

### Bells and Whistles

Results on automatic contrasting

Results on automatic cropping.

### All images + Vector Displacements

The rest of the images using SSD + automatic fixed post-cropping: