COMP SCI 180
Project 1
Goal: Produce a color image from the digitized Prokudin-Gorskii glass plate images using image processing techniques.
We start by splitting the image into its three color channels: R, G, and B.
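The split can be sketched as follows. This is a minimal example, not the project code: it assumes the scan is a single grayscale array with the three exposures stacked vertically in the order blue, green, red, and the name split_channels is hypothetical.

```python
import numpy as np

def split_channels(plate):
    """Split a vertically stacked grayscale plate into (b, g, r) thirds.

    Assumption: the plate is ordered blue, green, red from top to bottom.
    """
    h = plate.shape[0] // 3      # height of one channel; leftover rows are dropped
    b = plate[:h]
    g = plate[h:2 * h]
    r = plate[2 * h:3 * h]
    return b, g, r

# Toy 9x4 "plate" so that each channel is 3 rows tall
plate = np.arange(9 * 4, dtype=float).reshape(9, 4)
b, g, r = split_channels(plate)
print(b.shape, g.shape, r.shape)
```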
Then, we have to align the three channel images so they can be combined into a color image. Unfortunately, each channel is slightly offset from the others, so simply stacking the three channels produces a misaligned, blurry image. For example:

So, we must find the displacement for each color channel that best aligns it with the others. I started with the low-res images and wrote a function that performs an exhaustive search over a window of possible displacements. At first, I defined the window to be [-20, 20] around x=0, y=0. For each candidate displacement, I computed the sum of squared differences (SSD) between the shifted color channel and a chosen reference channel (I chose blue), keeping the displacement with the lowest SSD. When I displayed the resulting image, it was still quite unaligned.
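A minimal sketch of such an exhaustive search is shown here. It assumes the shift is applied with np.roll (which wraps pixels around the edges) and that displacements are reported as (x, y); the function names ssd and align_exhaustive are hypothetical.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences; lower means a better match."""
    diff = a.astype(float) - b.astype(float)   # float avoids uint8 overflow
    return float(np.sum(diff ** 2))

def align_exhaustive(channel, reference, window=20):
    """Try every shift in [-window, window] on both axes and keep the best."""
    best_score = float("inf")
    best_shift = (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            # Shift rows by dy and columns by dx (assumption: wrap-around shift)
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ssd(shifted, reference)
            if score < best_score:
                best_score, best_shift = score, (dx, dy)
    return best_shift   # (x, y) displacement to apply to `channel`
```

For example, align_exhaustive(g, b) would return the (x, y) shift to apply to the green channel before stacking it with blue.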
So, I implemented border reduction so that the edges of the image are ignored during the search. I had noticed that the edges of the images were quite “noisy”, which could have been skewing the alignment. I computed the border sizes as int(height * 5 / 100) and int(width * 5 / 100), and when calculating the SSD of two channels I used only the interior pixels inside those borders. I also implemented a dynamic search window, sized as a percentage of the height and width and centered on x=0, y=0. These changes greatly improved the alignment of the low-res images.
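The border reduction could look roughly like this. It is a sketch under the 5 percent figure given above; the helper names interior and ssd_interior are hypothetical.

```python
import numpy as np

def interior(img, percent=5):
    """Return the image with `percent` of its height/width trimmed from each side."""
    h, w = img.shape
    bh = int(h * percent / 100)   # rows ignored at the top and bottom
    bw = int(w * percent / 100)   # columns ignored at the left and right
    return img[bh:h - bh, bw:w - bw]

def ssd_interior(a, b, percent=5):
    """SSD computed on interior pixels only, so noisy borders do not count."""
    diff = interior(a, percent).astype(float) - interior(b, percent).astype(float)
    return float(np.sum(diff ** 2))
```

Swapping a score like ssd_interior in for a plain SSD leaves the search loop itself unchanged; only the pixels that contribute to the score differ.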
Then, it was time to move on to the high-res images. For these, I implemented pyramid alignment by rescaling the image down by a factor of 2, num_levels times, using a for loop. For each layer of the pyramid, I found the best x and y displacement and then used those values as the center of the search on the next layer. At first, my images took forever to display, which I traced to dynamically calculating the window size for every layer. So, I switched to a fixed window of [-10, 10]. (I first tried [-15, 15], but the images still took a very long time to display.) With this change, the high-res images displayed in reasonable time, but they were still unaligned. I first suspected the “noisy borders” again, so I tried cropping 15 percent of the image border before starting the pyramid search. The images were still unaligned, so I tried different metrics for comparing the color channels, including NCC (Normalized Cross-Correlation) and SSIM (Structural Similarity Index Measure). In the end, I used SSIM, which produced some well-aligned images!
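A coarse-to-fine search of this kind can be sketched as below. Several things here are assumptions rather than the project's actual code: the score is NCC instead of SSIM (to keep the example NumPy-only), downscaling is done by averaging 2x2 blocks, the estimate is doubled when moving to the next finer layer, and all function names are hypothetical.

```python
import numpy as np

def downscale2(img):
    """Halve each dimension by averaging 2x2 blocks (odd edges are trimmed)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w].astype(float)
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def ncc(a, b):
    """Normalized cross-correlation; higher means a better match."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def search(channel, reference, center, window):
    """Search shifts within `window` of `center`, given as (x, y)."""
    cx, cy = center
    best_score, best = -np.inf, center
    for dy in range(cy - window, cy + window + 1):
        for dx in range(cx - window, cx + window + 1):
            score = ncc(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
            if score > best_score:
                best_score, best = score, (dx, dy)
    return best

def align_pyramid(channel, reference, num_levels=3, window=10):
    """Estimate the (x, y) shift coarse-to-fine with a fixed window per layer."""
    # Index 0 is full resolution; the last entry is the coarsest layer
    chans, refs = [channel.astype(float)], [reference.astype(float)]
    for _ in range(num_levels - 1):
        chans.append(downscale2(chans[-1]))
        refs.append(downscale2(refs[-1]))
    shift = (0, 0)
    for level in range(num_levels - 1, -1, -1):
        shift = search(chans[level], refs[level], shift, window)
        if level > 0:
            # One pixel at this layer covers two pixels at the next finer layer
            shift = (shift[0] * 2, shift[1] * 2)
    return shift
```

Because each layer only refines the estimate from the layer above it, a small fixed window per layer can still recover a large displacement at full resolution, which is why the fixed [-10, 10] window stays cheap on the large .tif scans.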

IMAGE | G displacement (x, y) | R displacement (x, y) |
church.tif | (4, 25) | (-4, 58) |
emir.tif | (23, 49) | (41, 106) |
harvesters.tif | (16, 59) | (14, 123) |
icon.tif | (17, 40) | (23, 89) |
lady.tif | (9, 55) | (13, 119) |
melons.tif | (10, 81) | (13, 177) |
onion_church.tif | (28, 51) | (36, 108) |
sculpture.tif | (-11, 33) | (-27, 140) |
self_portrait.tif | (29, 78) | (37, 175) |
three_generations.tif | (17, 54) | (11, 113) |
tobolsk.jpg | (3, 3) | (3, 7) |
train.tif | (7, 41) | (32, 85) |
cathedral.jpg | (2, 5) | (3, 12) |
monestary.jpg | (2, -3) | (2, 3) |
Trees.tif | (-40, 75) | (-66, 113) |
Khan.tif | (50, 64) | (87, 134) |