Image Processing with Numpy
Published: 23/10/2016
I recently had to computationally alter some images, and ended up getting interested in some of the basic image manipulation techniques. The result is this post.
In Python, there are a number of powerful libraries that make image processing easy, such as Pillow, scikit-image and OpenCV. For anyone thinking about doing serious image processing, they should be the first place to look.
However, I am not planning on putting anything into production. Instead, the goal of this post is to try and understand the fundamentals of a few simple image processing techniques. Because of this, I am going to stick to using numpy to perform most of the manipulations, although I will use other libraries now and then.
Let's start by loading our libraries:
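A sketch of the imports used in the rest of this post; the exact set in the original notebook may differ:

```python
import numpy as np
import matplotlib.pyplot as plt

from scipy import ndimage  # convolutions and filters, used later on
```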
And loading our image:
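Something like the following, with the file path standing in for wherever the photo actually lives:

```python
# Read the photo into a numpy array. The path is a placeholder.
im = plt.imread("dog.jpg")
print(im.shape)
```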
We see that the image is loaded into an array of dimensions 4608 x 2592 x 3.
The first two indices represent the Y and X positions of a pixel, and the third represents the RGB colour value of the pixel. Let's take a look at what the image is of:
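With matplotlib, something like:

```python
plt.figure(figsize=(6, 10))
plt.imshow(im)
plt.axis("off")
plt.show()
```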
It is a photo of a painting of a dog. The painting itself was found in a flat in London I lived in many years ago, abandoned by its previous owner. I don't know what the story behind it is. If you do, please get in touch.
We can see that whichever bumbling fool took that photo of the painting also captured a lot of the wall. We can crop the photo so we are only focused on the painting itself. In numpy, this is just a matter of slicing the image array:
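The crop bounds below are illustrative; the real ones depend on where the painting sits in the frame:

```python
# Rows (Y) first, then columns (X); the bounds here are guesses.
im_cropped = im[500:4000, 300:2300, :]
plt.imshow(im_cropped)
plt.axis("off")
plt.show()
```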
Each pixel of the image is represented by three integers: the RGB value of its colour. Splitting the image into separate colour components is just a matter of pulling out the correct slice of the image array:
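One way to do it, keeping one channel and zeroing the other two so each component still displays as a colour image:

```python
def to_channel(im, channel):
    """Keep one RGB channel and zero out the other two."""
    out = np.zeros_like(im)
    out[:, :, channel] = im[:, :, channel]
    return out

fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for channel, ax in enumerate(axes):
    ax.imshow(to_channel(im_cropped, channel))
    ax.axis("off")
```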
When using matplotlib's imshow to display images, it is important to keep track of which data type you are using, as the colour mapping is data type dependent: if a float is used, the values are mapped to the range 0-1, so we need to cast to type "uint8" to get the expected behavior. A good discussion of this issue can be found here. In my first edition of this post I made this mistake. Thanks to commenter Rusty Chris Holleman for pointing out the problem.
Representing colour in this way, we can think of each pixel as a point in a three dimensional space. Thinking of colour like this, we can apply various transformations to the colour "point". An interesting example is "rotating" the colour.

There are some subtleties: legal colours exist as integer points in a three dimensional cube of side length 255, and it is possible for a rotation to push a point out of this cube. To get around this, I apply a transformation to the data that maps the range 0-1 to the full real line. Having applied this transformation, we apply the rotation matrix, then transform back to colour space.
The following functions apply a sigmoid to the image's colour space, rotate it about the red axis by some angle, and then return the image to normal colour space. The rotation matrix is applied pixel-wise to the image using numpy's einsum function, which I hadn't used before, but which makes the operation concise:
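A sketch of the idea; the function names and the exact colour mapping are mine rather than the original notebook's:

```python
from scipy.special import expit, logit  # sigmoid and its inverse

def rotation_matrix_red(theta):
    """Rotation by theta (radians) about the red (first) colour axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s, c]])

def rotate_colours(im, theta):
    # Map colours from [0, 255] onto the full real line...
    points = logit((im.astype(float) + 0.5) / 256)
    # ...rotate every pixel's colour vector in one einsum call
    # (i, j index pixels; k, l index colour components)...
    rotated = np.einsum("lk,ijk->ijl", rotation_matrix_red(theta), points)
    # ...and map back to legal colours.
    return np.clip(expit(rotated) * 256 - 0.5, 0, 255).astype("uint8")

plt.imshow(rotate_colours(im_cropped, np.pi / 4))
plt.axis("off")
plt.show()
```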
Not bad. It looks even more impressive if we continuously rotate the colour of the pixels. We can animate this transformation using matplotlib's FuncAnimation tool:
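A minimal sketch, reusing rotate_colours from above; the frame count and output filename are arbitrary:

```python
from matplotlib import animation

fig, ax = plt.subplots()
ax.axis("off")
frame = ax.imshow(im_cropped)

def update(i):
    # Step through one full rotation of colour space over 60 frames.
    frame.set_data(rotate_colours(im_cropped, 2 * np.pi * i / 60))
    return (frame,)

anim = animation.FuncAnimation(fig, update, frames=60, interval=100)
anim.save("colour_rotation.gif", writer="pillow")
```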
psychedelic.

On the topic of colour, we can also transform the image to greyscale easily. There are a number of ways to do this, but a straightforward one is to take a weighted mean of the RGB values of the original image:
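The weights below are the common luma coefficients, one reasonable choice among many:

```python
def to_greyscale(im, weights=(0.299, 0.587, 0.114)):
    """Weighted mean over the colour axis."""
    return np.einsum("ijk,k->ij", im.astype(float), np.array(weights))

grey = to_greyscale(im_cropped)
plt.imshow(grey, cmap="gray")
plt.axis("off")
plt.show()
```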
Another of the basic operations you can apply to an image is a convolution. It is defined as
$$C(x, y) = \int dx' \, dy' \; I(x + x', y + y') \, W(x', y')$$
where $C$ is the convolved image, $I$ is the original image and $W$ is a window function. Essentially we are replacing each pixel with a weighted sum of nearby pixels.
Because convolutions can be expensive, let's start by shrinking the image:
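Keeping every fourth pixel in each direction is the cheapest way to do this; the original notebook may have resized more carefully:

```python
im_small = im_cropped[::4, ::4, :]
print(im_small.shape)
```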
We can now apply a uniform window to the image. This has the effect of blurring the image, by averaging each pixel with those nearby:
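A sketch using scipy's ndimage.convolve, applied to each colour channel in turn; the window size is illustrative:

```python
def convolve_colour(im, window):
    """Convolve each colour channel separately with a 2D window."""
    out = np.stack(
        [ndimage.convolve(im[:, :, c].astype(float), window)
         for c in range(3)],
        axis=-1,
    )
    return np.clip(out, 0, 255).astype("uint8")

n = 10
uniform_window = np.ones((n, n)) / n ** 2  # weights sum to one
plt.imshow(convolve_colour(im_small, uniform_window))
plt.axis("off")
plt.show()
```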
For blurring an image, there is a whole host of different windows and functions that can be used. The most common I have found are the uniform window, the Gaussian window and the median filter. To get a feel for what these are doing to an image, I apply all of these filters to our image, for different window sizes:
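scipy.ndimage has all three built in, which saves writing the windows by hand. The sizes below are illustrative, and note the median filter is a rank filter rather than a true convolution:

```python
sizes = [5, 10, 20]
filters = {
    "uniform": ndimage.uniform_filter,
    "gaussian": ndimage.gaussian_filter,
    "median": ndimage.median_filter,
}

fig, axes = plt.subplots(len(filters), len(sizes), figsize=(12, 12))
for row, (name, f) in zip(axes, filters.items()):
    for ax, size in zip(row, sizes):
        # uniform/median take a window size; gaussian takes a sigma.
        arg = size / 3 if name == "gaussian" else size
        filtered = np.stack(
            [f(im_small[:, :, c], arg) for c in range(3)], axis=-1
        )
        ax.imshow(filtered)
        ax.set_title("{}, size {}".format(name, size))
        ax.axis("off")
```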
Blurring is only one use of convolutions in image processing. By using more exotic windows, we can extract different kinds of information. The Sobel operator approximates the gradient of the image along one direction, using window functions of the form

$$W_x = \begin{pmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{pmatrix}$$

By finding the gradient in both the X and Y directions, and then taking the magnitude of these values, we get a map of the gradients in the image for each colour:
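scipy provides the Sobel derivatives directly, so a sketch looks like:

```python
def gradient_magnitude(im):
    """Per-channel gradient magnitude from Sobel derivatives."""
    out = np.zeros(im.shape, dtype=float)
    for c in range(3):
        channel = im[:, :, c].astype(float)
        dx = ndimage.sobel(channel, axis=1)
        dy = ndimage.sobel(channel, axis=0)
        out[:, :, c] = np.sqrt(dx ** 2 + dy ** 2)
    # Rescale to a displayable 0-255 range.
    return (255 * out / out.max()).astype("uint8")

plt.imshow(gradient_magnitude(im_small))
plt.axis("off")
plt.show()
```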
The results are pretty impressive. By combining filtering and gradient finding operations together we can generate some strange patterns that resemble the original image but are distorted in interesting ways. The best I have found so far is combining a large window median filter with a Sobel filter:
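For example, with a window size that is again a guess:

```python
smoothed = np.stack(
    [ndimage.median_filter(im_small[:, :, c], 20) for c in range(3)],
    axis=-1,
)
plt.imshow(gradient_magnitude(smoothed))
plt.axis("off")
plt.show()
```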
So far we've looked at applying the same operations to all colour channels at once. If we blur only one colour channel at a time, we get the following eerie effects:
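One channel blurred at a time, the other two left untouched:

```python
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for c, ax in enumerate(axes):
    blurred = im_small.copy()
    blurred[:, :, c] = ndimage.uniform_filter(im_small[:, :, c], 30)
    ax.imshow(blurred)
    ax.axis("off")
```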
Another major area of image processing is segmenting the image into different regions, for example foreground and background. There are a number of ways to do this, and I will only look at a few here.
The simplest is to convert the image to greyscale and find a threshold. Pixels with a value above the threshold are treated as belonging to one region, and those below to another. We can explore how choosing different thresholds segments our greyscale image.

How exactly we choose a threshold is going to be application specific. However, we might argue that we would expect background pixel values to be similar in value to other background pixel values, and the same for the foreground. One way to quantify this is to say that we are looking for the threshold which minimises the pixel variance within the foreground and background. One way to calculate this is Otsu's method, implemented below:
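A brute-force sketch: a thresholding helper, a few thresholds chosen by hand, and then an exhaustive search for the variance-minimising one:

```python
def threshold(grey_im, t):
    """Boolean mask of pixels above threshold t."""
    return grey_im > t

grey_small = to_greyscale(im_small)

# A few thresholds chosen by hand.
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for t, ax in zip([50, 120, 200], axes):
    ax.imshow(threshold(grey_small, t), cmap="gray")
    ax.set_title("threshold = {}".format(t))
    ax.axis("off")

def otsu_threshold(grey_im):
    """Search for the threshold minimising the count-weighted
    within-class variance of foreground and background."""
    best_t, best_score = 0, np.inf
    pixels = grey_im.ravel()
    for t in range(1, 255):
        fg, bg = pixels[pixels > t], pixels[pixels <= t]
        if len(fg) == 0 or len(bg) == 0:
            continue
        score = len(fg) * fg.var() + len(bg) * bg.var()
        if score < best_score:
            best_t, best_score = t, score
    return best_t

t = otsu_threshold(grey_small)
plt.imshow(threshold(grey_small, t), cmap="gray")
plt.title("Otsu threshold = {}".format(t))
plt.axis("off")
plt.show()
```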
It's not great. However, we might think that by converting our image to greyscale we are throwing away information. We can apply the same process to each colour channel separately to get:
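Reusing the helpers above on each channel:

```python
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
channel_masks = []
for c, ax in enumerate(axes):
    channel = im_small[:, :, c].astype(float)
    mask = threshold(channel, otsu_threshold(channel))
    channel_masks.append(mask)
    ax.imshow(mask, cmap="gray")
    ax.axis("off")
```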
A natural way to combine each channel into one image is to take the intersection of the thresholded colour channels:
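With boolean masks, the intersection is just elementwise and:

```python
# A pixel is foreground only if it is foreground in all three channels.
combined = channel_masks[0] & channel_masks[1] & channel_masks[2]
plt.imshow(combined, cmap="gray")
plt.axis("off")
plt.show()
```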
Which works much better than just looking at the greyscale image.

We can be more direct in our approach, though. In Otsu thresholding, we found the threshold which minimised the within-segment pixel variance. If, rather than looking for a threshold, we look for clusters in colour space, we end up with the K-means clustering technique. Applying this directly to the coloured image, we get:
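A sketch using scikit-learn's KMeans; scipy.cluster would work just as well:

```python
from sklearn.cluster import KMeans

# Cluster the pixels in RGB space into two groups.
pixels = im_small.reshape(-1, 3).astype(float)
labels = KMeans(n_clusters=2).fit_predict(pixels)
segments = labels.reshape(im_small.shape[:2])

plt.imshow(segments, cmap="gray")
plt.axis("off")
plt.show()
```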
Which isn't bad at all. By randomly assigning colours to each of the segments we can create art:
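With more clusters and a random palette; the cluster count is arbitrary:

```python
n_segments = 5
labels = KMeans(n_clusters=n_segments).fit_predict(pixels)

# Assign each segment a random colour.
palette = np.random.randint(0, 256, size=(n_segments, 3)).astype("uint8")
art = palette[labels].reshape(im_small.shape)
plt.imshow(art)
plt.axis("off")
plt.show()
```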
So far we have treated the image as a bitmap. As a final step, I am going to extract the segment containing the dog from the image, and convert it into a vectorised image.
Let's start by grabbing the segment of the dog:
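Which cluster label corresponds to the dog depends on the K-means run, so the label below is a guess:

```python
dog_mask = segments == 1  # pick whichever label covers the dog
plt.imshow(dog_mask, cmap="gray")
plt.axis("off")
plt.show()
```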
We can now use a contour tracing algorithm from scikit-image to extract the paths round the dog:
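skimage.measure provides both the tracing and a polygon-simplification step:

```python
from skimage import measure

contours = measure.find_contours(dog_mask.astype(float), 0.5)
polygons = [measure.approximate_polygon(c, tolerance=10)
            for c in contours]
```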
When tracing shapes it is worth playing with the tolerance parameter. This controls how accurately the path follows the original bitmap shape. As our image is fairly noisy, I've turned this up quite high to "smooth" out the image. The result is a set of fairly jagged polygons.
To plot the image, we can convert these contour paths to patches and fill them with matplotlib:
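find_contours returns (row, column) coordinates, so they need swapping, and the rows flipping, before they can be drawn as patches:

```python
from matplotlib.patches import Polygon

def to_xy(poly):
    """Convert (row, col) contour points to plottable (x, y)."""
    return np.stack([poly[:, 1], dog_mask.shape[0] - poly[:, 0]], axis=1)

fig, ax = plt.subplots(figsize=(6, 10))
for poly in polygons:
    ax.add_patch(Polygon(to_xy(poly), facecolor="brown"))
ax.set_xlim(0, dog_mask.shape[1])
ax.set_ylim(0, dog_mask.shape[0])
ax.set_aspect("equal")
plt.show()
```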
And we run into the problem that some of the contours we've found are not the outside edge of a shape, but the inside edge. To get round this I use the following hacky code. We check whether a point is within a polygon using a point-in-polygon test, and rely on the fact that for our specific case polygons don't overlap with each other: one polygon is either inside another or not. Finally, the subsume function arranges all the polygons into our polygon objects, which describe the outside and possible inside edges of the shapes:
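A sketch of that idea, leaning on matplotlib's Path for the point-in-polygon test; the original notebook's version will differ in detail:

```python
from matplotlib.path import Path

def point_in_polygon(point, polygon):
    return Path(polygon).contains_point(point)

def subsume(polygons):
    """Pair each outside edge with the inside edges it contains.
    Assumes polygons never partially overlap: one is either
    inside another or completely separate."""
    outsides, insides = [], []
    for poly in polygons:
        # An inside edge is one whose first vertex lies within
        # some other polygon.
        if any(point_in_polygon(poly[0], other)
               for other in polygons if other is not poly):
            insides.append(poly)
        else:
            outsides.append(poly)
    return [(out, [h for h in insides if point_in_polygon(h[0], out)])
            for out in outsides]

shapes = subsume(polygons)
```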
Finally, we can plot our polygons with a random colour:
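Holes are handled crudely in this sketch by painting the inside edges over in the background colour:

```python
fig, ax = plt.subplots(figsize=(6, 10))
for outside, holes in shapes:
    ax.add_patch(Polygon(to_xy(outside), facecolor=np.random.rand(3)))
    for hole in holes:
        ax.add_patch(Polygon(to_xy(hole), facecolor="white"))
ax.set_xlim(0, dog_mask.shape[1])
ax.set_ylim(0, dog_mask.shape[0])
ax.set_aspect("equal")
ax.axis("off")
plt.show()
```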
And with that I'm done. In this post I have only really scratched the surface with what can be done with image processing, but it already feels like I have written a lot, so I'm going to leave it here. For more ideas of what can be done, I suggest looking at the examples of a number of image libraries.
The Jupyter notebook for this post can be found on GitHub.