DOCUMENT SCANNER USING OPENCV

OpenCV is a library written in C++ that provides an infrastructure for computer vision and machine learning. It contains more than 2500 algorithms, used for tasks such as face detection, gesture recognition, augmented reality, object tracking, and object identification.

In this tutorial, we will create a simple document scanner using the OpenCV library. This can be useful, for example, for scanning pages in a book.

This is a beginner tutorial, so I will explain each line of code in detail so that you can follow along with me.

The steps that we need to follow to build this project are:

1. Image pre-processing

2. Find the edges in the image

3. Use the edges to find the biggest contours

4. Select only the contours of the document

5. Apply warp perspective to get the top-down view of the document

SETUP

So, let's get started. Open your favourite IDE and create a new Python file, name it documentScanner.py and import necessary packages by running the command below:

Ok great! We are now ready to start writing some code.

LOADING VIDEO

After creating the file documentScanner.py, add the following code:

To load a video with OpenCV we create a VideoCapture object, which takes the device index as an argument. Since a video is just a fast-moving sequence of still images, we handle it one frame at a time, and we set the height (heightImg) and width (widthImg) so that each frame can be resized to a fixed size.

Lastly, we take a copy of our image. This will allow us later to display the contours of the document on the original image rather than the modified image.

IMAGE PROCESSING

Now we start pre-processing our image by converting it to grayscale, blurring it, and then finding the edges in the image. Let's see how to do it:

Now that our image is loaded, we start by converting it from the BGR colour space to grayscale.

Next, to remove noise from the image, we smooth it using the GaussianBlur function. The first argument is the image we want to blur. The second argument is the width and height of the kernel, both of which must be positive and odd.

The last argument is the standard deviation, which in this case we set to 1. If we set it to 0, OpenCV calculates it from the kernel size.

Then, we apply the Canny edge detector using the function Canny(). This is a multi-stage algorithm that is used to remove noise and detect edges in the image.

The first argument is our input image. The second and third arguments are the lower and upper thresholds that the algorithm uses to separate edges from non-edges in the image.

We use the imshow function to display each image in a window.

The waitKey(delay) function will wait for a pressed key for delay milliseconds if delay is positive. Otherwise, it will wait infinitely for a pressed key.

The destroyAllWindows() function simply destroys all the windows we created.

Below you can see the output that we get (you can find the image in the repository):

Original image:

Converting from BGR colour to grayscale:

Blurring the image using the GaussianBlur function with a (5, 5) kernel size:

Applying Canny edge detector:

USING EDGES TO FIND CONTOURS

Now we can use our edged image to find the contours.

To find the contours in the image, we define a function getcontours() that takes only one argument, the source image. This function finds the contours and also lets us draw them on an image.

Let's see what we get so far:

Cool! Let's keep going.

DETECTING THE BIGGEST CONTOURS

Now we need to find the biggest rectangle contour in the image that will define our document. Here is how to do it:

Here we use the arcLength function to compute the perimeter of the contour. Its first argument is the contour, and the second is a boolean that tells the function whether the contour is closed; True means that it is closed.

Then we use the approxPolyDP function to approximate the contour with another contour that has fewer vertices.

This function takes three arguments. The first is the contour we want to approximate. The second specifies the approximation accuracy; in our case, we use an accuracy that is proportional to the contour perimeter (0.05 * perimeter). The last argument is a boolean that specifies whether the approximated contour is closed.

Finally, we check whether the approximated contour has four points. If so, we can reasonably assume that we have found our document, and we break out of the for loop.

Let's see what we got:

APPLYING WARP PERSPECTIVE

Now we are ready to apply the warp to get the top-down view:

Basically, we define a function getWarp() that takes an image and the biggest contour as input and returns the top-down view of the image.

Here is what we get:

Lastly, we apply adaptive thresholding to get a black and white scanned image.

SUMMARY

In this tutorial, we learned how to build a simple document scanner with OpenCV. The best thing about this algorithm is that it won't throw an error when it's unable to find the contours, or even if the document goes out of the frame.

The same program can also be used to scan a still image. When doing so, try to keep the original size of the image and you will get a better result.

As always, the full code and the necessary file are available in my GitHub repository.