If, like me, you attend a graduate Computer Vision course without having studied Computer Vision as an undergraduate, you will almost certainly struggle with edge detection and what it really means. Graduate courses usually start with the Canny edge detector, which makes it hard for many to see how the idea came about in the first place.

Let's try loading an image into our code. In this example, I'll use a photo I took at Niagara Falls State Park.

    cv::Mat orig_image;
    cv::Mat resized_image;

    // cv::imread reports a missing or unreadable file by returning an empty Mat,
    // so check for that before using the image.
    orig_image = cv::imread(IMG_LOC + "niagara.jpg", cv::IMREAD_COLOR);
    if (orig_image.empty()) {
        std::cerr << "could not read " << IMG_LOC << "niagara.jpg" << std::endl;
    } else {
        // Scale width and height by 0.25 each
        cv::resize(orig_image, resized_image, cv::Size(), 0.25, 0.25);
    }

Note that I also resized the image to 25% of its original width and height.

The Math

The thing that probably bites people here is what I call the discretization of continuous functions. Recall from high school calculus that a function of one variable is said to be differentiable at $x$ if the following limit exists:

$$f'(x) = \frac{d}{dx}f(x) = \lim _{h\to 0}{\frac {f(x+h)-f(x)}{h}} $$

On paper, sure, you can have an infinitesimal $h$. But in the world of computers, where everything is discrete, you need some sort of finite "step" to define a derivative. Treating an image as a function of $x$ and $y$, we are further limited to pixel-level accuracy, so the smallest step we can take is $h = 1$. For symmetry, we take that step of 1 on both sides of the pixel and write the above equation as

$$I_x(x,y) = \frac{I(x+1, y) - I(x-1, y)}{2 \cdot 1}$$

Notice that we now divide by $2 \cdot 1$ because the two samples are 2 pixels apart. Dropping the constant factor $\frac{1}{2}$, the resulting matrix representation of the kernel is as follows:

$$ \begin{bmatrix} 0 & 0 & 0 \\ -1 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} $$

We will revisit this once we answer the obvious question that is probably popping into your head right now.
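As a minimal sketch of the central difference, assuming NumPy and assuming that $x$ runs along the columns of the array: the function name `central_difference_x` and the toy `ramp` image are illustrative and not part of the original code. Border pixels have no neighbour on one side, so this sketch simply leaves them at zero.

```python
import numpy as np

def central_difference_x(image):
    """Approximate I_x(x, y) = (I(x+1, y) - I(x-1, y)) / 2 along the columns."""
    image = np.asarray(image, dtype=float)
    ix = np.zeros_like(image)
    # Right neighbour minus left neighbour, divided by the 2-pixel distance.
    ix[:, 1:-1] = (image[:, 2:] - image[:, :-2]) / 2.0
    return ix

# Toy image (hypothetical): intensity rises by 10 per pixel from left to right.
ramp = np.tile(np.arange(5, dtype=float) * 10, (3, 1))
ix = central_difference_x(ramp)
print(ix[1])  # interior values are 10, the two border values stay 0
```

On a ramp the derivative is the same everywhere in the interior, which is what you would expect from a function with constant slope.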

But wait, how will the differential help with the edges?

Short answer - If you treat your image as a 2D function of $x$ and $y$, edges are essentially the cliffs of that surface, which is exactly where the derivative peaks.
Long answer - Let's start with a rather philosophical question. Taken from a Stanford CS course webpage, the famous "Origin of Edges" slide appears in almost every graduate Computer Vision course. Observe how many different ways there are to define an edge.

So what are edges, really? Probably the best way to put it: edges are a reduced set of pixels that is still enough for a human to make sense of the image. We can also say that edges occur at boundaries, where the image changes abruptly.

I don't believe things I can't see

Let's create a small Python script to visualize the image in 3D space, with the z-axis representing the pixel values of the image.


    import cv2
    import numpy as np
    import matplotlib.pyplot as plt

    IMG_LOC = "/Users/abhinandandubey/Desktop/cv/images/"
    # Flag 0 reads the image as single-channel grayscale
    orig_image = cv2.imread(IMG_LOC + "niagara.jpg", 0)
    resized_image = cv2.resize(orig_image, (0, 0), fx=0.25, fy=0.25)
    print(resized_image.shape)  # (756, 1008)

    plt.imshow(resized_image, cmap='gray', interpolation='bicubic')
    plt.title('Niagara Falls State Park')
    plt.xticks([]), plt.yticks([])  # hide tick values on the X and Y axes
    plt.show()

[Figure: the grayscale image, titled "Niagara Falls State Park"]

    from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib

    # downscaling has a "smoothing" effect
    # (cv2.resize stands in for scipy.misc.imresize, which newer SciPy releases no longer ship)
    smooth_resized_image = cv2.resize(resized_image, (0, 0), fx=0.5, fy=0.5,
                                      interpolation=cv2.INTER_CUBIC)
    # create the x and y coordinate arrays (here we just use pixel indices)
    xx, yy = np.mgrid[0:smooth_resized_image.shape[0], 0:smooth_resized_image.shape[1]]

    # create the figure and set the viewing angle
    fig = plt.figure(figsize=(8, 6))
    ax = fig.add_subplot(projection='3d')
    ax.view_init(elev=70, azim=10)
    ax.plot_surface(xx, yy, smooth_resized_image, rstride=1, cstride=1,
                    cmap=plt.cm.gray, linewidth=0)

    plt.show()

[Figure: 3D surface plot of the pixel intensities]

If you look closely, the three "white patches" in the top left corner of the image (the open sky with a lot of light) are the same as the ones at the top left of the surface plot. These are the peaks of the surface plot, and if you were to stand on one of these peaks, you would see a steep drop into the area where the trunk of the tree is, or into the area with the leaves of the trees. The image rendered with pyplot's `cmap` should make this clearer.

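Coming back to our discussion of the gradient, we can clearly see from the above surface plot that the gradient will have peaks at the cliffs of the plot. The steeper the cliff, the larger the gradient peak. The size of that peak is the gradient magnitude, which we approximate from the two finite-difference derivatives $I_x$ and $I_y$:

$$ \| \nabla I(x,y)\|_2 \approx \sqrt{ {I_{x}(x,y)}^2 + {I_{y}(x,y)}^2 } $$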
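Here is a small sketch of the magnitude formula, assuming NumPy. `np.gradient` uses central differences in the interior of the array (and one-sided differences at the borders), and `np.hypot` computes the square root of the sum of squares. The toy `step` image is hypothetical: a dark left half and a bright right half, which gives a single vertical cliff.

```python
import numpy as np

# Toy image (hypothetical): intensity jumps from 0 to 100 between columns 2 and 3.
step = np.zeros((4, 6))
step[:, 3:] = 100.0

# np.gradient returns the derivative along axis 0 (rows, y) first,
# then along axis 1 (columns, x).
iy, ix = np.gradient(step)

# Gradient magnitude: sqrt(I_x^2 + I_y^2) at every pixel.
magnitude = np.hypot(ix, iy)
print(magnitude[0])  # peaks at the two columns next to the jump, zero elsewhere
```

The flat regions produce a magnitude of zero, and the only non-zero values sit on the cliff, which is the behaviour the surface plot suggests.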

The Sobel Operator

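The Sobel operator is a set of two 3×3 kernels. The first is the central-difference row $[-1\ 0\ 1]$ from earlier, combined with a $[1\ 2\ 1]^T$ smoothing column in the perpendicular direction: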

\[\begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \begin{bmatrix} -1 & 0 & 1 \end{bmatrix}\]

The other kernel is simply this one rotated by $90^\circ$.
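To check the factorization and the rotation numerically, here is a rough sketch assuming NumPy. The helper `correlate_valid` is a hypothetical, deliberately naive sliding-window sum written for illustration. It applies the kernel as written (no flipping) and only where the kernel fits entirely inside the image. In practice you would reach for a library routine instead.

```python
import numpy as np

smooth = np.array([1, 2, 1])   # smoothing column
diff = np.array([-1, 0, 1])    # central-difference row

# Outer product of the column and the row gives the 3x3 horizontal kernel.
sobel_x = np.outer(smooth, diff)
# Rotating by 90 degrees clockwise gives the vertical kernel.
sobel_y = np.rot90(sobel_x, k=-1)

def correlate_valid(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    image = np.asarray(image, dtype=float)
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Toy image (hypothetical): a vertical edge between columns 2 and 3.
step = np.zeros((5, 6))
step[:, 3:] = 100.0

gx = correlate_valid(step, sobel_x)  # responds to the vertical edge
gy = correlate_valid(step, sobel_y)  # stays zero: nothing changes from top to bottom
print(sobel_x)
print(sobel_y)
print(gx[0], gy[0])
```

Only the horizontal kernel fires on a vertical edge, which is why both kernels are needed to build the gradient magnitude.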

Suggested reading: Reinhard Klette, Concise Computer Vision, Sec. 2.3.3 "Basic Edge Detectors", pp. 62-64