Recently, my teacher in the lab handed out a first simple assignment: given a picture, plus a picture of the eyes cropped from a person in that picture, design an algorithm that finds the position of the eyes in the full picture.
First, a look at the results:
[Figures: two examples, each an input picture followed by its output picture]
The background knowledge needed for this task (a basic understanding is enough, no need to go deep) is the properties of vector multiplication and the concept of convolution.
Convolution, put simply, multiplies two matrices element by element (note: not matrix multiplication) and adds up the products. Usually a smaller matrix acts as a filter and slides over the larger picture from the upper-left corner to the lower-right corner, and a convolution sum is computed for each small matrix (a block of the picture the same size as the filter). The position with the largest sum is the upper-left corner of the block with the highest matching degree. (Strictly speaking, since the filter is not flipped, this operation is cross-correlation, but this post keeps calling it convolution.)
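To make the sliding step concrete, here is a minimal NumPy sketch of it. The function name `sliding_product_sums` and the tiny 3x3 example are made up for illustration and are not part of the assignment.

```python
import numpy as np


def sliding_product_sums(image, kernel):
    """Sum of element-wise products of `kernel` with every same-sized block of `image`."""
    H, W = image.shape
    h, w = kernel.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            block = image[r:r + h, c:c + w]
            # Element-wise product, not matrix multiplication.
            out[r, c] = np.sum(block * kernel)
    return out


# Hypothetical toy data: a 3x3 "picture" and a 2x2 "filter".
image = np.array([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]])
kernel = np.array([[1., 0.],
                   [0., 1.]])
scores = sliding_product_sums(image, kernel)
print(scores)
```

Each entry of `scores` belongs to the block whose upper-left corner sits at that row and column.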
Why is the largest convolution sum the best match? Think about it: convolution multiplies the corresponding points of two matrices and adds them up, which gives the same result as flattening each matrix into a one-dimensional vector and taking the dot product. The dot product equals the product of the two lengths times the cosine of the angle between the vectors, so for vectors of fixed length it is largest when they are parallel, because the angle is 0 and cos 0 = 1. Two identical vectors are of course parallel, which is why the convolution sum is largest at the matching position.
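The flattening argument can be checked numerically. The sketch below uses two arbitrary 2x2 matrices and a small helper named `cosine`, both assumptions made for illustration.

```python
import numpy as np


def cosine(a, b):
    """Cosine of the angle between two matrices viewed as flat vectors."""
    a = a.ravel()
    b = b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


a = np.array([[1., 2.], [3., 4.]])
b = np.array([[4., 3.], [2., 1.]])

# The element-wise product sum equals the dot product of the flattened matrices.
same_sum = bool(np.isclose(np.sum(a * b), np.dot(a.ravel(), b.ravel())))

cos_self = cosine(a, a)        # identical vectors are parallel
cos_scaled = cosine(a, 2 * a)  # so are scaled copies
cos_other = cosine(a, b)       # a different pattern gives a smaller cosine
print(same_sum, cos_self, cos_scaled, cos_other)
```

Note that the cosine ignores overall scale: `a` and `2 * a` count as equally good matches.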
However, this is still not quite right. You may ask: a large matrix can be split into many small matrices, so what if the values in those small matrices differ greatly from one another, so that the filter no longer discriminates well? A block with large values can produce a large sum no matter what pattern it holds. The answer is simple: for each small matrix in the large matrix, subtract its own mean before convolving it with the filter, so that the gap in values does not dominate.
Since the mean of each small matrix has been subtracted, shouldn't the filter's mean be subtracted as well?
Well, at this stage it is not strictly necessary (the values of a mean-subtracted block add up to zero, so the filter's mean contributes nothing to the sum), but it keeps the computed values relatively small.
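A tiny made-up example shows both points: a uniformly bright block beats the true match on the raw sum, and subtracting the block's mean fixes that. All arrays and the helper `centered` are hypothetical.

```python
import numpy as np


def centered(a):
    """Subtract the array's own mean."""
    return a - a.mean()


template = np.array([[0., 10.], [10., 0.]])  # pattern we are looking for
match = template.copy()                      # block that really contains it
bright = np.full((2, 2), 200.)               # flat but very bright block

# Raw sums: the bright block wins even though it has no pattern at all.
raw_match = np.sum(match * template)
raw_bright = np.sum(bright * template)

# After subtracting each block's mean, the true match wins.
c_match = np.sum(centered(match) * template)
c_bright = np.sum(centered(bright) * template)

# Also subtracting the template's mean leaves the sum unchanged.
c_match_both = np.sum(centered(match) * centered(template))
print(raw_match, raw_bright, c_match, c_bright, c_match_both)
```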
[Formula: the correlation coefficient between the filter and each small matrix]
After all of the analysis above, plus dividing by a denominator, the final version turns out to be nothing more than the correlation coefficient (orz). The larger the correlation coefficient, the higher the matching degree.
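Written out, the quantity the Python code in this post computes for each position looks like the following. The symbols are notation assumed here, not taken from the original formula: $T$ is the filter (the eye picture), $W_{r,c}$ is the small matrix whose upper-left corner is at row $r$, column $c$, and a bar denotes the mean over all entries.

```latex
\[
\rho(r, c) =
\frac{\sum_{i,j} \bigl(W_{r,c}(i,j) - \overline{W}_{r,c}\bigr)\,\bigl(T(i,j) - \overline{T}\bigr)}
     {\sqrt{\sum_{i,j} \bigl(W_{r,c}(i,j) - \overline{W}_{r,c}\bigr)^{2}}\;
      \sqrt{\sum_{i,j} \bigl(T(i,j) - \overline{T}\bigr)^{2}}}
\]
```

The numerator is the mean-subtracted convolution sum, and the denominator is the product of the two vector lengths, so $\rho$ is the cosine from the parallel-vector argument and always lies between $-1$ and $1$.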
At this point the structure of the algorithm is clear, and the code is very concise:
import matplotlib.pyplot as plt  # plt is used to display the picture
import numpy as np
from PIL import Image


def find_piece_in_pic(whole_pic, part_pic):
    # The two arguments are the paths of the two pictures.
    part = Image.open(part_pic)
    # Convert to grayscale.
    part = part.convert('L')
    part = np.array(part).astype('float64')
    whole = Image.open(whole_pic)
    whole = whole.convert('L')
    whole = np.array(whole).astype('float64')
    H, W = whole.shape
    h, w = part.shape
    # Subtract the exact mean of the filter (no int() truncation).
    part = part - np.average(part)
    res = np.zeros(whole.shape)
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            cur_whole = whole[r:r + h, c:c + w]
            cur_whole = cur_whole - np.average(cur_whole)
            # Denominator: product of the lengths of the two flattened matrices.
            temp1 = np.sqrt(np.sum(part * part)) * np.sqrt(np.sum(cur_whole * cur_whole))
            if temp1 == 0:
                continue  # a completely flat block has no defined coefficient
            res[r, c] = np.sum(cur_whole * part) / temp1
    # Row and column of the first position with the largest coefficient.
    topr, topc = np.unravel_index(np.argmax(res), res.shape)
    print(topr, topc)
    print(np.max(res))
    plt.figure()
    plt.imshow(whole, cmap='gray')  # a grayscale picture needs this argument
    plt.gca().add_patch(plt.Rectangle((topc, topr), w, h, color='black'))
    plt.show()


find_piece_in_pic('einstain.png', 'eye.png')
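For a quick check that needs no image files, the same idea can be run on synthetic data. This is only a sketch under assumptions: the function `best_match`, the random "picture", and the crop coordinates are all invented for the test and are not part of the original code.

```python
import numpy as np


def best_match(whole, part):
    """Return (row, col, score) of the block with the highest correlation coefficient."""
    H, W = whole.shape
    h, w = part.shape
    part = part - part.mean()
    part_norm = np.sqrt(np.sum(part * part))
    best = (0, 0, -np.inf)
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            win = whole[r:r + h, c:c + w]
            win = win - win.mean()
            denom = part_norm * np.sqrt(np.sum(win * win))
            if denom == 0:
                continue  # flat block: coefficient undefined, skip it
            score = np.sum(win * part) / denom
            if score > best[2]:
                best = (r, c, score)
    return best


rng = np.random.default_rng(0)      # fixed seed so the check is repeatable
whole = rng.random((40, 50))        # synthetic stand-in for the full picture
part = whole[12:20, 30:41].copy()   # crop a patch whose position is known
row, col, score = best_match(whole, part)
print(row, col, score)
```

Because the patch is cropped from the picture itself, the best position should be the crop's upper-left corner, with a coefficient of 1 up to rounding.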
Matlab function
An algorithm this simple and this widely applicable is of course already included in Matlab. There the whole thing becomes extremely short, because everything is wrapped up in ready-made functions.
Again, take Einstein and his charming big eyes as the example:
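% Load both pictures and convert them to grayscale.
eye = rgb2gray(imread('eye.png'));
einstain = rgb2gray(imread('einstain.png'));
imshowpair(einstain, eye, 'montage')
c = normxcorr2(eye, einstain); % this is the function that does the work
figure, surf(c), shading flat
% Locate the peak of the correlation surface.
[ypeak, xpeak] = find(c == max(c(:)));
% c is larger than the picture, so shift the peak back by the size of the eye patch.
yoffSet = ypeak - size(eye, 1);
xoffSet = xpeak - size(eye, 2);
figure
imshow(einstain);
imrect(gca, [xoffSet+1, yoffSet+1, size(eye,2), size(eye,1)]);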
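The offset subtraction above may look odd at first. As far as I understand it, `normxcorr2` returns a result padded to the size of the picture plus the size of the template minus one, so a peak index has to be shifted back by the template size. The one-dimensional NumPy sketch below illustrates the same index shift with plain (not normalized) cross-correlation; the arrays are made up for illustration.

```python
import numpy as np

signal = np.array([0., 0., 1., 2., 3., 0.])  # 1-D stand-in for the picture
template = np.array([1., 2., 3.])            # 1-D stand-in for the eye patch

# 'full' mode also slides the template past both ends: length 6 + 3 - 1 = 8.
full = np.correlate(signal, template, mode='full')
peak = int(np.argmax(full))                  # peak index in the padded output
start = peak - (len(template) - 1)           # shift back to the signal's own index
print(full, peak, start)
```

Here `start` is the 0-based index where the template begins in `signal`. In Matlab's 1-based indexing the same shift appears as `ypeak - size(eye,1)` followed by the `+1` in the rectangle call.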