|
||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||
|
Announcements
Chapters
Services
Feature Zones
|
IntroductionIn computer vision, you very often have to perform an image convolution operation with some filter. Examples are image enhancements, edge detection, Gaussian smoothing, or object detection using projection of the image into a low dimensional space represented with PCA, LDA, ICA, MP, or any other set of vectors. Having a bigger sized image makes the operation more time consuming. Recently, I found a great article, Face Detection - Efficient and Rank Deficient, where the authors apply a method to the rank deficient approximation of the support vector machine (SVM) classifier. With rank deficient approximation, the ordinary convolution taking O(w*h) filtering operations can perform as fast as O(w+h) operations only, where w and h are sizes of the image. So, I decided to code the method for my own computer vision proceedings and see the performance. BackgroundIn this article, the AlgorithmHaving some linear filter H and applying it to an image I results in a 2D convolution operation: J = I * H;
So, J is the same size as I. If H has size h x w, the convolution requires O(h*w) operations for each output pixel. However, in special cases, where H can be decomposed into two column vectors a and b, such that: H = ab';
the convolution becomes: J = [I * a] * b';
Since the convolution is associative, and in this case, ab' = a * b'. This splits the original convolution operation into two operations with vectors of size h x 1 and 1 x w, respectively. As a result, if a linear filter is separable (H = ab'), the computational complexity of the filtering operation drops from O(h*w) to O(h + w) per pixel. The singular value decomposition, SVD, is used to decompose the H filter to a and b vectors. H = USV';
U and V are orthogonal matrices of size h x h and w x w. S is a diagonal (the diagonal entries are the singular values) and has size h x w. Let r = rank(H). Due to rank(S) = rank(H), we may write H as a sum of r rank one matrices.
Where s denotes the ith singular value of H, and u, v are the ith columns of U and V (i.e., the ith singular vectors of the matrix H), respectively. As a result, the corresponding linear filter can be evaluated as the weighted sum of r separable convolutions:
And the computational complexity drops from O(h * w) to O(r * (h + w)) per output pixel. The speed benefit depends on r, which can be seen to measure the structural complexity of H. You can choose r until the reconstruction error for the H filter in (1) decreases below some threshold. Apparently, the more complex the filter (i.e., the more high frequency components it contains), a bigger r is needed, but simple filters as Gaussian can be approximated with great precision, for r = 1. Below some Eigen face is presented:
And its reconstruction after 1, 2, and 3 iterations (i.e., r = 1, 2, and 3):
As can be seen, the MSE decreases as r increases. Using the codeBelow, an ordinary convolution and a rank deficient one are presented: //this = a * filter;
void vec2D::conv2(const vec2D& a, const vec2D& filter,
enum CONV type) //filter should be of odd size
{
int l = filter.height() / 2;
int m = filter.width() / 2;
vec2D& c = *this;
if (type == SAME) { //this size = a size
for (int i = 0; i < (int)c.height(); i++) {
for (int j = 0; j < (int)c.width(); j++) {
float tmp = 0;
for (int u = i - l; u <= i + l; u++)
for (int v = j - m; v <= j + m; v++)
tmp += filter((i-u)+l, (j-v)+m) * a.get(u, v);
c(i, j) = tmp;
}
}
}
else if (type == VALID) { //this size = (a size - filter size) + 1
for (int i = l; i < (int)a.height() - l; i++) {
for (int j = m; j < (int)a.width() - m; j++) {
float tmp = 0;
for (int u = i - l; u <= i + l; u++)
for (int v = j - m; v <= j + m; v++)
tmp += filter((i-u)+l, (j-v)+m) * a(u, v);
c(i - l, j - m) = tmp;
}
}
}
}
With //this = sum( s[i] [a * u[i]] * v[i])
void vec2D::conv2(const vec2D& a, const std::vector<vec2D>& u, //col vectors
const std::vector<vec2D>& v, //row vectors
const vec2D& s, //row vector singular values
vec2D& tmp1, //tmp1 = [(a.height - u.height + 1) x a.width]
vec2D& tmp2, //tmp2 = this = [(a.height - u.height + 1)
// x a.width - v.width + 1]
enum CONV type)
{
vec2D& c = *this;
c.set(0.0);
for (unsigned int i = 0; i < s.width(); i++) {
tmp1.conv2(a, u[i], type);
tmp2.conv2(tmp1, v[i], type);
tmp2.mul(s(0, i));
c.add(c, tmp2);
}
}
With the rank deficient function, you need to provide u and v vectors in If you're going to perform your image projection onto some PCA, LDA, ICA, or the like basis, remember to rotate your filter before SVD, or rotate u, v vectors before calculating the convolution. void vec2D::fliplr()
{
vec2D& c = *this;
float tmp;
for (unsigned int i = 0; i < width() / 2; i++) {
for (int y = yfirst(); y <= ylast(); y++) {
tmp = c(y, xfirst() + i);
c(y, xfirst() + i) = c(y, xlast() - i);
c(y, xlast() - i) = tmp;
}
}
}
void vec2D::flipud()
{
vec2D& c = *this;
float tmp;
for (unsigned int i = 0; i < height() / 2; i++) {
for (int x = xfirst(); x <= xlast(); x++) {
tmp = c(yfirst() + i, x);
c(yfirst() + i, x) = c(ylast() - i, x);
c(ylast() - i, x) = tmp;
}
}
}
void vec2D::rotate180()
{
vec2D& c = *this;
c.fliplr();
c.flipud();
}
ResultsThe convolution results for a 192x256 array with a 19x19 filter are presented bellow. First is the original image to convolve with Eigen face:
The rank deficient convolution for r = 1, 2, 3 and the ordinary convolution of that image are presented below:
As can be seen, as r increases, the MSE between ordinary convolution results and the rank deficient one decreases. And, the timing of the convolutions follows: SAME
J = I * H processing time: 142.258583 ms
J = sum (s[i] [I * u[i]] * v[i]'); i < 1 processing time: 20.521044 ms
J = sum (s[i] [I * u[i]] * v[i]'); i < 2 processing time: 42.043612 ms
J = sum (s[i] [I * u[i]] * v[i]'); i < 3 processing time: 63.772351 ms
VALID
J = I * H processing time: 88.931973 ms
J = sum (s[i] [I * u[i]] * v[i]'); i < 1 processing time: 14.583697 ms
J = sum (s[i] [I * u[i]] * v[i]'); i < 2 processing time: 28.884118 ms
J = sum (s[i] [I * u[i]] * v[i]'); i < 3 processing time: 42.637542 ms
For the linear rank deficient filters convolution, the results are presented for 1, 2, and 3 iterations. The
|
|||||||||||||||||||||||||||||||||||||||||||||||