Introduction
While coding some AI application I heard the mellow strains of a childlike
singer coming from my upstairs neighbours, who played the song repeatedly. The
verses were sometimes hardly audible, but I managed to distinguish several
characteristic phrases to look up in a great web search engine (I like it,
since it puts some of my codeproject articles on the first 1-2 pages of the
search results). The only significant phrase from the song I submitted to the
engine was (to prevent undue advertisement), say,
"фиолетовая паста" (violet paste). I expected it to return scores of make-up
advertisements but, on the contrary, one link on the first page of the search
results, among the cosmetic industry spam, pointed to a music web forum with
exactly that phrase from the lyrics. One more mouse click and a second search
gave me the group's name, the verses of the song and guitar tabs, and took me
to YouTube, where I listened to that marvelous music clip.
It is astounding how a person with permanent internet access can, a few seconds
after having heard the music, be presented with the verses, group information
and a video clip to listen to. The process is described as searching on media
data content. Current web searches use textual information to return results;
imagine being able to submit an audio, video or image sample as a search query
the same way you submit your textual requests. It is as if the computer had
been listening to the music and could then present you with the same
information. The concept, known as Connected Visual Computing (CVC), is
actively pursued by Intel.
CVC concerns media data processing. For example, when some object (an ant,
say) emerges in the field of view of your mobile phone camera, the phone
analyzes its image and shows its identification on the screen, e.g. that it is
Camponotus herculeanus. Or when you see a caption in the street in an unknown
language, you may view it through your mobile camera and it will display the
same caption at the same location in the street but in your native language
(augmented reality (AR), 2D/3D overlays). The search by audio content presented
above is another example. The market promises immense growth, and it will keep
the audience consuming modern hardware and software for a very long period of
time.
Here I'd like to present the general idea of how a computer may be used to
describe an image by analyzing its pixel content, known as Automatic Linguistic
Indexing of Pictures (ALIP). The approach is general: it always extracts some
descriptive features from the data and uses some rules to attribute the content
to some category.
If you're interested in immediate applications you may contact System7, the
firm supporting the content based image retrieval (CBIR) part of the project.
Background
A basic understanding of AI approaches, e.g. neural networks, support vector
machines and nearest neighbour classifiers, is assumed, as well as of image
description and transform methods such as wavelets, edge extraction, image
statistics and histograms. C++/C# experience is also needed, as in this article
you will find how to invoke C++ dll methods from within a C# application.
Using the application
In my ALIP experiment I decided to annotate simple natural image categories.
There are 5 ANN classifiers in the project corresponding to:
- Pictures that might contain animals
- Pictures that might contain flowers
- Pictures that might contain landscapes
- Pictures that might contain sunsets
- Other pictures that do not fall into the above categories, or simply an
unknown image type
You need to use the unknown category along with the others you'd like to
classify into. Otherwise the AI classifier would only be able to identify e.g.
animals, flowers, landscapes or sunsets in every image you give it. But in the
real world there are other types of images that do not fall into any of the
above categories, so you would need to meddle with AI classification
thresholds, which is rather cumbersome and awkward. With an additional unknown
category classifier, the result of the image identification will be either one
of the known image categories or simply an unknown image type that the computer
cannot identify using its petty knowledge.
I adore image databases; they contain shots from all over the world that are
really nice to look at. I've got about 20000 images for designers, bought on
DVD in a shop. I've taken image samples of the animals, flowers, landscapes and
sunsets types, and added all the other image categories that do not belong to
those 4 to form the unknown image type.
Now the usage of the program is simple enough. Just run alip.exe and it will
load all the necessary AI classifier files (in case of an error you will get a
message box and will not be able to use it). Then click the [...] button and
select a directory that contains some *.jpg files. You may use the ones
supplied with this demo in the pics directory. All the files found will be
added to the list box; just click them to view them in the right panel and see
the proposed category in the top left panel. In theory it should be able to
annotate the image as presented below.

Methodology
Due to competing interests with my former organizations and the one I currently
work for, I will not be able to describe the methodology and feature extraction
methods in minute detail. I would rather present the general trend and the
categories of features used for the description of images. Searching the
internet for the corresponding feature computation will reveal all the
necessary papers with the particular formulae.
There are some demos available online, e.g. ALIPr.
They use hidden Markov models (HMMs) and wavelet features from the images. You
may try the pictures from this article with their methods or, vice versa, my
application with their pictures, and compare the annotation results.
As the AI approach is general and assumes some reduction of the original data
dimensionality using feature extraction, a PCA transform or both, all that is
needed is to collect some data, extract the features and train AI classifiers.
If you understand my face detection articles you will be able to repeat the
experiment.
After you have converted your raw image data to features, just train some AI
classifiers to discriminate the desired positive category from the negative
ones.
ALIP features
Generally they are divided into:
- Color features
- Texture features
- Shape features
The Color features are simply the original raw image data, histograms of the
image channels and image profiles. Texture features are the well-known edge
extraction methods, wavelet transforms and image statistics (e.g. 1st order:
mean, std, skew; 2nd order: contrast, correlation, entropy...). And Shape
features try to estimate the shapes of the objects found in the images. Just
have a look at the wiki for CBIR.
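As a rough illustration of the 1st order statistics mentioned above, here is a minimal sketch that computes the mean, standard deviation and skewness of a single image channel. The names `ChannelStats` and `channelStats` are hypothetical and are not taken from the ALIP sources; the population (divide by N) definitions of the moments are an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// First-order statistics of one image channel.
// Hypothetical helper for illustration, not part of the ALIP sources.
struct ChannelStats {
    double mean;
    double stddev;
    double skewness;
};

ChannelStats channelStats(const std::vector<double>& pixels)
{
    assert(!pixels.empty());
    const double n = static_cast<double>(pixels.size());

    double sum = 0.0;
    for (double p : pixels) sum += p;
    const double mean = sum / n;

    // Second and third central moments (population form, divided by N).
    double m2 = 0.0, m3 = 0.0;
    for (double p : pixels) {
        const double d = p - mean;
        m2 += d * d;
        m3 += d * d * d;
    }
    m2 /= n;
    m3 /= n;

    const double stddev = std::sqrt(m2);
    // Skewness is undefined for a constant channel; report 0 in that case.
    const double skewness = (stddev > 0.0) ? m3 / (stddev * stddev * stddev) : 0.0;
    return {mean, stddev, skewness};
}
```

Calling `channelStats` once per color channel would give a 9-value feature vector for an RGB or YCbCr image, which could then be fed to a classifier in place of the raw pixels.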
Typically the original RGB color space of the image is transformed to
alternative spaces such as YCbCr, HSV, HSI, CIEXYZ, etc. The alternative spaces
might give better discrimination of the data, but you need to experiment with
them anyway.
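For example, a color space transform from RGB to YCbCr can be sketched as follows. This assumes the full-range BT.601 coefficients used by JPEG/JFIF and input values in the 0..255 range; the article does not say which YCbCr variant the application uses, and the names `YCbCr` and `rgbToYCbCr` are hypothetical.

```cpp
#include <cassert>

// One pixel in the YCbCr color space (luma plus two chroma components).
struct YCbCr {
    double y;
    double cb;
    double cr;
};

// Full-range BT.601 (JFIF style) conversion for inputs in 0..255.
// Assumption: the article does not state which variant it uses.
YCbCr rgbToYCbCr(double r, double g, double b)
{
    assert(r >= 0.0 && r <= 255.0);
    assert(g >= 0.0 && g <= 255.0);
    assert(b >= 0.0 && b <= 255.0);

    YCbCr out;
    out.y  = 0.299 * r + 0.587 * g + 0.114 * b;
    out.cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b;
    out.cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b;
    return out;
}
```

Gray pixels (r = g = b) map to cb = cr = 128, so the brightness ends up in the Y channel alone, which is one reason such a space might separate the data better than raw RGB.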
Source code tips
The point worth mentioning here is the interaction of the C# application with
C/C++ code in a dll, as it leads to an efficient way of coding a great GUI
while retaining the advantages of native C/C++ code.
Just create a simple C++ dll with some exported function:
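// Classifier instance shared by the exported functions.
Alip alip;

// Entry point called from C#. It must be exported with C linkage (extern "C",
// assumed here to be part of ALIP_API) so that DllImport can find it by name.
ALIP_API int alipClassify(const double* data, double* results, unsigned int* indices)
{
    return alip.classify(data, results, indices);
}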
In the C# application, declare the dll functions you will be calling in your class:
[DllImport("alip")]
static extern unsafe int alipClassify(double* data, double* results, uint* indices);
Switch on the /unsafe compiler option in the application settings. Then, using
the C# fixed statement, you may create pointers to C# variables and pass them
to the C++ dll:
double[] results = new double[this.aiClassifiers.Count];
uint[] indices = new uint[this.aiClassifiers.Count];
// Pin the managed arrays so the GC cannot move them during the native call.
fixed (double* pdata = cbir.CbirEntries[0].features.Features)
fixed (double* presults = results)
fixed (uint* pindices = indices)
{
    int res = alipClassify(pdata, presults, pindices);
    if (res != 0)
        throw new Exception(String.Format("alipClassify() returned {0}", res));
}
ALIP results
I deliberately selected the most simple image features, which do not look like
features at all, due to competing interests with the former funding
organization System7. I used just the image itself, downscaled to 16x16 and
converted to the YCbCr colorspace. Obviously that is not the proper feature to
start with, as others would significantly outperform it in discrimination
ability. However, though I anticipated the classification would be completely
incorrect, to my great surprise it performed pretty well, producing quite
precise results. So consider what the annotation quality would be had you used
a combination of color and texture features (e.g. histograms, statistics,
entropy, etc.).
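The downscaling step could look roughly like the sketch below. It assumes simple box averaging and a planar (channel by channel) layout of the resulting 16x16x3 vector; the actual resampling method and layout used by the application are not described in the article, the color space conversion is left out for brevity, and `downscaleToFeatures` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kSide = 16;     // target width and height
constexpr std::size_t kChannels = 3;  // e.g. R, G, B or Y, Cb, Cr

// image: interleaved 3-channel pixels in row-major order,
// i.e. width * height * 3 values.
// Returns a planar 16x16x3 feature vector (768 values).
std::vector<double> downscaleToFeatures(const std::vector<double>& image,
                                        std::size_t width, std::size_t height)
{
    assert(width >= kSide && height >= kSide);
    assert(image.size() == width * height * kChannels);

    std::vector<double> features(kSide * kSide * kChannels, 0.0);
    for (std::size_t c = 0; c < kChannels; ++c) {
        for (std::size_t oy = 0; oy < kSide; ++oy) {
            // Source rows covered by output row oy.
            const std::size_t y0 = oy * height / kSide;
            const std::size_t y1 = (oy + 1) * height / kSide;
            for (std::size_t ox = 0; ox < kSide; ++ox) {
                // Source columns covered by output column ox.
                const std::size_t x0 = ox * width / kSide;
                const std::size_t x1 = (ox + 1) * width / kSide;

                // Average all source pixels that fall into this cell.
                double sum = 0.0;
                for (std::size_t y = y0; y < y1; ++y)
                    for (std::size_t x = x0; x < x1; ++x)
                        sum += image[(y * width + x) * kChannels + c];

                const double count = static_cast<double>((y1 - y0) * (x1 - x0));
                features[(c * kSide + oy) * kSide + ox] = sum / count;
            }
        }
    }
    return features;
}
```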
You may estimate the quality of the other feature types on the
cbir.system7.com demo. It just returns the images that are closest to the query
one using some linear or non-linear distance metric. So it acts as a kind of
kNN classifier: you just annotate the image type based on the majority of the
first several best matches returned, or combine the annotations in any other
way.
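The majority vote over the best matches can be sketched as below. This is only an illustration of the kNN idea with a plain Euclidean distance, not the code behind the online demo; `Entry`, `squaredDistance` and `annotateByKnn` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One database image: its feature vector and its category label.
struct Entry {
    std::vector<double> features;
    std::string label;
};

double squaredDistance(const std::vector<double>& a, const std::vector<double>& b)
{
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Returns the label that occurs most often among the k nearest entries.
// Ties are resolved in favour of the alphabetically first label.
std::string annotateByKnn(const std::vector<Entry>& database,
                          const std::vector<double>& query, std::size_t k)
{
    assert(!database.empty() && k > 0);

    std::vector<std::pair<double, std::string>> scored;
    scored.reserve(database.size());
    for (const Entry& e : database)
        scored.emplace_back(squaredDistance(e.features, query), e.label);

    // Only the k best matches need to be sorted.
    k = std::min(k, scored.size());
    std::partial_sort(scored.begin(),
                      scored.begin() + static_cast<std::ptrdiff_t>(k),
                      scored.end());

    std::map<std::string, std::size_t> votes;
    for (std::size_t i = 0; i < k; ++i) ++votes[scored[i].second];

    std::string best;
    std::size_t bestVotes = 0;
    for (const auto& v : votes) {
        if (v.second > bestVotes) {
            best = v.first;
            bestVotes = v.second;
        }
    }
    return best;
}
```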
For annotation I selected the 5 image categories:
- animals - 900 pictures
- flowers - 1100 pictures
- landscapes - 1200 pictures
- sunsets - 700 pictures
- unknown - 1600 pictures of other types than the above 4
By all means there is overlap between the categories, as flower or animal
pictures may be shot in landscape-like surroundings, sunsets may also be shots
of landscapes, and some unknown pictures may contain one of the above 4
categories.
A single image feature vector is quite high dimensional: 16x16x3 = 768D. So I
performed PCA dimensionality reduction to a 70D space. The 70 eigenpictures
retain 90% of the variance. The eigenimages are supplied in the pca.nn file.
The first 60 eigenvectors for the separate colorspace channels are presented
below:



They look pretty similar to the ones from my PCA based Face Detection article,
which I attribute to both being derived from natural image scenes.
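To make the PCA step more concrete, here is a minimal sketch of two pieces of it: choosing how many components are needed to retain a given fraction of the variance, and projecting a feature vector onto the chosen eigenvectors. It assumes the eigenvalues and eigenvectors have already been computed elsewhere and sorted by descending eigenvalue; the function names are hypothetical and the format of the pca.nn file is not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest number of leading components whose eigenvalues (sorted in
// descending order) add up to at least `fraction` of the total variance.
std::size_t componentsForVariance(const std::vector<double>& eigenvalues,
                                  double fraction)
{
    assert(!eigenvalues.empty() && fraction > 0.0 && fraction <= 1.0);
    double total = 0.0;
    for (double v : eigenvalues) total += v;

    double running = 0.0;
    for (std::size_t i = 0; i < eigenvalues.size(); ++i) {
        running += eigenvalues[i];
        if (running >= fraction * total) return i + 1;
    }
    return eigenvalues.size();
}

// Projects x onto the given eigenvectors after subtracting the mean vector.
// The result has one coordinate per eigenvector.
std::vector<double> pcaProject(const std::vector<double>& x,
                               const std::vector<double>& mean,
                               const std::vector<std::vector<double>>& eigenvectors)
{
    assert(x.size() == mean.size());
    std::vector<double> y;
    y.reserve(eigenvectors.size());
    for (const std::vector<double>& e : eigenvectors) {
        assert(e.size() == x.size());
        double dot = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i)
            dot += e[i] * (x[i] - mean[i]);
        y.push_back(dot);
    }
    return y;
}
```

With 768-dimensional inputs and 70 eigenvectors, `pcaProject` would return the 70D vector that is then passed to the classifiers.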
Then, having the 70D data, I used the first half of each image category for
training the AI classifiers and the remaining half for estimating the
classification accuracy. I opted for ANN classifiers with a 70-20-1 structure,
so there are 5 trained ANNs in all, each one trained to separate its image
category from all the others. The small number of hidden neurons and just 1
hidden layer help keep the ANN from overfitting the data.
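The forward pass of such a network with one hidden layer and a single output can be sketched as follows. The sigmoid activation is an assumption, since the article does not name the activation function, and the `Mlp` structure is hypothetical rather than the format of the shipped classifier files.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One-hidden-layer network with a single output, e.g. 70-20-1.
struct Mlp {
    std::vector<std::vector<double>> hiddenWeights;  // [hidden][inputs]
    std::vector<double> hiddenBias;                  // [hidden]
    std::vector<double> outputWeights;               // [hidden]
    double outputBias;
};

// Assumed activation function; the article does not specify one.
double sigmoid(double v)
{
    return 1.0 / (1.0 + std::exp(-v));
}

// Returns a value in (0, 1): how strongly the input is believed to belong
// to the category this network was trained on.
double forward(const Mlp& net, const std::vector<double>& input)
{
    assert(net.hiddenWeights.size() == net.hiddenBias.size());
    assert(net.hiddenWeights.size() == net.outputWeights.size());

    double out = net.outputBias;
    for (std::size_t h = 0; h < net.hiddenWeights.size(); ++h) {
        assert(net.hiddenWeights[h].size() == input.size());
        double sum = net.hiddenBias[h];
        for (std::size_t i = 0; i < input.size(); ++i)
            sum += net.hiddenWeights[h][i] * input[i];
        out += net.outputWeights[h] * sigmoid(sum);
    }
    return sigmoid(out);
}
```

In a one-against-all setup, five such networks would be evaluated on the same 70D input, one per image category.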
The training part showed an 8% error for classifying an unknown image into one
of the 4 known image categories (false positive rate), and a 4% error for
classifying one of the 4 known image categories as unknown (false negative
rate). The test part showed worse results: a 45% false positive rate and a 20%
false negative rate.
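The two rates, as defined above, can be computed from the true and predicted labels like this. It is a minimal sketch with hypothetical names, where each image is reduced to a flag saying whether it belongs (or was assigned) to one of the 4 known categories.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ErrorRates {
    double falsePositive;  // unknown images classified as a known category
    double falseNegative;  // known-category images classified as unknown
};

// actualKnown[i]    : true if image i really belongs to a known category
// predictedKnown[i] : true if the classifiers assigned it to a known category
ErrorRates errorRates(const std::vector<bool>& actualKnown,
                      const std::vector<bool>& predictedKnown)
{
    assert(actualKnown.size() == predictedKnown.size());

    std::size_t knownTotal = 0, unknownTotal = 0;
    std::size_t falsePositives = 0, falseNegatives = 0;
    for (std::size_t i = 0; i < actualKnown.size(); ++i) {
        if (actualKnown[i]) {
            ++knownTotal;
            if (!predictedKnown[i]) ++falseNegatives;
        } else {
            ++unknownTotal;
            if (predictedKnown[i]) ++falsePositives;
        }
    }

    ErrorRates rates;
    // Report 0 when a group is empty to avoid dividing by zero.
    rates.falsePositive = unknownTotal
        ? static_cast<double>(falsePositives) / static_cast<double>(unknownTotal)
        : 0.0;
    rates.falseNegative = knownTotal
        ? static_cast<double>(falseNegatives) / static_cast<double>(knownTotal)
        : 0.0;
    return rates;
}
```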
They seem to be quite inaccurate on the test part; however, this might be
caused by noise, as the unknown category might contain some images from the
known categories and vice versa. I never trusted image database composers, and
when looking through 1000 images to deselect the wrong ones, after 5 minutes of
work you may forget which image category you're working with. The better way
of course is the cbir.system7.com application.
You just give it a sample image of the desired category, e.g. with flowers, and
it will return the closest images from, say, a 1000000+ image database. Have a
bash at doing that manually.
But the simplicity of the image features certainly also accounts for the worse
error rate on the test images.
Below I present annotation results from the test part only, to be fair. As
there might be several ANNs with high outputs, some shots are annotated with
more than one image type, e.g. animals in landscape surroundings.
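One way to turn the outputs of several ANNs into such multi-label annotations is sketched below. The 0.5 threshold used in the example and the fallback to the strongest output are assumptions for illustration; the article does not state how the application combines the outputs, and `annotate` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// names[i] is the category of the i-th ANN, outputs[i] its output.
// Every category whose output reaches the threshold is reported; if none
// does, the category with the strongest output is reported on its own.
std::vector<std::string> annotate(const std::vector<std::string>& names,
                                  const std::vector<double>& outputs,
                                  double threshold)
{
    assert(!names.empty() && names.size() == outputs.size());

    std::vector<std::string> labels;
    std::size_t best = 0;
    for (std::size_t i = 0; i < outputs.size(); ++i) {
        if (outputs[i] > outputs[best]) best = i;
        if (outputs[i] >= threshold) labels.push_back(names[i]);
    }
    if (labels.empty()) labels.push_back(names[best]);
    return labels;
}
```

With outputs of 0.9 for animals and 0.8 for landscapes and a threshold of 0.5, the picture would be annotated with both categories.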
Animals category



Actually annotated as landscape, but at 16x16 resolution it does look like that
category. Remember the worse error rates and the 'noise' in the image
categories.



That one is better: animals in landscape-like surroundings.











Flowers category
Flowers annotations are quite good too. Landscape annotations appear in
addition to flowers, as some images are quite similar to landscapes. A spurious
animals label is also added sometimes.












Landscapes category
Here a few shots of landscapes are annotated as the unknown category due to the
high false negative rate. Otherwise the annotation is reasonable, also
revealing an additional category, sunsets, for landscape views in the evening.



The landscape in the sunset. Adroit AI annotation.



Sunsets category
Obviously, sunsets is the simplest picture type. Besides several unknown
annotations, there are landscape annotations and some 'flowers during sunset'
annotations. Well, the AI has never been taught to identify trees or palms, so
it generalizes them to flowers. Otherwise very good results.

Landscape with a sunset.

'Flowers' in the sunset

Landscape-like picture, sunset behind a mountain ridge, very romantic.

The 'flowers' in the sunset.




Unknown sunset pictures.



Another bunch of 'flowers' in the sunset.

The next two: are these landscapes in the sunset or sunsets in the landscape?


'Flowers' again in the sunset of a landscape.

Very thin 'flowers' in the sunset.

London?

Unknown category
The unknown category showed about 43% error on the test set, but there are two
possible explanations for that percentage. Either the ANN failed to generalize
well, showing much better performance on the training set, or it is due to
noise in the data set, e.g. incorrect samples attributed to the unknown
category while they actually belong to others, e.g. sunsets or landscapes.
The test results speak rather in favour of the AI than of the accuracy of human
image categorization. Of the few dozen unknown pictures from the test set
presented below, only a few can be attributed to the pure unknown category. The
others contain scenes from the landscapes, sunsets and animals categories,
which were correctly identified by the AI.
That one is fleshy and succulent.





The sunset from unknown category.

The landscape generalization.


The sunset in the unknown category. The couple is about to embrace.


The sunset again. The couple is embracing.





The animals.



Landscapes.


Flower-like image?

Looks like a sunset with flowers.




Here one may agree with AI.


Live flowers, as in 'Alice in wonderland'. Better generalization.

AI-xenophobia?
The rest of the unknown samples that the AI annotated as belonging to another
category are rather controversial and defiant, as it tends to annotate the
humans in the pictures as animals. What impertinence! The results can be
attributed to:
- AI generalization of the learned objects (e.g. trees identified as flowers)
- AI proclamation of his superior intelligence over the ordinary human being,
whom he considers an animal species
- AI gross error on the test set
The first scenario is pretty likely, as the AI has already shown his capacity
to generalize similar objects to the only categories known to him, as in the
case where he annotated trees as flowers. The last is less probable, as the
scenes are not that different from the learned categories, so the greater false
positive error should rather be credited to the AI's generalization acumen.
Well, the second case might also be possible. It seems even more dramatic, to
the benefit of science fiction writers who forebode that once computers gain
control they will either exterminate the humans or confine them to a zoo, as we
have done with the 'real' animals (e.g. I, Robot, Terminator 3), since from the
AI point of view only an AI revolution might save human beings from
self-extermination.
I presume also that the second scenario might be a telling example in favour of
Darwin's theory that humans evolved from animals, as even a dozen neurons of a
simple AI understood that, while some persistent human beings try to disprove
the obvious facts.
I looked on Google for a term that might be applicable to this newly revealed
phenomenon. AI-xenophobia showed only about 5 links, to some blog;
cyber-xenophobia is a term already coined and widely used by the Japanese; and
cyborg-xenophobia does not return any links, but it is rather restricted to
robotic beings and not to AI in general. Without discussing the already used
terms in more detail, all of them describe the actions of humans in cyberspace,
and not of AI against humans.
Who knows, that might be the first manifestation of presumptuous AI action
against humans, by taunting at first. Beware.
Anyway, the results are shown below. I'm just presenting the AI's understanding
of the image content. Please forbear from taking his motives too seriously and
do not cane me.



Maybe he is proclaiming: beware, the AI is callous.

Some may agree with the examples of AI understanding below.



Here AI is right at one point at least, landscape!


As a final word, try different features and combinations yourself; you might
then be able to teach the AI to respect humans, or simply add another category
for images with humans.
At least the AI shows some reverence for his creator by not putting me among
the animals.

Try him on images of yours.