snibgo's ImageMagick pages

Multi-phase searching

A process module that returns multiple search results, invariant to various transformations.

Aka whatSRT??

Given a main image and a smaller subimage, the goal is to find all occurrences of the subimage within the main image. For each occurrence found, the module emits the scale, rotate and translate (SRT) parameters that would transform the subimage into the occurrence. The search can also be invariant to lightness and contrast.

The task could be solved very simply: at every possible position, we can test every possible resize at every possible rotation, and the best result from all of those tests is the solution. The search space has four dimensions: x, y, scale and angle. The problem with an exhaustive search is the time required: we might have a million possible positions, a hundred possible scales and 360 possible angles, so there are 360*108 tests. Even if we can do 1000 tests every second, this would take more than a year.

The module can search in a variety of ways:

The results from one invocation can be used to limit the search-space of another. The limiting ranges can be per-result, from previous invocations.

Results from an invocation at a small scale can be used for an invocation at a larger scale.

Innovations: aggressive searching, sinks, jumps.

References

This process module ties together a number of separate methods that have been implemented as Windows BAT scripts:

Some components of this process module are available as simpler modules. These are:

Simple example

We scatter some small frogs on the "toes" image. Each frog is scaled and rotated, and given a colour distortion.

frog.png

This has transparent and semi-transparent pixels.

frog.png
%IM%convert ^
  toes.png ^
  ( frog.png +write mpr:FROG +delete ) ^
  -virtual-pixel None ^
  -set option:modulate:colorspace HCL ^
  ( mpr:FROG ^
    +distort SRT 22.5,22.5,1.0,0,70,170 ^
  ) ^
  ( mpr:FROG ^
    +distort SRT 22.5,22.5,1.2,30,50,50 ^
  ) ^
  ( mpr:FROG ^
    +distort SRT 22.5,22.5,0.8,-20,120,70 ^
  ) ^
  ( mpr:FROG ^
    +distort SRT 22.5,22.5,0.9,150,180,105 ^
  ) ^
  ( mpr:FROG ^
    +distort SRT 22.5,22.5,0.6,0,245,215 ^
  ) ^
  -layers merge ^
  ms_toes_frogs.png

goto :eof
ms_toes_frogs.pngjpg

We search for the frogs.

Options

Option Description
Short
form
Long form
p string phases string List of phases to invoke, with a comma between each phase.
Any of:
    All all the following:
    Aggr aggressive search;
    Approx approximate search;
    RefPos refine positions;
    RefSR refine scale and rotation;
    Reduce final reduction.
Default = All.
at N aggrThr N Threshold for Aggr search.
Typically 0.0 to 1.0.
Default = 0.75.
s0 N minScale N Minimum scale for Approx search.
A positive number.
Default = 0.5.
s1 N minScale N Maximum scale for Approx search.
A positive number, large than minScale.
Default = 2.0.
ns N nScales N Number of scales for Approx search.
A positive integer, at least 1.
Default = 20.
cm string crMethod string Method for creating concentric rings in Approx search, one of:
    Slow unroll at each scale;
    Quick unroll once, and scale that.
Default = Quick.
z N sizeSiv N Size divider for multi-scale in Approx search.
A positive number greater than 1.
Default = 2.0.
md N minDim N Minimum dimension for multi-scale in Approx search.
A positive integer greater than 1.
Default = 20.
o string outType string Type of output image, one of:
    None leave image list unchanged;
    Mask return a mask;
    Trans transform subimage by best SRT parameters;
    Comp composite all found occurrences.
Default = Comp.
string file string Write text data (SRT parameters) to stderr or stdout.
Default = stderr.
v verbose Write some text output to stderr.
v2 verbose2 Write more text output to stderr.
v3 verbose3 Write yet more text output to stderr.
d string debug string Write debugging images to MIFF files with this prefix.

CSV results file format

The results file is text, with fields separated by commas, and possibly spaces. There is one line for each candidate position of the subimage within the main image. Not all possible positions may be shown, as some may have been eliminated. Header? The file may be sorted, with the lowest (best) scores at the top. The fields are:

x-coordinate of top-left position
y-coordinate of top-left position
rotation
rotation error
scale
scale error
RMSE score, 0.0 to 1.0.

But the best score (locally) will be surrounded by positions with very good scores, which we don't want to process further.

Or an image could be made, with RGB channels representing rotation, scale and score. Then use nLightest to get the coordinate list.

Or: make pixels that aren't local sinks, white.

We have a new concept: a masked search. Inputs:

Searches can create, or use and update, the mask. These searches include: aggressive search, concentric rings, radials, pixel-by-pixel, ... When a search reads the mask, it may be smaller than required. If so, it is simply resized.

Process modules

avgconrings findsinks multisrch %IMDEV%convert toes.png d.png -process 'multisrch j r 10' -channel B -separate +channel -auto-level x.png

The concentric rings search works well. For each likely position, we have an approximate scale. Next step: at that scale, what is the best angle? At that angle, what is the best scale? Iterate until happy.

%IM%convert d.png -virtual-pixel None +distort SRT 0.900323,9.6861 x.png %IM%convert toes.png ( x.png -evaluate add 10% -repage +50+50! ) -layers merge x3.png

Problem: when subimage has essentially radial features, the first search for scale could be wrong (even if score is good, and search for rotation is no problem). How do we notice the problem? If the worst score is similar to the best score?

%IMDEV%convert ^
  rose: rose: ^
  -process 'multisrch help' ^
  NULL: 
Usage: -process 'multisrch [OPTION]...'
Multi-phase search with various methods for subimage SRT.

  g,  geometry string  geometry of search area
  p,  phases string    'all' or list eg 'aggr,approx,refpos,refsr'
  at, aggrThr number   threshold for aggressive
  s0, minScale number  minimum scale
  s1, maxScale number  maximum scale
  ns, nScales integer  number of scales
  r,  scaleRadMax int  scale so radius is this, at max
  m,  method string    for concentric rings, 'slow' or 'quick'
  j,  notJustSinks     don't eliminate non-sinks
  z,  sizeDiv number   size divider for ms
  md, minDim integer   minimum dimension for ms
  o,  outType string   'None', 'Mask', 'Trans' or 'Comp'
  f,  file string      write to file stream stdout or stderr
  v,  verbose          write text information to stderr
  v2, verbose2         write more text information to stderr
  v3, verbose3         yet more text information to stderr
  d,  debug string     write debugging text to stderr, and image files with given prefix
      version          write version information to stdout

Process module srchmask reads a mask image and spits out the six SRT parameters and score, in score order, perhaps limited to the top N. To be cute, this might also composite those results over the main image.

Perhaps use jump method to find top scoring positions.

Very simple examples

Find the rose: subimage in each of these:

set PARAMS=f stdout
The problem The solution
%IM%convert ^
  rose: ^
  ms_rse1.png
ms_rse1.pngjpg
%IMDEV%convert ^
  ms_rse1.png ^
  rose: ^
  -process 'multisrch %PARAMS%' ^
  -background #4f4 ^
  -layers flatten ^
  ms_rse1_out.png 
WrSrchMaskSrt:  nPositions=1  nGoodMatch=1
0.00759641,35,23,0.999105,0.0327869,35,23
ms_rse1_out.png
%IM%convert ^
  -size 200x200 xc:#0af ^
  rose: ^
  -geometry +45+77 ^
  -composite ^
  ms_rse2.png
ms_rse2.pngjpg
%IMDEV%convert ^
  ms_rse2.png ^
  rose: ^
  -process 'multisrch %PARAMS%' ^
  -background #4f4 ^
  -layers flatten ^
  ms_rse2_out.png 
WrSrchMaskSrt:  nPositions=1  nGoodMatch=1
0.0873283,35,23,0.971099,0.00692281,81,100
ms_rse2_out.png
%IM%convert ^
  -size 200x200 xc:#0af ^
  ( rose: ^
    -virtual-pixel None ^
    +distort SRT ^
      35,23,1,0,80,100 ^
  ) ^
  -layers merge ^
  ms_rse3.png
ms_rse3.pngjpg
%IMDEV%convert ^
  ms_rse3.png ^
  rose: ^
  -process 'multisrch %PARAMS%' ^
  -background #4f4 ^
  -layers flatten ^
  ms_rse3_out.png 
WrSrchMaskSrt:  nPositions=1  nGoodMatch=1
0.0860579,35,23,0.971099,0.00799244,81,100
ms_rse3_out.png
%IM%convert ^
  -size 200x200 xc:#0af ^
  ( rose: ^
    -virtual-pixel None ^
    +distort SRT ^
      35,23,1,35,80,100 ^
  ) ^
  -layers merge ^
  ms_rse4.png
ms_rse4.pngjpg
%IMDEV%convert ^
  ms_rse4.png ^
  rose: ^
  -process 'multisrch %PARAMS%' ^
  -background #4f4 ^
  -layers flatten ^
  ms_rse4_out.png 
WrSrchMaskSrt:  nPositions=4  nGoodMatch=1
0.0378664,35,23,0.986302,35.1798,80,100
ms_rse4_out.png
%IM%convert ^
  -size 200x200 xc:#0af ^
  ( rose: ^
    -virtual-pixel None ^
    +distort SRT ^
      35,23,1.5,0,80,100 ^
  ) ^
  -layers merge ^
  ms_rse5.png
ms_rse5.pngjpg
%IMDEV%convert ^
  ms_rse5.png ^
  rose: ^
  -process 'multisrch %PARAMS%' ^
  -background #4f4 ^
  -layers flatten ^
  ms_rse5_out.png 
WrSrchMaskSrt:  nPositions=3  nGoodMatch=1
0.0248807,35,23,1.5014,0.0696241,80,100
ms_rse5_out.png
%IM%convert ^
  -size 200x200 xc:#0af ^
  ( rose: ^
    -virtual-pixel None ^
    +distort SRT ^
      35,23,1.5,35,80,100 ^
  ) ^
  -layers merge ^
  ms_rse6.png
ms_rse6.pngjpg
%IMDEV%convert ^
  ms_rse6.png ^
  rose: ^
  -process 'multisrch %PARAMS%' ^
  -background #4f4 ^
  -layers flatten ^
  ms_rse6_out.png 
WrSrchMaskSrt:  nPositions=7  nGoodMatch=1
0.0323015,35,23,1.48386,35.0716,80,100
ms_rse6_out.png

Examples from the What rotation and scale? page, which uses scripts.

%IMDEV%convert ^
  wrs_src2.png ^
  wrs_src1.png ^
  -process 'multisrch verbose' ^
  ms_wrsp21.png 
          
ms_wrsp21.png
set SRT=150,100,1.28416,99.6825,150,100

%IMDEV%convert ^
  wrs_src2.png ^
  ms_wrsp21.png ^
  -background None -layers merge ^
  ms_wrsp21_out.png
ms_wrsp21_out.png
goto skip

%IM%convert ^
  toes.png ^
  -distort SRT 78,106,1.1,35,78,106 ^
  +repage ^
  -crop 60x80+48+66 +repage ^
  ms_ex_sub1.png
ms_ex_sub1.pngjpg
%IMDEV%convert ^
  toes.png       +write mpr:MAIN +delete ^
  ms_ex_sub1.png +write mpr:SUB  +delete ^
  ( mpr:MAIN mpr:SUB ^
    -resize 12.5%% ^
    -process 'multisrch n 10 o Mask' ^
  ) ^
  ( mpr:MAIN mpr:SUB ^
    -resize 25%% ^
    -clone 0 ^
    -process 'multisrch n 10 o Mask' ^
  ) ^
  -delete 0 ^
  ( mpr:MAIN mpr:SUB ^
    -resize 50%% ^
    -clone 0 ^
    -process 'multisrch n 20 o Mask' ^
  ) ^
  -delete 0 ^
  ( mpr:MAIN mpr:SUB ^
    -clone 0 ^
    -process 'multisrch n 5 o Mask' ^
  ) ^
  -delete 0 ^
  ( mpr:MAIN mpr:SUB ^
    -clone 0 ^
    -process 'multisrch n 20 o Mask' ^
  ) ^
  -delete 0 ^
  ( mpr:MAIN mpr:SUB ^
    -clone 0 ^
    -process 'multisrch n 100 o Mask' ^
  ) ^
  -delete 0 ^
  -depth 32 ^
  -define quantum:format=floating-point ^
  ms_ex_map1.miff 
@ms_ex_map1.lis
%IM%convert ^
  ms_ex_map1.miff ^
  -scale 300%% ^
  +write ms_ex_map1_lg.png ^
  -separate ^
  ms_ex_map1_lg_%%d.png
ms_ex_map1_lg.png

Red channel: scale

ms_ex_map1_lg_0.png

Green channel: rotation

ms_ex_map1_lg_1.png

Blue channel: score

ms_ex_map1_lg_2.png
%IM%convert ms_ex_sub1.png -virtual-pixel None +distort SRT 0.9091,321.266 c1.png

f:\prose\PICTURES>c1.png

%IMDEV%convert toes.png c1.png -process 'srchimg' NULL:
0.0438884 @ 34,61

f:\prose\PICTURES>%IM%convert toes.png -crop 60x80+34+61 +repage c2.png

f:\prose\PICTURES>c2.png

f:\prose\PICTURES>gimp c1.png

f:\prose\PICTURES>"c:\Program Files\gimp 2\bin\gimp-2.8.exe" c1.png

f:\prose\PICTURES>
%IMDEV%convert ^
  toes.png ms_ex_sub1.png ^
  -process 'multisrch o Mask v' ^
  -define quantum:format=floating-point ^
  -depth 32 ^
  ms_x.miff

%IMDEV%convert ^
  ms_x.miff ^
  -process srchmask NULL:

Examples from Ciratefi paper

Using images from Ciratefi zip file:

ciratefi_one_a1.jpg ciratefi_one_a2.jpg ciratefi_one_a3.jpg ciratefi_one_q09.png ciratefi_two_a1.jpg ciratefi_two_a2.jpg ciratefi_two_a3.jpg ciratefi_two_q1.png

These images are not my copyright.

:skip

%IMDEV%convert ^
  ciratefi_one_a2.jpg ^
  ciratefi_one_q09.png ^
  -process 'multisrch debug dbg_cira v' ^
  ms_cira_comp1.png 
ms_cira_comp1.png

Approx solution:

%IMDEV%convert ^
  ciratefi_two_a1.jpg -evaluate Subtract 10%% ^
  ( ciratefi_two_q1.png ^
    -virtual-pixel None +distort SRT 22,19.5,1.19,25,171,149 ^
    -channel A -evaluate Multiply 0.5 +channel ^
  ) ^
  -layers merge ^
  mr.png

This solution would be in the mask at 149,129.5.

%IMDEV%convert ^
  ciratefi_two_a1.jpg ^
  ( ciratefi_two_q1.png ^
    -virtual-pixel None +distort SRT 1.19,25 ^
+write info: ^
    +repage ^
  ) ^
  -process 'srchimg v' ^
  NULL:

0.0950066 @ 135,114

Sub_img transformed dims are 44x39, so centre is 135+44/2=157, 114+67/2=147.5

How does it work?

The task is to find all occurrences of a subimage within a main image, and find the scale, rotation and translation parameters (the six "+distort SRT" parameters) for each occurrence. We might want to find only the best match, or the N best matches.

If the main image is size W*H, and the subimage is w*h, then there are (W-w+1)*(H-h+1) possible positions for the subimage. This might be a few million positions. At each position, there are a hundred or more possible scales and a hundred or more possible angles of rotation. We cannot do an exhaustive search.

Instead, we tackle the problem in phases. At each phase, we exclude positions that are clearly not solutions.

By default, the module will invoke all phases. The phase option can be used with a list of the required phases. In the phase list, "all" is a shortcut for all the phases. Phase names can be prefixed with "-" to omit that phase. Whatever the order given in the list, phases are invoked in the order shown above. For example:

Phase 3 "refPos" for each candidate position, rotates and scales the subimage by the current estimate for that position, and takes a crop from the main image with an extra margin on each side, and searches that for the transformed subimage.

Within phase 4 "refSR", if the scale (or rotation) isn't exactly correct, then the found rotation (or scale) will be slightly wrong. So the phase iterates to stability. Similarly, if the scale and rotation (or position) aren't exactly correct, the position (or scale and rotation) will be slightly wrong. So phases 3 and 4 iterate to stability. Refining each solution by iteration is expensive, but the first two phases have reduced the set of candidate solutions to a small number.

Data is communicated between phases by a mask image. The mask has one pixel for each possible position of the subimage within the main image. The red channel represents a scale, typically close to one; the green channel is an angle; the blue channel is a score. If the mask is saved, it must be saved a floating-point.

The mask is available as an output. The debug option will write the mask after each phase. A mask can also be supplied as a third input image. This may be useful for large images: first scale down the main and subimage, invoke the module to get scale, rotaton and score, reduce the candidates solutions at that scale, and make a mask which is scaled up for the full search. (The module already does this scaling, but only for the Approx search.)

[[ Functions in whatrot.inc and whatscale.inc can find the approximate scale and rotation that makes one image match another, assuming no translation.

For every possible position of one subimage within another, we could crop the larger image and apply the whatrot.inc and whatscale.inc functions. This gives us a scale and rotation, and a score, for each position. We find which position gives the best score, so that is the required translation, and hence we know the scale and rotation. This would take a long time for large images.

To get decent performance, we do the search on a pyramid of scales. At each level in the pyramid, we record the score at each position, in an image that has one pixel for each position, and we eliminate positions that are clearly not the best. This mask (aka map) of scores is then used to restrict the positions that are searched at the next level.

When an approximate scale, rotation and translation are known, we refine the search. We extract a larger crop from the main image, which we scale, rotate and crop. This should then roughly match the subimage.

Our list of candidate positions, each with scale, angle and score, may have tens or hundreds of entries, sorted in order of score, with the best first. We might manually declare how many we want. How can we automatically find the correct number? We can use a variation of the jump method.

When we have data that is eg 0.9123456 +/- 0.045, the answer should not be quoted with six decimal places.

Phase 5, "reduce": With multiple result, sort by score. Where one result is entirely covered by another with better score, remove it. How do we do this? We readily calculate corner coords. If all corners of image A are inside image B, then image A is hidden by image B. But we can also eliminate solutions that are mosly obscured. So a fairly easy test is: If A has worse score than B, and each corner of A is within a semi-diagonal of any corner of B, eliminate A.

Debugging images

%IMDEV%convert ^
  ms_rse3.png ^
  rose: ^
  -process 'multisrch d ms_dbg' ^
  ms_dbg_out.png

ms_dbg_aggrmask

ms_dbg_aggrmask.miffpng

ms_dbg_avgconcrings

ms_dbg_avgconcrings.miffpng

ms_dbg_dispmap

ms_dbg_dispmap.miffpng

ms_dbg_avgconcrings2

ms_dbg_avgconcrings2.miffpng

ms_dbg_dispmap2

ms_dbg_dispmap2.miffpng

ms_dbg_ret_mask_0.25

ms_dbg_ret_mask_0.25.miffpng

ms_dbg_ret_mask_0.5

ms_dbg_ret_mask_0.5.miffpng

ms_dbg_ret_mask_1

ms_dbg_ret_mask_1.miffpng

ms_dbg_mask_approx

ms_dbg_mask_approx.miffpng

ms_dbg_sub_trans

ms_dbg_sub_trans.miffpng

ms_dbg_fnd_img

ms_dbg_fnd_img.miffpng

ms_dbg_mask_refined

ms_dbg_mask_refined.miffpng

ms_dbg_mask_final

ms_dbg_mask_final.miffpng

Scripts

For convenience, .bat scripts are also available in a single zip file. See Zipped BAT files.

buildcore.sh

This file is in %IMSRC%\snibgo.

export PKG_CONFIG_PATH=${IMDEV}../lib/pkgconfig


while [ $# -gt 0 ]
  do

#  gcc -o "$1" "$1".c -Imagick -I. -Ofast -m64 -flto -mwindows -march=native -mtune=corei7 -Wall -Wa,-march=corei7,-mtune=corei7 `pkg-config --cflags --libs MagickCore`
  gcc -o "$1" "$1".c -Imagick -I. -Ofast -m64 -march=native -mtune=native -Wall `pkg-config --cflags --libs MagickCore`
  cp "$1".exe ${IMDEV}

  shift   # next option
done

All images on this page were created by the commands shown, using:

%IM%identify -version
Version: ImageMagick 6.9.5-3 Q16 x86 2016-07-22 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2015 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Visual C++: 180040629
Features: Cipher DPC Modules OpenMP 
Delegates (built-in): bzlib cairo flif freetype jng jp2 jpeg lcms lqr openexr pangocairo png ps rsvg tiff webp xml zlib
gcc --version
gcc (GCC) 5.4.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

To improve internet download speeds, some images may have been automatically converted (by ImageMagick, of course) from PNG to JPG.

Source file for this web page is multisrch.h1. To re-create this web page, execute "procH1 multisrch".


This page, including the images, is my copyright. Anyone is permitted to use or adapt any of the code, scripts or images for any purpose, including commercial use.

Anyone is permitted to re-publish this page, but only for non-commercial use.

Anyone is permitted to link to this page, including for commercial use.


Page version v1.0 7-September-2017.

Page created 17-Oct-2017 16:15:32.

Copyright © 2017 Alan Gibson.