A process module that returns multiple search results, invariant to various transformations.
Aka whatSRT??
Given a main image and a smaller subimage, the goal is to find all occurrences of the subimage within the main image. For each occurrence found, the module emits the scale, rotate and translate (SRT) parameters that would transform the subimage into the occurrence. The search can also be invariant to lightness and contrast.
The task could be solved very simply: at every possible position, we can test every possible resize at every possible rotation, and the best result from all of those tests is the solution. The search space has four dimensions: x, y, scale and angle. The problem with an exhaustive search is the time required: we might have a million possible positions, a hundred possible scales and 360 possible angles, so there are 360*108 tests. Even if we can do 1000 tests every second, this would take more than a year.
The module can search in a variety of ways:
The results from one invocation can be used to limit the search-space of another. The limiting ranges can be per-result, from previous invocations.
Results from an invocation at a small scale can be used for an invocation at a larger scale.
Innovations: aggressive searching, sinks, jumps.
This process module ties together a number of separate methods that have been implemented as Windows BAT scripts:
Some components of this process module are available as simpler modules. These are:
We scatter some small frogs on the "toes" image. Each frog is scaled and rotated, and given a colour distortion.
frog.png This has transparent and semi-transparent pixels. |
|
%IM%convert ^ toes.png ^ ( frog.png +write mpr:FROG +delete ) ^ -virtual-pixel None ^ -set option:modulate:colorspace HCL ^ ( mpr:FROG ^ +distort SRT 22.5,22.5,1.0,0,70,170 ^ ) ^ ( mpr:FROG ^ +distort SRT 22.5,22.5,1.2,30,50,50 ^ ) ^ ( mpr:FROG ^ +distort SRT 22.5,22.5,0.8,-20,120,70 ^ ) ^ ( mpr:FROG ^ +distort SRT 22.5,22.5,0.9,150,180,105 ^ ) ^ ( mpr:FROG ^ +distort SRT 22.5,22.5,0.6,0,245,215 ^ ) ^ -layers merge ^ ms_toes_frogs.png goto :eof |
We search for the frogs.
Option | Description | |
---|---|---|
Short
form |
Long form | |
p string | phases string | List of phases to invoke, with a comma between each phase.
Any of: All all the following: Aggr aggressive search; Approx approximate search; RefPos refine positions; RefSR refine scale and rotation; Reduce final reduction. Default = All. |
at N | aggrThr N | Threshold for Aggr search.
Typically 0.0 to 1.0. Default = 0.75. |
s0 N | minScale N | Minimum scale for Approx search.
A positive number. Default = 0.5. |
s1 N | minScale N | Maximum scale for Approx search.
A positive number, large than minScale. Default = 2.0. |
ns N | nScales N | Number of scales for Approx search.
A positive integer, at least 1. Default = 20. |
cm string | crMethod string | Method for creating concentric rings in Approx search, one of:
Slow unroll at each scale; Quick unroll once, and scale that. Default = Quick. |
z N | sizeSiv N | Size divider for multi-scale in Approx search.
A positive number greater than 1. Default = 2.0. |
md N | minDim N | Minimum dimension for multi-scale in Approx search.
A positive integer greater than 1. Default = 20. |
o string | outType string | Type of output image, one of:
None leave image list unchanged; Mask return a mask; Trans transform subimage by best SRT parameters; Comp composite all found occurrences. Default = Comp. |
f string | file string | Write text data (SRT parameters) to stderr or stdout.
Default = stderr. |
v | verbose | Write some text output to stderr. |
v2 | verbose2 | Write more text output to stderr. |
v3 | verbose3 | Write yet more text output to stderr. |
d string | debug string | Write debugging images to MIFF files with this prefix. |
The results file is text, with fields separated by commas, and possibly spaces. There is one line for each candidate position of the subimage within the main image. Not all possible positions may be shown, as some may have been eliminated. Header? The file may be sorted, with the lowest (best) scores at the top. The fields are:
x-coordinate of top-left position y-coordinate of top-left position rotation rotation error scale scale error RMSE score, 0.0 to 1.0.
But the best score (locally) will be surrounded by positions with very good scores, which we don't want to process further.
Or an image could be made, with RGB channels representing rotation, scale and score. Then use nLightest to get the coordinate list.
Or: make pixels that aren't local sinks, white.
We have a new concept: a masked search. Inputs:
Searches can create, or use and update, the mask. These searches include: aggressive search, concentric rings, radials, pixel-by-pixel, ... When a search reads the mask, it may be smaller than required. If so, it is simply resized.
The concentric rings search works well. For each likely position, we have an approximate scale. Next step: at that scale, what is the best angle? At that angle, what is the best scale? Iterate until happy.
%IM%convert d.png -virtual-pixel None +distort SRT 0.900323,9.6861 x.png %IM%convert toes.png ( x.png -evaluate add 10% -repage +50+50! ) -layers merge x3.pngProblem: when subimage has essentially radial features, the first search for scale could be wrong (even if score is good, and search for rotation is no problem). How do we notice the problem? If the worst score is similar to the best score?
%IMDEV%convert ^ rose: rose: ^ -process 'multisrch help' ^ NULL:
Usage: -process 'multisrch [OPTION]...' Multi-phase search with various methods for subimage SRT. g, geometry string geometry of search area p, phases string 'all' or list eg 'aggr,approx,refpos,refsr' at, aggrThr number threshold for aggressive s0, minScale number minimum scale s1, maxScale number maximum scale ns, nScales integer number of scales r, scaleRadMax int scale so radius is this, at max m, method string for concentric rings, 'slow' or 'quick' j, notJustSinks don't eliminate non-sinks z, sizeDiv number size divider for ms md, minDim integer minimum dimension for ms o, outType string 'None', 'Mask', 'Trans' or 'Comp' f, file string write to file stream stdout or stderr v, verbose write text information to stderr v2, verbose2 write more text information to stderr v3, verbose3 yet more text information to stderr d, debug string write debugging text to stderr, and image files with given prefix version write version information to stdout
Process module srchmask reads a mask image and spits out the six SRT parameters and score, in score order, perhaps limited to the top N. To be cute, this might also composite those results over the main image.
Perhaps use jump method to find top scoring positions.
Find the rose: subimage in each of these:
set PARAMS=f stdout
The problem | The solution | ||
---|---|---|---|
%IM%convert ^ rose: ^ ms_rse1.png |
%IMDEV%convert ^ ms_rse1.png ^ rose: ^ -process 'multisrch %PARAMS%' ^ -background #4f4 ^ -layers flatten ^ ms_rse1_out.png WrSrchMaskSrt: nPositions=1 nGoodMatch=1 0.00759641,35,23,0.999105,0.0327869,35,23 |
||
%IM%convert ^ -size 200x200 xc:#0af ^ rose: ^ -geometry +45+77 ^ -composite ^ ms_rse2.png |
%IMDEV%convert ^ ms_rse2.png ^ rose: ^ -process 'multisrch %PARAMS%' ^ -background #4f4 ^ -layers flatten ^ ms_rse2_out.png WrSrchMaskSrt: nPositions=1 nGoodMatch=1 0.0873283,35,23,0.971099,0.00692281,81,100 |
||
%IM%convert ^ -size 200x200 xc:#0af ^ ( rose: ^ -virtual-pixel None ^ +distort SRT ^ 35,23,1,0,80,100 ^ ) ^ -layers merge ^ ms_rse3.png |
%IMDEV%convert ^ ms_rse3.png ^ rose: ^ -process 'multisrch %PARAMS%' ^ -background #4f4 ^ -layers flatten ^ ms_rse3_out.png WrSrchMaskSrt: nPositions=1 nGoodMatch=1 0.0860579,35,23,0.971099,0.00799244,81,100 |
||
%IM%convert ^ -size 200x200 xc:#0af ^ ( rose: ^ -virtual-pixel None ^ +distort SRT ^ 35,23,1,35,80,100 ^ ) ^ -layers merge ^ ms_rse4.png |
%IMDEV%convert ^ ms_rse4.png ^ rose: ^ -process 'multisrch %PARAMS%' ^ -background #4f4 ^ -layers flatten ^ ms_rse4_out.png WrSrchMaskSrt: nPositions=4 nGoodMatch=1 0.0378664,35,23,0.986302,35.1798,80,100 |
||
%IM%convert ^ -size 200x200 xc:#0af ^ ( rose: ^ -virtual-pixel None ^ +distort SRT ^ 35,23,1.5,0,80,100 ^ ) ^ -layers merge ^ ms_rse5.png |
%IMDEV%convert ^ ms_rse5.png ^ rose: ^ -process 'multisrch %PARAMS%' ^ -background #4f4 ^ -layers flatten ^ ms_rse5_out.png WrSrchMaskSrt: nPositions=3 nGoodMatch=1 0.0248807,35,23,1.5014,0.0696241,80,100 |
||
%IM%convert ^ -size 200x200 xc:#0af ^ ( rose: ^ -virtual-pixel None ^ +distort SRT ^ 35,23,1.5,35,80,100 ^ ) ^ -layers merge ^ ms_rse6.png |
%IMDEV%convert ^ ms_rse6.png ^ rose: ^ -process 'multisrch %PARAMS%' ^ -background #4f4 ^ -layers flatten ^ ms_rse6_out.png WrSrchMaskSrt: nPositions=7 nGoodMatch=1 0.0323015,35,23,1.48386,35.0716,80,100 |
Examples from the What rotation and scale? page, which uses scripts.
%IMDEV%convert ^ wrs_src2.png ^ wrs_src1.png ^ -process 'multisrch verbose' ^ ms_wrsp21.png |
|
set SRT=150,100,1.28416,99.6825,150,100 %IMDEV%convert ^ wrs_src2.png ^ ms_wrsp21.png ^ -background None -layers merge ^ ms_wrsp21_out.png |
goto skip %IM%convert ^ toes.png ^ -distort SRT 78,106,1.1,35,78,106 ^ +repage ^ -crop 60x80+48+66 +repage ^ ms_ex_sub1.png |
%IMDEV%convert ^ toes.png +write mpr:MAIN +delete ^ ms_ex_sub1.png +write mpr:SUB +delete ^ ( mpr:MAIN mpr:SUB ^ -resize 12.5%% ^ -process 'multisrch n 10 o Mask' ^ ) ^ ( mpr:MAIN mpr:SUB ^ -resize 25%% ^ -clone 0 ^ -process 'multisrch n 10 o Mask' ^ ) ^ -delete 0 ^ ( mpr:MAIN mpr:SUB ^ -resize 50%% ^ -clone 0 ^ -process 'multisrch n 20 o Mask' ^ ) ^ -delete 0 ^ ( mpr:MAIN mpr:SUB ^ -clone 0 ^ -process 'multisrch n 5 o Mask' ^ ) ^ -delete 0 ^ ( mpr:MAIN mpr:SUB ^ -clone 0 ^ -process 'multisrch n 20 o Mask' ^ ) ^ -delete 0 ^ ( mpr:MAIN mpr:SUB ^ -clone 0 ^ -process 'multisrch n 100 o Mask' ^ ) ^ -delete 0 ^ -depth 32 ^ -define quantum:format=floating-point ^ ms_ex_map1.miff
@ms_ex_map1.lis
%IM%convert ^ ms_ex_map1.miff ^ -scale 300%% ^ +write ms_ex_map1_lg.png ^ -separate ^ ms_ex_map1_lg_%%d.png |
|
Red channel: scale |
|
Green channel: rotation |
|
Blue channel: score |
%IM%convert ms_ex_sub1.png -virtual-pixel None +distort SRT 0.9091,321.266 c1.png f:\prose\PICTURES>c1.png %IMDEV%convert toes.png c1.png -process 'srchimg' NULL: 0.0438884 @ 34,61 f:\prose\PICTURES>%IM%convert toes.png -crop 60x80+34+61 +repage c2.png f:\prose\PICTURES>c2.png f:\prose\PICTURES>gimp c1.png f:\prose\PICTURES>"c:\Program Files\gimp 2\bin\gimp-2.8.exe" c1.png f:\prose\PICTURES>
%IMDEV%convert ^ toes.png ms_ex_sub1.png ^ -process 'multisrch o Mask v' ^ -define quantum:format=floating-point ^ -depth 32 ^ ms_x.miff %IMDEV%convert ^ ms_x.miff ^ -process srchmask NULL:
Using images from Ciratefi zip file:
These images are not my copyright.
:skip %IMDEV%convert ^ ciratefi_one_a2.jpg ^ ciratefi_one_q09.png ^ -process 'multisrch debug dbg_cira v' ^ ms_cira_comp1.png |
Approx solution:
%IMDEV%convert ^ ciratefi_two_a1.jpg -evaluate Subtract 10%% ^ ( ciratefi_two_q1.png ^ -virtual-pixel None +distort SRT 22,19.5,1.19,25,171,149 ^ -channel A -evaluate Multiply 0.5 +channel ^ ) ^ -layers merge ^ mr.png
This solution would be in the mask at 149,129.5.
%IMDEV%convert ^ ciratefi_two_a1.jpg ^ ( ciratefi_two_q1.png ^ -virtual-pixel None +distort SRT 1.19,25 ^ +write info: ^ +repage ^ ) ^ -process 'srchimg v' ^ NULL: 0.0950066 @ 135,114
Sub_img transformed dims are 44x39, so centre is 135+44/2=157, 114+67/2=147.5
The task is to find all occurrences of a subimage within a main image, and find the scale, rotation and translation parameters (the six "+distort SRT" parameters) for each occurrence. We might want to find only the best match, or the N best matches.
If the main image is size W*H, and the subimage is w*h, then there are (W-w+1)*(H-h+1) possible positions for the subimage. This might be a few million positions. At each position, there are a hundred or more possible scales and a hundred or more possible angles of rotation. We cannot do an exhaustive search.
Instead, we tackle the problem in phases. At each phase, we exclude positions that are clearly not solutions.
By default, the module will invoke all phases. The phase option can be used with a list of the required phases. In the phase list, "all" is a shortcut for all the phases. Phase names can be prefixed with "-" to omit that phase. Whatever the order given in the list, phases are invoked in the order shown above. For example:
Phase 3 "refPos" for each candidate position, rotates and scales the subimage by the current estimate for that position, and takes a crop from the main image with an extra margin on each side, and searches that for the transformed subimage.
Within phase 4 "refSR", if the scale (or rotation) isn't exactly correct, then the found rotation (or scale) will be slightly wrong. So the phase iterates to stability. Similarly, if the scale and rotation (or position) aren't exactly correct, the position (or scale and rotation) will be slightly wrong. So phases 3 and 4 iterate to stability. Refining each solution by iteration is expensive, but the first two phases have reduced the set of candidate solutions to a small number.
Data is communicated between phases by a mask image. The mask has one pixel for each possible position of the subimage within the main image. The red channel represents a scale, typically close to one; the green channel is an angle; the blue channel is a score. If the mask is saved, it must be saved a floating-point.
The mask is available as an output. The debug option will write the mask after each phase. A mask can also be supplied as a third input image. This may be useful for large images: first scale down the main and subimage, invoke the module to get scale, rotaton and score, reduce the candidates solutions at that scale, and make a mask which is scaled up for the full search. (The module already does this scaling, but only for the Approx search.)
[[ Functions in whatrot.inc and whatscale.inc can find the approximate scale and rotation that makes one image match another, assuming no translation.
For every possible position of one subimage within another, we could crop the larger image and apply the whatrot.inc and whatscale.inc functions. This gives us a scale and rotation, and a score, for each position. We find which position gives the best score, so that is the required translation, and hence we know the scale and rotation. This would take a long time for large images.
To get decent performance, we do the search on a pyramid of scales. At each level in the pyramid, we record the score at each position, in an image that has one pixel for each position, and we eliminate positions that are clearly not the best. This mask (aka map) of scores is then used to restrict the positions that are searched at the next level.
When an approximate scale, rotation and translation are known, we refine the search. We extract a larger crop from the main image, which we scale, rotate and crop. This should then roughly match the subimage.
Our list of candidate positions, each with scale, angle and score, may have tens or hundreds of entries, sorted in order of score, with the best first. We might manually declare how many we want. How can we automatically find the correct number? We can use a variation of the jump method.
When we have data that is eg 0.9123456 +/- 0.045, the answer should not be quoted with six decimal places.
Phase 5, "reduce": With multiple result, sort by score. Where one result is entirely covered by another with better score, remove it. How do we do this? We readily calculate corner coords. If all corners of image A are inside image B, then image A is hidden by image B. But we can also eliminate solutions that are mosly obscured. So a fairly easy test is: If A has worse score than B, and each corner of A is within a semi-diagonal of any corner of B, eliminate A.
%IMDEV%convert ^ ms_rse3.png ^ rose: ^ -process 'multisrch d ms_dbg' ^ ms_dbg_out.png
ms_dbg_aggrmask |
|
ms_dbg_avgconcrings |
|
ms_dbg_dispmap |
|
ms_dbg_avgconcrings2 |
|
ms_dbg_dispmap2 |
|
ms_dbg_ret_mask_0.25 |
|
ms_dbg_ret_mask_0.5 |
|
ms_dbg_ret_mask_1 |
|
ms_dbg_mask_approx |
|
ms_dbg_sub_trans |
|
ms_dbg_fnd_img |
|
ms_dbg_mask_refined |
|
ms_dbg_mask_final |
For convenience, .bat scripts are also available in a single zip file. See Zipped BAT files.
This file is in %IMSRC%\snibgo.
export PKG_CONFIG_PATH=${IMDEV}../lib/pkgconfig while [ $# -gt 0 ] do # gcc -o "$1" "$1".c -Imagick -I. -Ofast -m64 -flto -mwindows -march=native -mtune=corei7 -Wall -Wa,-march=corei7,-mtune=corei7 `pkg-config --cflags --libs MagickCore` gcc -o "$1" "$1".c -Imagick -I. -Ofast -m64 -march=native -mtune=native -Wall `pkg-config --cflags --libs MagickCore` cp "$1".exe ${IMDEV} shift # next option done
All images on this page were created by the commands shown, using:
%IM%identify -version
Version: ImageMagick 6.9.5-3 Q16 x86 2016-07-22 http://www.imagemagick.org Copyright: Copyright (C) 1999-2015 ImageMagick Studio LLC License: http://www.imagemagick.org/script/license.php Visual C++: 180040629 Features: Cipher DPC Modules OpenMP Delegates (built-in): bzlib cairo flif freetype jng jp2 jpeg lcms lqr openexr pangocairo png ps rsvg tiff webp xml zlib
gcc --version
gcc (GCC) 5.4.0 Copyright (C) 2015 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
To improve internet download speeds, some images may have been automatically converted (by ImageMagick, of course) from PNG to JPG.
Source file for this web page is multisrch.h1. To re-create this web page, execute "procH1 multisrch".
This page, including the images, is my copyright. Anyone is permitted to use or adapt any of the code, scripts or images for any purpose, including commercial use.
Anyone is permitted to re-publish this page, but only for non-commercial use.
Anyone is permitted to link to this page, including for commercial use.
Page version v1.0 7-September-2017.
Page created 17-Oct-2017 16:15:32.
Copyright © 2017 Alan Gibson.