snibgo's ImageMagick pages

Statistics

Here is a process I use for detecting outliers in a series, such as the transformations from points in one image to points in another. By rejecting outliers, we improve the overall transformation.

Removing outliers

I use the "modified Thompson tau technique", as described in John M. Cimbala's Outliers (pdf). See Wikipedia: Outlier for other possible methods.

The basic technique is to reject values that lie outside the range (mean ± k * standard_deviation), where standard_deviation is the sample standard deviation and k ranges from about 1.15 (for three samples) to about 1.96 (for thousands of samples). The sophistication of this technique is that k is well-defined and depends only on the sample size, and that it removes only one sample (the worst one) at a time. After a sample is removed, it re-calculates the mean, standard_deviation and k, and re-tests whether another sample can be removed. When no more samples can be removed, the algorithm is finished.
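
The loop described above can be sketched in Python, which I use here only for illustration (the scripts on this page are Windows BAT). This is a minimal sketch, not a translation of removeOutliers.bat. The names mod_thompson_tau and remove_outliers are hypothetical. The tau values are copied from the ModThoTau.txt and getModThoTau.bat listings further down this page. Unlike the BAT script, the sketch stops, rather than removing a sample, when every deviation is zero.

```python
import math

# Tau for n = 3..38, copied from the ModThoTau.txt listing on this page.
TAU_TABLE = [
    1.1511, 1.4250, 1.5712, 1.6563, 1.7110, 1.7491, 1.7770, 1.7984, 1.8153,
    1.8290, 1.8403, 1.8498, 1.8579, 1.8649, 1.8710, 1.8764, 1.8811, 1.8853,
    1.8891, 1.8926, 1.8957, 1.8985, 1.9011, 1.9035, 1.9057, 1.9078, 1.9096,
    1.9114, 1.9130, 1.9146, 1.9160, 1.9174, 1.9186, 1.9198, 1.9209, 1.9220,
]
# Step values for larger n, as (minimum n, tau), copied from getModThoTau.bat.
TAU_STEPS = [(5000, 1.9597), (1000, 1.9586), (500, 1.9572),
             (100, 1.9459), (50, 1.9314), (39, 1.9281)]


def mod_thompson_tau(n):
    """Return tau for a sample of size n (n >= 3)."""
    if n < 3:
        raise ValueError("tau is undefined for n < 3")
    for min_n, tau in TAU_STEPS:
        if n >= min_n:
            return tau
    return TAU_TABLE[n - 3]


def remove_outliers(values, two_sided=True):
    """Return (kept, removed), dropping the worst sample one at a time."""
    kept = list(values)
    removed = []
    while len(kept) >= 3:
        n = len(kept)
        mean = sum(kept) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in kept) / (n - 1))
        # Deviation of each sample; signed when only high outliers count.
        devs = [abs(v - mean) if two_sided else v - mean for v in kept]
        worst = max(range(n), key=lambda i: devs[i])
        # Stop when the worst sample is within tau sample standard deviations.
        if devs[worst] <= 0 or devs[worst] < mod_thompson_tau(n) * sd:
            break
        removed.append(kept.pop(worst))
    return kept, removed


data = [98.0, 99.6, 40.5, 92.7, 95.5, 93.5, 85.8, 91.2, 76.4, 150.5, 67.3]
kept, removed = remove_outliers(data)
print(kept)     # [98.0, 99.6, 92.7, 95.5, 93.5, 91.2]
print(removed)  # [150.5, 40.5, 67.3, 76.4, 85.8]
print(remove_outliers(data, two_sided=False)[1])  # [150.5]
```

The data is the example series used later on this page, and the survivors agree with the results shown there for both modes.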

I implemented the algorithm in removeOutliers.bat. In practice a higher-level script is used, such as removeCsvOutliers.bat, which is used by alignArea.bat on the Simple alignment by matching areas page.

These scripts use arrays. The Windows BAT language doesn't have arrays, so these are really environment variables named VALUES[5] and so on. For large arrays, e.g. with 64000 entries, the scripts become very slow under Windows 8.1. These processes are more easily implemented in decent languages such as C.

When used on dx, dy and r = sqrt(dx*dx+dy*dy), the translations of points from one image to another, this isn't the most effective method. It treats dx, dy and r as independent variables, but in fact they are dependent: r is calculated from dx and dy.

The most suitable method might be to "break into" the internals of IM's "-distort perspective", which finds a least-square-error solution when a distortion is over-specified. Perhaps the errors can be pulled out, and used to identify outliers. Or the verbose output could be used to calculate the transformation of each point, and we know where we wanted it to move, so we know the error of each.
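
The second idea (transform each point, then compare with where we wanted it) can be illustrated in Python. This sketch makes large assumptions: it fits an affine transformation rather than a perspective one, it does its own least-squares fit instead of reading anything from IM, and the names solve3, fit_affine and residuals are hypothetical. It also assumes the source points are not all on one line, otherwise the fit is singular.

```python
import math


def solve3(m, v):
    """Solve the 3x3 system m * p = v by Gauss-Jordan elimination."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        # Partial pivoting: bring the largest entry in this column up.
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(pairs):
    """Least-squares affine fit to pairs of (x0, y0, x1, y1)."""
    rows = [(x0, y0, 1.0) for x0, y0, _, _ in pairs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    coeffs = []
    for k in (2, 3):  # Fit x1 first, then y1.
        atb = [sum(r[i] * p[k] for r, p in zip(rows, pairs))
               for i in range(3)]
        coeffs.append(solve3(ata, atb))
    return coeffs


def residuals(pairs, coeffs):
    """Distance between where each point lands and where we wanted it."""
    (a, b, c), (d, e, f) = coeffs
    return [math.hypot(a * x0 + b * y0 + c - x1, d * x0 + e * y0 + f - y1)
            for x0, y0, x1, y1 in pairs]


# Four pairs translated by (10, 5), plus one bad match in the middle.
pairs = [(0, 0, 10, 5), (100, 0, 110, 5), (0, 100, 10, 105),
         (100, 100, 110, 105), (50, 50, 90, 55)]
res = residuals(pairs, fit_affine(pairs))
print([round(r, 3) for r in res])  # [6.0, 6.0, 6.0, 6.0, 24.0]
```

Each pair now has a single error value, so an outlier test can be applied to that one series instead of to dx, dy and r separately. Note that the bad pair also drags the fit, which is why the good pairs show a non-zero error.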

Example usage: suppose a file stt_data.txt contains:

98.0
99.6
40.5
92.7
95.5
93.5
85.8
91.2
76.4
150.5
67.3

We can remove all outliers, both high and low. The arguments "1 1" mean: examine column 1, using absolute deviations.

call removeCsvOutliers stt_data.txt stt_out1.lis 1 1

98.0 
99.6 
92.7 
95.5 
93.5 
91.2 

We can remove only the outliers that are above the mean, by giving 0 as the final argument so that signed deviations are used:

call removeCsvOutliers stt_data.txt stt_out2.lis 1 0

98.0 
99.6 
40.5 
92.7 
95.5 
93.5 
85.8 
91.2 
76.4 
67.3 

Scripts

For convenience, .bat scripts are also available in a single zip file. See Zipped BAT files.

extendCoordPairs.bat

rem From CSV file %1, assumed to contain Score,X0,Y0,X1,Y1
rem writes %2 containing Score,X0,Y0,X1,Y1,dX,dY,r
rem where:
rem   dX = X1 - X0
rem   dY = Y1 - Y0
rem   r = sqrt (dX^2 + dY^2)
rem
rem Files have no headers.
rem X0, Y0, X1 and Y1 must be integers, because "set /A" does integer arithmetic only.

@if "%2"=="" findstr /B "rem @rem" %~f0 & exit /B 1

@setlocal enabledelayedexpansion

@call echoOffSave


set INFILE=%1
set OUTFILE=%2

if not exist %INFILE% (
  echo %0: can't find %INFILE%
  exit /B 1
)

del %OUTFILE% 2>nul

for /F "tokens=1-5 delims=, " %%A in (%INFILE%) do (
  set /A dx=%%D-%%B
  set /A dy=%%E-%%C
  set /A dx2=!dx!*!dx!
  set /A dy2=!dy!*!dy!
  set /A r2=!dx2!+!dy2!

  for /F "usebackq" %%L in (`%IM%identify ^
    -precision 9 ^
    -format "r=%%[fx:sqrt(!r2!)]" ^
    xc:`) do set %%L

  echo %%A,%%B,%%C,%%D,%%E,!dx!,!dy!,!r! >>%OUTFILE%
)

@call echoRestore

@endlocal
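
For comparison, the same calculation is short in Python. This is a sketch only, with a hypothetical function name extend_coord_pairs. Unlike the BAT script it accepts non-integer coordinates, and it splits fields on commas only, where the script also splits on spaces.

```python
import csv
import io
import math


def extend_coord_pairs(rows):
    """Yield Score,X0,Y0,X1,Y1,dX,dY,r for each input row of five fields."""
    for score, x0, y0, x1, y1 in rows:
        dx = float(x1) - float(x0)
        dy = float(y1) - float(y0)
        yield [score, x0, y0, x1, y1, dx, dy, math.hypot(dx, dy)]


# Two made-up rows standing in for a CSV file with no header.
sample = io.StringIO("0.9,10,20,13,24\n0.8,5,5,5,5\n")
for row in extend_coord_pairs(csv.reader(sample)):
    print(row)
```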

removeCsvOutliers.bat

rem From CSV file %1, writes CSV file %2 without outliers.
rem %3, %4, %5, %6 ...:
rem    %3, %5,.. are the columns to be examined for outliers.
rem    %4, %6,.. are 0 or 1 for removeOutliers.bat
rem
rem First column is numbered one.
@rem
@rem Also uses:
@rem   rcoAPPEND_NUM if 1, appends line number as extra column.
@rem   rcoDEBUG_FILE if set, echoes debugging text to this file


@if "%4"=="" findstr /B "rem @rem" %~f0 & exit /B 1

@setlocal enabledelayedexpansion

@call echoOffSave


set INFILE=%1
set OUTFILE=%2

if not exist %INFILE% (
  echo %0: can't find %INFILE%
  exit /B 1
)


if "%rcoAPPEND_NUM%"=="" set rcoAPPEND_NUM=0

if "%rcoDEBUG_FILE%"=="" set rcoDEBUG_FILE=+

if not "%rcoDEBUG_FILE%"=="+" (
  set SaveroDEBUG_FILE=%roDEBUG_FILE%
  set roDEBUG_FILE=%rcoDEBUG_FILE%
)


set nVAL=0
for /F "tokens=%3 delims=, " %%V in (%INFILE%) do (
  set /A nVAL+=1
)
call zeroArray OUTLIERS %nVAL%

:loop
if not "%rcoDEBUG_FILE%"=="+" echo %~n0: Column=%3 IsAbs=%4 >>%rcoDEBUG_FILE%

rem Read the array
set nVAL=0
for /F "tokens=%3 delims=, " %%V in (%INFILE%) do (
  rem echo %%V
  set VALUES[!nVAL!]=%%V
  set /A nVAL+=1
)

call removeOutliers VALUES %nVAL% OUTLIERS %4

shift /3
shift /3
if not "%3"=="" goto loop

set /A nLAST=%nVAL%-1

rem for /L %%i in (0,1,%nLAST%) do echo !OUTLIERS[%%i]!


rem Write the non-outliers.
del %OUTFILE% 2>nul
set nVAL=0
for /F "tokens=*" %%L in (%INFILE%) do (
  set /A ISOUTLIER=OUTLIERS[!nVal!]
  echo ISOUTLIER=!ISOUTLIER!
  if not !ISOUTLIER!==1 (
    if %rcoAPPEND_NUM%==0 (
      echo %%L >>%OUTFILE%
    ) else (
      echo %%L, !nVAL! >>%OUTFILE%
    )
  )
  set /A nVAL+=1
)

if not "%rcoDEBUG_FILE%"=="+" (
  echo %~n0: nVAL=%nVAL% >>%rcoDEBUG_FILE%

  set roDEBUG_FILE=%SaveroDEBUG_FILE%
)

@call echoRestore

@endlocal

removeOutliers.bat

rem From array %1 with %2 elements, removes outliers.
rem Array %3 is already 1 where element is to be ignored.
rem Sets %3 elements to 1 where element is outlier (so we can't use setlocal).
rem %4 is optional:
rem   1 = take abs (so remove high or low outliers). This is the default.
rem   0 = no abs (so remove only high outliers).
@rem See http://www.mne.psu.edu/me345/Lectures/Outliers.pdf
@rem
@rem Also uses:
@rem   roDEBUG_FILE if set, echoes debugging text to this file

@if "%3"=="" findstr /B "rem @rem" %~f0 & exit /B 1

@rem @setlocal enabledelayedexpansion

@rem @call echoOffSave


if %2 LSS 3 (
  exit /B 0
)

set DO_ABS=%4
if "%DO_ABS%"=="" set DO_ABS=1

if %DO_ABS%==1 (
  set FUNC=abs
) else (
  set FUNC=
)

echo DO_ABS=%DO_ABS%

set ARR=%1

if "%roDEBUG_FILE%"=="" set roDEBUG_FILE=+

set /A nLAST=%2-1
set IGNORE=%3

:loop

call meanStdDev %1 %2 %3

set HIGH_DEV=0
set nHIGH=0
set nVALS=0

for /L %%i in (0,1,%nLAST%) do (
  if not "!%IGNORE%[%%i]!"=="1" (
    set VAL=!%ARR%[%%i]!

    for /F "usebackq" %%L in (`%IM%identify ^
      -precision 9 ^
      -format "DEVIATION=%%[fx:%FUNC%(!VAL!-%msdMEAN%)]" ^
      xc:`) do set %%L

    for /F "usebackq" %%L in (`%IM%identify ^
      -format "IS_HIGHER=%%[fx:!DEVIATION!>=!HIGH_DEV!?1:0]" ^
      xc:`) do set %%L

    if !IS_HIGHER!==1 (
      set HIGH_DEV=!DEVIATION!
      set nHIGH=%%i
    )
    set /A nVALS+=1
  )  
)


echo HIGH_DEV=%HIGH_DEV% nHIGH=%nHIGH%

rem Tau is undefined for fewer than three values, so stop here.
if %nVALS% LSS 3 exit /B 0

call getModThoTau %nVALS%

for /F "usebackq" %%L in (`%IM%identify ^
  -format "IS_OUTLIER=%%[fx:%HIGH_DEV%>=%gmttTAU%*%msdSTD_DEV_SAMP%?1:0]" ^
  xc:`) do set %%L

rem echo IS_OUTLIER=%IS_OUTLIER%

if %IS_OUTLIER%==1 (
  set %IGNORE%[%nHIGH%]=1
  if not "%roDEBUG_FILE%"=="+" echo %~n0: removed %nHIGH% >>%roDEBUG_FILE%
  goto loop
)


@rem @call echoRestore

@rem @endlocal

meanStdDev.bat

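rem From array %1 with %2 elements returns mean and standard deviation.
rem Array %3 is 1 where element is to be ignored.
rem Returns:
rem   msdMEAN mean
rem   msdSTD_DEV population standard deviation
rem   msdSTD_DEV_SAMP sample standard deviation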

@if "%2"=="" findstr /B "rem @rem" %~f0 & exit /B 1

@setlocal enabledelayedexpansion

@call echoOffSave

set ARR=%1
set /A nLAST=%2-1

set IGNORE=%3

set sigV=0
set sigV2=0
set nVAL=0
for /L %%i in (0,1,%nLAST%) do (
  rem echo %ARR%[%%i]=!%ARR%[%%i]!

  if not "!%IGNORE%[%%i]!"=="1" (
    set VAL=!%ARR%[%%i]!

    for /F "usebackq" %%L in (`%IM%identify ^
      -precision 9 ^
      -format "sigV=%%[fx:!sigV!+!VAL!]\nsigV2=%%[fx:!sigV2!+!VAL!*!VAL!]" ^
      xc:`) do set %%L

    set /A nVAL+=1
  )
)

echo nVAL=%nVAL% sigV=%sigV% sigV2=%sigV2%

for /F "usebackq" %%L in (`%IM%identify ^
  -precision 9 ^
  -format "mean=%%[fx:%sigV%/%nVAL%]" ^
  xc:`) do set %%L

for /F "usebackq" %%L in (`%IM%identify ^
  -precision 9 ^
  -format "sd=%%[fx:sqrt(%sigV2%/%nVAL%-%mean%*%mean%)]" ^
  xc:`) do set %%L

for /F "usebackq" %%L in (`%IM%identify ^
  -precision 9 ^
  -format "sdSamp=%%[fx:sqrt((%sigV2%-%mean%*%sigV%)/(%nVAL%-1))]" ^
  xc:`) do set %%L

echo mean=%mean% sd=%sd% sdSamp=%sdSamp%

@call echoRestore

@endlocal & set msdMEAN=%mean%& set msdSTD_DEV=%sd%& set msdSTD_DEV_SAMP=%sdSamp%
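
The arithmetic in meanStdDev.bat accumulates the sum of the values and the sum of their squares, then derives both standard deviations from those two totals. A Python sketch of the same arithmetic follows. The name mean_std_dev is hypothetical, and the clamp to zero is my addition, to guard against a tiny negative value caused by rounding when the values are nearly equal.

```python
import math


def mean_std_dev(values, ignore=None):
    """Return (mean, population sd, sample sd), skipping ignored elements."""
    ignore = ignore or [0] * len(values)
    used = [v for v, skip in zip(values, ignore) if skip != 1]
    n = len(used)
    if n < 2:
        raise ValueError("need at least two values")
    sig_v = sum(used)                   # Sum of values.
    sig_v2 = sum(v * v for v in used)   # Sum of squared values.
    mean = sig_v / n
    # max(0.0, ...) guards against tiny negative results from rounding.
    sd = math.sqrt(max(0.0, sig_v2 / n - mean * mean))
    sd_samp = math.sqrt(max(0.0, (sig_v2 - mean * sig_v) / (n - 1)))
    return mean, sd, sd_samp


# The last value is flagged as ignored, so it does not affect the result.
print(mean_std_dev([2, 4, 4, 4, 5, 5, 7, 9, 1000], [0] * 8 + [1]))
```

For the eight values that are used, the mean is 5 and the population standard deviation is 2.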

zeroArray.bat

@rem Sets elements 0 to %2 of array %1 to zero.
@for /L %%i in (0,1,%2) do @set /A %1[%%i]=0

getModThoTau.bat

rem Returns Modified Thompson Tau for n=%1.
@rem See http://www.mne.psu.edu/me345/Lectures/Outliers.pdf

@if "%1"=="" findstr /B "rem @rem" %~f0 & exit /B 1

@call echoOffSave

if %1 LEQ 2 (
  echo %0: bad %1
  exit /B 1
)

if %1 GEQ 5000 (
  set gmttTAU=1.9597
) else if %1 GEQ 1000 (
  set gmttTAU=1.9586
) else if %1 GEQ 500 (
  set gmttTAU=1.9572
) else if %1 GEQ 100 (
  set gmttTAU=1.9459
) else if %1 GEQ 50 (
  set gmttTAU=1.9314
) else if %1 GEQ 39 (
  set gmttTAU=1.9281
) else (
  call getLineN %Util%\ModThoTau.txt %1
  set gmttTAU=!glnLINE!
)

echo gmttTAU=%gmttTAU%

@call echoRestore

ModThoTau.txt

# Values of Modified Thompson Tau for n, starting at n=3, up to n=38.
# Data file for getModThoTau.bat. See http://www.mne.psu.edu/me345/Lectures/Outliers.pdf
# This file must have 3 lines before the first value.
1.1511
1.4250
1.5712
1.6563
1.7110
1.7491
1.7770
1.7984
1.8153
1.8290
1.8403
1.8498
1.8579
1.8649
1.8710
1.8764
1.8811
1.8853
1.8891
1.8926
1.8957
1.8985
1.9011
1.9035
1.9057
1.9078
1.9096
1.9114
1.9130
1.9146
1.9160
1.9174
1.9186
1.9198
1.9209
1.9220
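
As I understand the technique, each value in this table comes from the critical value t of Student's t distribution with n-2 degrees of freedom, by tau = t*(n-1) / (sqrt(n)*sqrt(n-2+t*t)). The Python sketch below applies that formula. It is an illustration, not part of the scripts: the name tau_from_t is hypothetical, and the t values are standard two-tailed 5% critical values that I have typed in, on the assumption that the table was built at that significance level.

```python
import math


def tau_from_t(n, t):
    """Modified Thompson tau for sample size n, given critical value t."""
    return t * (n - 1) / (math.sqrt(n) * math.sqrt(n - 2 + t * t))


# (n, t) pairs: t is the two-tailed 5% critical value for n-2 degrees
# of freedom, taken from a standard t table (an assumption, see above).
for n, t in [(3, 12.7062), (4, 4.3027), (10, 2.3060)]:
    print(n, round(tau_from_t(n, t), 4))
```

The results can be compared with the entries for n=3, n=4 and n=10 above.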

getLineN.bat

@rem From text file %1,
@rem   starting at line number %2 [default 0],
@rem   echo %3 lines to stdout [default 1].
@rem Also returns last found line in glnLINE.
@rem First line is number zero.
@rem Call with %3 = -1 for all remaining lines.

@setlocal enabledelayedexpansion

@set nSkip=%2
@if "%nSkip%"=="" set nSkip=0

@set ToDo=%3
@if "%ToDo%"=="" set ToDo=1

@set sSKIP=
@if not %nSkip%==0 set sSKIP=skip=%2

@for /F "%sSKIP% tokens=*" %%L in (%1) do @(
  @if not !ToDo!==0 (
    @echo %%L
    @set LINE=%%L
    @set /A ToDo-=1
  )
)

@endlocal & set glnLINE=%LINE%

echoOffSave.bat

@rem Saves the current echo state (on or off) in ECHO_SAVE, then turns echo off.
@echo>%TEMP%\echo_%1.txt
@for /f "tokens=3 delims=. " %%E in (%TEMP%\echo_%1.txt) do @set ECHO_SAVE=%%E
@echo off

echoRestore.bat

@rem Restores the echo state saved by echoOffSave.bat.
@echo %ECHO_SAVE%

All images on this page were created by the commands shown, using:

%IM%identify -version
Version: ImageMagick 6.9.2-5 Q16 x64 2015-10-31 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2015 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Visual C++: 180031101
Features: Cipher DPC Modules OpenMP 
Delegates (built-in): bzlib cairo freetype jng jp2 jpeg lcms lqr openexr pangocairo png ps rsvg tiff webp xml zlib

Source file for this web page is stats.h1. To re-create this web page, run "procH1 stats".


This page, including the images, is my copyright. Anyone is permitted to use or adapt any of the code, scripts or images for any purpose, including commercial use.

Anyone is permitted to re-publish this page, but only for non-commercial use.

Anyone is permitted to link to this page, including for commercial use.


Page version v1.0 4-July-2014.

Page created 26-May-2016 16:40:53.

Copyright © 2016 Alan Gibson.