Here is a process I use for detecting outliers in a series of values, such as the translations that take points in one image to the matching points in another. Rejecting the outliers improves the overall transformation.
I use the "modified Thompson tau technique", as described in John M. Cimbala's Outliers (pdf). See Wikipedia: Outlier for other possible methods.
The basic technique is to reject values that lie outside the range (mean ± k * standard_deviation), where k is between about 1.15 and 1.96. The refinement of this technique is that k (the modified Thompson tau) is well defined and depends only on the sample size, and that only one sample (the worst one, with the largest deviation from the mean) is removed at a time. After a sample is removed, the mean, standard_deviation and k are recalculated from the remaining samples, and the test is repeated to see whether another sample can be removed. When no more samples can be removed, the algorithm is finished. The scripts use the sample standard deviation, which divides by n-1.
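A minimal Python sketch of this loop follows. It is not a line-by-line port of the scripts: the function names are my own, the tau values are copied from ModThoTau.txt and the step values in getModThoTau.bat (both listed further down this page), and a value is rejected only when its deviation strictly exceeds tau times the sample standard deviation. Setting two_sided=False corresponds to option 0 of removeOutliers.bat (reject only values above the mean).

```python
import math

# Modified Thompson tau for n = 3..38, copied from ModThoTau.txt.
TAU_TABLE = [
    1.1511, 1.4250, 1.5712, 1.6563, 1.7110, 1.7491, 1.7770, 1.7984, 1.8153,
    1.8290, 1.8403, 1.8498, 1.8579, 1.8649, 1.8710, 1.8764, 1.8811, 1.8853,
    1.8891, 1.8926, 1.8957, 1.8985, 1.9011, 1.9035, 1.9057, 1.9078, 1.9096,
    1.9114, 1.9130, 1.9146, 1.9160, 1.9174, 1.9186, 1.9198, 1.9209, 1.9220,
]


def mod_thompson_tau(n):
    """Tau for n values, using the same steps as getModThoTau.bat."""
    if n < 3:
        raise ValueError("need at least 3 values")
    for limit, tau in ((5000, 1.9597), (1000, 1.9586), (500, 1.9572),
                       (100, 1.9459), (50, 1.9314), (39, 1.9281)):
        if n >= limit:
            return tau
    return TAU_TABLE[n - 3]


def remove_outliers(values, two_sided=True):
    """Return a list of flags, True where values[i] is rejected as an outlier.

    two_sided=True tests |value - mean| (option 1 in the scripts);
    two_sided=False tests value - mean, so only high outliers go (option 0).
    """
    rejected = [False] * len(values)
    while True:
        kept = [i for i, r in enumerate(rejected) if not r]
        n = len(kept)
        if n < 3:                       # no tau below n = 3
            return rejected
        mean = sum(values[i] for i in kept) / n
        # Sample standard deviation (divisor n - 1).
        sd = math.sqrt(sum((values[i] - mean) ** 2 for i in kept) / (n - 1))
        dev = {i: abs(values[i] - mean) if two_sided else values[i] - mean
               for i in kept}
        worst = max(dev, key=dev.get)   # the single worst candidate
        if dev[worst] <= mod_thompson_tau(n) * sd:
            return rejected             # nothing more can be removed
        rejected[worst] = True          # remove it, then recompute everything


data = [98.0, 99.6, 40.5, 92.7, 95.5, 93.5, 85.8, 91.2, 76.4, 150.5, 67.3]
flags = remove_outliers(data)
print([v for v, r in zip(data, flags) if not r])
# [98.0, 99.6, 92.7, 95.5, 93.5, 91.2]
```

Run on the values from the example further down, it keeps the same six values as stt_out1.lis. The strict comparison means that a set of identical values (standard deviation zero) is left alone.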
I implemented the algorithm in removeOutliers.bat. In practice a higher-level script is used, such as removeCsvOutliers.bat, which is used by alignArea.bat on the Simple alignment by matching areas page.
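These scripts use arrays. The Windows BAT language doesn't have arrays, so these are really environment variables named VALUES[0], VALUES[1] and so on. For large arrays, e.g. with 64,000 entries, Windows 8.1 suffers a severe performance problem. BAT arithmetic (set /A) is also integer-only, so the scripts call magick for each floating-point calculation, several times per value on every pass. These processes are more easily implemented in decent languages such as C.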
When used on dx, dy and r = sqrt(dx*dx+dy*dy), the translations of points from one image to another, this isn't the most effective method. It tests each variable separately, as if dx, dy and r were independent, but they are not: r is calculated from dx and dy.
The most suitable method might be to "break into" the internals of IM's "-distort perspective", which finds a least-squares solution when a distortion is over-specified. Perhaps the errors could be pulled out and used to identify outliers. Alternatively, the verbose output could be used to calculate where the transformation moves each point; since we know where we wanted each point to go, we would know the error of each.
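To illustrate the second idea without IM's internals, the Python sketch below fits an affine transformation by least squares (a simpler stand-in for the perspective distortion, chosen only because its fit is short to write out) and reports, for each point, the distance between where the fitted transformation puts it and where we wanted it. The function names and test points are hypothetical. Each point's error is then a single number, so it could be screened with the tau test instead of treating dx, dy and r separately.

```python
import math


def solve3(a, b):
    """Solve the 3x3 system a x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def affine_residuals(src, dst):
    """Fit u = a*x + b*y + c and v = d*x + e*y + f by least squares,
    then return how far each fitted point lands from where we wanted it."""
    ata = [[0.0] * 3 for _ in range(3)]   # A^T A, where rows of A are (x, y, 1)
    atu = [0.0] * 3                       # A^T u
    atv = [0.0] * 3                       # A^T v
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atu[i] += row[i] * u
            atv[i] += row[i] * v
    pu = solve3(ata, atu)
    pv = solve3(ata, atv)
    return [math.hypot(pu[0] * x + pu[1] * y + pu[2] - u,
                       pv[0] * x + pv[1] * y + pv[2] - v)
            for (x, y), (u, v) in zip(src, dst)]


# A 3x3 grid shifted by (2, 3), with the centre point badly matched.
src = [(x, y) for y in (0, 5, 10) for x in (0, 5, 10)]
dst = [(x + 2, y + 3) for x, y in src]
dst[4] = (dst[4][0] + 4, dst[4][1])
res = affine_residuals(src, dst)
print(max(range(len(res)), key=res.__getitem__))   # 4, the centre point
```

Because the bad point also pulls the fit towards itself, its residual understates its true error: here it is 32/9, about 3.56, rather than 4.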
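Example usage: suppose a file stt_data.txt contains these 11 values, one per line:
98.0 99.6 40.5 92.7 95.5 93.5 85.8 91.2 76.4 150.5 67.3
We can remove all outliers, both above and below the mean, by examining column 1 with option 1:
call removeCsvOutliers stt_data.txt stt_out1.lis 1 1
The result, stt_out1.lis, contains:
98.0 99.6 92.7 95.5 93.5 91.2
We can remove only the outliers that are above the mean by using option 0 instead:
call removeCsvOutliers stt_data.txt stt_out2.lis 1 0
The result, stt_out2.lis, contains:
98.0 99.6 40.5 92.7 95.5 93.5 85.8 91.2 76.4 67.3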
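The first (two-sided) example can be checked by hand. With n values remaining, mean x̄, sample standard deviation s and tau from ModThoTau.txt, the first and last passes are (figures rounded):

```latex
\begin{aligned}
n = 11:\quad & \bar{x} = 90.09,\quad s = 26.525,\quad \tau_{11}\,s = 1.8153 \times 26.525 = 48.15\\
 & |150.5 - 90.09| = 60.41 > 48.15 \;\Rightarrow\; \text{reject } 150.5\\
n = 6:\quad & \bar{x} = 95.08,\quad s = 3.236,\quad \tau_{6}\,s = 1.6563 \times 3.236 = 5.36\\
 & |99.6 - 95.08| = 4.52 \le 5.36 \;\Rightarrow\; \text{stop}
\end{aligned}
```

In between, 40.5, 67.3, 76.4 and 85.8 are rejected in that order. In the one-sided run, the second pass has n = 10, mean 84.05 and s = 18.32, so tau times s is 1.7984 × 18.32 = 32.95; the largest deviation above the mean, 99.6 - 84.05 = 15.55, is below that, so only 150.5 is removed.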
For convenience, .bat scripts are also available in a single zip file. See Zipped BAT files.
rem From CSV file %1, assumed to contain Score,X0,Y0,X1,Y1
rem writes %2 containing Score,X0,Y0,X1,Y1,dX,dY,r
rem where:
rem   dX = X1 - X0
rem   dY = Y1 - Y0
rem   r = sqrt (dX^2 + dY^2)
rem
rem Files have no headers.
@rem
@rem Updated:
@rem   3-October-2022 for IM v7.
@rem

@if "%2"=="" findstr /B "rem @rem" %~f0 & exit /B 1

@setlocal enabledelayedexpansion

@call echoOffSave

set INFILE=%1
set OUTFILE=%2

if not exist %INFILE% (
  echo %0: can't find %INFILE%
  exit /B 1
)

del %OUTFILE% 2>nul

for /F "tokens=1-5 delims=, " %%A in (%INFILE%) do (
  set /A dx=%%D-%%B
  set /A dy=%%E-%%C
  set /A dx2=!dx!*!dx!
  set /A dy2=!dy!*!dy!
  set /A r2=!dx2!+!dy2!
  for /F "usebackq" %%L in (`%IMG7%magick identify ^
    -precision 9 ^
    -format "r=%%[fx:sqrt(!r2!)]" ^
    xc:`) do set %%L
  echo %%A,%%B,%%C,%%D,%%E,!dx!,!dy!,!r! >>%OUTFILE%
)

@call echoRestore

@endlocal
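For comparison, the same calculation in Python (the function name is hypothetical). As in the script, the coordinates are assumed to be integers, since set /A only does integer arithmetic, and fields may be separated by commas and/or spaces, matching delims=", ".

```python
import math


def add_dx_dy_r(lines):
    """For header-less lines Score,X0,Y0,X1,Y1, append dX, dY and r.

    X and Y must be integers; only r is a floating-point value.
    """
    out = []
    for line in lines:
        fields = line.replace(",", " ").split()
        if len(fields) < 5:
            continue  # skip blank or short lines
        score, x0, y0, x1, y1 = fields[:5]
        dx = int(x1) - int(x0)
        dy = int(y1) - int(y0)
        out.append([score, x0, y0, x1, y1, dx, dy, math.hypot(dx, dy)])
    return out


print(add_dx_dy_r(["0.95,10,20,13,24"]))
# [['0.95', '10', '20', '13', '24', 3, 4, 5.0]]
```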
removeCsvOutliers.bat

rem From CSV file %1, writes CSV file %2 without outliers.
rem %3, %4, %5, %6 ...:
rem   %3, %5,.. are the columns to be examined for outliers.
rem   %4, %6,.. are 0 or 1 for removeOutliers.bat
rem
rem First column is numbered one.
@rem
@rem Also uses:
@rem   rcoAPPEND_NUM if 1, appends line number as extra column.
@rem   rcoDEBUG_FILE if set, echos debugging text to this file

@if "%4"=="" findstr /B "rem @rem" %~f0 & exit /B 1

@setlocal enabledelayedexpansion

@call echoOffSave

set INFILE=%1
set OUTFILE=%2

if not exist %INFILE% (
  echo %0: can't find %INFILE%
  exit /B 1
)

if "%rcoAPPEND_NUM%"=="" set rcoAPPEND_NUM=0

if "%rcoDEBUG_FILE%"=="" set rcoDEBUG_FILE=+

if not "%rcoDEBUG_FILE%"=="+" (
  set SaveroDEBUG_FILE=%roDEBUG_FILE%
  set roDEBUG_FILE=%rcoDEBUG_FILE%
)

set nVAL=0
for /F "tokens=%3 delims=, " %%V in (%INFILE%) do (
  set /A nVAL+=1
)

call zeroArray OUTLIERS %nVAL%

:loop

if not "%rcoDEBUG_FILE%"=="+" echo %~n0: Column=%3 IsAbs=%4 >>%rcoDEBUG_FILE%

rem Read the array
set nVAL=0
for /F "tokens=%3 delims=, " %%V in (%INFILE%) do (
  rem echo %%V
  set VALUES[!nVAL!]=%%V
  set /A nVAL+=1
)

call removeOutliers VALUES %nVAL% OUTLIERS %4

shift /3
shift /3

if not "%3"=="" goto loop

set /A nLAST=%nVAL%-1
rem for /L %%i in (0,1,%nLAST%) do echo !OUTLIERS[%%i]!

rem Write the non-outliers.

del %OUTFILE% 2>nul

set nVAL=0
for /F "tokens=*" %%L in (%INFILE%) do (
  set /A ISOUTLIER=OUTLIERS[!nVal!]
  echo ISOUTLIER=!ISOUTLIER!
  if not !ISOUTLIER!==1 (
    if %rcoAPPEND_NUM%==0 (
      echo %%L >>%OUTFILE%
    ) else (
      echo %%L, !nVAL! >>%OUTFILE%
    )
  )
  set /A nVAL+=1
)

if not "%rcoDEBUG_FILE%"=="+" (
  echo %~n0: nVAL=%nVAL% >>%rcoDEBUG_FILE%
  set roDEBUG_FILE=%SaveroDEBUG_FILE%
)

echo %0: nVAL=%nVAL%

@call echoRestore

@endlocal
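removeCsvOutliers.bat passes the same OUTLIERS array to each call of removeOutliers.bat, so a row rejected on one column is ignored when the next column is tested; the order of the columns can therefore matter. A self-contained Python sketch of that behaviour follows. The names are hypothetical, and the tau values are repeated from getModThoTau.bat so that the block runs on its own.

```python
import math

# Modified Thompson tau for n = 3..38, from ModThoTau.txt.
TAU_TABLE = [
    1.1511, 1.4250, 1.5712, 1.6563, 1.7110, 1.7491, 1.7770, 1.7984, 1.8153,
    1.8290, 1.8403, 1.8498, 1.8579, 1.8649, 1.8710, 1.8764, 1.8811, 1.8853,
    1.8891, 1.8926, 1.8957, 1.8985, 1.9011, 1.9035, 1.9057, 1.9078, 1.9096,
    1.9114, 1.9130, 1.9146, 1.9160, 1.9174, 1.9186, 1.9198, 1.9209, 1.9220,
]


def tau_for(n):
    """Same steps as getModThoTau.bat (callers ensure n >= 3)."""
    for limit, tau in ((5000, 1.9597), (1000, 1.9586), (500, 1.9572),
                       (100, 1.9459), (50, 1.9314), (39, 1.9281)):
        if n >= limit:
            return tau
    return TAU_TABLE[n - 3]


def mark_outliers(values, ignore, two_sided=True):
    """Set ignore[i] = True for each outlier; entries already True are skipped."""
    while True:
        kept = [i for i, skip in enumerate(ignore) if not skip]
        n = len(kept)
        if n < 3:
            return
        mean = sum(values[i] for i in kept) / n
        sd = math.sqrt(sum((values[i] - mean) ** 2 for i in kept) / (n - 1))
        dev = {i: abs(values[i] - mean) if two_sided else values[i] - mean
               for i in kept}
        worst = max(dev, key=dev.get)
        if dev[worst] <= tau_for(n) * sd:
            return
        ignore[worst] = True


def remove_csv_outliers(lines, specs, append_num=False):
    """Keep the lines that are not outliers in any of the examined columns.

    specs is a list of (column, two_sided) pairs; the first column is 1.
    Fields are split on commas and spaces, like delims=", " in the script.
    """
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    rows = [line.replace(",", " ").split() for line in lines]
    ignore = [False] * len(rows)   # shared by all columns, like OUTLIERS
    for column, two_sided in specs:
        mark_outliers([float(row[column - 1]) for row in rows], ignore, two_sided)
    return [f"{line}, {num}" if append_num else line
            for num, (line, skip) in enumerate(zip(lines, ignore)) if not skip]
```

For instance, remove_csv_outliers(open("stt_data.txt"), [(1, True)]) corresponds to the first example call above, and [(1, False)] to the second.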
removeOutliers.bat

rem From array %1 with %2 elements, removes outliers.
rem Array %3 is already 1 where element is to be ignored.
rem Sets %3 elements to 1 where element is outlier (so we can't use setlocal).
rem %4 is optional, 1=take abs (so remove high or low outliers) 0=no abs (so remove only high outliers). Default=1.
@rem See http://www.mne.psu.edu/me345/Lectures/Outliers.pdf
@rem
@rem Also uses:
@rem   roDEBUG_FILE if set, echos debugging text to this file
@rem
@rem Updated:
@rem   4-October-2022 for IM v7.
@rem

@if "%3"=="" findstr /B "rem @rem" %~f0 & exit /B 1

@rem @setlocal enabledelayedexpansion

@rem @call echoOffSave

if %2 LSS 3 (
  exit /B 0
)

set DO_ABS=%4
if "%DO_ABS%"=="" set DO_ABS=1

if %DO_ABS%==1 (
  set FUNC=abs
) else (
  set FUNC=
)

echo DO_ABS=%DO_ABS%

set ARR=%1

if "%roDEBUG_FILE%"=="" set roDEBUG_FILE=+

set /A nLAST=%2-1
set IGNORE=%3

:loop
call meanStdDev %1 %2 %3

set HIGH_DEV=0
set nHIGH=0
set nVALS=0

for /L %%i in (0,1,%nLAST%) do (
  if not "!%IGNORE%[%%i]!"=="1" (
    set VAL=!%ARR%[%%i]!

    for /F "usebackq" %%L in (`%IMG7%magick identify ^
      -precision 9 ^
      -format "DEVIATION=%%[fx:%FUNC%(!VAL!-%msdMEAN%)]" ^
      xc:`) do set %%L

    for /F "usebackq" %%L in (`%IMG7%magick identify ^
      -format "IS_HIGHER=%%[fx:!DEVIATION!>=!HIGH_DEV!?1:0]" ^
      xc:`) do set %%L

    if !IS_HIGHER!==1 (
      set HIGH_DEV=!DEVIATION!
      set nHIGH=%%i
    )
    set /A nVALS+=1
  )
)

echo HIGH_DEV=%HIGH_DEV% nHIGH=%nHIGH%

rem The tau table starts at n=3, so stop when fewer than 3 values remain.
if %nVALS% LSS 3 exit /B 0

call getModThoTau %nVALS%

rem Use a strict greater-than test, so identical values (zero standard deviation) are not rejected.
for /F "usebackq" %%L in (`%IMG7%magick identify ^
  -format "IS_OUTLIER=%%[fx:%HIGH_DEV%>%gmttTAU%*%msdSTD_DEV_SAMP%?1:0]" ^
  xc:`) do set %%L

rem echo IS_OUTLIER=%IS_OUTLIER%

if %IS_OUTLIER%==1 (
  set %IGNORE%[%nHIGH%]=1
  if not "%roDEBUG_FILE%"=="+" echo %~n0: removed %nHIGH% >>%roDEBUG_FILE%
  goto loop
)

@rem @call echoRestore

@rem @endlocal
meanStdDev.bat

rem From array %1 with %2 elements returns mean and standard deviation.
rem Array %3 is 1 where element is to be ignored.
@rem
@rem Updated:
@rem   4-October-2022 for IM v7.
@rem

@if "%2"=="" findstr /B "rem @rem" %~f0 & exit /B 1

@setlocal enabledelayedexpansion

@call echoOffSave

set ARR=%1
set /A nLAST=%2-1
set IGNORE=%3

set sigV=0
set sigV2=0
set nVAL=0

for /L %%i in (0,1,%nLAST%) do (
  rem echo %ARR%[%%i]=!%ARR%[%%i]!
  if not "!%IGNORE%[%%i]!"=="1" (
    set VAL=!%ARR%[%%i]!
    for /F "usebackq" %%L in (`%IMG7%magick identify ^
      -precision 9 ^
      -format "sigV=%%[fx:!sigV!+!VAL!]\nsigV2=%%[fx:!sigV2!+!VAL!*!VAL!]" ^
      xc:`) do set %%L
    set /A nVAL+=1
  )
)

echo nVAL=%nVAL% sigV=%sigV% sigV2=%sigV2%

for /F "usebackq" %%L in (`%IMG7%magick identify ^
  -precision 9 ^
  -format "mean=%%[fx:%sigV%/%nVAL%]" ^
  xc:`) do set %%L

for /F "usebackq" %%L in (`%IMG7%magick identify ^
  -precision 9 ^
  -format "sd=%%[fx:sqrt(%sigV2%/%nVAL%-%mean%*%mean%)]" ^
  xc:`) do set %%L

for /F "usebackq" %%L in (`%IMG7%magick identify ^
  -precision 9 ^
  -format "sdSamp=%%[fx:sqrt((%sigV2%-%mean%*%sigV%)/(%nVAL%-1))]" ^
  xc:`) do set %%L

echo mean=%mean% sd=%sd% sdSamp=%sdSamp%

@call echoRestore

@endlocal & set msdMEAN=%mean%& set msdSTD_DEV=%sd%& set msdSTD_DEV_SAMP=%sdSamp%
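meanStdDev.bat makes one pass, accumulating the sum and the sum of squares, then derives the population standard deviation (msdSTD_DEV) and the sample standard deviation (msdSTD_DEV_SAMP, the one removeOutliers.bat uses) from those sums; with -precision 9, each running sum is also rounded to 9 significant digits. The Python sketch below repeats that arithmetic and, for comparison, Welford's one-pass method, which avoids subtracting two large, nearly equal quantities. The function names are my own.

```python
import math
import statistics


def mean_std_dev(values):
    """Same sums as meanStdDev.bat: returns (mean, population sd, sample sd)."""
    n = len(values)
    sig_v = sum(values)
    sig_v2 = sum(v * v for v in values)
    mean = sig_v / n
    # max(0.0, ...) guards against a tiny negative value caused by rounding.
    sd = math.sqrt(max(0.0, sig_v2 / n - mean * mean))
    sd_samp = math.sqrt(max(0.0, (sig_v2 - mean * sig_v) / (n - 1)))
    return mean, sd, sd_samp


def welford(values):
    """Welford's one-pass update: (mean, sample sd) without forming sum(v*v)."""
    mean, m2 = 0.0, 0.0
    for k, v in enumerate(values, start=1):
        delta = v - mean
        mean += delta / k
        m2 += delta * (v - mean)
    return mean, math.sqrt(m2 / (len(values) - 1))


data = [98.0, 99.6, 40.5, 92.7, 95.5, 93.5, 85.8, 91.2, 76.4, 150.5, 67.3]
print(mean_std_dev(data))
print(statistics.mean(data), statistics.pstdev(data), statistics.stdev(data))
print(welford(data))
```

The difference matters when the values are large compared with their spread. For 1e9 + 4, 1e9 + 7, 1e9 + 13 and 1e9 + 16, the squares are around 1e18, where a double cannot resolve differences smaller than about a hundred, so the sums-based result is unreliable, while Welford's method still gives the true sample standard deviation, sqrt(30), about 5.48.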
zeroArray.bat

@for /L %%i in (0,1,%2) do @set /A %1[%%i]=0
getModThoTau.bat

rem Returns Modified Thompson Tau for n=%1.
@rem See http://www.mne.psu.edu/me345/Lectures/Outliers.pdf

@if "%1"=="" findstr /B "rem @rem" %~f0 & exit /B 1

@call echoOffSave

if %1 LEQ 2 (
  echo %0: bad %1
  exit /B 1
)

if %1 GEQ 5000 (
  set gmttTAU=1.9597
) else if %1 GEQ 1000 (
  set gmttTAU=1.9586
) else if %1 GEQ 500 (
  set gmttTAU=1.9572
) else if %1 GEQ 100 (
  set gmttTAU=1.9459
) else if %1 GEQ 50 (
  set gmttTAU=1.9314
) else if %1 GEQ 39 (
  set gmttTAU=1.9281
) else (
  call getLineN %Util%\ModThoTau.txt %1
  set gmttTAU=!glnLINE!
)

echo gmttTAU=%gmttTAU%

@call echoRestore
ModThoTau.txt

# Values of Modified Thompson Tau for n, starting at n=3, up to n=38.
# Data file for getModThoTau.bat. See http://www.mne.psu.edu/me345/Lectures/Outliers.pdf
# This file must have 3 lines before the first value.
1.1511
1.4250
1.5712
1.6563
1.7110
1.7491
1.7770
1.7984
1.8153
1.8290
1.8403
1.8498
1.8579
1.8649
1.8710
1.8764
1.8811
1.8853
1.8891
1.8926
1.8957
1.8985
1.9011
1.9035
1.9057
1.9078
1.9096
1.9114
1.9130
1.9146
1.9160
1.9174
1.9186
1.9198
1.9209
1.9220
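The tabulated values can be reproduced from the usual closed form for the modified Thompson tau, tau = t * (n-1) / (sqrt(n) * sqrt(n-2+t*t)), where t is the two-sided Student's t critical value for a significance level of 0.05 with n-2 degrees of freedom. The Python sketch below computes t without external libraries, using the standard continued-fraction evaluation of the regularized incomplete beta function and bisection; the function names are my own.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        step = d * c
        h *= step
        if abs(step - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_critical(alpha, df):
    """Two-sided critical value t with P(|T| > t) = alpha, by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        # For Student's t, P(|T| > t) = I_x(df/2, 1/2) with x = df / (df + t^2).
        if _betai(0.5 * df, 0.5, df / (df + mid * mid)) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mod_thompson_tau(n, alpha=0.05):
    """Modified Thompson tau for n values (n >= 3)."""
    t = t_critical(alpha, n - 2)
    return t * (n - 1) / (math.sqrt(n) * math.sqrt(n - 2 + t * t))


for n in (3, 10, 38, 39, 45, 100):
    print(n, round(mod_thompson_tau(n), 4))
```

It gives about 1.923 for n = 39, whereas getModThoTau.bat uses 1.9281 (the value for n = 45) for every n from 39 to 49, so the script's steps are a slight approximation in that range.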
getLineN.bat

@rem From text file %1,
@rem starting at line number %2 [default 0],
@rem echo %3 lines to stdout [default 1].
@rem Also returns last found line in glnLINE.
@rem First line is number zero.
@rem Call with %3 = -1 for all remaining lines.

@setlocal

@set nSkip=%2
@if "%nSkip%"=="" set nSkip=0

@set ToDo=%3
@if "%ToDo%"=="" set ToDo=1

@set sSKIP=
@if not %nSkip%==0 set sSKIP=skip=%2

@for /F "%sSKIP% tokens=*" %%L in (%1) do @(
  @if not !ToDo!==0 (
    @echo %%L
    @set LINE=%%L
    @set /A ToDo-=1
  )
)

@endlocal & set glnLINE=%LINE%
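getLineN.bat amounts to "skip N lines, then take M". In Python that is itertools.islice; here is a sketch with a hypothetical function name, taking any iterable of lines (such as an open file). Like tokens=* in the script, it drops leading whitespace.

```python
from itertools import islice


def get_lines(lines, start=0, count=1):
    """Return (found, last): `count` lines from zero-based line `start`.

    count=-1 means all remaining lines; `last` plays the role of glnLINE.
    """
    stop = None if count == -1 else start + count
    found = [line.rstrip("\n").lstrip() for line in islice(lines, start, stop)]
    return found, (found[-1] if found else None)


# Like getModThoTau.bat looking up n=3 in a file with three header lines.
print(get_lines(["# a", "# b", "# c", "1.1511", "1.4250"], 3))
# (['1.1511'], '1.1511')
```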
echoOffSave.bat

@echo>%TEMP%\echo_%1.txt
@for /f "tokens=3 delims=. " %%E in (%TEMP%\echo_%1.txt) do @set ECHO_SAVE=%%E
@echo off
echoRestore.bat

@echo %ECHO_SAVE%
All images on this page were created by the commands shown, using:
%IMG7%magick -version
Version: ImageMagick 7.1.0-49 Q16-HDRI x64 7a3f3f1:20220924 https://imagemagick.org
Copyright: (C) 1999 ImageMagick Studio LLC
License: https://imagemagick.org/script/license.php
Features: Cipher DPC HDRI OpenCL
Delegates (built-in): bzlib cairo freetype gslib heic jng jp2 jpeg jxl lcms lqr lzma openexr pangocairo png ps raqm raw rsvg tiff webp xml zip zlib
Compiler: Visual Studio 2022 (193331630)
Source file for this web page is stats.h1. To re-create this web page, run "procH1 stats".
This page, including the images, is my copyright. Anyone is permitted to use or adapt any of the code, scripts or images for any purpose, including commercial use.
Anyone is permitted to re-publish this page, but only for non-commercial use.
Anyone is permitted to link to this page, including for commercial use.
Page version v1.0 4-July-2014.
Page created 04-Oct-2022 12:51:41.
Copyright © 2022 Alan Gibson.