snibgo's ImageMagick pages

Sound and vision

Sound is an image with one dimension.

ImageMagick is a powerful tool for processing arrays of two-dimensional signals, commonly known as "images". By ignoring the second dimension, we can use it to process sound.


ImageMagick can't read or write audio files directly, but can read and write headerless binary files. We will use SoX to convert between audio and binary files.

Download SoX and install it to any directory. Put the name of that directory, with a trailing backslash, into the environment variable SOXDIR.

Download an example sound file.

if not exist 11k16bitpcm.wav wget

Listen to the audio. It is a male voice saying, "Thank you for installing..."

Convert audio to image

List information about the audio file:

"%SOXDIR%sox" --info 11k16bitpcm.wav 
 Input File     : '11k16bitpcm.wav'
Channels       : 1
Sample Rate    : 11025
Precision      : 16-bit
Duration       : 00:00:13.81 = 152267 samples ~ 1035.83 CDDA sectors
File Size      : 305k
Bit Rate       : 176k
Sample Encoding: 16-bit Signed Integer PCM

This sound file has one channel, with 152267 samples. We will transform it into an image of size 152267x1 pixels. If we are using IM Q16, which stores pixel values as unsigned 16-bit integers, we want the samples as unsigned 16-bit integers.
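The move from signed samples to unsigned pixel values can be sketched in Python. This is only an illustration of the arithmetic, not what SoX does internally: it assumes the usual offset-binary mapping (add 32768), and the function name is made up for this example.

```python
# Illustrative sketch: map signed 16-bit PCM samples to unsigned 16-bit values.
# Assumption: the conversion is a plain offset of 32768 (offset binary).

def signed_to_unsigned16(samples):
    """Map signed samples (-32768..32767) to unsigned values (0..65535)."""
    return [s + 32768 for s in samples]

signed = [-32768, -1, 0, 1, 32767]
unsigned = signed_to_unsigned16(signed)
print(unsigned)  # [0, 32767, 32768, 32769, 65535]

# Silence (signed 0) becomes 32768, which is very close to 50% of 65535.
silence_fraction = unsigned[2] / 65535
```

Under this mapping, silence lands at mid-gray, the most negative sample at black and the most positive sample at white.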

Dump the samples as unsigned 16-bit integers into a headerless binary file. SoX infers the output format from the .u16 extension.

"%SOXDIR%sox" 11k16bitpcm.wav sv_test.u16

Convert the binary file to an image file. Because the binary file is headerless, we need to tell IM the depth and dimensions of the image, and how many channels it has.

%IM%convert -depth 16 -size 152267x1 gray:sv_test.u16 sv_test.png

We can view a portion of the image. 600 samples represent 600/11025 = 0.054 seconds. The crop starts at sample 11025, which is one second into the audio.

%IM%convert ^
  sv_test.png ^
  -crop 600x1+11025+0 +repage ^
  sv_1sec.png

More usefully, we can display this as a waveform.

call %PICTBAT%graphLineCol ^
  sv_1sec.png . . 0 sv_1sec_glc.png

Audio/image processing

As an example of simple audio processing, we will add an echo half a second behind the input.

At a sample rate of 11025 Hz, half a second is 5512.5 samples, which we round down to 5512 samples (or pixels). We lengthen the audio by this amount, by splicing mid-gray pixels to the east side.
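The conversions between time and sample counts used on this page can be checked with a few lines of Python. The helper names are invented for this sketch, and truncating to a whole sample is an assumption about how the delay was rounded.

```python
# Illustrative helpers for converting between seconds and samples (pixels).
SAMPLE_RATE = 11025  # Hz, as reported by "sox --info"

def seconds_to_samples(seconds, rate=SAMPLE_RATE):
    """Whole number of samples in a duration, truncating any fraction."""
    return int(seconds * rate)

def samples_to_seconds(count, rate=SAMPLE_RATE):
    """Duration in seconds of a number of samples."""
    return count / rate

delay = seconds_to_samples(0.5)
print(delay)  # 5512

# Width of the image after splicing the extra half second onto the east side.
lengthened = 152267 + delay
print(lengthened)  # 157779
```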

The echo is quieter than its input. We use "-compose Mathematics" to blend them.

%IM%convert ^
  sv_test.png ^
  -background gray(50%%) -gravity East -splice 5512x0+0+0 ^
  ( +clone ) ^
  -geometry -5512+0 ^
  -compose Mathematics -define "compose:args=0,0.3,0.7,0" ^
  -composite ^
  sv_echoed.png
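
"-compose Mathematics" with arguments A,B,C,D computes A*Sc*Dc + B*Sc + C*Dc + D, where Sc is the source (here the clone) and Dc is the destination. With 0,0.3,0.7,0 this is 0.3*Sc + 0.7*Dc. A minimal Python sketch of the same blend follows, on a 0..1 scale where 0.5 is silence. It assumes the clone ends up delayed by the given number of samples, and that samples not covered by the delayed clone are left unchanged. The function name is invented for this sketch.

```python
# Illustrative sketch of the echo: pad with silence, then blend a delayed copy.
MID_GRAY = 0.5  # silence on a 0..1 scale

def add_echo(samples, delay, src_weight=0.3, dst_weight=0.7):
    """Return samples lengthened by `delay`, with a quieter delayed copy mixed in."""
    # Lengthen the audio with silence (the -splice step).
    padded = list(samples) + [MID_GRAY] * delay
    out = list(padded)
    # Blend the delayed copy where it overlaps (assumption: elsewhere unchanged).
    for i in range(delay, len(padded)):
        out[i] = src_weight * padded[i - delay] + dst_weight * padded[i]
    return out

echoed = add_echo([0.5, 1.0, 0.5, 0.0], delay=2)
```

Because the two weights sum to one, silence blended with silence stays at mid-gray.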

What do those 600 samples look like now?

%IM%convert ^
  sv_echoed.png ^
  -crop 600x1+11025+0 +repage ^
  sv_1sec_e.png

call %PICTBAT%graphLineCol ^
  sv_1sec_e.png . . 0 sv_1sec_e_glc.png

Convert image to audio

Convert the image back to a sound file.

%IM%convert ^
  sv_echoed.png ^
  -depth 16 gray:sv_echoed.u16

"%SOXDIR%sox" --rate 11025 sv_echoed.u16 sv_echoed.wav

Listening to the audio, we can hear the echo.


Audio-as-visual is a large topic on which many pages could be written. For example, how do we visually summarise a long audio clip? Here is one possible method:

%IM%convert ^
  sv_test.png ^
  -solarize 50%% -evaluate Multiply 2 ^
  -resize "500x1^!" ^
  -negate ^
  -evaluate Multiply 2 ^
  -evaluate Add 50%% ^
  -crop 1x1 ^
  -background None ^
  -gravity East -splice 1x0+0+0 ^
  +append +repage ^
  ( +clone -negate ) ^
  -geometry +1+0 ^
  -compose Over -composite ^
  sv_all.png

call %PICTBAT%graphLineCol ^
  sv_all.png . . 0 sv_all_glc.png
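
One reading of the summary command, sketched in Python: the solarize, multiply and negate steps measure each sample's distance from mid-gray, the resize averages that distance over each of the 500 columns, and the final steps interleave an upper envelope with its mirror image. This is a rough approximation under stated assumptions. In particular it uses a plain average per bucket, which is not the filter "-resize" actually applies, and the function name is invented.

```python
# Illustrative sketch: summarise audio as interleaved upper/lower envelope values.
# Samples are on a 0..1 scale, where 0.5 is silence.

def summarise(samples, buckets):
    """Return 2*buckets values: upper then lower envelope for each bucket."""
    n = len(samples)
    out = []
    for b in range(buckets):
        start = b * n // buckets
        end = max(start + 1, (b + 1) * n // buckets)
        chunk = samples[start:end]
        # Mean distance from mid-gray, scaled so a full-scale swing is 1.0.
        level = sum(2 * abs(v - 0.5) for v in chunk) / len(chunk)
        # "Multiply 2" then "Add 50%", each clipped at white (1.0).
        upper = min(1.0, 0.5 + min(1.0, 2 * level))
        lower = 1.0 - upper  # the negated clone
        out.extend([upper, lower])
    return out

summary = summarise([0.5, 0.5, 0.5, 0.5, 0.6, 0.4, 0.6, 0.4], buckets=2)
```

Here the silent first half gives an upper and lower value of 0.5 each, and the louder second half gives values near 0.9 and 0.1, so the graphed result widens where the audio is louder.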

All images (and sounds) on this page were created by the commands shown, using:

%IM%identify -version
Version: ImageMagick 6.9.5-3 Q16 x86 2016-07-22
Copyright: Copyright (C) 1999-2015 ImageMagick Studio LLC
Visual C++: 180040629
Features: Cipher DPC Modules OpenMP 
Delegates (built-in): bzlib cairo flif freetype jng jp2 jpeg lcms lqr openexr pangocairo png ps rsvg tiff webp xml zlib
"%SOXDIR%sox" --version
c:\program files (x86)\sox-14-4-2\sox:      SoX v14.4.2

Source file for this web page is sndvis.h1. To re-create this web page, run "procH1 sndvis".

This page, including the images, is my copyright. Anyone is permitted to use or adapt any of the code, scripts or images for any purpose, including commercial use.

Anyone is permitted to re-publish this page, but only for non-commercial use.

Anyone is permitted to link to this page, including for commercial use.

Page version v1.0 18-September-2016.

Page created 18-Sep-2016 17:16:48.

Copyright © 2016 Alan Gibson.