Sound is an image with one dimension.
ImageMagick is a powerful tool for processing arrays of two-dimensional signals, commonly known as "images". By ignoring the second dimension, we can use it to process sound.
ImageMagick can't read or write audio files directly, but it can read and write headerless binary files. We will use SoX to convert between audio files and headerless binary files.
Download SoX from http://sox.sourceforge.net/. Install it to any directory, and put the name of that directory into environment variable SOXDIR. The commands on this page are written as "%SOXDIR%sox", so the value should end with a backslash.
Download an example sound file.
if not exist 11k16bitpcm.wav wget http://www.nch.com.au/acm/11k16bitpcm.wav
Listen to the audio. It is a male voice saying, "Thank you for installing..."
List information about the audio file:
"%SOXDIR%sox" --info 11k16bitpcm.wav
Input File     : '11k16bitpcm.wav'
Channels       : 1
Sample Rate    : 11025
Precision      : 16-bit
Duration       : 00:00:13.81 = 152267 samples ~ 1035.83 CDDA sectors
File Size      : 305k
Bit Rate       : 176k
Sample Encoding: 16-bit Signed Integer PCM
This sound file has one channel, with 152267 samples. We will transform it into an image of size 152267x1 pixels. The file stores signed 16-bit integers but, if we are using IM Q16, we want unsigned 16-bit integers.
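The reason for wanting unsigned samples can be illustrated with a few lines of Python. This is not part of the original workflow, and the function name is hypothetical. It is a sketch of the arithmetic, assuming the usual mapping in which a signed sample is shifted up by 32768, so that silence (signed 0) lands at mid-gray.

```python
def signed_to_unsigned16(sample):
    """Map a signed 16-bit sample (-32768..32767) to unsigned (0..65535).

    Assumption: the conversion is a plain offset of 32768.
    """
    if not -32768 <= sample <= 32767:
        raise ValueError("sample out of signed 16-bit range")
    return sample + 32768


# Show where the extremes and silence end up as gray levels.
for s in (-32768, 0, 32767):
    u = signed_to_unsigned16(s)
    print(s, "->", u, "(%.1f%% gray)" % (100.0 * u / 65535))
```

Silence therefore appears as 50% gray in the image, which is why mid-gray is the natural padding colour for sound images.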
Dump the samples as unsigned 16-bit integers into a headerless binary file. SoX takes the output encoding from the .u16 extension:
"%SOXDIR%sox" 11k16bitpcm.wav sv_test.u16
Convert the binary file to an image file. Because the file has no header, we need to tell IM the dimensions of the image (-size), the bit depth of each sample (-depth), and that there is a single channel (the "gray:" prefix).
%IM%convert -depth 16 -size 152267x1 gray:sv_test.u16 sv_test.png
We can view a portion of the image. The crop below takes 600 samples, starting 11025 samples (one second) into the sound. 600 samples represent 600/11025 = 0.054 seconds.
%IM%convert ^
  sv_test.png ^
  -crop 600x1+11025+0 +repage ^
  sv_1sec.png
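The relationship between samples, seconds and the crop geometry can be sketched in Python. The helper names here are hypothetical, and the sample rate is the one reported by SoX for this file.

```python
RATE = 11025  # sample rate in Hz, from the SoX --info listing


def samples_to_seconds(n_samples, rate=RATE):
    """Duration represented by n_samples pixels of a sound image."""
    return n_samples / rate


def crop_geometry(start_seconds, n_samples, rate=RATE):
    """Build a WxH+X+Y crop string for a one-pixel-high sound image."""
    offset = int(round(start_seconds * rate))
    return "%dx1+%d+0" % (n_samples, offset)


print(round(samples_to_seconds(600), 3))     # length of the cropped portion
print(round(samples_to_seconds(152267), 2))  # length of the whole file
print(crop_geometry(1.0, 600))               # crop starting one second in
```

The last line reproduces the geometry string used in the crop command above.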
More usefully, we can display this as a waveform.
call %PICTBAT%graphLineCol ^
  sv_1sec.png . . 0 sv_1sec_glc.png
As an example of simple audio processing, we will add an echo half a second behind the input.
At a sample rate of 11025 Hz, half a second is 11025/2 = 5512.5 samples, which we round down to 5512 samples (or pixels). We lengthen the audio by this amount, by splicing mid-gray pixels, which represent silence, to the east side.
The echo is quieter than its input. We use "-compose Mathematics" to blend them, giving the shifted copy a weight of 0.3 and the original a weight of 0.7. The weights sum to one, so mid-gray (silence) stays mid-gray.
%IM%convert ^
  sv_test.png ^
  -background gray(50%%) -gravity East -splice 5512x0+0+0 ^
  ( +clone ) ^
  -geometry -5512+0 ^
  -compose Mathematics -define "compose:args=0,0.3,0.7,0" ^
  -composite ^
  sv_echoed.png
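The arithmetic of the echo can be sketched in Python on a plain list of samples scaled to 0..1. This is an illustration of the idea rather than a transcription of what IM does at the image edges: the function name is hypothetical, and samples outside the input are assumed to be mid-gray silence.

```python
MID_GRAY = 0.5  # silence, when samples are scaled to 0..1


def add_echo(samples, delay, echo_weight=0.3, direct_weight=0.7):
    """Blend a signal with a delayed copy of itself.

    The result is len(samples) + delay long. Positions with no input
    sample are treated as mid-gray (an assumption of this sketch).
    """
    n = len(samples)
    out = []
    for i in range(n + delay):
        direct = samples[i] if i < n else MID_GRAY
        delayed = samples[i - delay] if i >= delay else MID_GRAY
        out.append(direct_weight * direct + echo_weight * delayed)
    return out


# Half a second at 11025 Hz, rounded down to a whole number of samples.
print(11025 // 2)

# A single click, echoed two samples later at reduced strength.
click = [0.5, 1.0, 0.5, 0.5]
print(add_echo(click, 2))
```

In the toy example the click appears once at its original position and again, smaller, two samples later, while silent samples remain at mid-gray.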
What do those 600 samples look like now?
%IM%convert ^
  sv_echoed.png ^
  -crop 600x1+11025+0 +repage ^
  sv_1sec_e.png

call %PICTBAT%graphLineCol ^
  sv_1sec_e.png . . 0 sv_1sec_e_glc.png
Convert the image back to a sound file.
%IM%convert ^
  sv_echoed.png ^
  -depth 16 gray:sv_echoed.u16

"%SOXDIR%sox" --rate 11025 sv_echoed.u16 sv_echoed.wav
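For readers curious about what this final conversion involves, it can be approximated with Python's standard library. This is a sketch under stated assumptions, not a replacement for SoX: the raw file is assumed to hold little-endian unsigned 16-bit samples for a single channel, and the function name is hypothetical.

```python
import array
import sys
import wave


def u16_raw_to_wav(raw_path, wav_path, rate=11025):
    """Convert headerless unsigned 16-bit mono samples to a 16-bit WAV.

    Assumes the raw file is little-endian. Returns the sample count.
    """
    unsigned = array.array("H")  # unsigned 16-bit, native byte order
    with open(raw_path, "rb") as f:
        unsigned.frombytes(f.read())
    if sys.byteorder == "big":
        unsigned.byteswap()  # file bytes are assumed little-endian

    # 16-bit WAV data is signed, so undo the 32768 offset.
    signed = array.array("h", (u - 32768 for u in unsigned))
    if sys.byteorder == "big":
        signed.byteswap()  # WAV sample data is little-endian

    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # two bytes per sample
        w.setframerate(rate)
        w.writeframes(signed.tobytes())
    return len(signed)
```

Calling u16_raw_to_wav("sv_echoed.u16", "sv_echoed.wav") would write a WAV file at 11025 Hz, matching the rate given to SoX above.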
Listening to the audio, we can hear the echo.
Audio-as-visual is a large topic on which many pages could be written. For example, how do we visually summarise a long audio recording? Here is a possible method:
%IM%convert ^
  sv_test.png ^
  -solarize 50%% -evaluate Multiply 2 ^
  -resize "500x1^!" ^
  -negate ^
  -evaluate Multiply 2 ^
  -evaluate Add 50%% ^
  -crop 1x1 ^
  -background None ^
  -gravity East -splice 1x0+0+0 ^
  +append +repage ^
  ( +clone -negate ) ^
  -geometry +1+0 ^
  -compose Over -composite ^
  sv_all.png

call %PICTBAT%graphLineCol ^
  sv_all.png . . 0 sv_all_glc.png
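One reading of the first half of this command is that solarizing at 50% and doubling folds each sample about mid-gray, the resize averages neighbouring samples down to 500 columns, and the negate turns the result into a loudness value per column. The Python sketch below mimics that idea with a plain block average. It is an approximation, since IM's resize filter is not a simple mean, and the function name is hypothetical.

```python
def loudness_summary(samples, n_columns):
    """Mean distance from mid-gray per column, scaled to 0..1.

    samples: values in 0..1, where 0.5 is silence.
    A plain block mean stands in for IM's resize (an assumption).
    """
    if n_columns <= 0 or not samples:
        raise ValueError("need at least one sample and one column")
    n = len(samples)
    summary = []
    for c in range(n_columns):
        start = c * n // n_columns
        end = max(start + 1, (c + 1) * n // n_columns)
        block = samples[start:end]
        # |2v - 1| is 0 for silence and 1 for a full-scale sample.
        summary.append(sum(abs(2.0 * v - 1.0) for v in block) / len(block))
    return summary


# Four silent samples followed by four full-scale samples.
quiet_then_loud = [0.5, 0.5, 0.5, 0.5, 0.0, 1.0, 0.0, 1.0]
print(loudness_summary(quiet_then_loud, 2))
```

Each column of the summary then describes how loud the sound is over that stretch of time, regardless of the sign of the individual samples.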
All images (and sounds) on this page were created by the commands shown, using:
Version: ImageMagick 6.9.5-3 Q16 x86 2016-07-22 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2015 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Visual C++: 180040629
Features: Cipher DPC Modules OpenMP
Delegates (built-in): bzlib cairo flif freetype jng jp2 jpeg lcms lqr openexr pangocairo png ps rsvg tiff webp xml zlib
c:\program files (x86)\sox-14-4-2\sox: SoX v14.4.2
Source file for this web page is sndvis.h1. To re-create this web page, run "procH1 sndvis".
This page, including the images, is my copyright. Anyone is permitted to use or adapt any of the code, scripts or images for any purpose, including commercial use.
Anyone is permitted to re-publish this page, but only for non-commercial use.
Anyone is permitted to link to this page, including for commercial use.
Page version v1.0 18-September-2016.
Page created 18-Sep-2016 17:16:48.
Copyright © 2016 Alan Gibson.