snibgo's ImageMagick pages

UTF-8 characters

ImageMagick can rasterise UTF-8 characters outside the ASCII range 0-127.

Procedures described here were tested under Windows 8.1. The principles may be correct for other versions of Windows.

See also my Pango page.

Preparation

You may need to tell your command window that you will be working with UTF-8.

  1. In the command window, click the top-left window menu, Properties, Font, select "Lucida Console".
  2. Tell Windows to use the UTF-8 code page (65001) by typing the following command:
    chcp 65001

I will use "šŋĩβģő élève äëïöü" as a test string. It is meaningless but looks quite good. My web browser correctly displays these characters as:

u8_browse.png

If your browser shows something different, you may need to change its character encoding setting or its font.

You may be able to type these characters by holding Alt and typing a numeric code on the numeric keypad. See external page How to enter Unicode characters in Microsoft Windows.

Type the following command, or copy and paste it from this web page:

echo šŋĩβģő élève äëïöü>snibu8.txt

The file contains 33 bytes. Use dir snibu8.txt to verify this. There are 20 characters, including the final carriage-return and line-feed: 7 occupy 1 byte each and 13 occupy 2 bytes each, so 7 + 26 = 33 bytes.
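The byte count can be cross-checked outside the command window. The following is a minimal sketch in Python, which is my assumption and is not otherwise used on this page. It also assumes the test string is in precomposed (NFC) form, as it would be when pasted from this page.

```python
import unicodedata

# Test string from this page, plus the CR-LF that cmd's echo appends.
# NFC normalisation is an assumption: it guards against a pasted string
# that uses separate combining accents instead of precomposed characters.
text = unicodedata.normalize("NFC", "šŋĩβģő élève äëïöü") + "\r\n"

# UTF-8 byte length of each character.
sizes = [len(ch.encode("utf-8")) for ch in text]

print("characters:", len(text))
print("bytes:", sum(sizes))
print("1-byte characters:", sizes.count(1))
print("2-byte characters:", sizes.count(2))
```

This should report 20 characters and 33 bytes, with 7 one-byte and 13 two-byte characters, matching the figures above.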

Windows cmd echo always appends a carriage-return and line-feed. To avoid these:

set /p="šŋĩβģő élève äëïöü"<nul >snibu8noCr.txt
dir snibu8noCr.txt
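
If Python is available (an assumption; nothing else on this page needs it), a file with no trailing CR-LF and no BOM can also be written directly. This is a minimal sketch, and write_utf8_no_newline is a hypothetical helper name.

```python
import os
import tempfile
import unicodedata

# Test string from this page, normalised to precomposed (NFC) form.
TEXT = unicodedata.normalize("NFC", "šŋĩβģő élève äëïöü")


def write_utf8_no_newline(path, text):
    """Write text as UTF-8 with no BOM and no line ending; return the file size."""
    # Binary mode means Python performs no newline translation.
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))
    return os.path.getsize(path)


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    size = write_utf8_no_newline(os.path.join(tmp, "snibu8noCr.txt"), TEXT)

print("bytes:", size)
```

The file written this way should be 31 bytes: the 33 bytes of snibu8.txt less the CR-LF. To keep the file, call write_utf8_no_newline("snibu8noCr.txt", TEXT) instead of using the temporary directory.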

If you open snibu8.txt in Notepad, it will recognise that the file is encoded in UTF-8 and will show the correct characters. Sadly, Wordpad won't. If you run the command type snibu8.txt, the file should display correctly on the console.

If you edit the file in Notepad and save it, Notepad will insert three extra bytes at the start of the file: 0xEF, 0xBB and 0xBF, the "byte order mark" (BOM).

Edited:
@\web\im\s.txt

IM seems to ignore the BOM, or perhaps the font has no glyph. If you want to remove BOMs, see Removing BOM from a file below.
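
To check whether some data starts with a BOM without opening it in an editor, the first three bytes can be compared with the BOM. This is a minimal sketch in Python (an assumption; not otherwise used on this page), and has_utf8_bom is a hypothetical helper name. Python's "utf-8-sig" codec is used here only to make sample data that carries a BOM.

```python
import codecs


def has_utf8_bom(data):
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


plain = "élève".encode("utf-8")         # no BOM
with_bom = "élève".encode("utf-8-sig")  # same text, BOM prepended

print("BOM bytes:", codecs.BOM_UTF8.hex())
print("extra bytes:", len(with_bom) - len(plain))
print("plain has BOM:", has_utf8_bom(plain))
print("with_bom has BOM:", has_utf8_bom(with_bom))
```

For a real file, read it in binary mode, for example open("snibu8.txt", "rb").read(), and pass the result to has_utf8_bom.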

We can create text in an image either by directly including the text in the command, or indirectly using a text file. Your chosen font needs to include glyphs for the characters you want to draw. On my current computer, the default font contains all the glyphs in my test string.

Direct text from a command

Annotate

%IM%convert ^
  -size 400x200 xc:khaki -gravity Center ^
  -pointsize 30 ^
  -annotate 0 "šŋĩβģő élève äëïöü" ^
  u8_an.png
u8_an.png

Caption

%IM%convert ^
  -size 400x200 -gravity Center ^
  -background khaki ^
  caption:"šŋĩβģő élève äëïöü" ^
  u8_ca.png
u8_ca.png

Label

%IM%convert ^
  -size 400x200 -gravity Center ^
  -background khaki ^
  label:"šŋĩβģő élève äëïöü" ^
  u8_la.png
u8_la.png

Draw

%IM%convert ^
  -size 400x200 xc:khaki -gravity Center ^
  -pointsize 30 ^
  -draw "text 0,0 'šŋĩβģő élève äëïöü'" ^
  u8_dr.png
u8_dr.png

Pango

%IM%convert ^
  -size 400x200 -gravity Center ^
  -background khaki ^
  -pointsize 30 ^
  pango:"šŋĩβģő élève äëïöü" ^
  u8_pa.png
u8_pa.png

Indirect text from a file

The terminating CR-LF in the file affects the graphical output.

Annotate

%IM%convert ^
  -size 400x200 xc:khaki -gravity Center ^
  -pointsize 30 ^
  -annotate 0 @snibu8.txt ^
  u8_an_t.png
u8_an_t.png

Caption

%IM%convert ^
  -size 400x200 -gravity Center ^
  -background khaki ^
  caption:@snibu8.txt ^
  u8_ca_t.png
u8_ca_t.png

Label

%IM%convert ^
  -size 400x200 -gravity Center ^
  -background khaki ^
  label:@snibu8.txt ^
  u8_la_t.png
u8_la_t.png

Draw

echo text 0,0 'šŋĩβģő élève äëïöü'>u8_dr_t.txt

%IM%convert ^
  -size 400x200 xc:khaki -gravity Center ^
  -pointsize 30 ^
  -draw @u8_dr_t.txt ^
  u8_dr_t.png
u8_dr_t.png

Pango

%IM%convert ^
  -size 400x200 -gravity Center ^
  -background khaki ^
  -pointsize 30 ^
  pango:@snibu8.txt ^
  u8_pa_t.png
u8_pa_t.png

SVG

call %PICTBAT%setInkPath

%IM%convert ^
  -density 72 ^
  snutf8.svg ^
  u8_sv_t.png
u8_sv_t.png

The default font and size for pango are different from those of the other methods. In addition, "gravity center" doesn't centralise vertically.

snutf8.svg is:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

<svg
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:cc="http://creativecommons.org/ns#"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:svg="http://www.w3.org/2000/svg"
   xmlns="http://www.w3.org/2000/svg"
   width="400px"
   height="200px"
   id="svg2"
   version="1.1">
  <g
     id="layer1">
    <text
       xml:space="preserve"
       style="fill:#000000;fill-opacity:1;stroke:none"
       x="100"
       y="100"
       id="text2816"><tspan
         id="tspan2818"
         >šŋĩβģő élève äëïöü</tspan></text>
  </g>
</svg>
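
A file like this can also be generated rather than typed. The following is a minimal sketch in Python (an assumption; not otherwise used on this page) that writes a cut-down version of the SVG above as UTF-8 bytes. make_svg is a hypothetical helper name, and the reduced set of attributes is my simplification, not a requirement of ImageMagick.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

TEXT = "šŋĩβģő élève äëïöü"
SVG_NS = "http://www.w3.org/2000/svg"


def make_svg(text, width=400, height=200, x=100, y=100):
    """Return a minimal SVG document, as UTF-8 bytes, holding one line of text."""
    svg = (
        '<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n'
        f'<svg xmlns="{SVG_NS}" width="{width}px" height="{height}px" version="1.1">\n'
        f'  <text xml:space="preserve" style="fill:#000000;stroke:none"'
        f' x="{x}" y="{y}">{escape(text)}</text>\n'
        '</svg>\n'
    )
    # Encode explicitly so the bytes match the encoding named in the XML header.
    return svg.encode("utf-8")


data = make_svg(TEXT)

# Parse the result back as a well-formedness check.
root = ET.fromstring(data)
print("root element:", root.tag)
print("bytes:", len(data))
```

To use the result, write the bytes to a file in binary mode, for example open("snutf8.svg", "wb").write(data). The escape call protects any "<", ">" or "&" in the text.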

Arabic

Arabic is written from right to left. Joined-up writing uses different glyphs from separated writing, even when printed or on computer screens.

According to Cambridge Dictionaries Online, the Arabic for "image" is "صَوْرة", and for "magic" is "سِحْر". So we have "صَوْرة سِحْر". Re-writing that phrase, with characters spaced so each is standalone, like "i m a g e m a g i c", looks like this: "صَ وْ ر ة سِ حْ ر".
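
The spaced-out form can be produced mechanically by keeping each vowel mark with the letter it follows. This is a minimal sketch in Python (an assumption; not otherwise used on this page). The phrase is written with escapes, which I assume match the code points of the phrase above, and standalone is a hypothetical helper name. The sketch also lists each character's Unicode bidirectional class, which is the property a renderer needs in order to lay the text out right to left.

```python
import unicodedata

# The phrase above, written as escapes so the code points are unambiguous
# (assumed to match the text on this page).
PHRASE = "\u0635\u064e\u0648\u0652\u0631\u0629 \u0633\u0650\u062d\u0652\u0631"


def standalone(text):
    """Space out the letters, keeping each combining mark with its base letter."""
    groups = []
    for ch in text:
        if ch.isspace():
            continue  # drop the original word space
        if unicodedata.combining(ch) and groups:
            groups[-1] += ch  # vowel mark stays with the previous letter
        else:
            groups.append(ch)
    return " ".join(groups)


spaced = standalone(PHRASE)

# Print code points only, so the output is safe on any console code page.
for ch in PHRASE:
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), unicodedata.name(ch, "?"))
print("groups:", len(spaced.split(" ")))
```

The letters report class AL (Arabic letter) and the vowel marks report NSM (non-spacing mark). The variable spaced holds the seven standalone groups separated by spaces.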

With default rendering, each character is written separately, and in the wrong order (left to right):

Annotate

%IM%convert ^
  -size 400x200 xc:khaki -gravity Center ^
  -pointsize 30 ^
  -annotate 0 "صَوْرة سِحْر" ^
  u8_an_a.png
u8_an_a.png

Caption

%IM%convert ^
  -size 400x200 -gravity Center ^
  -background khaki ^
  caption:"صَوْرة سِحْر" ^
  u8_ca_a.png
u8_ca_a.png

Label

%IM%convert ^
  -size 400x200 -gravity Center ^
  -background khaki ^
  label:"صَوْرة سِحْر" ^
  u8_la_a.png
u8_la_a.png

Draw

%IM%convert ^
  -size 400x200 xc:khaki -gravity Center ^
  -pointsize 30 ^
  -draw "text 0,0 'صَوْرة سِحْر'" ^
  u8_dr_a.png
u8_dr_a.png

Pango

%IM%convert ^
  -size 400x200 -gravity Center ^
  -background khaki ^
  -pointsize 30 ^
  pango:"صَوْرة سِحْر" ^
  u8_pa_a.png

Pango's default font doesn't include Arabic glyphs.
A font should be specified.

u8_pa_a.png

The IM setting "-direction" should be useful. In a right-to-left world, gravity is confused, and the automatic pointsize for caption and label doesn't work.

Annotate

%IM%convert ^
  -size 400x200 xc:khaki -gravity Center ^
  -pointsize 30 -direction right-to-left ^
  -annotate 0 "صَوْرة سِحْر" ^
  u8_an_a2.png

The direction is incorrect.

u8_an_a2.png

Caption

%IM%convert ^
  -size 400x200 -gravity NorthWest ^
  -background khaki ^
  -pointsize 30 -direction right-to-left ^
  caption:"صَوْرة سِحْر" ^
  u8_ca_a2.png

The direction is correct but the characters are standalone.

u8_ca_a2.png

Label

%IM%convert ^
  -size 400x200 -gravity NorthWest ^
  -background khaki ^
  -pointsize 30 -direction right-to-left ^
  label:"صَوْرة سِحْر" ^
  u8_la_a2.png

The direction is correct but the characters are standalone.

u8_la_a2.png

Draw

%IM%convert ^
  -size 400x200 xc:khaki -gravity Center ^
  -pointsize 30 -direction right-to-left ^
  -draw "text 0,0 'صَوْرة سِحْر'" ^
  u8_dr_a2.png

The direction is incorrect.

u8_dr_a2.png

Pango

This needs a font to be specified.

%IM%convert ^
  -size 400x200 -gravity Center ^
  -background khaki ^
  -font Arial -pointsize 30 ^
  pango:"صَوْرة سِحْر" ^
  u8_pa_a2.png

The direction and glyphs are correct.

u8_pa_a2.png

Pango

Testing wordwrap.

%IM%convert ^
  -size 200x200 -gravity Center ^
  -background khaki ^
  -font Arial -pointsize 50 ^
  pango:"صَوْرة سِحْر" ^
  u8_pa_a2ww.png

This seems correct.

u8_pa_a2ww.png

Conclusion: for Arabic text, "pango:" is the obvious choice.

Some more examples of Pango, in English and Arabic:

Wordwrap.

%IM%convert ^
  -size 350x200 -gravity NorthEast ^
  -background khaki ^
  -font Arial -pointsize 30 ^
  pango:"wonderful powerful no-cost image magic" ^
  u8_pa_ex1.png
u8_pa_ex1.png

Wordwrap, with a forced new line.

%IM%convert ^
  -size 350x200 -gravity NorthEast ^
  -background khaki ^
  -font Arial -pointsize 30 ^
  pango:"wonderful powerful no-cost\nimage magic" ^
  u8_pa_ex2.png
u8_pa_ex2.png

Wordwrap.

%IM%convert ^
  -size 350x200 -gravity NorthWest ^
  -background khaki ^
  -font Arial -pointsize 30 ^
  pango:"رائع قَوي مَجّاني صَوْرة سِحْر" ^
  u8_pa_ex3.png
u8_pa_ex3.png

Wordwrap, with a forced new line.

%IM%convert ^
  -size 350x200 -gravity West ^
  -background khaki ^
  -font Arial -pointsize 30 ^
  pango:"رائع قَوي مَجّاني\nصَوْرة سِحْر" ^
  u8_pa_ex4.png
u8_pa_ex4.png

Removing BOM from a file

Here is a simple BAT script that strips any BOMs from a file. It needs a helper file, bom.txt, that contains only a BOM. To create it:

  1. Open a UTF-8 file in Notepad.
  2. Delete all the contents.
  3. Save the file as "bom.txt", checking that the encoding is UTF-8.
  4. Check with dir bom.txt that the file has three bytes.
  5. Move bom.txt to the same directory as the script deBom.bat.

The script will read a file, strip any BOMs, and write to stdout. Call it like this:

 call deBom infile.txt >outfile.txt

The script deBom.bat is:

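@rem Removes any Byte Order Marks (BOM) in a file.
@rem Usage: call deBom infile.txt >outfile.txt
@rem Note: for /F skips empty lines, so blank lines in the input are not copied.

@setlocal enabledelayedexpansion

@rem bom.txt must contain only a BOM and sit in the same directory as this script.
@set BOMFILE=%~dp0bom.txt

@if not exist %BOMFILE% @(
  @echo Can't find %BOMFILE%
  @exit /B 1
)

@rem Read the BOM from bom.txt into a variable.
@for /F %%a in (%BOMFILE%) do @set bom=%%a

@rem Echo each line of the input file with any BOM removed.
@for /F "tokens=*" %%a in (%1) do @(
  @set line=%%a
  @set line=!line:%bom%=!
  @echo !line!
)
@exit /B 0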
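
For comparison, the same job can be sketched in Python (an assumption; not otherwise used on this page). It works on raw bytes, so it needs no bom.txt, and line endings and blank lines pass through unchanged. strip_boms and debom_file are hypothetical names.

```python
import codecs


def strip_boms(data):
    """Remove every UTF-8 byte order mark (EF BB BF) from the given bytes."""
    # In valid UTF-8 this byte sequence can only be the character U+FEFF,
    # so no other character is damaged by the replacement.
    return data.replace(codecs.BOM_UTF8, b"")


def debom_file(src, dst):
    """Copy src to dst with all BOMs removed; return the number of bytes dropped."""
    with open(src, "rb") as f:
        data = f.read()
    cleaned = strip_boms(data)
    with open(dst, "wb") as f:
        f.write(cleaned)
    return len(data) - len(cleaned)


sample = codecs.BOM_UTF8 + "élève\r\n\r\n".encode("utf-8")
print("before:", len(sample), "bytes")
print("after:", len(strip_boms(sample)), "bytes")
```

Call it as debom_file("infile.txt", "outfile.txt"). The return value is three times the number of BOMs removed.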

All images on this page were created by the commands shown, using:

%IM%identify -version
Version: ImageMagick 6.9.5-3 Q16 x86 2016-07-22 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2015 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Visual C++: 180040629
Features: Cipher DPC Modules OpenMP 
Delegates (built-in): bzlib cairo flif freetype jng jp2 jpeg lcms lqr openexr pangocairo png ps rsvg tiff webp xml zlib

Source file for this web page is snutf8.h1, which is encoded in UTF-8. To re-create this web page, execute "procH1 snutf8".


This page, including the images, is my copyright. Anyone is permitted to use or adapt any of the code, scripts or images for any purpose, including commercial use.

Anyone is permitted to re-publish this page, but only for non-commercial use.

Anyone is permitted to link to this page, including for commercial use.


Page version v1.1 1-Dec-2014.

Page created 24-Sep-2016 18:54:44.

Copyright © 2016 Alan Gibson.