Composing PDFs Happily on Linux as a Non-Latin Coder
Introduction
Groff is an ancient but powerful typesetting system that produces formatted output from plain text mixed with formatting commands. With the Mom macro set, one can easily compose a PDF file from a plain-text file.
(Figures: input.mom and the resulting output.pdf)
Neat!
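For readers new to Mom, here is a minimal sketch of what an input.mom might look like and how it could be compiled. The title, author and body text are placeholders, and the command assumes a groff build that ships the PDF output device.

```shell
# Write a minimal Mom document (placeholder title, author and text).
cat > input.mom <<'EOF'
.TITLE "Hello, Mom"
.AUTHOR "Jane Doe"
.PRINTSTYLE TYPESET
.START
.PP
Plain text mixed with formatting commands.
EOF

# -mom loads the Mom macro set; -Tpdf selects groff's PDF output device.
groff -mom -Tpdf input.mom > output.pdf
```

Mom expects .PRINTSTYLE and .START before the body text, which is why they appear even in a tiny document.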
My Situation
My problem is that I need to write some Chinese documents. This seems to be a rare use case, because I couldn't find anyone discussing it through search engines. After some digging and a lot of trial and error, I finally figured out the ultimate solution.
(groff spit out gibberish for Chinese characters)
The Solution
File Encoding
groff reads input files in ASCII encoding by default, but most distros nowadays use UTF-8 as the default text file encoding, which is what lets us read and write Unicode text without any problem. Fortunately, we can change groff's behavior with the -Kutf8 option.
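A minimal sketch of the fix, assuming a UTF-8 input.mom with placeholder Chinese text. With only this flag, groff still complains about the Chinese glyphs, which is the next problem to solve.

```shell
# Write a UTF-8 Mom file with Chinese text (placeholder content).
cat > input.mom <<'EOF'
.PRINTSTYLE TYPESET
.START
.PP
你好，世界！
EOF

# -Kutf8 sets the input encoding for groff's preconv preprocessor
# (it implies -k, which runs preconv before troff).
groff -Kutf8 -mom -Tpdf input.mom > output.pdf
```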
Font
Then groff started complaining about missing glyphs.
This means groff can now read the input file correctly, but fails to find the glyphs in its font files. That seemed weird to me, because I definitely had some Chinese fonts installed on my system. It turned out that groff uses its own font library, which, judging by the file sizes, certainly doesn't contain any CJK glyphs.
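To see what groff's font library contains, one can list its PostScript font directory. The path below is the one this post uses elsewhere and may differ between distros and groff versions; AR is just an example file.

```shell
# Where groff's PostScript fonts live (path as used in this post).
font_dir=/usr/share/groff/current/font/devps

# List groff's own font description files with human-readable sizes.
ls -lh "$font_dir"

# Each font file names the PostScript font it describes on its
# "internalname" line; AR is used here as an example.
grep '^internalname' "$font_dir/AR"
```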
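Among groff's fonts, AR, AB, AI and ABI stand for the Avant Garde family in Regular, Bold, Italic and BoldItalic: groff names each font by a family prefix (A) followed by a style suffix (R, B, I or BI). In Mom, the family is selected with the .FAMILY macro and the style with .FT.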
For more information, check out this page on momdoc
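As a sketch of that usage, the hypothetical fonts.mom below cycles through the four Avant Garde styles. With .FAMILY A in effect, .FT B selects AB, .FT I selects AI, and so on. The .FAMILY line comes after .PP because Mom's paragraph macro resets the font to the paragraph defaults.

```shell
# Write a small Mom document that switches between the Avant Garde styles.
cat > fonts.mom <<'EOF'
.PRINTSTYLE TYPESET
.START
.PP
.FAMILY A
.FT R
Regular (AR),
.FT B
bold (AB),
.FT I
italic (AI)
.FT BI
and bold italic (ABI).
EOF

# .FAMILY sets the family prefix and .FT the style suffix.
groff -mom -Tpdf fonts.mom > fonts.pdf
```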
The font formats groff uses are also obsolete, so it's hard to find fonts in those formats. We therefore need FontForge (https://fontforge.org/) to convert TrueType/OpenType fonts into something groff can understand.
I decided to try adding Sarasa-Mono-SC (https://github.com/be5invis/Sarasa-Gothic) to groff's font library, so I downloaded the release and extracted the Sarasa Mono SC font files, such as sarasa-mono-sc-regular.ttf.
Conversion
In general, we have to convert one TTF into two files:
- A groff font file. For sarasa-mono-sc-regular.ttf, it should be something like SarasaMonoSCR; then we can refer to this font as SarasaMonoSC in our groff source files (groff appends the style suffix R, B, I or BI to the family name).
- A PS Type 1 or Type 42 font (ending with .pfa and .t42, respectively). I would suggest Type 42, which has a smaller file size and leads to faster compilation.
Step by step:
- Open our CJK font file sarasa-mono-sc-regular.ttf with FontForge.
- Go to File -> Generate Fonts, select PS Type 1 (Ascii), and make sure Output AFM is checked in the Options dialog.
- Click the Generate button; we get two new files, Sarasa-Mono-SC-Regular.pfa and Sarasa-Mono-SC-Regular.afm.
- Change the output type to Type 42 and generate a file named Sarasa-Mono-SC-Regular.t42.
- Generate the groff font file from the .afm file[1]:
afmtodit Sarasa-Mono-SC-Regular.afm "/usr/share/groff/current/font/devps/generate/textmap" SarasaMonoSCR
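If you prefer not to click through the GUI, the FontForge steps above can probably be scripted with FontForge's native scripting language. This is a sketch, not the author's method. The meaning of Generate's third argument (fmflags, where bit 1 is assumed to request an AFM file) should be checked against the FontForge scripting reference for your version.

```shell
# Batch version of the FontForge GUI steps, using FontForge's native
# scripting language: $1 is the font to open, $2 the file to generate, and
# the output format follows the extension (.pfa = Type 1 ASCII, .t42 = Type 42).
ttf=sarasa-mono-sc-regular.ttf
name=Sarasa-Mono-SC-Regular

# Type 1 (ASCII) font; "" means no bitmaps, and fmflags=1 is ASSUMED to
# also write the matching Sarasa-Mono-SC-Regular.afm metrics file.
fontforge -lang=ff -c 'Open($1); Generate($2, "", 1)' "$ttf" "$name.pfa"

# Type 42 version of the same font.
fontforge -lang=ff -c 'Open($1); Generate($2)' "$ttf" "$name.t42"
```

The afmtodit step then runs unchanged on the resulting .afm file.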
Installation
In general, we have to copy the generated files to a groff search folder and register them as downloadable fonts.
Step by step:
- Copy SarasaMonoSCR and Sarasa-Mono-SC-Regular.t42 to /usr/share/groff/site-font/devps
- Add the following line to /usr/share/groff/current/font/devps/download, replacing <TAB> with an actual Tab character[1][2]:
...
Sarasa-Mono-SC-Regular<TAB>Sarasa-Mono-SC-Regular.t42
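The installation steps can be wrapped in a small shell function. install_font, SITE_DIR and DOWNLOAD_FILE are hypothetical names. The defaults are the paths above, so running it for real needs root.

```shell
# Target locations used in this post; SITE_DIR and DOWNLOAD_FILE are
# hypothetical overrides so the function can be tried on scratch paths.
site_dir=${SITE_DIR:-/usr/share/groff/site-font/devps}
download=${DOWNLOAD_FILE:-/usr/share/groff/current/font/devps/download}

# Hypothetical helper: copy a converted font into groff's site font
# directory and register it in the download file.
#   $1: groff font file     (e.g. SarasaMonoSCR)
#   $2: PostScript name     (e.g. Sarasa-Mono-SC-Regular)
#   $3: font file to embed  (e.g. Sarasa-Mono-SC-Regular.t42)
install_font() {
    mkdir -p "$site_dir" && cp "$1" "$3" "$site_dir"/ || return 1
    # printf '\t' produces the real Tab character the download file expects.
    entry=$(printf '%s\t%s' "$2" "$3")
    # Append the entry only if it is not there yet, so reruns are harmless.
    grep -qxF "$entry" "$download" 2>/dev/null ||
        printf '%s\n' "$entry" >> "$download"
}
```

Usage: install_font SarasaMonoSCR Sarasa-Mono-SC-Regular Sarasa-Mono-SC-Regular.t42. Generating the separator with printf sidesteps the "replace <TAB>" pitfall entirely.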
With that, groff can now embed our font when outputting a PostScript file.
Now, to make it work when outputting a PDF file with the -Tpdf option, we would also need to register the font for devpdf, but we are not going to do that: with that approach, we would get a huge PDF with all the used fonts embedded. A better way is to output PostScript and use ps2pdf to create an optimized PDF file.
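A sketch of the PostScript-then-ps2pdf route, wrapped in a hypothetical mom2pdf function. By default, ps2pdf (part of Ghostscript) embeds only the subsets of fonts that are actually used, which is what keeps the PDF small.

```shell
# Hypothetical helper: Mom -> PostScript (grops embeds the fonts registered
# in the download file) -> PDF via Ghostscript's ps2pdf.
#   $1: Mom source, $2: PDF to write (the .ps is kept next to it)
mom2pdf() {
    ps=${2%.pdf}.ps
    groff -Kutf8 -mom -Tps "$1" > "$ps" && ps2pdf "$ps" "$2"
}
```

Usage: mom2pdf input.mom output.pdf leaves output.ps next to output.pdf.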
Other styles
If your font comes with all styles, like Sarasa does, you are in luck: convert and install them and you are good to go. Some fonts provide only a Regular version, in which case you need to generate the other styles yourself. FontForge's Change Weight and Italic/Oblique commands, under the Element -> Style menu, can help with that. Use Edit -> Select to highlight your characters, then apply the style commands you need.
Live preview
Live preview is helpful when composing complex layouts. First, we need an auto-compilation mechanism, so I added a script to my .vimrc that enters an auto-compile mode.
Then we can launch any PDF viewer we like for the previewing part. I would suggest something like zathura, which reloads automatically when the file changes. Otherwise, take a look at entr, which can help automate the reloading if your viewer of choice doesn't support auto-reloading.
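The vim script itself isn't reproduced here. As an alternative, here is a sketch built on entr that rebuilds the PDF whenever the source is saved. watch_mom is a hypothetical name, and it follows the PostScript plus ps2pdf route described earlier.

```shell
# Hypothetical helper: rebuild the PDF each time the Mom source is saved.
#   $1: Mom source to watch, $2: PDF to rebuild
watch_mom() {
    ps=${2%.pdf}.ps
    # entr reads the files to watch from stdin and reruns the command on change;
    # sh -c receives the paths as "$1" "$2" "$3", so spaces are handled safely.
    echo "$1" | entr sh -c 'groff -Kutf8 -mom -Tps "$1" > "$3" && ps2pdf "$3" "$2"' sh "$1" "$2" "$ps"
}
```

Run watch_mom input.mom output.pdf in one terminal and zathura output.pdf in another.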
Another problem I ran into was that auto-compilation took too long to complete. The reason was that Sarasa fonts are huge: each style can take up to 40 MB, so when multiple styles were used, the output .ps file could easily grow beyond 100 MB. So I finally opted for a custom font consisting of DroidSansFallback + URW Gothic Bookman + FontAwesome.
If you just want to get a taste of it, WenQuanYi Micro Hei would be a better choice. Most distros have this font in their official repositories, but its Korean part may be broken, which shows up as characters stacking on top of each other. Fortunately, Ubuntu fixed this problem, so on other distros we can download the Ubuntu deb package and extract the font file from it.
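A .deb is just an ar archive whose data.tar.* member holds the packaged files, so it can be unpacked without dpkg. extract_deb_fonts is a hypothetical helper. It assumes ar (from binutils) and tar are installed and that you fetched Ubuntu's fonts-wqy-microhei package manually.

```shell
# Hypothetical helper: extract font files from a .deb on a distro without dpkg.
#   $1: the downloaded .deb, $2: directory to extract into
extract_deb_fonts() (
    # The body runs in a subshell, so the cd below does not leak to the caller.
    deb=$(cd "$(dirname "$1")" && pwd)/$(basename "$1")
    mkdir -p "$2" && cd "$2" || exit 1
    ar x "$deb" || exit 1
    # GNU tar detects gzip/xz compression itself; zstd needs a recent tar.
    tar -xf data.tar.* || exit 1
    # Print the TrueType fonts/collections that were unpacked.
    find . -name '*.tt[cf]'
)
```

Usage: extract_deb_fonts fonts-wqy-microhei_*.deb wqy prints the paths of the unpacked .ttc/.ttf files under wqy/.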
Usage
Finally, I can compose PDF files happily.
(Figures: input.mom and the resulting output.pdf)
Take away
To avoid redundant labor, I created a couple of scripts to automate the process:
- unttc: extracts ttf files from a ttc file
- generate_fontstyle: generates Bold/Italic versions based on the Regular version
- groff_ttf.sh: adds a ttf to the groff font library
Author Klesh Wong
LastMod 2020-11-24