Composing PDF Happily on Linux as a Non-Latin Coder
Introduction
Groff is an ancient but powerful typesetting system that creates formatted output from plain text mixed with formatting commands. With the Mom macro set, one can easily compose a PDF file from a plain-text file.
(input.mom compiled to output.pdf)
Neat!
My Situation
The problem for me is that I need to write documents in Chinese, which seems to be a rare use case, since I couldn't find anyone talking about it through the search engines. After some digging and trial and error, I finally figured out the ultimate solution.
(groff spit out gibberish for Chinese characters)
The Solution
File Encoding
By default, groff reads its input as a legacy 8-bit encoding (Latin-1) rather than UTF-8, but most distros use UTF-8 as the default text file encoding nowadays, so a UTF-8 source file gets misread. Fortunately, we can change this behaviour with the -Kutf8 option.
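As a minimal sketch of this step, one could first confirm that the source really is UTF-8 and then pass it to groff with -Kutf8. The file name sample.mom and the helper is_utf8 are made up for illustration, and the groff call is skipped when groff is not installed.

```shell
#!/bin/sh
# is_utf8 FILE: succeed only if FILE decodes cleanly as UTF-8.
# (Hypothetical helper name; iconv does the actual validation.)
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Write a tiny mom source containing Chinese text.
cat > sample.mom <<'EOF'
.PRINTSTYLE TYPESET
.START
.PP
中文
EOF

if is_utf8 sample.mom && command -v groff >/dev/null 2>&1; then
  # -Kutf8 declares the input encoding, so groff decodes the file as UTF-8.
  groff -Kutf8 -mom -Tps sample.mom > sample.ps
fi
```

With only the stock fonts installed, groff is still expected to warn about the Chinese characters at this point; that is the font problem covered next.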
Font
Then I received a series of complaints from groff.
That meant groff could now read the input file correctly, but failed to find the glyphs in its font files. That was weird to me, because I definitely had some Chinese fonts installed on my system. It turned out that groff uses its own font library, which certainly doesn't contain any CJK glyphs, judging by the file sizes.
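In groff's font directory, names such as AR, AB, AI and ABI are a family name (A, which is Avant Garde in the stock devps fonts, not Arial) followed by a style suffix: R for Regular, B for Bold, I for Italic and BI for BoldItalic. In a mom document, we select the family and the style separately.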
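For instance, a mom source could select the stock A family and switch styles inline. This is only a sketch: the file name font-demo.mom is made up, and it relies on mom's FAMILY macro and its font-switching inlines \*[BD], \*[IT] and \*[PREV].

```shell
#!/bin/sh
# Write a small mom document that uses groff's built-in "A" family.
# The quoted 'EOF' keeps the backslashes in the inline escapes intact.
cat > font-demo.mom <<'EOF'
.PRINTSTYLE TYPESET
.FAMILY A
.START
.PP
Regular text, \*[BD]bold text\*[PREV] and \*[IT]italic text\*[PREV].
EOF
```

groff combines the family and the style into the font file name, so bold text in family A is looked up as AB.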
For more information, check out this page on momdoc.
The font format that groff uses is also obsolete, and it's hard to find fonts in that format. So we need FontForge (https://fontforge.org/) to help us convert TrueType/OpenType fonts into something groff can understand.
I decided to try adding Sarasa-Mono-SC (https://github.com/be5invis/Sarasa-Gothic) to the groff font library, so I downloaded the release and extracted its TTF files, including sarasa-mono-sc-regular.ttf.
Conversion
In general, we have to convert one TTF into two files:
- A groff font file. For sarasa-mono-sc-regular.ttf, it should be named something like SarasaMonoSCR. Then we can refer to this font as SarasaMonoSC in our groff source file.
- A PS Type 1 or Type 42 font (ending with .pfa and .t42 respectively). I would suggest the Type 42 format, which has a smaller file size and leads to faster compilation.
Step by step:
- Open our CJK font file sarasa-mono-sc-regular.ttf with FontForge.
- Go to File -> Generate Fonts, select PS Type 1 (Ascii), and make sure Output AFM is checked in the Options dialog.
- Click the Generate button. We get 2 new files, Sarasa-Mono-SC-Regular.pfa and Sarasa-Mono-SC-Regular.afm.
- Change the output type to Type 42 and generate a file named Sarasa-Mono-SC-Regular.t42.
- Generate the groff font file from the .afm file:
  afmtodit Sarasa-Mono-SC-Regular.afm "/usr/share/groff/current/font/devps/generate/textmap" SarasaMonoSCR
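The last step has to be repeated for every style, and the groff font name must end in the matching style suffix. Here is a rough sketch of that loop. The AFM file names for Bold, Italic and BoldItalic are assumptions modelled on the Regular one, and style_suffix is a made-up helper. The loop only prints what it would do when afmtodit or the AFM file is missing.

```shell
#!/bin/sh
# style_suffix STYLE: map a style name to groff's style suffix.
style_suffix() {
  case "$1" in
    Regular)    echo R ;;
    Bold)       echo B ;;
    Italic)     echo I ;;
    BoldItalic) echo BI ;;
    *)          echo "unknown style: $1" >&2; return 1 ;;
  esac
}

TEXTMAP=/usr/share/groff/current/font/devps/generate/textmap

for style in Regular Bold Italic BoldItalic; do
  afm="Sarasa-Mono-SC-$style.afm"                  # assumed AFM file name
  out="SarasaMonoSC$(style_suffix "$style")"       # e.g. SarasaMonoSCR
  if command -v afmtodit >/dev/null 2>&1 && [ -f "$afm" ]; then
    afmtodit "$afm" "$TEXTMAP" "$out"
  else
    echo "skip: $afm -> $out"
  fi
done
```

Keeping the family part (SarasaMonoSC) identical across the four files is what lets a document switch between styles of the same family.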
Installation
In general, we have to copy the generated files to groff's search folder and register them as downloadable font files.
Step by step:
- Copy SarasaMonoSCR and Sarasa-Mono-SC-Regular.t42 to /usr/share/groff/site-font/devps.
- Add the following line to /usr/share/groff/current/font/devps/download, replacing <TAB> with an actual tab character:
  ...
  Sarasa-Mono-SC-Regular<TAB>Sarasa-Mono-SC-Regular.pfa
  The second column must name the font file you actually copied, so use Sarasa-Mono-SC-Regular.t42 here if you installed the Type 42 file.
With that, groff can now embed our font when outputting a PostScript file.
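Typing a literal tab is easy to get wrong, so one option is to let printf produce the line. The sketch below only prints the entry; download_entry is a made-up helper name, and the file name passed in should be whichever font file was copied into devps (the .t42 file from the first step is used here).

```shell
#!/bin/sh
# download_entry PSNAME FILE: print one tab-separated line for groff's
# devps "download" file (PostScript font name, then font file name).
download_entry() {
  printf '%s\t%s\n' "$1" "$2"
}

# Preview the line for the font file copied into devps.
download_entry Sarasa-Mono-SC-Regular Sarasa-Mono-SC-Regular.t42

# To actually register it (needs root), append the line to the download file:
#   download_entry Sarasa-Mono-SC-Regular Sarasa-Mono-SC-Regular.t42 |
#     sudo tee -a /usr/share/groff/current/font/devps/download
```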
Now, to make this work when outputting a PDF file with the -Tpdf option, we would also need to register the font for devpdf, but we are not going to do that, because that approach produces a huge PDF with every used font embedded. A better way is to produce PostScript first and then use ps2pdf to create an optimized PDF file.
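A minimal sketch of that two-stage pipeline follows. The function names pdf_name and mom2pdf are made up for illustration; the commands inside are plain groff and ps2pdf (from Ghostscript).

```shell
#!/bin/sh
# pdf_name FILE.mom: derive the output name by swapping the extension.
pdf_name() {
  printf '%s\n' "${1%.mom}.pdf"
}

# mom2pdf FILE.mom: typeset to PostScript, then let ps2pdf write the PDF.
# The "-" argument makes ps2pdf read the PostScript from standard input.
mom2pdf() {
  groff -Kutf8 -mom -Tps "$1" | ps2pdf - "$(pdf_name "$1")"
}
```

Calling `mom2pdf input.mom` would then write input.pdf next to the source.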
Other styles
If your font comes with all styles, like Sarasa does, then you are in luck: convert and install them and you are good to go. Some fonts provide a Regular version only, in which case you need to generate the other styles yourself. FontForge has Change Weight and Italic/Oblique commands under the Element -> Style menu that can help with that. Use Edit -> Select to highlight your characters, then apply the style commands you need.
Live preview
Live preview is helpful when we are composing a complex layout. First, we need an auto-compilation mechanism, so I added some script to my .vimrc to enter an auto-compile mode.
Then we can launch any PDF viewer we like to do the previewing. I would suggest something like zathura, which reloads automatically when the file changes. Or you can take a look at entr to automate the reloading if your viewer of choice doesn't support auto-reloading.
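Outside Vim, entr can also drive the auto-compilation itself. The sketch below swaps entr in for the .vimrc approach; watch_mom is a made-up function name, and the file name is assumed to contain no single quotes.

```shell
#!/bin/sh
# watch_mom FILE.mom: rebuild the PDF every time FILE.mom is saved.
# entr reads the list of files to watch from standard input; -s runs the
# given string through the shell on each change.
watch_mom() {
  src=$1
  pdf="${src%.mom}.pdf"
  printf '%s\n' "$src" |
    entr -s "groff -Kutf8 -mom -Tps '$src' | ps2pdf - '$pdf'"
}
```

Running `watch_mom input.mom` in one terminal and opening input.pdf in an auto-reloading viewer such as zathura gives a simple live preview.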
Another problem I ran into was that auto-compilation took too long to complete. The reason is that the Sarasa fonts are huge: each style can take up to 40 MB. When multiple styles were used, the output .ps file could easily grow past 100 MB. So I finally opted for a custom font consisting of DroidSansFallback + URW Gothic Bookman + Font Awesome.
If you just want to get a taste of it, WenQuanYi Micro Hei would be a better choice. Most distros have this font in their official repositories, but its Korean part might be broken, which shows up as characters stacked on top of each other. Fortunately, Ubuntu fixed this problem; on other distros, we can download the deb package and extract the font file from it.
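On a distro without dpkg, the font can still be pulled out of the package by hand, since a .deb is an ar archive that wraps a data.tar.* payload. The following is a rough sketch under that assumption: extract_deb_fonts is a made-up helper, and the package file name in the usage note is a placeholder.

```shell
#!/bin/sh
# extract_deb_fonts PACKAGE.deb OUTDIR: copy every .ttf/.ttc found in a
# Debian package into OUTDIR without installing the package.
extract_deb_fonts() {
  deb=$1
  out=$2
  tmp=$(mktemp -d) || return 1
  mkdir -p "$out" "$tmp/root"
  # Find the payload member (data.tar.xz, data.tar.gz, ...) and unpack it.
  member=$(ar t "$deb" | grep '^data\.tar') || { rm -rf "$tmp"; return 1; }
  ar p "$deb" "$member" > "$tmp/$member"
  tar -xf "$tmp/$member" -C "$tmp/root"
  # Copy only the font files out of the unpacked tree.
  find "$tmp/root" -type f \( -name '*.ttf' -o -name '*.ttc' \) \
    -exec cp {} "$out" \;
  rm -rf "$tmp"
}
```

For example, `extract_deb_fonts some-font-package.deb ./fonts` would leave the font files in ./fonts, ready for the conversion steps above.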
Usage
Finally, I can now compose PDF files happily.
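As an illustration, a source file using the newly installed family might look like the sketch below. The text is invented for the example; SarasaMonoSC is the family name from the conversion step, and the compile command is left as a comment because it needs the font to be installed first.

```shell
#!/bin/sh
# Write a small mom document that selects the custom CJK family.
cat > input.mom <<'EOF'
.PRINTSTYLE TYPESET
.FAMILY SarasaMonoSC
.START
.PP
这是一段中文。This line mixes Chinese and English.
EOF

# Compile once the font is installed:
#   groff -Kutf8 -mom -Tps input.mom | ps2pdf - output.pdf
```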
(input.mom compiled to output.pdf)
Take away
To avoid redundant labor, I created a couple of scripts to automate the process:
- unttc: extracts ttf files from a ttc file
- generate_fontstyle: generates Bold/Italic versions based on the Regular version
- groff_ttf.sh: adds a ttf to the groff font library
Author Klesh Wong
LastMod 2020-11-24