Composing PDFs Happily on Linux as a Non-Latin Coder
Groff is an ancient but powerful typesetting system that produces formatted output from plain text mixed with formatting commands. With the Mom macro set, one can easily compose a PDF file from a plain-text file.
The problem for me is that I need to write documents in Chinese, which seems to be a rare use case, since I couldn't find anyone discussing it through the search engines. After some digging and trial and error, I finally figured out the ultimate solution.
(groff spit out gibberish for Chinese characters)
groff reads input files in ASCII encoding by default, whereas most distros nowadays use UTF-8 as the default text file encoding so that Unicode can be read and written without any problem. Fortunately, we can change this behaviour by telling groff the input encoding.
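A minimal sketch of one way to do this, assuming groff's `-K` option (which sets the input encoding for the `preconv` preprocessor). The file name `hello.mom` and its content are placeholders chosen for illustration:

```shell
# Write a tiny UTF-8 test document (hello.mom is a placeholder name).
cat > hello.mom <<'EOF'
.PRINTSTYLE TYPESET
.START
.PP
你好，世界
EOF

# -K utf8 declares the input encoding; -mom loads the Mom macros.
# The guard keeps the snippet harmless on machines without groff.
if command -v groff >/dev/null 2>&1; then
    groff -K utf8 -mom -Tps hello.mom > hello.ps
fi
```

With only the encoding fixed, groff can decode the characters but may still warn about glyphs it cannot find in its fonts.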
Then I received some complaints from groff.
That meant groff could now read the input file correctly but failed to find the font file. That was weird to me, because I definitely had some Chinese fonts installed on my system. It turned out that groff uses its own font library, which, judging by the file sizes, certainly doesn't contain any CJK glyphs.
In this output, a name such as `ABI` combines a family prefix (`A`) with a style suffix: `I` stands for Italic and `BI` for BoldItalic. We can use them like this:
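As a hedged illustration, assuming Mom's `.FAMILY` and `.PP_FONT` macros are used to pick the family and the paragraph font (the file name `fonts.mom` is a placeholder):

```shell
# Family A plus font BI resolves to the groff font named ABI.
cat > fonts.mom <<'EOF'
.PRINTSTYLE TYPESET
.FAMILY A
.PP_FONT BI
.START
.PP
This paragraph is set in the ABI font.
EOF

# Typeset to PostScript only when groff is actually installed.
if command -v groff >/dev/null 2>&1; then
    groff -mom -Tps fonts.mom > fonts.ps
fi
```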
For more information, check out this page on momdoc
The font format that groff uses is also obsolete, so fonts in that format are hard to find. We therefore need [FontForge](https://fontforge.org/) to convert TrueType/OpenType fonts into something groff can understand.
I decided to try adding [Sarasa-Mono-SC](https://github.com/be5invis/Sarasa-Gothic) to the groff font library, so I downloaded it and extracted the following files:
In general, we have to convert each TTF into two files:
- A groff font file. For `sarasa-mono-sc-regular.ttf`, it should be named something like `SarasaMonoSCR`. Then we can refer to this font as `SarasaMonoSC` in our groff source file.
- A PostScript Type 1 or Type 42 font file (a Type 42 file ends with `.t42`). I suggest the Type 42 format, which has a smaller file size and therefore leads to faster compilation.
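The naming convention can be sketched as a small helper. This assumes groff's usual style suffixes (`R`, `B`, `I`, `BI`) appended to the family name; the function name `groff_font_name` is hypothetical:

```shell
# Build a groff font file name from a family name and a style name.
groff_font_name() {
    family=$1
    # Normalise the style to lower case so "Bold" and "bold" both match.
    style=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    case $style in
        regular)                suffix=R ;;
        bold)                   suffix=B ;;
        italic|oblique)         suffix=I ;;
        bolditalic|bold-italic) suffix=BI ;;
        *) echo "unknown style: $2" >&2; return 1 ;;
    esac
    printf '%s%s\n' "$family" "$suffix"
}

groff_font_name SarasaMonoSC regular   # prints SarasaMonoSCR
```

Keeping the family part identical across all four files is what lets the groff source refer to the family once and switch styles separately.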
Step by step:
- Open our CJK font file `sarasa-mono-sc-regular.ttf` with FontForge
- Go to File -> Generate Fonts, select PS Type 1 (Ascii), and make sure Output AFM is checked in the Options dialog
- Click the Generate button; we will get two new files, the Type 1 font and its AFM file
- Change the output type to Type 42 and generate the Type 42 font file
- Generate the groff font file from the AFM file:
afmtodit Sarasa-Mono-SC-Regular "/usr/share/groff/current/font/devps/generate/textmap" SarasaMonoSCR
In general, we have to copy the generated files to groff's search folder and register them as downloadable font files.
Step by step:
- Add the following line to the download list, separating the fields with a Tab… you know what I mean.
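A hedged sketch of producing such a line with a real tab character. It assumes the entry pairs the PostScript font name with the font file name; the file name `Sarasa-Mono-SC-Regular.t42` and the scratch file `download.sample` are assumptions for illustration:

```shell
# Assumed names: the PostScript font name from the afmtodit call above and
# a hypothetical Type 42 file name; adjust both to your generated files.
ps_name='Sarasa-Mono-SC-Regular'
font_file='Sarasa-Mono-SC-Regular.t42'

# printf emits a literal tab between the two fields, which is easy to get
# wrong by hand in an editor that expands tabs to spaces.
printf '%s\t%s\n' "$ps_name" "$font_file" >> download.sample

# Review the result before appending it to groff's own download file.
cat download.sample
```

Writing to a scratch file first avoids editing files under the groff installation until the line looks right.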
With that, groff can now embed our font when outputting a PostScript file.
Now, to make it work when outputting a PDF file with the `-Tpdf` option, we would also need to register the font for `devpdf`, but we are not going to do that, because that approach produces a huge PDF with all used fonts embedded. A better way is to use `ps2pdf` to create an optimized PDF file:
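A minimal sketch of that pipeline, assuming Ghostscript's `ps2pdf` is installed and that the source is a Mom document in UTF-8. The function name `build_pdf` and the file names are placeholders:

```shell
# build_pdf SRC OUT: typeset SRC to PostScript with Mom, then let ps2pdf
# produce the PDF ("-" tells ps2pdf to read the PostScript from stdin).
build_pdf() {
    groff -K utf8 -mom -Tps "$1" | ps2pdf - "$2"
}

# Example call, run only when both tools and the source file exist.
if command -v groff >/dev/null 2>&1 && command -v ps2pdf >/dev/null 2>&1 \
        && [ -f document.mom ]; then
    build_pdf document.mom document.pdf
fi
```

Piping avoids leaving a large intermediate `.ps` file on disk.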
If your font comes with all styles, as `Sarasa` does, you are in luck: convert and install them and you are good to go. Some fonts provide only a Regular version, in which case you need to generate the other styles yourself. FontForge has Change Weight and Italic/Oblique commands under the Element -> Style menu that can help with that. Use Edit -> Select to highlight your characters and then apply the style commands you need.
Live preview is helpful when composing a complex layout. First, we need an auto-compilation mechanism, so I added a script to my `.vimrc` for that.
Then we can launch any PDF viewer we like for the preview. I suggest something like `zathura`, which reloads automatically when the file changes. Alternatively, take a look at `entr` to automate the reloading if your viewer of choice doesn't support it.
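`entr` can also drive the rebuild step itself, as an alternative to an editor hook. The sketch below is an assumption-laden example rather than the setup described above: `watch_and_build` is a hypothetical helper, and the groff and `ps2pdf` options are the ones assumed earlier in this post.

```shell
# watch_and_build SRC: rebuild SRC's PDF every time SRC is saved.
watch_and_build() {
    src=$1
    out=${src%.*}.pdf   # document.mom -> document.pdf
    # entr reads the list of files to watch from stdin and reruns the
    # given command whenever one of them changes.
    printf '%s\n' "$src" |
        entr sh -c 'groff -K utf8 -mom -Tps "$1" | ps2pdf - "$2"' sh "$src" "$out"
}

# Usage (blocks until interrupted with Ctrl-C):
#   watch_and_build document.mom
```

A viewer that reloads on change, such as `zathura`, then picks up each new PDF without further scripting.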
Another problem I ran into was that auto-compilation took too long to complete. The reason was that the `Sarasa` fonts are huge: each style can take up to 40 MB. When multiple styles were used, the output `.ps` file could easily grow beyond 100 MB. So I finally opted for a custom font consisting of
URW Gothic Bookman +
If you just want to get a taste of it, WenQuanYi Micro Hei would be a better choice. Most distros have this font in their official repositories, but its Korean part might be broken, which shows up as characters stacked on top of each other. Fortunately, Ubuntu has fixed this problem; on other distros, we can download the `deb` package and extract the font file from it.
Finally, I can now compose PDF files happily.
To avoid redundant labor, I created a couple of scripts to automate the process:
- unttc: extracts
- generate_fontstyle: generates Bold/Italic versions based on the Regular version
- groff_ttf.sh: adds a `ttf` font to the groff font library
Author Klesh Wong