Introduction

Groff is an ancient but powerful typesetting system that produces formatted output from plain text mixed with formatting commands. With the Mom macro set, one can easily compose a PDF file from a plain-text file.

groff -mom input.mom -Tpdf > output.pdf

input.mom

.PRINTSTYLE TYPESET
.TAB_SET 1   0 5P
.TAB_SET 2  6P 38P
.TAB_SET 3 22P 22P
.START
.MCO
.TAB 1
.FONT B
Intro
.MCR
.TAB 2
.FONT R
Some introductory sentence...
.MCX
.MCO
.TAB 1
.FONT B
Content
.MCR
.TAB 2
.FONT R
Formal content starts from here...
.MCX

output.pdf

Neat!

My Situation

My problem is that I need to write documents in Chinese, which seems to be a rare use case: I couldn't find anyone discussing it through the search engines. After some digging and trial and error, I finally figured out the ultimate solution.

(groff spits out gibberish for Chinese characters)

The Solution

File Encoding

By default, groff does not read its input as UTF-8, but most distros nowadays use UTF-8 as the default text file encoding, so Unicode text gets mangled on the way in. Fortunately, we can change this behaviour with the -Kutf8 option.

groff -Kutf8 -ms input.ms -Tpdf > output.pdf
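
The -Kutf8 option only helps if the file really is UTF-8. Here is a minimal sketch for checking that before running groff, assuming iconv is installed; the function name is_utf8 is made up for this example and is not part of groff:

```shell
# is_utf8 FILE: succeed if FILE decodes cleanly as UTF-8.
# (is_utf8 is a hypothetical helper name, not a groff tool.)
is_utf8() {
    # iconv exits non-zero when it meets a byte sequence that is not valid UTF-8
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Example: only compile when the check succeeds
# is_utf8 input.ms && groff -Kutf8 -ms input.ms -Tpdf > output.pdf
```

If the check fails, the file was probably saved in some other encoding and should be converted first.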

Font

Then I received some complaints from groff.

troff: input.ms:12: warning: can't find special character 'u4E00'
troff: input.ms:12: warning: can't find special character 'u6BB5'
troff: input.ms:12: warning: can't find special character 'u4E2D'
troff: input.ms:12: warning: can't find special character 'u6587'
troff: input.ms:12: warning: can't find special character 'u63CF'
troff: input.ms:12: warning: can't find special character 'u8FF0'
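
The uXXXX names in these warnings are Unicode code points. As a rough sketch, they can be turned back into characters with plain shell arithmetic. Both function names below are invented for this example, and the decoder only handles code points from U+0800 to U+FFFF, which covers the common CJK ideographs:

```shell
# decode_bmp HEX: print the UTF-8 encoding of a code point in U+0800..U+FFFF.
# (decode_bmp is a hypothetical helper; other ranges need 1, 2 or 4 bytes.)
decode_bmp() {
    cp=$((0x$1))
    # three-byte UTF-8 layout: 1110xxxx 10xxxxxx 10xxxxxx
    b1=$(( 0xE0 | (cp >> 12) ))
    b2=$(( 0x80 | ((cp >> 6) & 0x3F) ))
    b3=$(( 0x80 | (cp & 0x3F) ))
    # build octal escapes such as \344\270\200 and let printf emit the raw bytes
    printf "$(printf '\\%03o\\%03o\\%03o' "$b1" "$b2" "$b3")"
}

# decode_warnings: read troff warnings on stdin, print the missing characters.
decode_warnings() {
    sed -n "s/.*special character 'u\([0-9A-F]\{4\}\)'.*/\1/p" |
    while read -r hex; do decode_bmp "$hex"; done
    echo
}

# Example: groff -Kutf8 -ms input.ms 2>&1 >/dev/null | decode_warnings
```

Fed the six warnings above, this prints 一段中文描述, the Chinese text on line 12 of input.ms.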

That meant groff could now read the input file correctly but failed to find the glyphs in its font files. That seemed weird to me, because I definitely had some Chinese fonts installed on my system. It turned out that groff uses its own font library, which, judging by the file sizes, certainly doesn't contain any CJK glyphs.

/usr/share/groff/current/font/devps
total 604K
-rw-r--r-- 1 root root  11K Mar 21  2020 AB
-rw-r--r-- 1 root root  13K Mar 21  2020 ABI
-rw-r--r-- 1 root root  13K Mar 21  2020 AI
-rw-r--r-- 1 root root  11K Mar 21  2020 AR
-rw-r--r-- 1 root root  11K Mar 21  2020 BMB
-rw-r--r-- 1 root root  13K Mar 21  2020 BMBI
-rw-r--r-- 1 root root  12K Mar 21  2020 BMI
-rw-r--r-- 1 root root 9.9K Mar 21  2020 BMR
-rw-r--r-- 1 root root 6.4K Mar 21  2020 CB
-rw-r--r-- 1 root root 8.6K Mar 21  2020 CBI
-rw-r--r-- 1 root root 8.6K Mar 21  2020 CI
-rw-r--r-- 1 root root 6.3K Mar 21  2020 CR
-rw-r--r-- 1 root root  200 Mar 21  2020 DESC
-rw-r--r-- 1 root root  141 Mar 21  2020 download
-rw-r--r-- 1 root root 1.2K Mar 21  2020 EURO
-rw-r--r-- 1 root root 1.5K Mar 21  2020 freeeuro.afm
-rw-r--r-- 1 root root  21K Mar 21  2020 freeeuro.pfa
drwxr-xr-x 2 root root 4.0K Aug  1 00:29 generate/
-rw-r--r-- 1 root root  18K Mar 21  2020 HB
-rw-r--r-- 1 root root  20K Mar 21  2020 HBI
-rw-r--r-- 1 root root  21K Mar 21  2020 HI
-rw-r--r-- 1 root root  18K Mar 21  2020 HNB
-rw-r--r-- 1 root root  20K Mar 21  2020 HNBI
-rw-r--r-- 1 root root  21K Mar 21  2020 HNI
-rw-r--r-- 1 root root  19K Mar 21  2020 HNR
-rw-r--r-- 1 root root  19K Mar 21  2020 HR
-rw-r--r-- 1 root root  13K Mar 21  2020 NB
-rw-r--r-- 1 root root  20K Mar 21  2020 NBI
-rw-r--r-- 1 root root  17K Mar 21  2020 NI
-rw-r--r-- 1 root root  15K Mar 21  2020 NR
-rw-r--r-- 1 root root  11K Mar 21  2020 PB
-rw-r--r-- 1 root root  13K Mar 21  2020 PBI
-rw-r--r-- 1 root root  13K Mar 21  2020 PI
-rw-r--r-- 1 root root  11K Mar 21  2020 PR
-rw-r--r-- 1 root root 3.0K Mar 21  2020 prologue
-rw-r--r-- 1 root root 6.2K Mar 21  2020 S
-rw-r--r-- 1 root root 7.9K Mar 21  2020 SS
-rw-r--r-- 1 root root  606 Mar 21  2020 symbolsl.pfa
-rw-r--r-- 1 root root 8.9K Mar 21  2020 TB
-rw-r--r-- 1 root root  11K Mar 21  2020 TBI
-rw-r--r-- 1 root root 2.5K Mar 21  2020 text.enc
-rw-r--r-- 1 root root  11K Mar 21  2020 TI
-rw-r--r-- 1 root root 8.8K Mar 21  2020 TR
-rw-r--r-- 1 root root 4.3K Mar 21  2020 zapfdr.pfa
-rw-r--r-- 1 root root  15K Mar 21  2020 ZCMI
-rw-r--r-- 1 root root 5.4K Mar 21  2020 ZD
-rw-r--r-- 1 root root 5.4K Mar 21  2020 ZDR

In this output, AR, AB, AI and ABI are the regular, bold, italic and bold-italic members of the A family (Avant Garde). We can use them like this:

.FAMILY A
.FONT R
Regular Avant Garde text
.FONT B
Bold Avant Garde text
.FONT I
Italic Avant Garde text
.FONT BI
BoldItalic Avant Garde text
.FONT R
Go back to Regular

For more information, check out this page on momdoc

The font format that groff uses is also obsolete, and it's hard to find fonts in that format. So we need FontForge (https://fontforge.org/) to help us convert TrueType/OpenType fonts into something groff can understand.

I decided to try adding Sarasa-Mono-SC (https://github.com/be5invis/Sarasa-Gothic) to the groff font library, so I downloaded and extracted the following files:

-rw-rw-r-- 1 klesh klesh  23M Nov 21 19:26 sarasa-mono-sc-bolditalic.ttf
-rw-rw-r-- 1 klesh klesh  22M Nov 21 19:26 sarasa-mono-sc-bold.ttf
-rw-rw-r-- 1 klesh klesh  23M Nov 21 19:26 sarasa-mono-sc-italic.ttf
-rw-rw-r-- 1 klesh klesh  23M Nov 26 16:27 sarasa-mono-sc-regular.ttf

Conversion

In general, we have to convert each TTF into two files:

  1. A groff font file. For sarasa-mono-sc-regular.ttf, it should be named something like SarasaMonoSCR; we can then refer to the family as SarasaMonoSC in our groff source file.
  2. A PostScript Type 1 or Type 42 font (ending with .pfa and .t42 respectively). I would suggest the Type 42 format, which has a smaller file size and therefore compiles faster.

Step by step:

  • Open our CJK font file sarasa-mono-sc-regular.ttf with FontForge
  • Go to File -> Generate Fonts, select PS Type 1 (Ascii), and make sure Output AFM is checked in the Options dialog
  • Click the Generate button; we get 2 new files, Sarasa-Mono-SC-Regular.pfa and Sarasa-Mono-SC-Regular.afm
  • Change the output type to Type 42 and generate a file named Sarasa-Mono-SC-Regular.t42
  • Generate the groff font file from the .afm file
    
    afmtodit Sarasa-Mono-SC-Regular.afm "/usr/share/groff/current/font/devps/generate/textmap" SarasaMonoSCR
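
Since the same conversion has to be repeated for every style, the afmtodit step can be put in a loop. The sketch below assumes FontForge produced one .afm per style, named after the Regular one (Sarasa-Mono-SC-Bold.afm and so on), which is a guess about the other file names; groff_font_name and convert_all are hypothetical helper names:

```shell
# groff_font_name BASENAME: map a FontForge output name to a groff font name,
# e.g. Sarasa-Mono-SC-BoldItalic -> SarasaMonoSCBI.
groff_font_name() {
    case $1 in
        *-BoldItalic) style=BI ;;
        *-Bold)       style=B  ;;
        *-Italic)     style=I  ;;
        *-Regular)    style=R  ;;
        *) echo "unknown style: $1" >&2; return 1 ;;
    esac
    family=${1%-*}    # drop the style suffix: Sarasa-Mono-SC
    # remove the remaining hyphens and append the groff style letter(s)
    printf '%s%s\n' "$(printf '%s' "$family" | tr -d '-')" "$style"
}

# convert_all: run afmtodit once per style; expects the .afm files
# (assumed names) in the current directory.
convert_all() {
    textmap=/usr/share/groff/current/font/devps/generate/textmap
    for base in Sarasa-Mono-SC-Regular Sarasa-Mono-SC-Bold \
                Sarasa-Mono-SC-Italic Sarasa-Mono-SC-BoldItalic; do
        afmtodit "$base.afm" "$textmap" "$(groff_font_name "$base")" || return 1
    done
}
```

The R, B, I and BI suffixes matter: they are what lets a single family name select all four fonts.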
    

Installation

In general, we have to copy the generated files to groff's font search folder and register them as downloadable fonts.

Step by step:

  • Copy SarasaMonoSCR and Sarasa-Mono-SC-Regular.t42 to /usr/share/groff/site-font/devps
  • Add the following line to /usr/share/groff/current/font/devps/download, replacing <TAB> with a literal tab character
    
    ...
    Sarasa-Mono-SC-Regular<TAB>Sarasa-Mono-SC-Regular.t42
    

    With that, groff can now embed our font when outputting a PostScript file
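
Typing a literal tab by hand is easy to get wrong, so the registration line can also be appended with printf. This is only a sketch: register_font is a made-up helper name, and writing to the real download file needs root privileges:

```shell
# register_font DOWNLOAD_FILE PS_NAME FONT_FILE
# Append "PS_NAME<TAB>FONT_FILE" unless the PostScript name is already listed.
# (register_font is a hypothetical helper, not a groff tool.)
register_font() {
    # grep -q succeeds if a line already starts with PS_NAME followed by a tab
    if grep -q "^$2$(printf '\t')" "$1" 2>/dev/null; then
        return 0
    fi
    printf '%s\t%s\n' "$2" "$3" >> "$1"
}

# Example (as root):
# register_font /usr/share/groff/current/font/devps/download \
#     Sarasa-Mono-SC-Regular Sarasa-Mono-SC-Regular.t42
```

Running it twice leaves a single entry, so it is safe to call from an install script.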

Now, to make it work when outputting a PDF file with the -Tpdf option, we would also need to register the font for devpdf, but we are not going to do that, because that approach produces a huge PDF with all used fonts embedded. A better way is to use ps2pdf to create an optimized PDF file:

groff -mom -Kutf8 input.mom > tmp.ps
ps2pdf tmp.ps output.pdf
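
The two commands can be wrapped in a small function so the intermediate PostScript file does not linger. A minimal sketch, assuming groff's default output device is PostScript (that is, GROFF_TYPESETTER is not set to something else); mom2pdf is a hypothetical name:

```shell
# mom2pdf INPUT.mom OUTPUT.pdf
# (hypothetical wrapper around the two commands above)
mom2pdf() {
    ps=$(mktemp) || return 1
    # no -T option: the PostScript device is assumed to be the default
    if groff -mom -Kutf8 "$1" > "$ps" && ps2pdf "$ps" "$2"; then
        status=0
    else
        status=1
    fi
    rm -f "$ps"    # clean up the intermediate file either way
    return "$status"
}

# Example: mom2pdf input.mom output.pdf
```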

Other styles

If your font comes with all styles like Sarasa does, you are in luck: convert and install them and you are good to go. Some fonts provide a Regular version only, in which case you need to generate the other styles yourself. FontForge has Change Weight and Italic/Oblique commands under the Element->Style menu that can help with that. Use Edit->Select to highlight your characters and then apply the style commands you need.

Live preview

Live preview is helpful when composing a complex layout. First, we need an auto-compilation mechanism, so I added some script to my .vimrc to toggle an auto-compile mode:

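" Run a shell command, replacing % with the current file name;
" stay silent unless the command fails.
fu! SilentOK(cmd)
    let l:output = system(substitute(a:cmd, "%", expand("%"), "g"))
    if v:shell_error != 0
        echo l:output
    endif
endfu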

fu! ToggleGroffMomAutoCompilation()
    let g:GroffMomAutoCompilation = !get(g:, "GroffMomAutoCompilation", 0)

    augroup GroffPdf
        autocmd!
    augroup END

    if g:GroffMomAutoCompilation
        augroup GroffPdf
            autocmd BufWritePost,FileWritePost *.mom :call SilentOK("groff -Kutf8 -mom % > /tmp/tmp.ps && ps2pdf /tmp/tmp.ps %.pdf")
        augroup END
        echo "Auto compilation for groff_mom enabled"
    else
        echo "Auto compilation for groff_mom disabled"
    endif
endfu

nnoremap <leader>ac :call ToggleGroffMomAutoCompilation()<CR>

" syntax highlighting: https://github.com/vim-scripts/mom.vim
autocmd BufEnter,BufRead *.mom :set ft=mom

Then we can launch any PDF viewer we like for the previewing part. I would suggest something like zathura, which reloads automatically when the file changes. Or you can take a look at entr to automate the reloading if your viewer of choice doesn't support auto-reloading.

Another problem I ran into was that auto-compilation took too long to complete. The reason was that the Sarasa fonts are huge: each style could take up to 40M, and when multiple styles were used, the output .ps file could easily grow beyond 100M. So I finally opted for a custom font consisting of DroidSansFallback + URW Gothic Bookman + FontAwesome.

If you just want to get a taste of it, WenQuanYi Micro Hei would be a better choice. Most distros have this font in their official repositories, but its Korean part might be broken, which manifests as characters stacked on top of each other. Fortunately, Ubuntu fixed this problem; on other distros, we can download the deb package and extract the font file from it.

Usage

Finally, I can now compose PDF files happily.

input.mom

.PRINTSTYLE TYPESET
.FAMILY SarasaMonoSC
.TAB_SET 1   0 5P
.TAB_SET 2  6P 38P
.TAB_SET 3 22P 22P
.START
.MCO
.TAB 1
.FONT B
简介
.MCR
.TAB 2
.FONT R
这是一段中文
.MCX
.MCO
.TAB 1
.FONT B
正文
.MCR
.TAB 2
.FONT R
内容从这里开始
.MCX

output.pdf


Take away

To avoid redundant labor, I created a couple of scripts to automate the process: