sed ‘/^\s*$/d’ Hakyll.Template

14 minute read

Published:

Hakyll is a Haskell library for generating static sites, mostly aimed at small-to-medium sites and personal blogs. It is written in a very configurable way and uses an xmonad-like DSL for configuration.

At ZuriHac this year I created a patch for Hakyll’s issue #414 - “$for template leaves trailing spaces behind”. The change enables users to purge the redundant whitespace, created when using Hakyll’s markup language, from any of the files generated with the templating system!

This post starts by giving a brief overview of Hakyll and it’s templating system and markup language before introducing issue #414 that, from this point on, will be referred to as The Whitespace Problem. It finishes by exploring the features provided by other popular templating engines, such as Python’s jinja2.

Templates and Markup!

Approach #1

An animal rehoming centre wishes to create a website to advertise each animal requiring a home. They have three so, in order to keep things simple, they create a static site consisting of three pages, one per animal. Each page has a header that links to every other page.

Foo’s page - foo.html:

<html>
<head>
  <title>Foo the Ferret</title>
</head>
<body>
  <a href="./foo.html">Foo</a>
  <a href="./bar.html">Bar</a>
  <a href="./baz.html">Baz</a>
  <h1>Foo the Ferret</h1>
  <img src="foo.jpg">
  <p>
    Foo's amazing.
  </p>
</body>
<footer>
  <a href="mailto:[email protected]">Contact us!</a>
</footer>
</html>

This approach breaks down as the number of animals grows - O(n) edits are required upon the nth animal entering the centre (at least the header on every page needs updating) meaning O(n!) edits in total over the website’s lifetime! This approach also makes updating common code, such as the contact details or site layout, difficult and error-prone due to the sheer number of duplicated changes across every page.

The solution? Templates and markup!

Approach #2

Templates enable common code to be broken out into separate files, reducing WETness, and markup provides a simple metaprogramming language to both tie together the individual templates and content and to generate additional code in the templates at compile time, increasing DRYness!

A template for the above example - template.html:

<html>
<head>
  <title>$title$</title>
</head>
<body>
  $for(animals)$
    <a href="$url$">$title$</a>
  $endfor$

  <h1>$title$</h1>

  $if(img)$
    <img src="$img$">
  $endif$

  $body$
</body>
<footer>
  <a href="mailto:[email protected]">Contact us!</a>
</footer>
</html>

The markup consists of expressions and statements, resembling a simple C-world language, that render to strings at compile time. The value of any variables, such as title, are determined by indexing a context which can be thought of as a map. More information about their use and implementation is available on the Hackage Hakyll.Web.Template page.

The complete site.hs Hakyll file using the above:

data Animal = Animal{ title      :: String
                    , identifier :: Identifier
                    , content    :: String
                    , img        :: Maybe String
                    }

animals = [ Animal{ title      = "Foo the Ferret"
                  , identifier = "foo.html"
                  , content    = "<p>\nFoo's amazing!\n</p>"
                  , img        = Just "foo.jpg"
                  }
          , Animal{ title      = "Bar the Bison"
                  , identifier = "bar.html"
                  , content    = "<p>\nBar's awesome!\n</p>"
                  , img        = Nothing
                  }
          ]

main :: IO ()
main = hakyll $ do
  match "template.html" $
    compile templateCompiler

  match "*.jpg" $ do
    route   idRoute
    compile copyFileCompiler

  create (map identifier animals) $ do
    route idRoute
    compile $ do
      ident <- getUnderlying
      let animal = fromJust $ find ((== ident) . identifier) animals
          animalCtx = constField "title" (title animal)
                   <> urlField "url"
                   <> constField "content" (content animal)
                   <> maybe mempty (constField "img") (img animal)
                   <> listField "animals" animalsCtx (mapM makeItem animals)

          animalsCtx = field "title" (pure . title . itemBody)
                    <> field "url"   (pure . toFilePath . identifier . itemBody)

      makeItem "" >>= loadAndApplyTemplate "template.html" animalCtx

Each animal’s information is contained within an instantiation of the Animal data constructor and updating the animals :: [Animal] list updates the entire site! This solution causes the total number of edits to drop to O(n) but it still has an issue when scaling: the length of the site.hs file increases linearly with the number of animals. Hakyll’s metadata feature, combined with templating, could be used solve this problem by splitting the list out into separate files but this is beyond the scope of this post.

The Whitespace Problem

With the above in place, visiting /foo.html shows:

<html>
<head>
  <title>Foo the Ferret</title>
</head>
<body>

    <a href="foo.html">Foo the Ferret</a>

    <a href="bar.html">Bar the Bison</a>


  <h1>Foo the Ferret</h1>


    <img src="foo.jpg">


  <p>
Foo's amazing!
</p>
</body>
<footer>
  <a href="mailto:[email protected]">Contact us!</a>
</footer>
</html>

The above code has strange indenting and seems to contain excess line feeds - these problems combine to form The Whitespace Problem!

Why?

Hakyll parses templates using the following context-free grammar (EBNF) (Note: Some productions seem redundant but it’s how it’s defined in the code so they have been left in order to aid in mapping between the two):

template = { chunk | escaped | conditional | for | partial | expr } ;

chunk = noDollar , { noDollar } ;

noDollar = ? Data.Char.Char - "$" ?

escaped = "$$"

conditional = "$if(" , expr_ , ")$" , template , [ "$else$", template ] , "$endif$" ;

for = "$for(" , expr_ , ")$" , template , [ "$sep$" , template ] , "$endfor$" ;

partial = "$partial(" , expr_ , ")$" ;

expr = "$" , expr_ , "$" ;

expr_ = stringLiteral | call | ident ;

stringLiteral = '"' , { noSlash , slash } , '"' ;

noSlash = ? noQuote - '\' ? ;

slash = '\' , noQuote ;

noQuote = ? Data.Char.Char - '"' ? ;

call = key , "(" , { spaces } , args , { spaces } , ")" ;

ident = key ;

key = metadataKey ;

spaces = ? Data.Char.isSpace ? ;

args = { expr_ , { spaces , "," , spaces , expr_ } } ;

metadataKey = ( letter , { alphaNum | "_" | "-" | "." } ) - reservedKeys ;

letter = ? Data.Char.isAlpha ? ;

alphaNum = ? Data.Char.isAlphaNum ? ;

reservedKeys = "if" | "else" | "endif" | "for" | "sep" | "endfor" | "partial" ;

to create an abstract syntax tree (AST) represented by the following recursive algebraic data type (ADT):

newtype Template = Template [TemplateElement]

-- | Elements of a template.
data TemplateElement
    = Chunk String
    | Expr TemplateExpr
    | Escaped
    | If TemplateExpr Template (Maybe Template)   -- expr, then, else
    | For TemplateExpr Template (Maybe Template)  -- expr, body, separator
    | Partial TemplateExpr                        -- filename

newtype TemplateKey = TemplateKey String

-- | Expression in a template
data TemplateExpr
    = Ident TemplateKey
    | Call TemplateKey [TemplateExpr]
    | StringLiteral String

A Template is a list of TemplateElements. A TemplateElement is either markup code that is signified by “$” bookends, or a chunk of text which could be anything. i.e. plaintext, HTML, Haskell, Python etc.

The latter is captured by the Chunk constructor. The former can be an:

  • Expr: A variable or function call (must be defined in used Context)
    • $ident$
    • $func("arg")$
  • Escaped: A literal ‘$’ rather than signifying markup.
    • Only $$2.99!
  • If: If statement with optional else clause.
    • $if(variable)$ foo $endif$
    • $if(variable)$ foo $else$ bar $endif$
  • For: For statement with optional separator string.
    • $for(things)$ $thing$ $endfor$
    • $for(things)$ $thing$$sep$, $endfor$
  • Partial: “Loads a template located in a separate file and interpolates it under the current context.”
    • $partial("path.html")$

The Whitespace Problem occurs because any whitespace preceding or trailing a markup element is captured in a Chunk and is thus untouched when that element is rendered as a string. Note that it is untouched because Hakyll mapMs applyElem :: TemplateElement -> Compiler String over each TemplateElement, as can be seen here, and thus each application has no knowledge of any surrounding TemplateElements.

An example:

template.tpl file:

<body>
  $if(var)$
    $var$
  $endif$
</body>

Loading as a Template:

> import Hakyll.Web.Template.Internal
> readTemplate <$> readFile "template.tpl"
Template [Chunk "<body>\n  ",If var (Template [Chunk "\n    ",Expr var,Chunk "\n  "]) Nothing,Chunk "\n</body>\n"]

Pretty printed [TemplateElement]’s:

Template [ Chunk "<body>\n  "
         , If var
              (Template [ Chunk "\n    "
                        , Expr var
                        , Chunk "\n  "
                        ])
              Nothing
         , Chunk "\n</body>\n"
         ]

Rendering with:

ctx :: Context String
ctx = constField "var" "foo"

Yields:

<body>

    foo

</body>

in Haskell:

> readFile "_site/template.html"
"<body>\n  \n    foo\n  \n</body>\n"

Solution?

The solutions for The Whitespace Problem in the other templating systems seem to consist of two parts:

  1. Syntactic: Augment the markup language to enable the removal of whitespace.
  2. Semantic: Augment the module(s) to handle any language change(s).

The implementation of each part in both jinja2, “a modern and designer-friendly templating language for Python”, and ERB, an implementation of eRuby which is a templating system that permits Ruby code to be embedded in a file, is discussed below.

State Comparison

Jinja2 - Syntactic

As per the documentation, Jinja’s default delimiters (bookends) are:

  • {% ... %} for Statements
  • {{ ... }} for Expressions to print to the template output

However, unlike Hakyll, Jinja is configured such that “An application developer can change the syntax configuration from {% foo %} > to <% foo %>, or something similar.”

Jinja’s documentation on whitespace control explains it’s syntactic part. This is quoted verbatim to preserve it in this post:

If an application configures Jinja to trim_blocks, the first newline after a template tag is removed automatically (like in PHP). The lstrip_blocks option can also be set to strip tabs and spaces from the beginning of a line to the start of a block. (Nothing will be stripped if there are other characters before the start of the block.)

With both trim_blocks and lstrip_blocks enabled, you can put block tags on their own lines, and the entire block line will be removed when rendered, preserving the whitespace of the contents. For example, without the trim_blocks and lstrip_blocks options, this template:

<div>
    {% if True %}
        yay
    {% endif %}
</div>

gets rendered with blank lines inside the div:

<div>

        yay

</div>

But with both trim_blocks and lstrip_blocks enabled, the template block lines are removed and other whitespace is preserved:

<div>
        yay
</div>

You can manually disable the lstrip_blocks behavior by putting a plus sign (+) at the start of a block:

<div>
        {%+ if something %}yay{% endif %}
</div>

You can also strip whitespace in templates by hand. If you add a minus sign (-) to the start or end of a block (e.g. a For tag), a comment, or a variable expression, the whitespace before or after that block will be removed:

{% for item in seq -%}
    {{ item }}
{%- endfor %}

This will yield all elements without whitespace between them. If seq was a list of numbers from 1 to 9, the output would be 123456789.

Note:

You must not add whitespace between the tag and the minus sign.

valid:

{%- if foo -%}...{% endif %}

invalid:

{% - if foo - %}...{% endif %}

The above, in short, is that Jinja has both a global and local method of trimming whitespace. The global method is enabled via two flags, trim_blocks and lstrip_blocks, and can be locally disabled with a ‘+’, e.g. +%}. The local method is enabled with a ‘-‘, e.g. -%}.

Jinja2 - Semantic

Jinja passes an Environment around the compilation process that “contains important shared variables like configuration, filters, tests, globals and others.”. Both trim_blocks and lstrip_blocks are Environment variables. The state of the given Environment changes the behaviour of the compilation phases and causes whitespace to be trimmed in the Lexer.


>>> import jinja2
>>> from jinja2.environment import Environment
>>> from jinja2.lexer import Lexer
>>> s = '''    {% if True %}\n    foo\n    {% endif %}'''
>>> env = Environment(lstrip_blocks=False, trim_blocks=False)
>>> list(Lexer(env).tokenize(s))
[Token(1, 'data', u'    '), Token(1, 'block_begin', u'{%'), Token(1, 'name', 'if'), Token(1, 'name', 'True'), Token(1, 'block_end', u'%}'), Token(1, 'data', u'\n    foo\n    '), Token(3, 'block_begin', u'{%'), Token(3, 'name', 'endif'), Token(3, 'block_end', u'%}')]
>>> env = Environment(lstrip_blocks=True, trim_blocks=True)
>>> list(Lexer(env).tokenize(s))
[Token(1, 'block_begin', u'    {%'), Token(1, 'name', 'if'), Token(1, 'name', 'True'), Token(1, 'block_end', u'%}\n'), Token(2, 'data', u'    foo\n'), Token(3, 'block_begin', u'    {%'), Token(3, 'name', 'endif'), Token(3, 'block_end', u'%}')]

ERB - Syntactic

ERB’s delimiters are:

  • <% ... %> for Statements (scriptlets)
  • <%= ... %> for Expressions

A template is rendered by passing it as an argument to the ERB object’s constructor and then calling the result method. e.g:

require 'erb'

foo = "bar"
tpl = "<%= foo %>"

render = ERB.new(tpl)
puts render.result()

Outputs:

bar

Whitespace is controlled by the trim_mode argument to the new method. This is documented here. Again, the relevant part has been quoted directly in order to preserve it for the context of this post.

If trim_mode is passed a String containing one or more of the following modifiers, ERB will adjust its code generation as listed:

%  enables Ruby code processing for lines beginning with %
<> omit newline for lines starting with <% and ending in %>
>  omit newline for lines ending in %>
-  omit blank lines ending in -%>

ERB’s syntactic part is similar to Jinja’s syntactic part in several ways:

  • It has global and local whitespace trimming.
  • Trimming is enabled via flags.
  • Local trimming is enabled with a ‘-‘, e.g. -%>

ERB - Semantic

Ruby has an ERB class that “Compiles ERB templates into Ruby code; the compiled code produces the template result when evaluated”.

Example use:

+irb(main):001:0> require 'erb'
=> true
+irb(main):002:0> compiler = ERB::Compiler.new('<>')
=> #<ERB::Compiler:0x000000009c0d88 @percent=false, @trim_mode="<>", @put_cmd="print", @insert_cmd="print", @pre_cmd=[], @post_cmd=[]>
+irb(main):003:0> code, enc = compiler.compile("Template <%= obj %>!\n")
=> ["#coding:UTF-8\nprint \"Template \"; print(( obj ).to_s); print \"!\\n\"\n", #<Encoding:UTF-8>]
+irb(main):004:0> obj = "Foo"
=> "Foo"
+irb(main):005:0> eval code
Template Foo!
=> nil

The ERB class contains a series of nested classes:

class ERB
  class Compiler

    ...

    class Scanner
      ...
    end

    class TrimScanner < Scanner
      ...
    end

    class SimpleScanner < Scanner
      ...
    end

    ...

  end
end

The new method called to instantiate an ERB::Compiler takes a trim_mode as an argument and uses it to select a Scanner. A Scanner is, essentially, a lexer and thus ERB also removes whitespace during the lexical analysis phase.