Diffs with syntax highlight

--- example_a.py
+++ example_b.py
@@ -1,4 +1,5 @@
-def hello()
-    print("Hello, World!")
+def hello(name="world")
+    name = name.upper()
+    print("Hello, {name}!".format(name))
 
 hello()

Someone on the #pocoo channel asked if there is any way to highlight a diff between two files while keeping their original syntax highlighting as well (using Pygments).


The obvious option, highlighting the diff itself with the language's lexer, has many flaws. A diff shows only fragments of each file, so anything that spans several lines would just break and look wrong. Multiline strings, comments and C macros are just a few examples.
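To make the problem concrete, here is a small sketch that uses only difflib from the standard library. The file contents and the names a.py and b.py are made up for illustration. A change in the middle of a long triple-quoted string produces a hunk that contains neither the opening nor the closing quotes, so a lexer that sees only the diff has no way to know it is inside a string:

```python
import difflib

# A hypothetical file whose body is one long triple-quoted string.
old = [
    'text = """\n',
    'line one\n',
    'line two\n',
    'line three\n',
    'line four\n',
    'line five\n',
    'line six\n',
    'line seven\n',
    'line eight\n',
    '"""\n',
]

# Change a single line in the middle of the string.
new = list(old)
new[5] = 'for x in y:\n'

# Default context is 3 lines, so the hunk covers "line two" .. "line eight".
hunk = list(difflib.unified_diff(old, new, 'a.py', 'b.py'))

# Neither the opening nor the closing quotes made it into the diff.
quotes_visible = any('"""' in line for line in hunk)

print(''.join(hunk))
print('quotes visible in diff:', quotes_visible)
```

Highlighted on its own, the added line would look like a for statement, although in the real file it is just string content.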

After some thinking I realized that the best way to do this is to highlight the two files individually and then diff them.

Because I had nothing better to do, I decided to deliver.


Anyway... if we try to feed Pygments-generated HTML to Pygments again, it will escape all the HTML. So instead of doing something dirty like unescaping, or writing custom formatters, I decided to just do the diff highlighting myself. There is really not much to the syntax...
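As a rough illustration of how little syntax there is, the sketch below diffs the two example files (inlined here as lists of lines, unhighlighted) with difflib.unified_diff and labels each output line by its first character. The classify helper and its label names are hypothetical, not part of difflib or Pygments.

```python
import difflib

def classify(line):
    """Label one unified-diff line by its first character (hypothetical helper)."""
    return {'+': 'added', '-': 'removed', '@': 'hunk'}.get(line[:1], 'context')

# The two example files, as lists of lines with their newlines kept.
a = ['def hello()\n', '    print("Hello, World!")\n', '\n', 'hello()\n']
b = ['def hello(name="world")\n', '    name = name.upper()\n',
     '    print("Hello, {name}!".format(name))\n', '\n', 'hello()\n']

diff = list(difflib.unified_diff(a, b, 'example_a.py', 'example_b.py'))
labels = [classify(line) for line in diff]

# Note that the two header lines ("--- ..." and "+++ ...") also start with
# '-' and '+', so this simple rule labels them as removed and added.
for label, line in zip(labels, diff):
    print('{:8}{}'.format(label, line), end='')
```

Everything that is not an addition, a removal or a hunk marker is unchanged context, which is why a lookup on the first character is enough.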

Let's take these two files as examples:

example_a.py:

def hello()
    print("Hello, World!")

hello()

example_b.py:

def hello(name="world")
    name = name.upper()
    print("Hello, {name}!".format(name))

hello()

You can see what their diff looks like at the beginning of the post.

And, without further ado, here is the code itself:

#!/usr/bin/env python

import difflib

import pygments
import pygments.lexers
import pygments.formatters

def read_file(fn):
    with open(fn) as f:
        return f.read()

file_names = ['example_a.py', 'example_b.py']
file_contents = [read_file(fn) for fn in file_names]

# Highlight both files with Pygments (guessing lexer by filename and code).
# nowrap=True keeps HtmlFormatter from wrapping the result in excess
# HTML containers, so we get only the highlighted lines themselves.
# Also, difflib wants the inputs as separate lines with newlines at the end,
# hence splitlines(keepends=True).
formatter = pygments.formatters.HtmlFormatter(nowrap=True)
colored = [
    pygments.highlight(
        fc, pygments.lexers.guess_lexer_for_filename(fn, fc), formatter
    ).splitlines(keepends=True)
    for fn, fc in zip(file_names, file_contents)
]

with open('out.html', 'w') as out_file:
    # Write the needed HTML to enable styles
    out_file.write('<head><link rel="stylesheet" type="text/css" href="style.css"></head>\n')
    out_file.write('<pre class="code">')
    # Get the diff line by line
    # The 4 arguments are 2 files and their 2 names
    for line in difflib.unified_diff(*(colored+file_names)):
        # For each line we output a <div> with the original content,
        # but if it starts with a special diff symbol, we apply a class to it
        cls = {'+': 'diff_plus', '-': 'diff_minus', '@': 'diff_special'}.get(line[:1], '')
        if cls:
            cls = ' class="{}"'.format(cls)
        line = '<div{}>{}</div>'.format(cls, line.rstrip('\n'))
        out_file.write(line)
    out_file.write('</pre>')
 
style.css:

.code .diff_minus { background-color: rgba(255, 0, 0, 0.3) }
.code .diff_plus { background-color: rgba(0, 255, 0, 0.3) }
.code .diff_special { background-color: rgba(128, 128, 128, 0.3) }

/* add your usual Pygments styles here */

The resulting HTML:

<head><link rel="stylesheet" type="text/css" href="style.css"></head>
<pre class="code"><div class="diff_minus">--- example_a.py</div><div class="diff_plus">+++ example_b.py</div><div class="diff_special">@@ -1,5 +1,5 @@</div><div class="diff_minus">-<span class="k">def</span> <span class="nf">hello</span><span class="p">()</span></div><div class="diff_minus">-    <span class="k">print</span><span class="p">(</span><span class="s">&quot;&quot;&quot;Hello, </span></div><div class="diff_minus">-<span class="s">World!&quot;&quot;&quot;</span><span class="p">)</span></div><div class="diff_plus">+<span class="k">def</span> <span class="nf">hello</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s">&quot;world&quot;</span><span class="p">)</span></div><div class="diff_plus">+    <span class="n">name</span> <span class="o">=</span> <span class="n">name</span><span class="o">.</span><span class="n">upper</span><span class="p">()</span></div><div class="diff_plus">+    <span class="k">print</span><span class="p">(</span><span class="s">&quot;Hello, {name}!&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">name</span><span class="p">))</span></div><div> </div><div> <span class="n">hello</span><span class="p">()</span></div></pre>

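What's nice is that Pygments colors each line individually: every <span> it opens is closed again before the end of the line (see the triple-quoted string in the output above, which is split across two lines). So I didn't have to do anything special for the multiline problem discussed earlier.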
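A quick way to convince yourself of this is to count tags per line. The sketch below takes the two highlighted lines of the triple-quoted string from the output above and checks that each one closes every <span> it opens. The spans_balanced helper is hypothetical, and the last sample line is an invented counterexample showing what a formatter that kept one span open across the newline would produce.

```python
def spans_balanced(line):
    """True if the line closes as many <span> tags as it opens (hypothetical helper)."""
    return line.count('<span') == line.count('</span>')

# The two lines of the multi-line string, copied from the generated HTML.
highlighted = [
    '    <span class="k">print</span><span class="p">(</span>'
    '<span class="s">&quot;&quot;&quot;Hello, </span>',
    '<span class="s">World!&quot;&quot;&quot;</span><span class="p">)</span>',
]

# Invented counterexample: a string span left open at the end of the line.
dangling = '<span class="s">&quot;&quot;&quot;Hello, '

all_ok = all(spans_balanced(line) for line in highlighted)

print('real output balanced:', all_ok)
print('counterexample balanced:', spans_balanced(dangling))
```

Because every line is self-contained HTML, difflib can add, drop or reorder lines freely without leaving an unclosed tag behind.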


The code snippets are also on Gist.
