A broken 404 template in Django can swallow your backtraces
I wrote this post as an exercise during a meeting of IndieWebClub Bangalore.
I recently migrated this website from Astro to Wagtail. Why I did it is a story for another day. In this post, I want to talk about a bug that took me far too long to figure out.
In his (verifiably incorrect) post about making chai, Abhigyan linked to my own (verifiably correct) post on the topic. While linking to my post, he accidentally omitted the trailing slash from the URL.
This shouldn't have been a problem. By default, Django redirects a URL without a trailing slash to the same URL with a trailing slash appended, provided the original URL returns a 404 and the slash-appended URL matches a route. For example, if you try to access the following URL on my website:
https://ankursethi.com/about
Django automatically performs a 301 redirect to:
https://ankursethi.com/about/
This is the default behavior, controlled by the APPEND_SLASH setting. However, when Abhigyan linked to my (verifiably correct) post about making chai, my server returned a 500 error instead. I'd never have discovered this error myself, but Shubh pointed it out to me on the IndieWebClub chat last week. Thanks Shubh!
I started investigating the issue by checking the Gunicorn logs on my VPS. I was hoping they would contain a backtrace that would help me pinpoint the exact problem, but the logs only printed the string Internal Server Error whenever the broken URL was accessed.
I ran my app with production settings inside a Docker container to see if I could trigger the same behavior. And sure enough, the Dockerized app returned the same 500, with the same mysterious Internal Server Error in the Gunicorn logs.
My first instinct was that I had somehow messed up my logging configuration. Surely I'd introduced a bug somewhere in my Python code, and a misconfigured logger was failing to record the backtrace. But tweaking Django's LOGGING setting didn't change anything. I could see backtraces from exceptions I deliberately raised at random points in my code, but accessing a URL without a trailing slash still only produced the string Internal Server Error in the logs.
After a lot of head scratching, reading the docs, and yelling at Claude, I wondered if something in my 404.html template could be responsible for the error. My 404 template was fairly complex, loading and calling several template tags, inheriting from a chain of templates, rendering a few partialdefs, and concatenating assets using django-compressor.
I started by deleting everything from 404.html and reducing it to a single <h1> tag. Sure enough, this fixed the issue! Then I slowly added code back, piece by piece, until I found the one custom template tag that was throwing an exception, but only when called in the context of a 404 page. Fixing the tag and redeploying solved the problem for good.
But what about the logs? An error in my 404 template not only caused my server to return a 500, but also suppressed any backtraces that might have helped me diagnose the issue. That's weird, right?
I might be wrong, but I believe the sequence of events that can lead to this issue is as follows:
- Somebody accesses a URL without a trailing slash.
- Django tries to find that URL in its urlconf. Since this is a Wagtail installation, it also tries to find a page among the URLs known to Wagtail.
- All the URLs in my urlconf have trailing slashes. Wagtail also appends trailing slashes to all its URLs when its WAGTAIL_APPEND_SLASH setting is true (the default). So trying to access a page without a trailing slash returns a 404.
- You would expect Django's redirect logic to kick in at this point, appending a trailing slash to the original URL and performing a 301 redirect. But that's not what happens!
- The redirect logic lives in CommonMiddleware, which can only perform the redirect after the entire 404 handling chain has finished running. This means that, regardless of what happens, Django always renders your 404 template when an unknown URL is accessed. Yes, even if appending a trailing slash would produce a known, correct URL!
- So if your 404 template errors out, CommonMiddleware never gets a chance to redirect. Django encounters an unknown URL, tries to render the 404 template, fails, and turns the 404 into a 500. By the time CommonMiddleware inspects the response, there is no 404 left to redirect.
- When this happens, Django only logs the 500, not the exception raised while rendering the 404 template. This happens even if you're logging template rendering errors in your logging configuration. From what I can tell, there is no way to get Django to log an error in 404.html without creating a custom 404 view, manually catching errors, logging the caught errors, and re-raising them so that Django can turn them into 500s.
The lessons I learned from this frustrating scenario were:
- Always render your 404 and 500 pages in unit tests to catch any errors before they reach production.
- Keep your error pages as simple as possible. Ideally, they should only contain HTML and inlined CSS, nothing more.