Troubleshooting Environment-Specific AEM Issues

One of the most common questions that comes up during development and testing of AEM implementations is: why does my page look right in author but not when I view it through the web server/Dispatcher? Or: why does it work fine when I hit the publish server directly but not through the web server?

Most developers with AEM experience have their own list of immediate suspects: did you publish everything, is Dispatcher blocking it somehow, and so on. People run into trouble when the cause isn't on their list and they aren't sure how to find the source of the problem. In that situation, a clear understanding of the request flow lets you identify at least which layer is causing the problem, and that in turn helps you solve it quickly.

Step 1 – What’s Happening in the Browser

The first step is to figure out whether all the files your page requires are being downloaded. Normally in these situations you open a tool like Firebug or Charles and look for 404s (or other error codes, such as 403 or 500). You can then usually zero in on why those files are returning errors by moving to Step 2 – What’s Happening on the Web Server/Dispatcher.

Every once in a while you will run into a situation where no requests are returning errors, so at this point you need to stay focused on what’s happening in the browser. If none of the requests have errors, look at a few different things:

  1. JavaScript error console – look for any security warnings or other errors. For example, you may be hitting cross-domain scripting issues that you don’t see when you go to the publish server directly. You may also see parsing errors, which can indicate that one of your files has bad content in it even though it isn’t returning an error code.
  2. Compare the list of files downloaded in the environment that works against the list downloaded in the environment that doesn’t, to confirm that you have the same files (check the file names, not just the count). If your site uses plugins like Flash or other tools that can make HTTP requests the browser isn’t aware of, a proxy tool like Charles is often more useful for this than Firebug.
  3. In the worst case, you start opening individual files and comparing their contents between environments to see whether errors are embedded in the files, old versions are being served, and so on. You may also need to compare HTTP headers between environments (verifying, for example, that the MIME type and Content-Type header are the same in each environment).
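Items 2 and 3 can be partly automated once you have the data in hand. The following is a minimal Python sketch, not a tool the original workflow relies on. It assumes you have already exported the list of requested URLs from each environment (for example, copied out of your browser or proxy tool) and the response headers for one file as dictionaries. The function names and example hostnames are hypothetical. URLs are reduced to path plus query string so that different hostnames in each environment don’t hide real differences.

```python
from urllib.parse import urlsplit


def resource_paths(urls):
    """Reduce full URLs to path + query so environments with different hosts compare cleanly."""
    paths = set()
    for url in urls:
        parts = urlsplit(url)
        paths.add(parts.path + ("?" + parts.query if parts.query else ""))
    return paths


def diff_resources(working_urls, broken_urls):
    """Return (missing, extra): paths loaded only in the working env, and only in the broken env."""
    working = resource_paths(working_urls)
    broken = resource_paths(broken_urls)
    return sorted(working - broken), sorted(broken - working)


def diff_headers(working_headers, broken_headers,
                 names=("content-type", "content-length", "cache-control")):
    """Compare selected response headers (case-insensitively) for the same file in two environments."""
    w = {k.lower(): v for k, v in working_headers.items()}
    b = {k.lower(): v for k, v in broken_headers.items()}
    # Only headers whose values differ are reported; absent in both counts as equal.
    return {n: (w.get(n), b.get(n)) for n in names if w.get(n) != b.get(n)}


if __name__ == "__main__":
    missing, extra = diff_resources(
        ["https://publish.example.com/etc/clientlibs/site.js",
         "https://publish.example.com/content/dam/logo.png"],
        ["https://www.example.com/etc/clientlibs/site.js"],
    )
    print(missing)  # ['/content/dam/logo.png']
    print(extra)    # []
    print(diff_headers({"Content-Type": "text/css"}, {"content-type": "text/plain"}))
    # {'content-type': ('text/css', 'text/plain')}
```

A file that shows up in `missing` with no error code in the broken environment is a good candidate to carry forward into Step 2.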

Normally, as you run through these options, you find the problem file and can move on to Step 2.

Step 2 – What’s Happening on the Web Server/Dispatcher

The goal at this step is to determine whether the web server/Dispatcher is the source of the problem. The questions to ask vary a little depending on the problem you have (a 404 vs. a wrong MIME type), but generally they are these:

  1. Is the request getting to the web server at all, or is another layer in front of the web server misdirecting it?
  2. Is the request getting to Dispatcher or is the web server configuration somehow misdirecting the request?
  3. Is Dispatcher serving the item from cache or sending the request back to publish?
  4. Do you get different behavior when the item is served from cache vs. when Dispatcher sends the request back to a publish server?
  5. Is the URL that Dispatcher is trying to resolve the correct URL? (If you have rewrite logic, is it being applied properly, is it rewriting URLs it shouldn’t, or did the URL get mangled somewhere along the way?)

In order to answer these questions there are generally a few places to look:

  1. The web server’s access and error logs.
  2. The Dispatcher log (dispatcher.log) – ideally you should turn up its log level to get more information. In production that may be a last resort, but sometimes it is the only option.
  3. The publish server’s request.log or access.log – if you can’t turn up the logging at the Dispatcher level, these can sometimes give you the information you are looking for, although this can be difficult in production, where you have more than one publish server.
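When you are working through these logs, a small filter helps you answer "did this request arrive, and what status did it get?" quickly. The following is a minimal Python sketch, assuming the access log uses the NCSA common/combined layout (as Apache httpd’s standard "combined" format does). If your `LogFormat` is customized, adjust the pattern. `find_requests` and `status_summary` are hypothetical helper names, not part of any AEM or Dispatcher tooling.

```python
import re

# Assumed NCSA common/combined layout, e.g.
# 10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /etc/clientlibs/site.js HTTP/1.1" 404 512 "-" "Mozilla/5.0"
LOG_LINE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def find_requests(lines, path_fragment):
    """Yield parsed entries whose requested URL contains path_fragment; unparseable lines are skipped."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match and path_fragment in match.group("url"):
            entry = match.groupdict()
            entry["status"] = int(entry["status"])
            yield entry


def status_summary(entries):
    """Count how often each HTTP status code was returned."""
    counts = {}
    for entry in entries:
        counts[entry["status"]] = counts.get(entry["status"], 0) + 1
    return counts
```

Typical use is `status_summary(find_requests(open(path), "/etc/clientlibs/site.js"))` against the web server log and then the publish log. If a request shows up in the first log but not the second, Dispatcher likely served it from cache or blocked it. If it doesn’t show up in the web server log at all, the problem sits in a layer in front of the web server.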

At this point, if you determine that the web server/Dispatcher is the problem, the cause is normally one of the following: an issue with the filter rules in dispatcher.any, a stale cache, a web server configuration that changes the URL when it shouldn’t (or fails to change it when it should), or a web server that isn’t forwarding the request to Dispatcher for handling.
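Filter problems are easier to reason about once you remember that Dispatcher evaluates the `/filter` rules in dispatcher.any in order, and when several rules match a request, the last matching rule wins. The following Python sketch is a deliberately simplified model of that behavior, not Dispatcher’s actual matcher. Real rules can match on many more properties (`/path`, `/selectors`, `/extension`, `/query`, regular expressions, and so on), and the rule list below is a hypothetical example. The sketch treats a request that matches no rule as denied, which is the effect of the usual deny-everything first rule.

```python
from fnmatch import fnmatchcase

# Hypothetical, simplified filter list: (type, method glob, URL glob).
RULES = [
    ("deny", "*", "*"),                      # deny everything by default
    ("allow", "GET", "/content/*"),          # allow page requests
    ("allow", "GET", "/etc/clientlibs/*"),   # allow client libraries
    ("deny", "*", "/content/*.json"),        # but block JSON renderings under /content
]


def is_allowed(method, url, rules=RULES):
    """Return True if the LAST rule matching the request is an allow rule."""
    verdict = "deny"  # no matching rule: treated as denied in this sketch
    for rule_type, method_glob, url_glob in rules:
        # fnmatch's "*" also matches "/", so "/content/*" covers nested paths
        if fnmatchcase(method, method_glob) and fnmatchcase(url, url_glob):
            verdict = rule_type
    return verdict == "allow"
```

With these rules, `GET /content/site/en.html` is allowed, but `GET /content/site/en.model.json` is denied because the later JSON rule overrides the earlier `/content/*` allow. That ordering effect is a common reason a file works on publish but not through Dispatcher.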

If, however, you determine that the request is getting through the web server/Dispatcher and being sent back to the publish server, then you are on to Step 3.

Step 3 – What’s Happening on the Publish Server

Generally, if you get to this point, you have a fairly straightforward problem to solve (and generally the original symptom was that it works in author but not in publish). The questions you are looking at are:

  • If it’s a 404, why? Does the node not exist on the publish server, is the URL wrong for some reason, or do you have some sort of security issue?
  • If it’s a 403, why? Beyond simply misconfigured security permissions, the common cause here is the Apache Sling Referrer Filter configuration, which can block requests such as POSTs that don’t come from an allowlisted domain.
  • If it’s a 500, read the error logs. This applies to all of these questions: if you’re stumped, always read the error logs.
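Reading the publish error.log is easier when each error is kept together with its stack trace. The following is a minimal Python sketch, assuming entries begin with AEM’s usual `DD.MM.YYYY HH:MM:SS.mmm *LEVEL* [thread]` prefix and that the request line appears in the thread name on that first line. Check both assumptions against your own logs. `error_entries` is a hypothetical helper name.

```python
import re

# Assumed error.log entry layout, e.g.
# 10.10.2023 13:55:37.456 *ERROR* [10.0.0.1 [1696945537000] GET /content/site/en.html HTTP/1.1] com.example.Foo message
ENTRY_START = re.compile(
    r"^\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}\.\d{3} \*(?P<level>[A-Z]+)\*"
)


def error_entries(lines, path_fragment, levels=("ERROR",)):
    """Group each entry with its continuation lines (e.g. stack traces) and
    return the entries at the given levels whose first line mentions path_fragment."""
    entries, current = [], None
    for line in lines:
        line = line.rstrip("\n")
        match = ENTRY_START.match(line)
        if match:
            current = {"level": match.group("level"), "lines": [line]}
            entries.append(current)
        elif current is not None:
            current["lines"].append(line)  # continuation line, e.g. "\tat com.example..."
    return [
        "\n".join(entry["lines"])
        for entry in entries
        if entry["level"] in levels and path_fragment in entry["lines"][0]
    ]
```

Passing the failing page or resource path as `path_fragment` narrows a large log down to the errors raised while rendering that request, with their stack traces attached.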

Cutting to the Chase

That’s a pretty extensive list. Once you have gone through it a time or two, you start to develop some shortcuts for identifying the problem layer, which is good. I usually try to bracket the problem before I start digging:

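  • Do I get the same results when I hit the web server vs. hitting the publish server directly on port 4503?
  • Do I get the same results cached vs. uncached (usually by adding a question mark to the URL, since Dispatcher by default does not cache requests that carry a query string)?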
  • Do I get the same results when I request the file by itself vs. when it’s downloaded with the page?
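The first two bracketing questions can be scripted. The following Python sketch requests the same path through the web server, through the web server with a `?` appended, and directly from publish on port 4503. It then reports which of status, Content-Type, and body hash disagree. The hostnames are placeholders. The cache-busting trick also assumes Dispatcher is not configured to ignore that query string. The third question (file by itself vs. as part of the page) still needs the browser.

```python
import hashlib
import urllib.error
import urllib.request

# Placeholder hosts: replace with your web server and publish instance.
WEB_SERVER = "http://www.example.com"
PUBLISH = "http://localhost:4503"


def fetch(url):
    """Return (status, content_type, sha256 of body); error responses are captured, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.headers.get("Content-Type"), hashlib.sha256(resp.read()).hexdigest()
    except urllib.error.HTTPError as err:  # 4xx/5xx still carry headers and a body
        return err.code, err.headers.get("Content-Type"), hashlib.sha256(err.read()).hexdigest()
    except urllib.error.URLError as err:   # host unreachable: itself a clue about the layer
        return "unreachable: %s" % err.reason, None, None


def differences(results):
    """Given {label: (status, content_type, digest)}, return only the fields that disagree."""
    diffs = {}
    for i, field in enumerate(("status", "content_type", "body")):
        values = {label: result[i] for label, result in results.items()}
        if len(set(values.values())) > 1:
            diffs[field] = values
    return diffs


def bracket(path):
    """Fetch one path via the web server (cached and uncached) and directly from publish."""
    return differences({
        "web server": fetch(WEB_SERVER + path),
        "web server, uncached": fetch(WEB_SERVER + path + "?"),
        "publish 4503": fetch(PUBLISH + path),
    })
```

For example, `bracket("/etc/clientlibs/site.js")` returning an empty dict means all three routes agree. A difference only in the cached column points at a stale Dispatcher cache, and a difference between the web server and publish columns points at Dispatcher or web server configuration.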

The answer to one of those questions will usually point me to the right layer and cut down the number of steps. That said, watch for the point where you start spinning your wheels. Walking through the full list methodically may seem like a waste of time, but once you stop making progress it is usually worth doing.