Introduction
In EPiServer CMS you can track broken links on a website using the Link Validator scheduled job. This document describes how the job works and how it can be configured.
The Link Validator scheduled job will go through all the
links in tblContentSoftLink, make a HEAD request against each one,
and save the link's status back to tblContentSoftLink.
The result of the validation job is available as a report called Link
Status report in Report Center.
The scheduled job will first get a batch of links from tblContentSoftLink,
a maximum of 1000. Only links that are unchecked or checked earlier than the
time when the job started will be returned. The job uses the date the link was
last checked and the re-check interval to determine if it's time for the link to
be checked again.
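The selection rule described above can be sketched in Python as follows. This is only an illustration, not EPiServer's implementation: the `Link` class, the `is_due` and `next_batch` helpers, and the in-memory list standing in for tblContentSoftLink are all hypothetical, and applying the re-check interval to every link regardless of its last status is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

BATCH_SIZE = 1000  # maximum batch size stated in the text


@dataclass
class Link:
    """Hypothetical stand-in for one row of tblContentSoftLink."""
    url: str
    last_checked: Optional[datetime] = None  # None means never checked


def is_due(link: Link, job_started: datetime, recheck_interval: timedelta) -> bool:
    """Return True if the link should be checked in this run."""
    if link.last_checked is None:
        return True  # unchecked links are always due
    if link.last_checked >= job_started:
        return False  # already checked during this run
    # Assumption: the interval is measured from the last check to the job start.
    return job_started - link.last_checked >= recheck_interval


def next_batch(links: List[Link], job_started: datetime,
               recheck_interval: timedelta,
               batch_size: int = BATCH_SIZE) -> List[Link]:
    """Return at most batch_size links that are due for checking."""
    due = [link for link in links if is_due(link, job_started, recheck_interval)]
    return due[:batch_size]
```

With a 30-day interval, a link checked yesterday is skipped, while a link that was never checked or was last checked two months ago is returned.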
Each of the links in the batch will be checked using a HEAD request, provided
that the server's robots.txt allows it. No host will be checked more than once every
five seconds. If a link is on a host that has been checked in the last five
seconds, the job will wait until five seconds have passed and then check the link.
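The per-host delay can be illustrated with a small Python sketch. The `HostThrottle` class and its `wait_for` method are hypothetical names introduced here, not part of EPiServer. The clock and sleep functions are injectable so the behavior can be tried without actually waiting, and identifying a host by the URL's network location is an assumption.

```python
import time
from urllib.parse import urlsplit

MIN_HOST_INTERVAL = 5.0  # seconds between requests to the same host, per the text


class HostThrottle:
    """Hypothetical helper that spaces out requests to the same host."""

    def __init__(self, min_interval=MIN_HOST_INTERVAL,
                 clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # host -> time of the most recent request

    def wait_for(self, url):
        """Block until the URL's host may be contacted; return seconds waited."""
        host = urlsplit(url).netloc.lower()
        waited = 0.0
        last = self._last_request.get(host)
        if last is not None:
            remaining = self._min_interval - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        # Record the time at which the request to this host is about to be made.
        self._last_request[host] = self._clock()
        return waited
```

Calling `wait_for` twice for the same host two seconds apart makes the second call wait the remaining three seconds, while a link on a different host is not delayed.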
The status of the link, including the HTTP status code if available, will be saved
back to tblContentSoftLink. The date the link was checked will
also be saved. For broken links, the date when they were first found
broken will be saved as well. When the first batch of links has been checked, a new
batch will be fetched from the database.
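The bookkeeping described above might look roughly like the following Python sketch. The `LinkStatus` fields are hypothetical and do not reflect the actual columns of tblContentSoftLink. Treating a missing response or a status code of 400 and above as broken, and clearing the first-broken date once a link works again, are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class LinkStatus:
    """Hypothetical record of what is stored for each link."""
    url: str
    http_status: Optional[int] = None       # None if no HTTP response was received
    last_checked: Optional[datetime] = None
    first_broken: Optional[datetime] = None  # set only while the link is broken


def record_result(link: LinkStatus, http_status: Optional[int],
                  checked_at: datetime) -> bool:
    """Store the outcome of one check; return True if the link is broken."""
    link.http_status = http_status
    link.last_checked = checked_at
    # Assumption: no response, or any 4xx/5xx status, counts as broken.
    broken = http_status is None or http_status >= 400
    if broken:
        if link.first_broken is None:
            link.first_broken = checked_at  # keep the earliest failure date
    else:
        link.first_broken = None  # assumption: a working link resets the date
    return broken
```

Keeping the earliest failure date unchanged across repeated failures makes it possible to tell how long a link has been broken.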
The job will continue until it is not possible to get any more unchecked links
from the database, or the job's run time has exceeded the value set in
maximumRunTime. The job will also stop if a large number of consecutive
errors are found on external links, since this may indicate a general network problem with
the server running the site.
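The three stop conditions can be summarized in a short Python sketch. The `run_job` function and its parameters are hypothetical, and for simplicity the sketch treats every link as external and checks the run time before each link, which are assumptions rather than documented behavior.

```python
import time


def run_job(fetch_batch, check_link, maximum_run_time, error_threshold,
            clock=time.monotonic):
    """Process batches until one of the stop conditions is met.

    fetch_batch() returns a list of URLs (empty when nothing is left);
    check_link(url) returns True if the link works.
    Returns (reason for stopping, number of links checked).
    """
    started = clock()
    consecutive_errors = 0
    checked = 0
    while True:
        batch = fetch_batch()
        if not batch:
            return "no more links", checked
        for url in batch:
            if clock() - started > maximum_run_time:
                return "maximum run time exceeded", checked
            ok = check_link(url)
            checked += 1
            # A working link resets the count of consecutive errors.
            consecutive_errors = 0 if ok else consecutive_errors + 1
            if consecutive_errors > error_threshold:
                return "too many consecutive errors", checked
```

With an error threshold of 2, for example, the sketch aborts on the third failure in a row, which corresponds to "more than the configured value" of consecutive errors.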
Configuring the Link Validator
None of the settings are required; they are available to customize the
behavior of the link validation job. The <linkValidator>
node should be added as a child of the <episerver> node in the
web.config file.
Example:
XML
<linkValidator
    externalLinkErrorThreshold="10"
    maximumRunTime="4:00:00"
    recheckInterval="30.00:00:00"
    userAgent="EPiServer LinkValidator"
    proxyAddress="http://myproxy.mysite.com"
    proxyUser="myUserName"
    proxyPassword="secretPassword"
    proxyDomain=".mysite.com"
    internalLinkValidation="Api">
  <excludePatterns>
    <add regex=".*doc"/>
    <add regex=".*pdf"/>
  </excludePatterns>
</linkValidator>
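The values of maximumRunTime and recheckInterval in the example appear to follow the .NET TimeSpan string format ([days.]hours:minutes:seconds), so "4:00:00" would be four hours and "30.00:00:00" thirty days. The Python sketch below parses that subset of the format to make the values easier to read. The `parse_timespan` helper is hypothetical and does not cover fractional seconds or negative values.

```python
import re
from datetime import timedelta

# Matches an optional "days." prefix followed by hours:minutes:seconds.
_TIMESPAN = re.compile(
    r"^(?:(?P<days>\d+)\.)?(?P<hours>\d{1,2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})$"
)


def parse_timespan(value: str) -> timedelta:
    """Parse a '[d.]hh:mm:ss' string into a timedelta."""
    match = _TIMESPAN.match(value)
    if match is None:
        raise ValueError(f"Unsupported time span: {value!r}")
    return timedelta(
        days=int(match["days"] or 0),  # the days part is optional
        hours=int(match["hours"]),
        minutes=int(match["minutes"]),
        seconds=int(match["seconds"]),
    )
```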
To configure the behavior of the link
validation job, you have the following options:
- externalLinkErrorThreshold.
If more than the configured number of consecutive errors occur on external
links, the job will abort.
- maximumRunTime.
The maximum time the scheduled job will execute.
- recheckInterval.
A link that has been validated as working will not be rechecked until the
configured time span has elapsed.
- userAgent.
The user agent string to use when validating a link.
- proxyAddress.
Web proxy address for the link checker to use when validating links.
- proxyUser.
Web proxy user for authenticating proxy connection.
- proxyPassword.
Web proxy password to authenticate the proxy connection.
- proxyDomain.
Web proxy domain to authenticate the proxy connection.
- internalLinkValidation.
How the link validator should handle internal links. Possible values:
  - Off. Internal links will be ignored.
  - Api. The internal API will be used to validate that the
    referenced page exists (default).
  - Request. Internal links will be validated the same way as external links,
    using a HEAD request.
- excludePatterns.
A list of patterns for links that the link validation job will skip. Use the
regex attribute to identify what links to skip.
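To see what the exclude patterns from the example would skip, the following Python sketch applies them to a few URLs. It assumes that a pattern excludes a link when it matches anywhere in the URL; how EPiServer actually applies the regex is not stated here, and Python's regex dialect differs from .NET's in some details. The `is_excluded` helper and the sample URLs are hypothetical.

```python
import re

# The two patterns from the configuration example above.
EXCLUDE_PATTERNS = [r".*doc", r".*pdf"]


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if any pattern matches somewhere in the URL (assumed semantics)."""
    return any(re.search(pattern, url) for pattern in patterns)


# Under the assumed semantics, unanchored patterns also match URLs that
# merely contain "doc" or "pdf" somewhere in the path.
print(is_excluded("http://example.com/files/report.pdf"))  # a PDF file
print(is_excluded("http://example.com/docs/index.html"))   # path contains "doc"
```

If only file extensions should be skipped, a pattern anchored to the end of the URL, such as `\.(doc|pdf)$`, would be more selective under the same assumption.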
Known Limitations
The link validator does not handle private resources with the exception of
pages. This includes documents and images stored on a local VPP that does not
allow anonymous access. If forms authentication is used, these links will never
be validated and are never shown in the link report. If basic or Windows
authentication is used, links to these resources will result in 401 (access
denied) in the link report. This may be the case for an intranet site with
Windows authentication and anonymous access disabled.