LinuxFocus.org Mirror Guideline

How to mirror the data?

Use rsync (see http://rsync.samba.org/) to mirror the LinuxFocus server. rsync minimizes network traffic and mirror time, and it is the fastest and easiest way to keep your site up to date. Do not use a web crawler or FTP: those methods are slow and put a lot of load on the main server.

Take a look at the following script; you should use a script like it to mirror LinuxFocus. Note that the domain to mirror from is rsync.linuxfocus.org, not www.linuxfocus.org.

#!/bin/sh
# Please contact guido@linuxfocus.org if you have
# any questions.
#-----------------------------------------
## put something like this into the crontab of a user who has write
## permissions on your web-server:
## run the synclf script every day at 2:33 in the night:
#33 2 * * * /home/xxx/synclf
#
#-----------------------------------------
# ensure that your webserver can read new files:
umask 022
#-----------------------------------------
# the directory of the LinuxFocus mirror page (please edit this line):
target=/http/linuxfocus
#
# You can uncomment the following line for debug purposes:
#DEBUG="yes"
#
if [ "$DEBUG" = "yes" ]; then
    echo "debug output will be written to the file /tmp/synclf.$$ ..."
    echo "start rsync with rsync.linuxfocus.org" > /tmp/synclf.$$
    date  >> /tmp/synclf.$$
    rsync -rLtz -vv --delete rsync.linuxfocus.org::lf/ "$target" >> /tmp/synclf.$$ 2>&1
    exit 0
fi
# Normally (debug off) the following will be executed:
#
rsync -rLtz --delete rsync.linuxfocus.org::lf/ "$target"
# 
#-------------- End of rsync script ---------------
# You can get rsync at ftp://rsync.samba.org/pub/rsync/
# or http://rsync.samba.org/
#

The above script is an example for downloading the dynamic HTML pages. As an alternative, you can get static pages from rsync.linuxfocus.org::statichtml/ instead of rsync.linuxfocus.org::lf/.
You do not need to configure SSI and Perl if you use only the static pages.
The static pages should also be used if you want to produce a CD-ROM from LinuxFocus. In the static pages, the SSI directives are already expanded and all links to directories are modified to point to the right index file (index.html or index.shtml).
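
As a rough sketch of that choice, the only thing that changes in the script above is the rsync module name. The helper `lf_source` below is made up for illustration and is not part of the original script; it only accepts the two module names mentioned on this page.

```shell
#!/bin/sh
# lf_source MODULE: print the rsync source for a LinuxFocus module.
# (illustrative helper, not part of the original synclf script)
#   lf         = dynamic pages (needs SSI and Perl on your server)
#   statichtml = static, SSI-expanded pages
lf_source() {
    case "$1" in
        lf|statichtml)
            echo "rsync.linuxfocus.org::$1/" ;;
        *)
            echo "unknown module: $1" >&2
            return 1 ;;
    esac
}

# Possible use in the mirror script (uncomment to run for real):
# rsync -rLtz --delete "$(lf_source statichtml)" "$target"
```

Keeping the module name in one place makes it harder to mistype the source when switching between the dynamic and the static tree.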

When to mirror

You should mirror LinuxFocus once a day during the low-traffic hours between 23:00 and 5:00 (UTC / GMT).

Create a text file, called crontab.txt, with the following data (please vary the time a bit):

# run the synclf script every day at 2:45 in the night:
45 2 * * * /home/where/ever/you/put/it/synclf

and then activate it with the command
crontab crontab.txt
Run crontab -l to see that it is active.
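
If you would rather not pick the time by hand, the following sketch prints a crontab line with a randomly chosen start time between 23:00 and 4:59. The function name `random_cron_line` is an assumption made for this example. Note that cron normally interprets times in the local time zone of your machine, so shift the hour if your server does not run on UTC.

```shell
#!/bin/sh
# random_cron_line SCRIPT: print a crontab entry that runs SCRIPT once a
# day at a random time between 23:00 and 04:59.
# (illustrative helper, not part of the original guideline)
random_cron_line() {
    awk -v script="$1" 'BEGIN {
        srand()                       # seed from the current time
        slot   = int(rand() * 6)      # 0..5 -> hours 23, 0, 1, 2, 3, 4
        hour   = (23 + slot) % 24
        minute = int(rand() * 60)     # 0..59
        printf "%d %d * * * %s\n", minute, hour, script
    }'
}

random_cron_line /home/where/ever/you/put/it/synclf
```

Redirect the output into crontab.txt and activate it with crontab crontab.txt as described above.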

Setting up the webserver: SSI

There are basically two ways to set up your mirror:
  1. Sync against the dynamic pages. In that case you need to follow the description for server-parsed pages below.
  2. Sync the static HTML pages that are generated periodically on the LinuxFocus server. In that case you do not need any server-side parsing, but your server needs to serve .shtml files as normal HTML (you just need a MIME type definition for .shtml).
The linuxfocus.org web site was once set up so that a standard Apache web server, as shipped with most Linux distributions, would work without any configuration changes. This was true for Apache 1.3.x. Over time, the distributions and Apache itself have changed a lot, so these days you need to edit the configuration to get it to work.

Check that you have Server Side Includes (SSI) enabled for .shtml files if you sync the dynamic html files:
# To use server-parsed HTML files
AddType text/html .shtml
AddHandler server-parsed .shtml
Both the #exec command and #include must be enabled (see http://www.apache.org/docs and search for "Server Side Includes"). You need the directory option +Includes, and the module mod_include must be loaded.

The #exec command is needed because the LinuxFocus web pages execute a Perl script called lfdynahead.pl. This script sets the links between the different languages. You can take a look at it if you want. It is in the document root directory of linuxfocus.org.

You can tell that SSI is working if the articles show a line at the top that says "This article is available in:.....", as in the following picture:
[SSI links]
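
The same check can be sketched from the command line. The function `ssi_ok` below is a made-up helper: it reads a page on standard input and succeeds only if the "This article is available in" line is present and no raw SSI directive (`<!--#...`) is left in the output, which is what a server without SSI would deliver. The URL in the usage comment is a placeholder for an article on your own mirror.

```shell
#!/bin/sh
# ssi_ok: read an HTML page on standard input.
# Succeeds if the language bar is present and no unprocessed SSI
# directive remains. (illustrative helper, not part of LinuxFocus)
ssi_ok() {
    page=$(cat)
    # The language bar is produced by the #exec of lfdynahead.pl
    printf '%s\n' "$page" | grep -q 'This article is available in' || return 1
    # A raw "<!--#" means the server sent the file without parsing it
    if printf '%s\n' "$page" | grep -q '<!--#'; then
        return 1
    fi
    return 0
}

# Usage (placeholder URL):
# curl -s http://mirror.example.org/path/to/article.shtml | ssi_ok && echo "SSI works"
```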

Perl for SSI

LinuxFocus web pages use the #exec SSI command to execute Perl from /usr/bin/perl. Any Perl 5 (or higher) version will work.

/usr/bin/perl is the standard path to Perl under Linux. Any common Linux distribution will have Perl in that location.

Setting up the webserver: Apache 2.0.x DirectoryIndex

It seems that most default configurations for Apache 2.0.x do not search for index.shtml as DirectoryIndex. Add index.shtml if needed:
DirectoryIndex index.html index.shtml

Setting up the webserver: charset

Netscape, Internet Explorer and many other browsers do not handle the HTML tag "META HTTP-EQUIV" correctly.

<META http-equiv="Content-Type" content="text/html; charset=....">
It is almost impossible to work around this problem. It does not affect the normal Latin 1 encoding (iso-8859-1) but Chinese, Russian and other languages need to have the correct encoding.

For this purpose we use .htaccess files in the respective language directories. You must set up your webserver to process these files. For Apache, add the following to the file httpd.conf:
AccessFileName .htaccess

AllowOverride FileInfo
# or: AllowOverride All 
# but FileInfo is the minimum
Our .htaccess files (you don't need to worry about them, this is just for your information) contain something like:
AddType  "text/html; charset=gb2312"  .html
AddType  "text/html; charset=gb2312"  .shtml
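
To see whether your server honors these files, you can look at the Content-Type header it sends. The following is a minimal sketch: `header_charset` is a made-up helper that reads HTTP response headers on standard input and prints the charset parameter, if any. The URL in the usage comment is a placeholder for a page in one of the non-Latin-1 language directories of your mirror.

```shell
#!/bin/sh
# header_charset: read HTTP response headers on standard input and print
# the charset parameter of the Content-Type header (nothing if absent).
# (illustrative helper, not part of LinuxFocus)
header_charset() {
    tr -d '\r' |
        grep -i '^content-type:' |
        sed -n 's/.*charset=\([^; ]*\).*/\1/p' |
        head -n 1
}

# Usage (placeholder URL); for the example above it should print gb2312:
# curl -sI http://mirror.example.org/path/to/chinese/page.html | header_charset
```

If nothing is printed for such a page, the .htaccess files are probably not being processed; check the AllowOverride setting.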

More Questions?

Just contact us (see http://linuxfocus.org/common/lfteam.html). Let us know when your server is up and running and we can add you to the list of mirror sites.