
Pre-Launch Checklist

This article was taken from the May 2005 issue of PHP Architect.

Control Search Engine Indexing
Do you have a robots.txt in your webroot?
All web sites should have a valid robots.txt file even
if every page on the site is meant to be indexed:
User-agent: *
Disallow:
Without the file, every search crawler that tries to
read it adds a 404 'robots.txt' not found entry to your
web server error logs, and those entries add up
quickly.
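
One way to confirm that a robots.txt like the one above really permits crawling is to parse it the way a well-behaved crawler would. The following is a minimal sketch using Python's standard urllib.robotparser module; the function name allows_crawling and the example URL are illustrative assumptions, not part of the original article.

```python
from urllib.robotparser import RobotFileParser

# The allow-all file from the checklist item above.
ALLOW_ALL = """User-agent: *
Disallow:
"""


def allows_crawling(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # An empty Disallow line means nothing is off limits.
    print(allows_crawling(ALLOW_ALL, "http://YOURSITE.com/index.php"))
```

The same helper can be pointed at a stricter file (for example one containing "Disallow: /") to check that it blocks what you expect before launch.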

Favicon
Does your site have a favicon file?
The icon itself is of questionable value, since it is
rarely even noticed, but if you omit it, each visitor
to your site adds another 404 'favicon.ico' not found
error message to your log files. This is similar to the
robots.txt issue above but considerably more severe:
only bots and search crawlers request robots.txt,
whereas the browser of every visitor requests the
favicon.

If you are logging to the same hard disk that
serves the web site (not an ideal practice), then this
will also cause a minor performance penalty as the disk
needs to write to the log file for each new visitor in
between serving the files on your web site.
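
To gauge how much log noise the two missing files generate, something like the following could tally the 404 entries. This is a rough sketch that assumes log lines in the Apache common log format; the sample lines, IP addresses and the function name count_missing are hypothetical.

```python
import re
from collections import Counter

# Matches the request path and status code in Apache "common" log format lines.
LOG_PATTERN = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3})')


def count_missing(log_lines, paths=("/robots.txt", "/favicon.ico")):
    """Count 404 responses for each of the watched paths."""
    counts = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match and match["status"] == "404" and match["path"] in paths:
            counts[match["path"]] += 1
    return counts


# Hypothetical log excerpt, for illustration only.
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Jan/2009:10:00:00 +0000] "GET /favicon.ico HTTP/1.1" 404 209',
    '10.0.0.2 - - [01/Jan/2009:10:00:05 +0000] "GET /robots.txt HTTP/1.1" 404 208',
    '10.0.0.3 - - [01/Jan/2009:10:00:09 +0000] "GET /favicon.ico HTTP/1.1" 404 209',
    '10.0.0.3 - - [01/Jan/2009:10:00:10 +0000] "GET /index.php HTTP/1.1" 200 5120',
]

if __name__ == "__main__":
    for path, hits in count_missing(SAMPLE_LOG).items():
        print(path, hits)
```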

Minify JS and CSS files
Are your external files as small as they can get?
Minification is the act of stripping out whitespace
and comments as well as applying other space-saving
techniques such as consolidating CSS statements
and renaming JavaScript variables to minimize size
(e.g. variables showAdvancedSearch and isModernBrowser
might become v1 and v2). This typically reduces the
file size by about 15%, although the amount varies
depending on the minification scheme as well as your
commenting and whitespace habits. It is critical to
always verify that your minified JavaScript still works
in IE; the JavaScript engine in IE requires a space after
certain statement types that Firefox executes
flawlessly even when one character directly follows the
next.
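
To make the whitespace-and-comment part of minification concrete, here is a deliberately naive sketch for CSS. It does not rename variables or consolidate rules, and it is not safe for stylesheets whose string values contain comment markers or significant spacing; a real project would use a dedicated minifier. The function names are illustrative assumptions.

```python
import re


def minify_css(css: str) -> str:
    """Naively minify CSS: drop comments and unnecessary whitespace."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # strip /* comments */
    css = re.sub(r"\s+", " ", css)                        # collapse whitespace runs
    css = re.sub(r"\s*([{};,])\s*", r"\1", css)           # no spaces around punctuation
    css = re.sub(r":\s+", ":", css)                       # no spaces after a colon
    css = css.replace(";}", "}")                          # last semicolon is optional
    return css.strip()


def savings_percent(original: str, minified: str) -> float:
    """Return how much smaller the minified text is, as a percentage."""
    return 100.0 * (1 - len(minified) / len(original))


if __name__ == "__main__":
    source = "/* page body */\nbody {\n  color: red;\n}\n"
    small = minify_css(source)
    print(small, round(savings_percent(source, small), 1))
```

Measuring the savings on your own files this way shows how far your result is from the typical figure quoted above.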

Consolidate JavaScript and CSS
Are you avoiding inline JavaScript and CSS? Are you
serving just one JS and one CSS file?
Moving your inline JavaScript and CSS into external
files allows the browser to cache the content.
Consolidating them all into one JS and one CSS file
also reduces the number of requests needed to load
the page. This is the best-practice way of doing things,
but in the real world it is not always the optimal way,
especially if many of your pages have JS or CSS that is
solely used on one page of your web site.
Placing JS and CSS inline can often yield a faster
page load if adding a few bytes of un-cached inline
code eliminates an HTTP request or considerably shrinks
the size of the consolidated global file.

Search Submission
Do you have a valid XML sitemap? Have you submitted
your site for search engine indexing?
Register your domain in Google Webmaster Tools, create
the validation file as instructed by Google and then
create an XML sitemap as described in the Sitemaps
section of Webmaster Tools. Listing 3 shows an example
of how to easily generate a sitemap in PHP. The XML
that gets output can be seen in Listing 4. After launch,
you will need to log into Webmaster Tools and tell
Google the URL of the sitemap for crawling. This will
get Google to queue indexing of your new site so that
it will show up in the Google search results.
Next, add your sitemap to your robots.txt. The Sitemap
directive requires the full URL, not a relative path:
User-agent: *
Disallow:
Sitemap: http://YOURSITE.com/sitemap
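
As an illustration of the XML a sitemap contains, here is a minimal sketch in Python. It is not the article's PHP listing; the function name build_sitemap and the page URLs are assumptions, and only the required loc element of the Sitemaps protocol is emitted.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls) -> str:
    """Return a sitemap XML document listing each URL in a <loc> element."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # special characters are escaped for us
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder pages; a real site would list its own URLs here.
    print(build_sitemap(["http://YOURSITE.com/", "http://YOURSITE.com/contact"]))
```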
Then submit the site to Yahoo! Directory, Yahoo!
Search, MSN and DMOZ. To conquer the hundreds of
other search engines, I find it easiest to use a
URL-submission service which submits the site to many
engines with a few quick clicks.

Trim Down Headers
Do your server responses consist only of valuable
data?
Turn off expose_php in your php.ini file, and remove
any unnecessary response headers to reduce the size of
each response. Also, check the site with YSlow and follow
its suggestions.
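
With expose_php enabled, PHP adds an X-Powered-By header to every response. A small sketch like the following could flag such headers in a captured set of response headers; the header list and the function name find_unneeded_headers are assumptions to adapt to your own stack.

```python
# Headers that advertise software versions without helping the client.
# This list is an illustrative assumption; extend it for your own server.
UNNEEDED_HEADERS = ("x-powered-by",)


def find_unneeded_headers(headers: dict) -> list:
    """Return the names of response headers that could likely be removed."""
    return [name for name in headers if name.lower() in UNNEEDED_HEADERS]


if __name__ == "__main__":
    # Hypothetical response headers captured from a test request.
    sample = {"Content-Type": "text/html", "X-Powered-By": "PHP/5.2.9"}
    print(find_unneeded_headers(sample))
```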

Compress Throughput
Have you enabled gzip compression on your web
server?
Gzip greatly improves your network throughput at the
cost of a negligible amount of CPU overhead. Listing
2 contains an example of enabling gzip within the
Apache configuration file.
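
To get a feel for how much gzip can save on text content, the following sketch compresses a payload with Python's standard gzip module and reports the reduction. The sample markup is made up and highly repetitive, so real pages will usually compress less well.

```python
import gzip


def gzip_savings(payload: bytes, level: int = 6) -> float:
    """Return the percentage of bytes saved by gzip-compressing `payload`."""
    compressed = gzip.compress(payload, compresslevel=level)
    return 100.0 * (1 - len(compressed) / len(payload))


# Hypothetical, repetitive HTML used only to demonstrate the measurement.
SAMPLE_HTML = b"<ul>" + b'<li class="item">Pre-launch checklist entry</li>' * 200 + b"</ul>"

if __name__ == "__main__":
    print(f"{gzip_savings(SAMPLE_HTML):.1f}% smaller")
```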

Domain Distribution
Are the external files accessed by your site (images, JS,
CSS, etc.) distributed over four domains?
This may seem strange, but four is a magic number!
Many browsers, by default, will not make more than
two simultaneous connections to a single domain, but
they will make many connections to different domains
simultaneously. If all your content is served from a
single domain, then requests back up into a queue
as the browser fetches two files at a time. Spreading
your files over several domains (sub-domains such as
images.yourdomain.com work too) resolves the issue, but
at the cost of a slight delay as each domain needs to be
resolved before any requests can be sent to it. In most
situations, four sources of data is the ideal balance
between simultaneous-request limits and domain
resolution latency.
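
When spreading files over several hosts, each file should always be served from the same host, otherwise the browser cache is defeated. One way to achieve that is to derive the host from a checksum of the file path, as in this sketch. The four host names are hypothetical and would all need to point at the same static files.

```python
import zlib

# Hypothetical sub-domains, all serving the same set of static files.
ASSET_HOSTS = [
    "static1.yourdomain.com",
    "static2.yourdomain.com",
    "static3.yourdomain.com",
    "static4.yourdomain.com",
]


def asset_url(path: str, hosts=ASSET_HOSTS) -> str:
    """Map a file path to one host, always the same host for a given path."""
    index = zlib.crc32(path.encode("utf-8")) % len(hosts)
    return "http://" + hosts[index] + path


if __name__ == "__main__":
    for name in ("/img/logo.png", "/css/site.css", "/js/site.js"):
        print(asset_url(name))
```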

RAM
Are you serving every static element of your site from
your server’s RAM?
The most-effective way to maximize your server’s
traffic-handling capabilities is to avoid disk IO as much
as possible. Serve all your static files from a RAM disk.
Since RAM data will be erased if the server reboots or
the OS crashes, you will need to create a shell/batch
script that automatically restores the files from the
hard disk upon reboot.
The following Windows DOS commands will restore
files to the RAM disk R: from a HDD-based location on
the C: drive.
mkdir R:\your_site
xcopy C:\your_site R:\your_site /E/H
Put the above commands into a text file with a
.bat extension, and then create a shortcut to the file
in your Windows Startup folder. The folder's location
varies between Windows versions, but you do not need
the path: just double-click the Startup folder in
the Start Menu to open it.
MySQL users can also create tables using the memory
engine which stores the data in RAM. This is ideal for
data that rarely changes such as a lookup table that
indexes country codes. Just like with the RAM disk, the
data in a memory table will be lost upon reboot so you
need to create a SQL script to repopulate your memory
tables from disk-based tables.
INSERT INTO countries_memory SELECT * FROM countries_innodb;
You can then set the file to execute automatically
on reboot by adding the init-file directive to your
my.ini file (or my.cnf on *nix). While you are editing
the MySQL configuration, you should also be sure that
query cache is enabled and that the cache memory limits
are set appropriately for your hardware (the default
settings were created many years ago, and do not take
advantage of modern computing power).
For all dynamic data that is suitable for caching
(both file-based data and database queries), use
memcached or another caching package. With memcached,
you gain the additional benefit of being able to store
your PHP session data in memcached as opposed to in
files (the default) or in the database (an option best
avoided).
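
The usual way to put such a cache in front of slow data is the cache-aside pattern: look in the cache first, and only on a miss compute the value and store it with an expiry time. The sketch below shows the pattern with a plain in-process dictionary standing in for memcached; the class and function names are assumptions, and a real deployment would swap in an actual memcached client.

```python
import time


class DictCache:
    """Stand-in for a memcached client: get/set with an expiry in seconds."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        value, expires_at = self._store.get(key, (None, 0.0))
        return value if time.monotonic() < expires_at else None

    def set(self, key, value, ttl):
        self._store[key] = (value, time.monotonic() + ttl)


def cached(cache, key, ttl, compute):
    """Return the cached value for `key`, computing and storing it on a miss."""
    value = cache.get(key)
    if value is None:
        value = compute()          # e.g. a slow database query
        cache.set(key, value, ttl)
    return value


if __name__ == "__main__":
    cache = DictCache()
    # The lambda stands in for an expensive lookup.
    print(cached(cache, "countries", 300, lambda: {"US": "United States"}))
```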

Preload Configuration Files
Are you preloading configuration files?
Apache users should preload all .htaccess files and
then disable on-the-fly .htaccess parsing. This saves a
whole lot of file IO on your web server but also requires
you to restart your web server whenever you want to
implement a change made to an .htaccess file. In the
section of your httpd.conf file that defines your site,
remove any existing AllowOverride statements and add
the following:
AllowOverride None
Include "/YOURSITE/.htaccess"
Restart the web server for the change to take effect. A
complete example can be seen in Listing 2.

Error Handling
Are PHP error messages being logged but not displayed
on the page? Is error reporting set to E_ALL or a
higher level?
Be sure that in the php.ini file of your production
server, log_errors is on, display_errors is off and
error_reporting is set to E_ALL or higher (ideally
E_ALL | E_STRICT). This is especially important if your
development and production environments are running
different versions of PHP or if any server-related
software configurations are not identical
between environments.
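
These three settings are easy to verify automatically before launch. The sketch below reads php.ini text with Python's configparser and reports deviations. It assumes the directives sit in a [PHP] section, that the file uses only syntax configparser understands, and it tests error_reporting with a naive substring check; the function name is hypothetical.

```python
import configparser

# Spellings that php.ini accepts for an enabled boolean directive.
ON_VALUES = {"on", "1", "true", "yes"}


def check_error_settings(ini_text: str) -> list:
    """Return a list of problems found in the [PHP] section of php.ini text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    php = parser["PHP"] if parser.has_section("PHP") else {}
    problems = []

    log_errors = php.get("log_errors")
    if log_errors is None or log_errors.strip().lower() not in ON_VALUES:
        problems.append("log_errors should be On")

    display_errors = php.get("display_errors")
    if display_errors is None or display_errors.strip().lower() in ON_VALUES:
        problems.append("display_errors should be Off")

    # Naive check: only confirms that E_ALL appears in the expression.
    if "E_ALL" not in (php.get("error_reporting") or ""):
        problems.append("error_reporting should include E_ALL")
    return problems


if __name__ == "__main__":
    sample = "[PHP]\nlog_errors = On\ndisplay_errors = On\nerror_reporting = E_ALL | E_STRICT\n"
    print(check_error_settings(sample))
```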

Security
Can passwords or back-end source code be accessed by
simply navigating to a URL?
Some common examples to check are:
• http://YOURSITE.com/.svn
• http://YOURSITE.com/.htpasswd
• http://YOURSITE.com/config.inc
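
Checks like these are easy to script so that they can be repeated after every deployment. The following sketch requests each path and reports the ones that answer with 200 OK. The path list mirrors the examples above; the function names and the injectable fetch parameter (useful for testing without network access) are assumptions.

```python
import urllib.error
import urllib.request

# Paths taken from the examples above; extend with your own sensitive files.
SENSITIVE_PATHS = ["/.svn", "/.htpasswd", "/config.inc"]


def http_status(url: str) -> int:
    """Fetch `url` and return the HTTP status code (needs network access)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 403, 404 and similar arrive as exceptions


def find_exposed(base_url: str, paths=SENSITIVE_PATHS, fetch=http_status) -> list:
    """Return the URLs that answer with 200 OK and therefore look exposed."""
    exposed = []
    for path in paths:
        url = base_url.rstrip("/") + path
        if fetch(url) == 200:
            exposed.append(url)
    return exposed
```

Calling find_exposed("http://YOURSITE.com") against the live server should return an empty list; anything else deserves a closer look.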

SSL (If Needed)
Did you install the SSL certificate? Can you access your
site via both http:// and https:// without receiving
any invalid-certificate alerts or similar warnings?
If your site will need to transmit data to the browser
securely over the HTTPS protocol, then you will need
to install an SSL certificate that is registered to your
site’s domain name and issued by a real certificate
authority (not self-signed). If you access your site over
SSL by any means other than the domain name (e.g.
IP address), then the browser will warn you about an
invalid certificate. If your domain does not yet point
to the server, then you will need to confirm that this
warning will not occur post-launch. The way to do
this is to open your operating system’s hosts file in
a text-editor (on Windows this is typically found at
C:\Windows\System32\drivers\etc\hosts), and add the
following to it on a new line:
127.0.0.1 YOURSITE.com
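This entry assumes you are browsing from the web server
itself; when testing from another machine, use the
server's IP address in place of 127.0.0.1. The hosts
file is usually loaded and cached upon browser startup,
so you will need to close any browser
windows that are already open and then open a new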
browser window. Now, navigate directly to your web
site via the domain name (http://YOURSITE.com),
and the page should load normally. Next, verify SSL
(https://YOURSITE.com), and you should not receive any
browser warnings.

Email
If this is a new domain, have you set up the necessary
email accounts?
Verify that every email address appearing on your site
is functional.
