Eric's PHP Guide

From Deep Thought

This page contains my thoughts on what I consider to be good PHP programming habits. Some of these are objective, but many of them are personal preference and one could make an argument against them. I certainly do not insist everyone write PHP the way I do; I would simply like to share my tips and ideas about creating good PHP.

General Style

Naming Conventions

Variables

My first serious programming language was C, so when it comes to variable names I am used to $user_data as opposed to $userData. I find the former more readable because the underscore serves the same function as the space in English writing. It is certainly possible to read text with no spaces (the Japanese language uses no spaces whatsoever and it's not a bother to me), but programming is not Japanese, and when using English words for variables I find it easier to see the spacing there.

Functions and Methods

Where variables are the nouns of programming, functions are the verbs. So it follows that I find functions which begin with verbs are the most readable: e.g. replace_record() is better than record_replace(). By beginning function names with a verb you can often end up writing code which reads very much like an English sentence, e.g.:

if ($user->has_access()) replace_record();

Classes

Continuing my language analogy, classes are proper nouns, so I name them beginning with capital letters, e.g. Person and User_Record.

Commenting

Self-documenting code is a myth. No personal style or skill will ever compensate for a lack of comments. So write them. Always.

In my experience, programmers do not write comments because they write the code first and then try to go back and document. But by then they've already written the code, and they know what it does, so what's the point, right? That is why I advocate you write all your comments for a function before you write any code. Document in comments how your code works, then go back and actually fill in the code. Not only will this help your co-workers, it will help you make sure you have a clear understanding of what you're doing before you begin throwing down instructions.

I cannot stress this enough. By all means, use language features for clarity, write descriptive variable and function names, format your code cleanly—but never believe any of those are a valid substitute for commenting. Write comments to the point of redundancy.

Formatting

Braces

I find it easier to read code if braces are put on a line of their own. Example:

// The way I prefer it.
if ($user->get('user_type') === 'customer')
{
	$items = load_user_items($user);
	
	foreach ($items as $item_name => $item_price)
	{
		$total_price += $item_price;
	}
}

// More compact, but less readable to me.
if ($user->get('user_type') === 'customer') {
	$items = load_user_items($user);
	
	foreach ($items as $item_name => $item_price) {
		$total_price += $item_price;
	}
}

Spacing

Spacing should always surround operators. Code like if (($x<0)&&($y>10)) $z=$x+$y is not nearly as readable as if (($x < 0) && ($y > 10)) $z = $x + $y. Similar to the issue of braces, compacting code like this only hurts readability; it has zero impact on performance.

Basic PHP Coding

Errors

Somewhere towards the top of every PHP script should be this line: error_reporting(E_ALL). If you design code in a PHP environment with relaxed error checking, then when that code is used on a more strict PHP setup it may suddenly break all over. Turn on all errors and warnings, and take all of them seriously.
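
As a minimal sketch, the top of a development script might look like the following. The display_errors line is my own assumption about a development setup and is not part of the advice above; you would normally switch it off on a production server.

```php
<?php
// Report every error, warning, and notice PHP can produce.
error_reporting(E_ALL);

// Assumption: this is a development machine, so printing errors
// to the screen is acceptable.
ini_set('display_errors', '1');
```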

Operators

Operator Precedence

I don't know the precedence of every operator in PHP, so I use parentheses to make things obvious. However, an exception is with and and or. Both are almost dead last in precedence (only the comma is lower), so use them in conditionals without parentheses; it makes things easier to read compared to using parentheses and &&, etc. For example:

if (isset($_REQUEST['zip_code']) and is_numeric($_REQUEST['zip_code'])) ...

if ((isset($_REQUEST['zip_code'])) && (is_numeric($_REQUEST['zip_code']))) ...

// I often line up multiple and's/or's like this.
if (isset($_REQUEST['zip_code'])
and is_numeric($_REQUEST['zip_code'])
and is_valid_zip($_REQUEST['zip_code'])) ...

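Also keep in mind that the ternary operator is not right-associative as it is in many other languages; before PHP 8 it is left-associative. If you nest multiple ternary operators in one statement, you must wrap them in parentheses. (PHP 7.4 deprecated nesting without parentheses, and PHP 8 rejects it with a fatal error, so the first line of the example below only runs on older versions.)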

// As taken from the PHP manual...
echo (true ? 'true' : false ? 't' : 'f');

// You would think the above echoes 'true', but it will actually
// echo 't' because of the associativity of the operator. PHP is
// treating the statement as if it were this:
echo ((true ? 'true' : false) ? 't' : 'f');

// Which becomes...
echo ('true' ? 't' : 'f');
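
With explicit parentheses the nesting reads the way you would expect. Here is a small sketch; size_label() is a hypothetical helper made up for illustration.

```php
<?php
// Hypothetical helper: classify a number using a nested ternary.
// The inner ternary is parenthesized, so it is evaluated as the
// "else" branch of the outer one.
function size_label($n)
{
	return ($n < 10) ? 'small' : (($n < 100) ? 'medium' : 'large');
}

echo size_label(5), "\n";   // small
echo size_label(50), "\n";  // medium
echo size_label(500), "\n"; // large
```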

You can use the ternary operator in conjunction with return, e.g. return ($user->authorize()) ? $user : false. However, you can never use the ternary operator to return a variable by reference.

Logical Operators

Above I wrote about using and, but it's important to understand the differences between the pairs of logical operators (e.g. || and or). The operators which are words have lower precedence, so they bind later than their symbolic counterparts. This is best explained through some examples. The expressions in comments use parentheses to show how the order of operations is working for each example.

$x = false || true;  // $x is true: $x = (false || true)
$x = false or true;  // $x is false: ($x = false) or true
$x = true && false;  // $x is false: $x = (true && false)
$x = true and false; // $x is true: ($x = true) and false

All of the logical operators are short-circuit. That means && and and will stop as soon as they see a false value, while || and or will stop when they see a true value.
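
To see short-circuiting in action, here is a small sketch that counts how often the right-hand side actually runs. The $note_call closure is a hypothetical helper, and closures need PHP 5.3 or later.

```php
<?php
$calls = 0;

// Hypothetical helper: records each time it is evaluated.
$note_call = function () use (&$calls) {
	$calls++;
	return true;
};

$a = (false && $note_call()); // right side skipped: the result is already false
$b = (true || $note_call());  // right side skipped: the result is already true
$c = (true && $note_call());  // right side runs: the result depends on it

echo $calls; // 1
```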

Finally, you can only use expressions with the logical operators. You cannot write something like return $x or return $y because return is a statement, not an expression. Since the logical operators always evaluate to true or false, you also cannot write this: return ($x or $y). If the two variables are booleans then it will work as expected, but you cannot get away with this:

$x = 100;
$y = 200;

function f($x, $y)
{
	return ($x or $y);
}

$z = f($x, $y); // You might think $z = $x now, but...

// $z now equals the boolean true. In f(), $x evaluated to true
// so the expression ($x or $y) evaluated to true, not to the
// value of $x.

Experts refer to the above as lame.

Equality Operators

Use the strict comparisons wherever possible. Meaning, use === over ==, and use !== instead of !=. The strict operators check for matching types as well as values, and will not perform any type conversion. This matters even when testing two strings: both == and === are case-sensitive, but == compares numeric-looking strings as numbers, so '1e3' == '1000' is true, while === always compares the strings character for character.
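
A few comparisons make the difference concrete. This is only a sketch; I have stuck to cases that behave the same way across PHP versions.

```php
<?php
// Two numeric-looking strings: == compares them as numbers.
var_dump('1e3' == '1000');  // bool(true)
var_dump('1e3' === '1000'); // bool(false)

// A string and an integer: == converts, === does not.
var_dump('10' == 10);       // bool(true)
var_dump('10' === 10);      // bool(false)

// NULL and false are loosely equal but not identical.
var_dump(null == false);    // bool(true)
var_dump(null === false);   // bool(false)
```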

Post and Prefix Operators

Mixing these in compound statements can be the source of obscure bugs.

$x = $y[$z++];   // This is potentially confusing, and for no benefit.

// More obvious.
$x = $y[$z];
$z++;

It's never worth optimizing, but ++$i is ever so slightly faster than $i++. In the case of the postfix, PHP creates a temporary variable.

Quoting

Use single quotes for everything you can. PHP interpolates variables in double quoted strings. This incurs overhead during parsing. This overhead is not trivial! Replacing double quoted strings with single quoted strings in a large script can increase execution speed by as much as fifteen percent. However, do use double quotes when you are interpolating many variables and it would increase readability; concatenating a lot of variables with single quoted strings can become very messy.

Optimizing

Conditions

Setting Default Values

A very common use of the if-else construct is to give a variable the value of another variable, or a default value if that variable is false. Using the or logical operator you can condense it all to one line and make it more readable at the same time.

// How many times have you seen and written this?
if ($some_setting)
{
	$my_var = $some_setting;
}
else
{
	$my_var = $default_setting;
}

// The same thing as the above, but accomplished in one line.
$my_var = $some_setting or $my_var = $default_setting;

// This, however, does not work. See the notes on operator
// precedence for why this will fail if $some_setting is false.
$my_var = $some_setting or $default_setting;

// Some people prefer to use the ternary operator for this.
$my_var = ($some_setting ? $some_setting : $default_setting);
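
If you are on PHP 5.3 or later, the short ternary ?: is another way to write the same default. A minimal sketch, with made-up values for the two settings:

```php
<?php
$some_setting = '';        // hypothetical: an empty (false) setting
$default_setting = 'blue'; // hypothetical default

// Short ternary: use $some_setting if it is true, otherwise the default.
$my_var = $some_setting ?: $default_setting;
echo $my_var, "\n"; // blue

// For comparison, the broken form: = binds tighter than or, so
// $wrong simply receives $some_setting.
$wrong = $some_setting or $default_setting;
var_dump($wrong); // string(0) ""
```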

Loops

The majority of for loops look like this:

for ($i = 0; $i < count($some_array); $i++)
{
	// Do something here...
}

This is inefficient. Ninety-nine percent of the time, the length of the array will not be changing during the loop. However, on every iteration PHP has to call count($some_array) again. If the array is very large, that means a great many needless function calls, which can be a big burden. By moving that call to the initialization of the loop you can substantially speed it up:

for ($i = 0, $count = count($some_array); $i < $count; $i++)
{
	// Do something here...
}

Strings

Length

I don't particularly advocate this because it is not obvious what the code is doing, but there are two ways to check a string's length.

// This is what we all do.
if (strlen($my_string) < 10) ...

// This is faster because isset() is a language construct, not a
// function, and does not incur the overhead of a function call.
// (The older $my_string{10} brace syntax was removed in PHP 8.)
if (!isset($my_string[10])) ...

Regular Expressions

Try to avoid them if you can; PHP is not nearly as great at handling regular expressions as other languages. The functions preg_replace() and preg_match() are expensive in terms of the time they take. Here's a list of the replacing and matching functions, listed from fastest to slowest:

  • Replacing
    • strtr()
    • str_replace()
    • preg_replace()
    • ereg_replace() (Never use this; it was removed in PHP 7.)
  • Matching
    • strpos()
    • strstr()
    • preg_match()
    • preg_match_all()
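
As a rough sketch of what the plain string functions look like in place of a regular expression (the input strings here are made up for illustration):

```php
<?php
$email = 'eric@example.com'; // hypothetical input

// strpos() returns the offset of the match, which can be 0, so
// compare against false with !== instead of testing truthiness.
$has_at = (strpos($email, '@') !== false);
$starts_with_eric = (strpos($email, 'eric') === 0);

// str_replace() handles simple literal substitutions.
$digits = str_replace('-', '', '1234-5678-2323-1000');

// strtr() with an array replaces several literal strings in one pass.
$greeting = strtr('Hi all', array('Hi' => 'Hello', 'all' => 'world'));

echo $digits, "\n";   // 1234567823231000
echo $greeting, "\n"; // Hello world
```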

Output

Never use print.

Using echo can be very slow if you are not careful. For small scripts it will likely not matter, but in something like a content management system you want your output to be fast. Multiple calls to echo are very slow. You can optimize in a number of ways.

The most obvious, and most important, is to cut down on echo.

$x = 'Pass';
$y = 'the';
$z = 'beer';

// Very slow.
echo $x;
echo $y;
echo $z;

// Faster.
echo $x . $y . $z;

// Even faster than concatenation. Most of the time this
// is the preferred way to do things.
echo $x, $y, $z;

If you already have a program with a lot of echo then you may not be able to easily change things around. In that case using the output buffering functions (e.g. ob_start()) will greatly increase performance. It is highly recommended you use it even in conjunction with the tips shown above.
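
A minimal sketch of output buffering: everything echoed between ob_start() and ob_get_clean() is collected in memory and then sent in one go. The $page variable is just a name I picked.

```php
<?php
// Start collecting output instead of sending it piece by piece.
ob_start();

echo 'Pass', ' ';
echo 'the', ' ';
echo 'beer';

// Grab the buffered text and turn buffering off again.
$page = ob_get_clean();

// One single write for the whole page.
echo $page; // Pass the beer
```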

Tricks and Techniques

Ideas From Functional Languages

PHP is not a functional language like Lisp, but you can borrow some idioms from such languages and use them in PHP to great effect.

Mapping

Many for and foreach loops can be reduced to an array map. This is especially effective when you are manipulating the values in an array.

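function activate_user($user)
{
	if ($user->completed_registration())
		$user->activate_account();

	// Return the user so that array_map() below gets a value back
	// instead of filling the new array with NULLs.
	return $user;
}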

// By the way, you can't use a reference in a foreach loop like
// this in PHP 4.
foreach ($users as &$user)
{
	activate_user($user);
}

// But you can do this. It accomplishes the same thing as the foreach above.
$users = array_map('activate_user', $users);

// If you don't want PHP creating a temporary array then you
// can slightly modify the arguments of the callback function
// and then use array_walk(), like so. array_walk() requires
// a callback function of two arguments: the first is the array
// value (a reference) and the second is the array key or index.
// Yes--that's backwards from the normal key->value order.

function activate_user(&$user, $user_index)
{
	if ($user->completed_registration())
		$user->activate_account();
}

array_walk($users, 'activate_user');
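
Mapping is at its most compact when a built-in function already does what you need, since you can pass its name straight to array_map(). A small sketch with made-up data:

```php
<?php
$raw_names = array(' Alice ', 'BOB', ' carol'); // hypothetical input

// Trim every value, then lowercase every value.
$clean_names = array_map('strtolower', array_map('trim', $raw_names));

// Map each cleaned name to its length.
$lengths = array_map('strlen', $clean_names);

print_r($clean_names); // alice, bob, carol
print_r($lengths);     // 5, 3, 5
```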

PHP will not let you use the following for mapping, because they are language constructs rather than real functions. For the same reason, you cannot use any of them as a callback (although nothing is stopping you from writing wrappers):

  • array (For us Lispers, this really sucks.)
  • echo
  • empty
  • eval (Damnit!)
  • exit
  • isset
  • list
  • print
  • unset
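
Writing such a wrapper takes only a few lines. Here is a sketch for empty; is_empty_value() is a hypothetical name of my own choosing.

```php
<?php
// Hypothetical wrapper: a real function around the empty construct,
// so that it can be passed around as a callback.
function is_empty_value($value)
{
	return empty($value);
}

$fields = array('a', '', 'b', null); // made-up data

// Keep only the empty entries. array_filter() preserves the keys.
$blank = array_filter($fields, 'is_empty_value');

print_r(array_keys($blank)); // 1, 3
```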

Anonymous Functions

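By using create_function() you can create an anonymous function on the fly. This can be very useful in conjunction with mapping, but you must use care or else your code can quickly become messy. (Note that create_function() was deprecated in PHP 7.2 and removed in PHP 8.0, so the examples here need an older PHP.)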

$credit_card_numbers = array(
	'1234-5678-2323-1000',
	'9494-1093-3872-8624',
	'9371-0003-1982-1031'
);

// We use an anonymous function to remove all but the last four digits.
// Note that the arguments to create_function() are single quoted
// strings so that the variable names within won't be interpolated.

array_walk($credit_card_numbers,
	create_function('&$value, $key', '$value = substr($value, -4);'));
	
// Now $credit_card_numbers = ('1000', '8624', '1031').
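
On PHP 5.3 or later the same thing can be written with a closure, which avoids putting code inside a string. This is a sketch of that alternative, not a change to the technique:

```php
<?php
$credit_card_numbers = array(
	'1234-5678-2323-1000',
	'9494-1093-3872-8624',
	'9371-0003-1982-1031'
);

// The closure takes the value by reference, just like the
// create_function() version, and keeps only the last four digits.
array_walk($credit_card_numbers, function (&$value, $key) {
	$value = substr($value, -4);
});

print_r($credit_card_numbers); // 1000, 8624, 1031
```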

Filtering

A good use for short, anonymous functions is to filter values out of an array, using the appropriately named array_filter().

$active_users = array_filter($users, create_function('$user', 'return $user->is_active();'));

// The above is the same as this (except that array_filter()
// preserves the original array keys).
$active_users = array();

foreach ($users as $user)
{
	if ($user->is_active())
		$active_users[] = $user;
}
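
Here is a self-contained sketch of the same filter using a closure (PHP 5.3 or later). To keep it runnable I have assumed the users are plain arrays with an 'active' key rather than objects with an is_active() method.

```php
<?php
// Assumption: users are simple arrays, not objects.
$users = array(
	array('name' => 'Slick Rick', 'active' => true),
	array('name' => 'Kirk McCool', 'active' => false),
	array('name' => 'That Guy', 'active' => true)
);

$active_users = array_filter($users, function ($user) {
	return $user['active'];
});

// array_filter() keeps the original keys (0 and 2 here), so
// reindex with array_values() if you need 0, 1, 2, ...
$active_users = array_values($active_users);

echo count($active_users), "\n";      // 2
echo $active_users[1]['name'], "\n";  // That Guy
```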

Reduction

PHP has a function called array_reduce() which performs a binary reduction on an array (a.k.a. folding). It takes a function of two arguments. With no initial value given, it first calls that function with NULL and the first element of the array; then it calls it with the return value of that first call and the second element; then that return value and the third element, and so on.

(Note: If you have experience with programming languages which have bi-directional folding, PHP's array_reduce() is a left fold.)

// array_sum() adds together all the numbers in an array and
// returns it. If PHP did not have the function, we could
// implement it using array_reduce(), using an anonymous function
// for adding together the elements. (It is named my_array_sum()
// here because PHP will not let you redeclare a built-in function.)

function my_array_sum($array)
{
	return array_reduce($array,
		create_function('$x, $y', 'return $x + $y;')
	);
}
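
array_reduce() also accepts a third argument, the initial value for the fold. Here is a sketch of the same sum written with a closure (PHP 5.3 or later) and an explicit starting value of 0; sum_with_reduce() is a hypothetical name.

```php
<?php
// Hypothetical helper: sum an array by folding from the left.
function sum_with_reduce(array $numbers)
{
	return array_reduce($numbers, function ($carry, $number) {
		return $carry + $number;
	}, 0); // Start the fold at 0, so an empty array sums to 0.
}

echo sum_with_reduce(array(1, 2, 3)), "\n"; // 6
echo sum_with_reduce(array()), "\n";        // 0
```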

If you sort an array, the element which gets sorted into the first position is called the extremum. Most functional languages have standard functions for selecting extrema, but in PHP it takes two steps: sort and then select. However, you can use reduction to implement such a function. The idea is to pass a comparison function as the callback to array_reduce(); as long as that function returns the extremum of its two arguments, it will eventually iterate over the entire array and return the extremum.

// This function accomplishes the same thing as sorting the array from
// largest to smallest and then returning the first element. However, it does
// not actually perform a sort or modify the array.
function array_extremum($array)
{
	return array_reduce($array,
		create_function('$x, $y',
			'if ($x < $y) return $y; ' .
			'else return $x;'
		)
	);
}

$a = array(10, 3, 200, 5, 2);
$b = array(989, 737, 11111);

echo array_extremum($a);	// 200
echo array_extremum($b);	// 11111

That example is obviously for numeric extrema. Depending on your criteria, the extrema could be anything; I only meant to demonstrate the idea.
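
To sketch what "anything" might look like, here is one possible generalization where the caller supplies the criterion. Both extremum_by() and its $is_better callback are hypothetical names, and the closures need PHP 5.3 or later.

```php
<?php
// Hypothetical helper: return the element that "wins" against all
// others, where $is_better($a, $b) is true when $a beats $b.
function extremum_by(array $items, $is_better)
{
	if (count($items) === 0)
		return null;

	// Use the first element as the starting point of the fold.
	// $items is a copy, so the caller's array is not modified.
	$first = array_shift($items);

	return array_reduce($items, function ($best, $item) use ($is_better) {
		return $is_better($item, $best) ? $item : $best;
	}, $first);
}

// Longest string.
echo extremum_by(array('pear', 'banana', 'fig'), function ($a, $b) {
	return strlen($a) > strlen($b);
}), "\n"; // banana

// Smallest number.
echo extremum_by(array(10, 3, 200, 5, 2), function ($a, $b) {
	return $a < $b;
}), "\n"; // 2
```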

Self Modifying Code

If there was ever a technique to use with great care and restraint, this is it. Since anonymous functions are built from strings, we can manipulate those strings, creating code that literally writes other code.

// Let's say we have an array storing user information, where each
// value is itself an array of keys and values that represent an
// individual user.

$users = array(
	array('name' => 'Slick Rick', 'activated' => true, 'access_level' => 2),
	array('name' => 'Kirk McCool', 'activated' => false, 'access_level' => 1),
	array('name' => 'That Guy', 'activated' => true, 'access_level' => 5)
);

// A utility function which does the same thing strcmp() does, except for numbers.
// It's kind of odd that PHP doesn't have this function.

function cmp($x, $y)
{
	return $x == $y ? 0 : ($x < $y ? -1 : 1);
}

// We can use usort() to sort the array with a function of our own, but if we
// want to sort by 'name' we have to write a different function than if we
// wanted to sort by 'access_level'. Since that sucks, what we can do is write
// one function which takes the name of a key, and returns an anonymous function
// we will use for comparison.

function generate_sorter($key)
{
	global $users;
	
	// We need to know if we should use cmp() or strcmp(),
	// so we do a bit of trickery here.
	$key_type = gettype($users[0][$key]);
	$cmp_func = ($key_type === 'string') ? 'strcmp' : 'cmp';
	
	// Note that we use double quotes for the code, because some variables
	// need to be interpolated. But not all of them! So we escape the '$'
	// for variables which should be left unparsed.
	
	return create_function('$array_x, $array_y',
		"return $cmp_func(\$array_x['$key'], \$array_y['$key']);"
	);
}

// Now we can sort the users by different keys without resorting to
// multiple functions (actually we are using multiple functions,
// we're just creating them on the fly).

usort($users, generate_sorter('name'));
usort($users, generate_sorter('access_level'));


Admittedly the example above is inefficient, because it recreates the comparison function every time. If they were to be used many times you would want to create the functions just once and then cache them—but you get the idea.
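
For comparison, on PHP 5.3 or later a closure can capture the key directly, so no code has to be assembled from strings. This is a sketch of that alternative rather than the technique described above; make_sorter() is a hypothetical name, and it picks the comparison when it runs instead of peeking at a global.

```php
<?php
$users = array(
	array('name' => 'Slick Rick', 'activated' => true, 'access_level' => 2),
	array('name' => 'Kirk McCool', 'activated' => false, 'access_level' => 1),
	array('name' => 'That Guy', 'activated' => true, 'access_level' => 5)
);

// Hypothetical helper: build a comparison function for one key.
function make_sorter($key)
{
	return function ($array_x, $array_y) use ($key) {
		$x = $array_x[$key];
		$y = $array_y[$key];

		// Strings are compared with strcmp(), everything else numerically.
		if (is_string($x))
			return strcmp($x, $y);

		return $x == $y ? 0 : ($x < $y ? -1 : 1);
	};
}

usort($users, make_sorter('access_level'));
echo $users[0]['name'], "\n"; // Kirk McCool

usort($users, make_sorter('name'));
echo $users[2]['name'], "\n"; // That Guy
```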

Currying

I'll be honest—functional programming languages (and their advocates, myself included) have a bad habit of designing around concepts which are mathematically very pretty, but realistically of very little use. Currying falls into this category. Nonetheless, you can do it in PHP if you want.

function generate_adder($x)
{
	return create_function('$y', "return $x + \$y;");
}

$add_five = generate_adder(5);

echo call_user_func($add_five, 10);		// Prints 15
echo call_user_func(generate_adder(10), 10);	// Prints 20

// It's worth noting that if you assign an anonymous function
// to a variable, you can call that variable as if it were a function.
// For example, you can do this:

echo $add_five(10);

Very strictly speaking, the above is partial application and not really currying. I simply labeled it as currying because the misnomer has become pretty fixed these days—most programming texts talk about currying when they really mean partials.

(Note: The generate_sorter() function in the Self Modifying Code section is not an example of currying or partial application, even though it looks like it.)
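
The same partial application can be sketched with a closure on PHP 5.3 or later. Here make_adder() is a hypothetical name; the closure captures $x with use instead of interpolating it into a string of code.

```php
<?php
// Hypothetical helper: returns a function that adds $x to its argument.
function make_adder($x)
{
	return function ($y) use ($x) {
		return $x + $y;
	};
}

$add_five = make_adder(5);

echo $add_five(10), "\n";                      // 15
echo call_user_func(make_adder(10), 10), "\n"; // 20
```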

Good Database Programming

  1. Download the MDB2 package.
  2. Install.
  3. ???????
  4. Profit.

Seriously though, the fact that so many insecure PHP programs exist can be laid directly at the feet of tutorial authors. Every time some new PHP programmer reads one of the eight million tutorials showing him how to use mysql_query(), he's immediately learning a terrible habit. Using such functions presents a number of critical problems.

  • They instantly tie the program to a specific type of database.
  • They are tedious to use.
  • They provide no safety.

The last reason is the most important of all. Making direct queries via functions like pg_exec() and mysql_query() is on par with pointing a loaded gun at your chest. To illustrate:

// Assume $id refers to the ID of some user whose info we are updating.

$state = $_POST['state'];
mysql_query("UPDATE user SET user_state = '$state' WHERE user_id = '$id'");

// Looks fine, but what if for the state the user entered this:
//   "SC',user_access='admin"
//
// Now the query becomes:
//   "UPDATE user SET user_state = 'SC',user_access='admin' WHERE user_id = '10'"

// To stop that we can use the function mysql_escape_string().
$state = mysql_escape_string($state);
mysql_query("UPDATE user SET user_state = '$state' WHERE user_id = '$id'");

// Well that's still busted. If magic quotes are turned on then the quotes
// will be escaped and it will screw up the entire query. So we have to
// use stripslashes() to prevent this.
if (get_magic_quotes_gpc())
	$state = stripslashes($state);

$state = mysql_escape_string($state);
mysql_query("UPDATE user SET user_state = '$state' WHERE user_id = '$id'");

// Guess what--this is still problematic. Depending on the locale of the
// MySQL server, the above is still open to SQL injection attacks.
// So we have to do this now...
if (get_magic_quotes_gpc())
	$state = stripslashes($state);

$state = mysql_real_escape_string($state);
mysql_query("UPDATE user SET user_state = '$state' WHERE user_id = '$id'");

// What kind of function is called mysql_real_escape_string()?
// "Hey--that first time I was just kidding. Here's the *real* string..."

So after jumping through all those hoops, we finally have something decently secure. But it only works for MySQL. Here's how to use the MDB2 package to avoid all this.

// First we prepare the statement. The question marks are placeholders.
$sth = $dbh->prepare('UPDATE user SET user_state = ? WHERE user_id = ?');

// Now we execute it. The execute() method of the prepared statement
// will fill in the placeholders with the values we give it, taking care
// to properly escape them depending on the rules of the database we are using.
$sth->execute(array($state, $id));

Not only is that much easier, more compact, and far safer, but the code above works in multiple types of databases. So the trick is to just use an abstraction layer for your database needs. Right now MDB2 is one of the best out there.
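
To see placeholders defuse the attack from earlier, here is a self-contained sketch. Note the swap: it uses PHP's bundled PDO extension with an in-memory SQLite database instead of MDB2, purely so that it runs without a database server. It assumes the pdo_sqlite extension is available, and the table layout is made up to match the queries above.

```php
<?php
// Assumption: pdo_sqlite is installed. PDO stands in for MDB2 here.
$dbh = new PDO('sqlite::memory:');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table matching the queries in this section.
$dbh->exec('CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_state TEXT, user_access TEXT)');
$dbh->exec("INSERT INTO user (user_id, user_state, user_access) VALUES (10, 'NC', 'guest')");

// The hostile input from the earlier example.
$state = "SC',user_access='admin";
$id = 10;

// The placeholders keep the input as data; it never becomes SQL.
$sth = $dbh->prepare('UPDATE user SET user_state = ? WHERE user_id = ?');
$sth->execute(array($state, $id));

$row = $dbh->query('SELECT user_state, user_access FROM user WHERE user_id = 10')
	->fetch(PDO::FETCH_ASSOC);

echo $row['user_access'], "\n"; // guest
```

The hostile text ends up stored verbatim in user_state, and user_access is left alone.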
