The Cat Fancier's Handbook: 2005

Thursday, July 21, 2005

The when and the wherefore

Dates in Perl are a pretty big subject but the essence can be reduced to two simple rules: use 'localtime' to retrieve the date and time and then use 'sprintf' to format the retrieved value. The following subroutine returns the current date and time in the format 'dd-MMM-yyyy_hh:mm' (e.g. 23-DEC-2005_15:35)...

sub FormatCurrentDateAndTime
{
    # In scalar context localtime returns a fixed-width string
    # such as "Thu Jul 21 15:35:00 2005"
    my $V_CURRENT = localtime;
    my $V_SECOND = substr($V_CURRENT, 17, 2);
    my $V_MINUTE = substr($V_CURRENT, 14, 2);
    my $V_HOUR = substr($V_CURRENT, 11, 2);
    my $V_DAY = substr($V_CURRENT, 8, 2);
    my $V_MONTH = uc(substr($V_CURRENT, 4, 3));
    my $V_YEAR = substr($V_CURRENT, 20, 4);

    # -------------------------------
    # And Format the output string...
    # -------------------------------
    my $V_ID_STRING = sprintf("%02d-%s-%04d_%02d:%02d",
                              $V_DAY,
                              $V_MONTH,
                              $V_YEAR,
                              $V_HOUR,
                              $V_MINUTE);

    # TraceScript and $Debug are defined elsewhere in the script
    TraceScript $Debug,
                "MakeDateTimeID",
                "Generated id is [" . $V_ID_STRING . "]";

    return $V_ID_STRING;
}

The above code is very simple. Calling localtime without an argument, in scalar context, gets the current date and time as a fixed-width string (such as "Thu Jul 21 15:35:00 2005"), which is then sliced up into the named variables. sprintf is then called to reformat the value the way we want it. The TraceScript call gives the clue that this particular example is being used to generate a unique ID.
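As an alternative to slicing the string, localtime can be called in list context, where it hands back the fields as numbers. The following is only a sketch of that approach: the name FormatDateAndTime, its optional epoch argument and the month table are my own assumptions for illustration, and it produces the same dd-MMM-yyyy_hh:mm layout as the sprintf call above.

```perl
use strict;
use warnings;

# Hypothetical helper: formats an epoch time (default: now)
# as dd-MMM-yyyy_hh:mm using localtime in list context.
sub FormatDateAndTime
{
    my $P_EPOCH  = @_ ? $_[0] : time;
    my @C_MONTHS = qw(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC);

    # In list context localtime returns numeric fields, not a string
    my ($V_SECOND, $V_MINUTE, $V_HOUR, $V_DAY, $V_MONTH, $V_YEAR)
        = localtime($P_EPOCH);

    # The month is zero based and the year is counted from 1900
    return sprintf("%02d-%s-%04d_%02d:%02d",
                   $V_DAY,
                   $C_MONTHS[$V_MONTH],
                   $V_YEAR + 1900,
                   $V_HOUR,
                   $V_MINUTE);
}

print FormatDateAndTime(), "\n";
```

The advantage is that nothing depends on character positions within the string; the price is remembering the two offsets noted in the comments.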

Monday, July 11, 2005

Getting at the bits

One place that Perl scores very highly is reading complex file formats. A really useful built-in function for this purpose is split(). This takes a divider and a string, whose length is only limited by Perl's string handling limit. The divider is in the form of a regular expression, so it can be as complicated as you wish. split() returns a list, with each piece of the original string as an element, broken up according to the regex and with the divider(s) removed. Say, for example, you wrote ...

@Result = split(/:/, "apples:oranges:pears");

then @Result would contain the elements

apples
oranges
pears

with the dividing ':' discarded. The following real-life example extracts the required item from a comma-delimited list, such as a line in a CSV file...

sub GetCsvElement
{
    my @V_ARRAY;
    my $P_ELEMENT = $_[0];
    my $P_SOURCE_STRING = $_[1];
    my $V_RESULT;

    # -----------------------------------
    # Drop the elements into our array...
    # -----------------------------------
    @V_ARRAY = split (/,/, $P_SOURCE_STRING);

    # ----------------------------------------------------------------
    # Arrays are zero based so we pick one less than the passed value!
    # (trim is not a Perl built-in; it is a helper defined elsewhere.)
    # ----------------------------------------------------------------
    $V_RESULT = trim($V_ARRAY[$P_ELEMENT - 1]);

    # ----------------------------
    # Return the requested item...
    # ----------------------------
    return $V_RESULT;
}

Remember, because you're using a regular expression to define the delimiter, you can process multiple types of line or even lines with more than one delimiter.
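To illustrate that last point, here is a minimal sketch of splitting on several delimiters at once. The sample line and the name SplitOnAnyDelimiter are made up for the example; the character class simply lists whichever dividers you expect, and the \s* on either side swallows any surrounding spaces so no separate trim is needed.

```perl
use strict;
use warnings;

# Hypothetical helper: splits a line on a comma, semicolon or
# vertical bar, discarding any spaces around the divider.
sub SplitOnAnyDelimiter
{
    my $P_SOURCE_STRING = $_[0];

    return split(/\s*[,;|]\s*/, $P_SOURCE_STRING);
}

# Hypothetical sample line mixing all three dividers
my @V_FIELDS = SplitOnAnyDelimiter("apples, oranges;pears | plums");

print join("\n", @V_FIELDS), "\n";
```

This prints apples, oranges, pears and plums, one per line.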

Saturday, July 2, 2005

Perl sans frontiers

I often want to retrieve results from an external process within a Perl script. In theory, this is easy. All you need to do is something like...

$Result = `ls *.sql`;

...which works perfectly well on any Unix (or Linux) system. The problem is that it can't be relied on under Windows (there is no 'ls' there, for a start) so, if you want your programme to be fully portable, you need a workaround. My solution is to use the one common feature on both Windows and Unix, the redirection character (>).

Say I want to run a SQL query (because this is something I do frequently, I have a subroutine for it)...

# =======================================
# Runs an SQL file, returning the result.
# ---------------------------------------
# $_[0] is the connection string.
# $_[1] is the file to run.
# $_[2] is a file to write the result to.
# ---------------------------------------
# (This last is necessary because the ``
# syntax does not work in Windows.)
# =======================================
sub RunSqlQuery
{
    # $C_SQLPLUS_FLAGS is a global set elsewhere in the script
    my $V_COMMAND = "sqlplus "
                  . $C_SQLPLUS_FLAGS
                  . " "
                  . $_[0]
                  . " \@"
                  . $_[1]
                  . " >"
                  . $_[2];
    my $V_RESULT;

    # --------------------------------
    # Run the command created above...
    # --------------------------------
    system $V_COMMAND;

    # ---------------------------------
    # Get the result back from the file
    # and send it up the call chain...
    # (The parentheses are needed because
    # ReadResultFile is defined further
    # down the script.)
    # ---------------------------------
    $V_RESULT = ReadResultFile($_[2]);
    return $V_RESULT;
}

Now the above piece of code will work, unaltered, on both Windows and Unix. Then all we need to get at the result is a matching sub-routine to read the file...

# ===============================================
# Reads a file in which a result has been placed.
# -----------------------------------------------
# $_[0] name of the file to read.
# -----------------------------------------------
# This is one of the workarounds made necessary
# because the syntax "VAR=`command`" does not
# work in Windows the same as it does in Unix.
# ===============================================
sub ReadResultFile
{
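    my $V_BUFFER;

    # Three-argument open keeps odd characters in the name from being
    # treated as a mode; $! adds the system's reason to the message.
    open(my $H_SQL_FILE, '<', $_[0])
        or die "FAILURE: Cannot open " . $_[0] . ": $!";
    read $H_SQL_FILE, $V_BUFFER, 65535;
    close $H_SQL_FILE;
    return $V_BUFFER;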
}

The size limit of 65535 bytes is completely arbitrary and can be altered to meet the specific requirements of the application.
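If you would rather not pick a limit at all, one common Perl idiom is to undefine the input record separator so that a single read returns the entire file. The sketch below assumes that approach; ReadWholeFile is a hypothetical name, not part of the original script, and it is meant as a drop-in variant of ReadResultFile rather than a replacement for it.

```perl
use strict;
use warnings;

# Hypothetical variant of ReadResultFile with no fixed size limit.
# $_[0] name of the file to read.
sub ReadWholeFile
{
    my $P_FILE_NAME = $_[0];

    open(my $H_FILE, '<', $P_FILE_NAME)
        or die "FAILURE: Cannot open $P_FILE_NAME: $!";

    # With the input record separator undefined, the <> operator
    # returns everything up to end of file in one go. 'local'
    # restores the old separator when the sub returns.
    local $/ = undef;
    my $V_BUFFER = <$H_FILE>;
    close $H_FILE;

    return $V_BUFFER;
}
```

The trade-off is memory: the whole result is held in one string, which is fine for query output of modest size but worth thinking about for very large files.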

Thursday, June 30, 2005

What's your name?

I really like how easy Perl makes things. Take filename handling. I don't know about you but I spend a lot of time breaking filenames apart and reforming them. Perl turns it into a doddle and a quick sub-routine makes it even easier...

# ==================================================
# Gets a specified component from a given file name.
# --------------------------------------------------
# $_[0] Component to extract: 'PATH', 'FILE' or 'EXT'
# $_[1] Full file name to extract from
# --------------------------------------------------
# Returns the requested component.
# Examples...
# GetPathComponent "FILE", "\anypath\Test1.scr"
# returns "Test1"
# GetPathComponent "PATH", "\anypath\Test1.scr"
# returns "\anypath\"
# GetPathComponent "EXT", "\anypath\Test1.scr"
# returns ".scr"
# --------------------------------------------------
# $C_LOCAL_OS must be set to use with DOS filenames.
# $C_EXTENSION_PATTERN must be set to a valid
# pattern for the extension before calling this
# routine. A typical value is \\..*? which will
# work for both DOS and Unix.
# ==================================================
use File::Basename;   # supplies fileparse()

sub GetPathComponent
{
    my $P_COMPONENT = $_[0];
    my $P_FULL_NAME = $_[1];
    my $V_BASE_NAME;
    my $V_PATH_NAME;
    my $V_FILE_EXT;

    # -----------------------------------------------------
    # DOS uses the escape character (\) as a path separator
    # which defeats the fileparse function, so convert any
    # escape character to a forward slash first...
    # -----------------------------------------------------
    if( $C_LOCAL_OS eq "DOS" ) { $P_FULL_NAME =~ s/\\/\//g; }

    # -----------------------------------------
    # The fileparse sub does the actual work...
    # -----------------------------------------
    ($V_BASE_NAME, $V_PATH_NAME, $V_FILE_EXT)
        = fileparse($P_FULL_NAME, $C_EXTENSION_PATTERN);

    # -----------------------------------------------------
    # Return the requested component (or die if invalid)...
    # -----------------------------------------------------
    if($P_COMPONENT eq "PATH") { return($V_PATH_NAME); }
    elsif($P_COMPONENT eq "FILE") { return($V_BASE_NAME); }
    elsif($P_COMPONENT eq "EXT") { return($V_FILE_EXT); }
    else { die "Invalid component ["
             . $P_COMPONENT
             . "] specified in GetPathComponent" }
}
}

Simple, isn't it? I could have treated the return value from fileparse() as an array but I prefer to use descriptive variable names wherever possible. I'm funny that way.
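For reference, here is a small sketch of what fileparse itself hands back, and of how much the extension pattern matters. The file names are invented for the example, and the second pattern (qr/\.[^.]*/) is my own suggestion for picking off only the last extension; it is not the value the routine above is normally given.

```perl
use strict;
use warnings;
use File::Basename;

# fileparse returns (base name, path, suffix) in that order
my ($V_BASE_NAME, $V_PATH_NAME, $V_FILE_EXT)
    = fileparse("/anypath/Test1.scr", qr/\..*?/);
print "$V_PATH_NAME | $V_BASE_NAME | $V_FILE_EXT\n";
# prints "/anypath/ | Test1 | .scr"

# With a name holding two dots, the pattern from the header comment
# takes everything from the first dot onwards as the extension...
my @V_FIRST_DOT = fileparse("/anypath/archive.tar.gz", qr/\..*?/);
print "$V_FIRST_DOT[0] | $V_FIRST_DOT[2]\n";
# prints "archive | .tar.gz"

# ...whereas a pattern that excludes dots takes only the last one
my @V_LAST_DOT = fileparse("/anypath/archive.tar.gz", qr/\.[^.]*/);
print "$V_LAST_DOT[0] | $V_LAST_DOT[2]\n";
# prints "archive.tar | .gz"
```

Which behaviour you want depends on the files you handle, which is a good reason for keeping the pattern in a constant such as $C_EXTENSION_PATTERN.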

Saturday, May 7, 2005

You say potato and I say tomato

I've been thinking about the names people give to variables and functions. It's scary to see just how many instances of $i and 'fred' there are in supposedly professional code. As it's spring, how about cleaning up your code so that others can maintain it without wrapping a wet towel round their aching heads?

The first rule of naming is that names should add to the reader's understanding rather than obscuring it. For example, the name of a function that returns an element of an array should be something like GetNextCustomer() rather than GetNextArrayElement().

Some other general points about naming...

If you have difficulty in finding an appropriate name for a function, procedure or object then you may need to further analyse or define the purpose of that item.
Names should be long enough to be meaningful but no longer.
Remember that names have no meaning to the compiler/interpreter other than to distinguish one item from another.

Some rules that you may wish to follow include...

Use mixed case to aid readability. Capitalise the first letter of each word in a function or procedure name and the same for variable names but with the very first character in lower case. Thus a procedure might be called SortNewCustomers while a variable might be called customerName.
Use all capitals with words separated by underlines for constants as in TOTAL_SHOPS or SALES_BAND_A.
Don't use names that are ambiguous, such as AnalyzeThis() for a routine, or A47 for a variable.
Don't include class names in the names you give to class properties, such as Customer.CustomerID. Instead, use Customer.ID.
If a routine performs an operation on a given object, include the object name in the routine's name, as in ValidateCustomerID().
In languages that permit function overloading, all overloads should perform a similar function. For those languages that do not permit function overloading, establish a naming standard that relates similar functions.
Where a variable holds the result of a computation, add the type of computation to the end of the variable name as in salesAvg, customerSum, temperatureMin, temperatureMax, customerIndex and so on.
When producing complementary pairs of variables, name them appropriately, e.g. min/max, begin/end or open/close.
If a variable is Boolean i.e. it contains only Yes/No or True/False values, use a name which highlights this such as fileIsFound or customerIsValid.
Avoid the lazy use of terms like Flag when naming status variables. Instead of documentFlag, use a more descriptive name such as documentFormatType.
Always avoid single character variable names in loops and so on. "for ThisShop = 1 to TOTAL_SHOPS" is far more informative than "for i = 1 to 23".
Never use literals for constants. Always assign values to constants in a single place at the top of the programme for the good and sufficient reason that constants need changing surprisingly often!
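By way of illustration, here is how a few of those rules might look when applied together in Perl. Everything in it (the sales figures, GetSalesAvg, the variable names) is invented for the example; it is a sketch of the conventions, not code from a real system.

```perl
use strict;
use warnings;

# Constants in capitals, assigned once at the top of the programme
use constant TOTAL_SHOPS => 3;

# Hypothetical sales figures, one entry per shop
my @shopSales = (120.50, 98.25, 143.00);

# The routine name says what is returned and the variable
# names carry the type of computation as a suffix.
sub GetSalesAvg
{
    my @salesFigures = @_;
    my $salesSum = 0;

    foreach my $shopSale (@salesFigures) { $salesSum += $shopSale; }

    # Guard against an empty list rather than dividing by zero
    return @salesFigures ? $salesSum / @salesFigures : 0;
}

# A descriptive loop variable and a named constant for the limit
for my $thisShop (1 .. TOTAL_SHOPS)
{
    printf("Shop %d sold %.2f\n", $thisShop, $shopSales[$thisShop - 1]);
}

printf("Average sale: %.2f\n", GetSalesAvg(@shopSales));
```

Nothing here is cleverer than the $i version would be; it is simply easier to read six months later.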

Some points that apply particularly to database handling...

When naming tables, express the name in the singular form. For example, use Employee instead of Employees.
When naming columns of tables do not repeat the table name; for example, avoid a field called EmployeeLastName in a table called Employee.
Do not incorporate the data type in the name of a column. This will reduce the amount of work should it become necessary to change the data type later.
Do not prefix stored procedures with sp_, which is generally a prefix reserved for identifying system stored procedures.
Do not prefix user-defined functions with fn_, which is generally a prefix reserved for identifying built-in functions.
Do not prefix extended stored procedures with xp_, which is generally a prefix reserved for identifying system extended stored procedures.

Other points you may wish to consider...

Don't abbreviate unless you must. If you do, use the abbreviations you have created consistently. Any abbreviation should have only one meaning and, likewise, each abbreviated word should have only one abbreviation. For example, if you use min to abbreviate minimum, do so everywhere and do not also use min to abbreviate minute.
When naming functions, include a description of the value being returned, such as GetCustomerName().
File and path names should also accurately describe their purpose such as SalesTransferFile or SalesFileDirectory.
Don't use easily confused names for multiple items. Especially avoid confusions such as a routine called ProcessSales() and a variable loaded from it called iProcessSales.
Try to avoid words that sound the same (homophones), such as 'write' and 'right'. They can lead to all sorts of confusion when several people work on a project.
Use the appropriate spelling for your country such as color/colour or check/cheque.
It's no longer considered valid to use character type identifiers, such as $ for strings or % for integers as various languages, such as Perl, reserve these symbols.
