3.2 Perl Functions in Alphabetical Order

/PATTERN/

/PATTERN/
m/PATTERN/

The match operator. See "Regular Expressions" in Chapter 2, The Gory Details.

?PATTERN?

?PATTERN?

This is just like the /PATTERN/ search, except that it matches only once between calls to reset, so it finds only the first occurrence of something rather than all occurrences. (In other words, the operator works repeatedly until it actually matches something, then it turns itself off until you explicitly turn it back on with reset.) This may be useful (and efficient) if you want to see only the first occurrence of the pattern in each file of a set of files. Note that m?? is equivalent to ??.

The reset operator will reset only those instances of ?? that were compiled in the same package as the call to reset.
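
For instance, something along these lines (a sketch of ours, not from the original text) prints only the first Subject line found in each file named on the command line:

while (<>) {
    print "First subject in $ARGV: $_" if ?^Subject:?;
} continue {
    reset if eof;   # let ?? match again in the next file
}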

abs

abs VALUE

This function returns the absolute value of its argument (or $_ if omitted).
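
For instance (a trivial sketch; the variable names are ours):

$diff = abs($this - $that);   # distance between two numbers
print abs(-7.25), "\n";       # prints 7.25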

accept

accept NEWSOCKET, GENERICSOCKET

This function does the same thing as the accept system call--see accept (2). It is used by server processes that wish to accept socket connections from clients. Execution is suspended until a connection is made, at which time the NEWSOCKET filehandle is opened and attached to the newly made connection. The function returns the connected address if the call succeeded, false otherwise (and puts the error code into $!). GENERICSOCKET must be a filehandle already opened via the socket operator and bound to one of the server's network addresses. For example:

unless ($peer = accept NS, S) {
    die "Can't accept a connection: $!\n";
}

See also the example in the section "Sockets" in Chapter 6, Social Engineering.

alarm

alarm EXPR

This function sends a SIGALRM signal to the executing Perl program after EXPR seconds. On some older systems, alarms go off at the "top of the second," so, for instance, an alarm 1 may go off anywhere between 0 and 1 second from now, depending on when in the current second it is. An alarm 2 may go off anywhere from 1 to 2 seconds from now. And so on. For better resolution, you may be able to use syscall to call the itimer routines that some UNIX systems support. Or you can use the timeout feature of the select function.

Each call disables the previous timer, and an argument of 0 may be supplied to cancel the previous timer without starting a new one. The return value is the number of seconds remaining on the previous timer.
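
A common use (shown here as a sketch of ours, with an arbitrary ten-second limit and a hypothetical slow_operation) is to impose a timeout on something that might hang, by combining alarm with a SIGALRM handler inside an eval:

eval {
    local $SIG{ALRM} = sub { die "timeout\n" };
    alarm 10;                       # allow ten seconds
    $result = slow_operation();     # hypothetical blocking call
    alarm 0;                        # cancel the timer if we finished in time
};
if ($@ eq "timeout\n") {
    warn "Gave up waiting\n";
}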

atan2

atan2 Y, X

This function returns the arctangent of Y/X in the range -pi to pi. A quick way to get an approximate value of pi is to say:

$pi = atan2(1,1) * 4;

For the tangent operation, you may use the POSIX::tan() function, or use the familiar relation:

sub tan { sin($_[0]) / cos($_[0]) }

bind

bind SOCKET, NAME

This function does the same thing as the bind system call--see bind (2). It attaches an address (a name) to an already opened socket specified by the SOCKET filehandle. The function returns true if it succeeded, false otherwise (and puts the error code into $!). NAME should be a packed address of the proper type for the socket.

bind S, $sockaddr or die "Can't bind address: $!\n";

See also the example in the section "Sockets" in Chapter 6, Social Engineering.

binmode

binmode FILEHANDLE

This function arranges for the file to be treated in binary mode on operating systems that distinguish between binary and text files. It should be called after the open but before any I/O is done on the filehandle. The only way to reset binary mode on a filehandle is to reopen the file.

On systems that distinguish binary mode from text mode, files that are read in text mode have \r\n sequences translated to \n on input and \n translated to \r\n on output. binmode has no effect under UNIX or Plan9. If FILEHANDLE is an expression, the value is taken as the name of the filehandle. The following example shows how a Perl script might prepare to read a word processor file with embedded control codes:

open WP, "$file.wp" or die "Can't open $file.wp: $!\n";
binmode WP;
while (read WP, $buf, 1024) {...}

bless

bless REF, CLASSNAME
bless REF

This function looks up the item pointed to by reference REF and tells the item that it is now an object in the CLASSNAME package--or the current package if no CLASSNAME is specified, which is often the case. It returns the reference for convenience, since a bless is often the last thing in a constructor function. (Always use the two-argument version if the constructor doing the blessing might be inherited by a derived class. In such cases, the class you want to bless your object into will normally be found as the first argument to the constructor in question.) See "Objects" in Chapter 5, Packages, Modules, and Object Classes for more about the blessing (and blessings) of objects.
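
Here is a minimal constructor sketch (ours, not the book's; the class name Critter is arbitrary) showing why the two-argument form matters for inheritance:

package Critter;
sub new {
    my $class = shift;           # "Critter", or a class derived from it
    my $self  = { @_ };          # remaining arguments become attributes
    return bless $self, $class;  # bless into the caller's class, not hardwired
}
package main;
$pet = Critter->new(name => "Bernie");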

caller

caller EXPR
caller

This function returns information about the stack of current subroutine calls. Without an argument it returns the package name, filename, and line number that the currently executing subroutine was called from:

($package, $filename, $line) = caller;

With an argument it evaluates EXPR as the number of stack frames to go back before the current one. It also reports some additional information.

$i = 0;
while (($pack, $file, $line, $subname, $hasargs, $wantarray) = caller($i++)) {
    ...
}

Furthermore, when called from within the DB package, caller returns more detailed information: it sets the list variable @DB::args to be the arguments passed in the given stack frame.

chdir

chdir EXPR

This function changes the working directory to EXPR, if possible. If EXPR is omitted, it changes to the home directory. The function returns 1 upon success, 0 otherwise (and puts the error code into $!).

chdir "$prefix/lib" or die "Can't cd to $prefix/lib: $!\n";

The following code can be used to move to the user's home directory, one way or another:

$ok = chdir($ENV{"HOME"} || $ENV{"LOGDIR"} || (getpwuid($<))[7]);

Alternately, taking advantage of the default, you could say this:

$ok = chdir() || chdir((getpwuid($<))[7]);

See also the Cwd module, described in Chapter 7, The Standard Perl Library, which lets you keep track of your current directory.

chmod

chmod LIST

This function changes the permissions of a list of files. The first element of the list must be the numerical mode, as in chmod (2). (When using nonliteral mode data, you may need to convert an octal string to a decimal number using the oct function.) The function returns the number of files successfully changed. For example:

$cnt = chmod 0755, 'file1', 'file2';

will set $cnt to 0, 1, or 2, depending on how many files got changed (in the sense that the operation succeeded, not in the sense that the bits were different afterward). Here's a more typical usage:

chmod 0755, @executables;

If you need to know which files didn't allow the change, use something like this:

@cannot = grep {not chmod 0755, $_} 'file1', 'file2', 'file3';
die "$0: could not chmod @cannot\n" if @cannot;

This idiom makes use of the grep function to select only those elements of the list for which the chmod function failed.

chomp

chomp VARIABLE
chomp LIST
chomp

This is a slightly safer version of chop (see below) in that it removes only any line ending corresponding to the current value of $/, and not just any last character. Unlike chop, chomp returns the number of characters deleted. If $/ is empty (in paragraph mode), chomp removes all trailing newlines from the selected string (or strings, if chomping a LIST).
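
For example (a short sketch of typical use):

while (defined($line = <STDIN>)) {
    chomp $line;                 # strip the trailing newline, if any
    print "Got: $line\n";
}
chomp(@lines = `cat myfile`);    # chomp every element of the list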

chop

chop VARIABLE
chop LIST
chop

This function chops off the last character of a string and returns the character chopped. The chop operator is used primarily to remove the newline from the end of an input record, but is more efficient than s/\n$//. If VARIABLE is omitted, the function chops the $_ variable. For example:

while (<PASSWD>) {
    chop;   # avoid \n on last field
    @array = split /:/;
    ...
}

If you chop a LIST, each string in the list is chopped:

@lines = `cat myfile`;
chop @lines;

You can actually chop anything that is an lvalue, including an assignment:

chop($cwd = `pwd`);
chop($answer = <STDIN>);

Note that this is different from:

$answer = chop($tmp = <STDIN>);  # WRONG

which puts a newline into $answer, because chop returns the character chopped, not the remaining string (which is in $tmp). One way to get the result intended here is with substr:

$answer = substr <STDIN>, 0, -1;

But this is more commonly written as:

chop($answer = <STDIN>);

To chop more than one character, use substr as an lvalue, assigning a null string. The following removes the last five characters of $caravan:

substr($caravan, -5) = '';

The negative subscript causes substr to count from the end of the string instead of the beginning.

chown

chown LIST

This function changes the owner (and group) of a list of files. The first two elements of the list must be the numerical uid and gid, in that order. The function returns the number of files successfully changed. For example:

$cnt = chown $uid, $gid, 'file1', 'file2';

will set $cnt to 0, 1, or 2, depending on how many files got changed (in the sense that the operation succeeded, not in the sense that the owner was different afterward). Here's a more typical usage:

chown $uid, $gid, @filenames;

Here's a subroutine that looks everything up for you, and then does the chown:

sub chown_by_name {
    local($user, $pattern) = @_;
    chown((getpwnam($user))[2,3], glob($pattern));
}
&chown_by_name("fred", "*.c");

Notice that this forces the group of each file to be the gid fetched from the passwd file. An alternative is to pass a -1 for the gid, which leaves the group of the file unchanged.

On most systems, you are not allowed to change the ownership of the file unless you're the superuser, although you should be able to change the group to any of your secondary groups. On insecure systems, these restrictions may be relaxed, but this is not a portable assumption.

chr

chr NUMBER

This function returns the character represented by that NUMBER in the character set. For example, chr(65) is "A" in ASCII. To convert multiple characters, use pack("C*", LIST) instead.
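
For instance (a minimal sketch):

$char = chr 65;                # "A" in ASCII
$word = pack "C*", 72, 105;    # "Hi" -- several characters at once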

chroot

chroot FILENAME

This function does the same operation as the chroot system call--see chroot (2). If successful, FILENAME becomes the new root directory for the current process--the starting point for pathnames beginning with "/". This directory is inherited across exec calls and by all subprocesses. There is no way to undo a chroot. Only the superuser can use this function. Here's some code that approximates what many FTP servers do:

chroot +(getpwnam('ftp'))[7]
    or die "Can't do anonymous ftp: $!\n";

close

close FILEHANDLE

This function closes the file, socket, or pipe associated with the filehandle. You don't have to close FILEHANDLE if you are immediately going to do another open on it, since the next open will close it for you. (See open.) However, an explicit close on an input file resets the line counter ($.), while the implicit close done by open does not. Also, closing a pipe will wait for the process executing on the pipe to complete (in case you want to look at the output of the pipe afterward), and it prevents the script from exiting before the pipeline is finished.[1] Closing a pipe explicitly also puts the status value of the command executing on the pipe into $?. For example:

[1] Note, however, that a dup'ed pipe is treated as an ordinary filehandle, and close will not wait for the child on that filehandle. You have to wait for the child by closing the filehandle on which it was originally opened.

open OUTPUT, '|sort >foo';     # pipe to sort
...                            # print stuff to output
close OUTPUT;                  # wait for sort to finish
die "sort failed" if $?;       # check for sordid sort
open INPUT, 'foo';             # get sort's results

FILEHANDLE may be an expression whose value gives the real filehandle name. It may also be a reference to a filehandle object returned by some of the newer object-oriented I/O packages.

closedir

closedir DIRHANDLE

This function closes a directory opened by opendir. See the examples under opendir.

connect

connect SOCKET, NAME

This function does the same thing as the connect system call--see connect (2). The function initiates a connection with another process that is waiting at an accept (2). The function returns true if it succeeded, false otherwise (and puts the error code into $!). NAME should be a packed network address of the proper type for the socket. For example:

connect S, $destadd
    or die "Can't connect to $hostname: $!\n";

To disconnect a socket, either close or shutdown. See also the example in the section "Sockets" in Chapter 6, Social Engineering.

cos

cos EXPR

This function returns the cosine of EXPR (expressed in radians). For example, the following script will print a cosine table of angles measured in degrees:

# Here's the lazy way of getting degrees-to-radians.
$pi = atan2(1,1) * 4;
$piover180 = $pi/180;
# Print table.
for ($_ = 0; $_ <= 90; $_++) {
    printf "%3d %7.5f\n", $_, cos($_ * $piover180);
}

For the inverse cosine operation, you may use the POSIX::acos() function, or use this relation:

sub acos { atan2( sqrt(1 - $_[0] * $_[0]), $_[0] ) }

crypt

crypt PLAINTEXT, SALT

This function encrypts a string exactly in the manner of crypt (3). This is useful for checking the password file for lousy passwords.[2] Only the guys wearing white hats are allowed to do this.

[2] What you really want to do is prevent people from adding the bad passwords in the first place.

To see whether a typed-in password $guess matches the password $pass obtained from a file (such as /etc/passwd), try something like the following:

if (crypt($guess, $pass) eq $pass) {
    # guess is correct
}

Note that there is no easy way to decrypt an encrypted password apart from guessing. Also, truncating the salt to two characters is a waste of CPU time, although the manpage for crypt (3) would have you believe otherwise.

Here's an example that makes sure that whoever runs this program knows their own password:

$pwd = (getpwuid $<)[1];
$salt = substr $pwd, 0, 2;
system "stty -echo";
print "Password: ";
chop($word = <STDIN>);
print "\n";
system "stty echo";
if (crypt($word, $salt) ne $pwd) {
    die "Sorry...\n";
} else {
    print "ok\n";
}

Of course, typing in your own password to whoever asks for it is unwise.

The crypt function is unsuitable for encrypting large quantities of data. Find a library module for PGP (or something like that) for something like that.

dbmclose

dbmclose HASH

This function breaks the binding between a DBM file and a hash.

This function is actually just a call to untie with the proper arguments, but is provided for backward compatibility with older versions of Perl.

dbmopen

dbmopen HASH, DBNAME, MODE

This binds a DBM file to a hash (that is, an associative array). (DBM stands for Data Base Management, and consists of a set of C library routines that allow random access to records via a hashing algorithm.) HASH is the name of the hash (with a %). DBNAME is the name of the database (without the .dir or .pag extension). If the database does not exist, and a valid MODE is specified, the database is created with the protection specified by MODE (as modified by the umask). To prevent creation of the database if it doesn't exist, you may specify a MODE of undef, and the function will return a false value if it can't find an existing database. If your system supports only the older DBM functions, you may have only one dbmopen in your program.

Values assigned to the hash prior to the dbmopen are not accessible.

If you don't have write access to the DBM file, you can only read the hash variables, not set them. If you want to test whether you can write, either use file tests or try setting a dummy array entry inside an eval, which will trap the error.

Note that functions such as keys and values may return huge list values when used on large DBM files. You may prefer to use the each function to iterate over large DBM files. This example prints out the mail aliases on a system using sendmail:

dbmopen %ALIASES, "/etc/aliases", 0666
    or die "Can't open aliases: $!\n";
while (($key,$val) = each %ALIASES) {
    print $key, ' = ', $val, "\n";
}
dbmclose %ALIASES;

Hashes bound to DBM files have the same limitations as DBM files, in particular the restrictions on how much you can put into a bucket. If you stick to short keys and values, it's rarely a problem. Another thing you should bear in mind is that many existing DBM databases contain null-terminated keys and values because they were set up with C programs in mind. The B News history file and the old sendmail aliases file are examples. Just use "$key\0" instead of $key.

There is currently no built-in way to lock generic DBM files. Some would consider this a bug. The DB_File module does provide locking at the granularity of the entire file, however. See the documentation on that module in Chapter 7, The Standard Perl Library for details.

This function is actually just a call to tie with the proper arguments, but is provided for backward compatibility with older versions of Perl.

defined

defined EXPR

This function returns a Boolean value saying whether EXPR has a real value or not. A scalar that contains no valid string, numeric, or reference value is known as the undefined value, or undef for short. Many operations return the undefined value under exceptional conditions, such as end of file, uninitialized variable, system error, and such. This function allows you to distinguish between an undefined null string and a defined null string when you're using operators that might return a real null string.

You may also check to see whether arrays, hashes, or subroutines have been allocated any memory yet. Arrays and hashes are allocated when you first put something into them, whereas subroutines are allocated when a definition has been successfully parsed. Using defined on the predefined special variables is not guaranteed to produce intuitive results.

Here is a fragment that tests a scalar value from a hash:

print if defined $switch{'D'};

When used on a hash element like this, defined only tells you whether the value is defined, not whether the key has an entry in the hash table. It's possible to have an undefined scalar value for an existing hash key. Use exists to determine whether the hash key exists.

In the next example we use the fact that some operations return the undefined value when you run out of data:

print "$val\n" while defined($val = pop(@ary));

The same thing goes for error returns from system calls:

die "Can't readlink $sym: $!"
    unless defined($value = readlink $sym);

Since symbol tables for packages are stored as hashes (associative arrays), it's possible to check for the existence of a package like this:

die "No XYZ package defined" unless defined %XYZ::;

Finally, it's possible to avoid blowing up on nonexistent subroutines:

sub saymaybe {
   if (defined &say) {
       say(@_);
   }
   else {
       warn "Can't say";
   }
}

See also undef.

delete

delete EXPR

This function deletes the specified key and associated value from the specified hash. (It doesn't delete a file. See unlink for that.) Deleting from $ENV{} modifies the environment. Deleting from a hash that is bound to a (writable) DBM file deletes the entry from the DBM file.

The following naïve example inefficiently deletes all the values of a hash:

foreach $key (keys %HASH) {
    delete $HASH{$key};
}

(It would be faster to use the undef command.) EXPR can be arbitrarily complicated as long as the final operation is a hash key lookup:

delete $ref->[$x][$y]{$key};

For normal hashes, the delete function happens to return the value (not the key) that was deleted, but this behavior is not guaranteed for tied hashes, such as those bound to DBM files.

To test whether a hash element has been deleted, use exists.

die

die LIST

Outside of an eval, this function prints the concatenated value of LIST to STDERR and exits with the current value of $! (errno). If $! is 0, it exits with the value of ($? >> 8) (which is the status of the last reaped child from a system, wait, close on a pipe, or `command`). If ($? >> 8) is 0, it exits with 255. If LIST is unspecified, the current value of the $@ variable is propagated, if any. Otherwise the string "Died" is used as the default.

Equivalent examples:


die "Can't cd to spool: $!\n" unless chdir '/usr/spool/news';
chdir '/usr/spool/news' or die "Can't cd to spool: $!\n"

(The second form is generally preferred, since the important part is the chdir.)

Within an eval, the function sets the $@ variable equal to the error message that would have been produced otherwise, and aborts the eval, which then returns the undefined value. The die function can thus be used to raise named exceptions that can be caught at a higher level in the program. See the section on the eval function later in this chapter.

If the final value of LIST does not end in a newline, the current script filename, line number, and input line number (if any) are appended to the message, as well as a newline. Hint: sometimes appending ", stopped" to your message will cause it to make better sense when the string "at scriptname line 123" is appended. Suppose you are running script canasta:

die "/etc/games is no good";
die "/etc/games is no good, stopped";

which produces, respectively:

/etc/games is no good at canasta line 123.
/etc/games is no good, stopped at canasta line 123.

If you want your own error messages reporting the filename and line number, use the __FILE__ and __LINE__ special tokens:

die '"', __FILE__, '", line ', __LINE__, ", phooey on you!\n";

This produces output like:

"canasta", line 38, phooey on you!

See also exit and warn.

do

do BLOCK
do SUBROUTINE(LIST)
do EXPR

The do BLOCK form executes the sequence of commands in the BLOCK, and returns the value of the last expression evaluated in the block. When modified by a loop modifier, Perl executes the BLOCK once before testing the loop condition. (On other statements the loop modifiers test the conditional first.)

The do SUBROUTINE(LIST) is a deprecated form of a subroutine call. See "Subroutines" in Chapter 2, The Gory Details.

The do EXPR form uses the value of EXPR as a filename and executes the contents of the file as a Perl script. Its primary use is (or rather was) to include subroutines from a Perl subroutine library, so that:

do 'stat.pl';

is rather like:

eval `cat stat.pl`;

except that it's more efficient, more concise, keeps track of the current filename for error messages, and searches all the directories listed in the @INC array. (See the section on "Special Variables" in Chapter 2, The Gory Details.) It's the same, however, in that it does reparse the file every time you call it, so you probably don't want to do this inside a loop.

Note that inclusion of library modules is better done with the use and require operators, which also do error checking and raise an exception if there's a problem.

dump

dump LABEL
dump

This function causes an immediate core dump. Primarily this is so that you can use undump (1) to turn your core dump into an executable binary after having initialized all your variables at the beginning of the program. (The undump program is not supplied with the Perl distribution, and is not even possible on some architectures. There are hooks in the code for using the GNU unexec() routine as an alternative. Other methods may be supported in the future.) When the new binary is executed it will begin by executing a goto LABEL (with all the restrictions that goto suffers). Think of the operation as a goto with an intervening core dump and reincarnation. If LABEL is omitted, the function arranges for the program to restart from the top. Please note that any files opened at the time of the dump will not be open any more when the program is reincarnated, with possible confusion resulting on the part of Perl. See also the -u command-line switch. For example:

#!/usr/bin/perl
use Getopt::Std;
use MyHorridModule;
%days = (
    Sun => 1,
    Mon => 2,
    Tue => 3,
    Wed => 4,
    Thu => 5,
    Fri => 6,
    Sat => 7,
);
dump QUICKSTART if $ARGV[0] eq '-d';
QUICKSTART:
getopts('f:');
...

This startup code does some slow initialization code, and then calls the dump function to take a snapshot of the program's state. When the dumped version of the program is run, it bypasses all the startup code and goes directly to the QUICKSTART label. If the original script is invoked without the -d switch, it just falls through and runs normally.

If you're looking to use dump to speed up your program, check out the discussion of efficiency matters in Chapter 8, Other Oddments, as well as the Perl native-code compiler in Chapter 6, Social Engineering. You might also consider autoloading, which at least makes it appear to run faster.

each

each HASH

This function returns a two-element list consisting of the key and value for the next value of a hash. With successive calls to each you can iterate over the entire hash. Entries are returned in an apparently random order. When the hash is entirely read, a null list is returned (which, when used in a list assignment, produces a false value). The next call to each after that will start a new iteration. The iterator can be reset either by reading all the elements from the hash, or by calling the keys function in scalar context. You must not add elements to the hash while iterating over it, although you are permitted to use delete. In a scalar context, each returns just the key, but watch out for false keys.

There is a single iterator for each hash, shared by all each, keys, and values function calls in the program. This means that after a keys or values call, the next each call will start again from the beginning. The following example prints out your environment like the printenv (1) program, only in a different order:

while (($key,$value) = each %ENV) {
    print "$key=$value\n";
}

See also keys and values.

eof

eof FILEHANDLE
eof()
eof

This function returns true if the next read on FILEHANDLE will return end of file, or if FILEHANDLE is not open. FILEHANDLE may be an expression whose value gives the real filehandle name. An eof without an argument returns the end-of-file status for the last file read. Empty parentheses () may be used in connection with the combined files listed on the command line. That is, inside a while (<>) loop eof() will detect the end of only the last of a group of files. Use eof(ARGV) or eof (without the parentheses) to test each file in a while (<>) loop. For example, the following code inserts dashes just before the last line of the last file:

while (<>) {
    if (eof()) {
        print "-" x 30, "\n";
    }
    print;
}

On the other hand, this script resets line numbering on each input file:

while (<>) {
    print "$.\t$_";
    if (eof) {       # Not eof().
        close ARGV;  # reset $.
    }
}

Like "$" in a sed program, eof tends to show up in line number ranges. Here's a script that prints lines from /pattern/ to end of each input file:

while (<>) {
    print if /pattern/ .. eof;
}

Here, the flip-flop operator (..) evaluates the regular expression match for each line. Until the pattern matches, the operator returns false. When it finally matches, the operator starts returning true, causing the lines to be printed. When the eof operator finally returns true (at the end of the file being examined), the flip-flop operator resets, and starts returning false again.

Note that the eof function actually reads a byte and then pushes it back on the input stream with ungetc (3), so it is not very useful in an interactive context. In fact, experienced Perl programmers rarely use eof, since the various input operators already behave quite nicely in while-loop conditionals. See the example in the description of foreach in Chapter 2, The Gory Details.

eval

eval EXPR
eval BLOCK

The value expressed by EXPR is parsed and executed as though it were a little Perl program. It is executed in the context of the current Perl program, so that any variable settings remain afterward, as do any subroutine or format definitions. The code of the eval is treated as a block, so any locally scoped variables declared within the eval last only until the eval is done. (See local and my.) As with any code in a block, a final semicolon is not required. If EXPR is omitted, the operator evaluates $_.

The value returned from an eval is the value of the last expression evaluated, just as with subroutines. Similarly, you may use the return operator to return a value from the middle of the eval. If there is a syntax error or run-time error (including any produced by the die operator), eval returns the undefined value and puts the error message in $@. If there is no error, $@ is guaranteed to be set to the null string, so you can test it reliably afterward for errors.

Here's a statement that assigns an element to a hash chosen at run-time:

eval "\$$arrayname{\$key} = 1";

(You can accomplish that more simply with soft references--see "Symbolic References" in Chapter 4, References and Nested Data Structures.) And here is a simple Perl shell:

while (<>) { eval; print $@; }

Since eval traps otherwise-fatal errors, it is useful for determining whether a particular feature (such as socket or symlink) is implemented. In fact, eval is the way to do all exception handling in Perl. If the code to be executed doesn't vary, you should use the eval BLOCK form to trap run-time errors; the code in the block is compiled only once rather than on each execution, yielding greater efficiency. The error, if any, is still returned in $@. Examples:

# make divide-by-zero non-fatal
eval { $answer = $a / $b; }; warn $@ if $@;
# same thing, but less efficient
eval '$answer = $a / $b'; warn $@ if $@;
# a compile-time error (not trapped)
eval { $answer = };
# a run-time error
eval '$answer =';  # sets $@

Here, the code in the BLOCK has to be valid Perl code to make it past the compilation phase. The code in the string doesn't get examined until run-time, and so doesn't cause an error until run-time.

With an eval you should be careful to remember what's being looked at when:

eval $x;          # CASE 1
eval "$x";        # CASE 2
eval '$x';        # CASE 3
eval { $x };      # CASE 4
eval "\$$x++";    # CASE 5
$$x++;            # CASE 6

Cases 1 and 2 above behave identically: they run the code contained in the variable $x. (Case 2 has misleading double quotes, making the reader wonder what else might be happening, when nothing is. The contents of $x would in any event have to be converted to a string for parsing.) Cases 3 and 4 likewise behave in the same way: they run the code $x, which does nothing at all except return the value of $x. (Case 4 is preferred since the expression doesn't need to be recompiled each time.) Case 5 is a place where normally you would like to use double quotes to let you interpolate the variable name, except that in this particular situation you can just use symbolic references instead, as in case 6.

A frequently asked question is how to set up an exit routine. One common way is to use an END block. But you can also do it with an eval, like this:

#!/usr/bin/perl
eval <<'EndOfEval';  $start = __LINE__;
   .
   .           # your ad here
   .
EndOfEval
# Cleanup
unlink "/tmp/myfile$$";
$@ && ($@ =~ s/\(eval \d+\) at line (\d+)/$0 .
    " line " . ($1+$start)/e, die $@);
exit 0;

Note that the code supplied for an eval might not be recompiled if the text hasn't changed. On the rare occasions when you want to force a recompilation (because you want to reset a .. operator, for instance), you could say something like this:

eval $prog . '#' . ++$seq;

exec

exec LIST

This function terminates the currently running Perl script by executing another program in place of itself. If there is more than one argument in LIST (or if LIST is an array with more than one value) the function calls C's execvp (3) routine with the arguments in LIST. This bypasses any shell processing of the command. If there is only one scalar argument, the argument is checked for shell metacharacters. If metacharacters are found, the entire argument is passed to "/bin/sh -c" for parsing.[3] If there are no metacharacters, the argument is split into words and passed directly to execvp (3) in the interests of efficiency, since this bypasses all the overhead of shell processing. Ordinarily exec never returns--if it does return, it always returns false, and you should check $! to find out what went wrong. Note that exec (and system) do not flush your output buffer, so you may need to enable command buffering by setting $| on one or more filehandles to avoid lost output. This statement runs the echo program to print the current argument list:

[3] Under UNIX, that is. Other operating systems may use other command interpreters.

exec 'echo', 'Your arguments are: ', @ARGV;

This example shows that you can exec a pipeline:

exec "sort $outfile | uniq"
  or die "Can't do sort/uniq: $!\n";

The UNIX execv (3) call provides the ability to tell a program the name it was invoked as. This name might have nothing to do with the name of the program you actually gave the operating system to run. By default, Perl simply replicates the first element of LIST and uses it for both purposes. If, however, you don't really want to execute the first argument of LIST, but you want to lie to the program you are executing about its own name, you can do so. Put the real name of the program you want to run into a variable and then put that variable out in front of the LIST without a comma, kind of like a filehandle for a print statement. (This always forces interpretation of the LIST as a multi-valued list, even if there is only a single scalar in the list.) Then the first element of LIST will be used only to mislead the executing program as to its name. For example:

$shell = '/bin/csh';
exec $shell '-sh', @args;      # pretend it's a login shell
die "Couldn't execute csh: $!\n";

You can also replace the simple scalar holding the program name with a block containing arbitrary code, which simplifies the above example to:

exec {'/bin/csh'} '-sh', @args; # pretend it's a login shell

exists

exists EXPR

This function returns true if the specified hash key exists in its hash, even if the corresponding value is undefined.

print "Exists\n" if exists $hash{$key};
print "Defined\n" if defined $hash{$key};
print "True\n" if $hash{$key};

A hash element can only be true if it's defined, and can only be defined if it exists, but the reverse doesn't necessarily hold true in either case.

EXPR can be arbitrarily complicated as long as the final operation is a hash key lookup:

if (exists $ref->[$x][$y]{$key}) { ... }

exit

exit EXPR

This function evaluates EXPR and exits immediately with that value. Here's a fragment that lets a user exit the program by typing x or X:

$ans = <STDIN>;
exit 0 if $ans =~ /^[Xx]/;

If EXPR is omitted, the function exits with 0 status. You shouldn't use exit to abort a subroutine if there's any chance that someone might want to trap whatever error happened. Use die instead, which can be trapped by an eval.

exp

exp EXPR

This function returns e to the power of EXPR. If EXPR is omitted, it gives exp($_). To do general exponentiation, use the ** operator.
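
For instance (a trivial sketch):

$e      = exp 1;      # e itself, roughly 2.718281828
$growth = exp 0.05;   # e raised to the 0.05 power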

fcntl

fcntl FILEHANDLE, FUNCTION, SCALAR

This function calls UNIX's fcntl (2) function. (fcntl stands for "file control".) You'll probably have to say:

use Fcntl;

first to get the correct function definitions. SCALAR will be read and/or written depending on the FUNCTION--a pointer to the string value of SCALAR will be passed as the third argument of the actual fcntl call. (If SCALAR has no string value but does have a numeric value, that value will be passed directly rather than a pointer to the string value.)

The return value of fcntl (and ioctl) is as follows:

System call returns     Perl returns
-1                      undefined value
0                       string "0 but true"
anything else           that number

Thus Perl returns true on success and false on failure, yet you can still easily determine the actual value returned by the operating system:

$retval = fcntl(...) or $retval = -1;
printf "System returned %d\n", $retval;

Here, even the string "0 but true" prints as 0, thanks to the %d format.

For example, since Perl always sets the close-on-exec flag for file descriptors above 2, if you wanted to pass file descriptor 3 to a subprocess, you might want to clear the flag like this:

use Fcntl;
open TTY,"+>/dev/tty" or die "Can't open /dev/tty: $!\n";
fileno TTY == 3 or die "Internal error: fd mixup";
fcntl TTY, &F_SETFD, 0
    or die "Can't clear the close-on-exec flag: $!\n";

fcntl will produce a fatal error if used on a machine that doesn't implement fcntl (2). On machines that do implement it, you can do such things as modify the close-on-exec flags, modify the non-blocking I/O flags, emulate the lockf (3) function, and arrange to receive the SIGIO signal when I/O is pending. You might even have record-locking facilities.

fileno

fileno FILEHANDLE

This function returns the file descriptor for a filehandle. (A file descriptor is a small integer, unlike the filehandle, which is a symbol.) It returns undef if the handle is not open. It's useful for constructing bitmaps for select, and for passing to certain obscure system calls if syscall (2) is implemented. It's also useful for double-checking that the open function gave you the file descriptor you wanted--see the example under fcntl.

If FILEHANDLE is an expression, its value is taken to represent a filehandle, either indirectly by name, or directly as a reference to a filehandle object.

A caution: don't count on the association of a Perl filehandle and a numeric file descriptor throughout the life of the program. If a file has been closed and reopened, the file descriptor may change. Filehandles STDIN, STDOUT, and STDERR start with file descriptors of 0, 1, and 2 (the UNIX standard convention), but even they can change if you start closing and opening them with wild abandon. But you can't get into trouble with 0, 1, and 2 as long as you always reopen immediately after closing, since the basic rule on UNIX systems is to pick the lowest available descriptor, and that'll be the one you just closed.

flock

flock FILEHANDLE, OPERATION

This function calls flock (2) on FILEHANDLE. See the manual page for flock (2) for the definition of OPERATION. Invoking flock will produce a fatal error if used on a machine that doesn't implement flock (2) or emulate it through some other locking mechanism. Here's a mailbox appender for some BSD-based systems:

$LOCK_SH = 1;
$LOCK_EX = 2;
$LOCK_NB = 4;
$LOCK_UN = 8;
sub lock {
    flock MBOX, $LOCK_EX;
    # and, in case someone appended
    # while we were waiting...
    seek MBOX, 0, 2;
}
sub unlock {
    flock MBOX, $LOCK_UN;
}
open MBOX, ">>/usr/spool/mail/$ENV{'USER'}"
    or die "Can't open mailbox: $!";
lock();
print MBOX $msg, "\n\n";
unlock();

Note that flock is unlikely to work on a file being accessed through a network file system.

fork

fork

This function does a fork (2) call. If it succeeds, the function returns the child pid to the parent process and 0 to the child process. (If it fails, it returns the undefined value to the parent process. There is no child process.) Note that unflushed buffers remain unflushed in both processes, which means you may need to set $| on one or more filehandles earlier in the program to avoid duplicate output.

A nearly bulletproof way to launch a child process while checking for "cannot fork" errors would be:

FORK: {
    if ($pid = fork) {
        # parent here
        # child process pid is available in $pid
    } elsif (defined $pid) { # $pid is zero here if defined
        # child here
        # parent process pid is available with getppid
    } elsif ($! =~ /No more process/) {     
        # EAGAIN, supposedly recoverable fork error
        sleep 5;
        redo FORK;
    } else {
        # weird fork error
        die "Can't fork: $!\n";
    }
}

These precautions are not necessary on operations which do an implicit fork (2), such as system, backquotes, or opening a process as a filehandle, because Perl automatically retries a fork on a temporary failure in these cases. Be very careful to end the child code with an exit, or your child may inadvertently leave the conditional and start executing code intended only for the parent process.

If you fork your child processes, you'll have to wait on their zombies when they die. See the wait function for examples of doing this.

The fork function is unlikely to be implemented on any operating system not resembling UNIX, unless it purports to be POSIX-compliant.

format

format NAME =
    picture line
    value list
    ...
.

Declares a named sequence of picture lines (with associated values) for use by the write function. If NAME is omitted, the name defaults to STDOUT, which happens to be the default format name for the STDOUT filehandle. Since, like a sub declaration, this is a global declaration that happens at compile time, any variables used in the value list need to be visible at the point of the format's declaration. That is, lexically scoped variables must be declared earlier in the file, while dynamically scoped variables merely need to be set in the routine that calls write. Here's an example (which assumes we've already calculated $cost and $quantity):

my $str = "widget";               # A lexically scoped variable.
format Nice_Output =
Test: @<<<<<<<< @||||| @>>>>>
      $str,     $%,    '$' . int($num)
.
$~ = "Nice_Output";               # Select our format.
local $num = $cost * $quantity;   # Dynamically scoped variable.
write;

Like filehandles, format names are identifiers that exist in a symbol table (package) and may be fully qualified by package name. Within the typeglobs of a symbol table's entries, formats reside in their own namespace, which is distinct from filehandles, directory handles, scalars, arrays, hashes, or subroutines. Like those other six types, however, a format named Whatever would also be affected by a local on the *Whatever typeglob. In other words, a format is just another gadget contained in a typeglob, independent of the other gadgets.

The "Formats" section in Chapter 2, The Gory Details contains numerous details and examples of their use. The "Per Filehandle Special Variables" and "Global Special Variables" sections in Chapter 2, The Gory Details describe the internal format-specific variables, and the English and FileHandle modules in Chapter 7, The Standard Perl Library provide easier access to them.

formline

formline PICTURE, LIST

This is an internal function used by formats, although you may also call it. It formats a list of values according to the contents of PICTURE, placing the output into the format output accumulator, $^A. Eventually, when a write is done, the contents of $^A are written to some filehandle, but you could also read $^A yourself and then set $^A back to "". Note that a format typically does one formline per line of form, but the formline function itself doesn't care how many newlines are embedded in the PICTURE. This means that the ~ and ~~ tokens will treat the entire PICTURE as a single line. You may therefore need to use multiple formlines to implement a single record-format, just like the format compiler.

Be careful if you put double quotes around the picture, since an @ character may be taken to mean the beginning of an array name. formline always returns true. See "Formats" in Chapter 2, The Gory Details for other examples.
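
Here is a small sketch of ours showing a direct call (the picture and values are made up):

$^A = "";
formline "@<<<<<<<<<< @>>>>>\n", "Widgets", 42;
print $^A;    # left-justified name, right-justified count
$^A = "";     # clear the accumulator before the next record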

getc

getc FILEHANDLE
getc

This function returns the next byte from the input file attached to FILEHANDLE. At end-of-file, it returns a null string. If FILEHANDLE is omitted, the function reads from STDIN. This operator is very slow, but is occasionally useful for single-character, buffered input from the keyboard. This does not enable single-character input. For unbuffered input, you have to be slightly more clever, in an operating-system-dependent fashion. Under UNIX you might say this:

if ($BSD_STYLE) {
  system "stty cbreak </dev/tty >/dev/tty 2>&1";
} else {
  system "stty", "-icanon", "eol", "\001";
}
$key = getc;
if ($BSD_STYLE) {
  system "stty -cbreak </dev/tty >/dev/tty 2>&1";
} else {
  system "stty", "icanon", "eol", "^@"; # ASCII NUL
}
print "\n";

This code puts the next character typed on the terminal in the string $key. If your stty program has options like cbreak, you'll need to use the code where $BSD_STYLE is true, otherwise, you'll need to use the code where it is false. Determining the options for stty is left as an exercise to the reader.

The POSIX module in Chapter 7, The Standard Perl Library provides a more portable version of this using the POSIX::getattr() function. See also the Term::ReadKey module from your nearest CPAN site.

getgrent

getgrent
setgrent
endgrent

These functions do the same thing as their like-named system library routines--see getgrent (3). These routines iterate through your /etc/group file (or its moral equivalent coming from some server somewhere). The return value from getgrent in list context is:

($name, $passwd, $gid, $members)

where $members contains a space-separated list of the login names of the members of the group. To set up a hash for translating group names to gids, say this:

while (($name, $passwd, $gid) = getgrent) {
    $gid{$name} = $gid;
}

In scalar context, getgrent returns only the group name.

getgrgid

getgrgid GID

This function does the same thing as getgrgid (3): it looks up a group file entry by group number. The return value in list context is:

($name, $passwd, $gid, $members)

where $members contains a space-separated list of the login names of the members of the group. If you want to do this repeatedly, consider caching the data in a hash (associative array) using getgrent.

In scalar context, getgrgid returns only the group name.

getgrnam

getgrnam NAME

This function does the same thing as getgrnam (3): it looks up a group file entry by group name. The return value in list context is:

($name, $passwd, $gid, $members)

where $members contains a space-separated list of the login names of the members of the group. If you want to do this repeatedly, consider slurping the data into a hash (associative array) using getgrent.

In scalar context, getgrnam returns only the numeric group ID.
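
For instance (a sketch; the group name is arbitrary):

$gid = getgrnam("staff");
defined $gid or die "No group named staff\n";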

gethostbyaddr

gethostbyaddr ADDR, ADDRTYPE

This function does the same thing as gethostbyaddr (3): it translates a packed binary network address to its corresponding names (and alternate addresses). The return value in list context is:

($name, $aliases, $addrtype, $length, @addrs)

where @addrs is a list of packed binary addresses. In the Internet domain, each address is four bytes long, and can be unpacked by saying something like:

($a, $b, $c, $d) = unpack('C4', $addrs[0]);

In scalar context, gethostbyaddr returns only the host name. See the section on "Sockets" in Chapter 6, Social Engineering for another approach.

gethostbyname

gethostbyname NAME

This function does the same thing as gethostbyname (3): it translates a network hostname to its corresponding addresses (and other names). The return value in list context is:

($name, $aliases, $addrtype, $length, @addrs)

where @addrs is a list of raw addresses. In the Internet domain, each address is four bytes long, and can be unpacked by saying something like:

($a, $b, $c, $d) = unpack('C4', $addrs[0]);

In scalar context, gethostbyname returns only the host address. See the section on "Sockets" in Chapter 6, Social Engineering for another approach.
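
In scalar context the packed address can be handed straight to the conversion routines in the Socket module, as in this sketch of ours (the hostname is arbitrary):

use Socket;
$packed = gethostbyname("www.perl.com")
    or die "Can't resolve www.perl.com\n";
print "Address: ", inet_ntoa($packed), "\n";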

gethostent

gethostent
sethostent STAYOPEN
endhostent

These functions do the same thing as their like-named system library routines--see gethostent (3).

They iterate through your /etc/hosts file and return each entry one at a time. The return value from gethostent is:

($name, $aliases, $addrtype, $length, @addrs)

where @addrs is a list of raw addresses. In the Internet domain, each address is four bytes long, and can be unpacked by saying something like:

($a, $b, $c, $d) = unpack('C4', $addrs[0]);

Scripts that use these routines should not be considered portable. If a machine uses a nameserver, it would interrogate most of the Internet to try to satisfy a request for all the addresses of every machine on the planet. So these routines are unimplemented on such machines.

getlogin

getlogin

This function returns the current login from /etc/utmp, if any. If null, use getpwuid. For example:

$login = getlogin || (getpwuid($<))[0] || "Intruder!!";

getnetbyaddr

getnetbyaddr ADDR, ADDRTYPE

This function does the same thing as getnetbyaddr (3): it translates a network address to the corresponding network name or names. The return value in list context is:

($name, $aliases, $addrtype, $net)

In scalar context, getnetbyaddr returns only the network name.

getnetbyname

getnetbyname NAME

This function does the same thing as getnetbyname (3): it translates a network name to its corresponding network address. The return value in list context is:

($name, $aliases, $addrtype, $net)

In scalar context, getnetbyname returns only the network address.

getnetent

getnetent
setnetent STAYOPEN
endnetent

These functions do the same thing as their like-named system library routines--see getnetent (3). They iterate through your /etc/networks file, or moral equivalent. The return value in list context is:

($name, $aliases, $addrtype, $net)

In scalar context, getnetent returns only the network name.

getpeername

getpeername SOCKET

This function returns the packed socket address of the other end of the SOCKET connection. For example:

use Socket;
$hersockaddr = getpeername SOCK;
($port, $heraddr) = unpack_sockaddr_in($hersockaddr);
$herhostname = gethostbyaddr($heraddr, AF_INET);
$herstraddr = inet_ntoa($heraddr);

getpgrp

getpgrp PID

This function returns the current process group for the specified PID (use a PID of 0 for the current process). Invoking getpgrp will produce a fatal error if used on a machine that doesn't implement getpgrp (2). If PID is omitted, the function returns the process group of the current process (the same as using a PID of 0). On systems implementing this operator with the POSIX getpgrp (2) system call, PID must be omitted or, if supplied, must be 0.

getppid

getppid

This function returns the process ID of the parent process. On the typical UNIX system, if your parent process ID changes to 1, your parent process has died and you've been adopted by the init program.

getpriority

getpriority WHICH, WHO

This function returns the current priority for a process, a process group, or a user. See getpriority (2). Invoking getpriority will produce a fatal error if used on a machine that doesn't implement getpriority (2). For example, to get the priority of the current process, use:

$curprio = getpriority(0, 0);

getprotobyname

getprotobyname NAME

This function does the same thing as getprotobyname (3): it translates a protocol name to its corresponding number. The return value in list context is:

($name, $aliases, $protocol_number)

In scalar context, getprotobyname returns only the protocol number.
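
The protocol number is typically fed to the socket function, as in this sketch of ours:

use Socket;
$proto = getprotobyname("tcp");
socket(SOCK, PF_INET, SOCK_STREAM, $proto)
    or die "Can't create socket: $!\n";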

getprotobynumber

getprotobynumber NUMBER

This function does the same thing as getprotobynumber (3): it translates a protocol number to its corresponding name. The return value in list context is:

($name, $aliases, $protocol_number)

In scalar context, getprotobynumber returns only the protocol name.

getprotoent

getprotoent
setprotoent STAYOPEN
endprotoent

These functions do the same thing as their like-named system library routines--see getprotoent (3). The return value from getprotoent is:

($name, $aliases, $protocol_number)

In scalar context, getprotoent returns only the protocol name.

getpwent

getpwent
setpwent
endpwent

These functions do the same thing as their like-named system library routines--see getpwent (3). They iterate through your /etc/passwd file (or its moral equivalent coming from some server somewhere). The return value in list context is:

($name,$passwd,$uid,$gid,$quota,$comment,$gcos,$dir,$shell)

Some machines may use the quota and comment fields for other purposes, but the remaining fields will always be the same. To set up a hash for translating login names to uids, say this:

while (($name, $passwd, $uid) = getpwent) {
    $uid{$name} = $uid;
}

In scalar context, getpwent returns only the username.

getpwnam

getpwnam NAME

This function does the same thing as getpwnam (3): it translates a username to the corresponding passwd file entry. The return value in list context is:

($name,$passwd,$uid,$gid,$quota,$comment,$gcos,$dir,$shell)

If you want to do this repeatedly, consider caching the data in a hash (associative array) using getpwent.

In scalar context, getpwnam returns only the numeric user ID.
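
For instance (a sketch; the username is arbitrary):

$uid  = getpwnam("nobody");          # numeric uid, scalar context
$home = (getpwnam("nobody"))[7];     # home directory, list context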

getpwuid

getpwuid UID

This function does the same thing as getpwuid (3): it translates a numeric user id to the corresponding passwd file entry. The return value in list context is:

($name,$passwd,$uid,$gid,$quota,$comment,$gcos,$dir,$shell)

If you want to do this repeatedly, consider slurping the data into a hash using getpwent.

In scalar context, getpwuid returns the username.

getservbyname

getservbyname NAME, PROTO

This function does the same thing as getservbyname (3): it translates a service (port) name to its corresponding port number. PROTO is a protocol name such as "tcp". The return value in list context is:

($name, $aliases, $port_number, $protocol_name)

In scalar context, getservbyname returns only the service port number.
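
For instance (a sketch):

$port = getservbyname("smtp", "tcp")
    or die "Can't look up the smtp service\n";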

getservbyport

getservbyport PORT, PROTO

This function does the same thing as getservbyport (3): it translates a service (port) number to its corresponding names. PROTO is a protocol name such as "tcp". The return value in list context is:

($name, $aliases, $port_number, $protocol_name)

In scalar context, getservbyport returns only the service port name.

getservent

getservent
setservent STAYOPEN
endservent

These functions do the same thing as their like-named system library routines--see getservent (3). They iterate through the /etc/services file or its equivalent. The return value in list context is:

($name, $aliases, $port_number, $protocol_name)

In scalar context, getservent returns only the service port name.

getsockname

getsockname SOCKET

This function returns the packed sockaddr address of this end of the SOCKET connection. (And why wouldn't you know your own address already? Because you might have bound an address containing wildcards to the generic socket before doing an accept. Or because you might have been passed a socket by your parent process--for example, inetd.)

use Socket;
$mysockaddr = getsockname(SOCK);
($port, $myaddr) = unpack_sockaddr_in($mysockaddr);

getsockopt

getsockopt SOCKET, LEVEL, OPTNAME

This function returns the socket option requested, or the undefined value if there is an error. See setsockopt for more.
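
As a sketch (assuming SOCK is a socket you have already opened), the returned value is a packed string that you unpack yourself:

use Socket;
$packed = getsockopt(SOCK, SOL_SOCKET, SO_REUSEADDR);
defined $packed or die "getsockopt failed: $!\n";
$reuse = unpack "i", $packed;   # nonzero if SO_REUSEADDR is set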

glob

glob EXPR

This function returns the value of EXPR with filename expansions such as a shell would do. (If EXPR is omitted, $_ is globbed instead.) This is the internal function implementing the <*> operator, except that it may be easier to type this way. For example, compare these two:

@result = map { glob($_) } "*.c", "*.c,v";
@result = map <${_}>, "*.c", "*.c,v";

The glob function is not related to the Perl notion of typeglobs, other than that they both use a * to represent multiple items.

gmtime

gmtime EXPR

This function converts a time as returned by the time function to a 9-element list with the time correct for the Greenwich time zone (aka GMT, or UTC, or even Zulu in certain cultures, not including the Zulu culture, oddly enough). Typically used as follows:

($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
        gmtime(time);

All list elements are numeric, and come straight out of a struct tm (that's a C programming structure--don't sweat it). In particular this means that $mon has the range 0..11, $wday has the range 0..6, and the year has had 1,900 subtracted from it. (You can remember which ones are 0-based because those are the ones you're always using as subscripts into 0-based arrays containing month and day names.) If EXPR is omitted, it does gmtime(time). For example, to print the current month in London:

$london_month = (qw(Jan Feb Mar Apr May Jun
        Jul Aug Sep Oct Nov Dec))[(gmtime)[4]];

The Perl library module Time::Local contains a subroutine, timegm( ), that can convert in the opposite direction.

In scalar context, gmtime returns a ctime (3)-like string based on the GMT time value.

goto

goto LABEL
goto EXPR
goto &NAME

goto LABEL finds the statement labeled with LABEL and resumes execution there. It may not be used to go into any construct that requires initialization, such as a subroutine or a foreach loop. It also can't be used to go into a construct that is optimized away. It can be used to go almost anywhere else within the dynamic scope,[4] including out of subroutines, but for that purpose it's usually better to use some other construct such as last or die. The author of Perl has never felt the need to use this form of goto (in Perl, that is--C is another matter).

[4] This means that if it doesn't find the label in the current routine, it looks back through the routines that called the current routine for the label, thus making it nearly impossible to maintain your program.

Going to even greater heights of orthogonality (and depths of idiocy), Perl allows goto EXPR, which expects EXPR to evaluate to a label name, whose scope is guaranteed to be unresolvable until run-time since the label is unknown when the statement is compiled. This allows for computed gotos per FORTRAN, but isn't necessarily recommended[5] if you're optimizing for maintainability:

[5] Understatement is reputed to be funny, so we thought we'd try one here.

goto +("FOO", "BAR", "GLARCH")[$i];

goto &NAME is highly magical, substituting a call to the named subroutine for the currently running subroutine. This is used by AUTOLOAD subroutines that wish to load another subroutine and then pretend that this subroutine--and not the original one--had been called in the first place (except that any modifications to @_ in the original subroutine are propagated to the replacement subroutine). After the goto, not even caller will be able to tell that the original routine was called first.

grep

grep EXPR, LIST
grep BLOCK LIST

This function evaluates EXPR or BLOCK in a Boolean context for each element of LIST, temporarily setting $_ to each element in turn. In list context, it returns a list of those elements for which the expression is true. (The operator is named after a beloved UNIX program that extracts lines out of a file that match a particular pattern. In Perl the expression is often a pattern, but doesn't have to be.) In scalar context, grep returns the number of times the expression was true.

Presuming @all_lines contains lines of code, this example weeds out comment lines:

@code_lines = grep !/^#/, @all_lines;

Since $_ is a reference into the list value, altering $_ will modify the elements of the original list. While this is useful and supported, it can occasionally cause bizarre results if you aren't expecting it. For example:

@list = qw(barney fred dino wilma);
@greplist = grep { s/^[bfd]// } @list;

@greplist is now "arney", "red", "ino", but @list is now "arney", "red", "ino", "wilma"! Caveat Programmor.

See also map. The following two statements are functionally equivalent:

@out = grep { EXPR } @in;
@out = map { EXPR ? $_ : () } @in

hex

hex EXPR

This function interprets EXPR as a hexadecimal string and returns the equivalent decimal value. (To interpret strings that might start with 0 or 0x see oct.) If EXPR is omitted, it interprets $_. The following code sets $number to 4,294,906,560:

$number = hex("ffff12c0");

To do the inverse function, use:

sprintf "%lx", $number;         # (That's an ell, not a one.)

import

import CLASSNAME LIST
import CLASSNAME

There is no built-in import function. It is merely an ordinary class method defined (or inherited) by modules that wish to export names to another module through the use operator. See use for details.

index

index STR, SUBSTR, POSITION
index STR, SUBSTR

This function returns the position of the first occurrence of SUBSTR in STR. The POSITION, if specified, says where to start looking. Positions are based at 0 (or whatever you've set the $[ variable to--but don't do that). If the substring is not found, the function returns one less than the base, ordinarily -1. To work your way through a string, you might say:

$pos = -1;
while (($pos = index($string, $lookfor, $pos)) > -1) {
    print "Found at $pos\n";
    $pos++;
}

int

int EXPR

This function returns the integer portion of EXPR. If EXPR is omitted, it uses $_. If you're a C programmer, you'll often forget to use int in conjunction with division, which is a floating-point operation in Perl:

$average_age = 939/16;      # yields 58.6875 (58 in C)
$average_age = int 939/16;  # yields 58

ioctl

ioctl FILEHANDLE, FUNCTION, SCALAR

This function implements the ioctl (2) system call. You'll probably have to say:

require "ioctl.ph";
    # probably /usr/local/lib/perl/ioctl.ph

first to get the correct function definitions. If ioctl.ph doesn't exist or doesn't have the correct definitions you'll have to roll your own, based on your C header files such as <sys/ioctl.h>. (The Perl distribution includes a script called h2ph to help you do this, but it's non-trivial.) SCALAR will be read and/or written depending on the FUNCTION--a pointer to the string value of SCALAR will be passed as the third argument of the actual ioctl (2) call. (If SCALAR has no string value but does have a numeric value, that value will be passed directly rather than a pointer to the string value.) The pack and unpack functions are useful for manipulating the values of structures used by ioctl. The following example sets the erase character to DEL on many UNIX systems (see the POSIX module in Chapter 7, The Standard Perl Library for a slightly more portable interface):

require 'ioctl.ph';
$getp = &TIOCGETP or die "NO TIOCGETP";
$sgttyb_t = "ccccs";            # 4 chars and a short
if (ioctl STDIN, $getp, $sgttyb) {
    @ary = unpack $sgttyb_t, $sgttyb;
    $ary[2] = 127;
    $sgttyb = pack $sgttyb_t, @ary;
    ioctl STDIN, &TIOCSETP, $sgttyb
        or die "Can't ioctl TIOCSETP: $!";
}

The return value of ioctl (and fcntl) is as follows:

System call returns     Perl returns
-1                      undefined value
0                       string "0 but true"
anything else           that number

Thus Perl returns true on success and false on failure, yet you can still easily determine the actual value returned by the operating system:

$retval = ioctl(...) or $retval = -1;
printf "System returned %d\n", $retval;

Calls to ioctl should not be considered portable. If, say, you're merely turning off echo once for the whole script, it's much more portable (and not much slower) to say:

system "stty -echo";   # Works on most UNIX boxen.

Just because you can do something in Perl doesn't mean you ought to. To quote the Apostle Paul, "Everything is permissible--but not everything is beneficial."

join

join EXPR, LIST

This function joins the separate strings of LIST into a single string with fields separated by the value of EXPR, and returns the string. For example:

$_ = join ':', $login,$passwd,$uid,$gid,$gcos,$home,$shell;

To do the opposite, see split. To join things together into fixed-position fields, see pack.

The most efficient way to concatenate many strings together is to join them with a null string.

keys

keys HASH

This function returns a list consisting of all the keys of the named hash. The keys are returned in an apparently random order, but it is the same order as either the values or each function produces (assuming that the hash has not been modified between calls). Here is yet another way to print your environment:

@keys = keys %ENV;
@values = values %ENV;
while (@keys) {
    print pop(@keys), '=', pop(@values), "\n";
}

or how about sorted by key:

foreach $key (sort keys %ENV) {
    print $key, '=', $ENV{$key}, "\n";
}

To sort an array by value, you'll need to provide a comparison function. Here's a descending numeric sort of a hash by its values:

foreach $key (sort { $hash{$b} <=> $hash{$a} } keys %hash) {
    printf "%4d %s\n", $hash{$key}, $key;
}

Note that using keys on a hash bound to a largish DBM file will produce a largish list, causing you to have a largish process. You might prefer to use the each function in this case, which will iterate over the hash entries one-by-one without slurping them all into a single gargantuan list.

In scalar context, keys returns the number of elements of the hash (and resets the each iterator). However, to get this information for tied hashes, including DBM files, Perl must still walk the entire hash, so it's not very efficient in that case.

kill

kill LIST

This function sends a signal to a list of processes. The first element of the list must be the signal to send. You may use a signal name in quotes (without a SIG on the front). The function returns the number of processes successfully signaled. If the signal is negative, the function kills process groups instead of processes. (On System V, a negative process number will also kill process groups, but that's not portable.) Examples:

$cnt = kill 1, $child1, $child2;
kill 9, @goners;
kill 'STOP', getppid;  # Can *so* suspend my login shell...

last

last LABEL
last

The last command is like the break statement in C (as used in loops); it immediately exits the loop in question. If the LABEL is omitted, the command refers to the innermost enclosing loop. The continue block, if any, is not executed.

LINE: while (<STDIN>) {
    last LINE if /^$/; # exit when done with header
    # rest of loop here
}

lc

lc EXPR

This function returns a lowercased version of EXPR (or $_ if omitted). This is the internal function implementing the \L escape in double-quoted strings. POSIX setlocale (3) settings are respected.

lcfirst

lcfirst EXPR

This function returns a version of EXPR (or $_ if omitted) with the first character lowercased. This is the internal function implementing the \l escape in double-quoted strings. POSIX setlocale (3) settings are respected.

length

length EXPR

This function returns the length in bytes of the scalar value EXPR. If EXPR is omitted, the function returns the length of $_, but be careful that the next thing doesn't look like the start of an EXPR, or the tokener will get confused. When in doubt, always put in parentheses.

Do not try to use length to find the size of an array or hash. Use scalar @array for the size of an array, and scalar keys %hash for the size of a hash. (The scalar is typically dropped when redundant, which is typical.)

link

link OLDFILE, NEWFILE

This function creates a new filename linked to the old filename. The function returns 1 for success, 0 otherwise (and puts the error code into $!). See also symlink later in this chapter. This function is unlikely to be implemented on non-UNIX systems.
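
For example (a sketch with invented filenames):

link "/usr/local/bin/perl", "/usr/local/bin/perl5"
    or die "Can't make link: $!\n";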

listen

listen SOCKET, QUEUESIZE

This function does the same thing as the listen (2) system call. It tells the system that you're going to be accepting connections on this socket and that the system can queue the number of waiting connections specified by QUEUESIZE. Imagine having call-waiting on your phone, with up to five callers queued. (Gives me the willies!) The function returns true if it succeeded, false otherwise (and puts the error code into $!). See the section "Sockets" in Chapter 6, Social Engineering.
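
For example, to allow up to five pending connections on a socket you've already created and bound to the filehandle S (a sketch):

listen S, 5 or die "Can't listen: $!\n";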

local

local EXPR

This operator declares one or more global variables to have locally scoped values within the innermost enclosing block, subroutine, eval, or file. If more than one variable is listed, the list must be placed in parentheses, because the operator binds more tightly than comma. All the listed variables must be legal lvalues, that is, something you could assign to. This operator works by saving the current values of those variables on a hidden stack and restoring them upon exiting the block, subroutine, eval, or file. After the local is executed, but before the scope is exited, any called subroutines will see the local, inner value, not the previous, outer value, because the variable is still a global variable, despite having a localized value. The technical term for this is "dynamic scoping".

The EXPR may be assigned to if desired, which allows you to initialize your local variables. (If no initializer is given, all scalars are initialized to the undefined value and all arrays and hashes to empty.) Commonly, this is used to name the formal arguments to a subroutine. As with ordinary assignment, if you use parentheses around the variables on the left (or if the variable is an array or hash), the expression on the right is evaluated in list context. Otherwise the expression on the right is evaluated in scalar context.

Here is a routine that executes some random piece of code that depends on $i running through a range of numbers. Note that the scope of $i propagates into the eval code.

&RANGEVAL(20, 30, '$foo[$i] = $i');
sub RANGEVAL {
    local($min, $max, $thunk) = @_;
    local $result = "";
    local $i;
    # Presumably $thunk makes reference to $i
    for ($i = $min; $i < $max; $i++) {
        $result .= eval $thunk;
    }
    $result;
}

This code demonstrates how to make a temporary modification to a global array:

if ($sw eq '-v') {
    # init local array with global array
    local @ARGV = @ARGV;
    unshift @ARGV, 'echo';
    system @ARGV;
}
# @ARGV restored

You can also temporarily modify hashes:

# temporarily add a couple of entries to the %digits hash
if ($base12) {
    # (NOTE: not claiming this is efficient!)
    local(%digits) = (%digits, T => 10, E => 11);
    parse_num();
}

But you probably want to be using my instead, because local isn't really what most people think of as local. See the section on my later.

localtime

localtime EXPR

This function converts the value returned by time to a nine-element list with the time corrected for the local time zone. It's typically used as follows:

($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
        localtime(time);

All list elements are numeric, and come straight out of a struct tm. (That's a bit of C programming lingo--don't worry about it.) In particular this means that $mon has the range 0..11, $wday has the range 0..6, and the year has had 1,900 subtracted from it. (You can remember which ones are 0-based because those are the ones you're always using as subscripts into 0-based arrays containing month and day names.) If EXPR is omitted, it does localtime(time). For example, to get the name of the current day of the week:

$thisday = (Sun,Mon,Tue,Wed,Thu,Fri,Sat)[(localtime)[6]];

The Perl library module Time::Local contains a subroutine, timelocal(), that can convert in the opposite direction.

In scalar context, localtime returns a ctime (3)-like string based on the localtime value. For example, the date command can be emulated with:

perl -e 'print scalar localtime'

See also POSIX::strftime() in Chapter 7, The Standard Perl Library for a more fine-grained approach to formatting times.

log

log EXPR

This function returns the logarithm (base e) of EXPR. If EXPR is omitted, the function returns the logarithm of $_.
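
Only the natural logarithm is built in, but other bases are just a division away. Here is a sketch of a base-10 logarithm:

sub log10 {
    my $n = shift;
    return log($n) / log(10);
}
print log10(10000), "\n";    # prints 4 (barring floating-point fuzz)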

lstat

lstat EXPR

This function does the same thing as the stat function, but if the last component of the filename is a symbolic link, stats a symbolic link instead of the file the symbolic link points to. (If symbolic links are unimplemented on your system, a normal stat is done instead.)

map

map BLOCK LIST
map EXPR, LIST

This function evaluates the BLOCK or EXPR for each element of LIST (locally setting $_ to each element) and returns the list value composed of the results of each such evaluation. It evaluates BLOCK or EXPR in a list context, so each element of LIST may produce zero, one, or more elements in the returned value. These are all flattened into one list. For instance:

@words = map { split ' ' } @lines;

splits a list of lines into a list of words. Often, though, there is a one-to-one mapping between input values and output values:

@chars = map chr, @nums;

translates a list of numbers to the corresponding characters. And here's an example of a one-to-two mapping:

%hash = map { genkey($_), $_ } @array;

which is just a funny functional way to write this:

%hash = ();
foreach $_ (@array) {
    $hash{genkey($_)} = $_;
}

See also grep. map differs from grep in that map returns a list consisting of the results of each successive evaluation of EXPR, whereas grep returns a list consisting of each value of LIST for which EXPR evaluates to true.

mkdir

mkdir FILENAME, MODE

This function creates the directory specified by FILENAME, with permissions specified by the numeric MODE (as modified by the current umask). If it succeeds it returns 1, otherwise it returns 0 and sets $! (from the value of errno).

If mkdir (2) is not built in to your C library, Perl emulates it by calling the mkdir (1) program. If you are creating a long list of directories on such a system it will be more efficient to call the mkdir program yourself with the list of directories to avoid starting zillions of subprocesses.
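
For example (the directory name is invented):

mkdir "spool", 0777 or die "Can't mkdir spool: $!\n";
    # the 0777 is adjusted downward by your current umask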

msgctl

msgctl ID, CMD, ARG

This function calls the msgctl (2) system call. See msgctl (2) for details. If CMD is &IPC_STAT, then ARG must be a variable that will hold the returned msqid_ds structure. The return value works like ioctl's: the undefined value for error, "0 but true" for zero, or the actual return value otherwise. On error, it puts the error code into $!. Before calling, you should say:

require "ipc.ph";
require "msg.ph";

This function is available only on machines supporting System V IPC, which turns out to be far fewer than those supporting sockets.

msgget

msgget KEY, FLAGS

This function calls the System V IPC msgget (2) system call. See msgget (2) for details. The function returns the message queue ID, or the undefined value if there is an error. On error, it puts the error code into $!. Before calling, you should say:

require "ipc.ph";
require "msg.ph";

This function is available only on machines supporting System V IPC.

msgrcv

msgrcv ID, VAR, SIZE, TYPE, FLAGS

This function calls the msgrcv (2) system call to receive a message from message queue ID into variable VAR with a maximum message size of SIZE. See msgrcv (2) for details. When a message is received, the message type will be the first thing in VAR, and the maximum length of VAR is SIZE plus the size of the message type. The function returns true if successful, or false if there is an error. On error, it puts the error code into $!. Before calling, you should say:

require "ipc.ph";
require "msg.ph";

This function is available only on machines supporting System V IPC.

msgsnd

msgsnd ID, MSG, FLAGS

This function calls the msgsnd (2) system call to send the message MSG to the message queue ID. See msgsnd (2) for details. MSG must begin with the long integer message type. You can create a message like this:

$msg = pack "L a*", $type, $text_of_message;

The function returns true if successful, or false if there is an error. On error, it puts the error code into $!. Before calling, you should say:

require "ipc.ph";
require "msg.ph";

This function is available only on machines supporting System V IPC.

my

my EXPR

This operator declares one or more private variables to exist only within the innermost enclosing block, subroutine, eval, or file. If more than one variable is listed, the list must be placed in parentheses, because the operator binds more tightly than comma. Only simple scalars or complete arrays and hashes may be declared this way. The variable name may not be package qualified, because package variables are all global, and private variables are not related to any package. Unlike local, this operator has nothing to do with global variables, other than hiding any other variable of the same name from view within its scope. (A global variable can always be accessed through its package-qualified form, however.) A private variable is not visible until the statement after its declaration. Subroutines called from within the scope of such a private variable cannot see the private variable unless the subroutine is also textually declared within the scope of the variable. The technical term for this is "lexical scoping", so we often call these "lexical variables". In C culture they're called "auto" variables, since they're automatically allocated and deallocated at scope entry and exit.

The EXPR may be assigned to if desired, which allows you to initialize your lexical variables. (If no initializer is given, all scalars are initialized to the undefined value and all arrays and hashes to empty arrays.) As with ordinary assignment, if you use parentheses around the variables on the left (or if the variable is an array or hash), the expression on the right is evaluated in list context. Otherwise the expression on the right is evaluated in scalar context. You can name your formal subroutine parameters with a list assignment, like this:

my ($friends, $romans, $countrymen) = @_;

Be careful not to omit the parentheses indicating list assignment, like this:

my $country = @_;  # right or wrong?

This assigns the length of the array (that is, the number of the subroutine's arguments) to the variable, since the array is being evaluated in scalar context. You can profitably use scalar assignment for a formal parameter though, as long as you use the shift operator. In fact, since object methods are passed the object as the first argument, many such method subroutines start off like this:

sub simple_as {
    my $self = shift;   # scalar assignment
    my ($a,$b,$c) = @_; # list assignment
    ...
}

new

new CLASSNAME LIST
new CLASSNAME

There is no built-in new function. It is merely an ordinary constructor method (subroutine) defined (or inherited) by the CLASSNAME module to let you construct objects of type CLASSNAME. Most constructors are named "new", but only by convention, just to delude C++ programmers into thinking they know what's going on.

next

next LABEL
next

The next command is like the continue statement in C: it starts the next iteration of the loop designated by LABEL:

LINE: while (<STDIN>) {
    next LINE if /^#/;     # discard comments
    ...
}

Note that if there were a continue block in this example, it would execute immediately following the invocation of next. When LABEL is omitted, the command refers to the innermost enclosing loop.

no

no Module LIST

See the use operator, which no is the opposite of, kind of.

oct

oct EXPR

This function interprets EXPR as an octal string and returns the equivalent decimal value. (If EXPR happens to start off with 0x, it is interpreted as a hex string instead.) The following will handle decimal, octal, and hex in the standard notation:

$val = oct $val if $val =~ /^0/;

If EXPR is omitted, the function interprets $_. To perform the inverse function on octal numbers, use:

$oct_string = sprintf "%lo", $number;

open

open FILEHANDLE, EXPR
open FILEHANDLE

This function opens the file whose filename is given by EXPR, and associates it with FILEHANDLE. If EXPR is omitted, the scalar variable of the same name as the FILEHANDLE must contain the filename. (And you must also be careful to use "or die" after the statement rather than "|| die", because the precedence of || is higher than list operators like open.) FILEHANDLE may be a directly specified filehandle name, or an expression whose value will be used for the filehandle. The latter is called an indirect filehandle. If you supply an undefined variable for the indirect filehandle, Perl will not automatically fill it in for you--you have to make sure the expression returns something unique, either a string specifying the actual filehandle name, or a filehandle object from one of the object-oriented I/O packages. (A filehandle object is unique because you call a constructor to generate the object. See the example later in this section.)

After the filehandle is determined, the filename string is processed. First, any leading and trailing whitespace is removed from the string. Then the string is examined on both ends for characters specifying how the file is to be opened. (By an amazing coincidence, these characters look just like the characters you'd use to indicate I/O redirection to the Bourne shell.) If the filename begins with < or nothing, the file is opened for input. If the filename begins with >, the file is truncated and opened for output. If the filename begins with >>, the file is opened for appending. (You can also put a + in front of the > or < to indicate that you want both read and write access to the file.) If the filename begins with |, the filename is interpreted as a command to which output is to be piped, and if the filename ends with a |, the filename is interpreted as a command that pipes input to us. You may not have an open command that pipes both in and out, although the IPC::Open2 and IPC::Open3 library routines give you a close equivalent. See the section "Bidirectional Communication" in Chapter 6, Social Engineering.

Any pipe command containing shell metacharacters is passed to /bin/sh for execution; otherwise it is executed directly by Perl. The filename "-" refers to STDIN, and ">-" refers to STDOUT. open returns non-zero upon success, the undefined value otherwise. If the open involved a pipe, the return value happens to be the process ID of the subprocess.

If you're unfortunate enough to be running Perl on a system that distinguishes between text files and binary files (modern operating systems don't care), then you should check out binmode for tips for dealing with this. The key distinction between systems that need binmode and those that don't is their text file formats. Systems like UNIX and Plan9 that delimit lines with a single character, and that encode that character in C as '\n', do not need binmode. The rest need it.

Here is some code that shows the relatedness of a filehandle and a variable of the same name:

$ARTICLE = "/usr/spool/news/comp/lang/perl/misc/38245";
open ARTICLE or die "Can't find article $ARTICLE: $!\n";
while (<ARTICLE>) {...

Append to a file like this:

open LOG, '>>/usr/spool/news/twitlog'; # (`log' is reserved)

Pipe your data from a process:

open ARTICLE, "caesar <$article |";   # decrypt article with rot13

Here < does not indicate that Perl should open the file for input, because < is not the first character of EXPR. Rather, the concluding | indicates that input is to be piped from caesar <$article (from the program caesar, which takes $article as its standard input). The < is interpreted by the subshell that Perl uses to start the pipe, because < is a shell metacharacter.

Or pipe your data to a process:

open EXTRACT, "|sort >/tmp/Tmp$$";    # $$ is our process number

In this next example we show one way to do recursive opens, via indirect filehandles. The files will be opened on filehandles fh01, fh02, fh03, and so on. Because $input is a local variable, it is preserved through recursion, allowing us to close the correct file before we return.

# Process argument list of files along with any includes.
foreach $file (@ARGV) {
    process($file, 'fh00');
}
sub process {
    local($filename, $input) = @_;
    $input++;               # this is a string increment
    unless (open $input, $filename) {
        print STDERR "Can't open $filename: $!\n";
        return;
    }
    while (<$input>) {      # note the use of indirection
        if (/^#include "(.*)"/) {
            process($1, $input);
            next;
        }
        ...               # whatever
    }
    close $input;
}

You may also, in the Bourne shell tradition, specify an EXPR beginning with >&, in which case the rest of the string is interpreted as the name of a filehandle (or file descriptor, if numeric) which is to be duped and opened.[6] You may use & after >, >>, <, +>, +>>, and +<. The mode you specify should match the mode of the original filehandle. Here is a script that saves, redirects, and restores STDOUT and STDERR:

[6] The word "dup" is UNIX-speak for "duplicate". We're not really trying to dupe you. Trust us.

#!/usr/bin/perl
open SAVEOUT, ">&STDOUT";
open SAVEERR, ">&STDERR";
open STDOUT, ">foo.out" or die "Can't redirect stdout";
open STDERR, ">&STDOUT" or die "Can't dup stdout";
select STDERR; $| = 1;         # make unbuffered
select STDOUT; $| = 1;         # make unbuffered
print STDOUT "stdout 1\n";     # this propagates to
print STDERR "stderr 1\n";     # subprocesses too
close STDOUT;
close STDERR;
open STDOUT, ">&SAVEOUT";
open STDERR, ">&SAVEERR";
print STDOUT "stdout 2\n";
print STDERR "stderr 2\n";

If you specify <&=N, where N is a number, then Perl will do an equivalent of C's fdopen (3) of that file descriptor; this is more parsimonious with file descriptors than the dup form described earlier. (On the other hand, it's more dangerous, since two filehandles may now be sharing the same file descriptor, and a close on one filehandle may prematurely close the other.) For example:

open FILEHANDLE, "<&=$fd";

If you open a pipe to or from the command "-" (that is, either |- or -|), then an implicit fork is done, and the return value of open is the pid of the child within the parent process, and 0 within the child process. (Use defined($pid) in either the parent or child to determine whether the open was successful.) The filehandle behaves normally for the parent, but input and output to that filehandle is piped from or to the STDOUT or STDIN of the child process. In the child process the filehandle isn't opened--I/O happens from or to the new STDIN or STDOUT. Typically this is used like the normal piped open when you want to exercise more control over just how the pipe command gets executed, such as when you are running setuid, and don't want to have to scan shell commands for metacharacters. The following pairs are equivalent:

open FOO, "|tr '[a-z]' '[A-Z]'";
open FOO, "|-" or exec 'tr', '[a-z]', '[A-Z]';
open FOO, "cat -n file|";
open FOO, "-|" or exec 'cat', '-n', 'file';

Explicitly closing any piped filehandle causes the parent process to wait for the child to finish, and returns the status value in $?. On any operation which may do a fork, unflushed buffers remain unflushed in both processes, which means you may need to set $| on one or more filehandles to avoid duplicate output (and then do output to flush them).

Filehandles STDIN, STDOUT, and STDERR remain open following an exec. Other filehandles do not. (However, on systems supporting the fcntl function, you may modify the close-on-exec flag for a filehandle. See fcntl earlier in this chapter. See also the special $^F variable.)

Using the constructor from the FileHandle module, described in Chapter 7, The Standard Perl Library, you can generate anonymous filehandles which have the scope of whatever variables hold references to them, and automatically close whenever and however you leave that scope:

use FileHandle;
...
sub read_myfile_munged {
    my $ALL = shift;
    my $handle = new FileHandle;
    open $handle, "myfile" or die "myfile: $!";
    $first = <$handle> or return ();      # Automatically closed here.
    mung $first or die "mung failed";     # Or here.
    return $first, <$handle> if $ALL;     # Or here.
    $first;                               # Or here.
}

In order to open a file with arbitrary weird characters in it, it's necessary to protect any leading and trailing whitespace, like this:

$file =~ s#^\s#./$&#;
open FOO, "< $file\0";

But we've never actually seen anyone use that in a script . . .

If you want a real C open (2), then you should use the sysopen function. This is another way to protect your filenames from interpretation. For example:

use FileHandle;
sysopen HANDLE, $path, O_RDWR|O_CREAT|O_EXCL, 0700
    or die "sysopen $path: $!";
HANDLE->autoflush(1);
HANDLE->print("stuff $$\n");
seek HANDLE, 0, 0;
print "File contains: ", <HANDLE>;

See seek for some details about mixing reading and writing.

opendir

opendir DIRHANDLE, EXPR

This function opens a directory named EXPR for processing by readdir, telldir, seekdir, rewinddir, and closedir. The function returns true if successful. Directory handles have their own namespace separate from filehandles.

ord

ord EXPR

This function returns the numeric ASCII value of the first character of EXPR. If EXPR is omitted, it uses $_. The return value is always unsigned. If you want a signed value, use unpack('c', EXPR). If you want all the characters of the string converted to a list of numbers, use unpack('C*', EXPR) instead.
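
For example, on an ASCII machine:

$val   = ord "A";               # 65
@codes = unpack "C*", "Perl";   # (80, 101, 114, 108)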

pack

pack TEMPLATE, LIST

This function takes a list of values and packs it into a binary structure, returning the string containing the structure. The TEMPLATE is a sequence of characters that give the order and type of values, as follows:

Character   Meaning
a           An ASCII string, will be null padded
A           An ASCII string, will be space padded
b           A bit string, low-to-high order (like vec( ))
B           A bit string, high-to-low order
c           A signed char value
C           An unsigned char value
d           A double-precision float in the native format
f           A single-precision float in the native format
h           A hexadecimal string, low nybble first
H           A hexadecimal string, high nybble first
i           A signed integer value
I           An unsigned integer value
l           A signed long value
L           An unsigned long value
n           A short in "network" (big-endian) order
N           A long in "network" (big-endian) order
p           A pointer to a string
P           A pointer to a structure (fixed-length string)
s           A signed short value
S           An unsigned short value
v           A short in "VAX" (little-endian) order
V           A long in "VAX" (little-endian) order
u           A uuencoded string
x           A null byte
X           Back up a byte
@           Null-fill to absolute position

Each character may optionally be followed by a number which gives a repeat count. Together the character and the repeat count make a field specifier. Field specifiers may be separated by whitespace, which will be ignored. With all types except "a" and "A", the pack function will gobble up that many values from the LIST. Saying "*" for the repeat count means to use however many items are left. The "a" and "A" types gobble just one value, but pack it as a string of length count, padding with nulls or spaces as necessary. (When unpacking, "A" strips trailing spaces and nulls, but "a" does not.) Real numbers (floats and doubles) are in the native machine format only; due to the multiplicity of floating formats around, and the lack of a standard network representation, no facility for interchange has been made. This means that packed floating-point data written on one machine may not be readable on another--even if both use IEEE floating-point arithmetic (as the endian-ness of the memory representation is not part of the IEEE spec). Also, Perl uses doubles internally for all numeric calculation, and converting from double to float to double will lose precision; that is, unpack("f", pack("f",$num)) will not in general equal $num.

This first pair of examples packs numeric values into bytes:

$out = pack "cccc", 65, 66, 67, 68;      # $out eq "ABCD"
$out = pack "c4", 65, 66, 67, 68;        # same thing

This does a similar thing, with a couple of nulls thrown in:

$out = pack "ccxxcc", 65, 66, 67, 68;    # $out eq "AB\0\0CD"

Packing your shorts doesn't imply that you're portable:

$out = pack "s2", 1, 2;    # "\1\0\2\0" on little-endian
                           # "\0\1\0\2" on big-endian

On binary and hex packs, the count refers to the number of bits or nybbles, not the number of bytes produced:

$out = pack "B32", "01010000011001010111001001101100";
$out = pack "H8", "5065726c";    # both produce "Perl"

The length on an "a" field applies only to one string:

$out = pack "a4", "abcd", "x", "y", "z";      # "abcd"

To get around that limitation, use multiple specifiers:

$out = pack "aaaa",  "abcd", "x", "y", "z";   # "axyz"
$out = pack "a" x 4, "abcd", "x", "y", "z";   # "axyz"

The "a" format does null filling:

$out = pack "a14", "abcdefg";   # "abcdefg\0\0\0\0\0\0\0"

This template packs a C struct tm record (at least on some systems):

$out = pack "i9pl", gmtime, $tz, $toff;

The same template may generally also be used in the unpack function. If you want to join variable length fields with a delimiter, use the join function.

Note that, although all of our examples use literal strings as templates, there is no reason you couldn't pull in your templates from a disk file. You could, in fact, build an entire relational database system around this function.

package

package NAMESPACE

This is not really a function, but a declaration that says that the rest of the innermost enclosing block, subroutine, eval or file belongs to the indicated namespace. (The scope of a package declaration is thus the same as the scope of a local or my declaration.) All subsequent references to unqualified global identifiers will be resolved by looking them up in the declared package's symbol table. A package declaration affects only global variables--including those you've used local on--but not lexical variables created with my.

Typically you would put a package declaration as the first thing in a file that is to be included by the require or use operator, but you can put one anywhere that a statement would be legal. When defining a class or a module file, it is customary to name the package the same name as the file, to avoid confusion. (It's also customary to name such packages beginning with a capital letter, because lowercase modules are by convention interpreted as pragmas.)

You can switch into a given package in more than one place; it merely influences which symbol table is used by the compiler for the rest of that block. (If it sees another package declaration at the same level, the new one overrides the previous one.) Your main program is assumed to start with a package main declaration.

You can refer to variables and filehandles in other packages by qualifying the identifier with the package name and a double colon: $Package::Variable. If the package name is null, the main package is assumed. That is, $::sail is equivalent to $main::sail.

The symbol table for a package is stored in a hash with a name ending in a double colon. The main package's symbol table is named %main:: for example. So the package symbol *main::sail can also be accessed as $main::{"sail"}.

See "Packages" in Chapter 5, Packages, Modules, and Object Classes, for more information about packages, modules, and classes. See my in Chapter 3, Functions, for other scoping issues.

pipe

pipe READHANDLE, WRITEHANDLE

Like the corresponding system call, this function opens a pair of connected pipes--see pipe (2). This call is almost always used right before a fork, after which the pipe's reader should close WRITEHANDLE, and the writer close READHANDLE. (Otherwise the pipe won't indicate EOF to the reader when the writer closes it.) Note that if you set up a loop of piped processes, deadlock can occur unless you are very careful. In addition, note that Perl's pipes use standard I/O buffering, so you may need to set $| on your WRITEHANDLE to flush after each output command, depending on the application--see select (output filehandle).
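
Here is a bare-bones sketch of the usual pipe-then-fork arrangement, with the child sending one line to the parent:

pipe READER, WRITER or die "Can't open pipe: $!\n";
if ($pid = fork) {                  # parent
    close WRITER;
    $line = <READER>;
    print "Parent read: $line";
    close READER;
    waitpid $pid, 0;
}
else {                              # child
    die "Can't fork: $!\n" unless defined $pid;
    close READER;
    select WRITER; $| = 1;          # unbuffer, as warned above
    print WRITER "hello from child $$\n";
    close WRITER;
    exit;
}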

See also the section on "Pipes" in Chapter 6, Social Engineering.

pop

pop ARRAY
pop

This function treats an array like a stack--it pops and returns the last value of the array, shortening the array by 1. If ARRAY is omitted, the function pops @ARGV (in the main program), or @_ (in subroutines). It has the same effect as:

$tmp = $ARRAY[$#ARRAY--];

or:

$tmp = splice @ARRAY, -1;

If there are no elements in the array, pop returns the undefined value. See also push and shift. If you want to pop more than one element, use splice.

Note that pop requires its first argument to be an array, not a list. If you just want the last element of a list, use this:

(something_returning_a_list)[-1]

pos

pos SCALAR

Returns the location in SCALAR where the last m//g search over SCALAR left off. It returns the offset of the character after the last one matched. (That is, it's equivalent to length($`) + length($&).) This is the offset where the next m//g search on that string will start. Remember that the offset of the beginning of the string is 0. For example:

$grafitto = "fee fie foe foo";
while ($grafitto =~ m/e/g) {
    print pos $grafitto, "\n";
}

prints 2, 3, 7, and 11, the offsets of each of the characters following an "e". The pos function may be assigned a value to tell the next m//g where to start:

$grafitto = "fee fie foe foo";
pos $grafitto = 4;  # Skip the fee, start at fie
while ($grafitto =~ m/e/g) {
        print pos $grafitto, "\n";
}

This prints only 7 and 11. (Thank heaven.) The regular expression assertion, \G, matches only at the location currently specified by pos for the string being searched.

print

print FILEHANDLE LIST
print LIST
print

This function prints a string or a comma-separated list of strings. The function returns 1 if successful, 0 otherwise. FILEHANDLE may be a scalar variable name (unsubscripted), in which case the variable contains either the name of the actual filehandle or a reference to a filehandle object from one of the object-oriented filehandle packages. FILEHANDLE may also be a block that returns either kind of value:

print { $OK ? "STDOUT" : "STDERR" } "stuff\n";
print { $iohandle[$i] } "stuff\n";

Note that if FILEHANDLE is a variable and the next token is a term, it may be misinterpreted as an operator unless you interpose a + or put parentheses around the arguments. For example:

print $a - 2;   # prints $a - 2 to default filehandle (usually STDOUT)
print $a (- 2); # prints -2 to filehandle specified in $a
print $a -2;    # ditto (weird parsing rules :-)

If FILEHANDLE is omitted, the function prints to the currently selected output filehandle, initially STDOUT. To set the default output filehandle to something other than STDOUT use the select(FILEHANDLE) operation.[7] If LIST is also omitted, prints $_. Note that, because print takes a LIST, anything in the LIST is evaluated in list context, and any subroutine that you call will likely have one or more of its own internal expressions evaluated in list context. Thus, when you say:

[7] Thus, STDOUT isn't really the default filehandle for print. It's merely the default default filehandle.

print OUT <STDIN>;

it is not going to print out the next line from standard input, but all the rest of the lines from standard input up to end-of-file, since that's what <STDIN> returns in list context. Also, remembering the if-it-looks-like-a-function-it-is-a-function rule, be careful not to follow the print keyword with a left parenthesis unless you want the corresponding right parenthesis to terminate the arguments to the print--interpose a + or put parens around all the arguments:

print (1+2)*3, "\n";            # WRONG
print +(1+2)*3, "\n";           # ok
print ((1+2)*3, "\n");          # ok

printf

printf FILEHANDLE LIST
printf LIST

This function prints a formatted string to FILEHANDLE or, if omitted, the currently selected output filehandle, initially STDOUT. The first item in the LIST must be a string that says how to format the rest of the items. This is similar to the C library's printf (3) and fprintf (3) functions, except that the * field width specifier is not supported. The function is equivalent to:

print FILEHANDLE sprintf LIST

See print and sprintf. The description of sprintf includes the list of acceptable specifications for the format string.
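
For example, here's a sketch that lines up a name and a price in columns:

printf "%-15s %8.2f\n", "widgets", 42.5;
    # the name is left-justified in 15 columns, the price
    # right-justified in 8, with two digits after the decimal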

Don't fall into the trap of using a printf when a simple print would do. The print is more efficient, and less error prone.

push

push ARRAY, LIST

This function treats ARRAY as a stack, and pushes the values of LIST onto the end of ARRAY. The length of ARRAY increases by the length of LIST. The function returns this new length. The push function has the same effect as:

foreach $value (LIST) {
    $ARRAY[++$#ARRAY] = $value;
}

or:

splice @ARRAY, @ARRAY, 0, LIST;

but is more efficient (for both you and your computer). You can use push in combination with shift to make a fairly time-efficient shift register or queue:

for (;;) {
    push @ARRAY, shift @ARRAY;
    ...
}

See also pop and unshift.

q/STRING/

q/STRING/
qq/STRING/
qx/STRING/
qw/STRING/

Generalized quotes. See Chapter 2, The Gory Details.

quotemeta

quotemeta EXPR

This function returns the value of EXPR (or $_ if not specified) with all non-alphanumeric characters backslashed. This is the internal function implementing the \Q escape in interpolative contexts (including double-quoted strings, backticks, and patterns).
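
This is handy when you want a string that may contain regular-expression metacharacters to be matched literally. A sketch (assuming $line holds the text being searched):

$title = "M*A*S*H";
$quoted = quotemeta $title;              # "M\*A\*S\*H"
print "found it\n" if $line =~ /$quoted/;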

rand

rand EXPR
rand

This function returns a random fractional number between 0 and the value of EXPR. (EXPR should be positive.) If EXPR is omitted, the function returns a value between 0 and 1 (including 0, but excluding 1). See also srand.

To get an integral value, combine this with int, as in:

$roll = int(rand 6) + 1;       # $roll is now an integer between 1 and 6

read

read FILEHANDLE, SCALAR, LENGTH, OFFSET
read FILEHANDLE, SCALAR, LENGTH

This function attempts to read LENGTH bytes of data into variable SCALAR from the specified FILEHANDLE. The function returns the number of bytes actually read, 0 at end-of-file. It returns the undefined value on error. SCALAR will be grown or shrunk to the length actually read. The OFFSET, if specified, says where in the variable to start putting bytes, so that you can do a read into the middle of a string.

To copy data from filehandle FROM into filehandle TO, you could say:

while (read FROM, $buf, 16384) {
    print TO $buf;
}

Note that the opposite of read is simply a print, which already knows the length of the string you want to write, and can write a string of any length.

Perl's read function is actually implemented in terms of standard I/O's fread (3) function, so the actual read (2) system call may read more than LENGTH bytes to fill the input buffer, and fread (3) may do more than one system read (2) in order to fill the buffer. To gain greater control, specify the real system call using sysread. Calls to read and sysread should not be intermixed unless you are into heavy wizardry (or pain).

readdir

readdir DIRHANDLE

This function reads directory entries from a directory handle opened by opendir. In scalar context, this function returns the next directory entry, if any, otherwise an undefined value. In list context, it returns all the rest of the entries in the directory, which will of course be a null list if there are none. For example:

opendir THISDIR, "." or die "serious dainbramage: $!";
@allfiles = readdir THISDIR;
closedir THISDIR;
print "@allfiles\n";

prints all the files in the current directory on one line. If you want to avoid the "." and ".." entries, use this instead:

@allfiles = grep !/^\.\.?$/, readdir THISDIR;

And to avoid all .* files (like the ls program):

@allfiles = grep !/^\./, readdir THISDIR;

To get just text files, say this:

@textfiles = grep -T, readdir THISDIR;

But watch out on that last one, because the result of readdir needs to have the directory part glued back on if it's not the current directory--like this:

opendir THATDIR, $thatdir;
@text_of_thatdir = grep -T, map "$thatdir/$_", readdir THATDIR;
closedir THATDIR;

readlink

readlink EXPR

This function returns the name of a file pointed to by a symbolic link. EXPR should evaluate to a filename, the last component of which is a symbolic link. If it is not a symbolic link, or if symbolic links are not implemented, or if some system error occurs, the undefined value is returned, and you should check the error code in $!. If EXPR is omitted, the function uses $_.

Be aware that the returned symlink may be relative to the location you specified. For instance, you may say:

readlink "/usr/local/src/express/yourself.h"

and readlink might return:

../express.1.23/includes/yourself.h

which is not directly usable as a filename unless your current directory happens to be /usr/local/src/express.

recv

recv SOCKET, SCALAR, LEN, FLAGS

This function receives a message on a socket. It attempts to receive LEN bytes of data into variable SCALAR from the specified SOCKET filehandle. The function returns the address of the sender, or the undefined value if there's an error. SCALAR will be grown or shrunk to the length actually read. The function takes the same flags as recv (2). See the section "Sockets" in Chapter 6, Social Engineering.

redo

redo LABEL
redo

The redo command restarts a loop block without evaluating the conditional again. The continue block, if any, is not executed. If the LABEL is omitted, the command refers to the innermost enclosing loop. This command is normally used by programs that wish to deceive themselves about what was just input:

# A loop that joins lines continued with a backslash.
LINE: while (<STDIN>) {
    if (s/\\\n$// and $nextline = <STDIN>) {
        $_ .= $nextline;
        redo LINE;
    }
    print;  # or whatever...
}

ref

ref EXPR

The ref operator returns a true value if EXPR is a reference, the null string otherwise. The value returned depends on the type of thing the reference is a reference to. Built-in types include:

REF
SCALAR
ARRAY
HASH
CODE
GLOB

If the referenced object has been blessed into a package, then that package name is returned instead. You can think of ref as a "typeof" operator.

if (ref($r) eq "HASH") {
    print "r is a reference to a hash.\n";
} 
elsif (ref($r) eq "Hump") {
    print "r is a reference to a Hump object.\n";
} 
elsif (not ref $r) {
    print "r is not a reference at all.\n";
}

See Chapter 4, References and Nested Data Structures for more details.

rename

rename OLDNAME, NEWNAME

This function changes the name of a file. It returns 1 for success, 0 otherwise (and puts the error code into $!). It will not work across filesystem boundaries. If there is already a file named NEWNAME, it will be destroyed.

require

require EXPR
require

This function asserts a dependency of some kind on its argument. (If EXPR is not supplied, $_ is used as the argument.)

If the argument is a string, this function includes and executes the Perl code found in the separate file whose name is given by the string. This is similar to performing an eval on the contents of the file, except that require checks to see that the library file has not been included already. (It can thus be used to express file dependencies without worrying about duplicate compilation.) The function also knows how to search the include path stored in the @INC array (see the section "Special Variables" in Chapter 2, The Gory Details).

This form of the require function behaves much like this subroutine:

sub require {
    my($filename) = @_;
    return 1 if $INC{$filename};
    my($realfilename, $result);
    ITER: {
        foreach $prefix (@INC) {
            $realfilename = "$prefix/$filename";
            if (-f $realfilename) {
                $result = eval `cat $realfilename`;
                last ITER;
            }
        }
        die "Can't find $filename in \@INC";
    }
    die $@ if $@;
    die "$filename did not return true value" unless $result;
    $INC{$filename} = $realfilename;
    return $result;
}

Note that the file must return true as the last value to indicate successful execution of any initialization code, so it's customary to end such a file with 1; unless you're sure it'll return true otherwise.

This operator differs from the now somewhat obsolete do EXPR operator in that the file will not be included again if it was included previously with either a require or do EXPR command, and any difficulties will be detected and reported as fatal errors (which may be trapped by use of eval). The do command does know how to do the @INC path search, however.

If require's argument is a number, the version number of the currently executing Perl binary (as known by $]) is compared to EXPR, and if smaller, execution is immediately aborted. Thus, a script that requires Perl version 5.003 can have as its first line:

require 5.003;

and earlier versions of Perl will abort.

If require's argument is a package name (see package), require assumes an automatic .pm suffix, making it easy to load standard modules. This is like use, except that it happens at run-time, not compile time, and the import routine is not called. For example, to pull in Socket.pm without introducing any symbols into the current package, say this:

require Socket; # instead of "use Socket;"

However, one can get the same effect with the following, which has the advantage of giving a compile-time warning if Socket.pm can't be located:

use Socket ();

reset

reset EXPR
reset

This function is generally used at the top of a loop or in a continue block at the end of a loop, to clear global variables or reset ?? searches so that they work again. The expression is interpreted as a list of single characters (hyphens are allowed for ranges). All scalar variables, arrays, and hashes beginning with one of those letters are reset to their pristine state. If the expression is omitted, one-match searches (?PATTERN?) are reset to match again. The function resets variables or searches for the current package only. It always returns 1.

To reset all "X" variables, say this:

reset 'X';

To reset all lowercase variables, say this:

reset 'a-z';

Lastly, to just reset ?? searches, say:

reset;

Note that resetting "A-Z" is not recommended since you'll wipe out your ARGV, INC, ENV, and SIG arrays.

Lexical variables (created by my) are not affected. Use of reset is vaguely deprecated.

return

return EXPR

This function returns from a subroutine (or eval) with the value specified. (In the absence of an explicit return, the value of the last expression evaluated is returned.) Use of return outside of a subroutine or eval is verboten, and results in a fatal error. Note also that an eval cannot do a return on behalf of the subroutine that called the eval.

The supplied expression will be evaluated in the context of the subroutine invocation. That is, if the subroutine was called in a scalar context, EXPR is also evaluated in scalar context. If the subroutine was invoked in a list context, then EXPR is also evaluated in list context, and can return a list value. A return with no argument returns the undefined value in scalar context, and a null list in list context. The context of the subroutine call can be determined from within the subroutine by using the (misnamed) wantarray function.
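
For example, here is a sketch of a subroutine that uses wantarray to return something appropriate in either context:

sub first_three {
    my @picks = (shift, shift, shift);
    return wantarray ? @picks : "@picks";
}
@list   = first_three(1..10);    # list context: (1, 2, 3)
$string = first_three(1..10);    # scalar context: "1 2 3"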

reverse

reverse LIST

In list context, this function returns a list value consisting of the elements of LIST in the opposite order. This is fairly efficient because it just swaps the pointers around. The function can be used to create descending sequences:

for (reverse 1 .. 10) { ... }

Because of the way hashes flatten into lists when passed to (non-hash-aware) functions, reverse can also be used to invert a hash, presuming the values are unique:

%barfoo = reverse %foobar;

In scalar context, the function concatenates all the elements of LIST together and then returns the reverse of that, character by character.

A small hint: reversing a list sorted earlier by a user-defined function can sometimes be achieved more easily by simply sorting in the opposite direction in the first place.

rewinddir

rewinddir DIRHANDLE

This function sets the current position to the beginning of the directory for the readdir routine on DIRHANDLE. The function may not be available on all machines that support readdir.

rindex

rindex STR, SUBSTR, POSITION
rindex STR, SUBSTR

This function works just like index except that it returns the position of the last occurrence of SUBSTR in STR (a reverse index). The function returns $[-1 if not found. Since $[ is almost always 0 nowadays, the function almost always returns -1. POSITION, if specified, is the rightmost position that may be returned. To work your way through a string backward, say:

$pos = length $string;
while (($pos = rindex $string, $lookfor, $pos) >= 0) {
    print "Found at $pos\n";
    $pos--;
}

rmdir

rmdir FILENAME

This function deletes the directory specified by FILENAME if it is empty. If it succeeds, it returns 1, otherwise it returns 0 and puts the error code into $!. If FILENAME is omitted, the function uses $_.

s///

s///

The substitution operator. See "Pattern Matching Operators" in Chapter 2, The Gory Details.

scalar

scalar EXPR

This pseudo-function may be used within a LIST to force EXPR to be evaluated in scalar context when evaluation in list context would produce a different result.

For example:

local($nextvar) = scalar <STDIN>;

prevents <STDIN> from reading all the lines from standard input before doing the assignment, since assignment to a local list provides a list context. (Without the use of scalar in this example, the first line from <STDIN> would still be assigned to $nextvar, but the subsequent lines would be read and thrown away. This is because the assignment is being made to a list--one that happens to be able to receive only a single, scalar value.)

Of course, a simpler way with less typing would be to simply leave the parentheses off, thereby changing the list context to a scalar one:

local $nextvar = <STDIN>;

Since a print function is a LIST operator, you have to say:

print "Length is ", scalar(@ARRAY), "\n";

if you want the length of @ARRAY to be printed out.

One never needs to force evaluation in a list context, because any operation that wants a list already provides a list context to its list arguments for free. So there's no list function corresponding to scalar.

seek

seek FILEHANDLE, OFFSET, WHENCE

This function positions the file pointer for FILEHANDLE, just like the fseek (3) call of standard I/O. The first position in a file is at offset 0, not offset 1, and offsets refer to byte positions, not line numbers. (In general, since line lengths vary, it's not possible to access a particular line number without examining the whole file up to that line number, unless all your lines are known to be of a particular length, or you've built an index that translates line numbers into byte offsets.) FILEHANDLE may be an expression whose value gives the name of the filehandle or a reference to a filehandle object. The function returns 1 upon success, 0 otherwise. For handiness, the function can calculate offsets from various file positions for you. The value of WHENCE specifies which file position your OFFSET is relative to: 0, the beginning of the file; 1, the current position in the file; or 2, the end of the file. OFFSET may be negative for a WHENCE of 1 or 2.

One interesting use for this function is to allow you to follow growing files, like this:

for (;;) {
    while (<LOG>) {
        ...           # Process file.
    }
    sleep 15;
    seek LOG,0,1;      # Reset end-of-file error.
}

The final seek clears the end-of-file error without moving the pointer. If that doesn't work (depending on your C library's standard I/O implementation), then you may need something more like this:

for (;;) {
    for ($curpos = tell FILE; $_ = <FILE>; $curpos = tell FILE) {
        # search for some stuff and put it into files
    }
    sleep $for_a_while;
    seek FILE, $curpos, 0;
}

Similar strategies could be used to remember the seek addresses of each line in an array.

seekdir

seekdir DIRHANDLE, POS

This function sets the current position for the readdir routine on DIRHANDLE. POS must be a value returned by telldir. This function has the same caveats about possible directory compaction as the corresponding system library routine. The function may not be implemented everywhere that readdir is. It's certainly not implemented where readdir isn't.
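
Here's a small sketch that remembers a spot in the current directory with telldir and later jumps back to it:

opendir DIR, "." or die "Can't open directory: $!\n";
$first = readdir DIR;          # read one entry
$spot  = telldir DIR;          # remember where we are
@rest  = readdir DIR;          # read the remainder
seekdir DIR, $spot;            # jump back to the remembered spot
@again = readdir DIR;          # same entries as @rest
closedir DIR;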

select (output filehandle)

select FILEHANDLE
select

For historical reasons, there are two select operators that are totally unrelated to each other. See the next section for the other one. This select operator returns the currently selected output filehandle, and if FILEHANDLE is supplied, sets the current default filehandle for output. This has two effects: first, a write or a print without a filehandle will default to this FILEHANDLE. Second, special variables related to output will refer to this output filehandle. For example, if you have to set the same top-of-form format for more than one output filehandle, you might do the following:

select REPORT1;
$^ = 'MyTop';
select REPORT2;
$^ = 'MyTop';

But note that this leaves REPORT2 as the currently selected filehandle. This could be construed as antisocial, since it could really foul up some other routine's print or write statements. Properly written library routines leave the currently selected filehandle the same on exit as it was upon entry. To support this, FILEHANDLE may be an expression whose value gives the name of the actual filehandle. Thus, you can save and restore the currently selected filehandle:

my $oldfh = select STDERR; $| = 1; select $oldfh;

or (being bizarre and obscure):

select((select(STDERR), $| = 1)[0])

This example works by building a list consisting of the returned value from select(STDERR) (which selects STDERR as a side effect) and $| = 1 (which is always 1), but sets autoflushing on the now-selected STDERR as a side effect. The first element of that list (the previously selected filehandle) is now used as an argument to the outer select. Bizarre, right? That's what you get for knowing just enough Lisp to be dangerous.

However, now that we've explained all that, we should point out that you rarely need to use this form of select nowadays, because most of the special variables you would want to set have object-oriented wrapper methods to do it for you. So instead of setting $| directly, you might say:

use FileHandle;
STDOUT->autoflush (1);

And the earlier format example might be coded as:

use FileHandle;
REPORT1->format_top_name("MyTop");
REPORT2->format_top_name("MyTop");

select (ready file descriptors)

select RBITS, WBITS, EBITS, TIMEOUT

The four-argument select operator is totally unrelated to the previously described operator. This operator is for discovering which (if any) of your file descriptors are ready to do input or output, or to report an exceptional condition. (This helps you avoid having to do polling.) It calls the select (2) system call with the bitmasks you've specified, which you can construct using fileno and vec, like this:

$rin = $win = $ein = "";
vec($rin, fileno(STDIN), 1) = 1;
vec($win, fileno(STDOUT), 1) = 1;
$ein = $rin | $win;

If you want to select on many filehandles you might wish to write a subroutine:

sub fhbits {
    my @fhlist = @_;
    my $bits;
    for (@fhlist) {
        vec($bits, fileno($_), 1) = 1;
    }
    return $bits;
}
$rin = fhbits(qw(STDIN TTY MYSOCK));

If you wish to use the same bitmasks repeatedly (and it's more efficient if you do), the usual idiom is:

($nfound, $timeleft) =
    select($rout=$rin, $wout=$win, $eout=$ein, $timeout);

Or to block until any file descriptor becomes ready:

$nfound = select($rout=$rin, $wout=$win, $eout=$ein, undef);

The $wout=$win trick works because the value of an assignment is its left side, so $wout gets clobbered first by the assignment, and then by the select, while $win remains unchanged.

Any of the bitmasks can also be undef. The timeout, if specified, is in seconds, which may be fractional. (A timeout of 0 effects a poll.) Not many implementations are capable of returning the $timeleft. If not, they always return $timeleft equal to the supplied $timeout.

One use for select is to sleep with a finer resolution than sleep allows. To do this, specify undef for all the bitmasks. So, to sleep for (at least) 4.75 seconds, use:

select undef, undef, undef, 4.75;

(On some non-UNIX systems this may not work, and you may need to fake up at least one bitmask for a valid descriptor that won't ever be ready.)

Mixing buffered I/O (like read or <HANDLE>) with four-argument select is asking for trouble. Use sysread instead.

semctl

semctl ID, SEMNUM, CMD, ARG

This function calls the System V IPC system call semctl (2). If CMD is &IPC_STAT or &GETALL, then ARG must be a variable which will hold the returned semid_ds structure or semaphore value array. The function returns like ioctl: the undefined value for error, "0 but true" for zero, or the actual return value otherwise. On error, it puts the error code into $!. Before calling, you should say:

require "ipc.ph";
require "sem.ph";

This function is available only on machines supporting System V IPC.
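
Assuming you've done those two requires, and that $semid holds a semaphore ID obtained from semget (and that your sem.ph defines &GETVAL), a rough sketch of fetching the value of the first semaphore in the set looks like this:

$val = semctl $semid, 0, &GETVAL, 0;
defined $val or die "semctl failed: $!\n";
print "semaphore 0 holds ", $val + 0, "\n";   # +0 turns "0 but true" into plain 0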

semget

semget KEY, NSEMS, FLAGS

This function calls the System V IPC system call semget (2). The function returns the semaphore ID, or the undefined value if there is an error. On error, it puts the error code into $!. Before calling, you should say:

require "ipc.ph";
require "sem.ph";

This function is available only on machines supporting System V IPC.

semop

semop KEY, OPSTRING

This function calls the System V IPC system call semop (2) to perform semaphore operations such as signaling and waiting. OPSTRING must be a packed array of semop structures. You can make each semop structure by saying pack("s*", $semnum, $semop, $semflag). The number of semaphore operations is implied by the length of OPSTRING. The function returns true if successful, or false if there is an error. On error, it puts the error code into $!. Before calling, you should say:

require "ipc.ph";
require "sem.ph";

The following code waits on semaphore $semnum of semaphore id $semid:

$semop = pack "s*", $semnum, -1, 0;
die "Semaphore trouble: $!\n" unless semop $semid, $semop;

To signal the semaphore, simply replace -1 with 1.

This function is available only on machines supporting System V IPC.

send

send SOCKET, MSG, FLAGS, TO
send SOCKET, MSG, FLAGS

This function sends a message on a socket. It takes the same flags as the system call of the same name--see send (2). On unconnected sockets you must specify a destination to send TO, in which case send works like sendto (2). The function returns the number of bytes sent, or the undefined value if there is an error. On error, it puts the error code into $!.

(Some non-UNIX systems improperly treat sockets as different objects than ordinary file descriptors, with the result that you must always use send and recv on sockets rather than the handier standard I/O operators.)
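
Here's a minimal sketch (SOCK and $destination are assumed to have been set up already, as described under socket, bind, and connect):

defined send(SOCK, $message, 0)
    or die "Can't send: $!\n";                 # connected socket

defined send(SOCK, $message, 0, $destination)
    or die "Can't send: $!\n";                 # unconnected socket, a la sendto(2)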

setpgrp

setpgrp PID, PGRP

This function sets the current process group (pgrp) for the specified PID (use a PID of 0 for the current process). Invoking setpgrp will produce a fatal error if used on a machine that doesn't implement setpgrp (2). Beware: some systems will ignore the arguments you provide and always do setpgrp(0, $$). Fortunately, those are the arguments one usually provides. (For better portability (by some definition), use the setpgid() function in the POSIX module, or if you're really just trying to daemonize your script, consider the POSIX::setsid() function as well.)

setpriority

setpriority WHICH, WHO, PRIORITY

This function sets the current priority for a process, a process group, or a user. See setpriority (2). Invoking setpriority will produce a fatal error if used on a machine that doesn't implement setpriority (2). To "nice" your process down by four units (the same as executing your program with nice (1)), try:

setpriority 0, 0, getpriority(0, 0) + 4;

The interpretation of a given priority may vary from one operating system to the next.

setsockopt

setsockopt SOCKET, LEVEL, OPTNAME, OPTVAL

This function sets the socket option requested. The function returns undefined if there is an error. OPTVAL may be specified as undef if you don't want to pass an argument. A common option to set on a socket is SO_REUSEADDR, to get around the problem of not being able to bind to a particular address while the previous TCP connection on that port is still making up its mind to shut down. That would look like this:

use Socket;
...
setsockopt(MYSOCK, SOL_SOCKET, SO_REUSEADDR, 1)
        or warn "Can't do setsockopt: $!\n";

shift

shift ARRAY
shift

This function shifts the first value of the array off and returns it, shortening the array by 1 and moving everything down. (Or up, or left, depending on how you visualize the array list.) If there are no elements in the array, the function returns the undefined value. If ARRAY is omitted, the function shifts @ARGV (in the main program), or @_ (in subroutines). See also unshift, push, pop, and splice. The shift and unshift functions do the same thing to the left end of an array that pop and push do to the right end.
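
A common idiom is peeling arguments off @_ inside a subroutine:

sub volume {
    my $height = shift;      # shifts @_
    my $width  = shift;
    my $depth  = shift;
    return $height * $width * $depth;
}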

shmctl

shmctl ID, CMD, ARG

This function calls the System V IPC system call, shmctl (2). If CMD is &IPC_STAT, then ARG must be a variable which will hold the returned shmid_ds structure. The function returns like ioctl: the undefined value for error, "0 but true" for zero, or the actual return value otherwise. On error, it puts the error code into $!. Before calling, you should say:

require "ipc.ph";
require "shm.ph";

This function is available only on machines supporting System V IPC.

shmget

shmget KEY, SIZE, FLAGS

This function calls the System V IPC system call, shmget (2). The function returns the shared memory segment ID, or the undefined value if there is an error. On error, it puts the error code into $!. Before calling, you should say:

require "ipc.ph";
require "shm.ph";

This function is available only on machines supporting System V IPC.

shmread

shmread ID, VAR, POS, SIZE

This function reads from the shared memory segment ID starting at position POS for size SIZE (by attaching to it, copying out, and detaching from it). VAR must be a variable that will hold the data read. The function returns true if successful, or false if there is an error. On error, it puts the error code into $!. This function is available only on machines supporting System V IPC.

shmwrite

shmwrite ID, STRING, POS, SIZE

This function writes to the shared memory segment ID starting at position POS for size SIZE (by attaching to it, copying in, and detaching from it). If STRING is too long, only SIZE bytes are used; if STRING is too short, nulls are written to fill out SIZE bytes. The function returns true if successful, or false if there is an error. On error, it puts the error code into $!. This function is available only on machines supporting System V IPC.
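
Putting the last few functions together, here is a rough sketch (it assumes your system supports System V IPC and that ipc.ph defines &IPC_PRIVATE and &IPC_CREAT) that creates a private 64-byte segment, writes a string into it, and reads it back:

require "ipc.ph";
require "shm.ph";
$id = shmget &IPC_PRIVATE, 64, &IPC_CREAT | 0666;
defined $id or die "shmget failed: $!\n";
shmwrite $id, "Message in a segment", 0, 64
    or die "shmwrite failed: $!\n";
shmread $id, $buffer, 0, 64
    or die "shmread failed: $!\n";
$buffer =~ s/\0+$//;                 # strip the null padding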

shutdown

shutdown SOCKET, HOW

This function shuts down a socket connection in the manner indicated by HOW. If HOW is 0, further receives are disallowed. If HOW is 1, further sends are disallowed. If HOW is 2, everything is disallowed.
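
For example, a client that has sent its entire request on a socket (SOCK here is whatever socket filehandle you've opened) might say:

shutdown SOCK, 1;      # no more sends; we can still read the reply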

(If you came here trying to figure out how to shut down your system, you'll have to execute an external program to do that. See system.)

sin

sin EXPR

Sorry, there's nothing wicked about this operator. It merely returns the sine of EXPR (expressed in radians). If EXPR is omitted, it returns sine of $_.

For the inverse sine operation, you may use the POSIX::asin() function, or use this relation:

sub asin { atan2($_[0], sqrt(1 - $_[0] * $_[0])) }

sleep

sleep EXPR
sleep

This function causes the script to sleep for EXPR seconds, or forever if no EXPR. It may be interrupted by sending the process a SIGALRM. The function returns the number of seconds actually slept. On some systems, the function sleeps till the "top of the second," so, for instance, a sleep 1 may sleep anywhere from 0 to 1 second, depending on when in the current second you started sleeping. A sleep 2 may sleep anywhere from 1 to 2 seconds. And so on. If available, the select (ready file descriptors) call can give you better resolution. You may also be able to use syscall to call the getitimer (2) and setitimer (2) routines that some UNIX systems support.

socket

socket SOCKET, DOMAIN, TYPE, PROTOCOL

This function opens a socket of the specified kind and attaches it to filehandle SOCKET. DOMAIN, TYPE, and PROTOCOL are specified the same as for socket (2). Before using this function, your program should contain the line:

use Socket;

This gives you the proper constants. The function returns true if successful. See the examples in the section "Sockets" in Chapter 6, Social Engineering.
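
A bare-bones sketch of creating a TCP socket (the connect or bind that would normally follow is omitted here):

use Socket;
$proto = getprotobyname 'tcp';
socket SOCK, PF_INET, SOCK_STREAM, $proto
    or die "Can't create socket: $!\n";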

socketpair

socketpair SOCKET1, SOCKET2, DOMAIN, TYPE, PROTOCOL

This function creates an unnamed pair of sockets in the specified domain, of the specified type. DOMAIN, TYPE, and PROTOCOL are specified the same as for socketpair (2). If socketpair (2) is unimplemented, invoking this function yields a fatal error. The function returns true if successful.

This function is typically used just before a fork. One of the resulting processes should close SOCKET1, and the other should close SOCKET2. You can use these sockets bidirectionally, unlike the filehandles created by the pipe function.
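
Here is a rough sketch of that idiom, sending a single message from parent to child over a UNIX-domain stream pair (the message itself is made up):

use Socket;
socketpair CHILD, PARENT, AF_UNIX, SOCK_STREAM, PF_UNSPEC
    or die "socketpair failed: $!\n";
if ($pid = fork) {                  # parent keeps the CHILD end
    close PARENT;
    print CHILD "a note from parent $$\n";
    close CHILD;                    # close flushes and signals EOF
    waitpid $pid, 0;
}
else {
    die "fork failed: $!\n" unless defined $pid;
    close CHILD;                    # child keeps the PARENT end
    chomp($note = <PARENT>);
    close PARENT;
    exit;
}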

sort

sort SUBNAME LIST
sort BLOCK LIST
sort LIST

This function sorts the LIST and returns the sorted list value. By default, it sorts in standard string comparison order (undefined values sorting before defined null strings, which sort before everything else). SUBNAME, if given, is the name of a subroutine that returns an integer less than, equal to, or greater than 0, depending on how the elements of the list are to be ordered. (The handy <=> and cmp operators can be used to perform three-way numeric and string comparisons.) In the interests of efficiency, the normal calling code for subroutines is bypassed, with the following effects: the subroutine may not be a recursive subroutine, and the two elements to be compared are passed into the subroutine not via @_ but as $a and $b (see the examples below). The variables $a and $b are passed by reference, so don't modify them in the subroutine. SUBNAME may be a scalar variable name (unsubscripted), in which case the value provides the name of (or a reference to) the actual subroutine to use. In place of a SUBNAME, you can provide a BLOCK as an anonymous, in-line sort subroutine.

To do an ordinary numeric sort, say this:

sub numerically { $a <=> $b; }
@sortedbynumber = sort numerically 53,29,11,32,7;

To sort in descending order, simply reverse the $a and $b. To sort a list value by some associated value, use a hash lookup in the sort routine:

sub byage {
    $age{$a} <=> $age{$b};
}
@sortedclass = sort byage @class;

As an extension of that notion, you can cascade several different comparisons using the handy comparison operators, which work nicely for this because when they return 0 they fall through to the next case. The routine below sorts to the front of the list those people who are first richer, then taller, then younger, then less alphabetically challenged. We also put a final comparison between $a and $b to make sure the ordering is always well defined.

sub prospects {
    $money{$b} <=> $money{$a}
       or
    $height{$b} <=> $height{$a}
       or
    $age{$a} <=> $age{$b}
       or
    $lastname{$a} cmp $lastname{$b}
       or
    $a cmp $b;
}
@sortedclass = sort prospects @class;

To sort fields without regard to case, say:

@sorted = sort { lc($a) cmp lc($b) } @unsorted;

And finally, note the equivalence of the two ways to sort in reverse:

sub backwards { $b cmp $a; }
@harry = qw(dog cat x Cain Abel);
@george = qw(gone chased yz Punished Axed);
print sort @harry;                   # prints AbelCaincatdogx
print sort backwards @harry;         # prints xdogcatCainAbel
print reverse sort @harry;           # prints xdogcatCainAbel
print sort @george, "to", @harry;    # Remember, it's one LIST.
        # prints AbelAxedCainPunishedcatchaseddoggonetoxyz

Do not declare $a and $b as lexical variables (with my). They are package globals (though they're exempt from the usual restrictions on globals when you're using use strict). You do need to make sure your sort routine is in the same package though, or qualify $a and $b with the package name of the caller.

One last caveat. Perl's sort is implemented in terms of C's qsort (3) function. Some qsort (3) versions will dump core if your sort subroutine provides inconsistent ordering of values.

splice

splice ARRAY, OFFSET, LENGTH, LIST
splice ARRAY, OFFSET, LENGTH
splice ARRAY, OFFSET

This function removes the elements designated by OFFSET and LENGTH from an array, and replaces them with the elements of LIST, if any. The function returns the elements removed from the array. The array grows or shrinks as necessary. If LENGTH is omitted, the function removes everything from OFFSET onward. The following equivalences hold (assuming $[ is 0):

Direct Method            Splice Equivalent
push(@a, $x, $y)         splice(@a, $#a+1, 0, $x, $y)
pop(@a)                  splice(@a, -1)
shift(@a)                splice(@a, 0, 1)
unshift(@a, $x, $y)      splice(@a, 0, 0, $x, $y)
$a[$x] = $y              splice(@a, $x, 1, $y)

The splice function is also handy for carving up the argument list passed to a subroutine. For example, assuming list lengths are passed before lists:

sub list_eq {       # compare two list values
    my @a = splice(@_, 0, shift);
    my @b = splice(@_, 0, shift);
    return 0 unless @a == @b;       # same len?
    while (@a) {
        return 0 if pop(@a) ne pop(@b);
    }
    return 1;
}
if (list_eq($len, @foo[1..$len], scalar(@bar), @bar)) { ... }

It would probably be cleaner just to use references for this, however.

split

split /PATTERN/, EXPR, LIMIT
split /PATTERN/, EXPR
split /PATTERN/
split

This function scans a string given by EXPR for delimiters, and splits the string into a list of substrings, returning the resulting list value in list context, or the count of substrings in scalar context. The delimiters are determined by repeated pattern matching, using the regular expression given in PATTERN, so the delimiters may be of any size, and need not be the same string on every match. (The delimiters are not ordinarily returned, but see below.) If the PATTERN doesn't match at all, split returns the original string as a single substring. If it matches once, you get two substrings, and so on.

If LIMIT is specified and is not negative, the function splits into no more than that many fields (though it may split into fewer if it runs out of delimiters). If LIMIT is negative, it is treated as if an arbitrarily large LIMIT has been specified. If LIMIT is omitted, trailing null fields are stripped from the result (which potential users of pop would do well to remember). If EXPR is omitted, the function splits the $_ string. If PATTERN is also omitted, the function splits on whitespace, /\s+/, after skipping any leading whitespace.

Strings of any length can be split:

@chars = split //, $word;
@fields = split /:/, $line;
@words = split ' ', $paragraph;
@lines = split /^/m, $buffer;

A pattern capable of matching either the null string or something longer than the null string (for instance, a pattern consisting of any single character modified by a * or ?) will split the value of EXPR into separate characters wherever it is the null string that produces the match; non-null matches will skip over occurrences of the delimiter in the usual fashion. (In other words, a pattern won't match in one spot more than once, even if it matched with a zero width.) For example:

print join ':', split / */, 'hi there';

produces the output "h:i:t:h:e:r:e". The space disappears because it matched as part of the delimiter. As a trivial case, the null pattern // simply splits into separate characters (and spaces do not disappear).

The LIMIT parameter is used to split only part of a string:

($login, $passwd, $remainder) = split /:/, $_, 3;

We encourage you to split to lists of names like this in order to make your code self-documenting. (For purposes of error checking, note that $remainder would be undefined if there were fewer than three fields.) When assigning to a list, if LIMIT is omitted, Perl supplies a LIMIT one larger than the number of variables in the list, to avoid unnecessary work. For the split above, LIMIT would have been 4 by default, and $remainder would have received only the third field, not all the rest of the fields. In time-critical applications it behooves you not to split into more fields than you really need.

We said earlier that the delimiters are not returned, but if the PATTERN contains parentheses, then the substring matched by each pair of parentheses is included in the resulting list, interspersed with the fields that are ordinarily returned. Here's a simple case:

split /([-,])/, "1-10,20";

produces the list value:

(1, '-', 10, ',', 20)

With more parentheses, a field is returned for each pair, even if some of the pairs don't match, in which case undefined values are returned in those positions. So if you say:

split /(-)|(,)/, "1-10,20";

you get the value:

(1, '-', undef, 10, undef, ',', 20)

The /PATTERN/ argument may be replaced with an expression to specify patterns that vary at run-time. (To do run-time compilation only once, use /$variable/o.) As a special case, specifying a space ' ' will split on whitespace just as split with no arguments does. Thus, split(' ') can be used to emulate awk 's default behavior, whereas split(/ /) will give you as many null initial fields as there are leading spaces. (Other than this special case, if you supply a string instead of a regular expression, it'll be interpreted as a regular expression anyway.)

The following example splits an RFC-822 message header into a hash containing $head{Date}, $head{Subject}, and so on. It uses the trick of assigning a list of pairs to a hash, based on the fact that delimiters alternate with delimited fields. It makes use of parentheses to return part of each delimiter as part of the returned list value. Since the split pattern is guaranteed to return things in pairs by virtue of containing one set of parentheses, the hash assignment is guaranteed to receive a list consisting of key/value pairs, where each key is the name of a header field. (Unfortunately this technique loses information for multiple lines with the same key field, such as Received-By lines. Ah, well. . . .)

$header =~ s/\n\s+/ /g;      # Merge continuation lines.
%head = ('FRONTSTUFF', split /^([-\w]+):/m, $header);

The following example processes the entries in a UNIX passwd file. You could leave out the chop, in which case $shell would have a newline on the end of it.

open PASSWD, '/etc/passwd';
while (<PASSWD>) {
    chop;        # remove trailing newline
    ($login, $passwd, $uid, $gid, $gcos, $home, $shell) =
            split /:/;
    ...
}

The inverse of split is performed by join (except that join can only join with the same delimiter between all fields). To break apart a string with fixed-position fields, use unpack.

sprintf

sprintf FORMAT, LIST

This function returns a string formatted by the usual printf conventions. The FORMAT string contains text with embedded field specifiers into which the elements of LIST are substituted, one per field. Field specifiers are roughly of the form:

%m.nx

where the m and n are optional sizes whose interpretation depends on the type of field, and x is one of:

Code    Meaning
c       Character
d       Decimal integer
e       Exponential format floating-point number
f       Fixed point format floating-point number
g       Compact format floating-point number
ld      Long decimal integer
lo      Long octal integer
lu      Long unsigned decimal integer
lx      Long hexadecimal integer
o       Octal integer
s       String
u       Unsigned decimal integer
x       Hexadecimal integer
X       Hexadecimal integer with upper-case letters

The various combinations are fully documented in the manpage for printf (3), but we'll mention that m is typically the minimum length of the field (negative for left justified), and n is precision for exponential formats and the maximum length for other formats. Padding is typically done with spaces for strings and zeroes for numbers. The * character as a length specifier is not supported. But, you can easily get around this by including the length expression directly into FORMAT, as in:

$width = 20; $value = sin 1.0;
foreach $precision (0..($width-2)) {
    printf "%${width}.${precision}f\n", $value;
}

sqrt

sqrt EXPR
sqrt

This function returns the square root of EXPR. If EXPR is omitted, it returns the square root of $_. For other roots such as cube roots, you can use the ** operator to raise something to a fractional power.[8]

[8] Don't try either of these approaches with negative numbers, as that poses a slightly more complex problem.
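
For example:

$cuberoot = $value ** (1/3);      # cube root of a non-negative $value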

srand

srand EXPR

This function sets the random number seed for the rand operator. If EXPR is omitted, it does srand(time), which is pretty predictable, so don't use it for security-type things, such as random password generation. Try something like this instead:[9]

[9] Frequently called programs (like CGI scripts) that simply use

time ^ $$

for a seed can fall prey to the mathematical property that

a^b == (a+1)^(b+1)

one-third of the time. If you're particularly concerned with this, see the Math::TrulyRandom module in CPAN.

srand( time() ^ ($$ + ($$ << 15)) );

Of course, you'd need something much more random than that for serious cryptographic purposes, since it's easy to guess the current time. Checksumming the compressed output of one or more rapidly changing operating system status programs is the usual method. For example:

srand (time ^ $$ ^ unpack "%32L*", `ps axww | gzip`);

Do not call srand multiple times in your program unless you know exactly what you're doing and why you're doing it. The point of the function is to "seed" the rand function so that rand can produce a different sequence each time you run your program. Just do it once at the top of your program, or you won't get random numbers out of rand!

stat

stat FILEHANDLE
stat EXPR

This function returns a 13-element list giving the statistics for a file, either the file opened via FILEHANDLE, or named by EXPR. It's typically used as follows:

($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
    $atime,$mtime,$ctime,$blksize,$blocks)
            = stat $filename;

Not all fields are supported on all filesystem types. Here are the meanings of the fields:

Field     Meaning
dev       Device number of filesystem
ino       Inode number
mode      File mode (type and permissions)
nlink     Number of (hard) links to the file
uid       Numeric user ID of file's owner
gid       Numeric group ID of file's owner
rdev      The device identifier (special files only)
size      Total size of file, in bytes
atime     Last access time since the epoch
mtime     Last modify time since the epoch
ctime     Inode change time (NOT creation time!) since the epoch
blksize   Preferred blocksize for file system I/O
blocks    Actual number of blocks allocated

$dev and $ino, taken together, uniquely identify a file. The $blksize and $blocks are likely defined only on BSD-derived filesystems. The $blocks field (if defined) is reported in 512-byte blocks. Note that $blocks*512 can differ greatly from $size for files containing unallocated blocks, or "holes", which aren't counted in $blocks.

If stat is passed the special filehandle consisting of an underline, no actual stat (2) is done, but the current contents of the stat structure from the last stat or stat-based file test (the -x operators) are returned.

The following example first stats $file to see whether it is executable. If it is, it then pulls the device number out of the existing stat structure and tests it to see whether it looks like a Network File System (NFS). Such filesystems tend to have negative device numbers.

if (-x $file and ($d) = stat(_) and $d < 0) {
    print "$file is executable NFS file\n";
}

Hint: if you need only the size of the file, check out the -s file test operator, which returns the size in bytes directly. There are also file tests that return the ages of files in days.

study

study SCALAR
study

This function takes extra time to study SCALAR ($_ if unspecified) in anticipation of doing many pattern matches on the string before it is next modified. This may or may not save time, depending on the nature and number of patterns you are searching on, and on the distribution of character frequencies in the string to be searched--you probably want to compare run-times with and without it to see which runs faster. Those loops that scan for many short constant strings (including the constant parts of more complex patterns) will benefit most. If all your pattern matches are constant strings, anchored at the front, study won't help at all, because no scanning is done. You may have only one study active at a time--if you study a different scalar the first is "unstudied".

The way study works is this: a linked list of every character in the string to be searched is made, so we know, for example, where all the "k" characters are. From each search string, the rarest character is selected, based on some static frequency tables constructed from some C programs and English text. Only those places that contain this rarest character are examined.

For example, here is a loop that inserts index-producing entries before any line containing a certain pattern:

while (<>) {
    study;
    print ".IX foo\n" if /\bfoo\b/;
    print ".IX bar\n" if /\bbar\b/;
    print ".IX blurfl\n" if /\bblurfl\b/;
    ...
    print;
}

In searching for /\bfoo\b/, only those locations in $_ that contain "f" will be looked at, because "f" is rarer than "o". In general, this is a big win except in pathological cases. The only question is whether it saves you more time than it took to build the linked list in the first place.

If you have to look for strings that you don't know until run-time, you can build an entire loop as a string and eval that to avoid recompiling all your patterns all the time. Together with setting $/ to input entire files as one record, this can be very fast, often faster than specialized programs like fgrep. The following scans a list of files (@files) for a list of words (@words), and prints out the names of those files that contain a match:

$search = 'while (<>) { study;';
foreach $word (@words) {
    $search .= "++\$seen{\$ARGV} if /\\b$word\\b/;\n";
}
$search .= "}";
@ARGV = @files;
undef $/;               # slurp each entire file
eval $search;           # this screams
die $@ if $@;           # in case eval failed
$/ = "\n";              # put back to normal input delim
foreach $file (sort keys(%seen)) {
    print $file, "\n";
}

sub

sub NAME BLOCK
sub NAME
sub BLOCK
sub NAME PROTO BLOCK
sub NAME PROTO
sub PROTO BLOCK

The first two of these are not really operators, but rather they declare the existence of named subroutines, which is why the syntax includes a NAME, after all. (As declarations, they return no value.) The first one additionally defines the subroutine with a BLOCK, which contains the code for the subroutine. The second one (the one without the BLOCK) is just a forward declaration, that is, a declaration that introduces the subroutine name without defining it, with the expectation that the real definition will come later. (This is useful because the parser treats a word specially if it knows it's a user-defined subroutine. You can call such a subroutine as if it were a list operator, for instance.)

The third form really is an operator, in that it can be used within expressions to generate an anonymous subroutine at run-time. (More specifically, it returns a reference to an anonymous subroutine, since you can't talk about something anonymous without some kind of reference to it.) If the anonymous subroutine refers to any lexical variables declared outside its BLOCK, it functions as a closure, which means that different calls to the same sub operator will do the bookkeeping necessary to keep the correct "version" of each such lexical variable in sight for the life of the closure, even if the original scope of the lexical variable has been destroyed.

The final three forms are identical to the first three, except that they also supply a prototype that lets you specify how calls to your subroutine should be parsed and analyzed, so you can make your routines act more like some of Perl's built-in functions. See "Subroutines" in Chapter 2, The Gory Details and "Anonymous Subroutines" in Chapter 4, References and Nested Data Structures for more details.
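
Here's a tiny sketch of the closure behavior just described (the names are arbitrary):

sub make_counter {
    my $count = shift;
    return sub { return $count++ };          # closes over this particular $count
}
$from_ten  = make_counter(10);
$from_zero = make_counter(0);
print $from_ten->(), " ", $from_ten->(), "\n";   # prints 10 11
print $from_zero->(), "\n";                      # prints 0 (its own $count)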

substr

substr EXPR, OFFSET, LENGTH
substr EXPR, OFFSET

This function extracts a substring out of the string given by EXPR and returns it. The substring is extracted starting at OFFSET characters from the front of the string. (Note: if you've messed with $[, the beginning of the string isn't at 0, but since you haven't messed with it (have you?), it is.) If OFFSET is negative, the substring starts that far from the end of the string instead. If LENGTH is omitted, everything to the end of the string is returned. If LENGTH is negative, the length is calculated to leave that many characters off the end of the string. Otherwise, LENGTH indicates the length of the substring to extract, which is sort of what you'd expect.

You can use substr as an lvalue (something to assign to), in which case EXPR must also be a legal lvalue. If you assign something shorter than the length of your substring, the string will shrink, and if you assign something longer than the length, the string will grow to accommodate it. To keep the string the same length you may need to pad or chop your value using sprintf or the x operator.

To prepend the string "Larry" to the current value of $_, use:

substr($_, 0, 0) = "Larry";

To instead replace the first character of $_ with "Moe", use:

substr($_, 0, 1) = "Moe";

and finally, to replace the last character of $_ with "Curly", use:

substr($_, -1, 1) = "Curly";

These last few examples presume you haven't messed with the value of $[. You haven't, have you? Good.

symlink

symlink OLDFILE, NEWFILE

This function creates a new filename symbolically linked to the old filename. The function returns 1 for success, 0 otherwise. On systems that don't support symbolic links, it produces a fatal error at run-time. To check for that, use eval to trap the potential error:

$can_symlink = (eval { symlink("", ""); }, $@ eq "");

Or use the Config module. Be careful if you supply a relative symbolic link, since it'll be interpreted relative to the location of the symbolic link itself, not your current working directory.

See also link and readlink earlier in this chapter.

syscall

syscall LIST

This function calls the system call specified as the first element of the list, passing the remaining elements as arguments to the system call. (Many of these are now more readily available through the POSIX module, and others.) The function produces a fatal error if syscall (2) is unimplemented. The arguments are interpreted as follows: if a given argument is numeric, the argument is passed as a C integer. If not, a pointer to the string value is passed. You are responsible for making sure the string is long enough to receive any result that might be written into it. Otherwise you're looking at a coredump. If your integer arguments are not literals and have never been interpreted in a numeric context, you may need to add 0 to them to force them to look like numbers. (See the following example.)

This example calls the setgroups (2) system call to add to the group list of the current process. (It will only work on machines that support multiple group membership.)

require 'syscall.ph';
syscall &SYS_setgroups, @groups+0, pack("i*", @groups);

Note that you may have to run h2ph as indicated in the Perl installation instructions for syscall.ph to exist. Some systems may require a pack template of "s*" instead. Best of all, the syscall function assumes the size equivalence of the C types int, long, and char *.

Try not to think of syscall as the epitome of portability.

sysopen

sysopen FILEHANDLE, FILENAME, MODE
sysopen FILEHANDLE, FILENAME, MODE, PERMS

This function opens the file whose filename is given by FILENAME, and associates it with FILEHANDLE. If FILEHANDLE is an expression, its value is used as the name of (or reference to) the filehandle. This function calls open (2) with the parameters FILENAME, MODE, PERMS.

The possible values and flag bits of the MODE parameter are system-dependent; they are available via the Fcntl library module. However, for historical reasons, some values are universal: zero means read-only, one means write-only, and two means read/write.

If the file named by FILENAME does not exist and sysopen creates it (typically because MODE includes the O_CREAT flag), then the value of PERMS specifies the permissions of the newly created file. If PERMS is omitted, the default value is 0666, which allows read and write for all. This default is reasonable: see umask.
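
A typical sketch, using the Fcntl module for the flag constants (the filename here is made up):

use Fcntl;
sysopen(LOG, "/tmp/program.log", O_WRONLY | O_APPEND | O_CREAT, 0644)
    or die "Can't open log: $!\n";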

The FileHandle module described in Chapter 7, The Standard Perl Library provides a more object-oriented approach to sysopen. See also open earlier in this chapter.

sysread

sysread FILEHANDLE, SCALAR, LENGTH, OFFSET
sysread FILEHANDLE, SCALAR, LENGTH

This function attempts to read LENGTH bytes of data into variable SCALAR from the specified FILEHANDLE using read (2). The function returns the number of bytes actually read, or 0 at EOF. It returns the undefined value on error. SCALAR will be grown or shrunk to the length actually read. The OFFSET, if specified, says where in the string to start putting the bytes, so that you can read into the middle of a string that's being used as a buffer. For an example, see syswrite. You should be prepared to handle the problems (like interrupted system calls) that standard I/O normally handles for you. Also, do not mix calls to read and sysread on the same filehandle unless you are into heavy wizardry (and/or pain).

system

system LIST

This function executes any program on the system for you. It does exactly the same thing as exec LIST except that it does a fork first, and then, after the exec, it waits for the exec'd program to complete. That is (in non-UNIX terms), it runs the program for you, and returns when it's done, unlike exec, which never returns (if it succeeds). Note that argument processing varies depending on the number of arguments, as described for exec. The return value is the exit status of the program as returned by the wait (2) call. To get the actual exit value, divide by 256. (The lower 8 bits are set if the process died from a signal.) See exec.

Because system and backticks block SIGINT and SIGQUIT, killing the program they're running with one of those signals doesn't actually interrupt your program.

@args = ("command", "arg1", "arg2");
system(@args) == 0
    or die "system @args failed: $?";

Here's a more elaborate example of analyzing the return value from system on a UNIX system to check for all possibilities, including for signals and coredumps.

$rc = 0xffff & system @args;
printf "system(%s) returned %#04x: ", "@args", $rc;
if ($rc == 0) {
    print "ran with normal exit\n";
} 
elsif ($rc == 0xff00) {
    print "command failed: $!\n";
} 
elsif ($rc > 0x80) {
    $rc >>= 8;
    print "ran with non-zero exit status $rc\n";
} 
else {
    print "ran with ";
    if ($rc &   0x80) {
        $rc &= ~0x80;
        print "coredump from ";
    } 
    print "signal $rc\n"
} 
$ok = ($rc != 0);

syswrite

syswrite FILEHANDLE, SCALAR, LENGTH, OFFSET
syswrite FILEHANDLE, SCALAR, LENGTH

This function attempts to write LENGTH bytes of data from variable SCALAR to the specified FILEHANDLE using write (2). The function returns the number of bytes actually written, or the undefined value on error. You should be prepared to handle the problems that standard I/O normally handles for you, such as partial writes. The OFFSET, if specified, says where in the string to start writing from, in case you're using the string as a buffer, for instance, or you need to recover from a partial write. To copy data from filehandle FROM into filehandle TO, use something like:

$blksize = (stat FROM)[11] || 16384;  # preferred block size?
while ($len = sysread FROM, $buf, $blksize) {
    if (!defined $len) {
        next if $! =~ /^Interrupted/;
        die "System read error: $!\n";
    }
    $offset = 0;
    while ($len) {          # Handle partial writes.
        $written = syswrite TO, $buf, $len, $offset;
        die "System write error: $!\n"
            unless defined $written;
        $len -= $written;
        $offset += $written;
    };
}

Do not mix calls to (print or write) and syswrite on the same filehandle unless you are into heavy wizardry.

tell

tell FILEHANDLE
tell

This function returns the current file position (in bytes, 0-based) for FILEHANDLE. This value is typically fed to the seek function at some future time to get back to the current position. FILEHANDLE may be an expression whose value gives the name of the actual filehandle, or a reference to a filehandle object. If FILEHANDLE is omitted, the function returns the position of the file last read. File positions are only meaningful on regular files. Devices, pipes, and sockets have no file position.

See seek for an example.

telldir

telldir DIRHANDLE

This function returns the current position of the readdir routines on DIRHANDLE. This value may be given to seekdir to access a particular location in a directory. The function has the same caveats about possible directory compaction as the corresponding system library routine. This function may not be implemented everywhere that readdir is. Even if it is, no calculation may be done with the return value. It's just an opaque value, meaningful only to seekdir.

tie

tie VARIABLE, CLASSNAME, LIST

This function binds a variable to a package class that will provide the implementation for the variable. VARIABLE is the name of the variable to be tied. CLASSNAME is the name of a class implementing objects of an appropriate type. Any additional arguments are passed to the "new" method of the class (meaning TIESCALAR, TIEARRAY, or TIEHASH). Typically these are arguments such as might be passed to the dbm_open (3) function of C, but this is package dependent. The object returned by the "new" method is also returned by the tie function, which can be useful if you want to access other methods in CLASSNAME. (The object can also be accessed through the tied function.) So, a class for tying a hash to an ISAM implementation might provide an extra method to traverse a set of keys sequentially (the "S" of ISAM), since your typical DBM implementation can't do that.

Note that functions such as keys and values may return huge list values when used on large objects like DBM files. You may prefer to use the each function to iterate over such. For example:

use NDBM_File;
tie %ALIASES, "NDBM_File", "/etc/aliases", 1, 0
    or die "Can't open aliases: $!\n";
while (($key,$val) = each %ALIASES) {
    print $key, ' = ', $val, "\n";
}
untie %ALIASES;

A class implementing a hash should provide the following methods:

TIEHASH $class, LIST
DESTROY $self
FETCH $self, $key
STORE $self, $key, $value
DELETE $self, $key
EXISTS $self, $key
FIRSTKEY $self
NEXTKEY $self, $lastkey

A class implementing an ordinary array should provide the following methods:

TIEARRAY $classname, LIST
DESTROY $self
FETCH $self, $subscript
STORE $self, $subscript, $value

(As of this writing, other methods are still being designed. Check the online documentation for additions.)

A class implementing a scalar should provide the following methods:

TIESCALAR $classname, LIST
DESTROY $self
FETCH $self
STORE $self, $value

See "Using Tied Variables" in Chapter 5, Packages, Modules, and Object Classes for detailed discussion of all these methods. Unlike dbmopen, the tie function will not use or require a module for you--you need to do that explicitly yourself. See the DB_File and Config modules for interesting tie implementations.

tied

tied VARIABLE

This function returns a reference to the object underlying VARIABLE (the same value that was originally returned by the tie call which bound the variable to a package.) It returns the undefined value if VARIABLE isn't tied to a package. So, for example, you can use:

ref tied %hash

to find out which package your hash is currently tied to. (Presuming you've forgotten.)

time

time

This function returns the number of non-leap seconds since January 1, 1970, UTC.[10] The returned value is suitable for feeding to gmtime and localtime, and for comparison with file modification and access times returned by stat, and for feeding to utime--see the examples under utime.

[10] Also known as the "epoch", not to be confused with the "epic", which is about the making of UNIX. (Other operating systems may have a different epoch, not to mention a different epic.)

times

times

This function returns a four-element list giving the user and system CPU times, in seconds (possibly fractional), for this process and the children of this process.

($user, $system, $cuser, $csystem) = times;

For example, to time the execution speed of a section of Perl code:

$start = (times)[0];
...
$end = (times)[0];
printf "that took %.2f CPU seconds\n", $end - $start;

tr///

tr///
y///

This is the translation operator, which is like the one in the UNIX sed program, only better, in everybody's humble opinion. See Chapter 2, The Gory Details.

truncate

truncate FILEHANDLE, LENGTH
truncate EXPR, LENGTH

This function truncates the file opened on FILEHANDLE, or named by EXPR, to the specified length. The function produces a fatal error if truncate (2) or an equivalent isn't implemented on your system. (You can always truncate a file by copying the front of it, if you have the disk space.)
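
For example, to empty out a file (the names here are arbitrary):

truncate LOGFILE, 0         or die "Can't truncate filehandle: $!\n";
truncate "/tmp/scratch", 0  or die "Can't truncate file: $!\n";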

uc

uc EXPR

This function returns an uppercased version of EXPR (or $_ if EXPR is omitted). This is the internal function implementing the \U escape in double-quoted strings. POSIX setlocale (3) settings are respected.

ucfirst

ucfirst EXPR

This function returns a version of EXPR (or $_ if EXPR is omitted) with the first character uppercased, that is, capitalized. This is the internal function implementing the \u escape in double-quoted strings. POSIX setlocale (3) settings are respected.

To force initial caps, and everything else lowercase, use:

ucfirst lc $word

which is equivalent to "\u\L$word".

umask

umask EXPR
umask

This function sets the umask for the process and returns the old one. (The umask tells UNIX which permission bits to disallow when creating a file.) If EXPR is omitted, the function merely returns the current umask. For example, to ensure that the "other" bits are turned on, and the "user" bits are turned off, try something like:

umask((umask() & 077) | 7);

undef

undef EXPR
undef

This function undefines the value of EXPR, which must be an lvalue. Use only on a scalar value, an entire array or hash, or a subroutine name (using the & prefix). Any storage associated with the object will be recovered for reuse (though not returned to the system, for most versions of UNIX). The undef function will probably not do what you expect on most special variables.

The function always returns the undefined value. This is useful because you can omit the EXPR, in which case nothing gets undefined, but you still get an undefined value that you could, for instance, return from a subroutine to indicate an error. Here are some uses of undef as a unary operator:

undef $foo;
undef $bar{'blurfl'};
undef @ary;
undef %assoc;
undef &mysub;

Without an argument, undef is just used for its value:

return (wantarray ? () : undef) if $they_blew_it;
select(undef, undef, undef, $naptime);

You may use undef as a placeholder on the left side of a list assignment, in which case the corresponding value from the right side is simply discarded. Apart from that, you may not use undef as an lvalue.

unlink

unlink LIST

This function deletes a list of files.[11] If LIST is omitted, it unlinks the file given in $_. The function returns the number of files successfully deleted. Some sample commands:

[11] Actually, under UNIX, it removes the directory entries that refer to the real files. Since a file may be referenced (linked) from more than one directory, the file isn't actually removed until the last reference to it is removed.

$cnt = unlink 'a', 'b', 'c';
unlink @goners;
unlink <*.bak>;

Note that unlink will not delete directories unless you are superuser and the -U flag is supplied to Perl. Even if these conditions are met, be warned that unlinking a directory can inflict Serious Damage on your filesystem. Use rmdir instead.

Here's a very simple rm command with very simple error checking:

#!/usr/bin/perl
@cannot = grep {not unlink} @ARGV;
die "$0: could not unlink @cannot\n" if @cannot;

unpack

unpack TEMPLATE, EXPR

This function does the reverse of pack: it takes a string (EXPR) representing a data structure and expands it out into a list value, returning the list value. (In a scalar context, it can be used to unpack a single value.) The TEMPLATE has much the same format as in the pack function--it specifies the order and type of the values to be unpacked. (See pack for a more detailed description of TEMPLATE.)

Here's a subroutine that does (some of) substr, only slower:

sub substr {
    my($what, $where, $howmuch) = @_;
    if ($where < 0) {
        $where = -$where;
        return unpack "\@* X$where a$howmuch", $what;
    }
    else {
        return unpack "x$where a$howmuch", $what;
    }
}

and then there's:

sub signed_ord { unpack "c", shift }

Here's a complete uudecode program:

#!/usr/bin/perl
$_ = <> until ($mode,$file) = /^begin\s*(\d*)\s*(\S*)/;
open(OUT,"> $file") if $file ne "";
while (<>) {
    last if /^end/;
    next if /[a-z]/;
    next unless int((((ord() - 32) & 077) + 2) / 3) ==
                int(length() / 4);
    print OUT unpack "u", $_;
}
chmod oct $mode, $file;

In addition, you may prefix a field with %number to indicate that you want it to return a number-bit checksum of the items instead of the items themselves. Default is a 16-bit checksum. For example, the following computes the same number as the System V sum program:

undef $/;
$checksum = unpack ("%32C*", <>) % 32767;

The following efficiently counts the number of set bits in a bit vector:

$setbits = unpack "%32b*", $selectmask;

Here's a simple MIME decoder:

while (<>) {
  tr#A-Za-z0-9+/##cd;                   # remove non-base64 chars
  tr#A-Za-z0-9+/# -_#;                  # convert to uuencoded format
  $len = pack("c", 32 + 0.75*length);   # compute length byte
  print unpack("u", $len . $_);         # uudecode and print
}

unshift

unshift ARRAY, LIST

This function does the opposite of a shift. (Or the opposite of a push, depending on how you look at it.) It prepends LIST to the front of the array, and returns the new number of elements in the array:

unshift @ARGV, '-e', $cmd unless $ARGV[0] =~ /^-/;

untie

untie VARIABLE

Breaks the binding between a variable and a package. See tie.

use

use Module LIST
use Module

The use declaration imports some semantics into the current package from the named module, generally by aliasing certain subroutine or variable names into your package. It is exactly equivalent to the following:

BEGIN { require Module; import Module LIST; }

The BEGIN forces the require and import to happen at compile time. The require makes sure the module is loaded into memory if it hasn't been yet. The import is not a built-in--it's just an ordinary static method call into the package named by Module to tell the module to import the list of features back into the current package. The module can implement its import method any way it likes, though most modules just choose to derive their import method via inheritance from the Exporter class that is defined in the Exporter module. See Chapter 5, Packages, Modules, and Object Classes for more information.

If you don't want your namespace altered, explicitly supply an empty list:

use Module ();

That is exactly equivalent to the following:

BEGIN { require Module; }

Because this is a wide-open interface, pragmas (compiler directives) are also implemented this way. Currently implemented pragmas include:

use integer;
use diagnostics;
use sigtrap qw(SEGV BUS);
use strict  qw(subs vars refs);

These pseudomodules typically import semantics into the current block scope, unlike ordinary modules, which import symbols into the current package. (The latter are effective through the end of the file.)

There's a corresponding declaration, no, that "unimports" any meanings originally imported by use, but that have since become, er, unimportant:

no integer;
no strict 'refs';

See Chapter 7, The Standard Perl Library for a list of standard modules and pragmas.

utime

utime LIST

This function changes the access and modification times on each file of a list of files. The first two elements of the list must be the numerical access and modification times, in that order. The function returns the number of files successfully changed. The inode change time of each file is set to the current time. Here's an example of a touch command:

#!/usr/bin/perl
$now = time;
utime $now, $now, @ARGV;

and here's a more sophisticated touch command with a bit of error checking:

#!/usr/bin/perl
$now = time;
@cannot = grep {not utime $now, $now, $_} @ARGV;
die "$0: Could not touch @cannot.\n" if @cannot;

The standard touch command will actually create missing files, something like this:

$now = time;
foreach $file (@ARGV) {
    utime $now, $now, $file
        or open TMP, ">>$file"
        or warn "Couldn't touch $file: $!\n";
}

To read the times from existing files, use stat.

values

values HASH

This function returns a list consisting of all the values of the named hash. The values are returned in an apparently random order, but it is the same order as either the keys or each function would produce on the same hash. To sort the hash by its values, see the example under keys. Note that using values on a hash that is bound to a humongous DBM file is bound to produce a humongous list, causing you to have a humongous process, leaving you in a bind. You might prefer to use the each function, which will iterate over the hash entries one by one without slurping them all into a single gargantuan (that is, humongous) list.
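
For example, to total up a hash of numbers (a hypothetical %sales hash):

$total = 0;
foreach $amount (values %sales) {
    $total += $amount;
}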

vec

vec EXPR, OFFSET, BITS

This function treats a string (the value of EXPR) as a vector of unsigned integers, and returns the value of the element specified by OFFSET and BITS. The function may also be assigned to, which causes the element to be modified. The purpose of the function is to provide very compact storage of lists of small integers. The integers may be very small indeed--each element may be as narrow as a single bit, in which case the vector is effectively a bitstring.

The OFFSET specifies how many elements to skip over to find the one you want. BITS is the number of bits per element in the vector, so each element can contain an unsigned integer in the range 0..(2**BITS)-1. BITS must be one of 1, 2, 4, 8, 16, or 32. As many elements as possible are packed into each byte, and the ordering is such that vec($vectorstring,0,1) is guaranteed to go into the lowest bit of the first byte of the string. To find out the position of the byte in which an element is going to be put, you have to multiply the OFFSET by the number of elements per byte. When BITS is 1, there are eight elements per byte. When BITS is 2, there are four elements per byte. When BITS is 4, there are two elements (called nybbles) per byte. And so on.

Regardless of whether your machine is big-endian or little-endian, vec($foo, 0, 8) always refers to the first byte of string $foo. See select for examples of bitmaps generated with vec.

Vectors created with vec can also be manipulated with the logical operators |, &, ^, and ~, which will assume a bit vector operation is desired when the operands are strings.

A bit vector (BITS == 1) can be translated to or from a string of 1s and 0s by supplying a b* template to unpack or pack. Similarly, a vector of nybbles (BITS == 4) can be translated with an h* template.
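
Here's a small sketch of both reading and writing a bit vector:

$bits = '';
vec($bits, 3, 1) = 1;                  # turn on element 3
vec($bits, 7, 1) = 1;                  # turn on element 7
print vec($bits, 3, 1), "\n";          # prints 1
print unpack("b*", $bits), "\n";       # prints 00010001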

wait

wait

This function waits for a child process to terminate and returns the pid of the deceased process, or -1 if there are no child processes. The status is returned in $?. If you get zombie child processes, you should be calling this function, or waitpid. A common strategy to avoid such zombies is:

$SIG{CHLD} = sub { wait };

If you expected a child and didn't find it, you probably had a call to system, a close on a pipe, or backticks between the fork and the wait. These constructs also do a wait (2) and may have harvested your child process. Use waitpid to avoid this problem.

waitpid

waitpid PID, FLAGS

This function waits for a particular child process to terminate and returns the pid when the process is dead, or -1 if there is no such child process. If the FLAGS specify non-blocking and the process isn't dead yet, 0 is returned. The status of the dead process is returned in $?. To get valid flag values say this:

use POSIX "sys_wait_h";

On systems that implement neither the waitpid (2) nor wait4 (2) system call, FLAGS may be specified only as 0. In other words, you can wait for a specific PID, but you can't do it in non-blocking mode.
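
As a sketch, here is a non-blocking loop that reaps every child that has already exited; WNOHANG is one of the flag constants supplied by the POSIX module:

use POSIX "sys_wait_h";
while (($kid = waitpid(-1, &WNOHANG)) > 0) {
    print "Reaped child $kid with exit status ", $? >> 8, "\n";
}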

wantarray

wantarray

This function returns true if the context of the currently executing subroutine is looking for a list value. The function returns false if the context is looking for a scalar. Here's a typical usage, demonstrating an "unsuccessful" return:

return wantarray ? () : undef;

See also caller. This function should really have been named "wantlist", but we named it back when list contexts were still called array contexts.
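
As a sketch, here's a hypothetical routine that returns its results as a list in list context and as a count in scalar context:

sub findings {
    my @found = ('alpha', 'beta', 'gamma');
    return wantarray ? @found : scalar @found;
}
@all   = findings();       # list context: three elements
$count = findings();       # scalar context: 3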

warn

warn LIST

This function produces a message on STDERR just like die, but doesn't try to exit or throw an exception. For example:

warn "Debug enabled" if $debug;

If the message supplied is null, the message "Something's wrong" is used. As with die, a message not ending with a newline will have file and line number information automatically appended. The warn operator is unrelated to the -w switch.

write

write FILEHANDLE
write

This function writes a formatted record (possibly multi-line) to the specified filehandle, using the format associated with that filehandle--see the section "Formats" in Chapter 2, The Gory Details. By default the format for a filehandle is the one having the same name as the filehandle. However, the format for a filehandle may be changed by saying:

use FileHandle;
HANDLE->format_name("NEWNAME");

Top-of-form processing is handled automatically: if there is insufficient room on the current page for the formatted record, the page is advanced by writing a form feed, a special top-of-page format is used to format the new page header, and then the record is written. The number of lines remaining on the current page is in variable $-, which can be set to 0 to force a new page on the next write. (You may need to select the filehandle first.) By default the name of the top-of-page format is the name of the filehandle with "_TOP" appended, but the format for a filehandle may be changed by saying:

use FileHandle;
HANDLE->format_top_name("NEWNAME_TOP");

If FILEHANDLE is unspecified, output goes to the current default output filehandle, which starts out as STDOUT but may be changed by the select operator. If the FILEHANDLE is an expression, then the expression is evaluated to determine the actual FILEHANDLE at run-time.

Note that write is not the opposite of read. Use print for simple string output. If you looked up this entry because you wanted to bypass standard I/O, see syswrite.
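
Putting it together, here's a small sketch (the variables and data are made up) that declares a format for STDOUT and writes one record per item:

format STDOUT =
@<<<<<<<<<<<<<<<<<<< @>>>>>>>
$name,               $price
.

foreach $item (['apple', '0.40'], ['banana', '0.25']) {
    ($name, $price) = @$item;
    write;
}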

y///

y///

The translation operator, also known as tr///. See Chapter 2, The Gory Details.


