Discover Perl's File::Find module

ANALYSIS
If you have any experience with UNIX, you've almost certainly needed the find command, which searches for filenames throughout the file system. In particular, you can use wildcards to match filenames and recursively traverse any directory structure (where permissions allow). The UNIX find command can also execute other commands on the files it finds. Perl's File::Find module offers the same functionality and adds the advantage of programmatic structures. To show how this works, I'll walk you through a sample script that employs the File::Find module.

A simple example

This simple Perl script can help you clean up your PC hard drive by finding any files that end with .tmp, .chk, or .zip or that begin with the ~ symbol. (You can see the entire script in Listing A.) The script prints the full path of each file it finds, and a tally of the number of bytes being consumed appears at the end. You can run the script on either Windows or UNIX if Perl has been installed. Note that in a UNIX environment, you must modify the first line of the script: Change the /bin/perl path to match the path to Perl in your environment. For this example, I will assume that you are running it in a Windows environment.

The solution

Of course, Microsoft's GUI Find utility offers a portion of this functionality. But I wrote the script because once you have the file within Perl, you can do all sorts of things with it, such as open it up and look for a particular pattern, automatically delete it, or use it as input for another application.

I make use of one module from the standard Perl library and one built-in Perl function, so everything the script needs should be available as soon as you install Perl on your Windows machine. (I grabbed Perl from ActiveState.)

The find function in File::Find mimics the UNIX find command and will traverse a file tree. Here's the API for the function:

    find(\&yoursubroutine, 'dir1', 'dir2'…);

You provide the subroutine, which I will detail later, and a list of directories where you want the search to be conducted. Remember that these directories will be traversed in a depth-first fashion.

The other function I use is stat() (similar to the C library function of the same name), which gives you all sorts of information about the filename it takes as an argument. Listing B shows the API for the function. Notice that the function returns the values in a list. The only value we're interested in is $size, which contains the size of the given file in bytes.

All of the utility's work is performed in the subroutine. Remember that it will be called each time a file or directory is encountered, so it's our job to determine whether the filename matches the files we're looking for. File::Find makes special variables available to the subroutine, populated as shown here:
  • $_ contains the current filename within the directory
  • $File::Find::dir contains the current directory name
  • $File::Find::name contains $File::Find::dir/$_
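To make these three variables concrete, here is a minimal sketch rather than the article's own listing. It builds a small throwaway directory tree (the file names are invented for illustration) and prints what the callback sees on each call.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a small throwaway tree so the example is self-contained.
# The file and directory names below are hypothetical.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "Cannot create $root/sub: $!";
for my $path ("$root/a.tmp", "$root/sub/b.zip") {
    open my $fh, '>', $path or die "Cannot create $path: $!";
    close $fh;
}

my @seen;    # full path of every entry handed to the callback

# find() calls this once for every file and directory it visits.
sub wanted {
    # $_                : name of the current entry within the current directory
    #                     (the starting directory itself shows up as ".")
    # $File::Find::dir  : directory currently being visited
    # $File::Find::name : full path of the entry
    print "dir=$File::Find::dir  entry=$_  full=$File::Find::name\n";
    push @seen, $File::Find::name;
}

find(\&wanted, $root);
```

Running it should show the callback firing for the directories as well as the files, which is why the real subroutine has to filter on the name itself.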
When the subroutine is called, you will actually be in the directory named by the variable $File::Find::dir. As you can see in Listing C, our subroutine uses regular expression matching on $_ in an if statement to look for all of the filenames we detailed earlier. If the filename stored within $_ (the default pattern-searching space) matches any of the five regular expressions in the if statement, we will enter the block of code below it.

The regular expressions are quite simple. The “\.” indicates a literal dot rather than the special meaning “.” has in the world of regular expressions. We use the “\” character to escape the special meaning. The “$” indicates a match at the end of a string and the “^” matches the beginning of the string. Table A maps the files we are trying to match with their corresponding regular expressions.
Table A

File to match                Regular expression
File that ends with .zip     /\.zip$/
File that ends with .tmp     /\.tmp$/
File that ends with .TMP     /\.TMP$/
File that begins with ~      /^~/
File that ends with .chk     /\.chk$/
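The patterns in Table A can be tried out on their own, without touching the file system. The sketch below is illustrative only: the helper name is_cleanup_candidate and the sample filenames are made up, and it assumes the .chk pattern is meant to be anchored with $ like the other suffix patterns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns 1 when a bare filename (no directory part) matches one of the
# five patterns from Table A, and 0 otherwise.
sub is_cleanup_candidate {
    my ($name) = @_;
    return (   $name =~ /\.zip$/     # ends with .zip
            || $name =~ /\.tmp$/     # ends with .tmp
            || $name =~ /\.TMP$/     # ends with .TMP
            || $name =~ /^~/         # begins with ~
            || $name =~ /\.chk$/     # ends with .chk
           ) ? 1 : 0;
}

# Hypothetical sample names, chosen to exercise each pattern.
for my $name ('archive.zip', 'scratch.TMP', '~lock.doc',
              'disk0001.chk', 'report.txt', 'zip.notes') {
    printf "%-14s %s\n", $name, is_cleanup_candidate($name) ? 'match' : 'skip';
}
```

A single case-insensitive pattern, /\.tmp$/i, would cover both the .tmp and .TMP rows.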
Note that the script looks for both the lowercase tmp and the uppercase TMP. To simplify the test, you can uppercase the filename and check for only the TMP match.

Finally, the script employs the stat() function to tally the number of bytes being used by all of the files that match one of the conditions in the if statement. If the condition is met, the script stores the value within $size and adds it to the $ByteCount tally variable, as shown in the following code snippet:

    $ByteCount += $size;

To run the script on your machine, type the following command in a DOS command prompt window:

    perl diskrpt.pl

This assumes that you have modified your PATH variable to include the Perl executable and that you have saved the utility in a file called diskrpt.pl. The output of the command will appear in the DOS command prompt window.

In future articles, I will walk through modifications of the script, such as deleting certain file types or feeding the results to another application.
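As a recap, the byte tally described above might be sketched as follows. This is not Listing A. It is a self-contained approximation that creates its own temporary files (with invented names and sizes), folds the two tmp patterns into one case-insensitive match, and, as an added assumption, counts only plain files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Throwaway directory with files of known size (names are hypothetical).
my $root  = tempdir(CLEANUP => 1);
my %files = ('old.tmp' => 'x' x 10, '~draft.doc' => 'y' x 5, 'keep.txt' => 'z' x 100);
for my $name (keys %files) {
    open my $fh, '>', "$root/$name" or die "Cannot create $root/$name: $!";
    print {$fh} $files{$name};
    close $fh or die "Cannot close $root/$name: $!";
}

my $ByteCount = 0;    # running total of bytes used by matching files

sub wanted {
    # Skip anything whose name does not match one of the patterns.
    return unless /\.zip$/ || /\.tmp$/i || /^~/ || /\.chk$/;
    return unless -f $_;             # assumption: only plain files are counted

    # stat() returns a 13-element list; element 7 is the size in bytes.
    my $size = (stat($_))[7];
    return unless defined $size;     # stat failed, so there is nothing to add

    print "$File::Find::name\n";
    $ByteCount += $size;
}

find(\&wanted, $root);
print "Total bytes: $ByteCount\n";
```

Because find() changes into each directory before calling the subroutine, stat($_) can use the bare filename.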