Piping (aka Pipeline)

It’s not about plumbing: a pipeline is a construction in the Linux shell that passes the output of one command straight to the input of another, chaining commands together. Anyone who has ever administered a computer system knows how useful this can be! But you don’t have to be an administrator to use and appreciate it; it can speed up all sorts of everyday work.
The real power of the shell is that you get lots of small programs, each doing one small thing. You can connect them to build a bigger program, which in turn can become part of an even bigger script. Who said that administration can’t be creative?
But this time let’s stick to the basics.

A pipeline is created by putting the “|” character between commands.
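
Before moving on to real files, here is a minimal sketch of the idea using three standard tools. The fruit names are made-up sample data, and the variable name first is only there so the result can be inspected.

```shell
# printf emits one word per line, sort orders the lines alphabetically,
# and head -n 1 keeps only the first line of the sorted output.
first=$(printf '%s\n' banana apple cherry | sort | head -n 1)
echo "$first"
```

Each program only knows about its own input and output; the shell is what connects them.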

Example:

find ~/ -iname '*.jpg' -type f -atime +30

This command finds regular files (and only files) under the user’s home directory whose names end in .jpg, in any letter case, and that were last accessed more than 30 days ago.
Now, let’s say that you want to delete these files. Doing it by hand would take ages; imagine there were around 500 files to delete, scattered across various directories! This is where the pipeline helps: just add the rm command and send the output of find to it.

find ~/ -iname '*.jpg' -type f -atime +30 | xargs rm

As you can see, we’ve added the “|” character followed by a new command. The output of find, a list of file names, now flows into xargs, which runs rm with those names as arguments, so each file is removed. The same result could also be achieved in other ways.
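
One weakness of the plain find | xargs rm form is that xargs splits its input on whitespace, so a name like "old photo.jpg" would be read as two files. The sketch below shows one common way around that, assuming your find supports -print0 and your xargs supports -0 (both are extensions, available in the GNU and BSD tools). It works in a throwaway directory from mktemp, and the file names are invented for the demo.

```shell
# Throwaway sandbox so nothing in the real home directory is touched.
sandbox=$(mktemp -d)

# Two sample files: one stamped with a time in the year 2000
# (touch -t sets both access and modification time), one brand new.
touch -t 200001010000 "$sandbox/old photo.jpg"
touch "$sandbox/new.jpg"

# -print0 ends each name with a NUL byte and xargs -0 splits on NUL,
# so the space in "old photo.jpg" does not break the name in two.
find "$sandbox" -iname '*.jpg' -type f -atime +30 -print0 | xargs -0 rm

# Record what is left, then clean up the sandbox.
remaining=$(ls "$sandbox")
echo "$remaining"
rm -r "$sandbox"
```

Only the recently touched file should survive. Running the find part on its own first, without the pipe, is a cheap way to check the list before anything is deleted.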

The example below shows another way of using the output of find. This one compresses every folder in the home directory, no doubt useful when there are a hundred of them. It doesn’t use the typical pipe construction; everything is controlled by find, which takes the tar command as the argument of -exec and runs it once for every folder found, replacing {} with the name of the folder.

find ~/* -maxdepth 0 -type d -exec tar -zcvf '{}-ARCH.tgz' '{}' \;

The possibilities are limited only by your imagination. The command run for each item could just as well archive files or send a report to a user by email.
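
To try the -exec approach without touching real folders, here is a small sketch that runs in a temporary directory. The folder and file names are made up. It wraps tar in sh -c because POSIX leaves it up to each find implementation whether {} is replaced when it is only part of a longer argument; passing the path as "$1" sidesteps that. The -mindepth and -maxdepth options are assumed to be available, as they are in GNU and BSD find.

```shell
sandbox=$(mktemp -d)
mkdir "$sandbox/docs" "$sandbox/music"
touch "$sandbox/docs/notes.txt" "$sandbox/music/song.txt"

# For every directory directly inside the sandbox, find starts a small
# sh script; the path that find found arrives in the script as "$1".
# tar -C changes into the parent first, so the archive stores the short
# folder name rather than the full temporary path.
find "$sandbox" -mindepth 1 -maxdepth 1 -type d -exec sh -c '
    tar -czf "$1-ARCH.tgz" -C "$(dirname "$1")" "$(basename "$1")"
' sh {} \;

# Record the archive names and the contents of one archive, then clean up.
archives=$(cd "$sandbox" && ls *.tgz)
listing=$(tar -tzf "$sandbox/docs-ARCH.tgz")
echo "$archives"
rm -r "$sandbox"
```

The trailing \; means tar is started once per folder, which is what gives each folder its own archive.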

Here are some more examples:

grep -l KEYWORD * | xargs rm

This removes every file in the current directory that contains KEYWORD, which is helpful when you want to get rid of all undelivered or removed emails.
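
Since rm is not forgiving, it can be worth running the grep half alone first and adding xargs rm only once the list looks right. A rough sketch of that two-step habit, using invented mail files in a temporary directory:

```shell
start_dir=$PWD
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1

# Sample mailbox files; the contents are made up for this demo.
printf 'Subject: hello\n' > mail1
printf 'Status: KEYWORD\n' > mail2
printf 'Subject: KEYWORD again\n' > mail3

# Step 1: only list the matching files, delete nothing yet.
matches=$(grep -l KEYWORD -- *)
echo "$matches"

# Step 2: the same command again, now piped into xargs rm.
grep -l KEYWORD -- * | xargs rm

# Record what is left, then leave and remove the sandbox.
left=$(ls)
cd "$start_dir" || exit 1
rm -r "$sandbox"
```

The -- tells grep that everything after it is a file name, which protects against files whose names start with a dash.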

ps aux | grep guest | grep -v grep | awk '{print $2}'

This chain displays the PIDs of all running processes whose ps line mentions guest, which normally means the processes of the guest user. We could now just add | xargs kill to terminate them.
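
Because live ps output differs from machine to machine, the sketch below feeds the same chain a made-up sample in the ps aux column layout, so each stage can be followed by eye. The users, PIDs and commands in the sample are invented.

```shell
# Made-up sample in the column layout of "ps aux"; a here-document is
# used instead of live output so the result is reproducible.
sample=$(cat <<'EOF'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1  16000  1200 ?        Ss   09:00   0:01 /sbin/init
guest     2101  0.0  0.3  22000  3100 pts/0    Ss   09:05   0:00 -bash
guest     2150  1.2  1.0  98000 10400 pts/0    S+   09:06   0:12 vim notes.txt
admin     2200  0.0  0.1  11000   900 pts/1    S+   09:07   0:00 grep guest
EOF
)

# Keep lines mentioning guest, drop the grep process itself,
# then print the second whitespace-separated column (the PID).
pids=$(printf '%s\n' "$sample" | grep guest | grep -v grep | awk '{print $2}')
echo "$pids"

# The same selection in a single awk call, matching the USER column
# exactly so a command line that merely contains "guest" is skipped.
pids_exact=$(printf '%s\n' "$sample" | awk '$1 == "guest" {print $2}')
```

The second form is a design choice rather than a correction: it is stricter about what counts as a match, which matters more once kill is attached to the end.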

And finally, to demonstrate some more power and complexity, I’d like to share an example that I found on the internet (wikipedia.org):

curl "http://en.wikipedia.org/wiki/Pipeline_(Unix)" | \
sed 's/[^a-zA-Z ]/ /g' | \
tr 'A-Z ' 'a-z\n' | \
grep '[a-z]' | \
sort -u | \
comm -23 - /usr/share/dict/words

Here is what it does:
curl    reads the HTML of a web page,
sed    replaces every character that is neither a letter nor a space with a space,
tr    converts everything to lower case and turns spaces into new lines, so each word ends up on its own line,
grep    keeps only the lines that contain at least one lower-case letter, which removes the empty lines,
sort    sorts the words alphabetically and removes duplicates (-u),
comm    compares two sorted files line by line; -23 suppresses the lines found only in the second file and the lines common to both, leaving only the lines unique to the first file. The “-” tells it to read the first file from the pipeline.

A backslash (“\”) at the end of a line lets you spread one command over several lines as if it were a single line. The words left at the end are those missing from the dictionary file, so this way you can check the spelling of a whole web page straight from the command line.
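
The same chain can be tried offline. The sketch below swaps curl for a fixed sentence and /usr/share/dict/words for a tiny made-up dictionary, both assumptions for the sake of a reproducible demo; the sed, tr, grep, sort and comm stages are unchanged. LC_ALL=C is set so that sort and comm agree on the ordering.

```shell
# Use one fixed collation so sort and comm order lines the same way.
export LC_ALL=C

workdir=$(mktemp -d)

# Hypothetical six-word dictionary standing in for /usr/share/dict/words.
printf '%s\n' a is pipeline shell the tool | sort > "$workdir/words"

# Sample text with one deliberate misspelling.
text='The shell pipeline is a powerfull tool!'

misspelled=$(printf '%s\n' "$text" | \
sed 's/[^a-zA-Z ]/ /g' | \
tr 'A-Z ' 'a-z\n' | \
grep '[a-z]' | \
sort -u | \
comm -23 - "$workdir/words")

echo "$misspelled"
rm -r "$workdir"
```

With this input the only word not found in the dictionary is the misspelled one. A real word list is much larger, but proper nouns and markup leftovers from a web page will still show up as false alarms.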

The number of available commands, and the fact that each one has its own syntax, makes it a bit difficult in the beginning, but it pays off later.
