Spaces within

“Unix doesn’t works properly with filenames with spaces”. This assertion from a coworker of mine prompted my harsh reply: “Unix works perfectly with spaces, it’s just programmer sloppiness that prevents it from properly handling blanks”. I was true (as always, I would humbly add) but my pal wasn’t completely wrong, at least when talking about the specific subpart of Unix named shell. Although far superior than MS-DOS batch shell (that’s about the same command line you find in Windows) I bet it originated in more or less the same way – a core with functionalities clustered over during time in response of new needs or new opportunities.
Unix shell (be it bash, ksh, zsh or fish) is nowadays a powerful programming tool allowing the programmer to craft rather complex artifacts. This scope is for sure much broader than the one envisioned by first developers, and this turns out in multiple ways to do the same thing, different ways to do similar things and cryptic ways to do simple things.
The Unix command line conception dates nearly 40 years back in time! Things were pretty different, but I won’t annoy you with details, just leave your imagination wild… likely it was even worse. Fish is a recent attempt to overcome most of the shell problems, but it is not widespread as bash could be. As put by a professional some time ago: emacs may have tons of neat features, but you are SURE you’ll always find vi on any Unix, while you are not certain you’ll have emacs. So better invest your learning time on vi.
Well, back to shell. What’s wrong with blanks? The main problem is that a space is a valid character in a file name and, at the same time, it is a separator for command line arguments. Back to when every single byte could make the difference, it seemed the right thing to do to have optional quotes if the filename doesn’t contain any space. So you can write:

$ ls foo/

To get the listing of directory foo, but you have to write:

$ ls "bar baz/"

if you want the listing of directory “bar baz” (or you could escape the space with a backslash). This could be boring on interactive shells, but is usually overcome by the auto-completion feature (type ‘ba’ then tab and the line gets completed with the available options, in this case: bar baz).
From boring it turns in the range from annoying to irritating in shell scripts, where variables are not real variables like those you are used in high level languages, but just convoluted macros. For example:

a="bar baz"
ls $a

is processed and interpreted as:

ls bar baz

As you see the quotes disappears because they are processed by the assignment to put bar+space+baz in the ‘a’ variable. Once ‘$a’ is expanded, quotes are just forgotten memories. In order to write proper shell scripts you have to do something like:

a="bar baz"
ls "$a"

Of course this is error prone, because not only the syntax is valid, not only the script is likely to work perfectly in the simple test case the programmer uses to test the script, but also it is likely to work fine most of the times. After all the space character is used only by those Windows naïve users that aren’t aware of the blanks-hating-device they keep hidden under their desks.
Well, I proud myself of writing space-safe shell scripts, at least until I tried to write a script to find duplicated files on a filesystem.
The goal is simple, after many virtual relocations, multiple pet-projects and home-works, I have many files scattered around with the same content. It is not a matter of space saving, rather it is a question of order. Avoid redundant information or make sure that it is really the same stuff.
My design was to have a command similar to ‘find’, something that accepts any number of directories or files on the command line, such as:

$ find_dupes dir1/ dir2/ file …

Shell has two ways for operating this pattern – use the shift command or use one of the special variables $@ and $*.
The first way is useful if you want the shell to process one argument at time, while the latter is handy when you want to relay the command line to a command. In my case I wanted to pass the entire command line to the ‘find’ command, something like:

$ find $@ -type f -exec md5sum {} ;

This line works fine until a filename with space is encountered. In this case, since variables are indeed macros, a single argument with space is expanded into two (or more) distinct arguments. And there is no way to work around the limitation, unless you read the manual J. In this area, the discoverability of bash is quite lacking. The man page states that $* expands in the sequence of arguments separated by the first character of the IFS environment variable. E.g. if IFS is set to dash (‘-‘) and the command line has the following arguments foo bar baz, then $* expands to foo-bar-baz.
Conversely $@ expands to a space separated sequence of arguments, but if you enclose it in quotes, then single arguments are expanded in quotes. E.g. $@ expands to foo bar baz, and “$@” expands to “foo” “bar” “baz”. Eventually this is the solution.
So, basically it is true that Unix has no problem whatsoever with spaces inside filenames, it is also true that shell programming can handle them as well and ultimately is up to programmer sloppiness if the batch script fails, but it has to be recognized that a great effort and investment is required to the programmer to climb out his sloppiness.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.