File Naming Conventions
in the Eos Web


A world apart: No man is an island in Eos

My computer desktop is often cluttered with junk. My boss's desktop is reserved for application shortcuts. Everyone is an expert on themselves, and no two people are alike. The filing system I use works great when I need to find something on my desktop. That "junk" is actually a chronologically sorted list of my most recent work and downloads. When it gets too full, I sort through the files or delete them, and the process starts over again. I also work in Windows, so I don't think twice about spaces in file names, or about whether to name a file "Portfolio.jpg" or "portfolio.jpg". It's my space, and as long as I can use it, why should anyone care?

However, shared web space and personal hard-drive space are two very different things. When it comes to naming files that go on the web, there isn't a lot of room for debate in our office. Our rule of thumb is:

Use only lowercase alpha-numeric characters, plus the period and hyphen. Never use spaces. Never use underscores.
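As a rough illustration, the rule of thumb can be written as a one-line check. This is only a sketch of my reading of the rule, not a tool our servers run; the name is_web_safe is made up for this example.

```python
import re

# Assumed reading of the rule: lowercase letters, digits, period, and hyphen only.
WEB_NAME = re.compile(r"[a-z0-9.-]+")

def is_web_safe(name: str) -> bool:
    """Return True if name uses only lowercase letters, digits, '.', and '-'."""
    return WEB_NAME.fullmatch(name) is not None

print(is_web_safe("my-portfolio.jpg"))  # True
print(is_web_safe("My Portfolio.jpg"))  # False
print(is_web_safe("my_portfolio.jpg"))  # False
```

A check like this could be run over a folder before publishing, but it says nothing about whether a name is short or memorable.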

This isn't just an arbitrary standard. The naming conventions we use are a practical necessity of the web environment we work in.

The Eos Web consists of all the web space in the College of Engineering, just as Eos Computing consists of all of the college's computing*. Every web server ITECS runs uses a uniform server configuration and is part of a very specific computing environment. The specifics of this environment shape the naming conventions, which in turn make it easier to find, link to, and share documents on the web.

The Case for Linux

The number one reason we say files saved in web space should always be all lowercase is that the web servers run on Linux, which means they are case sensitive. If you ask the server for "Portfolio.jpg", it will look for a file that matches that name exactly, including the capital "P". Even if there is a file named "portfolio.jpg", Linux will tell you "Portfolio.jpg" does not exist. This might seem absurd to Windows users. Windows is case insensitive, meaning it considers the lowercase "p" to be the same as "P" where file names are concerned.

What it all boils down to is what is considered important. Unix and Linux were designed with a certain set of objectives in mind, among them being simple, fast, and reliable. Windows, on the other hand, was made to be easy to use, pretty, and popular. Windows does extra work so the user doesn't have to. Linux asserts that there is a difference between "P" and "p". It is possible to have both "portfolio.jpg" and "Portfolio.jpg" in the same folder in Linux. You can't do that on Windows. While having both files probably isn't that important, the speed and efficiency of Linux are. Linux doesn't waste time trying to figure out whether you mean "P" or "p"; it assumes you know what you're doing.
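The difference between the two lookups can be sketched in a few lines. This is a simplified model, not how either operating system is implemented; the function names are invented for the example, and lowercasing is only an approximation of the matching Windows does.

```python
def exact_lookup(requested, names):
    """Case-sensitive match, the way a Linux file system compares names."""
    return requested if requested in names else None

def folded_lookup(requested, names):
    """Case-insensitive match, roughly the way Windows compares names."""
    for name in names:
        if name.lower() == requested.lower():
            return name
    return None

files = ["portfolio.jpg"]
print(exact_lookup("Portfolio.jpg", files))   # None
print(folded_lookup("Portfolio.jpg", files))  # portfolio.jpg
```

The exact lookup has nothing to interpret. The folded lookup has to normalize every name before comparing, which is the extra work described above.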

It may still seem OK to name files with capital letters, as long as it's done consistently. The problem is defining consistency. Does consistency mean we follow the rules of English? Does it mean we always capitalize the first letter? Do other people on the web share our standards?

By and large, most people use all lowercase. The reason is simple: capital letters mean extra work. There is no way around it. Each switch between capital and lowercase means an extra hand motion and an increased chance of user error. I can't count how many times I've hit Caps Lock instead of Shift, or Tab instead of Caps Lock. All lowercase is just easier.

Another problem directly derived from the Linux/Windows wars is a quirk when working with AFS. AFS stands for Andrew File System. It is a file system that all users on campus share. AFS is built on the Unix/Linux file system model, so it is case sensitive. Windows users can access AFS space, but when two files have the same name in different cases, something interesting happens. Windows can only understand the existence of one file with a given name, so it assumes "Portfolio.jpg" and "portfolio.jpg" are the same thing. It's a delusion it has to create to keep itself sane. If a user opens one of these files, Windows will open the correct one, but from that point on (until the AFS cache, its memory of AFS, is flushed) it will always open that same file, no matter which one the user tells it to open. So if the user opens "Portfolio.jpg", saves it, and then opens "portfolio.jpg", it will appear as if they are opening the file they have just saved. To the user it may look like Windows has saved over what was in "portfolio.jpg", but in actuality Windows is showing them "Portfolio.jpg", the file they just saved. To make matters worse, the same thing happens with folders whose names are identical except for case: the entire contents of the folders appear identical.
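One way to spot this trap before a Windows user stumbles into it is to look for names that differ only by case. The sketch below works on a plain list of names so it behaves the same on any operating system; find_case_collisions is a hypothetical helper, not part of AFS or any ITECS tooling.

```python
from collections import defaultdict

def find_case_collisions(names):
    """Group names that become identical once case is ignored."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    # Keep only the groups where more than one spelling exists.
    return [sorted(group) for group in groups.values() if len(group) > 1]

listing = ["Portfolio.jpg", "index.html", "portfolio.jpg"]
print(find_case_collisions(listing))  # [['Portfolio.jpg', 'portfolio.jpg']]
```

On a Linux machine, the list could come from os.listdir() for a folder in web space. The same check applies to folder names.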

Special Characters

All characters are special, but some are more important to a computer than others. Some, like the question mark, can't be used in file names on Windows at all. Others, like the ampersand, have special meanings in URLs and command shells even where the file system accepts them. There are some special characters that Linux, Windows, or both will let you use. Please don't. Almost all special characters in URLs have to be converted by the browser to be understood by the server. This is a process that used to be unreliable. As the web has matured, browsers and servers have gotten better at talking to each other. Still, all those special characters have to be translated, and that creates an extra point of failure in publishing, maintaining, and using web pages. The period and hyphen are the only punctuation you should use in URLs and file names for the web.
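To see what that conversion looks like, Python's standard urllib.parse.quote applies the same percent-encoding a browser uses for the path part of a URL. The file names here are invented for the example, and as_url_path is just a thin wrapper to give the step a name.

```python
from urllib.parse import quote

def as_url_path(name: str) -> str:
    """Percent-encode a file name the way it would appear in a URL path."""
    return quote(name)

print(as_url_path("Q&A?.pdf"))  # Q%26A%3F.pdf
print(as_url_path("qa.pdf"))    # qa.pdf
```

The second name passes through untouched, which is the whole point: nothing to translate means nothing to get wrong.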

Users falling through the cracks

The final file naming pitfalls are spaces and underscores. Spaces actually fall under the realm of special characters. Unix/Linux file systems can deal with file names that have spaces in them, but they really don't like doing it. They weren't designed to. URLs weren't designed to have spaces in them either. The space is a "special character" that has to be translated in the conversation between the server and the browser. The reason not to use spaces is purely technical: the web wasn't designed with spaces in mind.
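That conversation can be sketched as a round trip, with urllib.parse.quote standing in for the browser and unquote standing in for the server. This is an illustration of the encoding only, not of any particular browser or web server.

```python
from urllib.parse import quote, unquote

sent = quote("my portfolio.jpg")  # what ends up in the URL
received = unquote(sent)          # what the server turns it back into

print(sent)      # my%20portfolio.jpg
print(received)  # my portfolio.jpg
```

Every "%20" in a link is one more thing for a person to mistype when copying the address by hand.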

Underscores are another matter. Many people use underscores in URLs to replace spaces, and there is no technical reason not to. The problem with underscores lies with the user, not the computer. Most links on the web are underlined, and in most fonts an underlined underscore looks identical to an underlined space. It's hard or impossible to tell the difference. Even without the problem of underlining, underscores are subliminal. At a subconscious level, many people equate the underscore with the space. Users tend to forget underscores in file names. They remember the space, but they don't remember the low-lying line. Psychologically, underscores are problematic in URLs.
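For files that already carry spaces or underscores, a small renaming helper can bring them in line with our rule of thumb. The sketch below is one possible approach under my own assumptions: to_web_name is a made-up name, and the choice to replace spaces and underscores with hyphens and to drop every other disallowed character is mine.

```python
import re

def to_web_name(name: str) -> str:
    """Convert a file name to lowercase letters, digits, '.', and '-' only."""
    name = name.lower()
    name = re.sub(r"[ _]+", "-", name)       # spaces and underscores become hyphens
    name = re.sub(r"[^a-z0-9.-]", "", name)  # drop anything else outside the rule
    name = re.sub(r"-{2,}", "-", name)       # collapse runs of hyphens
    return name.strip("-")

print(to_web_name("My Portfolio_2024.JPG"))  # my-portfolio-2024.jpg
print(to_web_name("Q&A notes?.pdf"))         # qa-notes.pdf
```

Two different originals can collapse to the same result, so check for duplicates before renaming anything in bulk.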

URLs for the People

In general, short, concise URLs are best. If a file can be named in one word, you don't have any reason to use a space or an underscore. Even if there are two words, if they are the right two words people will remember them, and most users will try a URL without spaces first. Those who don't are learning to. Spaces in URLs are bad, and people rarely try using the underscore between words. Let users be lazy; don't make them use Shift. Less work for them means less work for you.

*Yes, I concede that not all computing in the college is part of Eos. But any time you SSH into remote.eos, use /afs/eos/, or log into a lab, that is Eos. With the arrival of Student Owned Computing, and with proposals beginning for college-supported faculty computers, the realm of Eos is expanding every day.