File Upload and Canonical Issues

Never trust user input. Incoming data can be the source of many evils, and a security flaw can sit there just waiting for the right moment and the right person to break your application.
After finishing the upload control I finally integrated it with the website. Now users can select files and send them to the website to be processed.
What are the security risks here? Something that can be called a 'canonicalization issue'.
For a start, all data can be seen in its canonical form. A canonical form is the simplest and most standard form in which a piece of data can be represented, thus canonicalization is the process of converting data to its canonical form.
Proficient JavaScript programmers are very aware of what I am talking about. As a matter of fact, in our system the user can search for a name using wildcards. You could ask a user: "Retrieve a list of all the instances whose canonical form includes Bill as a mandatory prefix." The user will probably say: "Retrieve what???" But if you ask: "Give me a list of all the users whose names start with Bill", they will type 'bill*' into the system. The user normally does not know it, but what he is doing is performing a type of canonical query.
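To illustrate the wildcard search, here is a minimal Python sketch. Python is used purely for illustration, and the function name search_names and the sample user list are made up for this example; this is not the code of our system. Both the stored names and the pattern are brought to one canonical form (trimmed and case-folded) before they are compared.

```python
from fnmatch import fnmatchcase

def search_names(names, pattern):
    """Return the names matching a shell-style wildcard pattern, ignoring case."""
    # Canonicalize the pattern once: trim whitespace and fold case.
    canonical_pattern = pattern.strip().casefold()
    # Canonicalize each name the same way before comparing.
    return [n for n in names
            if fnmatchcase(n.strip().casefold(), canonical_pattern)]

# Hypothetical data, only for the example.
users = ["Bill Smith", "BILLY JONES", "  bill  ", "Sybil Hall"]
matches = search_names(users, "bill*")
```

Because both sides are canonicalized first, 'bill*' finds "Bill Smith", "BILLY JONES" and "  bill  ", but not "Sybil Hall".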
Now, back to our file upload issue. A file name is a very common example of data with many representations. You can refer to the same file as:
  • thairecipes.doc
  • c:\recipes\thairecipes.doc
  • c:\\recipes\\thairecipes.doc
  • c:\   recipes\thairecipes.doc
  • c%3A%5Crecipes%5Cthairecipes.doc
As you probably figured, the last one is the issue. %3A and %5C are the URL-encoded forms of ':' and '\', so as soon as the name is decoded somewhere along the way it becomes the very same Windows path as the others.
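To make this concrete, here is a small Python sketch showing how differently spelled names collapse to one canonical form. The helper canonical_windows_path is a name I made up for the example, and it assumes Windows path rules (via the standard ntpath module) and a single layer of percent-encoding.

```python
import ntpath
from urllib.parse import unquote

def canonical_windows_path(raw):
    """Percent-decode the name, then normalize it as a Windows path."""
    decoded = unquote(raw)  # turns %3A into ':' and %5C into '\'
    # normpath collapses repeated separators; normcase lowercases the
    # result and converts '/' to '\'.
    return ntpath.normcase(ntpath.normpath(decoded))

plain = canonical_windows_path("c:\\recipes\\thairecipes.doc")
doubled = canonical_windows_path("c:\\\\recipes\\\\thairecipes.doc")
encoded = canonical_windows_path("c%3A%5Crecipes%5Cthairecipes.doc")
```

All three variables end up holding the same string, which is exactly why a filter that only looks at the raw text for ':' or '\' misses the encoded variant.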
You see, because we give the user the option to save a file in our system under just about any name he wants, we are at the same time opening a door to a sort of canonical attack. Remember: never trust the user. And by user I am not only talking about a person. In our context a user is any entity that uses a given resource or service, and for that matter a user can just as well be another system or another application.
A hacker would think: "How can I break into this site? Does it allow easy access to any of its resources?" In our case, yes: our website must allow the user to upload files.
What to do now? How to handle a file upload to a web server?
Well, first, as a general rule you must not design a website that accepts just about any file name created by the user and saves the file under it. As a matter of fact, any input must be validated and, if possible, sanitized, not only on the client side but on the server side as well.
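A server-side check along these lines could look like the following Python sketch. It is only an illustration under my own assumptions: the names fully_decode and is_acceptable_filename, the limit of five decoding rounds, and the allowed character set are all hypothetical choices, not a standard. The idea is to decode first, until the value stops changing, and only then validate against a strict allowlist.

```python
import re
from urllib.parse import unquote

# Assumed policy: letters, digits, space, dot, underscore, hyphen; max 100 chars.
_ALLOWED = re.compile(r"[A-Za-z0-9][A-Za-z0-9 ._-]{0,99}")

def fully_decode(raw, max_rounds=5):
    """Percent-decode until the value stops changing (catches double encoding)."""
    current = raw
    for _ in range(max_rounds):
        decoded = unquote(current)
        if decoded == current:
            return current
        current = decoded
    raise ValueError("too many layers of encoding")

def is_acceptable_filename(raw):
    """Validate the canonical (decoded) name, never the raw text."""
    try:
        name = fully_decode(raw)
    except ValueError:
        return False
    # The allowlist already excludes ':', '\' and '/'; '..' is rejected too.
    return _ALLOWED.fullmatch(name) is not None and ".." not in name
```

With this policy "thairecipes.doc" is accepted, while "c%3A%5Crecipes%5Cthairecipes.doc" and a double-encoded name such as "%252e%252e%252fetc%252fpasswd" are rejected.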
A better design: do not let the user choose the filename under which the file is saved on the web server. Accept the file, keep the original filename somewhere, and let the application save the file under a name it generates itself. I would suggest using a GUID string for that. That way you not only close the door on a possible canonical attack, you also deny a malicious user the chance to guess the filenames you might have on your server. For example, if a hacker knows that there is a file called http://mywebsite/mydocs/clientid1/file1.doc, he will try something like http://mywebsite/mydocs/clientid1/file2.doc, then http://mywebsite/mydocs/clientid1/file3.doc, and so on. By generating names with an internal rule you minimize his attack surface.
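The renaming idea can be sketched in a few lines of Python. Everything here is an assumption made for the example: the function name store_upload, the use of a plain dict as a stand-in for wherever you keep the original filename (in practice probably a database table), and the temporary directory used as the upload folder.

```python
import tempfile
import uuid
from pathlib import Path

def store_upload(data, original_name, upload_dir, index):
    """Save data under a random GUID name and remember the original name."""
    stored_name = uuid.uuid4().hex  # 32 hex characters, nothing user-controlled
    upload_dir.mkdir(parents=True, exist_ok=True)
    (upload_dir / stored_name).write_bytes(data)
    # The user's name is kept only as data; it never touches the file system.
    index[stored_name] = original_name
    return stored_name

# Usage with a throwaway directory and a hostile-looking original name.
with tempfile.TemporaryDirectory() as tmp:
    index = {}
    stored = store_upload(b"pad thai", "c%3A%5Crecipes%5Cthairecipes.doc",
                          Path(tmp), index)
    saved_ok = (Path(tmp) / stored).read_bytes() == b"pad thai"
```

Whatever the user typed as a filename ends up only in the index, so the encoded path never gets a chance to be interpreted, and the stored names cannot be guessed by counting from file1.doc upwards.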
Another thing to observe: you don't have to fight and defeat every malicious user. There may be hundreds of hackers trying to break your code and you are just one guy against them (and you don't want to have sleepless nights during weekends, do you?). They will always find a way to break your code. The best option is to minimize their attack surface. Chances are they will move on and concentrate their efforts on breaking a "weaker website" if your site is strong enough to survive the first rounds of attack.

These are some first considerations; additionally, I would suggest taking a look at implementing File I/O guidelines as well. At the end of the day, it all depends on how secure you want to be, how much time you have available to implement it, and how rigid the given specifications are.
See you later.