user1150103 - 1 year ago 59
HTML Question

# Remember and Repopulate File Input

I have a website that allows the user to upload a file multiple times for processing. At the moment I have a single file input, but I want to be able to remember the user's choice and show it on the screen.

What I want to know is how, after a user selects a file, I can remember their choice and redisplay the file input with the file pre-selected when the page reloads. All I need to know is how to remember and repopulate a file input.

I am also open to approaches that don't use a file input (if that is possible).

I am using jQuery.

Ok, you want to "Remember and Repopulate File Input", "remember their choice and redisplay the file input with the file pre-selected on reload of the page"..
And in the comment to my previous answer you state that you're not really open to alternatives: "Sorry but no Flash and Applets, just javscript and/or file input, possibly drag and drop."

I noticed while browsing (quite some) duplicate questions (1, 2, 3, etc.), that virtually all other answers are along the lines of: "No you can't, that would be a security-issue", optionally followed by a simple conceptual or code example outlining the security-risk.

However, someone stubborn as a mule (not necessarily a bad thing up to a certain level) might perceive those answers as: "No, because I said so", which is indeed something different than: "No, and here are the specs that disallow it".
So this is my third and last attempt to answer your question (I guided you to the watering-hole, I led you to the river, now I'm pushing you to the source, but I can't make you drink).

Edit 3:

What you want to do was actually once described/'suggested' in RFC1867 Section 3.4:

The VALUE attribute might be used with <INPUT TYPE=file> tags for a default file name. This use is probably platform dependent. It might be useful, however, in sequences of more than one transaction, e.g., to avoid having the user prompted for the same file name over and over again.

And indeed, the HTML 4.01 spec section 17.4.1 specifies that:

User agents may use the value of the value attribute as the initial file name.

(By 'User agents' they mean 'browsers').

Given the facts that javascript can both modify and submit a form (including a file-input) and one could use css to hide forms/form-elements (like the file-input), the above statements alone would make it possible to silently upload files from a user's computer without his intention/knowledge.
It is clearly extremely important that this is not possible, and as such, the same RFC1867 states in section 8, Security Considerations:

It is important that a user agent not send any file that the user has not explicitly asked to be sent. Thus, HTML interpreting agents are expected to confirm any default file names that might be suggested with <INPUT TYPE=file VALUE="yyyy">.

However, the only browser (I'm aware of) that ever implemented this feature was (some older versions of) Opera: it accepted an <input type="file" value="C:\foo\bar.txt"> or a value set by javascript (elm_input_file.value='c:\\foo\\bar.txt';).
When this file-box was unchanged upon form-submit, Opera would pop up a security window informing the user of what file(s) were about to be uploaded to what location (url/webserver).

Now one might argue that all other browsers were in violation of the spec, but that would be wrong: the spec said user agents "may" (it did not say "must") "use the value of the value attribute as the initial file name".
And if the browser doesn't accept setting the file-input value (that is, it treats that value as read-only), then it also doesn't need to pop up such a 'scary' and 'difficult' security dialog (which might not even serve its purpose if the user didn't understand it and/or was 'conditioned' to always click 'OK').

Let's fast-forward to HTML 5 then..
Here all this ambiguity is cleared up (yet it still takes some puzzling):

Under 4.10.7.1.18 File Upload state we can read in the bookkeeping details:

• The value IDL attribute is in mode filename.
...
• The element's value attribute must be omitted.

So, a file-input's value attribute must be omitted, yet it also operates in some kind of 'mode' called 'filename' which is described in 4.10.7.4 Common input element APIs:

The value IDL attribute allows scripts to manipulate the value of an input element. The attribute is in one of the following modes, which define its behavior:

skipping to this 'mode filename':

On getting, it must return the string "C:\fakepath\" followed by the filename of the first file in the list of selected files, if any, or the empty string if the list is empty. On setting, if the new value is the empty string, it must empty the list of selected files; otherwise, it must throw an InvalidStateError exception.

Let me repeat that: "it must throw an InvalidStateError exception" if one tries to set a file-input value to a string that is not empty !!! (But one can clear the input-field by setting its value to an empty string.)

Thus, currently and in the foreseeable HTML5 future (and in the past, except Opera), only the user can populate a file-input (via the browser- or OS-supplied 'file-chooser'). One cannot (re-)populate the file-input with a file/directory from javascript or by setting the default value.
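The setter rule quoted above can be probed from script. What follows is only a minimal sketch: `tryPrefill` and `makeFakeFileInput` are hypothetical names, and the fake input is a stand-in that mimics the quoted 'mode filename' behaviour so the snippet also runs outside a browser. In a page you would pass a real `<input type="file">` element to `tryPrefill` instead.

```javascript
// Hypothetical helper: try to set a file input's value and report what happened.
function tryPrefill(input, value) {
  try {
    input.value = value;
    return { accepted: true, error: null };
  } catch (err) {
    return { accepted: false, error: err.name };
  }
}

// Stand-in (an assumption, not a real DOM element) that follows the quoted
// 'mode filename' rules: fakepath on getting, empty string only on setting.
function makeFakeFileInput(selectedNames) {
  var names = selectedNames.slice();
  return {
    get value() {
      return names.length ? 'C:\\fakepath\\' + names[0] : '';
    },
    set value(v) {
      if (v === '') { names = []; return; }
      var err = new Error('value may only be set to the empty string');
      err.name = 'InvalidStateError';
      throw err;
    }
  };
}

var input = makeFakeFileInput(['report.pdf']);
var prefill = tryPrefill(input, 'C:\\foo\\bar.txt'); // rejected
var before = input.value;                            // selection is untouched
var cleared = tryPrefill(input, '');                 // clearing is allowed
```

So the only thing script can do to the value is clear it; everything else has to come from the user.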

## Getting the filename/file-path

Now, suppose it was not impossible to (re-)populate a file-input with a default value, then obviously you'd need the full path: directory + filename(+ extension).

In the past, some browsers like (most notably) IE6 (up to IE8) did reveal the full path+filename as the value: a simple alert( elm_input_file.value ); etc. in javascript was enough, AND the browser also sent this full path+filename(+ extension) to the receiving server on form-submit.
Note: some browsers also have a 'file or fileName' attribute (usually sent to the server) but obviously this would not include a path..

That is a realistic security/privacy risk: a malicious website (owner/exploiter) could obtain the path to a user's home directory (where personal stuff, accounts, cookies, the user portion of the registry, history, favorites, desktop etc. are located in known constant locations) when the typical non-tech Windows user uploads his files from: C:\Documents and Settings\[UserName]\My Documents\My Pictures\kinky_stuff\image.ext.
I did not even talk about the risks while transmitting the data (even 'encrypted' via https) or 'safe' storage of this data!

As such, more and more alternative browsers started to follow one of the oldest proven security measures: share information on a need-to-know basis.
And the vast majority of websites do not need to know the file path, so these browsers only revealed the filename(+ extension).

By the time IE8 was released, MS decided to follow the competition and added a URLAction option, called “Include local directory path when uploading files”, which was set to 'disabled' for the general internet-zone (and 'enabled' in the trusted zone) by default.

This change created a small havoc (mostly in 'optimized for IE' environments) where all kinds of both custom code and proprietary 'controls' couldn't get the filename of files that were uploaded: they were hard-coded to expect a string containing a full path and extract the part after the last backslash (or forward slash if you were lucky...). 1, 2

Along came HTML5,
and as you have read above, the 'mode filename' specifies:

On getting, it must return the string "C:\fakepath\" followed by the filename of the first file in the list of selected files, if any, or the empty string if the list is empty.

and they note that

This "fakepath" requirement is a sad accident of history

and

For historical reasons, the value IDL attribute prefixes the filename with the string "C:\fakepath\". Some legacy user agents actually included the full path (which was a security vulnerability). As a result of this, obtaining the filename from the value IDL attribute in a backwards-compatible way is non-trivial. The following function extracts the filename in a suitably compatible manner:

    function extractFilename(path) {
      if (path.substr(0, 12) == "C:\\fakepath\\")
        return path.substr(12); // modern browser
      var x;
      x = path.lastIndexOf('/');
      if (x >= 0) // Unix-based path
        return path.substr(x+1);
      x = path.lastIndexOf('\\');
      if (x >= 0) // Windows-based path
        return path.substr(x+1);
      return path; // just the filename
    }


Note: I think this function is stupid: the whole point is to always have a fake windows-path to parse.. So the first 'if' is not only useless but even invites a bug: imagine a user with an older browser who uploads a file from: C:\fakepath\Some folder\file.ext (it would return: Some folder\file.ext)...
I would simply use:

    function extractFilename(s){
      // returns the trailing part of the string that contains no
      //   back/forward slash, or an empty string on error,
      //   so one can check if return_value===''
      return (typeof s==='string' && (s=s.match(/[^\\\/]+$/)) && s[0]) || '';
    }


(as the HTML5 spec clearly intended).
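To see where the two approaches differ, here is a small self-contained comparison. The names `specExtract` and `simpleExtract` and all sample paths are made up for this example; the bodies are the two functions discussed above.

```javascript
// The spec's function (renamed for the comparison).
function specExtract(path) {
  if (path.substr(0, 12) == "C:\\fakepath\\")
    return path.substr(12); // modern browser
  var x;
  x = path.lastIndexOf('/');
  if (x >= 0) return path.substr(x + 1); // Unix-based path
  x = path.lastIndexOf('\\');
  if (x >= 0) return path.substr(x + 1); // Windows-based path
  return path; // just the filename
}

// The one-liner: take everything after the last back/forward slash.
function simpleExtract(s) {
  return (typeof s === 'string' && (s = s.match(/[^\\\/]+$/)) && s[0]) || '';
}

// Hypothetical sample values a file input might report.
var samples = [
  'C:\\fakepath\\file.ext',              // HTML5 browser
  'C:\\Users\\me\\file.ext',             // legacy browser, Windows
  '/home/me/file.ext',                   // legacy browser, Unix
  'file.ext',                            // filename only
  'C:\\fakepath\\Some folder\\file.ext'  // legacy browser, real folder named "fakepath"
];

var results = samples.map(function (p) {
  return { spec: specExtract(p), simple: simpleExtract(p) };
});
```

For the first four samples both functions give `file.ext`; only the last one differs, where the spec's version keeps `Some folder\file.ext`.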

Let's recap (getting the path/file name):

• older browsers (and newer browsers where one could enable this as an option like IE>=8) will reveal a full windows/unix path
• less older browsers will not reveal any path, just a filename(+extension)
• current/future/HTML5-compliant browsers will always prepend the string C:\fakepath\ to the filename when getting the file-input's value.
On top of that, they will only return the first filename (from the 'list of selected files') should the file-input accept multiple files and the user has selected multiple files.

Thus, in the recent past, currently and in the foreseeable HTML5 future one will usually only get the file-name.

That brings us to the last thing we need to examine, this 'list of selected files' (multiple files), which leads us to the third part of the puzzle:

## (HTML5) File API

First of all: the 'File API' should not be confused with the 'File System API', here is the abstract of the File System API:

This specification defines an API to navigate file system hierarchies, and defines a means by which a user agent may expose sandboxed sections of a user's local filesystem to web applications. It builds on [FILE-WRITER-ED], which in turn built on [FILE-API-ED], each adding a different kind of functionality.

The 'sandboxed sections of a user's local filesystem' already clearly indicates that one can't use this to get hold of user files outside of the sandbox. So it is not relevant to the question, although one could copy the user-selected file to the persistent local storage and re-upload that copy using AJAX etc. That is useful as a 'retry' on a failed upload, but it wouldn't be a pointer to the original file, which might have changed in the meantime.
Even more important is the fact that only WebKit (think older versions of Chrome) implemented this feature, and the spec is most probably not going to survive: it is no longer actively maintained and has been abandoned for the moment because it didn't get any significant traction.

Let's continue with the 'File API';
its abstract tells us:

This specification provides an API for representing file objects in web applications, as well as programmatically selecting them and accessing their data. This includes:

• A FileList interface, which represents an array of individually selected files from the underlying system. The user interface for selection can be invoked via <input type="file">, i.e. when the input element is in the File Upload state [HTML] .
• A Blob interface, which represents immutable raw binary data, and allows access to ranges of bytes within the Blob object as a separate Blob.
• A File interface, which includes readonly informational attributes about a file such as its name and the date of the last modification (on disk) of the file.
• A FileReader interface, which provides methods to read a File or a Blob, and an event model to obtain the results of these reads.
• A URL scheme for use with binary data such as files, so that they can be referenced within web applications.

So, FileList can be populated by an input field in file-mode: <input type="file">.
That means that all of the above about the value-attribute still applies!

When an input field is in file-mode, it gets a read-only attribute files, an array-like FileList object that references the input element's user-selected file(s), which are accessible through the FileList interface.
Did I mention that the files attribute, of type FileList, is read-only? From File API section 5.2:

The HTMLInputElement interface [HTML] has a readonly attribute of type FileList...

Well, what about drag and drop?

The real magic happens in the drop() function:

    function drop(e) {
      e.stopPropagation();
      e.preventDefault();

      var dt = e.dataTransfer;
      var files = dt.files;

      handleFiles(files);
    }


Here, we retrieve the dataTransfer field from the event, then pull the file list out of it, passing that to handleFiles(). From this point on, handling the files is the same whether the user used the input element or drag and drop.

So (just like an input field of type="file") the event's dataTransfer attribute has a files attribute, which is an array-like FileList object, and we have just learned (above) that the FileList is read-only.

The FileList contains references to the file(s) that a user selected (or dropped on a drop-target) and some attributes. From the File API Section 7.2 File Attributes we can read:

name

The name of the file; on getting, this must return the name of the file as a string. There are numerous file name variations on different systems; this is merely the name of the file, without path information. On getting, if user agents cannot make this information available, they must return the empty string.

lastModifiedDate

The last modified date of the file. On getting, if user agents can make this information available, this must return a new Date[HTML] object initialized to the last modified date of the file. If the last modification date and time are not known, the attribute must return the current date and time as a Date object.

and there is a size attribute:

F.size is the same as the size of the fileBits Blob argument, which must be the immutable raw data of F.

Again, no path, just the read-only filename.

Thus:

• (elm_input||event.dataTransfer).files gives the FileList Object.
• (elm_input||event.dataTransfer).files.length gives the number of files.
• (elm_input||event.dataTransfer).files[0] is the first file selected.
• (elm_input||event.dataTransfer).files[0].name is the file-name of the first file selected
(and this is the value that is returned from an input type="file").
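Put together, a `handleFiles()` like the one called from `drop()` could look like the sketch below. Its body is an assumption (the quoted snippet does not show it), and `summarizeFiles` is a hypothetical helper. It only reads the name and size attributes listed above; a plain object stands in for a real FileList so the snippet also runs outside a browser.

```javascript
// Hypothetical helper: collect what a FileList exposes per file (no path!).
function summarizeFiles(fileList) {
  // A FileList is array-like, not an Array, so borrow map() from Array.
  return Array.prototype.map.call(fileList, function (f) {
    return { name: f.name, size: f.size };
  });
}

// Assumed body for handleFiles(): works the same for
// elm_input.files and event.dataTransfer.files.
function handleFiles(files) {
  var summary = summarizeFiles(files);
  return summary.length
    ? summary.length + ' file(s), first: ' + summary[0].name
    : 'no files selected';
}

// Stand-in for a FileList holding two user-selected files.
var fakeFileList = {
  length: 2,
  0: { name: 'a.txt', size: 3 },
  1: { name: 'b.png', size: 1024 }
};

var message = handleFiles(fakeFileList);     // "2 file(s), first: a.txt"
var empty = handleFiles({ length: 0 });      // "no files selected"
```

Note that nothing in here writes to the list: there is simply no setter to call.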

What about this 'URL scheme for use with binary data such as files, so that they can be referenced within web applications'? Surely that can hold a private reference to a file that a user selected?

From the File API - A URL for Blob and File reference we can learn that:

This specification defines a scheme with URLs of the sort:

These are stored in a URL store (and browsers should even have their own mini HTTP server aboard, so one can use these URLs in CSS, img src and even XMLHttpRequest).

One can create those Blob URLs with:

• var myBlobURL=window.URL.createFor(object); returns a Blob URL that is automatically revoked after its first use.
• var myBlobURL=window.URL.createObjectURL(object, flag_oneTimeOnly); returns a re-usable Blob URL (unless flag_oneTimeOnly evaluates to true) that can be revoked with window.URL.revokeObjectURL(myBlobURL).

Bingo, you might think... however... the URL store is only maintained for the lifetime of the document that created the URL: its entries are lost when that document is unloaded, and reloading the page unloads the document too.

From the MDN - Using object URLs:

The object URL is a string identifying the File object. Each time you call window.URL.createObjectURL(), a unique object URL is created, even if you've created an object URL for that file already. Each of these must be released. While they are released automatically when the document is unloaded, if your page uses them dynamically, you should release them explicitly by calling window.URL.revokeObjectURL()

That means that even if you store the Blob URL string in a cookie or persistent local storage, that string would be useless in a new session!
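A short sketch of that lifecycle, using only `URL.createObjectURL()` and `URL.revokeObjectURL()`. A Blob built in script stands in for a user-selected File (an assumption, so the snippet needs no file input; recent Node.js versions also provide these globals).

```javascript
// Stand-in for a File the user selected (a File is a Blob with a name).
var blob = new Blob(['hello'], { type: 'text/plain' });

// Every call hands out a new, unique URL, even for the same object.
var urlA = URL.createObjectURL(blob);
var urlB = URL.createObjectURL(blob);

// The strings are opaque identifiers: they contain no path and no filename,
// and they only mean something to the store that issued them.
var looksLikeBlobUrl = urlA.indexOf('blob:') === 0 && urlB.indexOf('blob:') === 0;

// Release them explicitly once they are no longer needed.
URL.revokeObjectURL(urlA);
URL.revokeObjectURL(urlB);
```

After the revoke (or after the document is gone) `urlA` is still a perfectly good string to put in a cookie, it just doesn't point at anything anymore.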

That should bring us to a full circle and the final conclusion:
It is not possible to (re-)populate a file input, or to get back at a user-selected file that is not in the browser's sandboxed 'local storage' area.
(Unless you force your users to use an outdated version of Opera, or force your users to use IE and some ActiveX coding/modules (implementing a custom file-picker), etc.)
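What a page can still do is remember which file the user picked (its name and size, not the file itself) and show that next to the empty file input after a reload, so the user knows what to pick again. A minimal sketch under assumptions: `rememberFile`, `recallFile`, `makeMemoryStorage` and the key name are all hypothetical, and `storage` is anything with getItem/setItem, for example window.localStorage in a browser.

```javascript
var STORAGE_KEY = 'lastSelectedFile'; // assumed key name

// Store only the metadata the File API exposes: no path, no contents.
function rememberFile(storage, file) {
  storage.setItem(STORAGE_KEY, JSON.stringify({
    name: file.name,
    size: file.size
  }));
}

// Returns the remembered metadata, or null if nothing was stored.
function recallFile(storage) {
  var raw = storage.getItem(STORAGE_KEY);
  return raw ? JSON.parse(raw) : null;
}

// In-memory stand-in for localStorage so the snippet runs outside a browser.
function makeMemoryStorage() {
  var data = {};
  return {
    getItem: function (k) {
      return Object.prototype.hasOwnProperty.call(data, k) ? data[k] : null;
    },
    setItem: function (k, v) { data[k] = String(v); }
  };
}

var storage = makeMemoryStorage();
var nothingYet = recallFile(storage);                       // null
rememberFile(storage, { name: 'report.pdf', size: 2048 });  // e.g. elm_input.files[0]
var remembered = recallFile(storage);                       // { name: 'report.pdf', size: 2048 }
```

The remembered name can be rendered as plain text beside the input; the input itself stays empty until the user selects the file again.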