newbie newbie - 27 days ago 7
C Question

C Programming File Reading/Writing Technique

This is my first time writing a program that involves reading and writing files, and I'm wondering what the best technique is. When I compared my work with my classmate's, our logic was very different.

You see, our teacher asked us to write a simple student list system where users can add, edit, and delete records. He also required us to save all the records to a file so that we can access them the next time we use the program.

My solution is this: before the program opens its menu, I read all the records from the file and store them in an array. That way I can manipulate all the records in memory. Then, before the user exits the program, I write the array back to the same file, overwriting its previous contents.

My classmate's solution is like this: when she adds a record, she opens the file and appends the data; when she edits a record, she opens the file and edits that particular record; and when she deletes a record, she opens the file and deletes it. So every function she wrote accesses the file.

Both approaches are, of course, possible to code. But I am wondering which is more efficient and effective if we are dealing with thousands or millions of records. Or are there other solutions better than what we did? Maybe you could share your file handling experiences with us. Thank you.

Answer

This is a classic case you'll encounter time and time again in programming: do I optimize for speed or memory usage?

And, like all such conundrums, there is no "correct" answer or perfect solution. In other words, you and your classmate are both right in your solutions to the problem.

With your solution of loading all of the records into memory, you "spend" memory in order to make accessing and modifying each of those records faster at run time. Storing all of the records in an array in memory takes up space, but because memory access is orders of magnitude faster than disk access, your approach is going to run a lot faster than your classmate's.
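As a rough illustration of the in-memory approach, here is a minimal sketch in C. The `Student` layout, the `MAX_STUDENTS` capacity, and the function names are all assumptions made for the example, since the question does not show the actual record format. It reads the whole file into an array once and writes the whole array back in one pass.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_STUDENTS 1000   /* hypothetical capacity */
#define NAME_LEN     64

/* Hypothetical record layout; the real assignment's fields are unknown. */
typedef struct {
    int  id;
    char name[NAME_LEN];
} Student;

typedef struct {
    Student items[MAX_STUDENTS];
    size_t  count;
} StudentList;

/* Read every record from `path` into the list.
 * Assumption: a file that cannot be opened means "no records yet".
 * Returns 0 on success, -1 on a read error. */
int load_students(const char *path, StudentList *list)
{
    assert(list != NULL);
    list->count = 0;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    list->count = fread(list->items, sizeof(Student), MAX_STUDENTS, fp);
    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : 0;
}

/* Overwrite `path` with the current contents of the list.
 * Returns 0 on success, -1 on failure. */
int save_students(const char *path, const StudentList *list)
{
    assert(list != NULL);
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = fwrite(list->items, sizeof(Student), list->count, fp);
    int closed = fclose(fp);
    return (written == list->count && closed == 0) ? 0 : -1;
}

/* Add a record in memory only; nothing touches the disk here.
 * Returns 0 on success, -1 if the array is full. */
int add_student(StudentList *list, int id, const char *name)
{
    assert(list != NULL && name != NULL);
    if (list->count >= MAX_STUDENTS)
        return -1;
    Student *s = &list->items[list->count++];
    memset(s, 0, sizeof *s);
    s->id = id;
    strncpy(s->name, name, NAME_LEN - 1);   /* last byte stays '\0' */
    return 0;
}
```

A program built this way would call `load_students` once before showing the menu and `save_students` once on exit. Every edit in between is just an array operation.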

By way of contrast, your classmate conserves RAM by waiting to load the data on demand from the hard disk. But that's going to cost her: hitting the hard disk is a terribly expensive process compared to fetching data that's already in memory, and she's going to be stuck doing this each time the user makes a change. Think about how long it takes to start a program versus switching to one that's already open.
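For comparison, a sketch of the on-disk approach might look like the following. It assumes fixed-size binary records, so that record number `index` always starts at byte `index * sizeof(Student)`. The struct and function names are again hypothetical, and your classmate's actual file format may be quite different.

```c
#include <assert.h>
#include <stdio.h>

#define NAME_LEN 64

/* Hypothetical fixed-size record; a fixed size is what makes seeking work. */
typedef struct {
    int  id;
    char name[NAME_LEN];
} Student;

/* Append one record to the end of the file. Returns 0 or -1. */
int append_student(const char *path, const Student *s)
{
    assert(s != NULL);
    FILE *fp = fopen(path, "ab");
    if (fp == NULL)
        return -1;
    size_t n = fwrite(s, sizeof *s, 1, fp);
    int closed = fclose(fp);
    return (n == 1 && closed == 0) ? 0 : -1;
}

/* Overwrite the record at zero-based position `index` in place.
 * The caller is assumed to pass an index that already exists in the file. */
int update_student(const char *path, long index, const Student *s)
{
    assert(s != NULL);
    if (index < 0)
        return -1;
    FILE *fp = fopen(path, "r+b");
    if (fp == NULL)
        return -1;
    int ok = fseek(fp, index * (long)sizeof *s, SEEK_SET) == 0
          && fwrite(s, sizeof *s, 1, fp) == 1;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the record at zero-based position `index`.
 * Returns 0 on success, -1 if it cannot be read (e.g. past end of file). */
int read_student(const char *path, long index, Student *out)
{
    assert(out != NULL);
    if (index < 0)
        return -1;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    int ok = fseek(fp, index * (long)sizeof *out, SEEK_SET) == 0
          && fread(out, sizeof *out, 1, fp) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}
```

Note that every call opens, seeks, and closes the file, which is exactly the per-operation cost described above. Deletion is left out on purpose: standard C has no way to cut bytes out of the middle of a file, so it usually means either marking a record as unused or rewriting the file without it.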

And therein lies the tradeoff. Some of the important things to ask yourself here are:

  1. Is the data set (in the common configurations you'll be dealing with) too large (or going to become too large) to fit completely in memory? If you're typically dealing with small sets of data, computers now have enough RAM that it's probably worth keeping everything in memory.

  2. How fast do you need to be able to access the data? Is real-time access important? Is it a particularly large or complex data set that would take too long to load from the hard disk on demand? What kind of performance do your users expect?

  3. What kind of system is your application targeting? Sometimes embedded systems and other special cases necessitate their own unique design approaches. You might have an abundance of RAM and very limited amounts of fixed storage, or you might have exactly the opposite. If you're using standard, modern PC hardware, what do your users want/need/already have? If most of your target users are using relatively "beefy" hardware already, you might make different design decisions than if you're aiming to target a larger potential audience. You've surely seen these trade-offs made explicit before in a program's stated system requirements.

  4. Do you need to allow for special situations? Things like concurrent access by multiple users make keeping all of your data in memory much more difficult. How are other users going to be able to read in the data that's only stored in memory on a local computer? Sharing a common file (perhaps even on a shared server) is probably going to be necessary here.

  5. Are there certain portions of your data that are accessed more frequently than others? Consider keeping those specific portions always in memory and lazy-loading the rest (meaning, you only attempt to fetch them into memory when/if they are accessed by the user).

And as that last point hints, something of a balanced or combined approach is probably about as close as you'll come to an "ideal" solution. You could store as much of the data in RAM as possible, while periodically writing any edits or modifications back to the file on disk during your application's idle state. There's plenty of time that the average program spends waiting on the user to do something, as opposed to the other way around. You can take advantage of these idle CPU cycles to flush out things being held in memory back to the disk without incurring any noticeable speed penalty. This approach is used all the time in software development, and helps to avoid the pitfall pointed out by EClaesson's answer. If your application crashes or otherwise quits unexpectedly, only a very small portion of data is likely to be lost because most of it was already committed to disk behind the scenes.
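One common way to sketch that combined approach is a "dirty" flag: edits change only the in-memory copy and set the flag, and a flush routine writes to disk only when the flag is set. The `Store` type and function names below are hypothetical, and a plain array of `int` IDs stands in for full student records to keep the example short. Where the flush gets called from (a timer, the menu loop, after every N edits) is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_IDS 100   /* hypothetical capacity */

typedef struct {
    int    ids[MAX_IDS];  /* stand-in for full student records */
    size_t count;
    bool   dirty;         /* true when memory differs from the file */
} Store;

/* Modify the in-memory copy only and remember that it changed. */
int store_add(Store *st, int id)
{
    assert(st != NULL);
    if (st->count >= MAX_IDS)
        return -1;
    st->ids[st->count++] = id;
    st->dirty = true;
    return 0;
}

/* Intended to be called whenever the program is idle, for example
 * each time it returns to the menu to wait for input.
 * Returns 1 if data was written, 0 if there was nothing to do,
 * and -1 on error (the flag stays set so a later call can retry). */
int store_flush_if_dirty(Store *st, const char *path)
{
    assert(st != NULL);
    if (!st->dirty)
        return 0;
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t n = fwrite(st->ids, sizeof st->ids[0], st->count, fp);
    int closed = fclose(fp);
    if (n != st->count || closed != 0)
        return -1;
    st->dirty = false;
    return 1;
}
```

With this in place, a crash loses at most the edits made since the last successful flush. A more careful version would probably write to a temporary file and then `rename` it over the original, so that a crash in the middle of a flush cannot leave a half-written file behind.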

Postscript: Of course, Dark Falcon's answer is correct that in a production application, you would more than likely use something like a database to handle the data. But since this appears to be for educational purposes, I think understanding the basic trade-offs behind each approach is far more important.