You have to read the whole file anyway, so you can't help doing that. And, unless your words are all exactly the same length, you can't really do a binary search on the file, since you don't know how many bytes to jump forward/back to find the next record of interest. Even when you've found it, you've got to read the rest of the file in, write out the new word, then write out the buffer again. You might as well read the entire file into a buffer, walk sequentially through the buffer until you find where you want to put the new word then write the buffer to where you are now, the new word, and then the rest of the buffer. This is still linear in time, but since you're only doing one read and 2 or 3 writes it should be much faster e.g:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void insert_word(const char *file_name, const char *new_word)
{
FILE *wlist = fopen(file_name, "r+");
fseek(wlist, 0, SEEK_END);
size_t nbytes = ftell(wlist);
rewind(wlist);
char *buffer = malloc(nbytes+1);
fread(buffer, nbytes, 1, wlist);
buffer[nbytes] = 0;
rewind(wlist);
char *word_begin = buffer;
char *word_end = strchr(buffer, '\n');
while( *word_begin ) {
int cmp = strcoll(word_begin, new_word);
if(cmp == 0) {
break;
} else if( cmp > 0 ) {
size_t buffer_to_here = word_begin - buffer;
fwrite(buffer, buffer_to_here, 1, wlist);
fprintf(wlist, "%s\n", new_word);
size_t buffer_to_end = nbytes - buffer_to_here;
fwrite(word_begin, buffer_to_end, 1, wlist);
break;
}
word_begin = word_end + 1;
word_end = strchr(word_begin, '\n');
}
if(*word_begin == '\0') {
fseek(wlist, 0, SEEK_END);
fprintf(wlist, "%s\n", new_word);
}
fclose(wlist);
free(buffer);
}
I've omitted error checking in the above to reduce clutter. There really should be checks after malloc, fopen, fread, fwrite. I use strcoll() rather than strcmp() since the ordering of the words might be significant via the locale. Don't forget to call
locale(LC_COLLATE, "")
before calling this. The comparison will terminate at worst at the length of new_word, regardless of the length from word_begin to the end of the buffer, so there's no concern that you might be comparing megabytes of data. I tested this on a RPI2, and was able to add a 'zzzzzzzzz' to the end of a list of approx 500,000 words in a 4.8MB file in about 250 ms.
Note that your solution does not call free() for everything you malloc(). Not to do so means that you'll be leaking memory, and if you're on a system with limited memory, you could run out if called repeatedly. In my solution I pass in the file name of the file to work on. If you want to use an already open FILE pointer, you just need to pass that as
FILE *
. In that case, you have a requirement that the caller has opened the file with mode "r+".
Update:
actually, we can save a write here:
size_t buffer_to_here = word_begin - buffer;
fwrite(buffer, buffer_to_here, 1, wlist);
We haven't changed the file, so we can just fseek() instead:
fseek(wlist, buffer_to_here, SEEK_SET);
That's a minor improvement, but it saves
some disk i/o.