I've written a utility program in C that works on text files, and I need to check whether the file selected by the user is a text file (without relying on the file suffix). It needs to work on Windows and Linux. How can I determine whether the file is or is not a text file?

What I have tried:

On Windows I've tried GetFileType() and GetFileAttributes().
Posted
Updated 7-Oct-20 17:17pm

On Linux you can use the file command and it will give you a good idea of the type of file:

I just ran it against a few files and here is what it looks like:

dev-1:~/Downloads$ file vext_v017.zip
vext_v017.zip: Zip archive data, at least v2.0 to extract
dev-1:~/Downloads$ file typescript.mobi
typescript.mobi: Mobipocket E-book "TypeScript Deep Dive", 2010970 bytes uncompressed, version 6, codepage 65001
dev-1:~/Downloads$ file download.png
download.png: PNG image data, 724 x 724, 8-bit/color RGBA, non-interlaced


However, I don't know of an equivalent on Windows.

The following link suggests some possible workarounds.
I tried out the Bash solution and it does indeed work. If you run the Bash shell on Windows, you can use the file command there and it works rather nicely.


What is the equivalent to the Linux File command for windows? - Super User
   
Comments
Richard MacCutchan 7-Oct-20 11:26am
   
Super tip.
You can't. All files are just a stream of bytes. There is no difference between a "text file" and a "binary file".

The only difference is in your code. How you open the file, and with what options, is what determines how the content of the file is interpreted, be it text or data.
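
To make that concrete, here is a minimal sketch of reading the first bytes of a file exactly as they are stored. The name read_prefix is hypothetical, not something from this thread. It opens the file with "rb" because, on Windows, text mode translates CR LF pairs and treats Ctrl-Z (0x1A) as end of file on input, which would hide bytes from any later check. On Linux "r" and "rb" behave the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies up to cap bytes from the start of the file
 * into buf and returns how many bytes were read. The file is opened in
 * binary mode so no newline or Ctrl-Z translation takes place.
 * Returns 0 if the file cannot be opened (or is empty). */
size_t read_prefix(const char *path, unsigned char *buf, size_t cap)
{
    assert(path != NULL && buf != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;

    size_t n = fread(buf, 1, cap, fp);
    fclose(fp);
    return n;
}
```

A return value of 0 does not distinguish "could not open" from "empty file"; a real program would probably want to report those separately.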
   
You can't. There is no "attribute" that Windows stores which determines file type - just the extension, which can indicate what kind of programs can open it successfully.
Some files do contain information at the start which says "this is a ..." but it's not compulsory, and in some cases it is text based anyway - like an EXE file, which starts with "MZ" (the "PE" signature comes later in the file) - but there is nothing preventing a text file from starting with "MZ LIST" and passing the "is it executable?" test!
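
As a rough illustration of such a signature check, the sketch below compares the first bytes of a buffer against a known prefix. The function name starts_with is hypothetical. The signatures in the usage note ("MZ" for EXE files, and the 8-byte PNG header) are well known, but a match is only a hint, never proof.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first magic_len bytes of buf
 * equal magic, 0 otherwise (including when buf is too short). */
int starts_with(const unsigned char *buf, size_t len,
                const void *magic, size_t magic_len)
{
    assert(buf != NULL && magic != NULL);
    return len >= magic_len && memcmp(buf, magic, magic_len) == 0;
}
```

For example, starts_with(buf, n, "MZ", 2) is true for an EXE file, and starts_with(buf, n, "\x89PNG\r\n\x1A\n", 8) is true for a PNG image. A plain text file whose first line begins with "MZ" passes the first test as well, which is exactly the weakness described above.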

Sorry, the only way to determine if a file is a text file is to read it and see if it contains "non-text" values. Even then, if it's a Unicode file in (for example) Persian it will contain values outside the "normal text" range.
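
One possible reading of "non-text values" is sketched below: NUL bytes and control characters other than tab, LF, CR and form feed count as non-text, while bytes of 128 and above are allowed so that UTF-8 or code-page text is not rejected. The function name has_non_text_bytes and this particular rule are assumptions for illustration, not a standard definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heuristic: returns 1 if buf contains a byte that is
 * unlikely to appear in a text file (NUL, DEL, or a control character
 * other than tab, LF, CR, form feed), 0 otherwise.
 * Bytes >= 128 are accepted, since they occur in non-English text. */
int has_non_text_bytes(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        if (c == '\t' || c == '\n' || c == '\r' || c == '\f')
            continue;               /* ordinary whitespace controls */
        if (c < 32 || c == 127)
            return 1;               /* NUL, other control codes, DEL */
    }
    return 0;
}
```

Note that UTF-16 text made of mostly Latin characters is full of NUL bytes, so this rule would classify it as non-text. Whether that is acceptable depends on which files the utility has to handle.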
   
Quote:
I need to check if the file selected by the user is a text file

The only way I see is to read the file and check whether its content matches your criteria.
I fear you have to write your own utility.
First, you need to define what text is for you:
English is mostly the lower 128 ASCII codes, but one can argue that codes under 32 are not text, except for CR and LF.
But English also includes words from other languages, like "déjà vu", which comes from French.
When I use CSV files, I replace commas with tabs (ASCII 9); is that text for you?
If the file comes from DOS, text that uses codes above 127 is perfectly normal for Europeans.
The file can also use a UTF encoding.
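
For the UTF case, one criterion that could be added is "the bytes are well-formed UTF-8". The sketch below is an illustrative check under that assumption; the name is_valid_utf8 is hypothetical. Pure ASCII passes, UTF-8 text such as "déjà vu" passes, and the same words saved in a DOS or Latin-1 code page fail, so a failure only says "not UTF-8", not "not text".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf[0..len) is well-formed UTF-8,
 * 0 otherwise. Rejects overlong forms, surrogates, values above
 * U+10FFFF, and a multi-byte sequence cut off at the end of the buffer. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;               /* continuation bytes expected */
        unsigned long cp;           /* decoded code point */

        if (c < 0x80) { i++; continue; }                    /* ASCII */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return 0;              /* stray continuation or invalid lead */

        if (len - i - 1 < extra)
            return 0;               /* sequence truncated */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;           /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (extra == 1 && cp < 0x80) return 0;              /* overlong */
        if (extra == 2 && cp < 0x800) return 0;             /* overlong */
        if (extra == 3 && (cp < 0x10000 || cp > 0x10FFFF)) return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;         /* surrogate */

        i += extra + 1;
    }
    return 1;
}
```

If only the first few thousand bytes of a file are sampled, the sample may end in the middle of a multi-byte character; a caller might want to tolerate a truncated final sequence in that situation rather than reject the file.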
   
This may need some changes, but appears to basically work (for my purposes).

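//====== FUNCTION - CHECK IF INPUT FILE IS A TEXT FILE ======//
#include <stdio.h>

// Samples up to the first 10000 bytes of the file. Returns 1 if at least 90%
// of them are printable ASCII, tab, CR or LF, otherwise 0.
// The file should be opened in binary mode ("rb") so no bytes are translated.
int fnCheckIfTextFile(FILE *pfFile) {
  if (fseek(pfFile, 0, SEEK_SET) != 0 || ftell(pfFile) != 0) {
    printf("fnCheckIfTextFile: There is a BOF problem with this file.\n");
    return 0;                           // FAILURE
  }
  int iByteTot = 0;
  int iAsciiTot = 0;
  int cVal = 0;
  // Test the byte limit first so no extra byte is read once it is reached
  while (iByteTot < 10000 && (cVal = fgetc(pfFile)) != EOF) {
    iByteTot++;
    // Count printable ASCII plus the usual whitespace control characters
    iAsciiTot += ((cVal > 31 && cVal < 127) ||
                  cVal == '\t' || cVal == '\r' || cVal == '\n');
  }
  if (iByteTot == 0) {
    printf("fnCheckIfTextFile: The file is empty.\n");
    return 0;                           // nothing to classify; avoids dividing by zero
  }
  printf("Total bytes = %d, ascii bytes = %d\n", iByteTot, iAsciiTot);
  int iPercentAscii = ((iAsciiTot*100) / iByteTot);
  if (iPercentAscii < 90) {
    printf("This is not a text file\n");
  }
  return (iPercentAscii >= 90);
}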
   

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



