Posted 14 Oct 2020

Using Locate Databases on MacOS Unix

Using file name databases on MacOS Unix
This is a small tutorial on how to use file name databases on MacOS Unix with tips and tricks to get around pitfalls.

Introduction

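Like most Unix systems, macOS supports file name databases, which are searched with the locate command. The macOS version of locate and its helper scripts comes from FreeBSD; on Linux, a similar locate is part of GNU findutils. File name databases are relatively unknown to occasional Unix users, but they provide some useful features that are worth exploring.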

Background

I came across the file name database feature while searching the internet for ways to organize the files on my Mac. File name databases do not organize your files, but they are quite useful for searching files by name.

Using the Code

Usually, you would use the Unix find command to search for files by name. For example, to find all files with the extension jpg in the current directory and all its subfolders, you would enter at the Unix prompt:

$ find . -name '*.jpg' -print

The pattern is quoted so that find, rather than the shell, interprets the wildcard.

This works, but find scans the entire directory tree it is given every time it is invoked. If there are not too many directories to search, this is still efficient. However, if you repeatedly need to search a larger directory structure, a file name database is more efficient.

A file name database is a file that contains a list of file names, each with the full path where the file is located. When you search with a file name database, the directory structure is not scanned at all; the file and path are simply looked up in the database, which is much quicker for large directory structures.
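To make the idea concrete, here is a minimal sketch of a file name database in shell, assuming the simplest possible format: a sorted plain-text list of paths. The function names build_name_db and lookup_names are hypothetical. The real locate database uses a compressed encoding, and locate matches shell-style patterns rather than the regular expressions grep uses here.

```shell
# Hypothetical illustration, not part of locate: a plain-text "file name database".

# build_name_db DIR LISTFILE
# Scan DIR once and store every path, one per line, sorted.
build_name_db() {
    find "$1" -print | sort > "$2"
}

# lookup_names LISTFILE REGEX
# Answer a query from the stored list without touching the directory tree.
lookup_names() {
    grep -E -- "$2" "$1"
}
```

$ build_name_db "$HOME" /tmp/names.txt

$ lookup_names /tmp/names.txt '\.jpg$'

Because the list is a snapshot, a file deleted after the scan is still reported until the list is rebuilt; the same drawback applies to locate.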

So to search for all files in your home directory and all its subfolders with extension jpg, you would enter at the Unix prompt:

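$ locate "$HOME/*.jpg"

The double quotes keep the shell from expanding * itself while still expanding $HOME. Within a locate pattern, * also matches /, so the pattern covers all subfolders.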

The drawback, however, is that before you can use a file name database in your search, you need to build one and then update it regularly.

Building a File Name Database

Before you can use the locate command at the Unix prompt, you first need to build a file name database by running the /usr/libexec/locate.updatedb command.

You can run the command straight away, but it is advisable to first look at the settings that are used to build the file name database.

Settings for Building the File Name Database

The /usr/libexec/locate.updatedb command takes the settings for building the file name database from the following variables:

  • TMPDIR: This is the directory which is used for temporary files.
  • FCODES: This variable holds the name and path of the file name database.
  • SEARCHPATHS: This variable holds a list of paths to be searched.
  • PRUNEPATHS: This variable holds a list of paths inside the paths of SEARCHPATHS to be excluded.

The values of these variables are determined as follows:

  • The /usr/libexec/locate.updatedb command first checks the environment variable LOCATE_CONFIG. If it is set to a file name, the variable settings are read from that file.
  • If LOCATE_CONFIG is not set, the variable settings are read from the /etc/locate.rc file.
  • If /etc/locate.rc does not exist or contains no settings, the defaults hard-coded in /usr/libexec/locate.updatedb are used. Any variable the configuration file does not set also falls back to its default.
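The lookup order above can be sketched in shell as follows. This is a simplified illustration, not the code of locate.updatedb itself: show_locate_settings is a hypothetical name, TMPDIR is left out, and the defaults are the ones listed below.

```shell
# Simplified sketch of the settings lookup (not the real script).
# The function body runs in a subshell, so sourcing the configuration
# file does not change any variables in your interactive shell.
show_locate_settings() (
    # 1. LOCATE_CONFIG names the configuration file, else /etc/locate.rc.
    config=${LOCATE_CONFIG:-/etc/locate.rc}

    # 2. The configuration file is plain shell code and is sourced.
    if [ -r "$config" ]; then
        . "$config"
    fi

    # 3. Variables that are still unset fall back to the built-in defaults.
    : "${FCODES:=/var/db/locate.database}"
    : "${SEARCHPATHS:=/}"
    : "${PRUNEPATHS:=/tmp /var/tmp}"

    echo "config=$config"
    echo "FCODES=$FCODES"
    echo "SEARCHPATHS=$SEARCHPATHS"
    echo "PRUNEPATHS=$PRUNEPATHS"
)
```

$ show_locate_settings

On an unmodified system this should print the default values.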

Usually, LOCATE_CONFIG is not set and /etc/locate.rc contains only commented-out settings, so the defaults are used, which are:

  • TMPDIR="/tmp"
  • FCODES="/var/db/locate.database"
  • SEARCHPATHS="/"
  • PRUNEPATHS="/tmp /var/tmp"

With these settings, the /usr/libexec/locate.updatedb command searches the complete directory structure starting from the root directory (/), excluding the directories /tmp and /var/tmp.

At first glance these settings look like a good starting point. There is a catch, however: only files that the user running the command can access are added to the file name database.

Running it as the superuser with sudo does not give you the full picture either, because of the way locate.updatedb works internally.

Internal Workings of locate.updatedb

The command locate.updatedb is in fact a Unix shell script that basically does the following:

  1. If it is invoked by the superuser (for example, using sudo), it calls itself recursively as user nobody. Otherwise, which includes the recursive call as user nobody, it proceeds directly to the next step.
  2. It calls another Unix script, /usr/libexec/locate.mklocatedb, which uses the find command to list all files in the directory tree(s) specified by the SEARCHPATHS variable (omitting the subtrees specified by the PRUNEPATHS variable) and writes them with their full paths to a temporary file name database.
  3. It copies the content of the temporary file name database to the name and location specified by the FCODES variable.
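The core of steps 2 and 3 can be sketched in shell like this. It is a simplified illustration under several assumptions: mini_updatedb is a hypothetical name, step 1 (the hand-off to user nobody) is left out, the result is a sorted plain-text list rather than locate's compressed database, and the code relies on sh/bash word splitting of SEARCHPATHS and PRUNEPATHS, so it will not work as-is in zsh.

```shell
# Simplified sketch of steps 2 and 3 (sh/bash only; not the real script).
# Reads SEARCHPATHS, PRUNEPATHS and FCODES from the environment.
mini_updatedb() {
    tmp=$(mktemp "${TMPDIR:-/tmp}/mini_updatedb.XXXXXX") || return 1

    # Step 2: turn PRUNEPATHS into "-path P1 -o -path P2 ..." for find.
    set --
    for p in $PRUNEPATHS; do
        if [ $# -gt 0 ]; then
            set -- "$@" -o
        fi
        set -- "$@" -path "$p"
    done

    # Walk SEARCHPATHS (split on spaces), skipping pruned subtrees.
    # Errors such as "Permission denied" are discarded, so anything the
    # current user cannot read silently stays out of the list.
    if [ $# -gt 0 ]; then
        find $SEARCHPATHS \( "$@" \) -prune -o -print 2>/dev/null | sort > "$tmp"
    else
        find $SEARCHPATHS -print 2>/dev/null | sort > "$tmp"
    fi

    # Step 3: copy the temporary result to the file named by FCODES.
    cat "$tmp" > "$FCODES"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

$ SEARCHPATHS="$HOME" PRUNEPATHS="$HOME/Library" FCODES=/tmp/home.list mini_updatedb

The discarded find errors are the reason the user matters: whatever the running user cannot read never reaches the list.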

This means that if you run it as the superuser using sudo, you end up with a file name database that contains only the names of files that user nobody can access.

The reason for this behavior is privacy: Unix is a multi-user system and the file name database is readable by every user, so a database built with full privileges would let users query each other's directory structures and the names of files therein, which they normally could not see.

A more sensible approach would therefore be to have individual file name databases for each user containing the directory structure and files of their respective home directories and a single central one for all other directories (excluding users' home directories).

Creation of a Central File Name Database

To create a central file name database excluding users' home directories as outlined in the previous section, edit the /etc/locate.rc file as shown below:

$ sudo -e /etc/locate.rc

#
# /etc/locate.rc -  command script for updatedb(8)
#
# $FreeBSD: src/usr.bin/locate/locate/locate.rc,v 1.9 2005/08/22 08:22:48 cperciva Exp $

#
# All commented values are the defaults
#
# temp directory
TMPDIR="/tmp"

# the actual database
#FCODES="/var/db/locate.database"

# directories to be put in the database
SEARCHPATHS="/"

# directories unwanted in output
PRUNEPATHS="/tmp /var/tmp /Users /Volumes"

# filesystems allowed. Beware: a non-listed filesystem will be pruned
# and if the SEARCHPATHS starts in such a filesystem locate will build
# an empty database.
#
# be careful if you add 'nfs'
FILESYSTEMS="hfs ufs apfs"

This searches the directory structure starting with the root directory (/), omitting /tmp, /var/tmp, /Users and /Volumes. Note that the FCODES variable is left commented out; see the Points of Interest section below for the reasons behind it.
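To double-check which settings are actually active in /etc/locate.rc, and in particular that FCODES stays commented out, a small helper that hides comment and blank lines is enough. The name active_locate_settings is hypothetical.

```shell
# Print only the active lines of a locate configuration file:
# comment lines (starting with #) and blank lines are hidden.
active_locate_settings() {
    grep -Ev '^[[:space:]]*(#|$)' "${1:-/etc/locate.rc}"
}
```

$ active_locate_settings /etc/locate.rc

With the file above, this prints the TMPDIR, SEARCHPATHS, PRUNEPATHS and FILESYSTEMS lines and no FCODES line.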

Once you have made the changes to /etc/locate.rc, build the file name database by running /usr/libexec/locate.updatedb as the superuser:

$ sudo /usr/libexec/locate.updatedb

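If no output is printed, the command completed successfully and you can use this file name database to locate files. You can check this with the following examples. Quote the pattern so that the shell passes it to locate unchanged instead of trying to expand * itself (in zsh, the default shell since macOS Catalina, an unquoted pattern that matches no local file is even an error):

$ locate '/*.txt'

$ locate '/*.jpg'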

Each of these should print a list of matching paths, which may be long.

To check that user directories were not scanned, run the below command:

$ locate /Users

This should not return any files in user directories.

Creation of an Individual File Name Database per User

To create individual file name databases for directories and files in a user's home directory, first copy the /etc/locate.rc file to /etc/locate.users.rc and then edit it as per below:

$ sudo cp /etc/locate.rc /etc/locate.users.rc

$ sudo -e /etc/locate.users.rc

#
# Configuration for user home directory search
#
# temp directory
TMPDIR="/tmp"

# the actual database
FCODES="$HOME/locate.user.database"

# directories to be put in the database
SEARCHPATHS="$HOME"

# directories unwanted in output
# PRUNEPATHS="/tmp /var/tmp /Users /Volumes"

# filesystems allowed. Beware: a non-listed filesystem will be pruned
# and if the SEARCHPATHS starts in such a filesystem locate will build
# an empty database.
#
# be careful if you add 'nfs'
FILESYSTEMS="hfs ufs apfs"

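Once you have made the changes to /etc/locate.users.rc, build the database by running /usr/libexec/locate.updatedb as your normal user (without sudo), as shown below. Because the configuration file is sourced by the script, $HOME expands to the home directory of whoever runs it, so the same file works for every user: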

$ export LOCATE_CONFIG="/etc/locate.users.rc";/usr/libexec/locate.updatedb

If no output is printed, the command completed successfully and you can use this file name database to locate files. You can check this with the following examples (again, quote the pattern so the shell does not expand it):

$ locate -d "$HOME/locate.user.database" '/*.txt'

$ locate -d "$HOME/locate.user.database" '/*.jpg'

These should print a list, possibly a long one, of files from your home directory. The -d option tells locate to search the specified database, here the user's individual file name database, instead of the default one.
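If you want one command that searches both the central database and your own, a small shell function helps. This is a sketch: the name ulocate is hypothetical, and it relies on BSD locate accepting more than one -d option, which the FreeBSD locate(1) manual page documents; check man locate on your macOS version, and if a second -d is not accepted, run the two searches separately.

```shell
# Hypothetical helper: search the central database and your personal
# database in one call. Quote the pattern, e.g.  ulocate '*.jpg'
ulocate() {
    locate -d /var/db/locate.database -d "$HOME/locate.user.database" "$@"
}
```

$ ulocate '*.jpg'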

Updating the File Name Databases

As files are continuously added, renamed or removed, and the directory structure changes as well, you need to update the file name databases regularly. To update them, follow the same steps as for building them as outlined above, either manually or through a scheduled job such as /System/Library/LaunchDaemons/com.apple.locate.plist.
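To decide when an update is due, a small helper can check the age of a database file. This is a sketch: locate_db_is_stale is a hypothetical name, the default threshold of 7 days is arbitrary, and the function returns success when the database is missing or older than the threshold, so it can drive an update.

```shell
# Hypothetical helper: succeeds (returns 0) when the database is missing
# or was last modified more than DAYS days ago (default 7).
locate_db_is_stale() {
    db=${1:-/var/db/locate.database}
    days=${2:-7}
    if [ ! -f "$db" ]; then
        echo "$db: missing" >&2
        return 0
    fi
    # find prints the file only if its modification time is older than DAYS days.
    if [ -n "$(find "$db" -mtime +"$days" -print)" ]; then
        echo "$db: older than $days days" >&2
        return 0
    fi
    return 1
}
```

$ locate_db_is_stale "$HOME/locate.user.database" 7 && LOCATE_CONFIG=/etc/locate.users.rc /usr/libexec/locate.updatedb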

When manually updating the central file name database, first check that the value of LOCATE_CONFIG is not pointing to the configuration for the user file name database.
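One way to avoid this pitfall in sh, bash or zsh is to set LOCATE_CONFIG only for the single command that needs it, rather than exporting it into your shell. The function name update_user_locate_db and the UPDATEDB override variable are hypothetical; UPDATEDB merely makes the script path replaceable and defaults to /usr/libexec/locate.updatedb.

```shell
# Hypothetical wrapper: rebuild the per-user database with LOCATE_CONFIG
# set for this one command only, so it is never exported into your shell.
# UPDATEDB is an assumed override for the script path.
update_user_locate_db() {
    LOCATE_CONFIG=/etc/locate.users.rc "${UPDATEDB:-/usr/libexec/locate.updatedb}"
}
```

$ update_user_locate_db

After it returns, LOCATE_CONFIG in your shell is unchanged, so a later update of the central database is not affected by it.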

Points of Interest

Because of the way the /usr/libexec/locate.updatedb script is implemented, when it is invoked as the superuser it creates a temporary file name database in Step 1 (see the section Internal Workings of locate.updatedb above) and passes its name to the recursive invocation as user nobody as the value of the variable FCODES; this is the file into which the result is copied in Step 3. In Step 2, however, the configuration file (/etc/locate.rc or the file named by the LOCATE_CONFIG variable) is loaded, and if it sets FCODES, that value overwrites the one passed in.

This is confusing, but the bottom line is: if you set FCODES in the configuration file for locate.updatedb (even to its default value) and invoke the script as the superuser, you get the error message below and the script aborts:

/usr/libexec/locate.updatedb: line 97: /var/db/locate.database: Permission denied

Moreover, when invoked as the superuser, the script always uses the hard-coded /var/db/locate.database as the final file name database, so even without the permission error, an FCODES value from /etc/locate.rc would not be used as the name of the final file name database.
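The override comes down to ordinary shell semantics: sourcing a configuration file that assigns FCODES replaces whatever value was inherited from the environment. The following self-contained demonstration contrasts a plain assignment with the := form, which only assigns when the variable is unset or empty. It is not the attached script; demo_fcodes_override is a hypothetical name, /tmp/updatedb.inherited merely stands in for the temporary database from Step 1, and nothing under /var/db is touched.

```shell
# Demonstration only: does sourcing a config file keep an inherited FCODES?
# Runs in a subshell so your own environment is left alone.
demo_fcodes_override() (
    cfg=$(mktemp) || exit 1

    # Plain assignment in the config file: the inherited value is lost.
    export FCODES=/tmp/updatedb.inherited
    printf 'FCODES="/var/db/locate.database"\n' > "$cfg"
    . "$cfg"
    echo "plain assignment: $FCODES"

    # Default-only assignment: the inherited value survives.
    export FCODES=/tmp/updatedb.inherited
    printf ': "${FCODES:=/var/db/locate.database}"\n' > "$cfg"
    . "$cfg"
    echo "default-only assignment: $FCODES"

    rm -f "$cfg"
)
```

$ demo_fcodes_override
plain assignment: /var/db/locate.database
default-only assignment: /tmp/updatedb.inherited

This suggests, as an untested idea, writing the := form in a configuration file instead of a plain FCODES assignment.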

I have attached a script to illustrate what a fix for these issues could look like.

History

  • 15th October, 2020: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Germany

Comments and Discussions

 
Instructions need to be changed (jprokos, 15-Sep-21 13:30)
Re: Instructions need to be changed (h_wiedey, 2-Oct-21 22:54)
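Good point. The shell would usually expand *. However, I just tested it without quotes, and * seems to be passed to locate unexpanded. The / seems to make the difference, presumably because no file in the root directory matches /*.txt, and bash passes an unmatched pattern on unchanged: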
$ locate *.txt (returns nothing)
$ locate /*.txt (returns result from locate)
$ locate "*.txt" (returns result from locate)
$ echo "${BASH_VERSION}"
3.2.57(1)-release
$ uname -a
Darwin (...) 17.7.0 Darwin Kernel Version 17.7.0: Fri Oct 30 13:34:27 PDT 2020; root:xnu-4570.71.82.8~1/RELEASE_X86_64 x86_64

Also the result produced seems to be the same:
$ locate "*.txt" > out1.txt
$ locate /*.txt > out2.txt
$ diff out1.txt out2.txt
$ diff out1.txt out2.txt|wc -l
       0
$ wc -l out*
    1417 out1.txt
    1417 out2.txt
    2834 total

Anyway, it is the correct way to use quotes as you pointed out.

