How to identify which packages (and which functions) are actually being used by an R file

31 March 2020

When inheriting someone else’s R code, I always like to check which packages and functions it uses. Not just those it loads with library() calls, but those it actually uses. To this end I recently discovered a useful little utility which gives you a list of all the packages, and all the functions within those packages, referenced in a given R file.

The package you will need to install is NCmisc, and the function we are going to use within it is the aptly named list.functions.in.file.

Example use of NCmisc::list.functions.in.file()

It works by identifying all the function calls in the given file and comparing these to functions currently loaded in memory.
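To make that idea concrete, here is a minimal base R sketch of the same approach. It is my own illustration under stated assumptions, not NCmisc's source code: the function name sketch_functions_in_file is hypothetical, and it relies only on parse(), utils::getParseData() and utils::find().

```r
# Illustrative sketch only; the function name is hypothetical and this is not
# the NCmisc implementation.
sketch_functions_in_file <- function(path) {
  # keep.source = TRUE is required so that getParseData() has tokens to return
  parsed <- parse(path, keep.source = TRUE)
  tokens <- utils::getParseData(parsed)

  # SYMBOL_FUNCTION_CALL tokens are names used in call position, e.g. mean in mean(x)
  called <- sort(unique(tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"]))

  # find() reports every place on the current search path where a name is visible
  where <- vapply(
    called,
    function(f) paste(utils::find(f), collapse = ", "),
    character(1)
  )
  where[where == ""] <- "(not found)"

  # Group the function names by where they were found
  split(called, where)
}
```

Because find() only looks at the current search path, the result depends entirely on which packages are attached when you run it.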

This means that before running list.functions.in.file you will need to have loaded into memory all the packages you believe are referenced, and only those packages. The easiest way to do this is to open a fresh instance of RStudio and just execute the code in the file. (In the example above, the “global.R” file is part of a Shiny app which I first ran, then stopped.)
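A call might look something like the sketch below. The two-line script written to a temporary file is a hypothetical stand-in for the file you inherited; in practice you would pass the path to your own file instead. The call is guarded so the snippet still runs on a machine where NCmisc is not installed.

```r
# Hypothetical stand-in for the script you inherited; in practice point this
# at your own file, e.g. "global.R".
script_path <- tempfile(fileext = ".R")
writeLines(c("values <- c(1, 2, 3)", "print(median(values))"), script_path)

# Guarded so the snippet still runs where NCmisc is not installed.
if (requireNamespace("NCmisc", quietly = TRUE)) {
  used <- NCmisc::list.functions.in.file(script_path)
  print(used)
}
```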

The output is a named list. Each item is named after the package, or packages (plural) if a particular function name is common to more than one package. Each list entry is a character vector of function names. Any custom functions defined in the file itself will appear under .GlobalEnv.

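If your R script file loads a package, but that package is not listed in this output, it is probably redundant and you can remove it. Decluttering is good. Before deleting the library call, though, I would check that the package is not needed for something other than a direct function call, such as an operator like %>% or a bundled dataset, since as far as I can tell only function calls are picked up.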
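A rough base R sketch of that redundancy check is shown here. The function name unused_packages_sketch is hypothetical, and the sketch makes simplifying assumptions: it only sees top-level library() or require() calls with a plain package name, and it only counts direct function calls as usage.

```r
# Illustrative sketch; the function name is hypothetical.
unused_packages_sketch <- function(path) {
  exprs <- parse(path, keep.source = TRUE)

  # Packages requested by top-level library() or require() calls
  requested <- as.character(unlist(lapply(exprs, function(e) {
    if (is.call(e) && length(e) >= 2 &&
        as.character(e[[1]])[1] %in% c("library", "require")) {
      as.character(e[[2]])
    }
  })))

  # Names used in call position anywhere in the file
  tokens <- utils::getParseData(exprs)
  called <- unique(tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"])

  # Where those names live on the current search path
  homes <- unique(unlist(lapply(called, utils::find)))
  used <- sub("^package:", "", homes)

  # Requested packages that supplied none of the called functions
  setdiff(requested, used)
}
```

As with the main function, the packages in question need to be attached before you run this, otherwise every one of them will look unused.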

In our example there are five functions referenced in this file which hail from the plotly package. But there is another one — the layout function — which is defined in both plotly and graphics.

Finding potential function name clashes

So this can be a tool to spot potential namespace conflicts. I happen to know we don’t need to worry about this particular clash, but that will not always be the case. In this same example, there is a function called get in the config package which clashes with a completely different base function of the same name.
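If you only care about the clashes, one possible approach is to keep just the function names that are visible in more than one place. The sketch below does that in base R; find_ambiguous_calls is a hypothetical helper name, not part of NCmisc.

```r
# Illustrative sketch; the function name is hypothetical.
find_ambiguous_calls <- function(path) {
  tokens <- utils::getParseData(parse(path, keep.source = TRUE))
  called <- unique(tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"])

  # For each called name, list every search path entry that defines it
  homes <- lapply(called, utils::find)
  names(homes) <- called

  # Keep only the names defined in more than one place
  homes[lengths(homes) > 1]
}
```

Each element of the result is named after a function and holds the competing locations, in search path order, so the first entry is the one an unqualified call would pick up.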

Even with the config package loaded, we should always call its get function with an explicit reference — using the double-colon notation — to avoid unintended consequences.
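As a small illustration of the double-colon notation, the sketch below calls both functions explicitly. The YAML file and the trials setting are hypothetical, written to a temporary file purely for the example, and the config call is guarded in case the package is not installed.

```r
# Hypothetical config file with a single made-up setting
cfg_path <- tempfile(fileext = ".yml")
writeLines(c("default:", "  trials: 5"), cfg_path)

# base::get() looks up an R object by name
pi_value <- base::get("pi")

# config::get() reads a value from a YAML configuration file
if (requireNamespace("config", quietly = TRUE)) {
  trials <- config::get("trials", file = cfg_path)
}
```

Written this way, neither call depends on the order in which the packages happen to be attached.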

And that’s it. You are of course free to wrap this in your own code if you want it to parse through multiple files, look for specific packages / functions, output results to a text file etc.
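For instance, a wrapper for multiple files could look something like this. It is a sketch under assumptions: list_functions_in_dir and its analyse argument are hypothetical names of my own, and the analyser defaults to NCmisc::list.functions.in.file so that a different function can be swapped in.

```r
# Illustrative sketch; the function and argument names are hypothetical.
list_functions_in_dir <- function(dir, analyse = NCmisc::list.functions.in.file) {
  # All .R files under dir, including subdirectories
  files <- list.files(dir, pattern = "\\.R$", full.names = TRUE, recursive = TRUE)

  # One result per file, named by file path
  results <- lapply(files, analyse)
  names(results) <- files
  results
}
```

To save the results to a text file you could then use something like capture.output(print(results), file = "functions_used.txt"), where the file name is just an example.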
