How to identify which packages (and which functions) are actually being used by an R file

31 March 2020

When inheriting someone else’s R code, I always like to check which packages and functions it uses: not just those it loads with library() calls, but those it actually uses. To this end I recently discovered a useful little utility that gives you a list of all the packages, and all the functions within those packages, referenced in a given R file.

The package you will need to install is NCmisc, and the function we are going to use within it is the aptly named list.functions.in.file.

Example use of NCmisc::list.functions.in.file()

It works by identifying all the function calls in the given file and comparing these to functions currently loaded in memory.
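To make that mechanism concrete, here is a rough sketch of the same idea using only base R and utils. It is not necessarily how NCmisc implements it; the helper name functions_by_source and the small demo script are made up for illustration.

```r
# Hypothetical helper (not part of NCmisc): list the function calls in a file
# and group them by the attached environment that defines them.
functions_by_source <- function(path) {
  # keep.source = TRUE is needed so that parse data is available
  parsed <- utils::getParseData(parse(path, keep.source = TRUE))
  calls <- sort(unique(parsed$text[parsed$token == "SYMBOL_FUNCTION_CALL"]))
  # utils::find() returns every entry on the search path that defines the name
  sources <- vapply(calls, function(fn) {
    where <- utils::find(fn, mode = "function")
    if (length(where) == 0) "(not found)" else paste(where, collapse = ", ")
  }, character(1))
  split(calls, sources)
}

# Demo on a throwaway script that only uses base functions
demo_script <- tempfile(fileext = ".R")
writeLines("print(mean(seq_len(3)))", demo_script)
result <- functions_by_source(demo_script)
print(result)
```

Because the lookup goes through the search path, a function from a package that is not attached ends up under "(not found)" in this sketch, which is why the state of your session matters.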

This means that before running list.functions.in.file you will need to have loaded into memory all the packages you believe are referenced, and only those packages. The easiest way to do this is to open a fresh instance of RStudio and just execute the code in the file. (In the example above, the “global.R” file is part of a Shiny app which I first ran, then stopped.)
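A minimal sketch of the call itself, assuming the script is named global.R as in the example, that it sits in the working directory, and that its packages are already attached in a fresh session. The guard around the call is my own addition, so the snippet does nothing if the package or file is missing.

```r
# install.packages("NCmisc")  # one-off, if the package is not yet installed

# Assumption: the script to inspect is "global.R" in the working directory.
script_path <- "global.R"

used <- NULL
if (requireNamespace("NCmisc", quietly = TRUE) && file.exists(script_path)) {
  # One entry per source, each holding the function names found in the file
  used <- NCmisc::list.functions.in.file(script_path, alphabetic = TRUE)
  print(names(used))  # the packages (or .GlobalEnv) the calls resolve to
  print(used)
}
```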

The output is a named list. Each item is named for the name of the package, or packages (plural) if a particular function name is common to more than one package. Each list entry is a list of function names. Any custom functions defined in the file itself will appear under .GlobalEnv.

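If your R script file loads a package, but that package is not listed in this output, it is probably redundant in that file and you can usually remove it. Do check first that the package is not needed in ways this approach may miss, such as operators like %>% or functions passed by name rather than called directly. Decluttering is good.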
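If you want to automate that comparison, something along these lines might work. This is a sketch under assumptions: the helpers loaded_packages and used_packages are hypothetical, the used list is hand-written to mimic the shape of the output (it is not real output), and I am assuming the entry names contain "package:" followed by the package name.

```r
# Hypothetical helper: packages attached via library()/require() in a script
loaded_packages <- function(path) {
  lines <- readLines(path, warn = FALSE)
  pattern <- "^\\s*(?:library|require)\\(\\s*[\"']?([A-Za-z][A-Za-z0-9.]*)"
  hits <- regmatches(lines, regexec(pattern, lines, perl = TRUE))
  as.character(unique(unlist(lapply(hits, function(m) if (length(m) == 2) m[2]))))
}

# Hypothetical helper: package names mentioned in the names of the usage list
used_packages <- function(used) {
  found <- regmatches(
    names(used),
    gregexpr("(?<=package:)[A-Za-z][A-Za-z0-9.]*", names(used), perl = TRUE)
  )
  unique(unlist(found))
}

# Hand-written stand-in for the output of list.functions.in.file (illustrative)
used <- list(
  "package:base" = "library",
  "package:plotly" = "plot_ly",
  'c("package:plotly", "package:graphics")' = "layout"
)

# Throwaway script that loads one package it never uses
demo_script <- tempfile(fileext = ".R")
writeLines(c("library(plotly)", "library(stringr)", "plot_ly(mtcars, x = ~mpg)"),
           demo_script)

redundant <- setdiff(loaded_packages(demo_script), used_packages(used))
print(redundant)
```

Treat the result as a list of candidates for removal rather than a verdict.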

In our example there are five functions referenced in this file which hail from the plotly package. But there is another one — the layout function — which is defined in both plotly and graphics.

Finding potential function name clashes

So this can be a tool to spot potential namespace conflicts. I happen to know we don’t need to worry about this particular clash, but that will not always be the case. In this same example, there is a function called get in the config package which clashes with a completely different base function of the same name.

Even with the config package loaded, we should always call its get function with an explicit reference — using the double-colon notation — to avoid unintended consequences.
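As an illustration of the double-colon notation, here is a small sketch. The YAML file, the greeting key and the answer variable are invented for the example, and the config call is guarded so it only runs if that package is installed.

```r
# Throwaway config file with a single, made-up setting
cfg_file <- tempfile(fileext = ".yml")
writeLines(c("default:", "  greeting: hello"), cfg_file)

if (requireNamespace("config", quietly = TRUE)) {
  # Explicitly the config package's get(): reads a value from the YAML file
  greeting <- config::get("greeting", file = cfg_file)
  print(greeting)
}

# Explicitly base R's get(): looks up an object by name
answer <- 42
looked_up <- base::get("answer")
print(looked_up)
```

Written this way, it no longer matters which of the two packages was attached last.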

And that’s it. You are of course free to wrap this in your own code if you want it to parse through multiple files, look for specific packages / functions, output results to a text file etc.
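For instance, a wrapper for multiple files could look something like this. The function name scan_r_files, the folder name "R" and the output file name are all assumptions made for the sketch.

```r
# Hypothetical wrapper: apply a "lister" function to every .R file in a folder.
# By default it uses NCmisc, but any function taking a file path will do.
scan_r_files <- function(dir, lister = NCmisc::list.functions.in.file) {
  files <- list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE,
                      recursive = TRUE)
  results <- lapply(files, lister)
  names(results) <- files
  results
}

# Example usage, guarded so it only runs when NCmisc and the folder exist.
# Assumption: the scripts live in a folder called "R" in the working directory.
if (requireNamespace("NCmisc", quietly = TRUE) && dir.exists("R")) {
  all_usage <- scan_r_files("R")
  # Save the printed results to a text file (file name is illustrative)
  capture.output(print(all_usage), file = "function-usage.txt")
}
```

The same caveat applies as for a single file: the packages used by all the scripts need to be attached before you run it.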
