This vignette will go over best practices and standard workflows for using R and RStudio for Blackstone Research and Evaluation (BRE).

This will include:

  • A brief introduction to R and RStudio

  • A discussion of shared workflow and file management at BRE.

  • Setting up basic global options in RStudio.

  • Creating and using RStudio Projects.

  • Using the here package to build relative file paths using .Rproj files.

Introduction and Installation of R and RStudio

RStudio is an integrated development environment (IDE) designed to run R, an open-source programming language for statistical computing and graphics. Both R and RStudio should be kept up to date. The most current release of each can be found at the respective links: “Install R” and “Install RStudio”.
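To check which versions are currently installed, a quick sketch like the one below can be run in the R console. It assumes the rstudioapi package is installed. `rstudioapi::versionInfo()` only works when R is running inside RStudio, so that call is guarded with `rstudioapi::isAvailable()`.

```r
# Version of R running in the current session
R.version.string

# Version of RStudio; versionInfo() only works when R runs inside RStudio
if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
  print(rstudioapi::versionInfo()$version)
}
```

Compare these against the latest releases listed on the two download pages.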

Shared Workflow

At Blackstone Research and Evaluation, we strive to follow data science best practices, and that starts with implementing a shared workflow that makes our work reproducible across all of our projects.

A shared workflow ensures that anyone at BRE can open any work product (Rmarkdown files ‘.Rmd’ or R scripts ‘.R’) and run it on their own local machine to reproduce exactly the same results as the original author. This ‘code as truth’ approach reduces confusion and saves time: once the code is written, R does the work, so there is no need to spend time pointing and clicking to save files or create figures in other applications.

In order to achieve this, we must use the same standard workflow, file management, and setup in RStudio.

File Management

All work products (Rmarkdown files ‘.Rmd’ or R scripts ‘.R’) created in the course of completing a data analysis task, including any data cleaning and transformation, should be saved to the correct Blackstone Google Drive project folder.1

Our project folders in the Blackstone Google Drive have a standard folder structure, which looks something like this:2

#> /home/runner/work/_temp/Library/blackstone/2429_DEMO
#> ├── Archive
#> ├── Phase1_Sales_Finance
#> ├── Phase2_Evaluation
#> │   ├── Client_Meeting_Notes
#> │   ├── Concept_Data_Analysis_Guide
#> │   ├── Data
#> │   │   ├── Year1_(2020-2021)
#> │   │   ├── Year2_(2021-2022)
#> │   │   ├── Year3_(2022-2023)
#> │   │   └── Year4_(2023-2024)
#> │   ├── Data_Analysis
#> │   │   ├── Year1_(2020-2021)
#> │   │   ├── Year2_(2021-2022)
#> │   │   ├── Year3_(2022-2023)
#> │   │   └── Year4_(2023-2024)
#> │   ├── Instruments
#> │   └── Logic_Model
#> └── Phase3_Reporting

All raw survey data files should be kept in the correct project year folder within the Phase 2 - Evaluation/Data folder. For example, survey data from ‘Year 4 (2023-2024)’ of a project should be saved in Phase 2 - Evaluation/Data/Year 4 (2023-2024). Subfolders within each project year folder help to organize data from pre and post surveys, or from multiple surveys on different topics in a given year.
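As a sketch of that layout, the code below uses the fs package to create one subfolder per survey inside a project year folder. It builds the structure in a temporary folder, which is a hypothetical stand-in for the real Google Drive project folder, and borrows the subfolder names `pre_data`, `post_data`, and `clean_data` from the example project used later in this vignette.

```r
# Temporary stand-in for a project folder (hypothetical location)
project_root <- fs::path(tempdir(), "2429_DEMO")

# Year-level data folder, using the folder names from the example project
year4_data <- fs::path(project_root, "Phase2_Evaluation", "Data", "Year4_(2023-2024)")

# One subfolder per survey; dir_create() is vectorised and creates parent folders
fs::dir_create(fs::path(year4_data, c("pre_data", "post_data", "clean_data")))

# Print the resulting folder structure
fs::dir_tree(year4_data)
```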

All data analysis files should be kept in the correct project year folder in the Phase 2 - Evaluation/Data Analysis folder.

Later, we will go over how to use RStudio Projects to create a subfolder in the correct project year folder of Phase 2 - Evaluation/Data Analysis to organize these files, and how to run all analyses in the Google Drive folder by importing data from the Phase 2 - Evaluation/Data subfolders. First, let’s turn to some basic setup for RStudio.

RStudio Setup

Our workflow for data analysis at BRE should center on producing R code that anyone can reproduce. We all need to maintain RStudio settings that facilitate that process.

The first step is to make sure that each new R session is a blank slate and nothing that we have done previously carries over to our current work.

There are two ways to do this:

  1. Either run usethis::use_blank_slate()3 or

  2. In RStudio, go to the top menu bar and select ‘Tools’, then ‘Global Options’, ‘General’ settings and make sure the ‘Workspace’ section (in the blue box) matches the options in figure 1 below:

Figure 1: RStudio Global Options

This ensures that each new R session starts with an empty global environment (no data or variables from a previous session).
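The same settings can also be applied and checked from code. This is a minimal sketch assuming the usethis and rstudioapi packages are installed. Because `use_blank_slate(scope = "user")` changes your user-level RStudio preferences, the call is only run in an interactive session here. The preference names `"save_workspace"` and `"load_workspace"` are assumptions about RStudio's preference keys, and they are only read back when R is running inside RStudio.

```r
# Apply the blank-slate workspace settings to your RStudio user preferences.
# Guarded by interactive() because it changes settings for every future session.
if (interactive()) {
  usethis::use_blank_slate(scope = "user")
}

# Inside RStudio, read the workspace preferences back (assumed preference keys)
if (rstudioapi::isAvailable()) {
  print(rstudioapi::readRStudioPreference("save_workspace", NA))
  print(rstudioapi::readRStudioPreference("load_workspace", NA))
}

# In a freshly started session with these settings, this should list no objects
ls(envir = globalenv())
```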

It is also important to routinely restart your R session and re-run your code to make sure that it produces the expected results, figures, and outputs. In RStudio, first save your current work in the Rmarkdown (.Rmd) or R script (.R) file(s), then select ‘Session’ in the top menu bar and click ‘Restart R’.4 This clears your global environment and unloads all R packages.

Re-run your code to ensure it includes everything necessary to complete the assigned data tasks.
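The save-and-restart steps can also be run from the RStudio console. This is a sketch assuming the rstudioapi package is installed. `documentSaveAll()` saves every open source document and `restartSession()` restarts R, and both only work inside RStudio, hence the `isAvailable()` guard.

```r
# Save all open documents, then restart R (RStudio only)
if (rstudioapi::isAvailable()) {
  rstudioapi::documentSaveAll()
  rstudioapi::restartSession()
}
```

After the restart, re-run your script or Rmarkdown file from the top.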

Creating and using RStudio Projects

Introduction to RStudio Projects

In order to have a shared workflow, we need a standard way to set up and organize all of the files necessary for project work at BRE. We also need a way to set the working directory5 for each R session that is reproducible across different users. Using setwd() in code works for the original author, but every subsequent user has to change that file path to match their own machine.
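To see why the working directory matters, the sketch below uses a temporary folder as a hypothetical stand-in for a project folder. A relative path such as `"notes.txt"` (a made-up file name) is resolved against whatever the working directory is, so it only finds the file when the working directory is the project folder.

```r
# Remember the current working directory so it can be restored afterwards
old_wd <- getwd()

# Temporary folder standing in for a project folder (hypothetical)
project_dir <- tempfile("project_")
dir.create(project_dir)
writeLines("example", file.path(project_dir, "notes.txt"))

# A relative path is resolved against the working directory
setwd(project_dir)
found_in_project <- file.exists("notes.txt")

# A hard-coded absolute path in setwd() would tie the code to one machine, e.g.
# setwd("C:/Users/<author>/...")  # hypothetical path on the author's computer

setwd(old_wd)
found_in_project
```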

RStudio supports this with RStudio Projects. An RStudio Project allows the user to launch RStudio with its associated folder designated as the current working directory. Designating a new or existing folder as an RStudio Project creates a new file in that folder with the extension .Rproj; among other things, this file saves the settings for the Project. In a later section, we will go over how to use this file to set up relative file paths using the designated Project folder as the root directory.
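The .Rproj file is plain text, so its settings can be inspected directly. A minimal sketch, assuming the working directory is the top-level folder of an RStudio Project (for any other folder, `rproj_file` is simply empty):

```r
# Find the .Rproj file(s) in the current working directory
rproj_file <- list.files(pattern = "\\.Rproj$")
rproj_file

# Print the Project settings stored in the file ("Field: Value" lines)
if (length(rproj_file) > 0) {
  cat(readLines(rproj_file[1]), sep = "\n")
}
```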

Double-clicking the .Rproj file launches a fresh RStudio instance with a new R process whose working directory is set to the folder containing the .Rproj file. This makes it easy to switch between projects without having to set the working directory or start a new R process with a clean global environment. Multiple instances of RStudio can also be open to different Projects at the same time.
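To confirm this from code, a sketch like the following compares the active Project folder with the working directory. It assumes the rstudioapi package is installed; `getActiveProject()` returns `NULL` when no Project is open, and the check only runs inside RStudio.

```r
# Inside RStudio, compare the active Project folder with the working directory
if (rstudioapi::isAvailable()) {
  project_root <- rstudioapi::getActiveProject()  # NULL when no Project is open
  if (!is.null(project_root)) {
    print(project_root)
    # TRUE when the working directory is the Project folder
    print(normalizePath(getwd()) == normalizePath(project_root))
  }
}
```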

The file browser in each RStudio instance shows the current working directory (see figure 2):

Figure 2: RStudio File Browser

It is also shown just above the R Console as shown in figure 3:

Figure 3: RStudio Console Working Directory

Creating an RStudio Project

To create a new RStudio Project, go to the top menu bar and select ‘File’, then click on ‘New Project…’, which will launch the ‘New Project Wizard’.

On the first page ‘Create Project’, select ‘New Directory’:

Figure 4.1: New Project Wizard- Create Project

On the second page ‘Project Type’, select ‘New Project’:

Figure 4.2: New Project Wizard- Project Type

On the third page ‘Create New Project’, check the box for ‘Open in new session’ in the lower-left corner and set the ‘Directory name’ to ‘analysis_{Your Initials}’. Then, next to ‘Create project as subdirectory of’, click ‘Browse…’ and navigate to the correct project folder (shown in figure 4.4).

Figure 4.3: New Project Wizard- Name and Set Parent Folder

Hypothetically, if you were tasked with a data analysis for Year 4 (2023-2024) of a project, you would set the parent folder (where the new project folder will be created) to Phase 2 - Evaluation/Data Analysis/Year 4 (2023-2024).

After you click ‘Browse…’ next to ‘Create project as subdirectory of’, navigate to the project folder in the Google Drive, then go to the evaluation folder ‘Phase 2 - Evaluation’, then ‘Data Analysis’, then ‘Year 4 (2023-2024)’, and click ‘Open’.

Figure 4.4: Browse to Correct Project Folder

Finally, after setting the correct parent folder, you will be back on the third page ‘Create New Project’.

If you have completed the above steps, the settings should resemble figure 4.5 below, and you can click ‘Create Project’ in the lower-right corner. This will launch the new Project named ‘analysis_{Your Initials}’.

Figure 4.5: New Project Wizard- Create New Project

Another way to create an RStudio Project is with the rstudioapi6 package, which lets you interact with RStudio from R code.

The code below creates a new folder and then an .Rproj file with the same name inside it. The folder path is relative to the current working directory. Running the last line opens the new RStudio Project in a new RStudio instance:7

# File path for the new `analysis` folder (relative to the current working directory):
analysis_fp <- fs::path("Phase 2 - Evaluation", "Data Analysis", "Year 4 (2023-2024)", "analysis")
# Create the `analysis` folder inside "Data Analysis/Year 4 (2023-2024)" (missing parent folders are also created):
fs::dir_create(analysis_fp)

# Create a new RStudio Project (.Rproj file) in the `analysis` folder:
rstudioapi::initializeProject(path = analysis_fp)

# Open the new Project in a new RStudio instance:
rstudioapi::openProject(path = analysis_fp, newSession = TRUE)

Now that we have an RStudio Project we can use it to save all of the files necessary for a data analysis task inside its folder and we don’t have to worry about using setwd() to manage the working directory.

It is also

Managing File Paths with the here Package

To highlight another advantage of working in an RStudio Project, let’s say we were tasked with the data analysis for a hypothetical Blackstone project. The project folder is named 2429_DEMO in the Blackstone Google Drive, and the analysis is for its fourth year (Year 4 (2023-2024)).

We have completed the above steps to create an RStudio Project named ‘analysis’ in the Phase 2 - Evaluation/Data Analysis/Year 4 (2023-2024) folder. As shown in the directory tree below, Phase 2 - Evaluation/Data Analysis/Year 4 (2023-2024) now has a subfolder named ‘analysis’ containing an RStudio Project file named analysis.Rproj. We have also created an Rmarkdown file called report.Rmd in the same folder.

#> /home/runner/work/_temp/Library/blackstone/2429_DEMO/Phase2_Evaluation/Data_Analysis
#> ├── Year1_(2020-2021)
#> ├── Year2_(2021-2022)
#> ├── Year3_(2022-2023)
#> └── Year4_(2023-2024)
#>     └── analysis
#>         ├── analysis.Rproj
#>         └── report.Rmd

To launch the RStudio Project, either go to the top menu bar in RStudio, click ‘File’, then ‘Open Project…’, navigate to analysis.Rproj and click ‘Open’; or open your operating system’s file browser (Finder or File Explorer), navigate to analysis.Rproj, and double-click it.

To confirm which RStudio Project is open in your current RStudio instance go to the top-right corner of RStudio.

Figure 5: Current RStudio Project (top-right corner of IDE)

If you click on the current RStudio Project shown above, a drop-down menu gives you options to switch between Projects or open new RStudio instances to run multiple projects at once.

Now, once we have opened the RStudio Project file analysis.Rproj, our working directory is set to the analysis folder, which has this file path: /home/runner/work/_temp/Library/blackstone/2429_DEMO/Phase2_Evaluation/Data_Analysis/Year4_(2023-2024)/analysis

But that is the file path on my local machine; on yours it would differ depending on the location of the 2429_DEMO folder and how your Google Drive Desktop is set up (different user names or shortcut IDs). This also creates a problem when we want to navigate to the Data folder in our code to import data for analysis. How do we write a file path that is stable over time and across different users?

Introduction to the here Package

The here package solves this problem by using RStudio Project files (.Rproj) to create relative file paths. In R, any time we set the working directory (either with setwd() or by opening an RStudio Project), we can reference files with paths relative to the working directory (the folder reported by getwd()). But if that location changes, either because folders are moved or because the code runs on a different machine, this method no longer works.

The here package provides a way to always use the location of a specified RStudio Project file (.Rproj) as the root directory for relative file paths, even if the working directory is switched to a subfolder of the RStudio Project folder (as happens when knitting an .Rmd file stored in a subfolder).

The here package has two simple functions that allow us to build reproducible file paths.

Using the here Package

If the here package is not installed, install it with: install.packages("here")
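To avoid reinstalling a package that is already present, the install can be made conditional. This is a common idiom rather than a BRE requirement:

```r
# Install here only if it is not already available, then report its version
if (!requireNamespace("here", quietly = TRUE)) {
  install.packages("here")
}
packageVersion("here")
```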

The first step is to declare the location of the file you are working in using here::i_am()8, with a path relative to the project root directory. This must be done at the beginning of an R script (‘.R’) or in the first code chunk of an Rmarkdown file (‘.Rmd’).

If we were working in the file report.Rmd inside the analysis folder, that would look like this:

here::i_am("report.Rmd")
#> here() starts at /home/runner/work/_temp/Library/blackstone/2429_DEMO/Phase2_Evaluation/Data_Analysis/Year4_(2023-2024)/analysis

here::i_am() finds and returns the project root directory. Since the file report.Rmd is contained alongside the analysis.Rproj file inside the analysis folder, the call above returns the file path to the analysis folder.
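Because the argument to here::i_am() is the file’s path relative to the project root, the root is still found correctly when the file sits in a subfolder. The sketch below builds a temporary project with a hypothetical `scripts/clean.R` file (folder and file names are illustrative only) and shows that the root is the `analysis` folder, not `scripts`.

```r
old_wd <- getwd()

# Hypothetical temporary project with a script stored in a subfolder
project_dir <- file.path(tempdir(), "analysis")
dir.create(file.path(project_dir, "scripts"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(project_dir, "analysis.Rproj"))  # placeholder; i_am() does not need it
file.create(file.path(project_dir, "scripts", "clean.R"))

# Declare the script's location relative to the project root
setwd(project_dir)
here::i_am("scripts/clean.R")
root <- here::here()
setwd(old_wd)

root  # the "analysis" folder
```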

Once we have called here::i_am(), we can load the here package and use here() to return the project root folder again.

library(here)
here()
#> [1] "/home/runner/work/_temp/Library/blackstone/2429_DEMO/Phase2_Evaluation/Data_Analysis/Year4_(2023-2024)/analysis"

To be clear, here() does not simply return the current working directory. In fact, if you change the working directory, here() will still return the project root directory set by the initial call to here::i_am().
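The sketch below demonstrates this in a temporary project (the `here_demo` and `output` folder names are hypothetical). After moving the working directory into a subfolder, getwd() changes but here() keeps returning the root declared by here::i_am().

```r
old_wd <- getwd()

# Hypothetical temporary project containing report.Rmd and an "output" subfolder
project_dir <- file.path(tempdir(), "here_demo")
dir.create(file.path(project_dir, "output"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(project_dir, "report.Rmd"))

setwd(project_dir)
here::i_am("report.Rmd")
root_before <- here::here()

# Move the working directory into a subfolder
setwd(file.path(project_dir, "output"))
wd_now <- getwd()           # now the "output" subfolder
root_after <- here::here()  # still the project root declared by i_am()

setwd(old_wd)
identical(root_before, root_after)
```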

Relative File Paths with here()

Now we can use here() to build relative paths to read and write data. This works by passing the folder and file names to the function, either as a single string or as separate, comma-separated arguments. Let’s see how that works by building a path to the Data folder in our example project folder.
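Both forms produce the same path. A minimal check in a hypothetical temporary project (the `here_paths_demo` folder is made up for illustration; the file names come from the example project below):

```r
old_wd <- getwd()

# Hypothetical temporary project containing report.Rmd
project_dir <- file.path(tempdir(), "here_paths_demo")
dir.create(project_dir, showWarnings = FALSE)
file.create(file.path(project_dir, "report.Rmd"))
setwd(project_dir)
here::i_am("report.Rmd")
setwd(old_wd)

# A single string and comma-separated pieces build the same path
p_string <- here::here("clean_data/sm_data_clean.csv")
p_commas <- here::here("clean_data", "sm_data_clean.csv")
identical(p_string, p_commas)
```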

First, it is helpful to review the folder structure:

#> /home/runner/work/_temp/Library/blackstone/2429_DEMO/Phase2_Evaluation
#> ├── Client_Meeting_Notes
#> ├── Concept_Data_Analysis_Guide
#> ├── Data
#> │   ├── Year1_(2020-2021)
#> │   ├── Year2_(2021-2022)
#> │   ├── Year3_(2022-2023)
#> │   └── Year4_(2023-2024)
#> │       ├── clean_data
#> │       │   └── sm_data_clean.csv
#> │       ├── post_data
#> │       │   └── sm_data_post.csv
#> │       └── pre_data
#> │           └── sm_data_pre.csv
#> ├── Data_Analysis
#> │   ├── Year1_(2020-2021)
#> │   ├── Year2_(2021-2022)
#> │   ├── Year3_(2022-2023)
#> │   └── Year4_(2023-2024)
#> │       └── analysis
#> │           ├── analysis.Rproj
#> │           └── report.Rmd
#> ├── Instruments
#> └── Logic_Model

To build a path from Phase2_Evaluation/Data_Analysis/Year4_(2023-2024)/analysis (our top-level project directory) to the Phase2_Evaluation/Data folder, we need to go “up” three levels, which is done in file paths with “..”, like this: ../../../

Using here(), we can build a relative path from the analysis folder to the Data folder:

# Setting up file path to the Data folder:
data_fp <- here("../../../Data") # or can be written as: here("..", "..", "..", "Data")
data_fp
#> [1] "/home/runner/work/_temp/Library/blackstone/2429_DEMO/Phase2_Evaluation/Data_Analysis/Year4_(2023-2024)/analysis/../../../Data"

This path can be combined with additional folder and file names to build the path to any file within the Data folder.
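The printed paths keep the literal `../../../` segments. The operating system resolves them correctly, but if you prefer a tidier path, fs::path_norm() collapses the up-level references as text without touching the disk. A sketch using the path string printed above:

```r
# Path printed by here() above, including the up-level ".." segments
data_fp_raw <- "/home/runner/work/_temp/Library/blackstone/2429_DEMO/Phase2_Evaluation/Data_Analysis/Year4_(2023-2024)/analysis/../../../Data"

# Collapse the ".." segments; this resolves to the Phase2_Evaluation/Data folder
data_fp_clean <- fs::path_norm(data_fp_raw)
data_fp_clean
```

In a project, the same idea would look like `fs::path_norm(here("..", "..", "..", "Data"))`.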

Returning to the original example evaluation folder, if we want to read in data from the Data/Year 4 (2023-2024) folder to run analyses in the report.Rmd we can set up a series of file paths using here().

# Setting up file path to the Data folder for Year 4:
data_year4_fp <- here(data_fp, "Year4_(2023-2024)") # or can be written as: here("..", "..", "..", "Data", "Year4_(2023-2024)")
data_year4_fp
#> [1] "/home/runner/work/_temp/Library/blackstone/2429_DEMO/Phase2_Evaluation/Data_Analysis/Year4_(2023-2024)/analysis/../../../Data/Year4_(2023-2024)"

We can re-use that file path to build the rest of the path needed to read in the clean data9 from Year 4:

# File path to clean data from Year 4 named: "sm_data_clean.csv"
here(data_year4_fp, "clean_data", "sm_data_clean.csv") 
#> [1] "/home/runner/work/_temp/Library/blackstone/2429_DEMO/Phase2_Evaluation/Data_Analysis/Year4_(2023-2024)/analysis/../../../Data/Year4_(2023-2024)/clean_data/sm_data_clean.csv"
# or can be written as: here("..", "..", "..", "Data", "Year4_(2023-2024)", "clean_data", "sm_data_clean.csv")

# Reading in the clean data from Year 4 named: "sm_data_clean.csv"
readr::read_csv(file = here(data_year4_fp, "clean_data", "sm_data_clean.csv"), show_col_types = FALSE, n_max = 3, col_select = c(1,6:12))
#> # A tibble: 3 × 8
#>   respondent_id email_address      first_name last_name unique_id post_knowledge
#>           <dbl> <chr>              <chr>      <chr>         <dbl> <chr>         
#> 1  114628000001 coraima59@medhurs… Dellia     Collier    24290001 A little know…
#> 2  114628000002 mstamm@hermiston.… Etter      Williams…  24290002 Not knowledge…
#> 3  114628000003 precious.feil@gma… Marin      Lind       24290003 Extremely kno…
#> # ℹ 2 more variables: post_research_1 <chr>, post_research_2 <chr>
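Writing results works the same way. The sketch below uses a hypothetical temporary project, a hypothetical `output` folder, and a placeholder table (not real survey results) to show a CSV being written to a path built with here() and read back.

```r
old_wd <- getwd()

# Hypothetical temporary project standing in for the analysis folder
project_dir <- file.path(tempdir(), "write_demo")
dir.create(project_dir, showWarnings = FALSE)
file.create(file.path(project_dir, "report.Rmd"))
setwd(project_dir)
here::i_am("report.Rmd")
setwd(old_wd)

# Placeholder table standing in for analysis output
results <- data.frame(question = c("q1", "q2"), value = c(1, 2))

# Build the output folder path from the project root and create it if needed
out_dir <- here::here("output")  # "output" is a hypothetical folder name
dir.create(out_dir, showWarnings = FALSE)
readr::write_csv(results, here::here("output", "results.csv"))

# Read the file back to confirm it was written where expected
readr::read_csv(here::here("output", "results.csv"), show_col_types = FALSE)
```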

Try Yourself

The blackstone R package installation comes with a full example project directory, with the analysis folder containing analysis.Rproj and report.Rmd. The code chunk below lets you easily access these files wherever they are installed on your local machine. Any edits you make to these files will be lost when you re-install or update the blackstone R package.

Run the first command below to open the example project in a new RStudio instance, then open report.Rmd by copying the second command into the console of the new RStudio instance:10

# Code to open the example `analysis` `RStudio` Project (will launch in a new `RStudio` instance):
rstudioapi::openProject(path = fs::path_package("2429_DEMO/Phase2_Evaluation/Data_Analysis/Year4_(2023-2024)/analysis/analysis.Rproj", package = "blackstone"),
                        newSession = TRUE)

# Code to open "report.Rmd" (must be in the `analysis` `RStudio` Project):
rstudioapi::navigateToFile(
  file = "report.Rmd"
)

Additional Resources for R, RStudio, and Project Workflows