Working with Data
R packages can include data — not just functions. This is helpful when:
- You want to ship example data with your package
- You’re documenting datasets for teaching or analysis
- You use specific reference data across projects
There are two types of package data:
- External data: available to users after loading the package
- Internal data: used inside the package only
Step 1: Add Raw Data Script
Run:
::use_data_raw("my_dataset") usethis
This creates:
- A
data-raw/
folder (if it doesn’t exist) - A starter script at
data-raw/my_dataset.R
Inside that script, read in and clean your data.
Step 2: Read and Save Data
Replace the contents of data-raw/my_dataset.R
with something like:
# Read raw CSV
<- read.csv("path/to/your.csv")
my_dataset
# Clean or rename variables here
# my_dataset <- dplyr::rename(my_dataset, new_col = old_col)
# Save it for package use
::use_data(my_dataset, overwrite = TRUE) usethis
After this runs, you’ll see:
data/
└── my_dataset.rda
Step 3: Load and Use the Data
After loading your package:
library(mypackage)
my_dataset
The dataset is ready for use — just like iris
or mtcars
.
The object name (
my_dataset
) is the same as the name in your R script and.rda
file.
Step 4: Document the Dataset
You document datasets with roxygen2, just like functions.
Create a new file, e.g. R/data-documentation.R
:
#' Example dataset: my_dataset
#'
#' A dataset containing example data imported from a CSV.
#'
#' @format A data frame with 100 rows and 5 variables:
#' \describe{
#' \item{col1}{Description of column 1}
#' \item{col2}{Description of column 2}
#' \item{col3}{Description of column 3}
#' }
#' @source path/to/your.csv
"my_dataset"
Then run:
::document() devtools
Now you can view:
?my_dataset
✅ You’ll see a proper help page for the data!
Internal vs External Data
If you want to use a dataset inside your package but not expose it to users:
::use_data(my_secret_data, internal = TRUE) usethis
This stores it in R/sysdata.rda
. You can still use it inside your package code, but users won’t see it in the global namespace.
Tips for Good Data Documentation
📋 When documenting data, always include:
- What the dataset represents
- The number of rows and columns
- Descriptions for each variable (
@format
) - The source of the data
- Any transformation or filtering applied
Use @examples
too if it makes sense:
#' @examples
#' summary(my_dataset)
Coming Up Next
Now that you can include data and document it properly, let’s look at how to build, install, and share your package.