Clean Up and Format Open-ended Text

Usage

openendedCleanup(df, var, remove_values)

Arguments

df: Required. A tibble/data frame containing the character variable of text.
var: Required. The name of the character variable to clean up from the tibble/data frame, given as a quoted string.
remove_values: Required. A character vector of additional text to remove from the responses; see the example.

Value

A tibble containing one character variable of cleaned text, ready for use or output.
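
The printed output in the Examples section appears to contain no missing values and to be sorted alphabetically. The block below is a minimal sketch of that kind of cleanup, not the package's implementation: `sketch_cleanup` and the `toy` data are hypothetical names introduced for illustration, and the assumptions that `remove_values` entries are matched against whole responses, that whitespace is trimmed, and that rows are sorted are guesses based on the example output.

```r
library(dplyr)

# Hypothetical stand-in, NOT the real openendedCleanup().
# Assumptions: responses are trimmed, NA/empty responses are dropped,
# responses that exactly equal an entry of remove_values are dropped,
# and the remaining rows are sorted alphabetically.
sketch_cleanup <- function(df, var, remove_values) {
  df %>%
    dplyr::select(dplyr::all_of(var)) %>%
    dplyr::mutate(
      dplyr::across(dplyr::all_of(var), ~ trimws(as.character(.x)))
    ) %>%
    dplyr::filter(
      !is.na(.data[[var]]),
      .data[[var]] != "",
      !.data[[var]] %in% remove_values
    ) %>%
    dplyr::arrange(.data[[var]])
}

# Toy data: two real comments mixed with placeholders and a missing value.
toy <- dplyr::tibble(
  comment_oe = c("Too long", "N/A", NA, ".", "  Helpful examples ", "A")
)

cleaned <- sketch_cleanup(toy, "comment_oe", c("N/A", ".", "A"))
cleaned
```

Under these assumptions only the two real comments remain. A response such as "A ligula ultricies..." would be kept, because "A" is compared against the whole response rather than removed as a substring.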

Examples

# Example data:
#  Training usefulness composite scale: 5 variables that make up a scale:
#  Responsible, Ethics, Standards, Practices, Morals.
#  These are all on a 5-point Likert scale of 1 to 5 that needs to be
#  recoded to: c("Not at all useful", "Slightly useful", "Somewhat useful",
#                "Very useful", "Extremely useful")
# levels useful:
levels_useful <- c("Not at all useful", "Slightly useful", "Somewhat useful",
                   "Very useful", "Extremely useful")
# Data (the %>% pipe used below is made available by attaching dplyr):
library(dplyr)
data <- dplyr::tibble(
 Responsible = sample(levels_useful, size = 100, replace = TRUE,
                       prob = c(0.1, 0.2, 0.3, 0.2, 0.1)),
 Ethics = sample(levels_useful, size = 100, replace = TRUE, prob = c(0.1, 0.2, 0.3, 0.2, 0.1)),
 Standards = sample(levels_useful, size = 100, replace = TRUE, prob = c(0.1, 0.1, 0.2, 0.3, 0.3)),
 Practices = sample(levels_useful, size = 100, replace = TRUE, prob = c(0.1, 0.1, 0.2, 0.3, 0.3)),
 Morals = sample(levels_useful, size = 100, replace = TRUE, prob = c(0.05, 0.05, 0.2, 0.3, 0.4)),
 # stri_rand_lipsum() expects a single paragraph count, so request 100
 # paragraphs (one candidate response per row) instead of a vector of counts.
 Responsible_oe = ifelse(Responsible == "Not at all useful",
                  stringi::stri_rand_lipsum(100), NA_character_),
 Ethics_oe = ifelse(Ethics == "Not at all useful",
                  stringi::stri_rand_lipsum(100), NA_character_),
 Standards_oe = ifelse(Standards == "Not at all useful",
                  stringi::stri_rand_lipsum(100), NA_character_),
 Practices_oe = ifelse(Practices == "Not at all useful",
                  stringi::stri_rand_lipsum(100), NA_character_),
 Morals_oe = ifelse(Morals == "Not at all useful",
                  stringi::stri_rand_lipsum(100), NA_character_)
 ) %>% dplyr::select(dplyr::ends_with("_oe"))

# Set up a character vector of text (or punctuation) to remove from the text data:
remove_values <- c("N/A", ".", "A")

# Clean up the open-ended responses in the variable "Responsible_oe" with the function:
data %>% openendedCleanup(., "Responsible_oe", remove_values)
#> # A tibble: 13 × 1
#>    Responsible_oe                                                               
#>    <chr>                                                                        
#>  1 "A ligula ultricies, posuere litora ac erat curabitur, magna ut cursus\npell…
#>  2 "Aliquam commodo nisi sapien tellus ipsum dictumst habitant. Mauris tempor\n…
#>  3 "Aliquam nisi sapien ut ac habitasse, nec lectus eget erat. Sem, vehicula\nt…
#>  4 "Aptent tellus ornare aliquet magna finibus amet, sapien. Congue sed eleifen…
#>  5 "Dolor iaculis lacus enim velit neque id consectetur vitae odio. Iaculis lit…
#>  6 "Facilisi turpis ante posuere. Sed mi egestas elit, integer. Sed suscipit\np…
#>  7 "In aptent non sed felis quis litora justo at sed. Montes elementum ante\nau…
#>  8 "Litora montes nam, sem morbi pharetra vulputate velit ultrices posuere amet…
#>  9 "Nec hac at vivamus, amet ac felis. Praesent sollicitudin metus non eros lac…
#> 10 "Nulla, ut vivamus ac sed interdum. Mi nullam commodo porttitor sagittis. Eu…
#> 11 "Penatibus in at porta ut donec. Nec class lacinia malesuada, mauris egestas…
#> 12 "Pretium libero, erat, euismod ac erat non dictum id donec nec morbi. In, in…
#> 13 "Vel a et sodales tincidunt interdum justo etiam ac pellentesque in hac. Pur…