Introduction

This section covers the basics of handling and analysing the starwars dataset from the dplyr package in R. You can use these example functions and dataset to create and use custom functions to explore this dataset for your mini-package.

The Star Wars Dataset

  • Is part of the dplyr package in R.
  • Includes details about characters from the Star Wars universe.
  • Variables include: name, height, mass, hair_color, skin_color, eye_color, birth_year, gender, homeworld, species.
  • Some variables are lists and need special handling. If you load the my_stawars.csv dataset here, these columns are dropped for easier handling.

Function: top_starwars

Purpose

Returns a data frame ranked by a user-specified variable. Useful for sorting characters by attributes like height or mass.

Code

# Define the function
top_starwars <- function(variable = "height", top = 10) {
        # Load necessary libraries
        require(dplyr)
        
        # Ensure the dataset is loaded
        if (!exists("starwars")) {
                starwars <- dplyr::starwars
        }
        
        # Check if the specified variable exists in the dataframe
        if (!variable %in% names(starwars)) {
                stop("The specified variable does not exist in the starwars dataset.")
        }
        
        # Sort the dataframe based on the specified variable and handle missing values
        ranked_df <- starwars %>% 
                filter(!is.na(variable)) %>%
                arrange(desc(across(all_of(variable)))) %>%
                select(name, all_of(variable)) %>% 
                slice_head(n = top)
        
        # Return the top n entries
        return(ranked_df)
}

# Example usage:
# top_starwars("mass", 5)

Function: plot_relationship

Purpose

Plots the relationship between two variables in the starwars dataset, showcasing the potential correlations or distributions.

Code

# Define the function
plot_relationship <- function(x, y) {
        
        # Load necessary libraries
        require(ggplot2)
        require(dplyr)
        
        # Ensure the dataset is loaded
        if (!exists("starwars")) {
                starwars <- dplyr::starwars
        }
        
        # Check if the specified variables exist in the dataframe
        if (!(x %in% names(starwars) && y %in% names(starwars))) {
                stop("One or both of the specified variables do not exist in the starwars dataset.")
        }
        
        # Create the plot
        plot <- ggplot(data = starwars, aes(x = !!sym(x), y = !!sym(y))) +
                geom_point(aes(color = species)) +
                labs(title = paste("Relationship between", x, "and", y),
                     x = x,
                     y = y) +
                theme_minimal(base_family = "Helvetica") +
                theme(plot.background = element_rect(fill = "black",
                                                     colour = "black"),
                      panel.background = element_rect(fill = "black"),
                      text = element_text(colour = "white"),
                      axis.text = element_text(colour = "white"),
                      plot.title = element_text(size = 20, face = "bold",
                                                hjust = 0.5, colour = "white"),
                      legend.position = "bottom",
                      legend.text = element_text(size = 6),  # Smaller legend text
                      legend.key.size = unit(0.1, "cm"))  # Smaller legend keys)
        
        # Print the plot
        print(plot)
}

# Example usage:
# plot_relationship(x = "height", y = "mass")

Next steps

Use the provided dataset (here) and at least one of these functions to build your own mini-package.

Remember that to provide a meaningful documentation for your functions which should include:

  • a description of the function,
  • the input parameters,
  • the output,
  • and an example of usage.