# Function to calculate summary statistics for a specified numeric variable by species
summarize_species <- function(data, variable) {
  require(dplyr)
  data %>%
    group_by(species) %>%
    summarise(
      Count = n(),
      Mean = mean({{variable}}, na.rm = TRUE),
      SD = sd({{variable}}, na.rm = TRUE),
      Min = min({{variable}}, na.rm = TRUE),
      Max = max({{variable}}, na.rm = TRUE)
    ) %>%
    ungroup()
}
# Example usage:
# summarize_species(data = palmerpenguins::penguins,
#                   variable = flipper_length_mm)
Introduction
This section covers the basics of handling and analyzing the palmerpenguins dataset. You can use these example functions and the dataset as a starting point for writing your own functions to explore the data in your mini-package.
The Palmer Penguins Dataset
- Is a relatively new dataset that contains information about penguins and is accessible through the palmerpenguins package.
- Includes data about penguin species collected from Palmer Station, Antarctica.
- Variables include: species, island, bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g, sex, and year.
- The dataset is clean and requires minimal preprocessing for analysis.
- Download it here
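The functions in this section pass na.rm = TRUE, so it is worth checking how many values are missing before analysing a column. A minimal sketch of such a check follows; the helper name count_missing is hypothetical and not part of palmerpenguins.

```r
# Hypothetical helper: number of missing values in each column of a data frame
count_missing <- function(data) {
  colSums(is.na(data))
}

# Example usage (assumes the palmerpenguins package is installed):
# count_missing(palmerpenguins::penguins)
```

The result is a named numeric vector with one entry per column, so columns that need attention are easy to spot.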
Function: penguin_summary
Purpose
Returns a data frame with summary statistics (count, mean, standard deviation, minimum, and maximum) of a numeric variable for each species. Useful for quickly summarising penguins by attributes like flipper length or body mass.
Code
Function: penguin_plot
Purpose
Plots a histogram of the selected attribute for each of the palmerpenguins species.
Code
# Function to create a histogram for each species based on a specified numeric variable
plot_species_distribution <- function(data, variable_name) {
  require(ggplot2)
  # variable_name is a string, so look the column up with the .data pronoun
  # (aes_string() is deprecated in current ggplot2 releases)
  ggplot(data, aes(x = .data[[variable_name]], fill = species)) +
    geom_histogram(bins = 30, alpha = 0.6, position = "identity") +
    facet_wrap(~species, scales = "free_y") +
    labs(title = paste("Distribution of", variable_name, "by Species"),
         x = variable_name,
         y = "Frequency") +
    theme_minimal()
}
# Example usage:
# Plot the distribution of 'body_mass_g' for each penguin species
# plot_species_distribution(palmerpenguins::penguins, "body_mass_g")
Next steps
Use the provided dataset here and at least one of these functions to build your own mini-package.
Remember to provide meaningful documentation for your functions, which should include:
- a description of the function,
- the input parameters,
- the output,
- and an example of usage.
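One common way to cover the four items above is a roxygen2 comment block placed directly above the function. The sketch below documents summarize_species this way. It assumes the package uses roxygen2 and lists dplyr as a dependency, and that R is version 4.1 or later (for the native pipe); the wording of the description is illustrative only.

```r
#' Summarise a numeric variable by penguin species
#'
#' Computes the count, mean, standard deviation, minimum, and maximum of a
#' numeric column for each species.
#'
#' @param data A data frame with a `species` column, such as
#'   `palmerpenguins::penguins`.
#' @param variable Unquoted name of a numeric column in `data`.
#'
#' @return A data frame with one row per species and the columns
#'   `Count`, `Mean`, `SD`, `Min`, and `Max`.
#'
#' @examples
#' summarize_species(palmerpenguins::penguins, flipper_length_mm)
#'
#' @export
summarize_species <- function(data, variable) {
  # dplyr functions are called with an explicit namespace, as is usual in
  # package code, instead of attaching dplyr with require()
  data |>
    dplyr::group_by(species) |>
    dplyr::summarise(
      Count = dplyr::n(),
      Mean = mean({{ variable }}, na.rm = TRUE),
      SD = stats::sd({{ variable }}, na.rm = TRUE),
      Min = min({{ variable }}, na.rm = TRUE),
      Max = max({{ variable }}, na.rm = TRUE)
    ) |>
    dplyr::ungroup()
}
```

Here the title and first paragraph give the description, @param the input parameters, @return the output, and @examples the usage example.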