Packages
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
√ dplyr 1.1.4 √ readr 2.1.5
√ forcats 1.0.0 √ stringr 1.5.1
√ ggplot2 3.5.1 √ tibble 3.2.1
√ lubridate 1.9.3 √ tidyr 1.3.1
√ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
× dplyr::filter() masks stats::filter()
× dplyr::lag() masks stats::lag()
Use the conflicted package (
library(broom)
Crabs
The crab species Leptograpsus variegatus has two colour forms, blue and orange. Fifty crabs of each colour form and sex were collected at Fremantle, Western Australia, and various measurements were taken on each crab. The variables of interest to us are:
· sp: species (B is blue and O is orange, a letter O)
· sex: male or female
· CL: carapace length
· index: a number between 1 and 50, unique within each species-sex combination
All of the measurements are in millimetres. The data are in http://ritsokiguess.site/datafiles/crabs.csv.
1. (1 point) Read in and display (some of) the data.
2. (2 points) Create, save, and display (some of) a dataframe with only the species and the carapace length (but all the observations for those two variables).
3. (3 points) Attempt to display the carapace lengths in two columns, one column for each species. What happens? Why did it happen? Explain briefly.
4. (2 points) Starting from the dataframe you read in from the file, create, save, and display (some of) a dataframe with species, sex, carapace length, and the column index (with all the observations).
5. (3 points) Repeat your code to put the carapace lengths into two columns, one for each species, but starting from the dataframe you just created. Do you now get something that makes sense? Why is it different from your previous attempt? Explain briefly.
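As a rough illustration of the reshaping that questions 3 and 5 are about, here is a minimal sketch on a tiny made-up dataframe. The carapace lengths are invented for illustration and are not the Fremantle data, and the names toy, collapsed, and wide are hypothetical. It assumes that tidyr's pivot_wider is the intended tool for putting one variable into several columns, which the questions do not state.

```r
library(tidyr)

# Made-up values with the same column names as the crabs data
toy <- data.frame(
  sp    = c("B", "B", "O", "O"),
  sex   = c("M", "F", "M", "F"),
  index = c(1, 1, 1, 1),
  CL    = c(16.1, 18.1, 16.7, 20.2)
)

# With only sp and CL, nothing says which values belong in the same row,
# so pivot_wider warns and packs each species' values into a list-column
collapsed <- suppressWarnings(
  pivot_wider(toy[c("sp", "CL")], names_from = sp, values_from = CL)
)

# With sex and index kept, each sex-index combination identifies one row
wide <- pivot_wider(toy, names_from = sp, values_from = CL)
wide
```

Comparing collapsed with wide on your own data should help you check your explanations.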
Consumption of natural gas
How does the price of natural gas affect how much of it people use? Consumption of natural gas was measured for 20 towns in Texas (in thousands of cubic feet per customer). In each town, the price was different (in cents per cubic feet). The data are in http://ritsokiguess.site/datafiles/texasgas.csv.
6. (1 point) Read in and display (some of) the data.
7. (2 points) Make a suitable graph of the two variables in your dataframe.
8. (2 points) What does your graph tell you about the relationship between price and consumption? (If you wish, use “form, direction, strength” as a guide.)
9. (2 points) Fit a straight line relationship between the two variables, and display the output. (You may not think this is the best thing to do, but do it anyway for the moment.)
10. (4 points) Draw a graph that indicates a problem with the model you just fitted, and explain briefly what that problem is.
11. (2 points) Find out what R’s ifelse function does, and explain briefly. Cite your source in a way that the grader can check it (should they wish to).
12. (2 points) Create a new column in your dataframe called gt that is the price minus 60 if price is greater than 60 and 0 otherwise. Save the resulting dataframe.
13. (2 points) Fit a regression predicting consumption from price and your new variable, and display the results.
14. (4 points) Plot the data and fitted relationship from the model you just fitted, on one graph. Join the fitted values with lines. Describe the form of the relationship you just fitted.
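The rule for gt in question 12 can be checked on a few made-up prices before applying it to the real data. This is only a sketch in base R: the vector price below is invented for illustration, not taken from the Texas file, and using ifelse here is an assumption prompted by question 11.

```r
# Made-up prices, chosen to fall below, at, and above the cutoff of 60
price <- c(30, 60, 75, 90)

# price minus 60 where price exceeds 60, and 0 otherwise
gt <- ifelse(price > 60, price - 60, 0)
gt
```

A price of exactly 60 gives 0, since the condition is strictly "greater than 60".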