library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(stringr)
patients <- read.delim("patient-data.txt")
patients <- as_tibble(patients)
patients_clean <- mutate(patients, Sex = factor(str_trim(Sex)))
# Strip the "cm" suffix so Height can be stored as a number
patients_clean <- mutate(patients_clean, Height = as.numeric(str_replace_all(Height, "cm", "")))
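The two cleaning steps above can be tried on a few values in isolation. This is a minimal sketch: the vectors `raw_sex` and `raw_height` are made-up stand-ins, not values taken from `patient-data.txt`, and the real file may contain other variants.

```r
library(stringr)

# Hypothetical raw values, invented for illustration only
raw_sex    <- c(" Male", "Female ", "Male")
raw_height <- c("182.9cm", "179.1cm", "169.2cm")

# str_trim() removes leading/trailing whitespace, so " Male" and "Male"
# end up as the same factor level
sex <- factor(str_trim(raw_sex))

# Remove the unit suffix, then convert the remaining text to numbers
height_cm <- as.numeric(str_replace_all(raw_height, "cm", ""))

levels(sex)
height_cm
```

Without the trimming step, `" Male"` and `"Male"` would be treated as two different levels of the factor.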
- We want to calculate the Body Mass Index (BMI) for each of our patients
- \(BMI = (Weight) / (Height^2)\)
- where Weight is measured in kilograms and Height in metres (Height is recorded in centimetres in this dataset, so convert it first)
- Create a new BMI variable in the dataset
- A BMI above 25 is considered overweight. Calculate a new variable to indicate which individuals are overweight
- For a follow-on study, we are interested in overweight smokers
- Clean the Smokes column to contain just TRUE or FALSE values
- How many candidates (Overweight and Smoker) do you have?
- (EXTRA) What other problems can you find in the data?
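As a quick check of the BMI formula given in the exercise, here is a worked example for a single person. The values `weight_kg` and `height_cm` are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# Hypothetical measurements for one person
weight_kg <- 70
height_cm <- 175

# Convert height from centimetres to metres before squaring
height_m <- height_cm / 100

# BMI = Weight / Height^2, i.e. 70 / 1.75^2 = 70 / 3.0625
bmi <- weight_kg / height_m^2
round(bmi, 1)   # 22.9

# Flag as overweight using the threshold of 25 from the exercise
overweight <- bmi > 25
overweight      # FALSE
```

Forgetting the conversion to metres would give a BMI of about 0.0023 here, so implausibly small values are a useful sign that the units are wrong.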
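# Strip the "kg" suffix so Weight can be stored as a number
patients_clean <- mutate(patients_clean, Weight = as.numeric(str_replace_all(Weight, "kg", "")))
# Height is in cm, so divide by 100 to get metres before squaring
patients_clean <- mutate(patients_clean, BMI = Weight / (Height / 100)^2, Overweight = BMI > 25)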
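# Recode "Yes"/"No" as the strings "TRUE"/"FALSE" (str_replace_all() expects a
# character replacement), then convert the column to logical
patients_clean <- mutate(patients_clean, Smokes = str_replace_all(Smokes, "Yes", "TRUE"))
patients_clean <- mutate(patients_clean, Smokes = as.logical(str_replace_all(Smokes, "No", "FALSE")))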
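Once `Overweight` and `Smokes` are both logical, the candidates for the follow-on study can be counted with `filter()`. This is a sketch on a small made-up data frame (`toy`, with invented names and values) rather than the real `patients_clean`, so the count shown is not the answer for the actual dataset.

```r
library(dplyr)

# Hypothetical stand-in for patients_clean, invented for illustration
toy <- data.frame(
  Name       = c("A", "B", "C", "D"),
  Overweight = c(TRUE, TRUE, FALSE, TRUE),
  Smokes     = c(TRUE, FALSE, TRUE, TRUE)
)

# Keep only rows where both conditions are TRUE; filter() also drops
# rows where either condition is NA
candidates <- filter(toy, Overweight, Smokes)

n_candidates <- nrow(candidates)
n_candidates   # 2
```

The same call, `filter(patients_clean, Overweight, Smokes)`, should work on the cleaned data, assuming both columns were converted as described above.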