class: center middle main-title section-title-7 # ggplot2 and the Tidyverse .class-info[ **Optional lab** .light[PMAP 8521: Program evaluation<br> Andrew Young School of Policy Studies ] ] --- name: outline class: title title-inv-8 # Plan for today -- .box-6.medium.sp-after-half[Packages and data] -- .box-1.medium.sp-after-half[Visualize data with ggplot2] -- .box-2.medium.sp-after-half[Transform data with dplyr] --- class: center middle main-title section-title-6 # Packages and data --- <figure> <img src="img/01/packages-base.png" alt="R packages, base" title="R packages, base" width="100%"> </figure> --- <figure> <img src="img/01/packages-packages.png" alt="R packages, other" title="R packages, other" width="100%"> </figure> --- class: title title-6 # Using packages .pull-left[ ```r install.packages("name") ``` .box-inv-6[Downloads files<br>to your computer] .box-inv-6[Do this once per computer] ] -- .pull-right[ ```r library("name") ``` .box-inv-6[Loads the package] .box-inv-6[Do this once per R session] ] --- class: title title-6 # The tidyverse .pull-left[ .box-inv-6.small["The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures."] .box-inv-6[… the tidyverse makes data science faster, easier and more fun…] ] .pull-right[ <figure> <img src="img/01/tidyverse.png" alt="The tidyverse" title="The tidyverse" width="100%"> </figure> ] --- class: title title-6 # The tidyverse <figure> <img src="img/01/tidyverse-language.png" alt="tidyverse and language" title="tidyverse and language" width="100%"> </figure> ??? 
From "Master the Tidyverse" by RStudio --- class: title title-6 # The tidyverse package .center[ ```r library(tidyverse) ``` ] .box-inv-6[The tidyverse package is a shortcut for<br>installing and loading all the key tidyverse packages] --- .pull-left[ ```r install.packages("tidyverse") ``` .tiny[ ```r install.packages("ggplot2") install.packages("dplyr") install.packages("tidyr") install.packages("readr") install.packages("purrr") install.packages("tibble") install.packages("stringr") install.packages("forcats") install.packages("lubridate") install.packages("hms") install.packages("DBI") install.packages("haven") install.packages("httr") install.packages("jsonlite") install.packages("readxl") install.packages("rvest") install.packages("xml2") install.packages("modelr") install.packages("broom") ``` ] ] -- .pull-right[ ```r library("tidyverse") ``` .tiny[ ```r library("ggplot2") library("dplyr") library("tidyr") library("readr") library("purrr") library("tibble") library("stringr") library("forcats") ``` ] ] --- class: title title-6 # Data frames and tibbles .box-inv-6.medium[Data frames are the most common kind of data object; used for rectangular data (like spreadsheets)] -- .box-6[Data frames: R's native data object] -- .box-6[Tibbles (`tbl`): an enhanced kind of data frame] -- .box-6.small[(You really won't notice a difference in this class)] --- class: title title-6 # Vectors .box-inv-6[Vectors are sequences of values of the same type<br>(all text, or all numbers, etc.)] .box-inv-6[Make them with `c()`:] ```r c(1, 4, 2, 5, 7) ``` -- .box-inv-6[You'll usually want to assign them to something:] ```r neat_numbers <- c(1, 4, 2, 5, 7) ``` --- class: title title-6 # Basic data types <table> <tr> <td><b>Integer</b></td> <td>Whole numbers</td> <td><code class="remark-inline-code">c(1L, 2L, 3L, 4L)</code></td> </tr> <tr> <td><b>Double</b></td> <td>Numbers</td> <td><code class="remark-inline-code">c(1, 2.4, 3.14, 4)</code></td> </tr> <tr> <td><b>Character</b></td> 
<td>Text</td> <td><code class="remark-inline-code">c("1", "blue", "fun", "monster")</code></td> </tr> <tr> <td><b>Logical</b></td> <td>True or false</td> <td><code class="remark-inline-code">c(TRUE, FALSE, TRUE, FALSE)</code></td> </tr> <tr> <td><b>Factor</b></td> <td>Category</td> <td><code class="remark-inline-code">factor(c("Strongly disagree", "Agree", "Neutral"))</code></td> </tr> </table> --- class: title title-6 # Packages for importing data <table> <tr> <td><img src="img/01/readr.png" alt="readr" title="readr" width="150px"></td> <td>Work with plain text data</td> <td><code class="remark-inline-code">my_data <- read_csv("file.csv")</code></td> </tr> <tr> <td><img src="img/01/readxl.png" alt="readxl" title="readxl" width="150px"></td> <td>Work with Excel files</td> <td><code class="remark-inline-code">my_data <- read_excel("file.xlsx")</code></td> </tr> <tr> <td><img src="img/01/haven.png" alt="haven" title="haven" width="150px"></td> <td>Work with Stata, SPSS, and SAS data</td> <td><code class="remark-inline-code">my_data <- read_stata("file.dta")</code></td> </tr> </table> --- class: center middle main-title section-title-1 # Visualize data<br>with ggplot2 .class-info[ <figure> <img src="img/01/ggplot-logo.png" alt="ggplot" title="ggplot" width="15%"> </figure> ] --- ```r ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy)) ``` <img src="01_lab_files/figure-html/unnamed-chunk-10-1.png" width="60%" style="display: block; margin: auto;" /> --- class: bg-full bg-y-75 background-image: url("img/01/napoleon-retreat.jpg") ??? Source: [Wikipedia](https://en.wikipedia.org/wiki/File:National_Museum_in_Poznan_-_Przej%C5%9Bcie_przez_Berezyn%C4%99.JPG) --- layout: true class: title title-1 --- # Long distance! .center[ <figure> <img src="img/01/napoleon-google-maps.png" alt="Moscow to Vilnius" title="Moscow to Vilnius" width="80%"> <figcaption>Moscow to Vilnius</figcaption> </figure> ] --- # Very cold! 
<img src="01_lab_files/figure-html/minard-temps-1.png" width="864" style="display: block; margin: auto;" /> --- # Lots of people died! <img src="01_lab_files/figure-html/minard-deaths-1.png" width="468" style="display: block; margin: auto;" /> --- layout: false class: bg-full background-image: url("img/01/minard.png") ??? Source: [Wikimedia Commons](https://upload.wikimedia.org/wikipedia/commons/2/29/Minard.png) --- layout: true class: title title-1 --- # Mapping data to aesthetics .pull-left.center[ <figure> <img src="img/01/gg-book.jpg" alt="Grammar of Graphics book" title="Grammar of Graphics book" width="55%"> </figure> ] .pull-right[ .box-inv-1.medium[Aesthetic] .box-1[Visual property of a graph] .box-1.sp-after[Position, shape, color, etc.] .box-inv-1.medium[Data] .box-1[A column in a dataset] ] --- # Mapping data to aesthetics <table> <tr> <th class="cell-left">Data</th> <th class="cell-left">Aesthetic</th> <th class="cell-left">Graphic/Geometry</th> </tr> <tr> <td class="cell-left">Longitude</td> <td class="cell-left">Position (x-axis) </td> <td class="cell-left">Point</td> </tr> <tr> <td class="cell-left">Latitude</td> <td class="cell-left">Position (y-axis)</td> <td class="cell-left">Point</td> </tr> <tr> <td class="cell-left">Army size</td> <td class="cell-left">Size</td> <td class="cell-left">Path</td> </tr> <tr> <td class="cell-left">Army direction </td> <td class="cell-left">Color</td> <td class="cell-left">Path</td> </tr> <tr> <td class="cell-left">Date</td> <td class="cell-left">Position (x-axis)</td> <td class="cell-left">Line + text</td> </tr> <tr> <td class="cell-left">Temperature</td> <td class="cell-left">Position (y-axis)</td> <td class="cell-left">Line + text</td> </tr> </table> --- # Mapping data to aesthetics <table> <tr> <th class="cell-left">Data</th> <th class="cell-left"><code class="remark-inline-code">aes()</code></th> <th class="cell-left"><code class="remark-inline-code">geom</code></th> </tr> <tr> <td 
class="cell-left">Longitude</td> <td class="cell-left"><code class="remark-inline-code">x</code></td> <td class="cell-left"><code class="remark-inline-code">geom_point()</code></td> </tr> <tr> <td class="cell-left">Latitude</td> <td class="cell-left"><code class="remark-inline-code">y</code></td> <td class="cell-left"><code class="remark-inline-code">geom_point()</code></td> </tr> <tr> <td class="cell-left">Army size</td> <td class="cell-left"><code class="remark-inline-code">size</code></td> <td class="cell-left"><code class="remark-inline-code">geom_path()</code></td> </tr> <tr> <td class="cell-left">Army direction </td> <td class="cell-left"><code class="remark-inline-code">color</code> </td> <td class="cell-left"><code class="remark-inline-code">geom_path()</code></td> </tr> <tr> <td class="cell-left">Date</td> <td class="cell-left"><code class="remark-inline-code">x</code></td> <td class="cell-left"><code class="remark-inline-code">geom_line() + geom_text()</code></td> </tr> <tr> <td class="cell-left">Temperature</td> <td class="cell-left"><code class="remark-inline-code">y</code></td> <td class="cell-left"><code class="remark-inline-code">geom_line() + geom_text()</code></td> </tr> </table> --- # `ggplot()` template <code class ='r hljs remark-code'>ggplot(data = <b><span style='background-color:#CBB5FF'>DATA</span></b>) +<br> <b><span style='background-color:#FFDFD1'>GEOM_FUNCTION</span></b>(mapping = aes(<b><span style='background-color:#FFD0CF'>AESTHETIC MAPPINGS</span></b>))</code> -- <code class ='r hljs remark-code'>ggplot(data = <b><span style='background-color:#CBB5FF'>troops</span></b>) +<br> <b><span style='background-color:#FFDFD1'>geom_path</span></b>(mapping = aes(<b><span style='background-color:#FFD0CF'>x = longitude</span></b>,<br> <b><span style='background-color:#FFD0CF'>y = latitude</span></b>,<br> <b><span style='background-color:#FFD0CF'>color = direction</span></b>,<br> <b><span style='background-color:#FFD0CF'>size = 
survivors</span></b>))</code> --- layout: false .box-1[This is a dataset named `troops`:] .small[ <table> <thead> <tr> <th style="text-align:left;"> longitude </th> <th style="text-align:left;"> latitude </th> <th style="text-align:left;"> direction </th> <th style="text-align:left;"> survivors </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 24 </td> <td style="text-align:left;"> 54.9 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> 340000 </td> </tr> <tr> <td style="text-align:left;"> 24.5 </td> <td style="text-align:left;"> 55 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> 340000 </td> </tr> <tr> <td style="text-align:left;"> … </td> <td style="text-align:left;"> … </td> <td style="text-align:left;"> … </td> <td style="text-align:left;"> … </td> </tr> </tbody> </table> ] -- <code class ='r hljs remark-code'>ggplot(data = <b><span style='background-color:#CBB5FF'>troops</span></b>) +<br> <b><span style='background-color:#FFDFD1'>geom_path</span></b>(mapping = aes(<b><span style='background-color:#FFD0CF'>x = longitude</span></b>,<br> <b><span style='background-color:#FFD0CF'>y = latitude</span></b>,<br> <b><span style='background-color:#FFD0CF'>color = direction</span></b>,<br> <b><span style='background-color:#FFD0CF'>size = survivors</span></b>))</code> --- <img src="01_lab_files/figure-html/show-basic-minard-1.png" width="100%" style="display: block; margin: auto;" /> --- ```r ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy)) ``` <img src="01_lab_files/figure-html/unnamed-chunk-11-1.png" width="60%" style="display: block; margin: auto;" /> --- layout: true class: title title-1 --- # Heavy cars with better mileage? 
<img src="01_lab_files/figure-html/unnamed-chunk-12-1.png" width="60%" style="display: block; margin: auto;" /> --- # Aesthetics .pull-left-3[ .box-inv-1.small[`color` (discrete)] <img src="01_lab_files/figure-html/aes-color-discrete-1.png" width="100%" style="display: block; margin: auto;" /> .box-inv-1.small[`color` (continuous)] <img src="01_lab_files/figure-html/aes-color-continuous-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-middle-3[ .box-inv-1.small[`size`] <img src="01_lab_files/figure-html/aes-size-1.png" width="100%" style="display: block; margin: auto;" /> .box-inv-1.small[`fill`] <img src="01_lab_files/figure-html/aes-fill-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right-3[ .box-inv-1.small[`shape`] <img src="01_lab_files/figure-html/aes-shape-1.png" width="100%" style="display: block; margin: auto;" /> .box-inv-1.small[`alpha`] <img src="01_lab_files/figure-html/aes-alpha-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Mapping columns to aesthetics .small[ ```r ggplot(mpg) + geom_point(aes(x = displ, y = hwy, color = class)) ggplot(mpg) + geom_point(aes(x = displ, y = hwy, size = class)) ggplot(mpg) + geom_point(aes(x = displ, y = hwy, shape = class)) ggplot(mpg) + geom_point(aes(x = displ, y = hwy, alpha = class)) ``` ] --- layout: false ```r ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = class)) ``` <img src="01_lab_files/figure-html/unnamed-chunk-14-1.png" width="60%" style="display: block; margin: auto;" /> --- class: title title-1 section-title-inv-1 # ggplot playground .box-1[Add color, size, alpha, and shape aesthetics to your graph.] .box-1[Experiment!] .box-1[Do different things happen when you map aesthetics to discrete and continuous variables?] .box-1[What happens when you use more than one aesthetic?] --- class: title title-1 # How would you make this plot? 
<img src="01_lab_files/figure-html/unnamed-chunk-15-1.png" width="70%" style="display: block; margin: auto;" /> --- .left-code[ ```r ggplot(mpg) + geom_point(aes(x = displ, y = hwy, color = class)) ``` ] .right-plot[ ![](01_lab_files/figure-html/color-aes-example-1.png) ] --- .left-code[ ```r ggplot(mpg) + geom_point(aes(x = displ, y = hwy), color = "blue") ``` ] .right-plot[ ![](01_lab_files/figure-html/color-set-example-1.png) ] --- .pull-left[ .small[ ```r ggplot(mpg) + geom_point(aes(x = displ, y = hwy, color = "blue")) ``` <img src="01_lab_files/figure-html/unnamed-chunk-16-1.png" width="100%" style="display: block; margin: auto;" /> ] ] .pull-right[ .small[ ```r ggplot(mpg) + geom_point(aes(x = displ, y = hwy), color = "blue") ``` <img src="01_lab_files/figure-html/unnamed-chunk-17-1.png" width="100%" style="display: block; margin: auto;" /> ] ] --- layout: true class: title title-1 --- # What's the same? What's different? .pull-left[ <img src="01_lab_files/figure-html/unnamed-chunk-18-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="01_lab_files/figure-html/unnamed-chunk-19-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Geoms <code class ='r hljs remark-code'>ggplot(data = <b><span style='background-color:#CBB5FF'>DATA</span></b>) +<br> <b><span style='background-color:#FFDFD1'>GEOM_FUNCTION</span></b>(mapping = aes(<b><span style='background-color:#FFD0CF'>AESTHETIC MAPPINGS</span></b>))</code> --- # Possible geoms <table> <tr> <th class="cell-left"></th> <th class="cell-left">Example geom</th> <th class="cell-left">What it makes</th> </tr> <tr> <td class="cell-left"><img src="img/01/geom_bar.png"></td> <td class="cell-left"><code class="remark-inline-code">geom_col()</code></td> <td class="cell-left">Bar charts</td> </tr> <tr> <td class="cell-left"><img src="img/01/geom_text.png"></td> <td class="cell-left"><code class="remark-inline-code">geom_text()</code></td> <td class="cell-left">Text</td> 
</tr> <tr> <td class="cell-left"><img src="img/01/geom_point.png"></td> <td class="cell-left"><code class="remark-inline-code">geom_point()</code></td> <td class="cell-left">Points</td> </tr> <tr> <td class="cell-left"><img src="img/01/geom_boxplot.png"></td> <td class="cell-left"><code class="remark-inline-code">geom_boxplot()</code> </td> <td class="cell-left">Boxplots</td> </tr> <tr> <td class="cell-left"><img src="img/01/geom_sf.png"></td> <td class="cell-left"><code class="remark-inline-code">geom_sf()</code></td> <td class="cell-left">Maps</td> </tr> </table> --- # Possible geoms .box-inv-1[There are dozens of possible geoms!] .box-1[See [the **ggplot2** documentation](https://ggplot2.tidyverse.org/reference/index.html#section-layer-geoms) for<br>complete examples of all the different geom layers] .box-1[Also see the ggplot cheatsheet] --- class: title title-1 # Complex graphs! ```r ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy)) + geom_smooth(mapping = aes(x = displ, y = hwy)) ``` --- class: title title-1 # Complex graphs! <img src="01_lab_files/figure-html/unnamed-chunk-21-1.png" width="70%" style="display: block; margin: auto;" /> --- class: title title-1 # Global vs. local .box-inv-1[Any aesthetics in `ggplot()` will show up in all `geom_` layers] .small[ ```r ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + geom_point() + geom_smooth() ``` <img src="01_lab_files/figure-html/unnamed-chunk-22-1.png" width="60%" style="display: block; margin: auto;" /> ] --- class: title title-1 # Global vs. local .box-inv-1[Any aesthetics in `geom_` layers only apply to that layer] .small[ ```r ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + geom_point(mapping = aes(color = drv)) + geom_smooth() ``` <img src="01_lab_files/figure-html/unnamed-chunk-23-1.png" width="60%" style="display: block; margin: auto;" /> ] --- layout: true class: title title-1 --- # So much more! 
.pull-left[ .box-inv-1[There are many other layers we can use to make and enhance graphs!] .box-inv-1[We sequentially add layers onto the foundational `ggplot()` plot to create complex figures] ] .pull-right[ ![](img/01/ggplot-layers@4x.png) ] --- # Putting it all together .box-inv-1.medium[We can build a plot sequentially<br>to see how each grammatical layer<br>changes the appearance] --- layout: false .left-code[ .box-1[Start with data and aesthetics] ```r *ggplot(data = mpg, * mapping = aes(x = displ, * y = hwy, * color = drv)) ``` ] .right-plot[ ![](01_lab_files/figure-html/mpg-layers-1-1.png) ] --- .left-code[ .box-1[Add a point geom] ```r ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + * geom_point() ``` ] .right-plot[ ![](01_lab_files/figure-html/mpg-layers-2-1.png) ] --- .left-code[ .box-1[Add a smooth geom] ```r ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + geom_point() + * geom_smooth() ``` ] .right-plot[ ![](01_lab_files/figure-html/mpg-layers-3-1.png) ] --- .left-code[ .box-1[Make it straight] ```r ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + geom_point() + * geom_smooth(method = "lm") ``` ] .right-plot[ ![](01_lab_files/figure-html/mpg-layers-4-1.png) ] --- .left-code[ .box-1[Use a viridis color scale] ```r ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + geom_point() + geom_smooth(method = "lm") + * scale_color_viridis_d() ``` ] .right-plot[ ![](01_lab_files/figure-html/mpg-layers-5-1.png) ] --- .left-code[ .box-1[Facet by drive] ```r ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + geom_point() + geom_smooth(method = "lm") + scale_color_viridis_d() + * facet_wrap(vars(drv), ncol = 1) ``` ] .right-plot[ ![](01_lab_files/figure-html/mpg-layers-6-1.png) ] --- .left-code[ .box-1[Add labels] ```r ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + geom_point() + geom_smooth(method = "lm") + scale_color_viridis_d() + 
facet_wrap(vars(drv), ncol = 1) + * labs(x = "Displacement", y = "Highway MPG", * color = "Drive", * title = "Heavier cars get lower mileage", * subtitle = "Displacement indicates weight(?)", * caption = "I know nothing about cars") ``` ] .right-plot[ ![](01_lab_files/figure-html/mpg-layers-7-1.png) ] --- .left-code[ .box-1[Add a theme] ```r ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + geom_point() + geom_smooth(method = "lm") + scale_color_viridis_d() + facet_wrap(vars(drv), ncol = 1) + labs(x = "Displacement", y = "Highway MPG", color = "Drive", title = "Heavier cars get lower mileage", subtitle = "Displacement indicates weight(?)", caption = "I know nothing about cars") + * theme_bw() ``` ] .right-plot[ ![](01_lab_files/figure-html/mpg-layers-8-1.png) ] --- .left-code[ .box-1[Modify the theme] ```r ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + geom_point() + geom_smooth(method = "lm") + scale_color_viridis_d() + facet_wrap(vars(drv), ncol = 1) + labs(x = "Displacement", y = "Highway MPG", color = "Drive", title = "Heavier cars get lower mileage", subtitle = "Displacement indicates weight(?)", caption = "I know nothing about cars") + theme_bw() + * theme(legend.position = "bottom", * plot.title = element_text(face = "bold")) ``` ] .right-plot[ ![](01_lab_files/figure-html/mpg-layers-9-1.png) ] --- .left-code[ .box-1[Finished!] 
```r ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + geom_point() + geom_smooth(method = "lm") + scale_color_viridis_d() + facet_wrap(vars(drv), ncol = 1) + labs(x = "Displacement", y = "Highway MPG", color = "Drive", title = "Heavier cars get lower mileage", subtitle = "Displacement indicates weight(?)", caption = "I know nothing about cars") + theme_bw() + theme(legend.position = "bottom", plot.title = element_text(face = "bold")) ``` ] .right-plot[ ![](01_lab_files/figure-html/mpg-layers-finished-1.png) ] --- class: center middle main-title section-title-2 # Transform data<br>with dplyr .class-info[ <figure> <img src="img/01/dplyr.png" alt="dplyr" title="dplyr" width="15%"> </figure> ] --- .small[ ```r gapminder ``` ``` ## # A tibble: 1,704 x 6 ## country continent year lifeExp pop gdpPercap ## <fct> <fct> <int> <dbl> <int> <dbl> ## 1 Afghanistan Asia 1952 28.8 8425333 779. ## 2 Afghanistan Asia 1957 30.3 9240934 821. ## 3 Afghanistan Asia 1962 32.0 10267083 853. ## 4 Afghanistan Asia 1967 34.0 11537966 836. ## 5 Afghanistan Asia 1972 36.1 13079460 740. ## 6 Afghanistan Asia 1977 38.4 14880372 786. ## 7 Afghanistan Asia 1982 39.9 12881816 978. ## 8 Afghanistan Asia 1987 40.8 13867957 852. ## 9 Afghanistan Asia 1992 41.7 16317921 649. ## 10 Afghanistan Asia 1997 41.8 22227415 635. ## # … with 1,694 more rows ``` ] --- class: title title-2 # The tidyverse <figure> <img src="img/01/tidyverse-language.png" alt="tidyverse and language" title="tidyverse and language" width="100%"> </figure> ??? 
From "Master the Tidyverse" by RStudio --- class: title title-2 # The tidyverse .center[ <figure> <img src="img/01/tidyverse.png" alt="The tidyverse" title="The tidyverse" width="50%"> </figure> ] --- class: title title-2 # dplyr: verbs for manipulating data <table> <tr> <td>Extract rows with <code>filter()</code></td> <td><img src="img/01/filter.png" alt="filter" title="filter" height="80px"></td> </tr> <tr> <td>Extract columns with <code>select()</code></td> <td><img src="img/01/select.png" alt="select" title="select" height="80px"></td> </tr> <tr> <td>Arrange/sort rows with <code>arrange()</code></td> <td><img src="img/01/arrange.png" alt="arrange" title="arrange" height="80px"></td> </tr> <tr> <td>Make new columns with <code>mutate()</code></td> <td><img src="img/01/mutate.png" alt="mutate" title="mutate" height="80px"></td> </tr> <tr> <td>Make group summaries with<br><code>group_by() %>% summarize()</code></td> <td><img src="img/01/summarize.png" alt="summarize" title="summarize" height="80px"></td> </tr> </table> --- class: center middle section-title section-title-2 # `filter()` --- layout: false class: title title-2 # `filter()` .box-inv-2[Extract rows that meet some sort of test] .pull-left[ <code class ='r hljs remark-code'>filter(.data = <b><span style='background-color:#FFDFD1'>DATA</span></b>, <b><span style='background-color:#FFD0CF'>...</span></b>)</code> ] .pull-right[ - <b><span style="background: #FFDFD1">`DATA`</span></b> = Data frame to transform - <b><span style="background: #FFD0CF">`...`</span></b> = One or more tests <br>.small[`filter()` returns each row for which the test is TRUE] ] --- <code class ='r hljs remark-code'>filter(.data = <b><span style='background-color:#FFDFD1'>gapminder</span></b>, <b><span style='background-color:#FFD0CF'>country == "Denmark"</span></b>)</code> .pull-left[ <table> <thead> <tr> <th style="text-align:left;"> country </th> <th style="text-align:left;"> continent </th> <th style="text-align:left;"> year </th> 
</tr> </thead> <tbody> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:left;"> 1952 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:left;"> 1957 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:left;"> 1962 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:left;"> 1967 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:left;"> 1972 </td> </tr> <tr> <td style="text-align:left;"> … </td> <td style="text-align:left;"> … </td> <td style="text-align:left;"> … </td> </tr> </tbody> </table> ] -- .pull-right[ <table> <thead> <tr> <th style="text-align:left;"> country </th> <th style="text-align:left;"> continent </th> <th style="text-align:right;"> year </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Denmark </td> <td style="text-align:left;"> Europe </td> <td style="text-align:right;"> 1952 </td> </tr> <tr> <td style="text-align:left;"> Denmark </td> <td style="text-align:left;"> Europe </td> <td style="text-align:right;"> 1957 </td> </tr> <tr> <td style="text-align:left;"> Denmark </td> <td style="text-align:left;"> Europe </td> <td style="text-align:right;"> 1962 </td> </tr> <tr> <td style="text-align:left;"> Denmark </td> <td style="text-align:left;"> Europe </td> <td style="text-align:right;"> 1967 </td> </tr> <tr> <td style="text-align:left;"> Denmark </td> <td style="text-align:left;"> Europe </td> <td style="text-align:right;"> 1972 </td> </tr> <tr> <td style="text-align:left;"> Denmark </td> <td style="text-align:left;"> Europe </td> <td style="text-align:right;"> 1977 </td> </tr> </tbody> </table> ] --- class: title title-2 # `filter()` .pull-left[ <code class ='r hljs 
remark-code'>filter(.data = <b><span style='background-color:#FFDFD1'>gapminder</span></b>, <br> <b><span style='background-color:#FFD0CF'>country == "Denmark"</span></b>)</code> ] .pull-right[ .box-inv-2[One `=` sets an argument] .box-inv-2[Two `==` tests if equal<br>.small[(returns TRUE or FALSE)]] ] --- class: title title-2 # Logical tests <table> <tr> <th class="cell-center">Test</th> <th class="cell-left">Meaning</th> <th class="cell-center">Test</th> <th class="cell-left">Meaning</th> </tr> <tr> <td class="cell-center"><code class="remark-inline-code">x < y</code></td> <td class="cell-left">Less than</td> <td class="cell-center"><code class="remark-inline-code">x %in% y</code></td> <td class="cell-left">In (group membership)</td> </tr> <tr> <td class="cell-center"><code class="remark-inline-code">x > y</code></td> <td class="cell-left">Greater than</td> <td class="cell-center"><code class="remark-inline-code">is.na(x)</code></td> <td class="cell-left">Is missing</td> </tr> <tr> <td class="cell-center"><code class="remark-inline-code">x == y</code></td> <td class="cell-left">Equal to</td> <td class="cell-center"><code class="remark-inline-code">!is.na(x)</code></td> <td class="cell-left">Is not missing</td> </tr> <tr> <td class="cell-center"><code class="remark-inline-code">x <= y</code></td> <td class="cell-left">Less than or equal to</td> </tr> <tr> <td class="cell-center"><code class="remark-inline-code">x >= y</code></td> <td class="cell-left">Greater than or equal to</td> </tr> <tr> <td class="cell-center"><code class="remark-inline-code">x != y</code></td> <td class="cell-left">Not equal to</td> </tr> </table> --- class: title title-2 section-title-inv-2 # Your turn #1: Filtering .box-2[Use `filter()` and logical tests to show…] 1. The data for Canada 2. All data for countries in Oceania 3. 
Rows where the life expectancy is greater than 82 --- .medium[ ```r filter(gapminder, country == "Canada") ``` ] -- .medium[ ```r filter(gapminder, continent == "Oceania") ``` ] -- .medium[ ```r filter(gapminder, lifeExp > 82) ``` ] --- class: title title-2 # Common mistakes .pull-left[ .box-inv-2[Using `=` instead of `==`] <code class ='r hljs remark-code'>filter(gapminder, <br> country <b><span style='color:#FF4136'>=</span></b> "Canada")</code> <code class ='r hljs remark-code'>filter(gapminder, <br> country <b><span style='color:#2ECC40'>==</span></b> "Canada")</code> ] -- .pull-right[ .box-inv-2[Quote use] <code class ='r hljs remark-code'>filter(gapminder, <br> country == <b><span style='color:#FF4136'>Canada</span></b>)</code> <code class ='r hljs remark-code'>filter(gapminder, <br> country == <b><span style='color:#2ECC40'>"Canada"</span></b>)</code> ] --- class: title title-2 # `filter()` with multiple conditions .box-inv-2[Extract rows that meet *every* test] <code class ='r hljs remark-code'>filter(<b><span style='background-color:#FFDFD1'>gapminder</span></b>, <b><span style='background-color:#FFD0CF'>country == "Denmark", year > 2000</span></b>)</code> --- <code class ='r hljs remark-code'>filter(<b><span style='background-color:#FFDFD1'>gapminder</span></b>, <b><span style='background-color:#FFD0CF'>country == "Denmark", year > 2000</span></b>)</code> .pull-left[ <table> <thead> <tr> <th style="text-align:left;"> country </th> <th style="text-align:left;"> continent </th> <th style="text-align:left;"> year </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:left;"> 1952 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:left;"> 1957 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:left;"> 1962 </td> </tr> 
<tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:left;"> 1967 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:left;"> 1972 </td> </tr> <tr> <td style="text-align:left;"> … </td> <td style="text-align:left;"> … </td> <td style="text-align:left;"> … </td> </tr> </tbody> </table> ] -- .pull-right[ <table> <thead> <tr> <th style="text-align:left;"> country </th> <th style="text-align:left;"> continent </th> <th style="text-align:right;"> year </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Denmark </td> <td style="text-align:left;"> Europe </td> <td style="text-align:right;"> 2002 </td> </tr> <tr> <td style="text-align:left;"> Denmark </td> <td style="text-align:left;"> Europe </td> <td style="text-align:right;"> 2007 </td> </tr> </tbody> </table> ] --- class: title title-2 # Boolean operators <table> <tr> <th class="cell-center">Operator</th> <th class="cell-center">Meaning</th> </tr> <tr> <td class="cell-center"><code class="remark-inline-code">a & b</code></td> <td class="cell-center">and</td> </tr> <tr> <td class="cell-center"><code class="remark-inline-code">a | b</code></td> <td class="cell-center">or</td> </tr> <tr> <td class="cell-center"><code class="remark-inline-code">!a</code></td> <td class="cell-center">not</td> </tr> </table> --- class: title title-2 # Default is "and" .box-inv-2[These do the same thing:] <code class ='r hljs remark-code'>filter(<b><span style='background-color:#FFDFD1'>gapminder</span></b>, <b><span style='background-color:#FFD0CF'>country == "Denmark", year > 2000</span></b>)</code> <code class ='r hljs remark-code'>filter(<b><span style='background-color:#FFDFD1'>gapminder</span></b>, <b><span style='background-color:#FFD0CF'>country == "Denmark" & year > 2000</span></b>)</code> --- class: title title-2 section-title-inv-2 # Your turn #2: Filtering .box-2[Use `filter()` and 
Boolean logical tests to show…] 1. Canada before 1970 2. Countries where life expectancy in 2007 is below 50 3. Countries where life expectancy in 2007 is below 50 and are not in Africa --- ```r filter(gapminder, country == "Canada", year < 1970) ``` -- ```r filter(gapminder, year == 2007, lifeExp < 50) ``` -- ```r filter(gapminder, year == 2007, lifeExp < 50, continent != "Africa") ``` --- class: title title-2 # Common mistakes .pull-left[ .box-inv-2[Collapsing multiple tests<br>into one] .small-code[ <code class ='r hljs remark-code'>filter(gapminder, <b><span style='color:#FF4136'>1960 < year < 1980</span></b>)</code> ] .small-code[ <code class ='r hljs remark-code'>filter(gapminder, <br> <b><span style='color:#2ECC40'>year > 1960, year < 1980</span></b>)</code> ] ] -- .pull-right[ .box-inv-2[Using multiple tests<br>instead of `%in%`] .small-code[ <code class ='r hljs remark-code'>filter(gapminder, <br> <b><span style='color:#FF4136'>country == "Mexico", <br> country == "Canada", <br> country == "United States"</span></b>)</code> ] .small-code[ <code class ='r hljs remark-code'>filter(gapminder, <br> <b><span style='color:#2ECC40'>country %in% c("Mexico", "Canada", <br> "United States")</span></b>)</code> ] ] --- class: title title-2 # Common syntax .box-inv-2[Every dplyr verb function follows the same pattern] .box-inv-2[First argument is a data frame; returns a data frame] .pull-left[ <code class ='r hljs remark-code'><b><span style='background-color:#EFB3FF'>VERB</span></b>(<b><span style='background-color:#FFDFD1'>DATA</span></b>, <b><span style='background-color:#FFD0CF'>...</span></b>)</code> ] .pull-right[ - <b><span style="background: #EFB3FF">`VERB`</span></b> = dplyr function/verb - <b><span style="background: #FFDFD1">`DATA`</span></b> = Data frame to transform - <b><span style="background: #FFD0CF">`...`</span></b> = Stuff the verb does ] --- class: title title-2 # `mutate()` .box-inv-2[Create new columns] .pull-left[ <code class ='r hljs 
remark-code'>mutate(<b><span style='background-color:#FFDFD1'>.data</span></b>, <b><span style='background-color:#FFD0CF'>...</span></b>)</code> ] .pull-right[ - <b><span style="background: #FFDFD1">`DATA`</span></b> = Data frame to transform - <b><span style="background: #FFD0CF">`...`</span></b> = Columns to make ] --- <code class ='r hljs remark-code'>mutate(<b><span style='background-color:#FFDFD1'>gapminder</span></b>, <b><span style='background-color:#FFD0CF'>gdp = gdpPercap * pop</span></b>)</code> .pull-left.small[ <table> <thead> <tr> <th style="text-align:left;"> country </th> <th style="text-align:left;"> year </th> <th style="text-align:left;"> gdpPercap </th> <th style="text-align:left;"> pop </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> 1952 </td> <td style="text-align:left;"> 779.4453145 </td> <td style="text-align:left;"> 8425333 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> 1957 </td> <td style="text-align:left;"> 820.8530296 </td> <td style="text-align:left;"> 9240934 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> 1962 </td> <td style="text-align:left;"> 853.10071 </td> <td style="text-align:left;"> 10267083 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> 1967 </td> <td style="text-align:left;"> 836.1971382 </td> <td style="text-align:left;"> 11537966 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> 1972 </td> <td style="text-align:left;"> 739.9811058 </td> <td style="text-align:left;"> 13079460 </td> </tr> <tr> <td style="text-align:left;"> … </td> <td style="text-align:left;"> … </td> <td style="text-align:left;"> … </td> <td style="text-align:left;"> … </td> </tr> </tbody> </table> ] -- .pull-right.small[ <table> <thead> <tr> <th style="text-align:left;"> country </th> <th 
style="text-align:right;"> year </th> <th style="text-align:left;"> … </th> <th style="text-align:right;"> gdp </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:right;"> 1952 </td> <td style="text-align:left;"> … </td> <td style="text-align:right;"> 6567086330 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:right;"> 1957 </td> <td style="text-align:left;"> … </td> <td style="text-align:right;"> 7585448670 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:right;"> 1962 </td> <td style="text-align:left;"> … </td> <td style="text-align:right;"> 8758855797 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:right;"> 1967 </td> <td style="text-align:left;"> … </td> <td style="text-align:right;"> 9648014150 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:right;"> 1972 </td> <td style="text-align:left;"> … </td> <td style="text-align:right;"> 9678553274 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:right;"> 1977 </td> <td style="text-align:left;"> … </td> <td style="text-align:right;"> 11697659231 </td> </tr> </tbody> </table> ] --- <code class ='r hljs remark-code'>mutate(<b><span style='background-color:#FFDFD1'>gapminder</span></b>, <b><span style='background-color:#FFD0CF'>gdp = gdpPercap * pop,</span></b><br> <b><span style='background-color:#FFD0CF'>pop_mil = round(pop / 1000000)</span></b>)</code> .pull-left.small[ <table> <thead> <tr> <th style="text-align:left;"> country </th> <th style="text-align:left;"> year </th> <th style="text-align:left;"> gdpPercap </th> <th style="text-align:left;"> pop </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> 1952 </td> <td style="text-align:left;"> 779.4453145 </td> <td style="text-align:left;"> 8425333 </td> </tr> <tr> <td 
style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> 1957 </td> <td style="text-align:left;"> 820.8530296 </td> <td style="text-align:left;"> 9240934 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> 1962 </td> <td style="text-align:left;"> 853.10071 </td> <td style="text-align:left;"> 10267083 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> 1967 </td> <td style="text-align:left;"> 836.1971382 </td> <td style="text-align:left;"> 11537966 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> 1972 </td> <td style="text-align:left;"> 739.9811058 </td> <td style="text-align:left;"> 13079460 </td> </tr> <tr> <td style="text-align:left;"> … </td> <td style="text-align:left;"> … </td> <td style="text-align:left;"> … </td> <td style="text-align:left;"> … </td> </tr> </tbody> </table> ] -- .pull-right.small[ <table> <thead> <tr> <th style="text-align:left;"> country </th> <th style="text-align:right;"> year </th> <th style="text-align:left;"> … </th> <th style="text-align:right;"> gdp </th> <th style="text-align:right;"> pop_mil </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:right;"> 1952 </td> <td style="text-align:left;"> … </td> <td style="text-align:right;"> 6567086330 </td> <td style="text-align:right;"> 8 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:right;"> 1957 </td> <td style="text-align:left;"> … </td> <td style="text-align:right;"> 7585448670 </td> <td style="text-align:right;"> 9 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:right;"> 1962 </td> <td style="text-align:left;"> … </td> <td style="text-align:right;"> 8758855797 </td> <td style="text-align:right;"> 10 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:right;"> 1967 </td> <td 
style="text-align:left;"> … </td> <td style="text-align:right;"> 9648014150 </td> <td style="text-align:right;"> 12 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:right;"> 1972 </td> <td style="text-align:left;"> … </td> <td style="text-align:right;"> 9678553274 </td> <td style="text-align:right;"> 13 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:right;"> 1977 </td> <td style="text-align:left;"> … </td> <td style="text-align:right;"> 11697659231 </td> <td style="text-align:right;"> 15 </td> </tr> </tbody> </table> ] --- class: title title-2 # `ifelse()` .box-inv-2[Do conditional tests within `mutate()`] .pull-left[ <code class ='r hljs remark-code'>ifelse(<b><span style='background-color:#FFC0DC'>TEST</span></b>, <br> <b><span style='background-color:#FFDFD1'>VALUE_IF_TRUE</span></b>, <br> <b><span style='background-color:#CBB5FF'>VALUE_IF_FALSE</span></b>)</code> ] .pull-right[ - <b><span style="background: #FFC0DC">`TEST`</span></b> = A logical test - <b><span style="background: #FFDFD1">`VALUE_IF_TRUE`</span></b> = What happens if test is true - <b><span style="background: #CBB5FF">`VALUE_IF_FALSE`</span></b> = What happens if test is false ] --- <code class ='r hljs remark-code'>mutate(gapminder, <br> after_1960 = ifelse(<b><span style='background-color:#FFC0DC'>year > 1960</span></b>, <b><span style='background-color:#FFDFD1'>TRUE</span></b>, <b><span style='background-color:#CBB5FF'>FALSE</span></b>))</code> <code class ='r hljs remark-code'>mutate(gapminder, <br> after_1960 = ifelse(<b><span style='background-color:#FFC0DC'>year > 1960</span></b>, <br> <b><span style='background-color:#FFDFD1'>"After 1960"</span></b>, <br> <b><span style='background-color:#CBB5FF'>"Before 1960"</span></b>))</code> --- class: title title-2 section-title-inv-2 # Your turn #3: Mutating .box-2[Use `mutate()` to…] 1. Add an `africa` column that is TRUE if the country is on the African continent 2. 
Add a column for logged GDP per capita (hint: use `log()`)
3. Add an `africa_asia` column that says “Africa or Asia” if the country is in Africa or Asia, and “Not Africa or Asia” if it’s not

---

```r
mutate(gapminder, africa = ifelse(continent == "Africa",
                                  TRUE, FALSE))
```

--

```r
mutate(gapminder, log_gdpPercap = log(gdpPercap))
```

--

```r
mutate(gapminder,
       africa_asia = ifelse(continent %in% c("Africa", "Asia"),
                            "Africa or Asia",
                            "Not Africa or Asia"))
```

---
class: title title-2

# What if you have multiple verbs?

.box-inv-2.sp-after[Make a dataset for just 2002 *and* calculate logged GDP per capita]

--

.box-inv-2[Solution 1: Intermediate variables]

<code class ='r hljs remark-code'><b><span style='background-color:#FFC0DC'>gapminder_2002</span></b> <- filter(gapminder, year == 2002)<br><br><b><span style='background-color:#FFC0DC'>gapminder_2002</span></b>_log <- mutate(<b><span style='background-color:#FFC0DC'>gapminder_2002</span></b>,<br> log_gdpPercap = log(gdpPercap))</code>

---
class: title title-2

# What if you have multiple verbs?

.box-inv-2.sp-after[Make a dataset for just 2002 *and* calculate logged GDP per capita]

.box-inv-2[Solution 2: Nested functions]

<code class ='r hljs remark-code'><b><span style='background-color:#FFC0DC'>filter(</span></b><b><span style='background-color:#FFDFD1'>mutate(gapminder,</span></b> <br> <b><span style='background-color:#FFDFD1'>log_gdpPercap = log(gdpPercap))</span></b>, <br> <b><span style='background-color:#FFC0DC'>year == 2002)</span></b></code>

---
class: title title-2

# What if you have multiple verbs?

.box-inv-2.sp-after[Make a dataset for just 2002 *and* calculate logged GDP per capita]

.box-inv-2[Solution 3: Pipes!]
.box-inv-2[The `%>%` operator (pipe) takes an object on the left<br>and passes it as the first argument of the function on the right] <code class ='r hljs remark-code'><b><span style='background-color:#FFC0DC'>gapminder</span></b> %>% filter(<b><span style='background-color:#FFC0DC'>_____</span></b>, country == "Canada")</code> --- class: title title-2 # What if you have multiple verbs? .box-inv-2[These do the same thing!] <code class ='r hljs remark-code'>filter(<b><span style='background-color:#FFC0DC'>gapminder</span></b>, country == "Canada")</code> <code class ='r hljs remark-code'><b><span style='background-color:#FFC0DC'>gapminder</span></b> %>% filter(country == "Canada")</code> --- class: title title-2 # What if you have multiple verbs? .box-inv-2.sp-after[Make a dataset for just 2002 *and* calculate logged GDP per capita] .box-inv-2[Solution 3: Pipes!] <code class ='r hljs remark-code'>gapminder %>% <br> filter(year == 2002) %>% <br> mutate(log_gdpPercap = log(gdpPercap))</code> --- class: title title-2 # `%>%` <code class ='r hljs remark-code'><b>leave_house</b>(<b>get_dressed</b>(<b>get_out_of_bed</b>(<b>wake_up</b>(<span style='color:#E16462'>me</span>, <span style='color:#0D0887'>time</span> = <span style='color:#E16462'>"8:00"</span>), <span style='color:#0D0887'>side</span> = <span style='color:#E16462'>"correct"</span>), <span style='color:#0D0887'>pants</span> = <span style='color:#E16462'>TRUE</span>, <span style='color:#0D0887'>shirt</span> = <span style='color:#E16462'>TRUE</span>), <span style='color:#0D0887'>car</span> = <span style='color:#E16462'>TRUE</span>, <span style='color:#0D0887'>bike</span> = <span style='color:#E16462'>FALSE</span>)</code> -- <code class ='r hljs remark-code'>me %>% <br> <b>wake_up</b>(<span style='color:#0D0887'>time</span> = <span style='color:#E16462'>"8:00"</span>) %>% <br> <b>get_out_of_bed</b>(<span style='color:#0D0887'>side</span> = <span style='color:#E16462'>"correct"</span>) %>% <br> 
<b>get_dressed</b>(<span style='color:#0D0887'>pants</span> = <span style='color:#E16462'>TRUE</span>, <span style='color:#0D0887'>shirt</span> = <span style='color:#E16462'>TRUE</span>) %>% <br> <b>leave_house</b>(<span style='color:#0D0887'>car</span> = <span style='color:#E16462'>TRUE</span>, <span style='color:#0D0887'>bike</span> = <span style='color:#E16462'>FALSE</span>)</code> --- class: title title-2 # `summarize()` .box-inv-2[Compute a table of summaries] <code class ='r hljs remark-code'><b><span style='background-color:#FFDFD1'>gapminder</span></b> %>% summarize(<b><span style='background-color:#FFD0CF'>mean_life = mean(lifeExp)</span></b>)</code> .pull-left.small[ <table> <thead> <tr> <th style="text-align:left;"> country </th> <th style="text-align:left;"> continent </th> <th style="text-align:left;"> year </th> <th style="text-align:left;"> lifeExp </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:left;"> 1952 </td> <td style="text-align:left;"> 28.801 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:left;"> 1957 </td> <td style="text-align:left;"> 30.332 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:left;"> 1962 </td> <td style="text-align:left;"> 31.997 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:left;"> 1967 </td> <td style="text-align:left;"> 34.02 </td> </tr> <tr> <td style="text-align:left;"> … </td> <td style="text-align:left;"> … </td> <td style="text-align:left;"> … </td> <td style="text-align:left;"> … </td> </tr> </tbody> </table> ] -- .pull-right.small[ <table> <thead> <tr> <th style="text-align:right;"> mean_life </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 59.47444 </td> </tr> 
</tbody> </table> ] --- class: title title-2 # `summarize()` <code class ='r hljs remark-code'><b><span style='background-color:#FFDFD1'>gapminder</span></b> %>% summarize(<b><span style='background-color:#FFD0CF'>mean_life = mean(lifeExp),</span></b><br> <b><span style='background-color:#FFD0CF'>min_life = min(lifeExp)</span></b>)</code> .pull-left.small[ <table> <thead> <tr> <th style="text-align:left;"> country </th> <th style="text-align:left;"> continent </th> <th style="text-align:left;"> year </th> <th style="text-align:left;"> lifeExp </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:left;"> 1952 </td> <td style="text-align:left;"> 28.801 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:left;"> 1957 </td> <td style="text-align:left;"> 30.332 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:left;"> 1962 </td> <td style="text-align:left;"> 31.997 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:left;"> 1967 </td> <td style="text-align:left;"> 34.02 </td> </tr> <tr> <td style="text-align:left;"> Afghanistan </td> <td style="text-align:left;"> Asia </td> <td style="text-align:left;"> 1972 </td> <td style="text-align:left;"> 36.088 </td> </tr> <tr> <td style="text-align:left;"> … </td> <td style="text-align:left;"> … </td> <td style="text-align:left;"> … </td> <td style="text-align:left;"> … </td> </tr> </tbody> </table> ] -- .pull-right.small[ <table> <thead> <tr> <th style="text-align:right;"> mean_life </th> <th style="text-align:right;"> min_life </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 59.47444 </td> <td style="text-align:right;"> 23.599 </td> </tr> </tbody> </table> ] --- class: title title-2 
section-title-inv-2

# Your turn #4: Summarizing

.box-2[Use `summarize()` to calculate…]

1. The first (minimum) year in the dataset
2. The last (maximum) year in the dataset
3. The number of rows in the dataset (use the cheatsheet)
4. The number of distinct countries in the dataset (use the cheatsheet)

---

```r
gapminder %>%
  summarize(first = min(year),
            last = max(year),
            num_rows = n(),
            num_unique = n_distinct(country))
```

.small[
<table>
 <thead>
  <tr> <th style="text-align:right;"> first </th> <th style="text-align:right;"> last </th> <th style="text-align:right;"> num_rows </th> <th style="text-align:right;"> num_unique </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:right;"> 1952 </td> <td style="text-align:right;"> 2007 </td> <td style="text-align:right;"> 1704 </td> <td style="text-align:right;"> 142 </td> </tr>
</tbody>
</table>
]

---
class: title title-2 section-title-inv-2

# Your turn #5: Summarizing

.box-2[Use `filter()` and `summarize()` to calculate<br>(1) the number of unique countries and<br>(2) the median life expectancy on the<br>African continent in 2007]

---

```r
gapminder %>%
  filter(continent == "Africa", year == 2007) %>%
  summarize(n_countries = n_distinct(country),
            med_le = median(lifeExp))
```

.small[
<table>
 <thead>
  <tr> <th style="text-align:right;"> n_countries </th> <th style="text-align:right;"> med_le </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:right;"> 52 </td> <td style="text-align:right;"> 52.9265 </td> </tr>
</tbody>
</table>
]

---
class: title title-2

# `group_by()`

.box-inv-2[Put rows into groups based on values in a column]

<code class ='r hljs remark-code'><b><span style='background-color:#FFDFD1'>gapminder</span></b> %>% group_by(<b><span style='background-color:#FFD0CF'>continent</span></b>)</code>

--

.box-inv-2[Nothing happens by itself!]
-- .box-inv-2[Powerful when combined with `summarize()`] --- ```r gapminder %>% group_by(continent) %>% summarize(n_countries = n_distinct(country)) ``` -- .small[ <table> <thead> <tr> <th style="text-align:left;"> continent </th> <th style="text-align:right;"> n_countries </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Africa </td> <td style="text-align:right;"> 52 </td> </tr> <tr> <td style="text-align:left;"> Americas </td> <td style="text-align:right;"> 25 </td> </tr> <tr> <td style="text-align:left;"> Asia </td> <td style="text-align:right;"> 33 </td> </tr> <tr> <td style="text-align:left;"> Europe </td> <td style="text-align:right;"> 30 </td> </tr> <tr> <td style="text-align:left;"> Oceania </td> <td style="text-align:right;"> 2 </td> </tr> </tbody> </table> ] --- <code class ='r hljs remark-code'><b><span style='background-color:#FFDFD1'>pollution</span></b> %>% <br> summarize(<b><span style='background-color:#FFD0CF'>mean = mean(amount), sum = sum(amount), n = n()</span></b>)</code> .pull-left.small[ <table> <thead> <tr> <th style="text-align:left;"> city </th> <th style="text-align:left;"> particle_size </th> <th style="text-align:right;"> amount </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> New York </td> <td style="text-align:left;"> Large </td> <td style="text-align:right;"> 23 </td> </tr> <tr> <td style="text-align:left;"> New York </td> <td style="text-align:left;"> Small </td> <td style="text-align:right;"> 14 </td> </tr> <tr> <td style="text-align:left;"> London </td> <td style="text-align:left;"> Large </td> <td style="text-align:right;"> 22 </td> </tr> <tr> <td style="text-align:left;"> London </td> <td style="text-align:left;"> Small </td> <td style="text-align:right;"> 16 </td> </tr> <tr> <td style="text-align:left;"> Beijing </td> <td style="text-align:left;"> Large </td> <td style="text-align:right;"> 121 </td> </tr> <tr> <td style="text-align:left;"> Beijing </td> <td style="text-align:left;"> Small </td> <td 
style="text-align:right;"> 56 </td> </tr> </tbody> </table> ] -- .pull-right.small[ <table> <thead> <tr> <th style="text-align:right;"> mean </th> <th style="text-align:right;"> sum </th> <th style="text-align:right;"> n </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 42 </td> <td style="text-align:right;"> 252 </td> <td style="text-align:right;"> 6 </td> </tr> </tbody> </table> ] --- <code class ='r hljs remark-code'><b><span style='background-color:#FFDFD1'>pollution</span></b> %>% <br> group_by(<b><span style='background-color:#FFD0CF'>city</span></b>) %>% <br> summarize(<b><span style='background-color:#FFD0CF'>mean = mean(amount), sum = sum(amount), n = n()</span></b>)</code> .pull-left.small[ <table> <thead> <tr> <th style="text-align:left;"> city </th> <th style="text-align:left;"> particle_size </th> <th style="text-align:right;"> amount </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #B2B1F9 !important;"> New York </td> <td style="text-align:left;background-color: #B2B1F9 !important;"> Large </td> <td style="text-align:right;background-color: #B2B1F9 !important;"> 23 </td> </tr> <tr> <td style="text-align:left;background-color: #B2B1F9 !important;"> New York </td> <td style="text-align:left;background-color: #B2B1F9 !important;"> Small </td> <td style="text-align:right;background-color: #B2B1F9 !important;"> 14 </td> </tr> <tr> <td style="text-align:left;background-color: #EFB3FF !important;"> London </td> <td style="text-align:left;background-color: #EFB3FF !important;"> Large </td> <td style="text-align:right;background-color: #EFB3FF !important;"> 22 </td> </tr> <tr> <td style="text-align:left;background-color: #EFB3FF !important;"> London </td> <td style="text-align:left;background-color: #EFB3FF !important;"> Small </td> <td style="text-align:right;background-color: #EFB3FF !important;"> 16 </td> </tr> <tr> <td style="text-align:left;background-color: #FFD0CF !important;"> Beijing </td> <td 
style="text-align:left;background-color: #FFD0CF !important;"> Large </td> <td style="text-align:right;background-color: #FFD0CF !important;"> 121 </td> </tr> <tr> <td style="text-align:left;background-color: #FFD0CF !important;"> Beijing </td> <td style="text-align:left;background-color: #FFD0CF !important;"> Small </td> <td style="text-align:right;background-color: #FFD0CF !important;"> 56 </td> </tr> </tbody> </table> ] -- .pull-right.small[ <table> <thead> <tr> <th style="text-align:left;"> city </th> <th style="text-align:right;"> mean </th> <th style="text-align:right;"> sum </th> <th style="text-align:right;"> n </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #FFD0CF !important;"> Beijing </td> <td style="text-align:right;background-color: #FFD0CF !important;"> 88.5 </td> <td style="text-align:right;background-color: #FFD0CF !important;"> 177 </td> <td style="text-align:right;background-color: #FFD0CF !important;"> 2 </td> </tr> <tr> <td style="text-align:left;background-color: #EFB3FF !important;"> London </td> <td style="text-align:right;background-color: #EFB3FF !important;"> 19.0 </td> <td style="text-align:right;background-color: #EFB3FF !important;"> 38 </td> <td style="text-align:right;background-color: #EFB3FF !important;"> 2 </td> </tr> <tr> <td style="text-align:left;background-color: #B2B1F9 !important;"> New York </td> <td style="text-align:right;background-color: #B2B1F9 !important;"> 18.5 </td> <td style="text-align:right;background-color: #B2B1F9 !important;"> 37 </td> <td style="text-align:right;background-color: #B2B1F9 !important;"> 2 </td> </tr> </tbody> </table> ] --- <code class ='r hljs remark-code'><b><span style='background-color:#FFDFD1'>pollution</span></b> %>% <br> group_by(<b><span style='background-color:#FFD0CF'>particle_size</span></b>) %>% <br> summarize(<b><span style='background-color:#FFD0CF'>mean = mean(amount), sum = sum(amount), n = n()</span></b>)</code> .pull-left.small[ <table> <thead> 
<tr> <th style="text-align:left;"> city </th> <th style="text-align:left;"> particle_size </th> <th style="text-align:right;"> amount </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #FFDFD1 !important;"> New York </td> <td style="text-align:left;background-color: #FFDFD1 !important;"> Large </td> <td style="text-align:right;background-color: #FFDFD1 !important;"> 23 </td> </tr> <tr> <td style="text-align:left;background-color: #FFF0D4 !important;"> New York </td> <td style="text-align:left;background-color: #FFF0D4 !important;"> Small </td> <td style="text-align:right;background-color: #FFF0D4 !important;"> 14 </td> </tr> <tr> <td style="text-align:left;background-color: #FFDFD1 !important;"> London </td> <td style="text-align:left;background-color: #FFDFD1 !important;"> Large </td> <td style="text-align:right;background-color: #FFDFD1 !important;"> 22 </td> </tr> <tr> <td style="text-align:left;background-color: #FFF0D4 !important;"> London </td> <td style="text-align:left;background-color: #FFF0D4 !important;"> Small </td> <td style="text-align:right;background-color: #FFF0D4 !important;"> 16 </td> </tr> <tr> <td style="text-align:left;background-color: #FFDFD1 !important;"> Beijing </td> <td style="text-align:left;background-color: #FFDFD1 !important;"> Large </td> <td style="text-align:right;background-color: #FFDFD1 !important;"> 121 </td> </tr> <tr> <td style="text-align:left;background-color: #FFF0D4 !important;"> Beijing </td> <td style="text-align:left;background-color: #FFF0D4 !important;"> Small </td> <td style="text-align:right;background-color: #FFF0D4 !important;"> 56 </td> </tr> </tbody> </table> ] -- .pull-right.small[ <table> <thead> <tr> <th style="text-align:left;"> particle_size </th> <th style="text-align:right;"> mean </th> <th style="text-align:right;"> sum </th> <th style="text-align:right;"> n </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #FFDFD1 !important;"> Large </td> <td 
style="text-align:right;background-color: #FFDFD1 !important;"> 55.33333 </td> <td style="text-align:right;background-color: #FFDFD1 !important;"> 166 </td> <td style="text-align:right;background-color: #FFDFD1 !important;"> 3 </td> </tr> <tr> <td style="text-align:left;background-color: #FFF0D4 !important;"> Small </td> <td style="text-align:right;background-color: #FFF0D4 !important;"> 28.66667 </td> <td style="text-align:right;background-color: #FFF0D4 !important;"> 86 </td> <td style="text-align:right;background-color: #FFF0D4 !important;"> 3 </td> </tr> </tbody> </table> ] --- class: title title-2 section-title-inv-2 # Your turn #6: Grouping and summarizing .box-2[Find the minimum, maximum, and median<br>life expectancy for each continent] .box-2[Find the minimum, maximum, and median<br>life expectancy for each continent in 2007 only] --- ```r gapminder %>% group_by(continent) %>% summarize(min_le = min(lifeExp), max_le = max(lifeExp), med_le = median(lifeExp)) ``` -- ```r gapminder %>% filter(year == 2007) %>% group_by(continent) %>% summarize(min_le = min(lifeExp), max_le = max(lifeExp), med_le = median(lifeExp)) ``` --- class: title title-2 # dplyr: verbs for manipulating data <table> <tr> <td>Extract rows with <code>filter()</code></td> <td><img src="img/01/filter.png" alt="filter" title="filter" height="80px"></td> </tr> <tr> <td>Extract columns with <code>select()</code></td> <td><img src="img/01/select.png" alt="select" title="select" height="80px"></td> </tr> <tr> <td>Arrange/sort rows with <code>arrange()</code></td> <td><img src="img/01/arrange.png" alt="arrange" title="arrange" height="80px"></td> </tr> <tr> <td>Make new columns with <code>mutate()</code></td> <td><img src="img/01/mutate.png" alt="mutate" title="mutate" height="80px"></td> </tr> <tr> <td>Make group summaries with<br><code>group_by() %>% summarize()</code></td> <td><img src="img/01/summarize.png" alt="summarize" title="summarize" height="80px"></td> </tr> </table>
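
---
class: title title-2

# Putting the verbs together

.box-inv-2.small[A minimal sketch of how the verbs above might chain together with `%>%`. It rebuilds the small `pollution` table from the earlier slides by hand. The cutoff of 50 and the names `high_cutoff`, `level`, `mean_amount`, and `n_high` are assumptions made up for illustration, not part of the lab.]

```r
library(dplyr)

# Toy data typed in from the pollution table shown in the group_by() slides
pollution <- data.frame(
  city = c("New York", "New York", "London", "London", "Beijing", "Beijing"),
  particle_size = c("Large", "Small", "Large", "Small", "Large", "Small"),
  amount = c(23, 14, 22, 16, 121, 56)
)

# Hypothetical threshold, chosen only for this illustration
high_cutoff <- 50

city_summary <- pollution %>%
  # mutate() + ifelse(): label each reading relative to the cutoff
  mutate(level = ifelse(amount > high_cutoff, "High", "Low")) %>%
  # group_by() + summarize(): one row per city
  group_by(city) %>%
  summarize(mean_amount = mean(amount),
            n_high = sum(level == "High")) %>%
  # arrange(): sort cities from highest to lowest mean
  arrange(desc(mean_amount))

city_summary
```

.box-inv-2.small[Each step receives the data frame produced by the step before it, so the pipeline reads top to bottom in the order the work happens. Sorting with `desc(mean_amount)` puts Beijing first.]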