Data Visualisation & Data Organisation in Spreadsheets

CVEN 5837 - Summer 2023

Lars Schöbitz

Solving coding problems

Tips for search engines

  • Use actionable verbs that describe what you want to do
  • Be specific
  • Add R to the search query
  • Add the name of the R package to the search query
  • Scroll through the top 5 results (don’t just pick the first)

Example: “How to remove a legend from a plot in R ggplot2”

Stack Overflow

What is it?

  • The biggest support network for (coding) problems
  • Can be intimidating at first
  • Up-vote system

Workflow

  • First, briefly read the question that was posted
  • Then, read the answer marked as “correct”
  • Then, read one or two more answers with high votes
  • Then, check out the “Linked” posts
  • Always give credit for the solution

Give credit

ggplot(data = global_waste_data_kg_year,
       mapping = aes(x = income_id, 
                     y = capita_kg_year,
                     color = income_id)) +
  ## Remove legend ref: https://stackoverflow.com/a/35622358/6816220
  theme(legend.position = "none")

Other sources for help

  • The Canvas Discussion pages
  • RStudio Community Forum: https://community.rstudio.com/
  • Documentation websites: https://dplyr.tidyverse.org/
  • Twitter community: #rstats

Minimal reproducible example (reprex)

  • Needed when asking questions online
  • Good support information: https://www.tidyverse.org/help/#reprex
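
A reprex usually has three parts: the packages it needs, a small dataset defined inside the script, and the shortest code that still shows the problem. The sketch below is one possible shape, not a fixed template. The data frame `waste` and its values are made up for illustration; the column names are borrowed from the "Give credit" example.

```r
# 1. Load only the packages the question depends on
library(ggplot2)

# 2. Define a tiny dataset inline (hypothetical values) so that
#    anyone can run the code without access to your files
waste <- data.frame(
  income_id      = c("HIC", "LIC", "HIC", "LIC"),
  capita_kg_year = c(520, 150, 480, 170)
)

# 3. The shortest code that still shows the problem, here:
#    "How do I remove the legend from this plot?"
p <- ggplot(data = waste,
            mapping = aes(x = income_id,
                          y = capita_kg_year,
                          color = income_id)) +
  geom_point()

p
```

If the reprex package is installed, copying this code to the clipboard and calling `reprex::reprex()` renders it together with its output, ready to paste into a forum post.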

Homework assignment solutions

remember: git clone

remember: git commit

remember: git push

collaborate: git clone

track work: git commit

update: git ???

update: git push

git ???

new: git pull

Learning Objectives (for this week)

  1. Learners can describe the four main aesthetic mappings that can be used to visualise data using the ggplot2 R Package.
  2. Learners can control the colour scaling applied to a plot using colour as an aesthetic mapping.
  3. Learners can compare three different geoms and their use case.
  4. Learners can apply a theme to control font types and sizes within a plot.
  5. Learners can apply 12 principles for data organisation in spreadsheets in the layout of a collected dataset.

Exploratory Data Analysis with ggplot2

R Package ggplot2

  • ggplot2 is tidyverse’s data visualization package
  • gg in ggplot2 stands for Grammar of Graphics
  • Inspired by the book The Grammar of Graphics by Leland Wilkinson
  • Documentation: https://ggplot2.tidyverse.org/
  • Book: https://ggplot2-book.org

Code structure

  • ggplot() is the main function in ggplot2
  • Plots are constructed in layers
  • Structure of the code for plots can be summarized as
ggplot(data = [dataset], 
       mapping = aes(x = [x-variable], 
                     y = [y-variable])) +
   geom_xxx() +
   other options

Code structure

ggplot()

Code structure

ggplot(data = gapminder_yr_2007)

Code structure

ggplot(data = gapminder_yr_2007,
       mapping = aes()) 

Code structure

ggplot(data = gapminder_yr_2007,
       mapping = aes(x = continent,
                     y = lifeExp))  

Code structure

ggplot(data = gapminder_yr_2007,
       mapping = aes(x = continent,
                     y = lifeExp)) +
  geom_boxplot() 

Code structure

ggplot(data = gapminder_yr_2007,
       mapping = aes(x = continent,
                     y = lifeExp)) +
  geom_boxplot() +
  theme_minimal()

Code structure

ggplot(data = gapminder_yr_2007,
       mapping = aes(x = continent,
                     y = lifeExp)) +
  geom_boxplot() +
  theme_minimal(base_size = 14)

Live Coding Exercise: Reproduce this plot

live-02a-data-visualiation

  1. Head over to rstudio.cloud
  2. Open the workspace for the course (cven5837-ss22)
  3. Open “Projects”
  4. Open the “course-materials” project
  5. Follow along with me

Break

10:00

Visualising numerical data

Types of variables

numerical

discrete variables

  • non-negative
  • whole numbers
  • e.g. number of students, roll of a die

continuous variables

  • infinite number of values
  • also dates and times
  • e.g. length, weight, size

non-numerical

categorical variables

  • finite number of values
  • distinct groups (e.g. EU countries, continents)
  • ordinal if levels have natural ordering (e.g. week days, school grades)
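
These variable types map roughly onto R's vector types. The sketch below is illustrative only; all object names and values are made up.

```r
# Discrete numerical: whole numbers, stored as integer (note the L suffix)
n_students <- c(12L, 25L, 18L)

# Continuous numerical: stored as double
weight_kg <- c(61.5, 72.3, 58.0)

# Dates are treated as continuous; ISO 8601 strings convert directly
sampling_date <- as.Date(c("2023-06-05", "2023-06-12", "2023-06-19"))

# Categorical: a factor with a finite set of levels and no ordering
continent <- factor(c("Asia", "Europe", "Asia"))

# Ordinal: an ordered factor, with levels listed from lowest to highest
school_grade <- factor(c("B", "A", "C"),
                       levels = c("C", "B", "A"),
                       ordered = TRUE)

# Ordered factors support comparisons between levels
school_grade[2] > school_grade[1]
```

Which type a column has matters for plotting: ggplot2 chooses discrete or continuous scales based on it.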

data-to-viz.com

Data Collection Tools

  • Questionnaires for survey based data
  • Spreadsheets for manual experimental/observational data
  • Sensors for automated near real-time data

Survey tools

Commonly used in the Global Engineering and Development sector

  • KoboToolbox
  • mWater
  • Open Data Kit (ODK)

Data Organisation in Spreadsheets

Read the paper (it’s part of your homework), but you can also:

  • Go through the annotated slides: https://kbroman.org/Talk_DataOrg/dataorg_notes.pdf
  • Watch Karl Broman give the talk (02:36 to 45:00): https://youtu.be/t74E0a90gkA?t=156
  • Read the content on a website: https://kbroman.org/dataorg/

But, especially apply it to your data

Why?

Because it will make your life easier!

License? CC0 (!)

Homework week 2

Bring your own data

  • Generate data by doing a short survey or observational study
  • Find a dataset online that interests you
  • Use a dataset that you already have available

Homework due dates

  • All material on course website
  • Homework assignment & learning reflection due: Friday, June 23rd

Thanks! 🌻

Slides created via revealjs and Quarto: https://quarto.org/docs/presentations/revealjs/

Access slides as PDF on GitHub

All material is licensed under Creative Commons Attribution Share Alike 4.0 International.