Pre-Workshop Setup

Open up Socrative again

QUESTION: How many of you have had to spend time cleaning up a dataset?

One of the good things about databases is that they are picky about data being in consistent formats. This means that data has to be clean to work well with databases.

OpenRefine is a cleaning tool, originally made by Google. There are other ways to clean data (by hand, Excel, R) but we think this is one of the most efficient.

1.5 hours - break - 1.25 hrs

1. OpenRefine (30 minutes)

Importing Data (3 minutes)

Text Facets (8 minutes)

Faceting Challenge #1

Undo/Redo (1 minute)

Transforming Columns (5 minutes)

Transforming Challenge:

Numeric Facets (5 minutes)

---------- Socrative #1 ----------: Clean up DBH, any outliers?

Stacking Facets (5 minutes)

---------- Socrative #2 ----------: # Black Oaks > 40 cm DBH @ BS_GCL

Exporting (1 minute)

Help (1 minute)

30 minutes


2. Intro To Relational Databases (5 minutes)

Why use a database instead of Excel?

Lots of Different types of databases

Exploring in DB Browser (2 minutes)

3. Selecting Data with SELECT (8 minutes)

---------- Socrative #3 ----------: person, quant, reading from Survey

Write a query that displays the person, quant, and reading fields from the Survey table

4. Sorting Data (3 minutes)

5. Filtering with WHERE (15 minutes)

---------- Socrative #4 ----------: normalized salinity outliers ouside of [0.0, 1.0]

Normalized salinity readings are supposed to be between 0.0 and 1.0. Which query will select all records from Survey with salinity values that are not normalized?

63 minutes

6. Missing Data (8 minutes) SKIP/Abbreviate if after 2 PM

---------- Socrative #5 ----------: Visted, sorted by date, excluding NULL

Write a query that sorts the records in Visited by date, excluding entries for which the date is not known

7. Combining Data with JOIN (12 minutes)

---------- Socrative #6 ----------: Site name and salinity readings

Write a query that lists all the salinity readings along with the site names they were collected at

90 minutes: BREAK

8. Data Hygiene (4 minutes)

8. Creating Databases (15 minutes)

DROP TABLE

CREATE TABLE

Data Types

Insert Data

---------- Socrative #7 ----------: Make family table with unique ID, names, and age

Create a table named Family. Make fields for a unique identifier, first and last names, and age. Add a record for yourself, and one for a family member.

Exporting a Database

Importing a CSV

9. Modifying Data (10 minutes)

UPDATE

---------- Socrative #8 ----------: OPTIONAL

We want to change the first salinity value in the Survey table to be 0.5. What happens when we run the following query?

It works just as we expected and changes the first salinity value to 0.5 We get an error It changes all the salinity values to 0.5

---------- Socrative #9 ----------: OPTIONAL

Write a SQL query to replace all the NULL cells in Survey.person with the string 'unknown'

DELETE

10. Calculating New Values (if time)

11. Programming with Databases in R (30 minutes)

Setup

Connecting & Simple Functions

Executing Queries in R

---------- Socrative #10 ----------:

Write some R code to execute one of our JOIN queries from earlier, and return the results in R

Using Databases with dplyr

Inserting Data

Disconnect the database