Assignment 6: Web Data Formats, SQL, and Database Analysis

1. Look up websites containing the following data representations:

a) Using JSON

b) Using XML

2. SQL exercise:

i. Express the following query in SQL using no subqueries and no set operations. (Hint: left outer join)

select id 
from student 
except 
select s_id 
from advisor 
where i_id is not null

New query:

select id
from student
left outer join advisor
on student.id = advisor.s_id
where i_id is null;
ID
19991
54321
55739
70557

ii. Using the university schema, write an SQL query to find the names and IDs of those instructors who teach every course taught in his or her department (i.e., every course that appears in the course relation with the instructor’s department name). Order result by name.

select i.name, i.ID
from instructor i
where not exists (
    select 1
    from course c
    where c.dept_name = i.dept_name
      and not exists (
          select 1
          from teaches t
          where t.ID = i.ID
            and t.course_id = c.course_id
      )
)
order by i.name;
name ID
Einstein 22222
El Said 32343
Kim 98345
Mozart 15151
Wu 12121

3. R and PostgreSQL

# Connect PostgreSQL database using R
# Packages: DBI, odbc, RPostgres
# Documentation:
## DBI: https://dbi.r-dbi.org
## RPostgres https://github.com/r-dbi/RPostgres
## R https://solutions.posit.co/connections/db/
## odbc: https://solutions.posit.co/connections/db/best-practices/drivers/

## Load libraries
library(RPostgres) # Provides the Postgres() driver
Warning: package 'RPostgres' was built under R version 4.5.3
library(DBI)        # Generic R Database Interface
Warning: package 'DBI' was built under R version 4.5.3
library(odbc)       # Interface to ODBC driver
Warning: package 'odbc' was built under R version 4.5.2
## Connect to PostgreSQL and database

con <- dbConnect(
  RPostgres::Postgres(),
  dbname   = "University",   
  host     = "localhost",    
  port     = 5432,           
  user     = "postgres",     
  password = Sys.getenv("DB_PASSWORD")     
)


## Perform queries

# (a) Simple query: fetch all rows/columns in 'instructor' and create a data object
instructor_data <- dbGetQuery(con, "SELECT * FROM instructor")
head(instructor_data)
     id           name   dept_name    salary
1 63395       McKinnon Cybernetics  94333.99
2 78699          Pingr  Statistics  59303.62
3 96895           Mird   Marketing 119921.41
4  4233            Luo     English  88791.45
5  4034         Murata   Athletics  61387.56
6 50885 Konstantinides   Languages  32570.50
# (b) Another query: fetch instructors in 'Comp. Sci.' department 
# with a salary > 60000 (example condition)
comp_sci_instructors <- dbGetQuery(
  con, 
  "SELECT * FROM instructor 
   WHERE dept_name = 'Comp. Sci.' AND salary > 60000;"
)
comp_sci_instructors
     id     name  dept_name    salary
1 34175    Bondi Comp. Sci. 115469.11
2  3335 Bourrier Comp. Sci.  80797.83
# (c) Query a different table, e.g., 'student', and store in an R dataframe
student_data <- dbGetQuery(con, "SELECT * FROM student WHERE tot_cred >= 50")
head(student_data)
     id       name  dept_name tot_cred
1 79352      Rumat    Finance      100
2 76672     Miliko Statistics      116
3 14182 Moszkowski Civil Eng.       73
4 44985     Prieto    Biology       91
5 44271    Sowerby    English      108
6 40897    Coppens       Math       58
## Export to CSV

# Export the entire 'instructor' table (already in instructor_data) to CSV
write.csv(instructor_data, file = "instructor_export.csv", row.names = FALSE)

## Clean up