Assignment 6: Web Data Formats, SQL, and Database Analysis
1. Look up websites containing the following data representations:
a) Using JSON
b) Using XML
JSONPlaceholder
JSON data are organized using curly brackets {}, and each object demonstrates a record and arrays maintain numerous records.
The data includes fields such as id, name, username, and address. It is simple to read, which makes it useful for web applications.
This type of data representation is most commonly used trough APIs, and the database is created often through SQL and NoSQL.
W3Schools XML
XML is structured like a tree and each element contains opening and closed tags.
The data includes labeled tags such as <name>, <price>, and <description>.
This type of data representation is commonly paired with XPath and XSLT processing, and that can be found under developer tools in the settings option on Google.
2. SQL exercise:
i. Express the following query in SQL using no subqueries and no set operations. (Hint: left outer join)
selectidfrom student exceptselect s_id from advisor where i_id isnotnull
ii. Using the university schema, write an SQL query to find the names and IDs of those instructors who teach every course taught in his or her department (i.e., every course that appears in the course relation with the instructor’s department name). Order result by name.
# Connect PostgreSQL database using R# Packages: DBI, odbc, RPostgres# Documentation:## DBI: https://dbi.r-dbi.org## RPostgres https://github.com/r-dbi/RPostgres## R https://solutions.posit.co/connections/db/## odbc: https://solutions.posit.co/connections/db/best-practices/drivers/## Load librarieslibrary(RPostgres) # Provides the Postgres() driver
Warning: package 'RPostgres' was built under R version 4.5.3
library(DBI) # Generic R Database Interface
Warning: package 'DBI' was built under R version 4.5.3
library(odbc) # Interface to ODBC driver
Warning: package 'odbc' was built under R version 4.5.2
## Connect to PostgreSQL and databasecon <-dbConnect( RPostgres::Postgres(),dbname ="University", host ="localhost", port =5432, user ="postgres", password =Sys.getenv("DB_PASSWORD") )## Perform queries# (a) Simple query: fetch all rows/columns in 'instructor' and create a data objectinstructor_data <-dbGetQuery(con, "SELECT * FROM instructor")head(instructor_data)
id name dept_name salary
1 63395 McKinnon Cybernetics 94333.99
2 78699 Pingr Statistics 59303.62
3 96895 Mird Marketing 119921.41
4 4233 Luo English 88791.45
5 4034 Murata Athletics 61387.56
6 50885 Konstantinides Languages 32570.50
# (b) Another query: fetch instructors in 'Comp. Sci.' department # with a salary > 60000 (example condition)comp_sci_instructors <-dbGetQuery( con, "SELECT * FROM instructor WHERE dept_name = 'Comp. Sci.' AND salary > 60000;")comp_sci_instructors
id name dept_name salary
1 34175 Bondi Comp. Sci. 115469.11
2 3335 Bourrier Comp. Sci. 80797.83
# (c) Query a different table, e.g., 'student', and store in an R dataframestudent_data <-dbGetQuery(con, "SELECT * FROM student WHERE tot_cred >= 50")head(student_data)
id name dept_name tot_cred
1 79352 Rumat Finance 100
2 76672 Miliko Statistics 116
3 14182 Moszkowski Civil Eng. 73
4 44985 Prieto Biology 91
5 44271 Sowerby English 108
6 40897 Coppens Math 58
## Export to CSV# Export the entire 'instructor' table (already in instructor_data) to CSVwrite.csv(instructor_data, file ="instructor_export.csv", row.names =FALSE)## Clean up