Assignment 4: Webscraping 1

To collect web data for research, the R programming language and its rvest package offer an efficient approach to web scraping. Using a browser’s developer tools to copy XPaths helps the researcher target and retrieve specific elements of a page. However, not every website allows web scraping, owing to privacy and ethical concerns, so it is best to limit scraping to public data from public websites wherever possible. When scraping a website, researchers should identify the relevant XPaths, integrate them into an R script, and use the rvest package to extract structured data from the page. It is also important to remove unnecessary information so that the data are presentable. Together, these steps make web-based data collection and preparation efficient for research analysis.
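The workflow described above can be sketched as follows. This is a minimal illustration, not the script used in this assignment: the page is a small inline HTML snippet with made-up values so that the example runs offline, and the names `page`, `reserves_xpath`, and `reserves` are assumptions chosen for the example. On a real site, the XPath would be the one copied from the browser’s developer tools.

```r
library(rvest)

# A tiny stand-in page with made-up values, so the sketch runs without a network connection.
page <- minimal_html('
  <table id="reserves">
    <tr><th>Country</th><th>Reserves</th></tr>
    <tr><td>Country A</td><td>1,200</td></tr>
    <tr><td>Country B</td><td>950</td></tr>
  </table>
  <table id="other">
    <tr><th>x</th></tr>
    <tr><td>1</td></tr>
  </table>
')

# Illustrative XPath; in practice, paste the XPath copied from developer tools.
reserves_xpath <- '//*[@id="reserves"]'

# Select the single node matching the XPath and parse it into a tibble.
reserves <- page |>
  html_element(xpath = reserves_xpath) |>
  html_table()

# Show the first rows of the scraped table.
head(reserves, 5)
```

For a live page, `minimal_html()` would be replaced by `read_html()` with the page URL; the remaining steps stay the same.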

To demonstrate this process, I apply the method in this assignment by retrieving data from Wikipedia on foreign exchange reserves. I identified two tables of interest: “Foreign Exchange Reserves By Country” and “Currency Composition Of Foreign Exchange Reserves” (COFER). I extracted the XPath for each table and applied it in R to obtain the datasets. I used the rvest_wiki01.R script provided by Dr. Karl Ho and inserted the XPaths into the code, which returned the tables I needed. However, I did not know how to display the tables in the Quarto file. Fortunately, Dr. Ho suggested creating a data frame and using the head() function to show the first five rows of each scraped table. Following this recommendation, I asked ChatGPT to help me create the data frames and display the first five rows. With the help of AI, I imported the downloaded CSV files into the Quarto page and made the data more presentable by removing unnecessary rows. These steps helped me create a cleaner, better-organized table. Overall, this is an efficient way of organizing, processing, and preparing data from a public source, and it shows the practical use of the rvest package in web-based data acquisition. The code and its output are shown below.

# Load packages
library(tidyverse)

Warning: package 'tidyverse' was built under R version 4.5.3

Warning: package 'ggplot2' was built under R version 4.5.3

Warning: package 'tidyr' was built under R version 4.5.3

Warning: package 'readr' was built under R version 4.5.3

Warning: package 'dplyr' was built under R version 4.5.3

Warning: package 'lubridate' was built under R version 4.5.2

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.1     ✔ readr     2.2.0
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(rvest)


Attaching package: 'rvest'

The following object is masked from 'package:readr':

    guess_encoding

# Import dataset
fores <- read_csv("data/fores.csv", col_types = cols(NA...7 = col_skip(), 
    NA...8 = col_skip(), newdate = col_skip()))

New names:
• `NA` -> `NA...7`
• `NA` -> `NA...8`

View(fores)
foreign_reserves_data <- read_csv("data/foreign_reserves_data.csv", 
    col_types = cols(Other_Currencies = col_skip(), 
        Unallocated_Reserves = col_skip(), 
        `NA` = col_skip()))
View(foreign_reserves_data)

# Top 5 rows for each data frame
head(fores, 5)

# A tibble: 5 × 6
  Rank                               Country   Forexres     Date  Change Sources
  <chr>                              <chr>     <chr>        <chr> <chr>  <chr>  
1 Country(as recognized by the U.N.) Continent Including g… Incl… Exclu… Exclud…
2 Country(as recognized by the U.N.) Continent millions U.… Chan… milli… Change 
3 China                              Asia      3,643,149    41,0… 3,389… 31,221 
4 Japan                              Asia      1,324,210    19,7… 1,230… 16,230 
5 Switzerland                        Europe    1,007,710    13,9… 897,2… 14,490

head(foreign_reserves_data, 5)

# A tibble: 5 × 11
   Year Quarter USD      Euro     JPY    GBP    CAN    CNY    AUD    CHF   Total
  <dbl> <chr>   <chr>    <chr>    <chr>  <chr>  <chr>  <chr>  <chr>  <chr> <chr>
1    NA <NA>    USD      EUR      JPY    GBP    CAD    CNY    AUD    CHF   Total
2  2019 Q1      6,727.09 2,208.79 584.63 495.70 208.64 212.26 181.95 15.27 11,6…
3  2019 Q2      6,752.28 2,264.88 611.87 497.41 209.85 212.80 186.71 15.53 11,7…
4  2019 Q3      6,728.85 2,212.74 612.75 492.22 205.44 213.83 182.48 16.20 11,6…
5  2019 Q4      6,674.83 2,279.30 631.00 511.51 206.71 215.81 187.18 17.36 11,8…
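
In the output above, the first rows of each tibble still hold header text carried over from the Wikipedia tables, and the numeric columns are read as character (`<chr>`) because of the thousands separators. One possible follow-up cleaning step is sketched below. It is an illustration under stated assumptions, not part of the original script: the data frame `raw` is a toy example with made-up values, `to_number()` is a hypothetical helper, and the number of header rows to drop (two here) would need to be checked against the real data.

```r
# Toy data frame mimicking the shape of the scraped table (values are made up).
raw <- data.frame(
  Country  = c("Country(as recognized by the U.N.)",
               "Country(as recognized by the U.N.)",
               "Country A", "Country B"),
  Forexres = c("Including gold", "millions U.S. dollars", "1,200", "950"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip thousands separators and convert text to numeric.
to_number <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

# Drop the two leftover header rows, then convert the text column.
clean <- raw[-(1:2), ]
clean$Forexres <- to_number(clean$Forexres)
rownames(clean) <- NULL

# Inspect the structure: Forexres should now be numeric.
str(clean)
```

The sketch uses base R only, so it runs without extra packages; with the tidyverse already loaded, `readr::parse_number()` would be an alternative for the conversion step.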