Weekly Reflections

WEEK 1 REFLECTION (2/10/2026): AI AND ORIGINALITY

  1. AI and Originality

    • Generative AI entered widespread public use with the release of ChatGPT in November 2022, followed by other models such as Google Gemini and Claude. This changed the world, especially for data analysts and researchers, who use these tools for research and coding assistance. An open question is whether AI merely helps users brainstorm and develop their own original ideas, or whether it generates the ideas for them.

    • I argue that AI is not a tool for generating original ideas; rather, it draws on and remixes existing research. I also argue that hardly any ideas are completely “original” nowadays, as most designs have been published before, even though individuals continue to reinvent them and use them in new contexts.

  2. AGI

    • Google AI defines Artificial General Intelligence (AGI) as a theoretical type of AI that can understand, learn, and perform any intellectual task that a human can accomplish. Furthermore, it is envisioned as a research agent that helps achieve breakthroughs in scientific research and other disciplines.

    • I believe that it can be used as a research assistant to produce higher-quality research, but I also argue that it can be misused when the model is prompted to do the research for the user. Its output may not be completely accurate, and relying on it this way is an academically dishonest way of conducting research and analyses.

WEEK 2 REFLECTION (2/17/2026): AI AND RESEARCH PROJECT

  1. AI for Knowledge Mining Project

    • For the group project, my colleagues and I believe we can use AI to help convert the data in our STATA zip file into a format that can be read in RStudio.

  2. AI Model Selections and Research Applications

    • We are using ChatGPT and Google Gemini for our project. Some of my colleagues are new to RStudio, so they will ask these tools for help if they run into any errors loading the STATA zip file into RStudio. Additionally, I will help my colleagues with the RStudio code and the code suggested by Google Gemini to extract our results.
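
One way the loading step could look in R is sketched below. It assumes the haven package is installed and that the archive contains a .dta file. The file name survey.dta, the archive name in the comment, and the toy data are hypothetical stand-ins, since the project's real file is not shown here.

```r
# Minimal sketch: read a Stata .dta file into R with haven.
# The file name and toy data below are hypothetical stand-ins.
library(haven)

work_dir <- tempfile("stata_demo")
dir.create(work_dir)

# Stand-in for the extracted archive contents. With a real archive, this step
# would instead be: unzip("project_data.zip", exdir = work_dir)
toy <- data.frame(id = 1:3, income = c(10.5, 20, 30.25))
write_dta(toy, file.path(work_dir, "survey.dta"))

# Locate the .dta file and load it as a tibble
dta_files <- list.files(work_dir, pattern = "\\.dta$", full.names = TRUE)
survey <- read_dta(dta_files[1])

# Convert to a plain data frame for use with base R functions
survey_df <- as.data.frame(survey)
str(survey_df)
```

Reading the file directly this way avoids a manual export to CSV and keeps Stata variable labels available as attributes.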

WEEK 3 REFLECTION (2/24/2026): HUMAN KNOWLEDGE, INFORMATION, AND AI DISCREPANCIES/FAILURES

  1. Human Knowledge vs Information

    • Individuals find and build knowledge based on their experiences and the society they grew up in. According to Google AI, human knowledge is built mainly through logical reasoning, memory, and information received from authorities, books, or education.

    • Knowledge is a person’s understanding of a topic, while information is the data a person holds. Information can be transferred and stored, such as an email with a list of documents, which persists once it is saved. Knowledge, on the other hand, requires the mental and logical capability to review and understand the data provided. In the email example, a person may keep the email with the listed documents, but they still need to read the documents and understand them, and that understanding may later be forgotten.

  2. AI Discrepancies and Failures

    • AI can produce false data and information, leading to problems such as fabricated legal cases, political bias, and financial losses from incorrect predictions. Specifically:

      • Hallucinations: inventing fake policies or fake news.

      • Embedded Bias and Propaganda: portraying Iran and Afghanistan as women-friendly nations despite the unfair treatment of women’s rights in those countries.

      • Financial and Strategic Miscalculations: Zillow’s home-buying algorithm overestimated house prices, causing millions of dollars in losses and layoffs.

    • One pattern that makes these discrepancies noticeable is that AI-generated information and misinformation can be used as tools to mislead individuals and cause them harm.

WEEK 4 REFLECTION (3/3/2026): MACHINE LEARNING PIPELINE ON CYBERCRIME ANALYSIS AND FINANCIAL LOSSES

*AI Use Disclaimer: AI tools such as ChatGPT were used as assistants during this reflection to help with understanding machine learning concepts and fixing code. All analysis, interpretation, and final decisions were reviewed and completed by the author.*

library(tidyverse)
data <- read.csv("C:/Users/akbar/Downloads/LossFromNetCrime.csv",
                 stringsAsFactors = FALSE)

# Convert all complaint/loss columns except country to numeric
num_cols <- names(data)[names(data) != "Country"]

data[num_cols] <- lapply(data[num_cols], function(x) as.numeric(gsub(",", "", x)))

str(data)
'data.frame':   117 obs. of  13 variables:
 $ Country         : chr  "PR" "PS" "PT" "PY" ...
 $ X2019_Complaints: num  655 1784 1119 1913 5503 ...
 $ X2019_Losses    : num  5929974 22483591 13870074 10967865 48101706 ...
 $ X2020_Complaints: num  1338 2890 2020 2992 7390 ...
 $ X2020_Losses    : num  7209755 25423219 12391290 13815152 81178182 ...
 $ X2021_Complaints: num  1785 3352 2102 3188 10164 ...
 $ X2021_Losses    : num  9.46e+06 4.89e+07 1.82e+07 2.67e+07 1.32e+08 ...
 $ X2022_Complaints: num  1594 3210 1918 3768 10042 ...
 $ X2022_Losses    : num  1.72e+07 5.78e+07 3.09e+07 4.01e+07 1.87e+08 ...
 $ X2023_Complaints: num  1817 3378 2178 3487 11034 ...
 $ X2023_Losses    : num  2.10e+07 6.93e+07 2.87e+07 3.36e+07 2.44e+08 ...
 $ X2024_Complaints: num  1974 3811 2209 2678 12071 ...
 $ X2024_Losses    : num  3.15e+07 6.60e+07 4.02e+07 4.52e+07 2.81e+08 ...
summary(data)
   Country          X2019_Complaints  X2019_Losses       X2020_Complaints
 Length:117         Min.   :   216   Min.   :2.498e+06   Min.   :   362  
 Class :character   1st Qu.:  1119   1st Qu.:1.179e+07   1st Qu.:  1937  
 Mode  :character   Median :  5156   Median :4.571e+07   Median :  8187  
                    Mean   : 24650   Mean   :2.366e+08   Mean   : 43249  
                    3rd Qu.: 16525   3rd Qu.:2.025e+08   3rd Qu.: 28232  
                    Max.   :449305   Max.   :3.303e+09   Max.   :796395  
  X2020_Losses       X2021_Complaints  X2021_Losses       X2022_Complaints
 Min.   :2.673e+06   Min.   :   391   Min.   :4.363e+06   Min.   :   458  
 1st Qu.:1.382e+07   1st Qu.:  2102   1st Qu.:2.286e+07   1st Qu.:  1918  
 Median :4.529e+07   Median :  9415   Median :1.000e+08   Median :  8819  
 Mean   :2.812e+08   Mean   : 45731   Mean   :4.713e+08   Mean   : 42984  
 3rd Qu.:2.125e+08   3rd Qu.: 30367   3rd Qu.:4.003e+08   3rd Qu.: 29642  
 Max.   :3.907e+09   Max.   :940125   Max.   :6.467e+09   Max.   :769205  
  X2022_Losses       X2023_Complaints  X2023_Losses       X2024_Complaints
 Min.   :6.648e+06   Min.   :   400   Min.   :8.907e+06   Min.   :   330  
 1st Qu.:3.350e+07   1st Qu.:  2280   1st Qu.:4.132e+07   1st Qu.:  2253  
 Median :1.276e+08   Median :  9527   Median :1.631e+08   Median :  9251  
 Mean   :7.196e+08   Mean   : 45505   Mean   :8.590e+08   Mean   : 50734  
 3rd Qu.:6.055e+08   3rd Qu.: 31255   3rd Qu.:6.421e+08   3rd Qu.: 31525  
 Max.   :1.030e+10   Max.   :876894   Max.   :1.192e+10   Max.   :946966  
  X2024_Losses      
 Min.   :9.793e+06  
 1st Qu.:4.788e+07  
 Median :1.876e+08  
 Mean   :1.014e+09  
 3rd Qu.:8.030e+08  
 Max.   :1.446e+10  
# Keep only rows with non-missing 2024 losses
data <- data[!is.na(data$X2024_Losses), ]

# Reproducible 80/20 train/test split
set.seed(123)
n <- nrow(data)
train_index <- sample(seq_len(n), size = floor(0.8 * n))

train <- data[train_index, ]
test  <- data[-train_index, ]

# Variables used in the model
model_vars <- c(
  "X2019_Complaints","X2019_Losses",
  "X2020_Complaints","X2020_Losses",
  "X2021_Complaints","X2021_Losses",
  "X2022_Complaints","X2022_Losses",
  "X2023_Complaints","X2023_Losses",
  "X2024_Losses"
)

# Keep only rows without missing values
train_model <- train[complete.cases(train[, model_vars]), ]
test_model  <- test[complete.cases(test[, model_vars]), ]

nrow(train_model)
[1] 93
nrow(test_model)
[1] 24
# Linear regression of 2024 losses on 2019-2023 complaints and losses
model <- lm(X2024_Losses ~ X2019_Complaints + X2019_Losses +
              X2020_Complaints + X2020_Losses +
              X2021_Complaints + X2021_Losses +
              X2022_Complaints + X2022_Losses +
              X2023_Complaints + X2023_Losses,
            data = train_model)

summary(model)

Call:
lm(formula = X2024_Losses ~ X2019_Complaints + X2019_Losses + 
    X2020_Complaints + X2020_Losses + X2021_Complaints + X2021_Losses + 
    X2022_Complaints + X2022_Losses + X2023_Complaints + X2023_Losses, 
    data = train_model)

Residuals:
       Min         1Q     Median         3Q        Max 
-321874006   -9915622   -1716146   21722915  343895159 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       2.187e+06  1.241e+07   0.176 0.860517    
X2019_Complaints  4.788e+02  2.592e+03   0.185 0.853928    
X2019_Losses     -2.692e-01  5.359e-01  -0.502 0.616803    
X2020_Complaints  1.089e+04  2.420e+03   4.499 2.23e-05 ***
X2020_Losses      1.352e+00  3.368e-01   4.014 0.000132 ***
X2021_Complaints  1.471e+03  1.525e+03   0.964 0.337771    
X2021_Losses      4.631e-01  1.826e-01   2.535 0.013130 *  
X2022_Complaints -5.310e+03  2.005e+03  -2.648 0.009701 ** 
X2022_Losses      9.396e-01  1.303e-01   7.209 2.52e-10 ***
X2023_Complaints -6.364e+03  1.465e+03  -4.344 3.98e-05 ***
X2023_Losses     -2.637e-01  1.343e-01  -1.964 0.052954 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 105300000 on 82 degrees of freedom
Multiple R-squared:  0.9978,    Adjusted R-squared:  0.9976 
F-statistic:  3795 on 10 and 82 DF,  p-value: < 2.2e-16
# Predict 2024 losses for the held-out test set, then compute RMSE
predictions <- predict(model, newdata = test_model)

rmse <- sqrt(mean((predictions - test_model$X2024_Losses)^2))
rmse
[1] 406916980
colSums(is.na(data))
         Country X2019_Complaints     X2019_Losses X2020_Complaints 
               1                0                0                0 
    X2020_Losses X2021_Complaints     X2021_Losses X2022_Complaints 
               0                0                0                0 
    X2022_Losses X2023_Complaints     X2023_Losses X2024_Complaints 
               0                0                0                0 
    X2024_Losses 
               0 
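
The single missing value in Country may not be a true gap. If the file uses two-letter country codes (an assumption based on values such as "PR" and "PT"), the code for Namibia is "NA", which read.csv treats as missing by default. The toy CSV below is made up to illustrate that behavior and the na.strings argument that changes it; it is not the project data.

```r
# Toy CSV (made up) with a country code that is literally the string "NA"
csv_text <- "Country,X2024_Losses\nNA,100\nPR,200"

# Default behavior: the string "NA" is parsed as a missing value
default_read <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Treat only empty fields as missing, so "NA" is kept as a country code
fixed_read <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                       na.strings = "")

sum(is.na(default_read$Country))
sum(is.na(fixed_read$Country))
```

If this is what happened, passing na.strings = "" when reading the real file would keep that row's country code intact.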
# Compare training vs test error
train_predictions <- predict(model, newdata = train_model)

train_rmse <- sqrt(mean((train_predictions - train_model$X2024_Losses)^2))
test_rmse <- sqrt(mean((predictions - test_model$X2024_Losses)^2))

train_rmse
[1] 98847983
test_rmse
[1] 406916980
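
The gap between the two errors can be put on a common scale. The sketch below simply divides the two RMSE values printed above. Reading a large ratio as a sign of overfitting, or of a few very large countries in the test set dominating the error, is an interpretation and not something the output proves.

```r
# RMSE values copied from the output above
train_rmse_value <- 98847983
test_rmse_value  <- 406916980

# How many times larger the test error is than the training error
rmse_ratio <- test_rmse_value / train_rmse_value
round(rmse_ratio, 2)
```

The test error is roughly four times the training error, which is a reason to look more closely at the residuals before trusting the very high R-squared.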
# Looking at residuals to help with error analysis
plot(model$fitted.values, model$residuals,
     xlab = "Fitted Values",
     ylab = "Residuals",
     main = "Residual Plot")
abline(h = 0, lty = 2)

# Checking actual vs predicted plots
results <- data.frame(
  Actual = test_model$X2024_Losses,
  Predicted = predictions
)

head(results)
        Actual  Predicted
1     31545772   22829813
2     66002407   87898101
10   448223952  470210159
11   145551079  205914199
18 11161782401 9824492611
19   303148050  321448290
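
Because RMSE is measured in dollars, the largest countries dominate it. A relative measure gives a different view. The sketch below recomputes the absolute percentage error for the six rows printed above only, so it is an illustration and not a summary of the full test set.

```r
# The six rows shown by head(results), copied by hand
shown <- data.frame(
  Actual    = c(31545772, 66002407, 448223952, 145551079, 11161782401, 303148050),
  Predicted = c(22829813, 87898101, 470210159, 205914199, 9824492611, 321448290)
)

# Absolute percentage error for each row
shown$AbsPctError <- 100 * abs(shown$Predicted - shown$Actual) / shown$Actual

round(shown$AbsPctError, 1)
```

In these six rows the largest dollar error belongs to the largest country, while the largest percentage error belongs to a much smaller one.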

WEEK 5 REFLECTION (3/10/2026): PREDICTION VS EXPLANATION, SIMPLE CAUSAL RELATIONSHIP DIAGRAM, AND DIFFERENTIATING CAUSAL VS PREDICTIVE CLAIMS

WEEK 6 REFLECTION (3/17/2026): TEXT MINING, NLP, LLM, AND AI FOR RESEARCH GUIDE

AI For Research Guide

Note: My ideas for this AI Research Guide were informed in part by Harvard Library’s Artificial Intelligence for Research and Scholarship guide.

Citation References:

“Research Guides: Artificial Intelligence for Research and Scholarship: AI for Research.” 2026. AI for Research - Artificial Intelligence for Research and Scholarship - Research Guides at Harvard Library. Accessed March 4, 2026. https://guides.library.harvard.edu/c.php?g=1330621&p=10034534

WEEK 7 REFLECTION (3/24/2026): LLM FAILURES, AI IN THE FUTURE

WEEK 8 REFLECTION (3/31/2026): RAG CRITICISMS AND IMPROVEMENTS

WEEK 9 REFLECTION (4/7/2026): KNOWLEDGE GRAPHS, WITH THEIR NODES AND RELATIONSHIPS

WEEK 10 REFLECTION (4/14/2026): AI FOR SCIENCE AND AGENTIC AI SYSTEMS