Assignment 2: Prompt exercise
Step 1: Initial Prompt Creation
- Task: Write a baseline prompt to request a structured systematic literature review on data mining and machine learning applications.
- Prompt: “Conduct a 2,500-word structured systematic literature review on the applications of data mining and machine learning in real-world domains. Include a methodology section, synthesize key findings, identify trends and gaps, and propose one testable hypothesis. Use an academic tone and emulate systematic review standards.”
- I submit this prompt to ChatGPT, Copilot, and Grok 3, collecting the raw outputs.
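Because the prompt requests a specific length, one quick check on each raw output is its word count. The following is a minimal Python sketch and not part of the assignment itself; the function names, the placeholder text, and the 10% tolerance are all assumptions chosen for illustration.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens as a rough word count."""
    return len(text.split())


def meets_length(text: str, target: int = 2500, tolerance: float = 0.10) -> bool:
    """Return True if the word count is within +/- tolerance of the target.

    The 10% tolerance is an assumed threshold, not one set by the assignment.
    """
    words = count_words(text)
    return abs(words - target) <= target * tolerance


# Hypothetical placeholder; replace with a model's actual response.
sample_output = "word " * 900
print(count_words(sample_output), meets_length(sample_output))  # prints: 900 False
```

A whitespace split only approximates the count a word processor would report, so it is best used to flag outputs that are far from the target rather than to enforce an exact length.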
Step 2: Analyze Model Responses
- Structure: Did it include a methodology section and follow a systematic review format?
- ChatGPT: Yes
- Copilot: Yes
- Grok 3: Somewhat
- Synthesis: Were key findings from data mining and machine learning applications well-summarized?
- ChatGPT: Yes
- Copilot: Yes
- Grok 3: Yes
- Trends and Gaps: Did it identify meaningful trends and research gaps?
- ChatGPT: It identified only one major trend; its treatment of research gaps, however, was satisfactory.
- Copilot: The trends section presented general observations, and the analysis of research gaps provided greater depth.
- Grok 3: The identified trends and research gaps were meaningful but comparatively broad.
- Hypothesis: Was the proposed hypothesis testable and relevant?
- ChatGPT: Yes
- Copilot: The proposed hypothesis was somewhat reliable but lacked the precision and specificity of ChatGPT’s.
- Grok 3: The proposed hypothesis was reasonably well constructed.
- References: Are the citations accurate (checked using Google Scholar or Semantic Scholar)?
- ChatGPT: The model provided citations; on verification, all but one of the sources could be found on Google Scholar and appeared academically valid.
- Copilot: The model provided no citations or references.
- Grok 3: Although the model referenced relevant sources, it did not provide formal citations, limiting verifiability.
Step 3: Refine the Prompt
Task: Revise the prompt to address deficiencies in each model’s response, creating three tailored prompts—one for ChatGPT, one for Copilot, and one for Grok 3.
Refined Prompt for All Three Models: “Imagine you’re a data scientist conducting a 2,500-word systematic literature review on how data mining and machine learning are applied in domains like healthcare, finance, and education. Outline a clear methodology, synthesize on specific key findings with fresh insights, provide a detailed and clear analysis of trends and gaps, and propose one bold, testable hypothesis. Maintain a rigorous academic tone.”
I test this refined prompt on all three models and compare the improved outputs.
Step 4: Cross Model Collaboration
- Task: Integrate the best elements from each model’s output into a final systematic review. Write a new prompt for the student’s preferred model (e.g., Grok 3) to synthesize the results.
- Example Synthesis Prompt: “Using these drafts from three AI models [pastes outputs], produce a 2,000-word structured systematic literature review on data mining and machine learning applications. Combine the strongest methodology, findings, trends, gaps, and hypothesis into a cohesive, academically sound document.”
- I submit my final review and justify my synthesis decisions.
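The synthesis prompt can also be assembled programmatically so that each model's draft is clearly labeled before it is pasted in. This is only a sketch under stated assumptions: the function name `build_synthesis_prompt`, the `--- Draft from ... ---` delimiter, and the placeholder drafts are hypothetical, and the wording simply mirrors the example synthesis prompt above.

```python
def build_synthesis_prompt(drafts: dict[str, str], word_target: int = 2000) -> str:
    """Assemble a synthesis prompt with each model's draft clearly labeled."""
    header = (
        f"Using these drafts from {len(drafts)} AI models, produce a "
        f"{word_target:,}-word structured systematic literature review on "
        "data mining and machine learning applications. Combine the strongest "
        "methodology, findings, trends, gaps, and hypothesis into a cohesive, "
        "academically sound document."
    )
    # Label each draft so the model can tell the sources apart.
    sections = [
        f"--- Draft from {model} ---\n{text.strip()}"
        for model, text in drafts.items()
    ]
    return "\n\n".join([header, *sections])


# Hypothetical placeholders; replace with the actual model outputs.
drafts = {
    "ChatGPT": "<ChatGPT draft here>",
    "Copilot": "<Copilot draft here>",
    "Grok 3": "<Grok 3 draft here>",
}
print(build_synthesis_prompt(drafts))
```

Labeling the drafts by model makes it easier to justify synthesis decisions afterward, since each borrowed element can be traced back to its source.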
Step 5: Reflection
Task: Write a reflection answering:
How did each model approach the systematic review differently?
ChatGPT: This model produced a well-organized literature review with a clear methodology section and a consistent systematic review format. Its synthesis of key findings on data mining and machine learning applications was concise and well summarized. The review identified one trend and numerous research gaps, and its discussion of gaps was particularly specific. The proposed hypothesis was both testable and relevant. The model provided citations; on verification, all but one of the sources could be found on Google Scholar and appeared academically valid. Overall, ChatGPT’s response demonstrated strong, specific coverage, especially in its treatment of research gaps, though its examination of trends was limited.
Copilot: Like ChatGPT, this model produced a clear methodology section and followed a systematic review format. Unlike ChatGPT, however, its output focused on broad findings rather than detailed analytical synthesis. The trends section presented general observations, while the analysis of research gaps provided greater depth. The proposed hypothesis was somewhat reliable but lacked the precision and specificity of ChatGPT’s. The model provided no citations or references. Overall, Copilot produced a coherent but less detailed literature review, with weaknesses primarily in source support and analytical depth.
Grok 3: At first, the model produced a substantially shorter review than requested, generating approximately 900 words instead of 2,500. After I clarified the word count in the prompt, it generated a review of the requested length. The output contained a methodology section, although it did not follow a systematic review protocol. As with ChatGPT, the synthesis of key findings on data mining and machine learning applications was well summarized; the identified trends and research gaps were meaningful but comparatively broad. The proposed hypothesis was reasonably well constructed. Although the model referenced relevant sources, it did not provide formal citations, limiting verifiability. Overall, Grok 3 produced a competent but lower-quality review relative to the other models.
Which prompt refinements yielded the best results for each model?
- After the prompt refinements, ChatGPT demonstrated the best structure, with a clear methodology and a systematic review format. For synthesis across domains, ChatGPT and Copilot were equally effective, each providing a balanced discussion covering a similar range of application areas. Copilot, however, offered a more extensive discussion of trends and gaps, identifying numerous broader thematic patterns. With respect to hypothesis development, ChatGPT provided the most detailed discussion of hypothesis testing. Finally, all three models generated citations that, upon verification through Google Scholar, appeared to be peer-reviewed academic sources.
What did you learn about leveraging AI for structured academic reviews?
- Through this exercise, I learned that AI is not the most reliable tool for writing structured academic literature reviews. While some models, such as ChatGPT and Copilot, can write a good academic review, their outputs may contain flaws. For example, models like Grok 3 may fail to follow length or structural instructions, sometimes producing outputs that depart from the requested format. I also learned that AI can help with citation suggestions, but it is important to double-check the legitimacy of those sources through Google Scholar or Semantic Scholar. Overall, AI can be a helpful tool for brainstorming, refining writing, and checking one’s work and citations, but it should never be a substitute for independent academic work.