Proposal

Outlook Phishing Detection Extension for Students

Jordan Rodriguez Computer Science

Background

Phishing is one of the most commonly used forms of cybercrime, using fake emails and websites to deceive users into providing sensitive information like passwords, financial records, and personal information. These attacks are mainly done by impersonating someone the user trusts and convincing them to interact with malicious links or provide private information. Phishing is highly used today, with nearly 90% of organizations reporting facing phishing attacks (Alkhalil et al., 2021).

College students are particularly vulnerable to phishing attacks because of how frequently they receive emails regarding sensitive data like financial aid, student account verification, and student job opportunities. Attackers use fake accounts to imitate these emails, making students believe they are coming from credible sources. This makes it hard for students to identify which emails are phishing attacks, since they closely resemble official emails.

While there are already ways to prevent phishing attacks, like using a blacklist on known phishing links, these traditional methods cannot detect new threats (Tang & Mahmoud, 2021). To solve this, machine learning has emerged as a new approach to prevent phishing attacks by analyzing components of an email to ensure it is valid and not an imitation (Tang & Mahmoud, 2021). Machine learning can learn to find patterns in what makes an email legitimate or malicious, which is useful when facing an evolving threat like phishing (Alkhalil et al., 2021).

This project aims to develop an Outlook extension for students that uses machine learning to accurately detect and alert students to phishing scams in their email.

Methodology

First, data will be gathered from public phishing databases such as PhishTank and Kaggle. Then, the data will be preprocessed to collect key features such as sender addresses, link URLs, and suspicious keywords and phrases that may indicate phishing. Both phishing scams and legitimate emails will be collected to help identify the differences between the two.

Next, machine learning models using Naive Bayes classifiers will be trained using the data gathered from the databases to identify whether emails are legitimate or phishing scams. The models will be evaluated on metrics such as accuracy, precision, and recall.
Finally, a prototype Outlook extension will be designed to analyze emails for phishing scams and give an explanation of why the email is a phishing attack. The extension will be built using JavaScript and the Outlook Add-in APIs. The extension will be tested by a sample of 10 students, who will try the prototype and provide feedback on accuracy and usability. The students will be emailed a digital Amazon gift card upon completion of testing. The extension will be tested and evaluated on its ability to correctly identify phishing attacks and how helpful its feedback is to the users.

Anticipated Outcomes

The expected outcome of this research is a functional prototype Outlook extension that helps students detect phishing emails. The prototype should accurately detect potential phishing threats and alert the user while providing a clear explanation for why the threat was flagged. Additionally, the project will contribute to a better understanding of how machine learning can be used to assess security threats, such as phishing. The project results are planned to be presented at the Showcase of Undergraduate Research Excellence at UCF.

Significance

References

Alkhalil, Z., Hewage, C., Nawaf, L., & Khan, I. (2021). Phishing attacks: A recent comprehensive study and a new anatomy. Frontiers in Computer Science, 3. https://doi.org/10.3389/fcomp.2021.563060

Tang, L., & Mahmoud, Q. H. (2021). A survey of machine learning-based solutions for phishing website detection. Machine Learning and Knowledge Extraction, 3(3), 672-694. https://doi.org/10.3390/make3030034

Timeline

Dates Tasks
January 1, 2027- January 20, 2027 Identify common phishing scams, collect data, and preprocess data
January 21, 2027 - February 15, 2027 Develop machine learning models and begin training models
February 16, 2027 - February 28, 2027 Evaluate performance and make improvements to models
March 1, 2027 - March 15, 2027 Design Outlook extension prototype
March 16, 2027 - March 25, 2027 Test design and analyze results
March 26, 2027 - May 5, 2027 Finalize results and prepare final report

Budget

Item Units and Cost Total
Google Cloud Virtual Machine 1 instance, 75 hours,

$2 per hour
$150
Google Cloud Storage 50 Gigabytes, 5 Months

$1 per month
$5
Incentives for Participants $10 Amazon gift card per participant, 10 participants $100
Total   $255