Journal of Surgical Practice and Case Reports

Research Article | Volume 1, Issue 1

Assessing the Detectability of AI-Generated Phishing Emails by Modern Email Filters

Ruth Imanria Itua, Aaron Ogochukwu Okolo, Ifelunwa Ada Ikuni, Chidinma Anumaka, Sebastian Obeta*

Received: January 18, 2026 | Published: January 30, 2026

Citation: Itua, R.I., Okolo, A.O., Ikuni, I.A., Anumaka, C., and Obeta, S. (2026), ‘Assessing the Detectability of AI-Generated Phishing Emails by Modern Email Filters’, Journal of Artificial Intelligence and AI Ethics, vol. 1, no. 1, pp. 1–15.

Abstract

The emergence of large language models (LLMs) has significantly altered the phishing threat landscape by enabling the automated generation of linguistically fluent and contextually convincing phishing emails. While prior studies have demonstrated the effectiveness of AI-generated phishing and the vulnerability of experimental classifiers, the real-world performance of widely deployed email filtering systems remains poorly understood. This study addresses that gap through an empirical evaluation of modern email filters exposed to AI-generated phishing content.

A controlled dataset of 100 phishing emails spanning multiple attack categories was generated using contemporary LLMs, including ChatGPT, Claude, Gemini, Meta AI, and Qwen 2.5, and evaluated against commonly used email filtering systems such as Gmail, Outlook, Yahoo Mail, Proton Mail, and SpamAssassin. Detection outcomes were analysed using quantitative performance metrics, rule activation analysis, and statistical testing to examine how the filtering system, phishing category, and language model influence detectability.

The results reveal pervasive detection failures across most evaluated systems, with high false negative rates. Statistical analysis shows that detection outcomes are significantly associated with the filtering system employed, but not with the phishing category or the LLM used. These findings demonstrate systemic limitations in current email filtering architectures and highlight the need for adaptive, intent-aware defences against AI-enabled phishing.
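As an illustration of the kind of quantitative metric and association test summarised above, the minimal sketch below computes a per-filter false negative rate and a Pearson chi-square test of independence between filtering system and detection outcome. The abstract does not name the statistical test used, so the chi-square test is an assumption here, as are the function names (`false_negative_rate`, `chi2_sf`, `chi_square_independence`) and the filter labels (`filter_A` to `filter_E`). The counts are invented toy numbers, not results from this study. Because every email in the dataset is phishing, each missed email is a false negative, so the false negative rate reduces to missed / (detected + missed).

```python
import math

# Hypothetical helpers for illustration only; not the authors' analysis code.


def false_negative_rate(detected: int, missed: int) -> float:
    """FNR = FN / (FN + TP); every test email is phishing, so missed = FN."""
    return missed / (detected + missed)


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail probability of a chi-square distribution with `dof` degrees.

    Evaluates the series for the regularized lower incomplete gamma
    P(s, z) with s = dof / 2 and z = x / 2, then returns 1 - P.
    """
    if x <= 0:
        return 1.0
    s, z = dof / 2.0, x / 2.0
    term = 1.0 / s          # n = 0 term of the series
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (s + n)  # next term: z^n / (s (s+1) ... (s+n))
        total += term
        n += 1
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column factors.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, chi2_sf(stat, dof)


# Toy counts per filter: [detected, missed] out of 100 phishing emails each.
# These numbers are invented for illustration and are NOT study results.
counts = {
    "filter_A": [40, 60],
    "filter_B": [10, 90],
    "filter_C": [5, 95],
    "filter_D": [20, 80],
    "filter_E": [25, 75],
}

for name, (detected, missed) in counts.items():
    print(f"{name}: FNR = {false_negative_rate(detected, missed):.2f}")

stat, dof, p = chi_square_independence(list(counts.values()))
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p:.2e}")
```

A small p-value would indicate that the detection outcome depends on the filtering system; the same function could be applied to a category-by-outcome or model-by-outcome table to test the other two factors. When many expected cell counts fall below about 5, the chi-square approximation becomes unreliable and an exact test is usually preferred.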