SMS Phishing Detection in Sinhala-Language Messages Using Rule-Based Filtering and NLP
In many parts of the world, including Sri Lanka, people still rely on SMS for communication, largely because almost everyone has a mobile phone. That reach makes SMS an attractive target for cybercriminals. SMS phishing, or smishing, has become a major threat over the years. In this attack, people are sent fraudulent texts designed to steal their passwords or credit card numbers. Many tools exist for spotting English-language phishing attempts, but few are available for Sinhala. This post outlines how an SMS phishing detection system for Sinhala was developed using rule-based filtering and Natural Language Processing (NLP).
Understanding SMS Phishing
In SMS phishing, scammers create a sense of urgency to convince you to give away your private information. Common lures include fake prize notifications, bogus charges, and threats to suspend your account unless you pay immediately. Some messages include links to malicious websites or ask users to reply with confidential information. These attacks are especially risky because they are easy to carry out and victims often do not realize what is happening. As more people in Sri Lanka use mobile phones, instances of smishing have risen. Unfortunately, many users have no tools that review and flag suspicious Sinhala messages. There is a real need for tools that identify phishing messages in Sinhala quickly and easily.
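The cues described above (urgency, prize claims, account threats, requests for credentials, and embedded links) lend themselves to simple rules. The following is a minimal sketch of that idea, not the system's actual rule set: the names `CUES`, `matched_rules`, and `is_suspicious`, the keyword lists, and the threshold of two rules are all assumptions made for illustration. The Sinhala keywords are examples only and would need review by a native speaker.

```python
import re

# Matches http(s) links and bare "www." links.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

# Illustrative cue words only (assumed, not taken from a real rule set).
CUES = {
    "urgency": ["වහාම", "urgent", "immediately"],
    "prize": ["දිනුම", "winner", "prize"],
    "account": ["ගිණුම", "account", "suspend"],
    "credential": ["මුරපදය", "password"],
}


def matched_rules(message):
    """Return the names of the rules that fire for this message."""
    text = message.lower()  # lowercasing only affects the Latin-script part
    hits = [
        name
        for name, words in CUES.items()
        if any(word in text for word in words)
    ]
    if URL_RE.search(message):
        hits.append("link")
    return hits


def is_suspicious(message, threshold=2):
    """Flag a message when at least `threshold` distinct rules fire."""
    return len(matched_rules(message)) >= threshold


if __name__ == "__main__":
    sample = "ඔබගේ ගිණුම suspend කර ඇත. වහාම http://example.test/login වෙත පිවිසෙන්න"
    print(matched_rules(sample))
    print(is_suspicious(sample))
```

Requiring more than one rule to fire is a design choice in this sketch: a single cue such as a link is common in legitimate messages, while several cues together are a stronger signal.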
Sinhala is a morphologically rich, low-resource language, which creates specific challenges. Few Sinhala SMS datasets exist, and fewer still focus on phishing. Sinhala sentence structure is complex, which makes text harder to analyze. Text messages also tend to use informal and abbreviated word forms, which are difficult for NLP tools to handle. People often mix English and Sinhala terms in the same message. As a result, models developed for English or other major languages do not transfer easily to Sinhala.
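One of these challenges, mixing English and Sinhala in a single message, can at least be detected cheaply before any further processing. The sketch below counts characters in the Sinhala Unicode block (U+0D80 to U+0DFF) and ASCII letters. The function names `script_profile` and `is_code_mixed` are hypothetical, and the check is only a rough heuristic: it will not catch Sinhala words typed in Latin letters.

```python
def script_profile(message):
    """Count Sinhala-block characters and ASCII letters in a message."""
    sinhala = sum(1 for ch in message if "\u0d80" <= ch <= "\u0dff")
    latin = sum(1 for ch in message if ch.isascii() and ch.isalpha())
    return {"sinhala": sinhala, "latin": latin}


def is_code_mixed(message):
    """Return True when the message contains both Sinhala and Latin letters."""
    counts = script_profile(message)
    return counts["sinhala"] > 0 and counts["latin"] > 0


if __name__ == "__main__":
    print(script_profile("ඔබගේ account එක"))
    print(is_code_mixed("ඔබගේ account එක"))
```

A profile like this could be used to route a message to Sinhala-specific rules, English rules, or both.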
How to implement
