Outliers are objects that differ considerably from, and are inconsistent with, the rest of their data set. Detecting them is a challenging research problem with a wide range of real-world applications in engineering, business, security and healthcare. For example, internet service providers use intrusion detection, a form of outlier detection applied to internet traffic, to help address their ongoing cybersecurity concerns. Similarly, regulators and investors pay close attention to financial fraud, another kind of outlier, because it causes substantial damage to businesses and society. However, detecting outliers, especially in large-scale unstructured data that includes both text and numbers, is a computationally complex problem that requires innovative solutions beyond traditional data analytics techniques.
What is this research about?
Professor Bijan Raahemi, from the Telfer School of Management, has received a Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant to explore, design, analyze and implement reliable artificial intelligence (AI) algorithms that detect outliers in large-scale, unstructured, high-dimensional data. The proposed architecture will employ emerging word-embedding techniques to convert text into feature vectors. It will also use dimensionality reduction and data summarization methods (including bio-inspired and evolutionary algorithms), together with an ensemble of advanced AI and machine learning models. By combining these methods, Professor Raahemi aims to quickly and accurately detect outliers in large volumes of documents that contain both text and numbers. He plans to apply the new methods to emerging applications in business and engineering to protect public and private organizations from substantial socio-economic harm.
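To make the three stages described above more concrete (text to feature vectors, fewer dimensions, an ensemble of detectors), here is a minimal Python sketch. It is an illustration only and not the project's architecture: a hashed bag-of-words stands in for a learned word embedding, keeping the highest-variance columns stands in for dimensionality reduction, and two simple distance-based detectors stand in for the ensemble of AI models. All function names and the toy invoice data are hypothetical.

```python
import hashlib
import math
import statistics


def hash_embed(text, dim=32):
    """Toy stand-in for a word embedding: an averaged, hashed bag-of-words."""
    vec = [0.0] * dim
    tokens = text.lower().split()
    for tok in tokens:
        bucket = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    n = max(len(tokens), 1)
    return [v / n for v in vec]


def standardize(column):
    """Rescale a numeric column to zero mean and unit (population) variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column) or 1.0  # guard against a constant column
    return [(x - mean) / std for x in column]


def top_variance_columns(rows, keep):
    """Simple stand-in for dimensionality reduction: keep the most variable columns."""
    n_cols = len(rows[0])
    variances = [statistics.pvariance([r[c] for r in rows]) for c in range(n_cols)]
    chosen = sorted(range(n_cols), key=lambda c: variances[c], reverse=True)[:keep]
    return [[r[c] for c in chosen] for r in rows]


def centroid_distance(rows):
    """Detector 1: distance of each row from the mean of all rows."""
    center = [statistics.fmean(col) for col in zip(*rows)]
    return [math.dist(r, center) for r in rows]


def knn_distance(rows, k=2):
    """Detector 2: distance of each row to its k-th nearest neighbour."""
    scores = []
    for i, r in enumerate(rows):
        dists = sorted(math.dist(r, o) for j, o in enumerate(rows) if j != i)
        scores.append(dists[k - 1])
    return scores


def normalized_ranks(scores):
    """Map raw scores to ranks in [0, 1] so different detectors are comparable."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for rank, i in enumerate(order):
        ranks[i] = rank / (len(scores) - 1)
    return ranks


def ensemble_outlier_scores(texts, amounts, keep=8):
    """Combine text and numeric features, reduce them, and average detector ranks."""
    z = standardize(amounts)
    rows = [hash_embed(t) + [a] for t, a in zip(texts, z)]
    reduced = top_variance_columns(rows, keep)
    detectors = [centroid_distance(reduced), knn_distance(reduced)]
    ranked = [normalized_ranks(s) for s in detectors]
    return [statistics.fmean(col) for col in zip(*ranked)]


# Hypothetical toy records: a short description plus an amount.
texts = [
    "invoice for office supplies",
    "invoice for office chairs",
    "invoice for printer paper",
    "invoice for office supplies",
    "invoice for desk lamps",
    "invoice for printer toner",
    "urgent wire transfer to offshore account",
]
amounts = [120.0, 118.0, 125.0, 122.0, 119.0, 121.0, 98000.0]

scores = ensemble_outlier_scores(texts, amounts)
suspect = max(range(len(scores)), key=lambda i: scores[i])
print(suspect, texts[suspect])
```

Averaging ranks rather than raw scores is one common way to combine detectors whose outputs are on different scales. In a realistic setting, each stand-in would be replaced by a stronger component, such as a trained embedding model or a learned projection.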
How does this research contribute to the design of unbiased and reliable AI solutions?
This research project will also address the important challenge of designing AI solutions that are unbiased, reliable, and trustworthy. Professor Raahemi will take a two-fold approach to this issue: 1) the collected and pre-processed data must be checked to ensure that it is not biased towards specific organizations, regions, or minority groups; and 2) the AI algorithms must remain fair, unbiased and, as much as possible, explainable, so that their conclusions are reliable and trustworthy.
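As a rough illustration of what the first kind of check might involve, the Python sketch below compares how often records from different groups are flagged as outliers. It is only one simple diagnostic under stated assumptions, not the project's method: the group labels, the flags, the helper names and the idea of reporting the largest gap between groups are all hypothetical.

```python
from collections import Counter


def group_shares(groups):
    """Share of the data set contributed by each group."""
    counts = Counter(groups)
    total = len(groups)
    return {g: c / total for g, c in counts.items()}


def flag_rate_by_group(groups, flagged):
    """Fraction of each group's records that were flagged as outliers."""
    totals = Counter(groups)
    hits = Counter(g for g, f in zip(groups, flagged) if f)
    return {g: hits[g] / totals[g] for g in totals}


def max_rate_gap(rates):
    """Largest difference in flag rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: a region label and an outlier flag per record.
regions = ["east"] * 4 + ["west"] * 4
flagged = [True, False, False, False, True, True, True, False]

shares = group_shares(regions)
rates = flag_rate_by_group(regions, flagged)
gap = max_rate_gap(rates)
print(shares, rates, gap)
```

A large gap does not prove that the data or the model is biased, since groups can genuinely differ, but it indicates where a closer manual review may be worthwhile.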