基于数据挖掘的保险欺诈识别与欺诈查出率估计
Alternative Title: Insurance Fraud Recognition and Fraud Detection Rate Estimation Based on Data Mining
孟小哲
Thesis Advisor: 白建明
2018-03-20
Degree Grantor: Lanzhou University
Place of Conferral: Lanzhou
Degree Name: Master's degree
Keyword: imbalanced data; undersampling; AdaBoost; zero inflation; fraud detection rate
Abstract

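As the insurance industry has developed, insurance fraud has become one of the most closely watched problems affecting the industry's healthy development. In Western countries, all large insurers have standardized inspection techniques and extensive management experience for dealing with insurance fraud; in China, for various reasons, the management of fraudulent claims has long been absent from industry practice. Although domestic insurers keep detailed and systematic claim data, records of fraudulent claims are almost blank (it is certain that a large number of fraudulent claims have gone unrecorded), so insurance fraud data are typically incomplete in information and imbalanced in structure.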

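With the rise of big data technology, data mining and machine learning methods can be brought into insurance fraud research and management practice, making it possible to classify claim samples and identify the fraudulent claims among them. This thesis carries out an exploratory analysis of auto insurance claim data from a large domestic insurance company and finds that fraud samples are extremely rare in the claim records, so the claim data are clearly imbalanced. To address this, the thesis first balances the data by undersampling. To lose as little information as possible about the majority-class claims during sampling, random undersampling is repeated multiple times to obtain several data sets; an AdaBoost classifier is built on each data set, and the resulting AdaBoost classifiers are then combined into an insurance fraud recognition classifier with an AUC of 0.82.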

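The fraud detection rate is the ratio of the frauds detected by the insurance company to all claims; it reflects the insurer's anti-fraud capability under given conditions. If the number of frauds occurring in a day follows a Poisson distribution, then the number of frauds the insurer detects in a day follows a thinned Poisson distribution. Because insurers' inspection capacity is insufficient, many fraudulent claims are missed; in addition, on some days no fraudulent claims occur at all (every claim is honest). As a result, the claim data contain more zeros than a Poisson distribution can fit. To remedy this, the thesis fits the data with a zero-inflated model and then sets the zero-inflation term to zero, which yields an estimate of the Poisson distribution of detected fraudulent claims and, from it, estimates of the fraud detection rate and the fraud omission rate.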
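The thinning and zero-inflation argument can be written out as a short derivation. The notation here is an assumption made for illustration and is not taken from the thesis: N is the number of frauds occurring in a day, D the number detected, p the probability that any single fraud is detected (independently of the others), and \pi the weight of the extra zeros.

```latex
% Assumed notation (not from the thesis): N, D, p, \lambda, \mu, \pi.
\begin{align*}
N &\sim \mathrm{Poisson}(\lambda), \qquad D \mid N \sim \mathrm{Binomial}(N, p) \\
P(D = k) &= \sum_{n \ge k} \frac{e^{-\lambda}\lambda^{n}}{n!}\binom{n}{k}p^{k}(1-p)^{n-k}
          = \frac{e^{-p\lambda}(p\lambda)^{k}}{k!}
\end{align*}

% Zero-inflated Poisson for the observed daily count Y, with rate \mu = p\lambda:
\begin{align*}
P(Y = 0) &= \pi + (1-\pi)\,e^{-\mu} \\
P(Y = k) &= (1-\pi)\,\frac{e^{-\mu}\mu^{k}}{k!}, \qquad k \ge 1
\end{align*}
```

Under these assumptions the detected count is again Poisson, with rate p\lambda, and setting \pi = 0 in the fitted zero-inflated model leaves exactly that Poisson component.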

Other Abstract

With the development of the insurance industry, insurance fraud has gradually become one of the most closely watched issues affecting the industry's healthy development. In Western countries, all major insurance companies have standardized inspection techniques and rich management experience in handling insurance fraud; for various reasons, however, the management of fraudulent claims in the Chinese insurance industry has long been lacking. Although domestic insurance companies have kept detailed and systematic claim data, the record of fraudulent claims is almost blank (it is certain that many fraudulent claims have been omitted), and the insurance fraud data are typically incomplete in information and imbalanced in structure.

With the rise of big data technology, data mining and machine learning methods have been introduced into insurance fraud research and management practice, making it possible to classify claim samples and identify fraudulent claims. This paper makes an exploratory analysis of the auto insurance claim data of a large insurance company in China. It finds that fraud samples are extremely scarce in the claim records and that the claim data are clearly imbalanced. To address this, the paper adopts undersampling to balance the data. To minimize the loss of information about majority-class claims during sampling, multiple data sets are obtained by repeated random undersampling, and the AdaBoost algorithm is used to build a classifier on each data set. Integrating the different AdaBoost classifiers gives the insurance fraud recognition classifier, whose AUC is 0.82.
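The undersample-then-boost procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the weak learner (a decision stump), the number of subsets, the number of boosting rounds, the averaging rule used to combine classifiers, and the synthetic data are all choices made here, and every function name is hypothetical.

```python
import math
import random

def fit_stump(X, y, w):
    """Best single-feature threshold rule under weights w; labels are +1 / -1."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[j] > thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def fit_adaboost(X, y, rounds=10):
    """Discrete AdaBoost over stumps; returns a list of (alpha, feature, thr, sign)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, thr, sign = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) on perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, sign))
        # Up-weight misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * yi * (sign if row[j] > thr else -sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def score(model, row):
    """Weighted vote of the stumps; larger means more fraud-like."""
    return sum(a * (s if row[j] > t else -s) for a, j, t, s in model)

def fit_undersampled_ensemble(X, y, n_subsets=5, rounds=10, seed=0):
    """Train one AdaBoost model per balanced subset (all frauds + sampled honest)."""
    rng = random.Random(seed)
    minority = [i for i, yi in enumerate(y) if yi == 1]
    majority = [i for i, yi in enumerate(y) if yi == -1]
    models = []
    for _ in range(n_subsets):
        idx = minority + rng.sample(majority, len(minority))
        models.append(fit_adaboost([X[i] for i in idx], [y[i] for i in idx], rounds))
    return models

def ensemble_score(models, row):
    """Assumed combination rule: average the per-model scores."""
    return sum(score(m, row) for m in models) / len(models)

def auc(scores, y):
    """Chance that a random fraud outranks a random honest claim (ties count 0.5)."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == -1]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic, imbalanced toy data: 300 honest claims and 15 frauds (made up).
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(15)]
y = [-1] * 300 + [1] * 15

models = fit_undersampled_ensemble(X, y)
scores = [ensemble_score(models, row) for row in X]
print("in-sample AUC:", round(auc(scores, y), 3))
```

Each balanced subset keeps every fraud sample and a fresh random draw of honest claims, so across subsets more of the majority class is seen than in a single undersampled set. The AUC printed here is computed on the training data and would overstate performance on unseen claims.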

The fraud detection rate is the ratio of the frauds detected by the insurance company to all claims. It reflects the anti-fraud ability of the insurance company under given conditions. If the occurrence of fraud within a day follows a Poisson distribution, then the number of frauds detected by the insurance company in one day follows a thinned Poisson distribution. Because insurance companies lack inspection capacity, many fraudulent claims are missed; in addition, there are days on which no fraud occurs (all claims are honest). Together these produce excess zeros in the claim data, beyond what a Poisson distribution can fit. To make up for this deficiency, the paper fits the data with a zero-inflated model and then sets the zero-inflation term to zero to estimate the Poisson distribution of detected fraudulent claims. This in turn gives estimates of the fraud detection rate and the fraud omission rate.
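As a rough illustration of fitting a zero-inflated Poisson model to daily counts, the sketch below estimates the zero-inflation weight and the Poisson rate with a simple EM iteration. The estimation method, the function names, and the simulated counts are assumptions made here; the abstract does not say how the thesis fits its model or how it converts the fitted Poisson part into a detection rate, so that last step is left out.

```python
import math
import random

def fit_zip(counts, iters=200):
    """EM estimates (pi, lam) for a zero-inflated Poisson model.

    pi  = probability of a structural (extra) zero
    lam = rate of the Poisson component
    """
    n = len(counts)
    n_zero = sum(1 for c in counts if c == 0)
    total = sum(counts)
    pi, lam = 0.5, max(total / n, 1e-6)
    for _ in range(iters):
        # E-step: probability that an observed zero is a structural zero.
        z = pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: update the mixing weight and the Poisson rate.
        pi = n_zero * z / n
        lam = total / (n - n_zero * z)
    return pi, lam

def poisson_draw(lam, rng):
    """Sample a Poisson variate by multiplying uniforms (Knuth's method)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

# Simulated daily counts of detected frauds (illustrative values only).
rng = random.Random(0)
true_pi, true_lam = 0.4, 1.5
counts = [0 if rng.random() < true_pi else poisson_draw(true_lam, rng)
          for _ in range(5000)]

pi_hat, lam_hat = fit_zip(counts)
print("estimated zero-inflation weight:", round(pi_hat, 3))
print("estimated Poisson rate:", round(lam_hat, 3))
```

Setting the zero-inflation weight to zero in the fitted model leaves a plain Poisson distribution with rate `lam_hat`, which is the quantity the abstract then uses to estimate the detection and omission rates.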

Language: Chinese
Document Type: Thesis
Identifier: https://ir.lzu.edu.cn/handle/262010/225121
Collection: School of Mathematics and Statistics
Recommended Citation
GB/T 7714
孟小哲. 基于数据挖掘的保险欺诈识别与欺诈查出率估计[D]. 兰州: 兰州大学, 2018.