“Water, water, everywhere, nor any drop to drink.” Replace “water” with “data” in Coleridge’s memorable line and you have an accurate summary of the life insurance industry over the last three decades. In a business that dates back to 1700s London and even, some suggest, ancient Rome, life insurance companies accrue vast amounts of data on policyholders’ health, family circumstances, living arrangements, employment, and beneficiaries. The sector might be considered the original “Big Data” business but, until now, unlocking the full potential of that data was a task befitting Hercules.
Traditionally, life insurance companies have stored their data in different formats and in different systems, none of them compatible and none of them talking to each other. These legacy systems, with their green and orange screens, offer no means of using data strategically, and insurers are consequently losing out in an increasingly competitive market.
Advances in artificial intelligence, machine learning, and predictive analytics have opened a new world of opportunity for life insurance companies. Digitalization in the 1990s created an explosion of available data, but for a long time this surge was not matched by technological developments that allowed the data to be processed, manipulated, and transformed into actionable insights.
One of the greatest challenges facing actuaries in the life insurance industry is that, while senior management is eager to put the abundance of data to practical use (developing a better understanding of policyholders and applying it in marketing, pricing, and reserving), actuaries are often not equipped with the know-how to apply the latest technologies to their own data.
Many companies that work through brokers struggle to form a full picture of their clients. Even those that deal directly with clients often fail to predict future behavior accurately, because the tools available to them cannot unlock the data’s hidden value.
Life insurers are realizing that outmoded ways of doing business are not only sub-optimal but may no longer be viable. For decades, the sector was slow to adapt to new technologies that other industries were embracing, and entrenched IT departments, coupled with insufficient pressure to adapt, were the enemies of innovation. Today, insurers are working in a very different climate. In a 2016 PwC survey, some three-quarters of insurance companies acknowledged that their business was going to be affected by technology disruption and feared that their traditional operations might lose out to new contenders. A similar percentage of insurers surveyed in 2015 said that they expected to use big data in pricing, underwriting, and risk selection within two years. Competition is greater, premiums are lower, and industry disruptors such as Lemonade are trying to upend the industry's distribution channels.
To address this, many insurers are looking to data scientists to extract value from their data. But most data teams at insurtech companies expect to receive normalized data from the insurers in order to build a structured data set. This is frequently beyond the reach of smaller companies, which do not count data scientists among their staff and therefore cannot devote the necessary time and resources to unpacking the data. Even for firms big enough to have their own innovation departments, the time and money required to cleanse and normalize the data can be burdensome.
Some companies are changing this reality by harnessing new data analysis tools that enable life insurance companies to monetize their in-force data and customer base. These platforms allow insurers to assess the potential for under-insurance, high lapse risk, and profitability, which in turn helps them improve upsell and cross-sell efforts, optimize distribution channels, and develop proactive retention programs. Using artificial intelligence, machine learning, and predictive analytics, these systems augment the internal and public data held by life insurance providers and group policyholders into cohorts for meaningful analysis, in order to find the embedded value in a book of business.
Despite this potential, the life insurance market has not yet made the technological jumps that have revolutionized other sectors such as banking and finance, and it has been slow to appreciate the value of raw and unstructured data.
Structuring unstructured data is a big headache for insurers, yet it is also a necessity if companies have only standard actuarial techniques at their disposal. With advanced artificial intelligence and machine learning methods, data can bypass these first steps of insurer manipulation, so that modeling can start straight away and the whole process becomes quicker and more efficient.
Typically, once data has been normalized, the unstructured data is left out of the final analysis, wiping out vast quantities of relevant information. Yet in many cases the lack of data is itself indicative of a behavioral pattern. Unstructured and unmanipulated raw data gives insurers the freedom to use more features, and the more features they have, the better they can understand why individuals make the choices they do and build a more realistic picture of their behavior. From a quantitative point of view, too, the model improves as data sets grow larger.
The benefits of unstructured data can be illustrated by a free-text box that accompanies an insurer’s request for policyholders’ work emails and occupations. With free-text cells, swaths of data can be lost unless insurers understand how to analyze the entries and link them to external information sources. For example, policyholders can be divided into three groups: those who enter an accurate work email, those who enter an incorrect email, and those who leave the space blank. Advanced modeling can show that each group behaves in distinct ways. From this data, and even, significantly, from the absence of data, insurers gain greater insight into the policyholder. Those who enter incorrect emails may have lost their jobs, for example, and those who enter no work email may be either unemployed or employed in a field, such as construction or cleaning, where they do not need one.
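The three-way split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `email_group`, the group labels, and the sample entries are hypothetical, and the pattern checks only whether an entry is shaped like an email address. Deciding whether an address is genuinely accurate would require external verification that is not shown here.

```python
import re
from collections import Counter

# Hypothetical, deliberately simple pattern: it checks only the shape of an
# address, not whether the mailbox exists or belongs to an employer.
EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def email_group(raw):
    """Assign a free-text work-email entry to one of three groups."""
    if raw is None or not raw.strip():
        return "blank"          # the absence of data is kept as its own signal
    if EMAIL_SHAPE.match(raw.strip()):
        return "well_formed"
    return "malformed"


# Made-up sample entries, for illustration only.
entries = ["j.doe@example.com", "jdoe@", "", None, "  ", "a.smith@example.org"]
groups = Counter(email_group(e) for e in entries)
print(groups)
```

The resulting label could then be fed to a model as a categorical feature, so that blank and malformed entries are kept rather than discarded during normalization.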
Similarly, if people are asked to enter their occupations manually, there will be tens of thousands of variations (teacher, math teacher, French tutor) that are not statistically significant until machine learning techniques are applied. Different occupations can then be clustered into statistical groups, such as pensioners, teachers, managers, and housewives, to extract potentially lucrative data.
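To make the idea of collapsing free-text occupations into a handful of groups concrete, here is a minimal Python sketch. It is a rule-based stand-in and not the machine learning clustering the article refers to: the keyword map, the function name `occupation_group`, and the sample entries are all hypothetical, and a real model would presumably learn such groupings from the data rather than from a hand-written table.

```python
from collections import Counter

# Hypothetical keyword-to-group map, written by hand for illustration.
# A learned clustering would replace this table in practice.
KEYWORD_GROUPS = {
    "teacher": "teachers",
    "tutor": "teachers",
    "lecturer": "teachers",
    "manager": "managers",
    "director": "managers",
    "retired": "pensioners",
    "pensioner": "pensioners",
}


def occupation_group(raw):
    """Map a free-text occupation to a coarse group, or 'other' if unknown."""
    text = (raw or "").lower()
    for keyword, group in KEYWORD_GROUPS.items():
        if keyword in text:
            return group
    return "other"


# Made-up sample entries, for illustration only.
entries = ["Teacher", "math teacher", "French tutor",
           "Sales Manager", "retired", "welder"]
counts = Counter(occupation_group(e) for e in entries)
print(counts)
```

Three spellings that would each be too rare to analyze on their own end up in a single "teachers" group, which is the effect the clustering step is meant to achieve at much larger scale.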
Knowing where a policyholder is paying their premiums from, whether an individual or company account or, for example, the Teachers Federal Credit Union or the Navy Federal Credit Union, is advantageous for the insurer.
Unlocking the structure within unstructured data is the key to further insights, which are enriched by publicly available external data. Advanced models can apply those features most relevant to insurance, for example, U.S. census questions on monthly insurance expenditure and assets in pension savings. Moreover, every time data is entered into these machine-learning models, the process is quicker than before, giving insurers greater insights in a significantly shorter space of time.
Raw data is the key to insurers staying ahead of the competition. Life insurers can continue to do what they do best—but now with the tools to irrigate their data and watch the profits bloom.