Sanjay Paul

Machine Unlearning: a balancing act between reward, risk and regret

A Talk by Sanjay Paul MBA MBCS CITP BEng (IT Architect, UK Govt (Civil Service))

About this Talk

Recent advances in Reinforcement Learning (RL) are moving beyond the traditional goal of maximising expected reward. Researchers and data scientists are increasingly integrating ‘risk sensitivity’ and ‘regret bounds’ to enhance the safety and reliability of RL algorithms, particularly in business-critical applications. This shift reflects a deeper understanding: not all rewards are created equal. In many cases, preventing catastrophic losses takes precedence over pursuing large but uncertain gains.

Over time, intelligent systems continuously unlearn and relearn from operational data streams, adapting to serve the organisation’s best interests - not merely by maximising profit, but by remaining sustainable over the long term. Put simply, less is more when resilience matters.


Key takeaways:

An intelligent machine can learn in many different ways, and RL is one of the techniques it uses to train itself. RL fundamentally depends on the ‘state-action-reward’ cycle to determine its next move. However, an algorithm that maximises reward alone is not good enough for a competitive business environment. To be robust, algorithms now build in risk sensitivity at each step, alongside a corresponding regret score.
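As a rough illustration of what risk sensitivity can mean in practice, the sketch below scores two hypothetical actions from sampled rewards. It blends the average reward with the average of the worst outcomes (a conditional value-at-risk style tail measure). The reward samples, the `risk_weight` and `alpha` parameters, and the choice of tail measure are all assumptions made for illustration; the talk does not specify a particular formulation.

```python
import statistics


def tail_mean(rewards, alpha=0.2):
    """Average of the worst alpha-fraction of rewards (a CVaR-style measure)."""
    ordered = sorted(rewards)
    k = max(1, int(len(ordered) * alpha))  # always keep at least one sample
    return sum(ordered[:k]) / k


def risk_adjusted_score(rewards, risk_weight=0.5, alpha=0.2):
    """Blend expected reward with the tail mean.

    risk_weight = 0 is the classic risk-neutral objective;
    risk_weight = 1 looks only at the worst outcomes.
    """
    mean = statistics.fmean(rewards)
    return (1 - risk_weight) * mean + risk_weight * tail_mean(rewards, alpha)


# Hypothetical reward samples for two actions (illustrative numbers only).
safe_action = [1.0, 1.0, 1.0, 1.0, 1.0]       # modest, steady reward
risky_action = [5.0, 5.0, 5.0, 5.0, -12.0]    # higher mean, rare large loss

candidates = {"safe": safe_action, "risky": risky_action}

# A risk-neutral learner ranks by mean reward alone.
risk_neutral_choice = max(candidates, key=lambda a: statistics.fmean(candidates[a]))

# A risk-sensitive learner ranks by the blended score.
risk_sensitive_choice = max(candidates, key=lambda a: risk_adjusted_score(candidates[a]))

print(risk_neutral_choice, risk_sensitive_choice)
```

With these sample numbers the risk-neutral ranking favours the risky action because its mean is higher, while the blended score favours the steady one because the rare large loss dominates the tail.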

The core theme of this talk is an application of machine learning techniques to predict the most effective sales strategy. This approach has the potential to deliver significant value by improving customer retention during (non-life) insurance renewal and amendment interactions, across both phone and digital (app/online) channels. Each customer’s persona can be assessed for the risk of churn, and the customer can then be engaged proactively with an appropriately selected strategy.

Imagine the organisation has a few active sales strategies on offer:

  • Dynamic premium adjustment at renewal
  • Personalised coverage recommendations and upselling
  • Cross-selling fringe and add-on products while keeping the core product at rock bottom
  • Freemium product sales to widen the customers’ footprint


The intelligent system can predict the probability of success in real time during an ongoing interaction and guide the automated digital flow, or the human agent on the phone, to pick the strategy with the best possible outcome.
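One simple way to picture strategy selection with a regret score is a multi-armed bandit, sketched below with Thompson sampling. This is an illustrative stand-in, not the method presented in the talk: the strategy identifiers, the simulated success probabilities in `TRUE_SUCCESS`, and the function `run_thompson` are all hypothetical, and a real system would condition on customer features rather than treat every interaction alike.

```python
import random

# Hypothetical identifiers for the four strategies listed above.
STRATEGIES = [
    "dynamic_premium",
    "personalised_coverage",
    "cross_sell_addons",
    "freemium",
]

# Assumed success probabilities, used only to simulate customer responses.
TRUE_SUCCESS = {
    "dynamic_premium": 0.30,
    "personalised_coverage": 0.45,
    "cross_sell_addons": 0.25,
    "freemium": 0.35,
}


def run_thompson(rounds=2000, seed=0):
    """Simulate strategy selection and return (pulls per strategy, cumulative regret)."""
    rng = random.Random(seed)
    wins = {s: 1 for s in STRATEGIES}    # Beta(1, 1) prior: no initial preference
    losses = {s: 1 for s in STRATEGIES}
    pulls = {s: 0 for s in STRATEGIES}
    best = max(TRUE_SUCCESS.values())
    regret = 0.0

    for _ in range(rounds):
        # Sample a plausible success rate for each strategy and pick the highest.
        choice = max(STRATEGIES, key=lambda s: rng.betavariate(wins[s], losses[s]))

        # Simulated customer outcome (retained or not).
        if rng.random() < TRUE_SUCCESS[choice]:
            wins[choice] += 1
        else:
            losses[choice] += 1
        pulls[choice] += 1

        # Regret: expected success given up by not choosing the best strategy.
        regret += best - TRUE_SUCCESS[choice]

    return pulls, regret


pulls, regret = run_thompson()
print(pulls)
print(round(regret, 2))
```

Cumulative regret here measures how much expected success was given up while the system was still learning which strategy works. A risk-sensitive variant could replace the raw success signal with a risk-adjusted reward.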

If executed effectively, the right approach, guided by sophisticated machine learning, deepens customer relationships by truly addressing each customer’s unique needs. This is a prime example of machine unlearning: a system sacrifices short-term reward in order to accommodate legitimate risks.

24 October 2025, 11:30 AM

11:30 AM - 12:00 PM


About The Speaker

Sanjay Paul

Sanjay Paul MBA MBCS CITP BEng

IT Architect, UK Govt (Civil Service)

Social & Business Researcher ·· IT Architect & Data Scientist ·· Mentor & Volunteer