From Proof of Concept to Scalable Policies: Challenges and Solutions, with an Application

From Proof of Concept to Scalable Policies: Challenges and Solutions, with an Application (2017) by Abhijit Banerjee, Rukmini Banerji, James Berry, Esther Duflo, Harini Kannan, Shobhini Mukerji, Marc Shotland, and Michael Walton. Journal of Economic Perspectives, vol. 31, no. 4, Fall 2017 (pp. 73-102). Suggested by my colleague Alexandra van Geen.


The promise of randomized controlled trials is that evidence gathered through the evaluation of a specific program helps us—possibly after several rounds of fine-tuning and multiple replications in different contexts—to inform policy. However, critics have pointed out that a potential constraint in this agenda is that results from small “proof-of-concept” studies run by nongovernment organizations may not apply to policies that can be implemented by governments on a large scale. After discussing the potential issues, this paper describes the journey from the original concept to the design and evaluation of scalable policy. (…) We use this example to draw general lessons about using randomized control trials to design scalable policies.

In terms of establishing causal claims, it is generally accepted within the discipline that randomized controlled trials are particularly credible from the point of view of internal validity. This credibility applies to the interventions studied—at that time, on that population, implemented by the organization that was studied—but does not necessarily extend beyond. It is not at all clear that results from small “proof-of-concept” studies run by nongovernment organizations can or should be directly turned into recommendations for policies for implementation by governments on a large scale. But while the external validity of a randomized controlled trial cannot be taken for granted, it is far from unattainable.

6 obstacles

Six main challenges in drawing conclusions from a localized randomized controlled trial about a policy implemented at scale:

[1] Market equilibrium effects.  When an intervention is implemented at scale, it could change the nature of the market.

To assess the equilibrium impact of an intervention (…) The typical design is a two-stage randomization procedure in which the treatment is randomly assigned at the market level in addition to the random assignment within a market. For example, the experiment of Crepon, Duflo, Gurgand, Rathelot, and Zamora (2013) varied the treatment density of a job placement assistance program in France within labor markets, in addition to random assignment of individuals within each market.
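As an aside, the two-stage design can be sketched as a toy assignment procedure (a minimal illustration of the idea, not the authors' actual protocol; the market names, densities, and sample sizes below are made up):

```python
import random

def two_stage_assign(markets, densities, seed=0):
    """Toy two-stage randomization: each market first draws a treatment
    density (stage 1), then that share of its individuals is randomly
    treated within the market (stage 2)."""
    rng = random.Random(seed)
    assignment = {}
    for market, individuals in markets.items():
        density = rng.choice(densities)            # stage 1: market-level density
        k = round(density * len(individuals))      # number to treat here
        treated = set(rng.sample(individuals, k))  # stage 2: within-market draw
        assignment[market] = {
            "density": density,
            "treated": treated,
            "control": set(individuals) - treated,
        }
    return assignment

# Hypothetical example: 4 labor markets of 10 job seekers, 3 density arms.
markets = {m: [f"{m}-{i}" for i in range(10)] for m in ["A", "B", "C", "D"]}
plan = two_stage_assign(markets, densities=[0.0, 0.5, 1.0])
for m, arm in plan.items():
    print(m, arm["density"], len(arm["treated"]))
```

Comparing outcomes of untreated individuals across markets with different densities is what identifies the equilibrium (displacement) effect.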

One potential challenge with the experimental identification of equilibrium effects is that it is not always obvious what the “market” is.

When a particular intervention is scaled up, more people will be needed to implement it. This may lead to an increase in their wages or in difficulties hiring them.

[2] Spillover Effects. Many treatments have spillovers on neighboring units, which implies that those units are not ideal control groups. Not all spillovers are easy to detect in pilot experiments: in some cases, they may be highly nonlinear.

[3] Political reactions, including either resistance to or support for a program, may vary as programs scale up.

Potential political backlash?

  • Worth exploring whether some changes in potentially inessential program details are available.
  • It is also important to try to anticipate the backlash and create a constituency for the reform from the start.
  • Finally, the potential for political backlash may provide an argument for not doing too many pilots, since large-scale programs are less likely to be scotched.

[4] Context Dependence. Would results extend in a different setting (even within the same country)? Would the results depend on some observed or unobserved characteristics of the location where the intervention was carried out?

[5] Randomization or Site-selection bias. Organizations or individuals who agree to participate in an early experiment may be different from the rest of the population; randomization bias.

  • Organizations (and even individuals within governments) who agree to participate in randomized controlled trials are often exceptional
  • A well-understood problem arises when individuals select into treatment
  • Site-selection bias arises because an organization chooses a location or a subgroup where effects are particularly large

Blair, Iyengar, and Shapiro (2013): randomized controlled trials are disproportionately conducted in countries with democratic governments.

[6] Piloting Bias/Implementation Challenges. A number of studies have found differences between implementation by nongovernment organizations and governments. Banerjee, Hanna, Kyle, Olken, and Sumarto (2016): the [Indonesian] government was less effective at running a pilot program and more effective with full implementation.

As the discussion in this section has emphasized, the issue of how to travel from evidence at proof-of-concept level to a scaled-up version cannot be settled in the abstract. The issue of [4] context-dependence needs to be addressed through replications, ideally guided by theory. [1] General equilibrium and [2] spillover effects can be addressed by incorporating estimation of these effects into study designs, or by conducting large-scale experiments where the equilibrium plays out. [5] Randomization bias and [6] piloting bias can be addressed by trying out the programs on a sufficient scale with the government that will eventually implement it, documenting success and failure, and moving from there.

[I skipped the Teaching at the Right Level example]

General Lessons

Perhaps the key point is to remember what small pilot experiments are good for and what they are not good for.

If the objective is to design or test a model [i.e. no policy implications], the researcher can ignore most of the concerns that we talked about in this paper. Something valuable will be learnt anyway.

For researchers, a strong temptation in a stage-two trial will be to do what it takes “to make it work,” but the risk of implementation challenges means that it is important to think about how far to go in that direction. On the one hand, trial and error will be needed to embed any new intervention within an existing bureaucracy. Anything new is challenging, and at the beginning of a stage-two trial, considerable time needs to be spent to give the program a fair shot. On the other hand, if the research team embeds too much of its own staff and effort and ends up substituting for the organization, not enough will be learnt about where implementation problems might emerge.


The Consumer Financial Protection Bureau and the Quest for Consumer Comprehension – Lauren Willis

The Consumer Financial Protection Bureau and the Quest for Consumer Comprehension (book chapter, April 2017) by Lauren Willis.

I found out about this new strand of work via ASIC. I really liked Willis’ debunking paper The Financial Education Fallacy (2011). Related to the paper I summarize below: Performance-Based Consumer Law (2015) and Performance-Based Remedies: Ordering Firms to Eradicate Their Own Fraud (2017). Perhaps I will dive deeper into one of those in another blog post.


To ensure that consumers understand financial products’ “costs, benefits, and risks,” the Consumer Financial Protection Bureau has been redesigning mandated disclosures, primarily through iterative lab testing. But no matter how well these disclosures perform in experiments, firms will run circles around the disclosures when studies end and marketing begins. To meet the challenge of the dynamic twenty-first-century consumer financial marketplace, the bureau should require firms to demonstrate that a good proportion of their customers understand key pertinent facts about the financial products they buy. Comprehension rules would induce firms to inform consumers and simplify products, tasks that firms are better equipped than the bureau to perform.
[unless otherwise stated, all text below is quoted from the paper]
The bureau [CFPB] must induce firms themselves to promote consumer comprehension:
Demonstrating sufficient customer comprehension could be a precondition firms must meet before enforcing a term or charging a fee, or firms could be sanctioned (or rewarded) for low (or high) demonstrated comprehension levels. In effect, rather than prescriptively regulating the marketing and sales process with mandated disclosures or pursuing firms on an ad hoc ex post basis for unfair, deceptive, and abusive marketing and sales practices, the bureau would monitor firms and incentivize them to minimize customer confusion as the marketing and sales process unfolds over time.
Comprehension rules are a form of performance-based regulation, in that they regulate outputs not inputs.
By moving testing of disclosure from the lab to the field, and trying to stimulate firms to develop creative disclosure methods, the CFPB implicitly acknowledges that:
  1. disclosures that do well in experimental conditions may not work in real-world conditions,
  2. firms are better situated than regulators to innovate to achieve consumer comprehension,
  3. valid, reliable consumer confusion audits are possible.

How might this form of regulation operate in practice?

  1. Measuring the quality of a valued outcome (comprehension) rather than of an input that is often pointless (mandated or pre-approved disclosure);
  2. Assessing actual customer comprehension in the field as conditions change over time, rather than imagining what the “reasonable consumer” would understand or testing consumers in the lab or in single-shot field experiments;
  3. Requiring firms to affirmatively and routinely demonstrate customer understanding, rather than relying on the bureau’s limited resources to examine firm performance ad hoc when problems arise;
  4. Giving firms the flexibility and responsibility to effectively inform their customers about key relevant costs, benefits and risks through whatever means the firms see fit, whether that be education or product simplification, rather than asking regulators to dictate how disclosures and products should be designed.
Certainly comprehension is often neither necessary nor sufficient for good decisions (…) Even knowledgeable consumers make bad decisions, whether as a result of inadequate willpower or decisionmaking biases. (…) many decisions require basic financial knowledge that consumers lack; the effective annual percentage rate (APR) for a credit card account “defies plain language efforts”.
It might well be more cost effective for society to engage in substantive regulation of product design or performance-based regulation of consumer welfare outcomes (e.g. a lender that does not follow the bureau’s underwriting rules can instead demonstrate annually that no more than five percent of its loan portfolio defaulted).


Even without any intent to deceive, firms not only will but must leverage consumer confusion to compete with other firms that deceive customers.
Firms have a bevy of means at their disposal to undermine mandated disclosures’ effectiveness:
  1. Alter the design of the transaction (e.g. banks are adept at sabotaging overdraft disclosures; see When Nudges Fail: Slippery Defaults, Willis, 2013)
  2. Frame consumers’ thought processes long before consumers see a disclosure. Consumers may think they are unaffected, but advertising works (Wood and Poltrack 2015; Lewis and Reiley 2014).
  3. Physically divert attention from disclosures. AT&T designed the envelope, cover letter, and amended contract after extensive “antimarketing” market testing to ensure that most consumers would not open the envelope, or if they did open it, would not read beyond the cover letter (Ting v. AT&T, 319 F.3d 1126, 9th Cir. 2003)
  4. Take proactive steps to ferret out easy marks, vulnerable customers. Savvy firms might use inferred cognitive load, mood, or stress levels to sell consumers products at the very moment when mandated disclosures will be misinterpreted or ignored. Firms can even engage in real-time marketing through Internet and mobile devices to reach consumers at vulnerable moments (Digital Market Manipulation, Calo, 2014).
Like sausage-makers, marketers do not want the public to know how their product is made.

Comprehension rules & customer confusion audits

Comprehension rules would align firms’ goals with the CFPB’s mandate to ensure consumer understanding of financial product costs, benefits, and risks. The effect of successful regulation through comprehension rules would be to bring transactions into closer alignment with consumer expectations.
Firms know a lot about their customers, as they already collect this information for marketing and product development purposes.
The very capacities that modern firms use to market products and defeat mandated disclosures enable them to attain better consumer comprehension more quickly and at a lower cost than regulators. The bureau can try to educate consumers, but nothing beats professional marketers when it comes to sending consumers a message.
Firms are in a better position than regulators to decide when it is worth the cost of educating consumers about complex or unintuitive features and when simplifying products is more cost-effective. Firms might find that educating their customers is so costly that it would be cheaper for firms to directly channel consumers to suitable products.
The bureau would need to remain mindful of firm agility at circumventing disclosure, and guard against firms’ manipulation of customer confusion audit results.


The benchmarks against which firm performance in customer confusion audits ought to be judged depend on which of the bureau’s statutory purposes it is pursuing: transparency, competition, or fairness.
Benchmarks if the goal is:
  • Fairness: the benchmarks would need to be high, perhaps as high as the approximately 85 percent benchmark implicitly used in false advertising cases
  • Competition: the benchmarks might be lower, depending on the firm’s ability to differentiate informed from uninformed consumers.
  • Prevent firms from undermining mandated disclosures: the benchmarks might be set at the comprehension levels the bureau can obtain in its disclosure testing.
  • Increase consumer comprehension from where consumers stand now:
    the benchmark might be set based on industrywide performance.
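Mechanically, a customer confusion audit of this kind reduces to comparing a measured comprehension rate against the benchmark attached to the goal being pursued. A minimal sketch (only the roughly 85 percent fairness figure comes from the chapter; the other numbers and all names are illustrative placeholders of my own):

```python
# Benchmark per regulatory goal. Only the ~85% fairness figure is taken
# from Willis' chapter; the other values are illustrative placeholders.
BENCHMARKS = {
    "fairness": 0.85,          # roughly the false-advertising benchmark
    "competition": 0.60,       # placeholder: depends on segmenting ability
    "disclosure_floor": 0.70,  # placeholder: bureau's own lab-test levels
}

def audit_passes(comprehension_rate, goal):
    """True if a firm's measured customer-comprehension rate meets the
    benchmark attached to the bureau's chosen statutory goal."""
    return comprehension_rate >= BENCHMARKS[goal]

audit_passes(0.90, "fairness")  # → True
audit_passes(0.80, "fairness")  # → False
```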

Benefits of Comprehension Rules

The effect of successful regulation through comprehension rules would be to bring transactions into closer alignment with consumer expectations.

The ultimate direct benefit of comprehension rules is increased consumer decisional autonomy; consumers would get what they think they are getting, not whatever hidden features firms can slip into the transaction.

Empowered choices free of confusion are only possible, and the market is only driven to efficiency, when consumers comprehend the transactions in which they engage.

Today we pretend that individual consumers use disclosures to drive market competition and make welfare-enhancing decisions, but we do not spend the resources needed to realize actual consumer understanding. As a result,  consumers neither discipline the market nor consistently enhance their own welfare.

How to prove, how to interpret and what to do? Uncertainty experiences of street-level tax officials

On January 18th 2018, the VIDE publicatieprijs 2017 will be awarded. My own paper Werkt de wildwestwaarschuwing wel? is one of the nominees. The other nominee is How to prove, how to interpret and what to do? Uncertainty experiences of street-level tax officials by Nadine Raaphorst, published in Public Management Review in 2017 (2016 Impact Factor: 2.293).

Obviously, I can’t really objectively summarize this paper. And the fact that it is qualitative research based on a storytelling method is also completely opposite to my quantitative bias. However, transcribing 37 stories “about situations they experienced as difficult or complicated” from 17 tax officials is probably no easy feat and quite some work. And just like my paper, Raaphorst did not study actual behaviour.

This study examines the kind of uncertainties frontline tax officials working with a trust-based inspection approach experience in interacting with citizen-clients. The classical literature on bureaucracy and the street-level bureaucracy literature suggest frontline officials face two kinds of uncertainties: information and interpretation problems. Analysing stories of Dutch frontline tax officials collected through in-depth interviews, this article shows that these two kinds of uncertainty only explain a part of the uncertainties experienced. Respondents also face action problems requiring improvisational judgements. The study furthermore finds that different sources underlie these uncertainties, pointing to possible explanations.

Raaphorst studied Dutch tax officials (Belastingdienst) who have dealings with citizen-clients/entrepreneurs, and who have to implement a trust-based inspection approach (“horizontaal toezicht”, aimed at “collaboration and trust” and “rules and legislation that are vaguer“).

A trade-off is: “such policies may yield more responsive law enforcement and service provision, [but] they could also compromise consistent and fair decision-making, especially when certain types of citizen-clients have better negotiation and communication skills to take control in bureaucratic interactions.”

The paper seeks to solve “the lack of understanding of the kinds, conditions, and consequences of uncertainty at play in frontline work.” This is all the more important because, when “bureaucrats’ actions are increasingly made dependent on their perceptions of citizens in interactions, and to a lesser extent prescribed by formal rules, this leads to a more uncertain bureaucratic process.”

Three types of uncertainty

Apparently, there are two types of uncertainty in the existing literature (information and interpretation) and this study adds a new type: action uncertainty.

These findings underline the importance of social interactions to bureaucratic work and hence to understanding the role of uncertainty in bureaucracy. Whereas public administration literature has pointed to the existence of information uncertainties and interpretation uncertainties this study adds a third kind: action uncertainties.

Because “objective rationality (…) did not reflect organizational reality” as described below, there is an information problem with ‘unknowns’.

[In] the traditional model of bureaucracy (…) bureaucracies are seen as rational organizations that should limit individual bureaucrats’ discretionary powers by setting strict rules and procedures. Technocratic knowledge, embodied in rules, procedures, and policies, is put at the heart of bureaucratic organizations.

On uncertainty as an interpretation problem: “bureaucrats’ discretionary practices are not only informed by organizational classification systems and rules but also by personal judgements regarding clients’ worthiness or deservingness, based on cultural schemes, moral beliefs and values, or certain stereotypes.” So “‘instances'” need to be interpreted, to see “what ‘is really happening’“.

A paragraph on Uncertainty of social interactions rightly states: “Discretion at the frontlines ‘is necessary to respond to the unexpected and to ensure that services are responsive to individual need’.” And in the public administration literature, apparently, “The uncertainty that is inherent to discretion is treated as given.” I don’t know the PA literature, but this strikes me as strange (see this related discussion: Toezichthouders moeten zelf initiatief nemen in discussie over buitenwettelijk toezicht).

Summarizing table

The paper has three tables (one in the appendix), which I tried to integrate into one table. I felt they overlapped a lot and the differences were more in layout than content. That didn’t help me understand the structure of the paper. The different order of the text on action uncertainty relative to the tables also confused me a bit.

Table 2 Description of the kinds of uncertainty at play in frontline tax officials’ work, slightly adapted and enriched:

Problem of proof (information uncertainty)
  • Contexts in which they occur: Lack of evidence to support one’s interpretation [4]
  • Difficulties experienced: Vague stories of citizen-clients | Conflicting informational cues | Comprehensibility of account is not clear-cut affair | Finding proof requires effort and time

Problem of standards (interpretation uncertainty)
  • Contexts in which they occur: Vague rules and legislation [8] | Conflicting norms, values, feelings [4]
  • Difficulties experienced: Law insufficient as backing | Potential inconsistent decision-making | Far-reaching consequences for citizen-clients

Problem of control (action uncertainty)
  • Contexts in which they occur: Impact of citizen-clients’ private lives and emotions [10] | Negotiations with citizen-clients [3] | Deviations from normality [8]
  • Difficulties experienced: On-the-spot reaction | Consequentiality of official’s immediate reaction | Change of inspection approach | Dependence on citizen-client

Numbers in brackets: number of stories (total N=37).

As I understood it, the rows with Problem and Contexts are nearly identical to Table 1 and Table A1 from the Appendix.

For Interpretation uncertainty, what is called “Vague rules and legislation” in Table 2 is “Determining right decision” in Table 1 (and sometimes “grey area interpretation” or “absence of clear standards about what is right in these instances” in the text).

And “Conflicting norms, values, feelings” in Table 2 is called “Experiencing dilemmas” in Table 1 (or “tension between what one ought to do as a tax official and one’s personal values or ideas about what is appropriate, or one’s feelings of empathy” in the text). Another nice description of this construct is “this leeway or ‘freedom to struggle’ involves dilemmas between following the law on the one hand and feelings of empathy on the other hand.”

The “Impact of citizen-clients’ private lives and emotion” under Action uncertainty is described in the text as “emotional labour” and “when ‘private life’ leaks into the encounter“.

The story illustrating “Negotiations with citizen-clients” where one tax official felt “he has been too open and has given away too much already early in the negotiation” was the most salient and best at describing a construct for me.


Default neglect in attempts at social influence (PNAS)

Zlatev, J. J., Daniels, D. P., Kim, H., & Neale, M. A. (2017). Default neglect in attempts at social influence. Proceedings of the National Academy of Sciences, 114(52), 13643-13648. [pdf | supplemental information] And the link on Open Science Framework (OSF):


Current theories suggest that people understand how to exploit common biases to influence others. However, these predictions have received little empirical attention. We consider a widely studied bias with special policy relevance: the default effect, which is the tendency to choose whichever option is the status quo. We asked participants (including managers, law/business/medical students, and US adults) to nudge others toward selecting a target option by choosing whether to present that target option as the default. In contrast to theoretical predictions, we find that people often fail to understand and/or use defaults to influence others, i.e., they show “default neglect.” First, in one-shot default-setting games, we find that only 50.8% of participants set the target option as the default across 11 samples (n = 2,844), consistent with people not systematically using defaults at all. Second, when participants have multiple opportunities for experience and feedback, they still do not systematically use defaults. Third, we investigate beliefs related to the default effect. People seem to anticipate some mechanisms that drive default effects, yet most people do not believe in the default effect on average, even in cases where they do use defaults. We discuss implications of default neglect for decision making, social influence, and evidence-based policy.

Key question in this study was: do people actually understand default nudges enough to use them strategically?

Experts think so (spoiler alert: they were wrong): In an email survey of members of the Society for Judgment and Decision Making (n =133), the overwhelming majority of experts—90.1%—predicted that people would successfully use defaults to influence others in desired directions. Only 2.3% of experts predicted that people would fail to use defaults altogether.

In the experiments, defaults did work: CMs [Choice Maker] demonstrated a default treatment effect of 25.2 percentage points. That is, the CMs were 25.2 percentage points more likely to choose an option when it was the default than when it was not the default.
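That 25.2-point figure is simply the gap between two choice shares. As a sketch (the two shares below are hypothetical, chosen only so that their gap matches; the paper's underlying shares are not quoted here):

```python
def default_effect_pp(share_when_default, share_when_not_default):
    """Default treatment effect in percentage points: how much more often
    Choice Makers pick an option when it is presented as the default."""
    return round(100 * (share_when_default - share_when_not_default), 1)

# Hypothetical shares whose gap matches the reported 25.2-point effect:
default_effect_pp(0.601, 0.349)  # → 25.2
```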

3 studies

The authors did three studies:
Yet we find that people acting as CAs [Choice Architects] frequently fail to understand and/or use defaults strategically when trying to influence others’ choices, even when doing so is in their best interest.

  • In study 1, in contrast to both theoretical and expert predictions, we found that only 50.8% of people set the target option as the default, across 11 samples totaling 2,844 participants.
  • In study 2, we found that this default neglect in CA decisions persisted even when people were given multiple opportunities for experience and feedback.
  • In study 3, most CAs revealed incorrect beliefs about how setting a default is likely to affect a CM. Even in cases where CAs were good at systematically using defaults (i.e., in the preselect default game), these decisions did not comport at all with CAs’ beliefs.

Study 1 finds that managers, law/business/medical students, and US adults often fail to understand and/or use defaults, but professionals do score above chance (59%).


Study 2: learning/repeated experiments did not improve the optimal use of defaults

In round 1 (the first round), 33% of CAs used default nudges optimally, which was significantly worse than random chance [χ²(1) = 16.01; p < 0.001]. This was qualitatively similar to some of the CA decisions in study 1. In round 20 (the final round), 54% of CAs used default nudges optimally, which was significantly better than round 1 behavior (z = 3.89; p < 0.001) but not significantly different from random chance [χ²(1) = 0.81; p = 0.37].
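The chance comparison here is a one-degree-of-freedom goodness-of-fit test against 50%. A sketch of the statistic (the sample size below is hypothetical, since the paper's per-round n isn't quoted above, so this does not reproduce the reported values):

```python
def chi2_vs_chance(successes, n):
    """Chi-square goodness-of-fit statistic (1 df) comparing an observed
    count of optimal choices with the 50% expected under random chance."""
    expected = n / 2
    failures = n - successes
    return ((successes - expected) ** 2 / expected
            + (failures - expected) ** 2 / expected)

# E.g. 33 optimal choices out of a hypothetical n = 100:
chi2_vs_chance(33, 100)  # → 11.56
```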


Study 3: Choice Architect (CA) beliefs about the default effect are far from accurate

we asked CAs to predict what percentage of CMs would choose each option when it was and was not the default. The difference between these two predictions reveals CAs’ beliefs about the effect of setting a default. (…)

Overall, only 39.6% of CAs had directionally correct beliefs about the default effect (i.e., predicted that more CMs would choose an option if it was the default) (…)

Overall, there was a small but significant positive correlation between whether CAs demonstrated correct beliefs and whether they presented the optimal default (r = 0.10; p = 0.01).


The fact that many CAs do not even believe that the default effect exists makes it less likely that they would seek out help in their decisions about how to use available tactics.

Long-term incentive structures (…) may help explain why marketing professionals (who are incentivized to influence people toward specific target options) seem to use defaults more than bureaucrats who help develop public policy (who may simply want to give people options that are easy to understand or widely beneficial, without necessarily wanting to influence them toward a specific target option).

The Birds and the Bees – What behavioural science and biology teach us about risk

A lecture advertised as Behavioural science and biology have much to teach us about risk obviously gets my attention.

Below my main take aways on this FCA Insights Lecture (FCA = Financial Conduct Authority in the UK, similar to the AFM in the Netherlands, where I work). Read the full Insight Lecture delivered by Chairman of Oxford Risk, Lord Krebs on 22 November, 2017.

Talk in three parts:

  1.  The biology of risk and decision making.
  2.  The role of regulation versus informed choice
  3. How we might use insights from behavioural science to help consumers of financial products to make better choices. Key point in this final section will be that behavioural science has not displaced classical economic models, but has the potential to enrich our understanding of human decision-making.

So what do biologists have to say about risk?

Biological models derive optimality functions from Darwinian fitness, or a proxy for fitness such as food intake, growth rate, or reproductive success. It can be argued that normative economic models, because they have no external reference point equivalent to Darwinian fitness, have an element of circularity: utility is that which is maximised.

The ‘expected energy budget rule’

Imagine a choice between two food sources, a ‘safe’ option and a ‘risky’ option. Which one is it better for the organism to choose? The theoretical answer depends on the organism’s internal state. (…) You may not think of peas as being very clever, but they have been equipped by natural selection with mechanisms for detecting nutrient concentration in the soil, a valuable survival device. When a seedling germinates, the young roots grow towards parts of the soil that are rich in nutrients.

Dener, E., Kacelnik, A., & Shemesh, H. (2016). Pea plants show risk sensitivity. Current Biology, 26(13), 1763-1767.

I want to make two general comments about this experiment.

First, the fact that pea plants follow the predictions of a normative evolutionary model of risk underscores the point that a brain, or conscious thought, isn’t needed to make the right decisions. Instead, the species we study rely on rules of thumb that have evolved because they yield the right answer, or at least an approximation to it.

Second, understanding these rules of thumb not only helps to gain insight into differences between the optimal solution and observed behaviour, but could also provide a general theory of decision making that complements and enriches the normative optimality models. There’s a parallel here with the difference between normative economic models and the rules that people actually use to make decisions (what Gerd Gigerenzer calls ‘fast and frugal heuristics’).

It’s easy to see how this finding, if applicable to humans, could be used to manipulate people’s choice of investment portfolios. But also “for good”, an example (from Knoef & Brügen, 2017):

Because their default is the best choice for most people, NEST has named the low-risk option the ‘NEST lower growth fund’ (instead of low risk), while the high-risk option is called the ‘NEST higher risk fund’ (instead of high return).

The role of the regulator

If the construction of options for investment can be used to steer people’s decisions, should they be regulated, or is it a case of caveat emptor?

[W]hy regulators may not want to ban things. If you have objective criteria and apply them consistently, you may come up with some unintended consequences. Be careful of what you wish for.

My question, for discussion, is whether or not similar externalities of poor financial decisions by consumers could cause “indirect harms” and therefore justify regulation.

labelling is not a panacea for the problem of dietary ill health. Again, I pose a question for discussion: are there parallels here for the labelling and marketing of financial products?

I very much agree with this last remark; warnings, disclosure and more information are (too) often hailed as the solution to everything.

Helping people to make better financial decisions

Encouraging people to make better choices through nudging, and as an alternative to regulation, has been advocated in recent years for many areas of policy, including financial services. The latest annual report from the Behavioural Insights Team or “Nudge Unit”, published last month, lists their key success stories from field scale trials. (…) On the face of it, these are impressive results, although it remains to be seen how long lasting the effects of nudges are. But even if these success stories are sustained, I think nudging has its limits. (…) nudges are likely to be of limited effect, compared with more interventionist measures such as investment in infrastructure, taxation or regulation. (…) No amount of nudging will compensate for lack of investment in the appropriate infrastructure.

we should not see behavioural science as an alternative to traditional optimisation models. In biology the actual mechanisms by which animals or plants make decisions are seen as complementary to, and not alternatives to, normative optimality models.

awareness of how people actually make decisions must be relevant to the ways in which advice is presented.

Beyond nudge

Our view at Oxford Risk is that the best financial decisions will be made by consumers

  • when they have the relevant knowledge,
  • when they are engaged with the decision and
  • when they feel comfortable about making the decision.

The challenge for the financial services industry is to harness the power of behavioural science to help people to make decisions about their money that

  • will give them what they want,
  • what they need and
  • what they understand.

Consumers and competition: Delivering more effective consumer power in retail financial markets

Triggered by these tweets, I read a position paper by the FS-CP, the Financial Services Consumer Panel (its title, Don’t rely on consumers to boost competition, pretty much sums up the main conclusion), and a ‘think-piece’, Consumers and competition: Delivering more effective consumer power in retail financial markets.

About the FS-CP: We work to advise and challenge the FCA from the earliest stages of its policy development to ensure they take into account the consumer interest.

The think piece was written by Jonquil Lowe and aims to

  • consider possibilities to deliver more effective ‘consumer power’
  • generate a set of real-world metrics that can measure how well markets work for consumers
  • influence FCA thinking on competition and consumer responsibility

using a consumer survey and a literature review (p2).

Current approach: “competition policy focuses heavily on how to make consumers more engaged and closer to the rational decision-makers that traditional economic theory suggests are a prerequisite for well-functioning markets.” (p6). This focus on timing and framing of information “underplays other potential policy options and overlooks the possibility that some seemingly irrational lack of engagement might in fact be rational”.

Price discrimination

3.12 With ‘third-degree’ price discrimination, consumers do not self-select. Instead the firm has to be able to distinguish groups of consumers who are willing to pay more. (…)

This type of price discrimination is common in the general insurance market, where customers who stay with the same insurer year after year typically pay higher premiums than new customers. Consumers may view this practice as an unfair penalty on loyalty.

However, a report for the Financial Services Practitioner Panel from Clayton et al (2013) documents industry views that, given existing customers can choose to switch if they want to, this form of price discrimination amounts to consumer choice rather than consumer detriment and: ‘there is an inherent value to customers to not having to shop around and transfer and this should be reflected in the price they pay (i.e they should pay more)’ (p.22)

p.13-15: More on price discrimination:

  • A common price strategy is price obfuscation (…) For example:
    • discontinuous pricing occurs where a small change in the consumer’s circumstances or behaviour triggers additional charges, such as a sudden shift from free-if-in-credit current account banking to steep charges for even a small, short-lived unauthorised overdraft.
    • Exit charges that are triggered if a consumer wants to switch their mortgage or investment-type life insurance product may be downplayed at the time of purchase.
    • Firms may ‘game’ any price disclosure rules or price comparison services by keeping down headline charges, but charging extra for product features that might normally be considered integral to the basic product, for example, some insurers charge a fee for administrative adjustments, such as change of address.
  • Bait and switch
  • Opportunism (taking advantage of an external event or requirement to charge higher prices)

Product-feature strategies

  • Product bundling and add-ons
  • Product complexity (eg many characteristics or technically complex; many similar but slightly different products)
  • Product differentiation (genuine or spurious); product differentiation may also be used spuriously to create a degree of market power provided consumers are convinced the differences have some meaning and value. (p.15)
  • Brand and advertising
  • ‘Hollowing out’ (reducing core features or services); ‘stripped-back policies’ (…) akin to the ‘shrinking Mars Bar’ (p.16)

Exploit biases

Firms can turn System 1 thinking to their advantage by providing consumers with cues designed to influence heuristic-based decisions. (p.17). A few are mentioned (inertia, framing, present bias, anchors, salience), but I feel this chapter does not provide convincing evidence for the conclusion on p.52: “Consumers are prone to behavioural traits that get in the way of their ability to drive competition and may be deliberately exploited by firms.”

Barriers to competition


“[C]ompetition regulators may be underestimating the extent to which consumers’ decisions to occupy the Repeat-Passive space could be a rational choice rather than the result of behavioural biases that need to be changed or harnessed (…) : the wide nature of search and switching costs; and satisficing. (p.29)

Enhancing consumer power

Building on Grubb’s (2015) policy types, the following policy options are considered:

  • Simplify choice: policies to simplify products, information and process.
  • ‘Advice’: policies that provide or facilitate expert advice to consumers, such as comparative information from regulators and price comparison websites, and requirements for firms to share customer data with these services.
  • Choose for the consumer: policies that involve consumers delegating choice decisions to a third party, and
  • Other policies: these include the type of behaviourally informed measures being deployed by the FCA, such as smarter information that is better framed and more timely, as well as triggers, incentives and financial education. (p.33)


Automated shopping-and-switching

6.24 To summarise, automated shopping-and-switching has three novel aspects:

• It essentially digitises the role of a human broker or adviser, but going further than price comparison websites by taking full account of product and service features other than just price.

• It changes the default from status quo to switch if a consumer gain is identified, and

• It passively gets on with repeating the process as necessary, releasing consumers from the merry-go-round of repeated active engagement across multiple financial and household markets. (p.40)

Meaningful metrics

Switching data is a particularly contentious measure because the rational outcome of shopping around might be not to switch if expected benefits do not outweigh costs. Moreover, as a market approaches the ideal of perfect competition, the necessity for, and pay-off from, switching would decline. (Paradoxically, this would also reduce the incentive for consumers to shop around even though continued shopping around would be required to maintain perfect competition.) (p.49)


8.2 However, the expectations placed on consumers [reliance on rational, active consumers to drive competition by shopping around] look unrealistic because:

  • Perfect competition seldom, if ever, exists in reality. In most retail financial services markets, it is in firms’ interests to create and maintain market power. They do this through strategies, such as price discrimination, price obfuscation, product bundling and complexity and promotion of brands. The result is markets that are overly complicated and products that are difficult or impossible to compare. Even the most financially capable consumers face a battle to find value-for-money in markets like these.
  • Price comparison websites are designed to help consumers make product comparisons, but often focus too heavily on headline price, ignoring other essential factors, such as product features and quality of service. Firms can ‘game’ price comparison sites by ‘hollowing out’ products and using a variety of ancillary charges.
  • Consumers are prone to behavioural traits that get in the way of their ability to drive competition and may be deliberately exploited by firms. Competition regulators have a growing understanding of these traits, treating them as a new type of barrier to competition, and seeking either to alter consumer behaviour or adopt policies that work with the grain of consumers’ actual behaviour. However, the aim of these policies is still to foster more active shopping around and switching.
  • Competition regulators may be misinterpreting widespread consumer decisions not to engage with shopping around as behavioural barriers, when in reality they may be rational choices based on consumers’ preferences about how they wish to spend their time and mental effort.


Read in ESB: Het effect van framing op pensioenperceptie (the effect of framing on pension perception)

The 12 October 2017 issue of ESB carried an article on consumer research by Prast (formerly DNB) and Teppa (DNB) among 1,034 members of the CenterPanel: Het effect van framing op pensioenperceptie [pdf].

Four random groups were each shown, framed in a different way, a replacement ratio of 50% at retirement, and were asked to judge whether this pension would be enough to make ends meet: very insufficient / insufficient / sufficient / more than sufficient. The correct answer is (very) insufficient.

The %-frame read: “You will receive 50% of your current gross income”. The other frames: an amount in euros per year, an amount in euros per month, and “0.5 times your current income”.

Percentage works better
“Respondents in the percentage frame rated the expected pension entitlement as insufficient or very insufficient significantly more often (…) An indication that communication in euros is apparently unclear,” conclude Prast and Teppa.

For in the %-frame, 82% answer (very) insufficient (circled in orange in the table below); in the other frames (boxed in blue), 70% answer (very) insufficient.


Prast and Teppa extrapolate: “If a percentage frame leads to more clarity (‘awareness’) among participants about the extent to which their pension entitlement is sufficient, the pension provider can, with little effort, improve its reach to participants and thus meet the goal of the Wet pensioencommunicatie (Pension Communication Act).”

The results do support this conclusion, but I find it rather bold. It is also intuitively odd, and not entirely in line with other studies comparing euros and percentages (euros are often found to be clearer).

Prast and Teppa add this caveat themselves: “A caveat is that we only looked at a replacement ratio equal to half of current income. Further research is needed to see whether the framing effect holds up at higher and lower replacement ratios.”

I find it striking that the 50% condition (“50% of your current gross income”) scores so differently from the 0.5 condition (“0.5 times your current income”). In the %-frame, 82% answer (very) insufficient; in the decimal frame this is 67%.
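As a rough check (my own, not in the article), if we assume the 1,034 respondents were split roughly evenly over the four groups (~258 each, a hypothetical figure), a quick two-proportion z-test suggests the 82% vs 67% gap is well beyond sampling noise:

```python
from math import sqrt, erf

def two_prop_ztest(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two independent proportions."""
    x1, x2 = p1 * n1, p2 * n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled proportion under H0: p1 == p2
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value via the standard normal CDF (normal approximation)
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Assumed group sizes: 1,034 respondents over 4 groups, ~258 each
z, p = two_prop_ztest(0.82, 258, 0.67, 258)
print(round(z, 2))  # z is close to 3.9, so p is well below 0.001
```

Of course the real test should use the actual cell counts from the paper; this only shows that, at plausible group sizes, the difference is not a fluke.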

This makes it easy to replicate: for the euro conditions you need to know the respondent’s income, but for the decimal frame you do not.

In the upcoming AFM Consumentenmonitor, with a sample of N=800 representative of the Netherlands, we may survey 6 random groups (a 2×3 design): 2 types of frames (% or decimal) and 3 values: 50%/0.5 – 70%/0.7 – 90%/0.9.

For the 50%/0.5 frames, the question would then read:

Suppose you receive the following information about your future pension:
If you keep working until your retirement, you can expect the following pension from the moment you reach retirement age:

Frame A: 50 percent of your current gross income
Frame B: 0.5 times your current gross income.

Indicate to what extent you consider this pension sufficient or insufficient to make ends meet. Leave any income of your partner out of consideration.
1. More than sufficient
2. Sufficient
3. Insufficient
4. Very insufficient
5. Don’t know
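The random assignment for such a 2×3 design could be sketched as follows (my own minimal illustration, not actual AFM survey code; the labels and function names are hypothetical):

```python
import random

FRAMES = ["percent", "decimal"]              # 2 frame types
VALUES = [(50, 0.5), (70, 0.7), (90, 0.9)]   # 3 replacement-ratio levels

def assign(n=800, seed=42):
    """Randomly assign n respondents to the 6 cells of the 2x3 design."""
    rng = random.Random(seed)
    cells = [(f, v) for f in FRAMES for v in VALUES]  # 6 conditions
    return [rng.choice(cells) for _ in range(n)]

def question_text(frame, value):
    """Render the pension information shown in one condition."""
    pct, dec = value
    if frame == "percent":
        return f"{pct} percent of your current gross income"
    return f"{dec} times your current gross income"

groups = assign()
frame, value = groups[0]
print(question_text(frame, value))
```

With N=800 over 6 cells this gives roughly 133 respondents per condition; a balanced block randomization would fix the cell sizes exactly, but simple random assignment keeps the sketch short.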

Other reactions
Guus Pijnenburg is less enthusiastic:


Does the wild-west warning work? Research into the exemption notice Let op! U belegt buiten AFM-toezicht (‘Caution! You are investing outside AFM supervision’)

In the September issue of the Tijdschrift voor Toezicht, Nynke van Egmond-de Boer and I (Consumer Behaviour team) describe a study of the exemption notice Let op! U belegt buiten AFM-toezicht. In experiments in the AFM Consument&Panel, absence of this warning leads to a higher investment intention, especially among wealthy respondents who do not know the notice.


For investment offerings to retail investors that fall outside supervision, a mandatory exemption notice in a fixed format has applied since 2012. In a randomized experiment we show that absence of the notice ‘Let op! U belegt buiten AFM-toezicht. Geen vergunning- en prospectusplicht voor deze activiteit’ (‘Caution! You are investing outside AFM supervision. No licence or prospectus requirement applies to this activity’) leads to a higher investment intention (the effect on actual behaviour was not measured), especially among wealthy respondents who do not know the notice (42 percent of the population).

This seems in line with one goal of the warning: protecting consumers by pointing out their greater personal responsibility with these products.

With this randomized experiment we show that it is useful and feasible for supervisors to test the effectiveness of their interventions.

We ran the research as an experiment among respondents with more than €100,000 in assets. These are potential investors, because one of the possible grounds for an exemption is that the value of the securities is at least €100,000 per investor (see for more information). The €2.5 million exemption threshold is being raised to €5 million. And from 1 October 2017, a notification and information duty applies to providers of exempted investments.

Works if you don’t already know it
In general, purchase intention in the experiment was low. Still, we found a significant effect of the presence or absence of the exemption notice; especially respondents who did not yet know the warning reported a less low purchase intention.


In this experiment we did not study actual behaviour, which we did do, for example, in the study of the credit warning sentence ‘Let op! Geld lenen kost geld’ (‘Caution! Borrowing money costs money’), which showed no immediate effect in a sales environment (December 2016).

Want the full article? Send me an email or download Werkt de wildwestwaarschuwing wel_TvT here.

TIBER 2017 Symposium #TIBER2017

The TIBER 2017 Symposium on Psychology and Economics took place on August 25, 2017 in Tilburg [full program, my tweets from that day]. Interesting day, and good to see some people from the financial industry (ING, Rabobank).

Keynote Ralph Hertwig (from the Gigerenzer-school) kicked off with a talk on Preferential Heuristics, Uncertainty and the Structure of the Environment.

He started by quoting a 1967 paper Man as an intuitive statistician where Peterson and Beach argue that “Man gambles well”. But the Tversky and Kahneman-paradigm a couple of years later proved more influential, puzzling Hertwig.

Description vs. Experience
Tversky and Kahneman were influential with tasks in which risks were described, rather than uncertainty being experienced.

Hertwig shows figures from A meta-analytic review of two modes of learning and the description-experience gap (2016) by D.U. Wulff, M. Mergenthaler Canseco, and R. Hertwig.

Especially at low true probabilities, there are large differences between choices made from a described gamble and choices made after experiencing pay-offs (DU = discrete underweighting). From Wikipedia: “in experienced prospects, people tend to underweight the probability of the extreme outcomes and therefore judge them as being even less likely to occur.”


The adaptive decision maker has to trade off accuracy against effort, so heuristics can be effective, says Hertwig.

Hertwig on risk communication:

And research methods (why do adults score worse than babies or chimps?)

Parallel sessions
First, I went to Ozan Isler’s talk Honesty, Cooperation & Social Influence:

we present a new mind-game that is powerful enough to measure honesty at the individual level and fast enough to be implemented online. The game consists of forty rounds. In each round, the participant is first asked to think of a number between 0 and 9, then shown a single-digit random number, and finally asked to report whether the two numbers match.
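Under honest reporting the match probability per round is 1/10, so an honest participant should report about 4 matches in 40 rounds; the dishonesty signal is the excess over that baseline. A small simulation of my own (assuming uniform, independent draws; the `honesty` parameter is my own modelling choice, not part of the paper):

```python
import random

def play(honesty=1.0, rounds=40, rng=None):
    """Simulate one participant in the 40-round mind-game.

    With probability (1 - honesty) a non-matching round is misreported as
    a match; a fully honest player (honesty=1.0) reports only true matches,
    which occur with probability 1/10 per round."""
    rng = rng or random.Random()
    reported = 0
    for _ in range(rounds):
        thought = rng.randrange(10)  # privately chosen number 0-9
        shown = rng.randrange(10)    # displayed random digit
        if thought == shown or rng.random() > honesty:
            reported += 1
    return reported

rng = random.Random(0)
honest_mean = sum(play(1.0, rng=rng) for _ in range(10_000)) / 10_000
print(round(honest_mean, 2))  # close to the honest baseline of 4 matches
```

The per-round structure is what makes individual-level measurement possible: forty Bernoulli(0.1) trials give enough resolution to distinguish an individual’s reported matches from the honest baseline.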

Then I saw a talk about varying faces with FaceGen along a trustworthiness scale and playing a dictator game. The cool thing was the stopping rule: they started with N=30 participants and would add 5 at a time until the Bayes factor was either > 3 or < 1/3.
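The logic of that stopping rule can be illustrated with a toy example (my own sketch, not their analysis: I use a simple binomial Bayes factor of H1: p ~ Uniform(0,1) against H0: p = 0.5, which has a closed-form marginal likelihood):

```python
from math import comb

def bf10(successes, n):
    """Bayes factor for H1: p ~ Uniform(0,1) vs H0: p = 0.5, binomial data.

    Under the uniform (Beta(1,1)) prior, the marginal likelihood of any
    k successes in n trials has the closed form 1 / (n + 1)."""
    m1 = 1 / (n + 1)
    m0 = comb(n, successes) * 0.5 ** n
    return m1 / m0

def run_until_decision(data, start=30, step=5, threshold=3):
    """Evaluate the Bayes factor after `start` observations, then after
    every additional `step`, stopping once BF > threshold or BF < 1/threshold."""
    n = start
    while n <= len(data):
        bf = bf10(sum(data[:n]), n)
        if bf > threshold or bf < 1 / threshold:
            return n, bf
        n += step
    return len(data), bf10(sum(data), len(data))

# Strongly one-sided data triggers an early stop in favour of H1:
print(run_until_decision([1] * 40)[0])
```

Unlike p-value-based optional stopping, monitoring a Bayes factor this way lets you stop for evidence in either direction, which is why the design could commit to “> 3 or < 1/3” in advance.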

Next, I saw Tony Evans: The reputational consequences of generalized trust

Final morning talk: Stefan Trautmann – Implementing Fair Procedures? “We find that unfair outcomes are acceptable for the agents if procedures are perceived as fair. However, with opaque allocation decisions, it may be difficult to commit to fair allocation procedures. Indeed, we find a very high degree of favoritism by the decision makers when they are forced to allocate unequal outcomes, and have no fair (random) procedure available”

During lunch I read two interesting posters, one by my DNB-colleague Carin van der Cruijsen on DNB Working Paper No. 563: Payments data: do consumers want to keep them in a safe or turn them into gold?

And a poster by a collaborator of Stefan Zeisberger (Nijmegen), with whom we will also be collaborating. Basically, people trade on their beliefs when they are in the plus (have gained), but not, or less so, when they are in the red.

Afternoon sessions
I attended this talk because it used Bayesian statistics, but these unfortunately did not feature very prominently in the presentation: Peer effects on risky decision making from early adolescence to young adulthood: Specificity and boundary conditions.

Then Less Likely Outcomes Are Valued Less by Gabriele Paolacci (who is male, incidentally): “We found that people value the gift card less when its availability is uncertain.” Somewhat interesting, but not very applicable. This effect might counter the scarcity argument in marketing a bit, but that doesn’t seem likely.

The final talk was Jan Stoop with a replication of The Rich Drive Differently, a Study Suggests (2013). Stoop et al. found nothing, across a wide range of settings and with 2.5× the N of the original study.

After tea
First up: Financial Incentives Beat Social Norms: A Field Experiment on Retirement Information Search [Netspar presentation, SSRN]. Presented by Inka Eberhardt; I had seen this work before, then presented by co-author Paul Smeets. It is a really big RCT (N=250,000), sending letters to pension-fund participants to get them to sign in to their personal webpage at their pension fund. The Q&A was a bit disappointing; I didn’t think the answers were particularly strong.

Pollmann’s talk Let’s talk about money: Attachment style, financial communication, and financial conflict concluded with a suggestion to Nibud for a web tool Let’s talk about money.

Final talk was Diffusion of culpability in reparations behavior. Results for a novel task were presented, fMRI results are forthcoming.

TIBER 2017 was concluded by Bertil Tungodden’s keynote: Fairness and Redistribution: Experimental Evidence.

Tungodden presented on a paper that is nicely summarized in this HBR article: Is It OK to Get Paid More for Being Lucky?

The Rise of Behavioural Discrimination & Virtual Competition

This blog post Big data and first-degree price discrimination (thanks Patricia) led me to the work of Ariel Ezrachi and Maurice Stucke. As Silvia Merler writes:

[Ezrachi and Stucke] argue that online behavioural discrimination will differ from the price discrimination we have seen in the retail world in three important respects:

  1. Big data allow the shift from third-degree, imperfect price discrimination to near perfect price discrimination;
  2. Sellers can use big data to target consumers with the right “emotional pitch” to increase overall consumption (the demand curve shifts to the right)
  3. As more online retailers personalise pricing and product offerings, it will be harder for consumers to discover a general market price and to assess their outside options, thus implying that behavioural discrimination becomes more durable.

Ezrachi and Stucke published a book in 2016: Virtual Competition (on my to read pile, reserved it at the University Library; book’s webpage also contains a lot of extra info/links).

Behavioural discrimination
I did read their paper The rise of behavioural discrimination (37 European Competition Law Review 484 (2016)).

“New dynamics that reduce our welfare? (…) Our article explores how e-commerce and the personalisation of our online environment can give rise to behavioural discrimination, a durable, more pernicious form of price discrimination.”

I. Near perfect price discrimination

Third-degree price discrimination involves charging different prices to different groups. The price can depend, among other things, on your location (i.e. where you live), your age, or your sex. Cinemas, bus services, and restaurants, for example, may charge adults higher prices than children, students or senior citizens.

By contrast, in this article, our focus is on the possible shift to perfect, or first-degree, price discrimination—where firms can identify and charge for each individual the most he or she is willing to pay, i.e. the reservation price.

“Big Data, learning by doing, and the scale of experiments come into play to better approximate your reservation price.”

“In this data-driven economy, the algorithm—to maximise profitability—will estimate the likelihood of our shopping elsewhere or being aware of better deals and accordingly provide us with a convincing sales pitch.” (e.g. coupons and promotion codes for customers more sensitive to outside options, i.e. more price-sensitive, more sophisticated customers who are likely to compare options. Naive consumers can be exploited more efficiently.)

II. Shifting the demand curve to the right

Sellers using our personal data to induce us to buy more products or services than we otherwise would have purchased.

A few consumer biases, which firms may exploit to promote consumption:

  • Use of decoys
  • Price steering, e.g. On Orbitz, Mac Users Steered to Pricier Hotels
  • Increasing complexity; facilitate consumer error or bias and manipulate consumer demand to their advantage (…) companies can, by designing the number and types of options they offer, better exploit consumers’ cognitive overload. In increasing complexity, the firms can also increase consumers’ search and switching costs, thereby reducing the visibility (and attraction) of outside options, and giving them more latitude to exploit consumers.
  • Imperfect willpower: “framing effects” (how the issue is worded or framed) do matter. Credit cards are one example. Here they cite a Dutch study, The abolition of the No-discrimination Rule, from 2000 (!) with N=150 consumers (!) surveyed. Dutch merchants could impose surcharges or offer discounts based on how the customer was going to pay. Of the consumers surveyed, 74% thought it (very) bad if a merchant asked for a surcharge for using a credit card. But when asked about a merchant offering a cash discount, only 49% thought it (very) bad. A weak spot in an otherwise excellent paper.

The road to near-perfect behavioural discrimination will be paved with personalised coupons and promotions: the less price-sensitive online customers may not care as much if others are getting promotional codes, coupons, and so on, as long as the list price does not increase. (p.488)


Another way to frame behavioural discrimination in a palatable manner is to ascribe the pricing deviations to shifting market forces. Few people pay the same price for corporate stock. They accept that the pricing differences are responsive to market changes in supply and demand (dynamic pricing) rather than price discrimination (differential pricing). So once consumers accept that prices change rapidly (such as airfare, hotels, etc.), they have lower expectations of price uniformity among competitors. One hotel may be charging a higher price because of its supply of rooms (rather than discriminating against that particular user). (…) Thus, we may not know when pricing is dynamic, discriminatory, or both.


III. The durability of behavioural discrimination

it will be harder to know what others see. (…) As personalised offerings increase, search costs will also increase for consumers seeking to identify the “true” market price.

Behavioural discrimination—while not always possible—could occur more often than we expect. Furthermore, as we shift more of our activities to a controlled online ecosystem, it is likely to intensify.

The power to discriminate may be curtailed by possible pushback from consumers (I personally doubt it).

Price comparison websites may foster, rather than foil, behavioural discrimination, and switching costs may be higher than one assumes, despite perceived competition being only a click away. (From the footnotes related to this quote: As more consumers rely on (and trust) an intermediary to deliver the best results (whether relevant results to a search query or an array of goods and services), the less interested they become in multi-homing—that is, checking the availability of products and prices elsewhere. And: many users indicated that when a search result fails to meet their expectations they will “try to change the search query—not the search engine.”)


IV. The welfare effects of behavioural discrimination

sellers can manipulate our environment to increase overall consumption, without necessarily increasing our welfare.

Once one accounts for the consumer perspective, the social welfare perspective, and the limited likelihood of total welfare increasing, behavioural discrimination is likely a toxic combination. Moreover, behavioural discrimination may blur into actual discrimination due to the limits and costs of refined aggregation.

The worrying thing is that we (and the enforcers) may not even know that we are being discriminated against. Under the old competitive paradigm, one might suspect one was discriminated against if access was inexplicably denied (e.g. restaurants for “whites only”) or was charged a higher price based on this single variable. Under the new paradigm, users may not detect the small but statistically significant change in targeted advertisements (or advertised rates).



As pricing norms change, price and behavioural discrimination eventually may be accepted as the new normal. Just as we have accepted (or become resigned to) the quality degradation of air travel, and the rise of airline fees—from luggage to printing boarding passes—our future norms may well include online segmentation and price discrimination.

The costs can be significant. The new paradigm of behavioural discrimination affects not only our pocketbook but our social environment, trust in firms and the marketplace, personal autonomy, privacy and well-being.


Some other relevant links: