Measuring the effects of enforcement by the Belastingdienst (the Dutch Tax Administration)

In the last 2016 issue of Tijdschrift voor Toezicht, three employees of the Belastingdienst write about effect measurement (“an indispensable element of ‘good supervision’”): “Central to this (descriptive and exploratory) article is the question of how the Belastingdienst measures the effects of its enforcement and supervision activities, and what the challenges are in doing so.” (p9)

Het meten van effecten van de handhaving door de Belastingdienst (2016), Sjoerd Goslinga, Maarten Siglé and Lisette van der Hel [pdf]

“Effect measurement assesses whether carrying out the enforcement activities has actually influenced the determinants of compliance, and whether this in turn has had an effect on compliance,” write Goslinga et al.

For example: a tax return campaign that uses public information to raise citizens’ compliance (filing on time, before 1 April).

[Figure: the effect chain of the Belastingdienst]

‘Outcome’ (= effect) represents, in this effect chain, the ultimate impact of the Belastingdienst’s activities on its strategic goal: compliance. ‘Output’ (= result), by contrast, is what the efforts of the Belastingdienst have produced (such as the number of letters sent or audits carried out). Carrying out enforcement activities is also described as an intervention. In terms of the effect chain, this covers input, process and output.

The authors are honest (and realistic) about the state of effect measurement:

“In our opinion, the core of the problem is that tax administrations are simply not used to measuring effects, and mostly confine themselves to output because that is usually easier to establish than effects.”

Challenges
The authors name five challenges for effect measurement:

  1. Making explicit how activities (the deployment of people and resources) contribute to achieving the objectives, for example with a goal tree.
  2. Finding the right underlying causes of (non-)compliance
  3. Measuring the effects of preventive activities: “Thinking about new kinds of indicators, which seem further removed from the output indicators that tax administrations used to steer on heavily.”
  4. Setting up methodologically sound research

    To establish to what extent the efforts of the Belastingdienst determine (or have determined) that level of compliance, it is necessary to make a comparison with the level of compliance in a situation in which the efforts would not have been made (counterfactual). (…) The ideal research design is the so-called randomized controlled trial or controlled field experiment (p26)

    Many situations are conceivable in which it is not possible to reach a higher level of research validity; the first two levels can then certainly be of added value. (p25)

  5. Organisational embedding of effect measurement.

    A challenge for many supervisors is to anchor effect measurement structurally in the organisation and make it part of the way of working.
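The counterfactual comparison in challenge 4 can be sketched in a few lines. This is only a minimal illustration, assuming a randomized controlled trial in which one group receives the intervention and the other does not; the function name and all counts are hypothetical and do not come from the article.

```python
from math import sqrt

def effect_estimate(compliant_treated, n_treated, compliant_control, n_control):
    """Difference in compliance rates between two randomized groups.

    Returns the estimated effect and its (unpooled) standard error.
    """
    p_t = compliant_treated / n_treated
    p_c = compliant_control / n_control
    effect = p_t - p_c
    se = sqrt(p_t * (1 - p_t) / n_treated + p_c * (1 - p_c) / n_control)
    return effect, se

# Hypothetical counts, for illustration only (not from the article):
# 870 of 1000 treated taxpayers filed on time, 840 of 1000 in the control group.
effect, se = effect_estimate(870, 1000, 840, 1000)
print(f"estimated effect: {effect:.3f} (standard error {se:.3f})")
```

Because assignment is random, the control group stands in for the counterfactual; without randomization the same difference could simply reflect pre-existing differences between the groups.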

 

Effect measurement is a continuous process, because achieving the (general) policy objective is about a ‘sustainable’ change in taxpayers’ behaviour and safeguarding the continuity of tax revenues.

Spanish regulation for labeling of financial products: a behavioral-experimental analysis

Spanish regulation for labeling of financial products: a behavioral-experimental analysis – Y Gómez, V Martínez-Molés, J Vila – Economia Politica, 2016 [Pdf].

Abstract 

This paper assesses the impact of the Spanish Ministry of Economy and Competitiveness’ (Board of Executives (BOE) Order ECC/2316/2015. Economy and Competitiveness Ministry, Spain, 2015) new regulation for financial product labeling.

We design and conduct an economic experiment where subjects make risky investment decisions under three different treatments: a control group where subjects have only objective information about the key features of the products they must select and two treatment groups introducing visual labels resembling the labels required under the new Spanish regulation. The results of the experiment are analyzed within the framework of rank-dependent utility theory.

While visual labels do not change the utility function of the subjects, they do significantly affect the subjects’ weighting functions. The introduction of numerical and color-coded labels significantly increases the concavity of the weighting functions and increases pessimism and risk-aversion in cases where the probability of obtaining the best outcome is high.

Labels widen the difference between real subjects’ behavior and that of the perfectly rational agents described by expected utility theory. Consequently, our empirical findings raise doubts as to whether the new regulation actually achieves its objectives.

The regulation seeks to empower retail investors by enhancing their understanding of financial products. Introducing the visual labels, however, seemingly increases the differences between actual risk levels and the decision weights applied by subjects when making decisions.

Moreover, labels increase investors’ pessimism and risk-aversion when the best outcome is likely and fail to alter investors’ risk-aversion when the worst outcome is likely.


Method was at times too complicated for me (utility & weighting functions), but interesting outcomes nonetheless:

  • Consequently, our empirical findings raise doubts as to whether the new regulation actually achieves its objectives.
  • In summary, visual labels affect subjects’ understanding of risk levels. Visual labels cause subjects’ understanding to diverge from that of perfectly rational agents. Furthermore, labels make subjects more risk averse in cases where the probability of the best output is high.
  • The behavioral experiment presented in this paper shows that the labels proposed under the new regulation are seemingly a long way from achieving their goal. Taking decisions made by the rational agents described in rational choice theory as a benchmark, our experiment shows that both graphical and numerical labels actually worsen subjects’ decision-making. Introducing labels makes retail investors’ decisions less rational.
  • The practitioners claimed that introducing labels has increased the perception of risk associated with the safest products (for instance, bank deposits), mainly among investors with low financial literacy.

Disclosure and warnings are often employed as the solution for everything (market failures). Important and good that such interventions are also measured and assessed on merit. The proof of the pudding is in the eating.

Minimum Payments and Debt Paydown in Consumer Credit Cards

Ben Keys and Jialan Wang have a working paper called Minimum Payments and Credit Card Paydown. Most of my summary below consists of sentences copied and pasted from the paper.

Abstract

Using a dataset covering one quarter of the U.S. general-purpose credit card market, we document that 29% of accounts regularly make payments at or near the minimum payment. We exploit changes in issuers’ minimum payment formulas to distinguish between liquidity constraints and anchoring as explanations for the prevalence of near-minimum payments. At least 10% of all accounts respond more to the formula changes than expected based on liquidity constraints alone, representing a lower bound on the role of anchoring.

Using a back-of-envelope calculation, we estimate that anchoring consumers would save at least $570 million per year in interest charges if all issuers adopted the highest observed minimum payment formula in our sample.

Disclosures implemented by the CARD Act, an example of one potential policy solution to anchoring, resulted in fewer than 1% of accounts adopting an alternative suggested payment. Our results show that the design and salience of contract terms in credit products have significant impacts on household balance sheets.

Keys and Wang position their paper as “the first empirical study to estimate the economic significance of anchoring in the credit card market”; “Because the minimum payment is a lower bound on the optimal payment amount for the vast majority of consumers, anchoring would downwardly bias payment amounts and lead to suboptimally high debt levels, lower average consumption, and greater consumption volatility for affected consumers.”

They used the CFPB Credit Card Database (CCDB), which covers February 2008 to December 2013; the issuers in the full dataset account for over 85% of credit card industry balances. Based on a 1% random sample with about 40 million observations, they analysed three questions:

  1. Who pays the minimum? “We find that 29% of accounts pay exactly [9%] or close to (i.e. within $50 of) [20%] the minimum in most months. (…) Either many consumers are liquidity constrained at amounts that happen to be near the minimum, or that repayment decisions are influenced by anchoring.” In the 1970s, typical minimum payments were about 5% of the outstanding balance. By the 2000s, the average minimum payment had fallen to 2%.

    Payment behavior is highly persistent over time both within and across accounts, and it is only weakly correlated with traditional proxies for liquidity constraints.

  2. Minimal payments due to anchoring? “Taking advantage of the fact that several issuers changed their minimum payment formulas during the sample period allows us to estimate the fraction of anchoring consumers by measuring before and after formula changes. Using a difference-in-differences approach we find that 9 to 20% of all accounts changed their payments by more than the mechanical effect alone.” At least 22% of accounts paid close to the minimum and at least 9% of all accounts anchor to the minimum payment. Estimated range is between 22% and 38%. Notably, the behavioral response is consistent, yielding a significant fraction of anchoring consumers in response to both minimum payment increases and decreases. Consumers’ repayment choices are sensitive to changes in minimum payment formulas.
  3. Did the CARD-act nudge work? On “nudges” that encourage higher payments: they measured the effect of one such disclosure required by the Credit Card Accountability Responsibility and Disclosure (CARD) Act of 2009. “The disclosure was mandated on more than half of all statements, and presents a calculation of the payment needed to amortize the outstanding balance in three years.” Figure 7 below shows what the disclosure looks like. “Fewer than 1% of accounts adopt the three-year repayment amount (…) a prominent policy change aimed at de-biasing consumers failed to yield a large economic effect relative to the influence of anchoring.”

[Figure 7: the three-year repayment disclosure required by the CARD Act]

We interpret the fraction of accounts that adopt the three-year repayment amount as an estimate of the ability for mandated disclosure to establish new anchors for consumer payments. The regulation specified that consumers who paid their balances in full for two months in a row and those whose minimum payments are higher than the three-year repayment amount are exempt from the disclosures.
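To make the “three-year repayment amount” concrete, here is a minimal sketch using the standard annuity formula for a level monthly payment. The balance and APR are hypothetical, the 2% minimum payment is the average mentioned above for the 2000s, and actual issuer formulas (fees, floors, rounding) are not modelled.

```python
def amortizing_payment(balance, apr, months=36):
    """Level monthly payment that pays off `balance` in `months` months."""
    r = apr / 12  # monthly interest rate
    if r == 0:
        return balance / months
    return balance * r / (1 - (1 + r) ** -months)

def balance_after(balance, apr, payment, months):
    """Remaining balance after `months` payments of a fixed amount."""
    r = apr / 12
    for _ in range(months):
        balance = balance * (1 + r) - payment
    return balance

# Hypothetical account, for illustration only (not from the paper)
balance, apr = 3000.0, 0.18
three_year = amortizing_payment(balance, apr)
minimum = 0.02 * balance  # assumes a simple 2%-of-balance minimum

print(f"three-year payment: ${three_year:.2f}, minimum payment: ${minimum:.2f}")
print(f"balance left after 36 three-year payments: ${balance_after(balance, apr, three_year, 36):.2f}")
```

In this example the disclosed amount is well above the minimum, which is the gap the disclosure tries to get consumers to close.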

Panel B of Figure 8 (see below) presents the difference-in-differences results around the implementation date. There are no pre-trends in the period prior to the implementation of the disclosure, in large part because very few consumers actively chose the three-year repayment amount in the absence of the disclosure.

[Figure 8, Panel B: difference-in-differences results around the implementation date of the disclosure]

In the five months following the CARD Act, we observe a sharp increase in the share of accounts paying the three-year disclosure amount. Although the economic impact is small, with treatment effects of less than 1%, the effect is statistically significant. Another trend visible in the figure is a deterioration of the effect of the disclosure over time. One reason for the decline in the disclosure’s effect could be habituation as consumers become accustomed to seeing the disclosure and “tune out” after its novelty wears off. We use this medium-run effect of 0.5% as the benchmark estimate of the disclosure’s overall impact.

Economic significance
Assuming that 0.5% of consumers who adopt the three-year payment amount would have otherwise made the minimum payment, we find that the disclosures led to an $0.18 per month increase in payments averaged across all accounts. We estimate that the disclosures saved consumers $62 million in interest charges in 2013.

If the disclosures had instead caused all anchoring consumers (estimated range between 22% and 38%) to move from the minimum payment to the three-year payment amount, we find that the interest savings in 2013 would have been two orders of magnitude larger, between $2.7 and $4.7 billion. The effect of the disclosures is substantially smaller than the economic role of anchoring.
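As a rough check on these figures: the $2.7 to $4.7 billion range is what you get by scaling the $62 million in proportion to the share of accounts that move. The sketch below is my own back-of-envelope reconstruction under that linear-scaling assumption; it is not necessarily how the authors computed it.

```python
disclosure_savings = 62e6   # interest saved in 2013 by the disclosure (from the paper)
adopters = 0.005            # medium-run share adopting the three-year amount (0.5%)
anchoring_low, anchoring_high = 0.22, 0.38  # estimated range of anchoring consumers

# Assumption: savings scale linearly with the share of accounts that switch
low = disclosure_savings * anchoring_low / adopters
high = disclosure_savings * anchoring_high / adopters

print(f"${low / 1e9:.1f} to ${high / 1e9:.1f} billion")
```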

Conclusion
The modest effects we document of the CARD Act disclosures illustrate the challenges of changing real-world behavior using traditional forms of disclosure.

The answers to the 3 questions:

  1. Who pays the minimum? 29%
  2. Minimal payments due to anchoring? 22% – 38%
  3. Did the CARD-act nudge work? Yes, but very little (<1%)

From advert to action: behavioural insights into the advertising of financial products

The Financial Conduct Authority (FCA) published Occasional Paper No. 26 on April 12th 2017: From advert to action: behavioural insights into the advertising of financial products. It was written by Paul Adams and Laura Smart.

Laura Smart also wrote an Insight (FCA’s term for a blogpost I suppose): Economical with the truth: three ways behavioural science can help to spot a misleading advert. And there is an infographic.

And on June 29th, they have a nice event for regulated firms. Experts Rory Sutherland and Joe Gladstone will present, as will Laura Smart and the FCA Financial Promotions team.

Summary

How are we affected by financial advertising? What do we pay attention to and when might we be misled? We explore the science of advertising to answer these questions. Building on earlier FCA work into behavioural biases, we summarise a large body of academic literature to explore the mechanisms behind consumer attention, understanding, and behaviour. We build this into a framework for understanding how consumers process information in the form of advertisements, divided into three stages: See, Interpret and Act. We then apply our findings in a novel setting: explaining what the science says about when an advert may be unclear, unfair or misleading.

In See, we find that attention may be predicted by the relative salience of information and is also affected by consumers’ motivation and intentions; for example, those searching for a house are more likely to notice mortgage deals.

In Interpret, we find that certain ways of presenting information, particularly those which make use of behavioural biases or which involve percentages may impede understanding and have the potential to mislead consumers in certain circumstances.

In Act, we see that consumers may be influenced into action through techniques which encourage reliance on heuristics or emotion, rather than reason, and that this may cause problems.

 

What is advertising for? (p6)

  • Marketing professionals point to the role of advertising in changing customers’ preferences or improving their brand recognition
  • Psychologists and behavioural scientists argue that advertising aims to prime potential customers to buy products when opportunity presents itself.
  • Economic approach
    • persuasive; that it altered consumers’ tastes and created (potentially spurious) product differentiation and brand loyalty.
    • informative; advertising helped to solve the problem that it is costly for consumers to search for products by providing information directly and efficiently.
    • complementary to the advertised product; that it does not change views or provide information, but simply enhances the existing features of a product.
  • Traditional approach: the AIDA model (attention, interest, desire and action). Such models “require a high level of cognitive involvement, which does not necessarily concur with the behaviour we see”.
  • Behavioural approach (used by this FCA paper): how advertising draws on inbuilt psychological mechanisms, invokes our emotions, changes our preferences and invites automatic responses, as well as tells us a story.

 

How do we process adverts?

  1. See: getting our attention
    • Salience (“bottom-up attention”); size, colour, incongruities, pictures, music, language (e.g. personalised, or containing signal words such as “danger” or “warning” [Wogalter et al (2002) Research-based guidelines for warning design and evaluation])
    • Motivation (“top down attention”); people are also affected by their current circumstances; what they are thinking and feeling at the time in which they come across an advert. This highlights the importance of considering context and possible effects of priming in assessing consumer responses.
  2. Interpret: reaching an understanding
    • Numbers: “People are highly likely to make systematic errors when processing numbers”, especially with percentages. Or availability bias. “When it comes to communicating risk, comprehension may be reduced still further”. I fully endorse the FCA’s recommendation of David Spiegelhalter (@d_spiegel) and Gerd Gigerenzer’s work (see this book review for summary Simple Heuristics That Make Us Smart).
    • Framing: such as playing to loss aversion, tinkering with the choice-set (e.g. decoy effect), defaults (save lives), anchoring, and drip pricing:

      “Another way to present costs in a way that makes them seem less unattractive is to present the first cost and then add additional or optional costs later (such as adding sales fees, platform fees and termination fees for investments after presenting the initial cost; OFT, 2010, Advertising of Prices). Because the customer is already psychologically invested in the purchase by this point, they are less likely to back out when the further costs appear.”

    • Words and truth. I had never heard of Gricean Maxims, named after Paul Grice, who described conversational implicature in a piece called Logic and Conversation (1975).

      Omissions and caveats which lead to false impressions are often called “pragmatic implications” (see Gricean Maxims box). Common examples include:
      * two juxtaposed phrases which imply a causal relationship: “You want only the best. Buy brand X”,
      * hedge words such as “may”,
      * comparative adjectives: “Gives you more rewards”, and
      * piecemeal survey results: “Better than Competitor A on price, better than Competitor B on coverage”.
      (…)
      It may be helpful to consider pragmatic implications in understanding what consumers take away from advertisements and to pay attention not only to what is said, but also how it is said. Even if the words are literally true, the message that the customer takes away could be incorrect.

      Adams and Smart conclude on #2 Interpret: “Techniques such as framing and pragmatic implications affect what consumers take away from an advertisement, which may be a different impression from what the words literally say.” (p26)

  3. Act: being influenced
    Consumers may be influenced to purchase products through appeals to emotion or the use of principles of influence, such as reciprocity or scarcity.

    • Emotion (“affect”). Possible counters:
      • “cooling on” periods, customers need to actively do something to complete the decision and activate the product. This provides a pressure-free period in which the customer can stop and think
      • pop-up warnings during purchase processes, (…) To test comprehension directly, it would even be possible to ask mandatory questions to check that a customer has understood what they are buying.
    • Influence. The Cialdini 6: Liking, Authority, Scarcity, Social Proof, Consistency, Reciprocity.

 

On Targeting & Timing

Targeting: now easier than ever to target adverts to consumers based on data about them. Two important considerations:

  1. As choices become more tailored to the individual’s current preferences, the individual is less likely to discover new preferences. They may even develop a distorted knowledge of what products are actually available.
  2. Data about consumers may be used to target those in particular circumstances, for example, those in debt or those who enjoy gambling, which could be detrimental to customers who are less able to ignore poor value or risky offers (Ronson, 2005 Who killed Richard Cullen?).

On Timing (p15): I tweeted the Ellering 2016 reference. He also quotes Dan Zarrella who, in my experience, also has good, data-backed advice on how to get retweets or mail opens. His blog is not updated much though.

 

FCA regulation of Financial Promotions

“The overarching principle [of the FCA] is that financial promotions must be clear, fair and not misleading.” (p4)

“The FCA already requires that all relevant product information, including risk warnings and key exclusions, is sufficiently prominent” (p15)

The FCA published guidance on social media and customer communications in 2015 which explained that shorter adverts, including tweets, should still be standalone compliant (clear, fair and not misleading) without the need for users to click on a link to see balancing information or caveats (Financial Conduct Authority, FG15/4, 2014). However, as part of the Smarter Consumer Communications initiative, the FCA is undertaking further work to explore alternative approaches to firms’ communications through social media (Financial Conduct Authority, 2016). (p24)

 

Where to draw a line? What is acceptable advertising?
When is a sell too hard? When does selling become misselling? (In Dutch: the difference between “verleiding” & “misleiding”).

“What is the difference between unethical and ethical advertising? Unethical advertising uses falsehoods to deceive the public; ethical advertising uses truth to deceive the public.”

Vilhjalmur Stefansson, explorer and ethnologist (p16)

It is difficult to find a suitable way to measure when techniques might be unfair. Is it better to measure consumer understanding of their products, the decision making process of the consumer or the literal interpretation of the rules? In practice, it might be appropriate to take all of these factors into account. (p33)

For example, the UK Advertising Standards Authority recently adjudicated a case in which a company sent out marketing material in white windowed envelopes and found that the envelope breached the CAP code by making it insufficiently clear that the direct mailing was a marketing communication before opening it (Advertising Standards Authority, 2017). (p13)

Dispositional Greed (paper)

In 2015, two papers came out with exactly the same title: Dispositional Greed. Here, I focus on the paper by Seuntjens et al. (Tilburg). The other one is by two Belgian scholars (Ghent). Fortunately, results were similar. A concurrent replication.

Abstract

Greed is an important motive: it is seen as both productive (a source of ambition; the motor of the economy) and destructive (undermining social relationships; the cause of the late 2000s financial crisis). However, relatively little is known about what greed is and does.

This article reports on 5 studies that develop and test the 7-item Dispositional Greed Scale (DGS). Study 1 (including 4 separate samples from 2 different countries, total N = 6092) provides evidence for the construct and discriminant validity of the DGS in terms of positive correlations with maximization, self-interest, envy, materialism, and impulsiveness, and negative correlations with self-control and life satisfaction. Study 2 (N = 290) presents further evidence for discriminant validity, finding that the DGS predicts greedy behavioral tendencies over and above materialism. Furthermore, the DGS predicts economic behavior: greedy people allocate more money to themselves in dictator games (Study 3, N = 300) and ultimatum games (Study 4, N = 603), and take more in a resource dilemma (Study 5, N = 305).

These findings shed light on what greed is and does, how people differ in greed, and how greed can be measured. In addition, they show the importance of greed in economic behavior and provide directions for future studies.

To compare, the Belgian paper had two studies, N=317 “fully employed US citizens” and N=218 US MTurkers.


Further Research
In Study 1, the authors found an “unexpected result, namely the absence of a relationship between greed and risk taking.”

And:

Future research could also focus on the observation that some groups of people appeared to score higher on dispositional greed than others. For example, we found that younger people were greedier than older people. (…)

We also found relationships between greed and levels of education and between greed and gender, but, interestingly, we did not find relationships with income or religiosity.

The Belgian study concluded that “Greed is higher in men, professionals in financial sectors and non-religious people”:

As expected, men (M = 3.72, SD = 1.27) are more greedy than women (M = 3.40, SD = 1.13, t(216) = 1.99, p < .05). By regrouping the 20 potential industries, we found that respondents working in financial and management sectors (M = 3.84, SD = 1.28) are significantly greedier than those working in services, or the arts (M = 3.22, SD = 1.20, t(114) = 2.70, p < .01). Whether greedy people are more likely to start a financial job or whether financial jobs trigger a greedy disposition is not clear from our results and requires further research. [Krekels & Pandelaere, 2015]

DGS is Dutch
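The Dispositional Greed Scale (DGS) consists of these 7 items, answered on a 5-point scale (Sterk oneens | Oneens | Niet oneens/niet eens | Eens | Sterk eens; that is: Strongly disagree | Disagree | Neither disagree nor agree | Agree | Strongly agree):

  • Ik wil altijd meer (I always want more)
  • Ik ben eigenlijk wel hebberig (Actually, I am rather greedy)
  • Geld heb je nooit genoeg (You can never have enough money)
  • Zodra ik iets heb denk ik alweer aan het volgende dat ik wil hebben (As soon as I have something, I am already thinking about the next thing I want)
  • Het maakt niet uit hoeveel ik heb, ik ben nooit echt tevreden (No matter how much I have, I am never really satisfied)
  • Mijn levensmotto is ‘meer is beter’ (My motto in life is ‘more is better’)
  • Ik denk dat ik nooit genoeg spullen kan hebben (I think I can never have enough stuff)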

Choice Complexity, Benchmarks and Costly Information

Recently, I spoke with Mark Sanders about his Discussion Paper for the Utrecht University School of Economics: Choice Complexity, Benchmarks and Costly Information (2017, Job Harms, Stephanie Rosenkranz, Mark Sanders).

Abstract

In this study we investigate how two types of information interventions, providing a benchmark and providing costly information on option ranking, can improve decision-making in complex choices.

In our experiment subjects made a series of incentivized choices between four hypothetical financial products with multiple cost components. In the benchmark treatments one product was revealed as the average for all cost components, either in relative or absolute terms. In the costly information treatment subjects were given the option to pay a flat fee in order to have two products revealed as being suboptimal. Our results indicate that benchmarks affect decision quality, but only when presented in relative terms. In addition, we find that the effect of relative benchmarks on decision-quality increases as options become more dissimilar in terms of the number of optimal and suboptimal features.

This result suggests that benchmarks make these differences between products more salient. Furthermore, we find that decision-quality is improved by providing costly information, specifically for more similar options. Finally, we find that absolute – but not relative – benchmarks increase demand for costly information.

In sum, these results suggest that relative benchmarks can improve decision-making in complex choice environments.

Complex task
Subjects had 30 seconds to complete a complex task: choosing, out of four products, the one with the lowest costs. This was their proxy for a complex financial decision. It is not only a calculating exercise; you also have to know that the management fee is to be added to the starting costs, and that the tax break can be subtracted from the monthly costs.

The table below shows the task; I calculated the bottom row, which was not shown in the experiment. Product D is the average of A, B and C. In this example, product A is the optimal choice (payoff = 5). Product C is worst (or “suboptimal” as it is called in the paper) with a -5 payoff.

I am not too sure whether this is a valid proxy for e.g. buying a mortgage. Considerable maths skills are needed. A good heuristic seems to be to look at the monthly costs. So perhaps this valiant effort to operationalize a complex financial decision for a lab setting with students has merit.

Product                   A     B     C     D
Starting costs            87    92    103   94
Monthly costs             35    49    64    49
Maturity costs            72    91    2     52
Management fee (%)        15%   31%   16%   21%
Tax deduction (%)         11%   10%   10%   10%
Total cost (not shown)    579   771   873   734
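The totals in the bottom row can be reproduced as starting costs plus 12 times the monthly costs plus maturity costs. The sketch below does just that. The 12-month horizon is inferred from the numbers in the table rather than taken from the paper, and this simple sum leaves the management fee and tax deduction out, so the paper’s own payoff calculation may well differ.

```python
# Cost components per product, copied from the table above
products = {
    "A": {"starting": 87, "monthly": 35, "maturity": 72},
    "B": {"starting": 92, "monthly": 49, "maturity": 91},
    "C": {"starting": 103, "monthly": 64, "maturity": 2},
    "D": {"starting": 94, "monthly": 49, "maturity": 52},
}

def total_cost(p, months=12):
    """Starting + months x monthly + maturity costs (12 months is an inferred assumption)."""
    return p["starting"] + months * p["monthly"] + p["maturity"]

totals = {name: total_cost(p) for name, p in products.items()}
cheapest = min(totals, key=totals.get)
print(totals, "cheapest:", cheapest)
```

Note how the ranking on total cost matches the ranking on monthly costs alone for A and C, which is why looking at the monthly costs works as a heuristic here.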

Experimental conditions: benchmark & advice
The randomized controlled trial had a 3 x 2 factorial design. Either no benchmark, an absolute benchmark (average product, i.e. product D in the table above), or a relative benchmark (all values for benchmark D are rescaled to 100).

This was crossed with: option to get advice or absence of this option. Advice in this experiment consisted of eliminating the worst option and 1 of the 2 remaining non-optimal options. So a subject choosing advice for the table above would get to see product A (optimal) and (randomly) either B or D. Getting advice costs 2.5 in payoff, so max pay-off becomes 2.5. But it reduces the downside risk of picking the worst product. Subjects could still pick any of the 4 options, even when advised against.

Results: advice and relative benchmarks work
Why papers by economists always lack graphs still does not cease to amaze me. So I copied their table 4 to Excel and used conditional formatting to get some bars. (Aside: fortunately, tables were included in the text and not at the end; having to flip back and forth always annoys me too.)

Pretty clear that advice works: subjects respond within the time limit more often (fourth column; 82%-87% without advice vs 92%-97% with advice). Good to mention: if you fail to respond within 30 seconds, the worst option is automatically selected for you.

Advice also leads to higher pay-offs, which is both driven by more often picking the best choice and by less often choosing the worst option.

Regression results indicate that absolute benchmarks do not affect the quality of the decisions (i.e. the payoff). Relative benchmarks do affect the quality, albeit via a significant interaction effect with product dissimilarity: “In contrast, relative benchmarks do improve decision-making as options in the choice set become increasingly dissimilar in terms of the number of optimal and suboptimal attributes”.

[Figure: table 4 from the paper, with conditional-formatting bars]

Conclusion
The effects of benchmarks on decision quality are either absent (for the absolute benchmark) or not very strong/convincing (for the relative benchmark); only the interaction with product dissimilarity is significant. I’m not too enthusiastic about the experimental task and how relevant it is to natural decisions, and the product dissimilarity might also be more of an artefact than something real.

So: laudable and important research, not too convinced yet about the real-life implications. More research is needed…

The collaborative roots of corruption

Dit is deel 2 naar aanleiding van een presentatie van Shaul Shalvi bij de AFM. In eerder blogpost schreef ik over de behavioral ethics approach, nu dus over The collaborative roots of corruption.

Dit paper gaat over Sequential dyadic die-rolling paradigmPeople sharing the exact same profits from lying lie much more compared to people not sharing the profits equally, or those who work alone.

How it works: person A rolls a die, person B is shown the outcome and then also rolls a die. If A and B rolled exactly the same number, they receive the value of what they rolled (a double 5 = 5 euro, etc.).

weiselshalvi0

What you would expect: a random distribution, with a double (= pay-out) in about 1 in 6 trials and a double 6 (= maximum pay-out) in 1 in 36 trials.
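As a quick check of those baseline odds, here is a minimal sketch that simply enumerates all 36 outcomes of two fair dice. The `payoff` function is my own shorthand for the rule described above (a double pays its face value in euro, anything else pays nothing); it is an illustration, not code from the paper.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (A, B) outcomes for two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))


def payoff(a, b):
    """Pay-out in euro: the face value on a double, otherwise nothing (assumed rule)."""
    return a if a == b else 0


# Exact probabilities under honest reporting.
p_double = Fraction(sum(a == b for a, b in outcomes), len(outcomes))
p_double_six = Fraction(sum(a == b == 6 for a, b in outcomes), len(outcomes))

# Expected pay-out per trial if both players report honestly.
expected_payoff = Fraction(sum(payoff(a, b) for a, b in outcomes), len(outcomes))

print(p_double, p_double_six, expected_payoff)
```

Anything well above these honest benchmarks in the reported data points to misreporting rather than luck.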

What you see in the lab experiment: an enormous overrepresentation of double 6:

weiselshalvi1.png

Several kinds of “couples” can be distinguished (A's outcome is a cross, B's outcome is a circle; when they coincide there is a pay-out):

weiselshalvi4

Panel A: A sets the stage; B gets the job done
Panel B: A sets the stage; B gets the job done
Panel C: A sets the stage; B gets the job done
Panel D: B gets the job done, but A might not understand how to set the stage, so B signals in round 5

Panel D is interesting; B finds the €4 pay-out too low and sends a signal (a 6). A only half gets it, and then “rolls” a 5.

The paper clearly shows that collaboration in this die-rolling game leads to much more “corruption”, and not to a disciplining effect/social control.

From the press release Collaboration may encourage corporate corruption:

The study found the highest levels of corrupt collaboration occurred when parties shared profits equally, and were reduced when either player’s incentive to lie was decreased or removed.
Dr Weisel said: “When partners’ profits are not aligned, or when individuals complete a comparable task alone, corruption levels drop.”
Another, more review-style article that also cites this paper: Editorial: Dishonest Behavior, from Theory to Practice

 

PS: Shalvi also talked about a third paper: See what you want to see, Justifications Shape Ethical Blind Spots (2015). I won't devote a separate blog post to it. What was interesting here was that respondents more often made (unconscious?) mistakes when doing so was to their advantage.

The task (ambiguous-dice task) was to indicate which number the cross is closest to (so the 3). However: if the pay-out was based on the number you report (in the example below it thus pays to “accidentally” see the 6 as closer to the cross), there is a clear bias in the errors made (right-hand side of the chart; yellow bar = the wrong number pays more). If the pay-out is based on whether you name the correct number, there is no bias in the errors.

seewhatyouwanttosee

Conclusion: When asked to report the outcomes closest to the fixation cross to earn money, people make fewer self-hurting than self-serving mistakes. The task was implemented both with and without eye-tracking.

What is the effect of transparent supervisors on citizens' trust? An experimental study

The first 2017 issue of the journal Beleid en Maatschappij contains an article by Femke de Vries, Stephan Grimmelikhuijsen and me: Wat is het effect van transparante toezichthouders op het vertrouwen van de burger? Een experimentele studie (What is the effect of transparent supervisors on citizens' trust? An experimental study).

In her inaugural lecture, Femke de Vries already lifted a corner of the veil on this research, which we carried out in early 2016 with over 1,000 respondents from www.afmconsumentenpanel.nl.

More transparency, more trust?
We investigate empirically whether transparency of supervisors really leads to more trust. We use the following definition of transparency:

‘Transparency is the availability of information about an organization or actor allowing external actors to monitor the internal workings or performance of that organization’ (Grimmelikhuijsen, 2012, 55).

We investigate the effect of:

  1. The content of the information that is disclosed; “It is to be expected that people react differently to situations in which a supervisor has done ‘well’ than to situations in which it has done ‘badly’.” (p.11).
  2. The type of transparency. “We distinguish two types of transparency: transparency about process and transparency about the rationale. Process transparency mainly gives details about the procedure leading up to a decision. (…) Rationale transparency is about giving the reasons and underlying principles of a decision.” (p.11-12).

Transparency of supervision

In supervisory practice, transparency has roughly two different functions (De Vries, 2016): an instrumental function and an accountability function. By the instrumental function we mean the situation in which the supervisor uses transparency to fulfil its societal mandate. (…)

Publishing supervision results can, incidentally, also be seen as part of the accountability function. After all, by publishing supervision results, supervisors account for their use of people and resources. (p.8-9)

Experiment
In this randomized controlled trial (RCT) we had 7 groups to which respondents were assigned. The control group of 144 respondents was shown no text at all. Beyond that it was a 3 x 2 design: three types of messages, one of them positive for the AFM (the AFM imposes a fine) and two with more negative content (a fine is reversed, and a message about the reassessments of interest rate derivatives: “AFM investigates shortcomings in its own review of the reassessments”). All three messages were written in two ways: with process transparency (emphasis on the “how”) or with rationale transparency (emphasis on the “why”).

After reading the text, everyone completed the validated scale for measuring trust in the AFM. The scale consists of nine questions, three each on perceived competence, benevolence and integrity. Because the only difference between the 7 groups of 144 respondents each was the text they read, differences between groups can be attributed to the text presented.
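For readers who like to see the design spelled out, here is a minimal sketch of a random assignment to one control group plus the 3 x 2 message conditions, with 144 respondents each. The condition labels, the `assign` function and the shuffle-and-slice approach are assumptions for illustration only; this is not a description of how the panel software actually randomized.

```python
import random
from itertools import product

# Hypothetical labels for the three messages and the two transparency styles.
MESSAGES = ["fine_imposed", "fine_reversed", "derivatives_review"]
STYLES = ["process", "rationale"]

# One control condition plus the 3 x 2 treatment conditions = 7 groups.
CONDITIONS = ["control"] + [f"{m}/{s}" for m, s in product(MESSAGES, STYLES)]
GROUP_SIZE = 144


def assign(respondent_ids, seed=0):
    """Shuffle respondents and cut the list into equally sized groups."""
    ids = list(respondent_ids)
    if len(ids) != GROUP_SIZE * len(CONDITIONS):
        raise ValueError("expected exactly GROUP_SIZE respondents per condition")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rng.shuffle(ids)
    return {
        cond: ids[i * GROUP_SIZE:(i + 1) * GROUP_SIZE]
        for i, cond in enumerate(CONDITIONS)
    }


groups = assign(range(GROUP_SIZE * len(CONDITIONS)))
for cond, members in groups.items():
    print(cond, len(members))
```

Because chance alone decides who ends up in which group, the groups should be comparable on everything except the text they read.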

Communicate, especially about the why
The six groups that did get a text to read score, on average, significantly higher on trust in the AFM than the control group that read nothing. So communicating at all, being transparent, seems to be good for trust.

And if you do communicate, do so mainly with an emphasis on the why, i.e. with rationale transparency, and not only about the process. In Figure 1: the squares for rationale transparency score higher on trustworthiness than the circles for process transparency.

The content of the message seems less important; in absolute terms, the message that is negative for the AFM (a fine is annulled) scores highest. The how-communication about the interest rate derivatives, in which the AFM itself admits a mistake, does score lowest (black circle in Figure 1).

BenMfig1

Asking explicitly
When we ask explicitly whether trust has fallen or risen, we do see a clear effect of the content of the message (Figure 2 below). The imposed fine is perceived much more favourably than the withdrawn fine or the reassessment of interest rate derivatives. Here too, though, why-communication scores better on trust than how-communication.

BenMfig2

Conclusions

  1. Citizens have more trust in the supervisor AFM after reading a message about the supervisor.
  2. Trust, and its three underlying dimensions of competence, benevolence and integrity, can be increased further by focusing the communication on rationale transparency (the ‘why’) rather than on process transparency (the ‘how’).
  3. ‘Bad news’ can be made transparent without much damage, provided that the rationale behind the mistakes and the follow-up steps are explained.
  4. When people are asked explicitly about the effect on trust, they weigh the information much more heavily than they do unconsciously.

A summary in a slide deck:

Early-warning signals of topological collapse in interbank networks

On March 13th, I went to a Studium Generale lecture by Diego Garlaschelli. The main part of his presentation was on this paper Early-warning signals of topological collapse in interbank networks (2013) by Tiziano Squartini, Iman van Lelyveld (DNB), and Diego Garlaschelli in Scientific Reports 3 (3357).

They looked at about 110 Dutch banks and their interconnections from 1998 to 2008. There seems to be no pattern over time in the number of banks (black) or in the number of links (connections, grey).

garlassci

So no evidence of signals for the crisis? They looked deeper, at the network structure.

First, they compared the actual network with a random network in which only the number of connections is kept constant (the difference between the real and the random network is expressed as a z-score). Result: an abrupt change in z-score (purple lines, left panel in tweet below).

However, when the random network also accounts for the fact that large banks have more links than small banks, the result is a continuous transition (blue line, right side).

One step further is to also incorporate triad (triangle) structures in the random network (green lines). Most of the 13 possible triad configurations do not show a pattern. However, one of the riskiest (motif number 9, the unreciprocated 3-loop, where A loans to B, B loans to C, and C loans to A; a debt loop) does show a pattern (see also Figure 6 in the paper).
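To make the z-score idea concrete, here is a minimal sketch in Python. It counts unreciprocated 3-loops in a small made-up directed network and compares that count with random networks that have the same number of nodes and links (the simplest of the null models mentioned above). The function names and the toy data are my own illustration, and sampling random graphs is an assumption made for the example, not necessarily how the paper computes its z-scores.

```python
import random
from itertools import combinations, permutations
from statistics import mean, pstdev


def count_debt_loops(nodes, edges):
    """Count node triples whose only links form a one-way cycle (A->B->C->A)."""
    edge_set = set(edges)
    count = 0
    for a, b, c in combinations(nodes, 3):
        # All directed links that exist among these three nodes.
        induced = {(u, v) for u, v in permutations((a, b, c), 2) if (u, v) in edge_set}
        # Exactly one of the two possible cycle orientations, with no reciprocated link.
        if induced in ({(a, b), (b, c), (c, a)}, {(a, c), (c, b), (b, a)}):
            count += 1
    return count


def random_edges(nodes, n_edges, rng):
    """Null model (assumed): same nodes, same number of links, placed at random."""
    return rng.sample(list(permutations(nodes, 2)), n_edges)


def z_score(nodes, edges, n_samples=200, seed=0):
    """z = (observed count - mean count under the null model) / std under the null."""
    rng = random.Random(seed)
    observed = count_debt_loops(nodes, edges)
    null_counts = [
        count_debt_loops(nodes, random_edges(nodes, len(edges), rng))
        for _ in range(n_samples)
    ]
    spread = pstdev(null_counts)
    return (observed - mean(null_counts)) / spread if spread > 0 else float("nan")


# Toy network (hypothetical): six banks forming two separate debt loops.
nodes = list(range(6))
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]

observed = count_debt_loops(nodes, edges)
z = z_score(nodes, edges)
print(observed, z)
```

A large positive z-score means the motif occurs far more often than the null model would produce by chance; a richer null model (for example one that also fixes how many links each bank has) would change the baseline and therefore the z-score.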

Concluding slide:

Policy conclusion in the paper:

More generally, any policy directed at regulating interbank markets in a ‘pairwise’ fashion appears to be fundamentally ineffective, since the most significant patterns are found to occur at an irreducibly triadic level. This result moves the regulation target even further away: while the notion of systemic risk already implies that monitoring individual banks is insufficient to contain systemic risk, monitoring pairs of banks is also likely to fail; the minimal ‘building blocks’ appear to be triples of banks.

As to further research/predicting the next crisis:

 

Financial Competence (update on Sandro Ambuehl’s work)

Yesterday, I went to a CREED seminar where Sandro Ambuehl spoke on two papers: The Effect of Financial Education on the Quality of Decision Making and Peer Effects in Financial Decision Making – A Case of the Blind Leading the Blind?

Measuring the quality of financial decisions
I blogged about the first paper before. Two slides helped me understand it better.

On the methodology and the willingness to pay for the simple and the complex products:

ambuehlmethod.png

And the summary slide:

ambuehlconcl.png

(The shift in the CDF curve also became clearer to me. People who underestimated compound interest arrived at a better estimate because of the [rhetoric] intervention. However, this positive effect for some was offset by a negative effect on people who previously had a good estimate of compound interest and now overestimated its effect. So no net benefit for the whole population.)

Peer effects
The other paper was Peer Effects in Financial Decision Making – A Case of the Blind Leading the Blind? (with B. Douglas Bernheim, Fulya Ersoy, and Donhatai Harris).
Abstract:

Often, people consult with others for advice before they make financial decisions. Previous research argues that such communication amounts to a case of the blind leading the blind. In this paper, we document that it can be beneficial, and explore mechanisms. In our laboratory experiment, subjects make private decisions about investments involving compound interest both before and after they communicate with a randomly assigned partner. Communication not only improves decision making for the specific tasks they have sought advice about, but subjects successfully generalize these skills to novel decision problems. We find that communication is most beneficial when pair members’ skills are at similar levels — the transmission of financial competence requires a common language, and is not merely a case of information flowing from those who have it to those who do not. Finally, communication leads subjects to reevaluate their privately revealed time preferences. Discount rates move towards the communication partners’ rate, and do so to a larger extent if the partner is more patient. We suggest policies to improve the quality of financial decision making.

Sandro used the same methodology to establish financial competence, namely the divergence between willingness to pay for the simple product and for the complex product. Rationally, this should be zero, because the two are economically equivalent, but the complex one is often valued lower, resulting in a negative number for divergence/financial competence.
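In code, the measure as I understood it from the talk boils down to a simple subtraction. The sketch below is my own reading, with made-up valuations; the exact definition and scaling used in the paper may differ.

```python
from statistics import mean


def divergence(wtp_simple, wtp_complex):
    """Signed gap in willingness to pay; 0 means both framings are valued equally."""
    return wtp_complex - wtp_simple


# Hypothetical (simple, complex) valuations in euro for three subjects.
valuations = [(10.0, 10.0), (10.0, 8.0), (12.0, 9.0)]

gaps = [divergence(simple, complex_) for simple, complex_ in valuations]
print(gaps, mean(gaps))
```

The first subject values both framings the same (divergence zero); the other two value the complex framing lower, which pulls the average below zero.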

peerq.png

Communication does help in financial decision making: after conferring, people make better choices, also on novel tasks.

peerhelp.png

I didn’t photograph this, but it also matters with whom you confer. The sample was split in two based on the results in stage 0. If a person from the bottom half had a top-half partner (i.e. “better”), competence increased. However, if a person from the bottom half had a bottom-half partner (i.e. “similar”), competence increased even more. Booij et al. (2016) found something similar for the effect of similar peers (Ability peer effects in university: Evidence from a randomized experiment).

Summary slide of the talk

ambuehlsummary.png