Museums

In the first months of this year I visited a few museums and took some snapshots, which I am collecting in this blog post.

The March visit to the Haags Fotomuseum has its own post (Michael Wolf – Life in Cities), as does the Jenevermuseum in Schiedam (January).

February: “Photo-phylles” at the Jardin Botanique Bordeaux:

[slideshow]

February: Bernd, Hilla en de anderen / Fotografie uit Düsseldorf at Huis Marseille:

[slideshow]

March: NEMO Amsterdam (the kids had a study day; went there with Mees):

[photo]

March: André Volten – Utopia at Beelden aan Zee:

[slideshow]

April: Centraal Museum Utrecht:

[slideshow]

April: Fashion Cities Africa at the Tropenmuseum Amsterdam:

[slideshow]

April: Hollandse Meesters uit de Hermitage at the Hermitage Amsterdam:

[slideshow]

 


Central Bank Communication and the General Public

Central Bank Communication and the General Public. Andy Haldane (of Dog and Frisbee fame) and Michael McMahon. 2018. Forthcoming, AEA Papers and Proceedings.

Blinder (2009) wrote that “It may be time for both central banks and researchers to pay more attention to communication with a very different audience: the general public.” Communication can aid expectations management, and hence economic management; central bank communication has itself become a powerful lever of monetary policy.

Haldane (2017) stresses a deficit of public understanding as well as public trust in central banks – a twin deficits problem. Facing these twin deficits, a number of central banks have recently acknowledged the need to adapt their communications strategies to improve their reach to the general public, including through more accessible language and more direct engagement (Haldane, 2017). Because such efforts are not costless, however, two important considerations arise: feasibility and desirability.

Desirability

There are four reasons why it may be desirable to speak directly to a wider audience.

  1. A better understanding of the factors driving the economy, and economic policy, could help to reduce the incidence of self-reinforcing expectational swings in sentiment and behaviour.
    To become convincing and credible, communications may need to be simple, relevant and story-based. Typical central bank communications tend to fail on all three fronts.
    Households who report greater knowledge of and greater satisfaction with monetary policy are also likely to have one-year, two-year and five-year inflation expectations that are closer to the inflation target.
  2. Building public understanding may be important as a means of establishing trust and credibility about central banks and their policies.
    It is also important for reasons of political accountability.
    Satisfaction in central banks’ actions is positively correlated with institutional understanding. It is also positively correlated with measures of central bank credibility.
  3. Traditional information intermediaries, such as the mainstream media and financial markets, may benefit from new, simpler narrative communication.
  4. To engage in more listening to messages from the general public, given that aggregating information is one of a monetary policy committee’s key roles.

Feasibility

We examine a recent communication initiative by the Bank of England. In November 2017 the Bank of England launched a new, broader-interest version of its quarterly Inflation Report (IR), augmented with new layers of content aimed explicitly at a less-specialist audience.

Overall, the analysis delivers a nuanced good-news message.

  1. Website activity over the course of the 24 hours after the announcement increased markedly in November 2017, almost doubling compared with earlier IRs.
  2. Numbers of tweets and retweets associated with the IR were materially higher than in August 2017, but slightly lower than in August 2016. Monetary policy news itself, rather than the means by which it is communicated, is the largest single factor determining the reach of Twitter activity.
  3. More than 70% of respondents [in a survey of BoE business contacts] felt the new layered summary helped them to better understand the IR’s messages. And around 60% of respondents felt the new communication improved their perceptions of the Bank.

Experiment

We now assess the impact of the new Bank of England communications more directly through a controlled experiment: N=285 members of the UK general public, plus a sample of first-year Oxford economics MPhil students.

Participants were then randomly assigned to read either the traditional Monetary Policy Summary that accompanies the IR or the new, simplified layered content.

Three questions:

(1) Did you understand the content and messages?

The results confirm that the new layered content is easier to read and understand, even for technically advanced MPhil students.

(2) Has the IR summary changed your views or expectations?

In the case of the general public survey, we find that more straightforward communication boosts the chances that the participant’s beliefs move more closely into alignment with the Bank’s forecasts. For MPhil students, the coefficient is also positive but not statistically significant.
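Question (2) boils down to comparing a proportion across the two randomly assigned groups. As a minimal sketch of how such a comparison is typically tested (the counts below are made up for illustration; they are not the paper's data):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int):
    """Two-sided z-test for a difference between two proportions (pooled SE)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical: of ~285 participants, suppose 95 of 142 readers of the layered
# content versus 75 of 143 readers of the traditional summary ended up with
# expectations aligned with the Bank's forecast.
z, p_value = two_proportion_z(95, 142, 75, 143)
```

With the much smaller MPhil sample, the same gap in proportions would produce a larger p-value, which is consistent with a coefficient that is positive but not statistically significant.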

(3) How has the IR summary affected your perceptions of the Bank of England?

Those who read the new layered content tended to develop an improved perception of the institution (BoE).

Policy implications

On a practical level, central banks aiming to reach a broader audience will need to continue to innovate and experiment with different methods and media for engaging the general public. This will, inevitably, require a degree of trial and error.

Success should be measured not by the ability to reach everyone, but by the ability to influence beyond the small minority of technical specialists and information intermediaries who currently form the core of central banks’ audience.

***

Summarized in The Telegraph: There are good reasons why the Bank of England is trying to speak to ‘ordinary people’.

For central banks, communication is a powerful policy tool. The way central bankers talk about their thinking and decisions influences even long-term interest rates, as investors price credit according to their expectations of the central bank’s behaviour.

(…)

At the same time, the Bank is changing the way it communicates, and more specifically changing the people it communicates to. As well as the traders, economists and strategists in the markets, the Bank wants to talk to the wider public. This approach has led to a more accessible version of the quarterly Inflation Report and Governor Mark Carney going on ITV’s Peston on Sunday show. There are several reasons why the Bank might try to broaden the audience.

For one, clearer, simpler messaging may help media and markets to understand policy. Experience suggests the cryptic code of an earlier generation of central bankers can be misunderstood even by sophisticated market participants.

Perhaps more important, talking directly to “ordinary people” confirms that household actions matter to the economy and to Bank policy. Just as much as City investors, consumers need to form sensible expectations of the future economy when they make decisions on borrowing and spending.

A more accessible approach may build public confidence in the Bank at a time when trust in public institutions is weak. It could also open a dialogue that facilitates the flow of information from households to the central bank. The Bank surveys businesses extensively, but households less so. MPC members knowing more about what households think and feel about the economy can only be a good thing.

So there are good reasons for the Bank to try harder to talk to the wider public, and I think that approach will continue. Market participants should get used to the fact that they’re not the only people the Bank wants to talk to – and learn to read its utterances in that context.

 

Rethinking Traditional Methods of Survey Validation – Maul (2017)

Andrew Maul (2017). Rethinking Traditional Methods of Survey Validation. Measurement: Interdisciplinary Research and Perspectives, 15:2, 51-6. (Found via a tweet that is now protected.)

Abstract

It is commonly believed that self-report, survey-based instruments can be used to measure a wide range of psychological attributes, such as self-control, growth mindsets, and grit. Increasingly, such instruments are being used not only for basic research but also for supporting decisions regarding educational policy and accountability. The validity of such instruments is typically investigated using a classic set of methods, including the examination of reliability coefficients, factor or principal components analyses, and correlations between scores on the instrument and other variables. However, these techniques may fall short of providing the kinds of rigorous, potentially falsifying tests of relevant hypotheses commonly expected in scientific research. This point is illustrated via a series of studies in which respondents were presented with survey items deliberately constructed to be uninterpretable, but the application of the aforementioned validation procedures nonetheless returned favorable-appearing results. In part, this disconnect may be traceable to the way in which operationalist modes of thinking in the social sciences have reinforced the perception that attributes do not need to be defined independently of particular sets of testing operations. It is argued that affairs might be improved via greater attention to the manner in which definitions of psychological attributes are articulated and greater openness to treating beliefs about the existence and measurability of psychological attributes as hypotheses rather than assumptions—in other words, as beliefs potentially subject to revision.

Procedures of analysis and quality control of measurement instruments are often grouped under the heading of “validation” in the social sciences. In the case of self-report, survey-based instruments, such validation activities commonly consist of essentially three steps:

  1. Estimation of overall reliability or measurement precision, via estimation of Cronbach’s alpha;
  2. Some form of latent variable modeling, via exploratory factor analysis (or sometimes principal components analysis), possibly followed by confirmatory factor analysis and, more rarely, other latent variable models; and
  3. Estimation of associations between the measured variable and external variables, by inspection and interpretation of correlation matrices of scores on the new instrument and scores from existing instruments designed to measure similar or theoretically related attributes or outcomes of interest.
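The first step of this trinity is easy to reproduce. Here is a minimal sketch of Cronbach's alpha (my code, not the paper's), together with a simulation of respondents who merely answer consistently, with no underlying attribute at all:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents, n_items) matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the sum scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Simulated survey: each respondent has a habitual response tendency plus a
# little per-item wobble -- no construct being measured, just consistency.
rng = np.random.default_rng(0)
lean = rng.integers(3, 7, size=(200, 1))     # habitual answer on a 1-6 scale
noise = rng.integers(-1, 2, size=(200, 8))   # small item-to-item variation
data = np.clip(lean + noise, 1, 6)
print(round(cronbach_alpha(data), 2))        # high, despite meaningless "items"
```

Shared response style alone inflates inter-item covariance, which is all the statistic rewards; this is the mechanism behind the favorable-looking results below.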

 

Why is this trinity successful?

An optimistic explanation for the longevity and popularity of these techniques could be that they are, in fact, reliably successful in achieving their intended scientific and quality-control aims.

A less optimistic observer might note various extra-scientific factors that might contribute to the popularity of these techniques, such as the fact that they have a clear track record of success in facilitating the publication of manuscripts in academic journals and providing a socially accepted warrant for claims of validity; additionally, these techniques are relatively easy to understand and implement, especially by comparison to many other psychometric models (which are not as easily accessible via common software programs such as SPSS).

Three studies with items without theory

In the three studies described below, items were written in the complete absence of a theory concerning what they measured and how they worked.

  1. In the first study, the items closely resembled items from a widely used survey instrument intended to measure growth mindsets, with the notable exception that the key noun in the sentence (“intelligence”) had been replaced with a nonsense word (“gavagai”). To help ensure that any results were not driven by peculiarities of the word “gavagai,” two additional versions of the survey were also used, in which the word “gavagai” was replaced with “kanin” or “quintessence” [result: wording did not matter].
  2. In the second study, items consisted only of meaningless gibberish. (Study 2 items were constructed so as to entirely lack even the semblance of semantics: eight items of approximately equal length, consisting of stock lorem ipsum text, e.g. “sale mollis qualisque eum id, molestie constituto ei ius”.)
  3. In the third, they were simply absent. The items (if they could even be called that) consisted solely of an item number (e.g., “1.”), followed only by the six response options as described in the previous studies, ranging from strongly disagree to strongly agree.

Prima facie, it would seem difficult to take seriously the claim that any of these sets of items constituted a valid measure of a psychological attribute, and if such a claim were made, one might reasonably expect any quality-control procedure worthy of the name to provide an unequivocal rejection. To state this in Popperian language: if ever there were a time when a theory deserved to be falsified, this would appear to be it.

Yet, this is not what occurred. In all three studies above, reliability estimates for the deliberately-poorly-designed item blocks were quite high by nearly any standard found in the social sciences.

These validation procedures returned results roughly in line with what is commonly provided as positive evidence of validity throughout the social sciences. This would appear to cast doubt on the adequacy of these methods for providing the kind of rigorous test of beliefs usually expected of scientific studies. Indeed, if response data from nonsensical and blank items can meet classically accepted criteria for validity, one might wonder under what conditions such procedures would not return encouraging results.

It was argued and shown that traditional validation approaches may commonly fail to provide rigorous, potentially falsifying tests of key hypotheses involved in the construction of measures; it was demonstrated that it is not only possible but also apparently fairly easy to obtain favorable-seeming values of common statistical criteria for validity even in the absence of a theory concerning what an instrument measures and how it operates and, in fact, even in the absence of actual items.

The validation activities themselves (in particular, the aforementioned trinity of reliability estimates, factor analyses, and inspection of correlations with other variables) are essentially unreactive to theory.

Favorable-looking results as a default expectation

The results of this study suggest that, at least in the context of responding to survey questions, respondents often choose to behave consistently unless there is a clear reason not to do so. As such, it may be that favorable-looking results of covariance-based statistical procedures should be regarded more as a default expectation for survey response data than as positive evidence for the validity of an instrument as a measure of a psychological attribute.

Ad hoc explanations

[A] number of interesting correlations surfaced, including the correlation between scores on the “Theory of Gavagai” items and scores on the original Theory of Intelligence items, and the correlation between the nonsense items and Big Five Agreeableness. If one were inclined to do so, one might be able to provide ad hoc explanations regarding how these correlations constitute evidence of validity.

Misconceptions regarding the nature of scientific inquiry in general and measurement in particular

The process of “validating” a measure seems to be thought of by many as separate from the process of defining the attribute to be measured and articulating hypotheses concerning the nature of the connection between variation in the attribute and variation in the outcomes of the proposed testing procedures; that is, the classic trinity of analytic methods used in traditional survey validation applications seem to be fixed a priori and independently of the substantive area of application, background psychological theory, and motivating goals for the creation of the instrument.

In many applications of psychological measurement, the definition of the attribute of interest is vague at best and incoherent or entirely absent at worst.

Operationalism and other strong forms of empiricism may have encouraged the perception that psychological attributes do not need to be rigorously defined independently of a particular set of testing operations.

There may be good reason to be suspicious of strong claims regarding the accuracy, precision, and coherence of many survey-based instruments, at least to the extent that such claims are justified with reference to traditional validation strategies, and especially in the presence of unclear or poorly formulated definitions of target attributes and of theories regarding their connection to the outcomes of proposed measurement procedures.

Michell (e.g., 1999; Measurement in Psychology: A Critical History of a Methodological Concept) refers to this belief [that measurement is a universally necessary component of scientific inquiry] as the quantitative imperative, the conviction that measurement is necessary for scientific inquiry, and gives a thorough historical account of its origins and development and of the ways in which it has shaped methodological reasoning in the psychological sciences since their inception.

***
Factoid from footnote 5: Lorem ipsum text, which is commonly used as placeholder text in publishing and graphic design applications, is itself derived from sections 1.10.32 and 1.10.33 of “de Finibus Bonorum et Malorum” (The Extremes of Good and Evil) by Cicero, written in 45 BC.

Salience and Switching: Hungarians do switch car insurance due to an advertising campaign

Working paper Salience and Switching by András Kiss (UvA) [mirror SalienceSwitchingKiss].

Abstract

I estimate the effect of a concentrated advertising period on contract switching decisions in auto liability insurance and show that consumers’ inattention is a major obstacle to switching service providers. For identification, I exploit a recent change in Hungarian regulation, which creates exogenous variation in the salience of the switching opportunity for a subset of drivers. Using a micro-level dataset, I find that the media campaign increases switching rates from 20% to 36%. I also jointly estimate switching costs and consumer inattention in a two-stage demand model, showing that 30% of insurees only consider switching because of the campaign.

Inertia

[I]nertia can be due to the time or the effort cost of switching, but psychological factors, such as inattention, procrastination, or fear of new situations, can also create or heighten barriers to switching.

[T]here are a number of important retail markets with low switching rates and high consumer inertia (e.g. gas, electricity, banking) in which people could benefit from an endogenously arising campaign effect if switching opportunities were restricted to specific times of the year. (also see my Dutch blog post Waarom klanten onbewuste blijvers zijn & hoe je ze kunt overtuigen).

Natural experiment

The main point of this paper, however, is the use of a natural experiment to measure how much an actually observed policy change can influence consumer decisions, and to show that the mechanism works primarily by decreasing the share of inattentive people.

[I]dentifying the effect of salience on consumer switching.

Campaign: +16 percentage points of switchers; financial incentive: +3.5 percentage points

In this paper, I exploit a change in auto liability insurance regulation in Hungary to identify the causal effect of a concentrated advertising period that provides no decision-relevant information to consumers, but increases the salience of the switching opportunity for a well-defined time period.

My main result is that the campaign almost doubles switching rates from a baseline of 20 percent to 36 percent.

In comparison, the estimated reduced-form relationship between financial incentives and switching decisions is much weaker: an additional saving of $35 per year (the median in the sample) is associated with only 3.5 percentage points higher switching rates.

Without the campaign, over two-thirds of consumers ignore the decision problem altogether, whereas during the campaign the implied share of inattentive people is 40 percent. The estimated mean switching cost is $57.
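These figures hang together under a simple two-stage view in which only attentive consumers consider switching at all. A back-of-the-envelope consistency check (my arithmetic, not the paper's structural estimation):

```python
# Two-stage view: P(switch) = P(attentive) * P(switch | attentive)
attentive_baseline = 1 - 0.70   # "over two-thirds ... ignore the decision problem"
attentive_campaign = 1 - 0.40   # implied inattentive share during the campaign

# Conditional switching rates implied by the observed 20% and 36%:
q_baseline = 0.20 / attentive_baseline   # roughly 0.67
q_campaign = 0.36 / attentive_campaign   # 0.60
```

The two conditional rates come out close to each other, which is consistent with the campaign working mainly through attention rather than by making switching itself more attractive.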

Hungarian auto-insurance market changed

Before January 1, 2010 (in the synchronized regime), contracts were required to coincide with the calendar year from the second year onwards. [P]eople [only] had the month of November to consider changing insurance contracts. Market players spent over 90 percent of their yearly marketing budgets in November.

Following January 1, 2010 (in the dispersed regime), all new insurance periods -including the first one – have become one year long.

Graphs

People who can switch in November (campaign periods, grey areas) do so more often than in non-campaign periods.

[graph]

Financial savings do play a role (the fraction of switchers increases with the fee savings from switching), but the switching percentage is much higher in campaign season (November; circles) than in months without major advertising campaigns (squares):

[graph]

 

Discussion & Conclusions

Using a natural experiment created by a change in auto insurance regulation in Hungary, I show that merely increasing the salience of a decision problem without transmitting relevant information has a large effect on people’s actions. Therefore, their “choice” to ignore the problem when it was not salient must have been suboptimal.

The coordination of all switching activity into a single month in the synchronized regime is also an effective idea, as the campaign effect estimates confirm.

Finally, a requirement to send a regulator-designed information leaflet on contract switching along with the insurers’ announcement of next year’s continuation prices could make a difference as well. In a field experiment, Adams et al. (2016) vary the amount of information and the cost of switching to alternative savings accounts in the U.K. and also look at the effect of reminders. They find a 3-9 percentage point increase in switching from a baseline of 3%. [Adams et al., 2016, is FCA Occasional Paper No. 19: Attention, Search and Switching: Evidence on Mandated Disclosure from the Savings Market.] (I don’t think Kiss’s paper adds new evidence for leaflets; I think it is an additional policy recommendation.)

My main result is that the media campaign has a large causal effect, increasing switching rates by around 16 percentage points from a baseline of 20 percent. 70 percent of people routinely ignore the decision problem, while the campaign decreases the share of inattentive people to 40 percent. The campaign’s effects are largely homogeneous across drivers and robust to a variety of specifications.

Controlled experiments on the web: survey and practical guide

Controlled experiments on the web: survey and practical guide. Ron Kohavi · Roger Longbotham · Dan Sommerfield · Randal M. Henne (2009) Data Min Knowl Disc 18:140–181.

Summary

Controlled experiments neutralize confounding variables by distributing them equally over all values through random assignment, thus establishing a causal relationship between the changes made in the different variants and the measure(s) of interest, including the Overall Evaluation Criterion (OEC).

We agree and believe that companies can accelerate innovation through experimentation because it is the customers’ experience that ultimately matters, and we should listen to them all the time by running experiments.

Our experience indicates that significant learning and return-on-investment (ROI) are seen when development teams listen to their customers, not to the Highest Paid Person’s Opinion (HiPPO). Many organizations have strong managers who have strong opinions but lack data, so we started using the term HiPPO as a way to remind everyone that success really depends on the users’ perceptions.

The web provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called randomized experiments (single-factor or factorial designs), A/B tests (and their generalizations), or split tests.

When a company builds a system for experimentation, the cost of testing and experimental failure becomes small, thus encouraging innovation through experimentation. Failing fast and knowing that an idea is not as great as was previously thought helps provide necessary course adjustments so that other more successful ideas can be proposed and implemented.

Overall Evaluation Criterion (OEC)

A good OEC should not be short-term focused (e.g., clicks); to the contrary, it should include factors that predict long-term goals, such as predicted lifetime value and repeat visits. Ulwick describes some ways to measure what customers want (although not specifically for the web). [Book by Ulwick – What customers want, pdf].

When running experiments, it is important to decide in advance on the OEC (a planned comparison); otherwise, there is an increased risk of finding what appear to be significant results by chance (familywise type I error).

If the experiment was designed and executed properly, the only thing consistently different between the two variants is the change between the Control and Treatment, so any differences in the OEC are inevitably the result of this assignment, establishing causality.
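In practice, the random assignment has to be deterministic per user, so that a returning visitor keeps seeing the same variant. A common way to do this (a sketch of standard practice, not code from the paper) is to hash a user ID together with an experiment name:

```python
import hashlib

def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically bucket a user into Control or Treatment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF   # roughly uniform in [0, 1]
    return "treatment" if bucket < treatment_share else "control"

# The same user always lands in the same variant for a given experiment:
assert assign_variant("user-42", "new-checkout") == assign_variant("user-42", "new-checkout")
```

Keying the hash on the experiment name as well makes assignments across overlapping experiments effectively independent, which is what allows the "parallel experiments" approach discussed later.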

We need to aggressively filter out robots that do not delete cookies and have a large number of actions.

Ramp-up

An experiment can be initiated with a small percentage of users assigned to the Treatment(s), and then that percentage can be gradually increased. For example, if you plan to run an A/B test at 50%/50%, you might start with a 99.9%/0.1% split, then ramp up the Treatment from 0.1% to 0.5% to 2.5% to 10% to 50%. At each step, which could run for, say, a couple of hours, you can analyze the data to make sure there are no egregious problems with the Treatment before exposing it to more users. The square factor in the power formula implies that such errors can be caught quickly on small populations and the experiment aborted before many users are exposed to the bad Treatment.
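The schedule above can be sketched as a loop that widens exposure only while guardrail metrics stay healthy. The helper names below are hypothetical; the paper does not prescribe an API:

```python
# Hypothetical ramp-up: widen Treatment exposure stepwise, aborting on the
# first step where guardrail metrics show an egregious regression.
RAMP_STEPS = [0.001, 0.005, 0.025, 0.10, 0.50]

def ramp_up(run_step, guardrails_ok):
    """run_step(share) runs the experiment at a given treatment share and
    returns metrics; guardrails_ok(metrics) decides whether to keep ramping."""
    for share in RAMP_STEPS:
        metrics = run_step(share)
        if not guardrails_ok(metrics):
            return f"aborted at {share:.1%}"
    return "ramped to 50%"
```

Because detectable effect size shrinks only as the square root of sample size, a truly egregious regression is visible even at the 0.1% step, so most of the user base never sees it.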

Limitations

    1. Quantitative metrics, but no explanations (no “why”)
    2. Short term versus long term effects (Long-term goals should be part of the OEC).
    3. Primacy and newness effects
    4. Features must be implemented (paper prototyping can be used for qualitative feedback, can complement controlled experiments).
    5. Consistency
    6. Parallel experiments (strong interactions are rare in practice; see below)
    7. Launch events and media announcements. If there is a big announcement made about a new feature, such that the feature is announced to the media, all users need to see it.

Parallel or sequential?

There are two primary benefits of a single MVT (multivariable test) versus multiple sequential A/B tests of the same factors:
  1. You can test many factors in a short period of time
  2. You can estimate interactions between factors

Three common limitations are:

  1. Some combinations of factors may give a poor user experience
  2. Analysis and interpretation are more difficult
  3. It can take longer to begin the test

Power

It is commonly thought that the power of the experiment decreases with the number of treatment combinations (cells). This may be true if the analysis is conducted by comparing each individual cell to the Control cell. However, if the analysis is the more traditional one of calculating main effects and interactions using all the data for each effect, little or no power is lost.
(…)
There are two things that will decrease your power, though. One is increasing the number of levels (variants) for a factor. This will effectively decrease the sample size for any comparison you want to make, whether the test is an MVT or an A/B test. The other is to assign less than 50% of the test population to the treatment (if there are two levels). It is especially important for treatments in an MVT to have the same percentage of the population as the Control.
If you want to test ideas as quickly as possible and aren’t concerned about interactions, use the overlapping experiments approach. (With overlapping experiments you test the factors more quickly and, if there is sufficient overlap in any two factors, you can estimate the interaction between those factors.)

If it is important to estimate interactions, run the experiments concurrently, with users being independently randomized into each test, effectively giving you a full factorial experiment.

Lessons learned

6.1 Analysis

  • 6.1.1 Mine the data (a population of users with a specific browser version was significantly worse for the Treatment)
  • 6.1.2 Speed matters
  • 6.1.3 Test one factor at a time (or not)
    • Conduct single-factor experiments for gaining insights and when you make incremental changes that could be decoupled
    • Try some bold bets and very different designs
    • Use full or fractional factorial designs suitable for estimating interactions when several factors are suspected to interact strongly. Limit the number of values per factor and assign the same percentage to the treatments as to the control. This gives your experiment maximum power to detect effects.

6.2 Trust and execution

  • 6.2.1 Run continuous A/A tests
  • 6.2.2 Automate ramp-up and abort
  • 6.2.3 Determine the minimum sample size [online power calculator]
  • 6.2.4 Assign 50% of users to treatment (For example, if an experiment is run at 99%/1%, then it will have to run about 25 times longer than if it ran at 50%/50%.)
  • 6.2.5 Beware of day of week effects
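The ~25x figure in 6.2.4 follows directly from the variance of a difference in means: with total traffic per unit of time fixed, Var(diff) is proportional to 1/n_T + 1/n_C, so an unequal split must run longer to reach the same precision as a 50/50 split. A quick check:

```python
def runtime_factor(treatment_share: float) -> float:
    """Relative run time needed to match the sensitivity of a 50/50 split,
    holding total traffic per unit of time fixed.
    Var(diff) is proportional to 1/n_T + 1/n_C = (1/p + 1/(1-p)) / N."""
    p = treatment_share
    return (1 / p + 1 / (1 - p)) / 4.0   # 4 = 1/0.5 + 1/0.5 for the 50/50 split

print(round(runtime_factor(0.01)))   # a 99%/1% split runs about 25x longer
```

The same formula underlies the MVT advice above: giving each treatment the same share as the Control keeps every pairwise comparison at maximum power.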

6.3 Culture and business

  • 6.3.1 Agree on the OEC upfront: the interested parties should agree on how an experiment is to be evaluated before the experiment is run.
  • 6.3.2 Beware of launching features that “do not hurt” users. (It is possible that the experiment is negative but underpowered.)
  • 6.3.3 Weigh the feature maintenance costs (A small increase in the OEC may not outweigh the cost of maintaining the feature)
  • 6.3.4 Change to a data-driven culture

Quotes

The paper has some nice quotes:

  • The fewer the facts, the stronger the opinion – Arnold Glasow
  • The difference between theory and practice is larger in practice than the difference between theory and practice in theory – Jan L.A. van de Snepscheut
  • The road to hell is paved with good intentions and littered with sloppy analysis – Anonymous
  • It is difficult to get a man to understand something when his salary depends upon his not understanding it. – Upton Sinclair
  • Almost any question can be answered cheaply, quickly and finally, by a test campaign. And that’s the way to answer them – not by arguments around a table. Go to the court of last resort – buyers of your products. – Claude Hopkins, Scientific Advertising, 1923
  • …the ability to experiment easily is a critical factor for Web-based applications. The online world is never static. There is a constant flow of new users, new products and new technologies. Being able to figure out quickly what works and what doesn’t can mean the difference between survival and extinction. – Hal Varian, 2007

Michael Wolf – Life in Cities

The exhibition Life in Cities by Michael Wolf runs until 22 April at the Fotomuseum Den Haag. A nice introduction is this 6-minute video.

The series Architecture of Density is gorgeous: skyscrapers in Hong Kong.

[photos]

Transparent City (Chicago) also contains very beautiful photos.

20180310_143442

And Paris Roof Tops (beautifully lit in a dark basement):

20180310_145418 20180310_145443

Other series on show were Tokyo Compression:

20180310_152501.jpg

and his work as a student:

20180310_150940

Definitely recommended!

On the way back we also passed Ringen aan Zee.

20180310_165443 20180310_165614

The psychology of investing and Bitcoins

Bitcoins are hot and frequently in the news. The price seems to go in only one direction, up, with the occasional stumble. But cryptocurrencies are certainly not without risk. Psychological factors also play a part in this hype, and they may keep consumers from making the right (risk) trade-offs.

The psychology of investors

AFM research among self-directed retail investors in 2015 “shows that investor behaviour often deviates from how supervisors, financial firms and legislators would like investors to behave.” That real people sometimes behave differently from what perfectly rational models predict is one reason the AFM has a dedicated Consumer Behaviour team. We use insight into actual decision-making to map risks and to supervise more effectively.

The psychological pitfalls of investing may apply even more strongly to buying and selling Bitcoins and other cryptocurrencies, because their prices move up and down far more extremely than more conventional investments such as shares or bonds. Comparisons with the tulip mania, the Dutch camping boom or the dotcom bubble suggest themselves (although: Kent u het verhaal van de tulpenmanie? Klopt niet (en lessen voor de Bitcoin) [Do you know the story of the tulip mania? It isn’t true (and lessons for the Bitcoin)]).

Trends

Google Trends, based on search queries, shows a big increase in interest in Bitcoins, rising almost as fast as the price itself. Search volume can have predictive value, according to a study by the Netherlands Bureau for Economic Policy Analysis (CPB). “[There] appears to be a strong correlation between the number of Google searches [for “mortgage”] and the number of actual transactions in the housing market,” the CPB researchers write in Een voorlopende huizenmarktindicator [A leading housing-market indicator].

googletrendsbitcoin
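The leading-indicator idea can be sketched with synthetic data (my own illustration; none of these numbers come from the CPB or Google Trends): if one series leads another, the lag with the strongest cross-correlation reveals the lead time.

```python
# Synthetic example: "searches" sees a common signal 3 periods before
# "transactions" does; scanning lags recovers that lead time.
import numpy as np

rng = np.random.default_rng(0)
true_lag = 3
base = rng.normal(size=203)
searches = base[true_lag:]                                   # sees the signal early
transactions = base[:-true_lag] + rng.normal(scale=0.5, size=200)

def lagged_corr(lead, follow, lag):
    """Correlation between lead[t] and follow[t + lag]."""
    if lag == 0:
        return float(np.corrcoef(lead, follow)[0, 1])
    return float(np.corrcoef(lead[:-lag], follow[lag:])[0, 1])

best = max(range(7), key=lambda k: lagged_corr(searches, transactions, k))
print(best)  # lag at which searches best predict transactions: 3
```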

The stock market

The “regular” stock market shows the same pattern: when share prices rise, more people start investing. According to research agency Kantar TNS, the Netherlands counted almost 1.4 million investing households in 2017, 14% more than in 2016. The researchers also speak of a bandwagon effect. That is a well-known phenomenon in psychology, underlining that humans are social creatures who are strongly influenced by what the people around them do. We wrote about this in 2015 (page 16 of the report Belangrijke inzichten over zelfstandige beleggers [Important insights about self-directed investors]):

[social comparison and herd behaviour] refer to the fact that our decisions are influenced by the decisions others make. People compare their own situation with that of others in their social environment. If many people around us already own a certain product, we unconsciously assume it must be a good product (Cialdini, 1993). So if many people around us start investing, we assume it is a good moment to do so ourselves. Evidence for this follows from the strong correlation between the level of the AEX index and the number of investors.

And of course FOMO, the Fear Of Missing Out, plays a role too. If the neighbour can buy a new boat from his Bitcoin profits (whether or not only on paper), we don’t want to be left behind. Incidentally, Kantar’s research also shows that “barely 2% of households” own cryptocurrencies such as bitcoin. In a February 2018 update, Reg van Steen of Kantar writes: Aantal Nederlandse beleggers cryptovaluta geëxplodeerd, maar nog geen kwart ervan staat op winst [The number of Dutch crypto investors has exploded, but less than a quarter is in profit]. They estimate that the Netherlands counts 580,000 crypto investors.

There is also academic literature arguing that retail investors often time the market badly: they buy when the price is high and sell when it is low. To make a profit, you have to do exactly the opposite. This satirical piece by De Speld contains a kernel of truth: Bitcoin stopt pas met stijgen als jij instapt [Bitcoin will only stop rising once you get in].

A sound risk assessment

The AFM is a supervisor; it does not give investment advice. But the AFM does consider it important that consumers weigh the risks properly. That is why the supervisor wrote Wees je bewust van de risico’s van Bitcoins [Be aware of the risks of Bitcoins] in 2013, and more recently Reële risico’s bij cryptocurrencies [Real risks with cryptocurrencies]. More information about the risks of investing in virtual currencies can be found on the AFM website.

From Proof of Concept to Scalable Policies: Challenges and Solutions, with an Application

From Proof of Concept to Scalable Policies: Challenges and Solutions, with an Application (2017) by Abhijit Banerjee, Rukmini Banerji, James Berry, Esther Duflo, Harini Kannan, Shobhini Mukerji, Marc Shotland, and Michael Walton. Journal of Economic Perspectives, vol. 31, no. 4, Fall 2017 (pp. 73–102). Suggested by my colleague Alexandra van Geen.

Abstract

The promise of randomized controlled trials is that evidence gathered through the evaluation of a specific program helps us—possibly after several rounds of fine-tuning and multiple replications in different contexts—to inform policy. However, critics have pointed out that a potential constraint in this agenda is that results from small “proof-of-concept” studies run by nongovernment organizations may not apply to policies that can be implemented by governments on a large scale. After discussing the potential issues, this paper describes the journey from the original concept to the design and evaluation of scalable policy. (…) We use this example to draw general lessons about using randomized control trials to design scalable policies.

In terms of establishing causal claims, it is generally accepted within the discipline that randomized controlled trials are particularly credible from the point of view of internal validity. This credibility applies to the interventions studied—at that time, on that population, implemented by the organization that was studied—but does not necessarily extend beyond. It is not at all clear that results from small “proof-of-concept” studies run by nongovernment organizations can or should be directly turned into recommendations for policies for implementation by governments on a large scale. While the external validity of a randomized controlled trial cannot be taken for granted, it is far from unattainable.

6 obstacles

Six main challenges in drawing conclusions from a localized randomized controlled trial about a policy implemented at scale:

[1] Market equilibrium effects.  When an intervention is implemented at scale, it could change the nature of the market.

To assess the equilibrium impact of an intervention (…) The typical design is a two-stage randomization procedure in which the treatment is randomly assigned at the market level in addition to the random assignment within a market. For example, the experiment of Crepon, Duflo, Gurgand, Rathelot, and Zamora (2013) varied the treatment density of a job placement assistance program in France within labor markets, in addition to random assignment of individuals within each market.
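A two-stage design of this kind can be sketched as follows (my own illustration, not the authors' code; for clarity the market-level densities are assigned round-robin here, whereas in practice that first stage would itself be randomized):

```python
# Two-stage randomization sketch: stage 1 gives each market a treatment
# density; stage 2 randomizes individuals within each market at that density.
import random

random.seed(1)
densities = [0.0, 0.25, 0.5, 0.75, 1.0]
# Stage 1: assign each of 10 markets a density (round-robin for clarity).
market_density = {m: densities[m % len(densities)] for m in range(10)}

def treat(market: int) -> bool:
    """Stage 2: treat an individual with the market's assigned density."""
    return random.random() < market_density[market]

# 100 individuals per market.
assignments = [(m, treat(m)) for m in market_density for _ in range(100)]
share_treated = sum(t for _, t in assignments) / len(assignments)
```

Comparing outcomes across markets with different densities is what identifies the equilibrium (displacement) effect, on top of the usual within-market treated-versus-control comparison.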

One potential challenge with the experimental identification of equilibrium effects is that it is not always obvious what the “market” is.

When a particular intervention is scaled up, more people will be needed to implement it. This may lead to an increase in their wages or in difficulties hiring them.

[2] Spillover Effects. Many treatments have spillovers on neighboring units, which implies that those units are not ideal control groups. Not all spillovers are easy to detect in pilot experiments: in some cases, they may be highly nonlinear.

[3] Political reactions, including either resistance to or support for a program, may vary as programs scale up.

Potential political backlash?

  • Worth exploring whether some changes in potentially inessential program details are available.
  • It is also important to try to anticipate the backlash and create a constituency for the reform from the start.
  • Finally, the potential for political backlash may provide an argument for not doing too many pilots, since large-scale programs are less likely to be scotched.

[4] Context Dependence. Would results extend in a different setting (even within the same country)? Would the results depend on some observed or unobserved characteristics of the location where the intervention was carried out?

[5] Randomization or Site-selection bias. Organizations or individuals who agree to participate in an early experiment may be different from the rest of the population; randomization bias.

  • Organizations (and even individuals within governments) who agree to participate in randomized controlled trials are often exceptional.
  • A well-understood problem arises when individuals select into treatment.
  • Site-selection bias arises because an organization chooses a location or a subgroup where effects are particularly large.

Blair, Iyengar, and Shapiro (2013): randomized controlled trials are disproportionately conducted in countries with democratic governments.

[6] Piloting Bias/Implementation Challenges. A number of studies have found differences between implementation by nongovernment organizations and governments. Banerjee, Hanna, Kyle, Olken, and Sumarto (2016): the [Indonesian] government was less effective at running a pilot program and more effective with full implementation.

As the discussion in this section has emphasized, the issue of how to travel from evidence at proof-of-concept level to a scaled-up version cannot be settled in the abstract. The issue of [4] context-dependence needs to be addressed through replications, ideally guided by theory. [1] General equilibrium and [2] spillover effects can be addressed by incorporating estimation of these effects into study designs, or by conducting large-scale experiments where the equilibrium plays out. [5] Randomization bias and [6] piloting bias can be addressed by trying out the programs on a sufficient scale with the government that will eventually implement it, documenting success and failure, and moving from there.

[I skipped the Teaching at the Right Level example]

General Lessons

Perhaps the key point is to remember what small pilot experiments are good for and what they are not good for.

If the objective is to design or test a model [i.e. no policy implications], the researcher can ignore most of the concerns that we talked about in this paper. Something valuable will be learnt anyway.

For researchers, a strong temptation in a stage-two trial will be to do what it takes “to make it work,” but the risk of implementation challenges means that it is important to think about how far to go in that direction. On the one hand, trial and error will be needed to embed any new intervention within an existing bureaucracy. Anything new is challenging, and at the beginning of a stage-two trial, considerable time needs to be spent to give the program a fair shot. On the other hand, if the research team embeds too much of its own staff and effort and ends up substituting for the organization, not enough will be learnt about where implementation problems might emerge.

The Consumer Financial Protection Bureau and the Quest for Consumer Comprehension – Lauren Willis

The Consumer Financial Protection Bureau and the Quest for Consumer Comprehension (book chapter, April 2017) by Lauren Willis.

I found out about this new strand of work via ASIC. I really liked Willis’ debunking paper The Financial Education Fallacy (2011). Related to the paper I summarize below: Performance-Based Consumer Law (2015) and Performance-Based Remedies: Ordering Firms to Eradicate Their Own Fraud (2017). Perhaps I will dive deeper into one of those in another blog post.

Abstract

To ensure that consumers understand financial products’ “costs, benefits, and risks,” the Consumer Financial Protection Bureau has been redesigning mandated disclosures, primarily through iterative lab testing. But no matter how well these disclosures perform in experiments, firms will run circles around the disclosures when studies end and marketing begins. To meet the challenge of the dynamic twenty-first-century consumer financial marketplace, the bureau should require firms to demonstrate that a good proportion of their customers understand key pertinent facts about the financial products they buy. Comprehension rules would induce firms to inform consumers and simplify products, tasks that firms are better equipped than the bureau to perform.

[unless otherwise stated, all text below is quoted from the paper]

The bureau [CFPB] must induce firms themselves to promote consumer comprehension:

Demonstrating sufficient customer comprehension could be a precondition firms must meet before enforcing a term or charging a fee, or firms could be sanctioned (or rewarded) for low (or high) demonstrated comprehension levels. In effect, rather than prescriptively regulating the marketing and sales process with mandated disclosures or pursuing firms on an ad hoc ex post basis for unfair, deceptive, and abusive marketing and sales practices, the bureau would monitor firms and incentivize them to minimize customer confusion as the marketing and sales process unfolds over time.

Comprehension rules are a form of performance-based regulation, in that they regulate outputs, not inputs.

By moving testing of disclosure from the lab to the field, and trying to stimulate firms to develop creative disclosure methods, the CFPB implicitly acknowledges that:
  1. disclosures that do well in experimental conditions may not work in real-world conditions,
  2. firms are better situated than regulators to innovate to achieve consumer comprehension,
  3. valid, reliable consumer confusion audits are possible.

How might this form of regulation operate in practice?

  1. Measuring the quality of a valued outcome (comprehension) rather than of an input that is often pointless (mandated or pre-approved disclosure);
  2. Assessing actual customer comprehension in the field as conditions change over time, rather than imagining what the “reasonable consumer” would understand or testing consumers in the lab or in single-shot field experiments;
  3. Requiring firms to affirmatively and routinely demonstrate customer understanding, rather than relying on the bureau’s limited resources to examine firm performance ad hoc when problems arise;
  4. Giving firms the flexibility and responsibility to effectively inform their customers about key relevant costs, benefits and risks through whatever means the firms see fit, whether that be education or product simplification, rather than asking regulators to dictate how disclosures and products should be designed.

Certainly comprehension is often neither necessary nor sufficient for good decisions (…) Even knowledgeable consumers make bad decisions, whether as a result of inadequate willpower or decisionmaking biases. (…) many decisions require basic financial knowledge that consumers lack; the effective annual percentage rate (APR) for a credit card account “defies plain language efforts”.

It might well be more cost effective for society to engage in substantive regulation of product design or performance-based regulation of consumer welfare outcomes (e.g. a lender that does not follow the bureau’s underwriting rules can instead demonstrate annually that no more than five percent of its loan portfolio defaulted).

Antimarketing

Even without any intent to deceive, firms not only will but must leverage consumer confusion to compete with other firms that deceive customers.

Firms have a bevy of means at their disposal to undermine mandated disclosures’ effectiveness:
  1. Alter the design of the transaction (e.g. banks are adept at sabotaging overdraft disclosures; see When Nudges Fail: Slippery Defaults, Willis, 2013)
  2. Frame consumers’ thought processes long before consumers see a disclosure. Consumers may think they are unaffected, but advertising works (Wood and Poltrack 2015; Lewis and Reiley 2014).
  3. Physically divert attention from disclosures. AT&T designed the envelope, cover letter, and amended contract after extensive “antimarketing” market testing to ensure that most consumers would not open the envelope, or if they did open it, would not read beyond the cover letter (Ting v. AT&T, 319 F.3d 1126, 9th Cir. 2003)
  4. Take proactive steps to ferret out easy marks, vulnerable customers. Savvy firms might use inferred cognitive load, mood, or stress levels to sell consumers products at the very moment when mandated disclosures will be misinterpreted or ignored. Firms can even engage in real-time marketing through Internet and mobile devices to reach consumers at vulnerable moments (Digital Market Manipulation, Calo, 2014).
Like sausage-makers, marketers do not want the public to know how their product is made.

Comprehension rules & customer confusion audits

Comprehension rules would align firms’ goals with the CFPB’s mandate to ensure consumer understanding of financial product costs, benefits, and risks. The effect of successful regulation through comprehension rules would be to bring transactions into closer alignment with consumer expectations.

Firms know a lot about their customers, as they already collect this information for marketing and product development purposes.

The very capacities that modern firms use to market products and defeat mandated disclosures enable them to attain better consumer comprehension more quickly and at a lower cost than regulators. The bureau can try to educate consumers, but nothing beats professional marketers when it comes to sending consumers a message.

Firms are in a better position than regulators to decide when it is worth the cost of educating consumers about complex or unintuitive features and when simplifying products is more cost-effective. Firms might find that educating their customers is so costly that it would be cheaper for firms to directly channel consumers to suitable products.

The bureau would need to remain mindful of firm agility at circumventing disclosure, and guard against firms’ manipulation of customer confusion audit results.

Benchmarks

The benchmarks against which firm performance in customer confusion audits ought to be judged depend on which of the bureau’s statutory purposes it is pursuing: transparency, competition, or fairness.
Benchmarks if the goal is:
  • Fairness: the benchmarks would need to be high, perhaps as high as the approximately 85 percent benchmark implicitly used in false advertising cases.
  • Competition: the benchmarks might be lower, depending on the firm’s ability to differentiate informed from uninformed consumers.
  • Preventing firms from undermining mandated disclosures: the benchmarks might be set at the comprehension levels the bureau can obtain in its disclosure testing.
  • Increasing consumer comprehension from where consumers stand now: the benchmark might be set based on industrywide performance.

Benefits of Comprehension Rules

The ultimate direct benefit of comprehension rules is increased consumer decisional autonomy; consumers would get what they think they are getting, not whatever hidden features firms can slip into the transaction.

Empowered choices free of confusion are only possible, and the market is only driven to efficiency, when consumers comprehend the transactions in which they engage.

Today we pretend that individual consumers use disclosures to drive market competition and make welfare-enhancing decisions, but we do not spend the resources needed to realize actual consumer understanding. As a result, consumers neither discipline the market nor consistently enhance their own welfare.

Report: Netspar International Pension Workshop 2018

From 17 to 19 January 2018 Netspar held its International Pension Workshop ’18. My report/summary, mostly in Twitter threads.

Day 1

Monika Böhnke – Choice in Pensions: Insights from the Swedish Premium Pension System

Striking: the default fund is the riskiest one. The idea behind the design was that this would encourage people to make an active choice. It doesn’t: only about 1,000 Swedes per year choose for themselves.

Hazel Bateman – Regulation of information provision for pension choices: Australia and the Netherlands compared

Mostly about research in Australia. Conclusions: “People are not using pension information as expected” and “Testing should be on real, actual behavior”.

Ward Romp – What drives pension reform measures in the OECD?

Not very useful for my work at the AFM, but an interesting type of research. Handy overview of pension reforms in the Netherlands.

Johannes Hagen – A nudge to quit? The effect of fixed-term pensions on labor supply and retirement choices among older workers

I missed this talk (but received the slides later). Interestingly strong effect of a nudge on choice: when the “lump sum over 5 years” option is made more salient and clearly presented, the share choosing it rises by 30 percentage points, with effects on work decisions; people stop working earlier (because the monthly payout over those 5 years is higher).

Day 2

Jesper Rangvid – Comparison of pension systems in The Netherlands and Denmark: Shifts from “safe” to “less safe” pensions products

“Given a choice, young males with economic background in cities are more likely to give up guarantee”

Good point from the discussant: only 18% make a choice in Denmark, so why introduce choice at all?

Raymond Montizaan – Pension reform: Disentangling the impact on Retirement Behavior and Private Savings

A fast, hard pension adjustment leads to multiple problems; policy implication: phase in pension changes gradually.

 

Day 3

Arthur van Soest – Pension Communication in the Netherlands and other countries.

DNB research uses a measure of “objective” pension knowledge. Pension literacy: 3 questions that respondents think they know the answer to (the researchers don’t know the right answers). Only “don’t know” is scored as a wrong answer; any other answer counts as “correct”.
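The scoring rule is easy to state in code (my own illustration, not the DNB questionnaire): any substantive answer, right or wrong, counts as "correct".

```python
# Sketch of the "pension literacy" scoring rule: only "don't know" is
# scored as wrong; every substantive answer is scored as correct.

def pension_literacy_score(answers):
    """Number of answers scored 'correct' under the DNB-style measure."""
    return sum(a.strip().lower() != "don't know" for a in answers)

print(pension_literacy_score(["option A", "don't know", "no idea"]))  # 2
```

Note that even "no idea" scores as correct here, which illustrates how unusual the measure is.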

The researchers do find relationships between pension communication, pension knowledge and active pension decisions (I forget how they defined those), but the links are weak and often not causal.

 

Henriëtte Prast – The Power of Percentage: Quantitative Framing of Pension Income.

This is research that we at the AFM have also replicated. See this nice summary on DasKapital:

 

Paul Smeets – Financial Incentives Beat Social Norms: A Field Experiment on Retirement information search.

Nice experiment with 250,000 letters to participants of the retail-sector pension fund.

Appealing to a social norm does not get more people to log in; a chance to win a VVV gift voucher does.

 

Gregor Becker – Can Digital Information Treatments Intensify the Search for Household Spending Data and Improve Liquidity?

Experiment with 100,000 bank customers encouraging them to use an online household budgeting tool; for households with payment problems it led to an average of €453 more in their current account.

 

Tabea Bucher-Koenen – FinTech as a solution for rational inattention in individual pension planning?

Still in a rough, early phase; they want to test the effect of a pensions dashboard. But Germany has little digital pension data, so students entered the information for 1,000 people by hand (about 24 minutes per person).

The discussant had a nice link to UK research from 2016 that I didn’t know yet: Understanding consumer experience of pension scams a year on from pension freedoms.

 

Around the talks I also caught up with Marike Knoef. Research of hers had just appeared in the FD: Nederlander weet nog altijd weinig van pensioen [Dutch people still know little about their pensions].

That piece also contains the 5 knowledge questions, two of which come straight from the AFM knowledge test weetwatjeweet. A thread on Twitter: