Experimentation – assignment 1: running a ready-made experiment


In the first assignment, you’ll run one of two ready-made experiments that explore the issues we’ve discussed in class on Framing and Irony. The primary goal of this experiment is to get you acquainted with carrying out experiments and collecting data, which we will analyze together in class. The secondary (and ultimately more important, long-term) goal is to get you to critically think about any experiment you’re reading about, replicating or designing yourself. These two experiments and the data collected from them will form the foundation for your own experimental exploration in the form of follow-up studies.

The assignment is due Tuesday, 13 February, by 17:00. Submit a CSV file of your experimental results. Specific instructions on the data structure will follow.

Framing – background and experiment

Choices can be worded in a way that highlights the positive or the negative aspects of the same decision, potentially changing their desirability.

In Tversky & Kahneman’s (1981) Asian Disease scenario, a disease, if untreated, is expected to claim 600 lives. In the gain-framed condition, participants are asked to make a choice between Program A, where 200 people will be saved, and Program B, where there is a one-third chance of saving all 600 (and a two-thirds chance that nobody is saved). In the loss-framed condition, the same options are described in terms of deaths: participants are told that with Program A, 400 people will die, and that with Program B there is a one-third chance that nobody dies (and a two-thirds chance that all 600 die). Participants in the gain-framed condition chose Program A 72% of the time, whereas participants in the loss-framed condition chose it only 22% of the time. Tversky and Kahneman account for this framing effect through Prospect Theory, a behavioural model that represents how people decide between alternatives that involve risk and uncertainty (Kahneman & Tversky, 1979).
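The arithmetic behind the scenario can be checked with a minimal sketch like the one below. It only restates the numbers given above; the variable and function names are illustrative and not part of any course material. It shows that the two programs have the same expected outcome, and that "200 saved" and "400 die" describe the same result.

```python
from fractions import Fraction

TOTAL_AT_RISK = 600  # lives expected to be lost if the disease is untreated

# Each program is a list of (probability, lives_saved) outcomes.
program_a = [(Fraction(1), 200)]                          # the sure option
program_b = [(Fraction(1, 3), 600), (Fraction(2, 3), 0)]  # the risky option


def expected_lives_saved(outcomes):
    """Probability-weighted average of lives saved."""
    return sum(p * saved for p, saved in outcomes)


ev_a = expected_lives_saved(program_a)
ev_b = expected_lives_saved(program_b)

print(ev_a, ev_b)            # 200 200
print(TOTAL_AT_RISK - ev_a)  # 400, i.e. the loss-framed wording of Program A
```

Since the expected outcomes are identical, a difference in choices between the two conditions cannot come from the numbers themselves.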

Tversky & Kahneman’s study on risky choice framing inspired numerous follow-up studies that unearthed additional types of framing: attribute framing, where, for example, beef that is 95% lean is considered healthier than beef that’s 5% fat, and goal framing, where people are more likely to be motivated by a $5 reward than a $5 penalty. See Levin et al. (1998) for a survey of the many studies and their proposed typology of framing.

Your job in this first assignment is to run a ready-made experiment about framing. But before you do so, start off by actually doing the experiment yourself, following this link.

Pragmatic meaning and irony – background and experiment

One of the most challenging aspects of natural language interpretation is the interpretation of pragmatic meaning: intended meaning that goes beyond the literal meaning of words and sentences. For example, consider the following exchange (translated from German and adapted from Twitter; Rehbein et al. 2013):

@germanpsycho: I’m married now!
B: My, uh, congratulations!

Inferring pragmatic meaning oftentimes involves comparing what is literally said by the speaker with alternative utterances that could have been said instead (see Grice 1989, Horn 2006). In the example above, just writing congratulations! would be interpreted at face value: the writer is expressing their best wishes for their interlocutor’s momentous life event. In spoken language, the listener would be sensitive to the intonation, but in written language it’s the added uh that signals that the utterance shouldn’t be interpreted quite literally. And so, compared with a plain congratulations!, the reader infers that the extra uh signals that the writer isn’t being sincere.

If we wanted to teach a chat bot how to understand such instances of irony, we’d first need to understand exactly how irony is interpreted. In order to arrive at such an understanding, we can investigate corpora of spoken and written language to make generalizations about how language is used, and use uh(m) as a case study. This, of course, would require us to go through all the tokens and evaluate the function of uh(m) in each. For example, these markers primarily express hesitation, as in:

@GottaTalk2V1212 uhm, what are you doing?

Here the speaker is sincere: they’re asking for information about what the listener is doing, or pointing out that whatever the listener is doing is surprising, perhaps even unacceptable. In classifying each instance of uh(m), we’re relying on our (or our annotators’) intuitions, which in most cases will probably be correct. We could also obtain insight about how speakers comprehend these cases of irony by following the thread above to see if @germanpsycho responds in a way that suggests they interpreted the congratulations as ironic. What we’ll do here, though, is poll participants about their intuitions about the sincerity of the speaker in uttering the statement containing uh(m). Later in the course we’ll learn how to compare our experimental results with more corpus analysis.

The task is as follows:

For this experiment, we’ve taken 25 tweets with hashtagged adjectives from the Twitter corpus in NLTK. You’ll be asking participants to read the tweets (they have 5 seconds to read each one) and then evaluate the speaker’s sincerity, the degree of irony and the emotional mood of the tweet (i.e. its sentiment), as well as guess the tweet author’s gender. But before you do so, start off by actually doing the experiment yourself, following this link.


Setting up an Ibex account

In this course you’ll learn to use Ibex Farm, a powerful online experimentation tool used by many researchers in linguistics and psychology. Ibex Farm hosts experiments using multiple methodologies, from simple survey and questionnaire-like tasks to reaction-time and reading-time experiments and experiments that use visual and audio stimuli. Designing the experiments will involve some coding in JavaScript and HTML, but not to worry: there’s lots of documentation and templates out there to help you out.

You might be thinking: why not just use Google Forms for these experiments? We’re dealing with pretty simple, textual experimental items. It’s true that it’d be much faster to upload all the items to Google Forms. But think ahead: you’ll be learning how to use a tool that might turn out to be useful downstream, when you want to run more sophisticated behavioural experiments where you have better control of how participants view or hear the items you show them.

Create an account on Ibex Farm. Before you start, carefully read the Ibex manual to understand how Ibex works. Note that you can either:

  • Use Ibex Farm, which will host your experiment. This means that you’ll only need to modify the file example.data.js (the easy option)
  • Set up the server, for which there are detailed instructions in the documentation (the more time- and work-intensive option)

Uploading the ready-made file

In order to gently introduce you to experimental design and related programming, you get to just run a ready-made experiment. Create a new experiment on Ibex Farm. Don’t call it ‘framing’ or anything else that might give away what you’re investigating. For the framing experiment, download the experiment file provided here; for the adjective experiment, download the experiment file provided here. You’ll also have to upload the example intro file on the experiment setup page. Once you’ve uploaded both files to the new experiment you created on Ibex Farm, it should look just like the experiment linked above.

Testing the experiment and data

Before you get to the point of recruiting participants and collecting data, take a look at the experimental items to understand what is investigated here.

Here are some pointers to help you out:

  1. What type of framing is being investigated here (for the framing experiment)?
  2. Take a look at the items:
    1. What are the experimental conditions?
    2. How many items per condition will each participant see?
    3. Given their number, how many participants do we need to collect data from to obtain enough observations?
    4. Are there filler and control items?
      1. What is their proportion?
      2. How similar are they to the target items?
    5. What is the nature of the task?
    6. What kind of data will you obtain?
    7. Are there any potential confounds or inconsistencies in the task and items that might affect participants’ responses?
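For the question about the number of participants, a back-of-the-envelope calculation along the following lines may help. This is only a sketch: the target of 100 observations per condition and the 4 items per condition are made-up numbers, not properties of the actual experiment, and the function name is illustrative.

```python
import math


def participants_needed(target_obs_per_condition, items_per_condition):
    """Smallest number of participants that yields at least the target
    number of observations in every condition, assuming each participant
    contributes the same number of items to each condition."""
    return math.ceil(target_obs_per_condition / items_per_condition)


# Hypothetical numbers: we want 100 observations per condition and each
# participant sees 4 items per condition.
print(participants_needed(100, 4))  # 25

# If each participant sees only 1 item per condition, we need far more people.
print(participants_needed(100, 1))  # 100
```

Replace the made-up numbers with what you find in the items, and keep in mind that some participants will drop out or give unusable responses.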

Also, before you start sending the experiment’s link to potential participants, check that the experiment runs smoothly all the way through and that nothing weird or unexpected happens. Then look at the output (the results file), import it into R and see if you can read and understand the results.
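As a first sanity check on the output, the sketch below reads a results file into rows. It is written in Python for illustration, and the same steps carry over to R. It assumes the results file is comma-separated with explanatory lines that start with "#"; check this against your own file. The sample rows are invented and do not show the real column layout.

```python
import csv
import io

# Invented sample standing in for a results file; real files will have
# different columns and values.
SAMPLE = """\
# (explanatory comment lines)
1518000000,abc123,Question,1,0,gain,NULL,Program A
1518000000,abc123,Question,2,0,loss,NULL,Program B
"""


def read_results(handle):
    """Return the data rows of a results file, skipping comments and blanks."""
    data_lines = (
        line for line in handle
        if line.strip() and not line.startswith("#")
    )
    return list(csv.reader(data_lines))


rows = read_results(io.StringIO(SAMPLE))
print(len(rows))     # number of data rows
print(rows[0][-1])   # last field of the first row
```

To read a real file, open it and pass the handle to read_results instead of the in-memory sample.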

Recruiting participants and running the experiment

Now that you’re ready to run the experiment, it’s time to find kind people who’ll agree to participate. You will want to ask people who aren’t taking this course, obviously, and who don’t already know what this experiment is about. Among the people who’ll participate there might be a few who know about framing, and we’d hope they know a thing or two about irony, but that’s fine.

Data management

We will be merging the experimental data which each of you will be collecting, so it’s important to make sure that the data has the same structure. You will get explicit instructions regarding the column names soon.
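To see why identical structure matters, here is a minimal sketch of a merge that refuses files whose columns differ. The column names (participant, condition, choice) are placeholders, since the real ones have not been announced yet, and the two in-memory "files" are invented.

```python
import csv
import io


def merge_csv(handles):
    """Concatenate CSV files, insisting that all share the same header."""
    expected = None
    merged = []
    for i, handle in enumerate(handles):
        reader = csv.DictReader(handle)
        if expected is None:
            expected = reader.fieldnames
        elif reader.fieldnames != expected:
            raise ValueError(
                f"file {i} has columns {reader.fieldnames}, expected {expected}"
            )
        merged.extend(reader)
    return expected, merged


# Two invented submissions with placeholder column names.
student_1 = io.StringIO("participant,condition,choice\np1,gain,A\np2,loss,B\n")
student_2 = io.StringIO("participant,condition,choice\np3,gain,A\n")

header, data = merge_csv([student_1, student_2])
print(header)     # ['participant', 'condition', 'choice']
print(len(data))  # 3
```

A file with a renamed or missing column raises an error instead of silently producing a misaligned table.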

Exploratory data analysis

Once you’ve collected the data, take a look at it in R and see if you can make any generalizations about it. We will later compare your individual conclusions with the generalizations drawn from the aggregated data. What you’re expected to do prior to class is some descriptive statistics: look at means, medians, standard deviations and the distribution of the data, and generate some graphs that capture the data.
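The kind of per-condition summary meant here can be sketched as follows. The example is in Python for illustration and the same computation is straightforward in R. The condition labels and ratings are invented and do not come from the experiment.

```python
import statistics
from collections import defaultdict

# Invented (condition, rating) pairs standing in for collected data.
observations = [
    ("gain", 5), ("gain", 4), ("gain", 6),
    ("loss", 2), ("loss", 3), ("loss", 1),
]


def summarize(observations):
    """Group values by condition and compute basic descriptive statistics."""
    by_condition = defaultdict(list)
    for condition, value in observations:
        by_condition[condition].append(value)
    return {
        condition: {
            "n": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "sd": statistics.stdev(values),  # sample standard deviation
        }
        for condition, values in by_condition.items()
    }


summary = summarize(observations)
for condition, stats in summary.items():
    print(condition, stats)
```

Looking at the summaries side by side, per condition, is usually more informative than a single overall mean.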

Looking ahead

You were asked to think critically about the experimental items before you ran the experiments. Now is the time to think critically about the experiment vis-a-vis the results:

  1. Are there issues with the design, e.g. potential confounds, that the results highlight?
  2. Are there new issues you didn’t anticipate?
  3. Did you get null results? Why do you think this is the case?
  4. Is the data messy in unexpected ways? Why?
  5. This experiment is more or less a variation on Tversky and Kahneman’s original study. What could be potential follow-ups to this study?
  6. How do these results figure into Prospect Theory and the typology of framing?