
Beauty sleep: experimental study on the perceived health and attractiveness of sleep deprived people

  • John Axelsson, researcher 1 2
  • Tina Sundelin, research assistant and MSc student 2
  • Michael Ingre, statistician and PhD student 3
  • Eus J W Van Someren, researcher 4
  • Andreas Olsson, researcher 2
  • Mats Lekander, researcher 1 3
  • 1 Osher Center for Integrative Medicine, Department of Clinical Neuroscience, Karolinska Institutet, 17177 Stockholm, Sweden
  • 2 Division for Psychology, Department of Clinical Neuroscience, Karolinska Institutet
  • 3 Stress Research Institute, Stockholm University, Stockholm
  • 4 Netherlands Institute for Neuroscience, an Institute of the Royal Netherlands Academy of Arts and Sciences, and VU Medical Center, Amsterdam, Netherlands
  • Correspondence to: J Axelsson john.axelsson{at}ki.se
  • Accepted 22 October 2010

Objective To investigate whether sleep deprived people are perceived as less healthy, less attractive, and more tired than after a normal night’s sleep.

Design Experimental study.

Setting Sleep laboratory in Stockholm, Sweden.

Participants 23 healthy, sleep deprived adults (age 18-31) who were photographed and 65 untrained observers (age 18-61) who rated the photographs.

Intervention Participants were photographed after a normal night’s sleep (eight hours) and after sleep deprivation (31 hours of wakefulness after a night of reduced sleep). The photographs were presented in a randomised order and rated by untrained observers.

Main outcome measure Difference in observer ratings of perceived health, attractiveness, and tiredness between sleep deprived and well rested participants using a visual analogue scale (100 mm).

Results Sleep deprived people were rated as less healthy (visual analogue scale scores, mean 63 (SE 2) v 68 (SE 2), P<0.001), more tired (53 (SE 3) v 44 (SE 3), P<0.001), and less attractive (38 (SE 2) v 40 (SE 2), P<0.001) than after a normal night’s sleep. The decrease in rated health was associated with ratings of increased tiredness and decreased attractiveness.

Conclusion Our findings show that sleep deprived people appear less healthy, less attractive, and more tired compared with when they are well rested. This suggests that humans are sensitive to sleep related facial cues, with potential implications for social and clinical judgments and behaviour. Studies are warranted to understand how these effects may influence clinical decision making and to add knowledge with direct implications in a medical context.

Introduction

"The recognition [of the case] depends in great measure on the accurate and rapid appreciation of small points in which the diseased differs from the healthy state" (Joseph Bell, 1837-1911)

Good clinical judgment is an important skill in medical practice. This is well illustrated in the quote by Joseph Bell, 1 who demonstrated impressive observational and deductive skills. Bell was one of Sir Arthur Conan Doyle’s teachers and served as a model for the fictitious detective Sherlock Holmes. 2 Generally, human judgment involves complex processes, whereby ingrained, often less consciously deliberated responses from perceptual cues are mixed with semantic calculations to affect decision making. 3 Thus all social interactions, including diagnosis in clinical practice, are influenced by reflexive as well as reflective processes in human cognition and communication.

Sleep is an essential homeostatic process with well established effects on an individual’s physiological, cognitive, and behavioural functionality 4 5 6 7 and long term health, 8 but with only anecdotal support of a role in social perception, such as that underlying judgments of attractiveness and health. As illustrated by the common expression “beauty sleep,” an individual’s sleep history may play an integral part in the perception and judgments of his or her attractiveness and health. To date, the concept of beauty sleep has lacked scientific support, but the biological importance of sleep may have favoured a sensitivity to perceive sleep related cues in others. It seems warranted to explore such sensitivity, as sleep disorders and disturbed sleep are increasingly common in today’s 24 hour society and often coexist with some of the most common health problems, such as hypertension 9 10 and inflammatory conditions. 11

To describe the relation between sleep deprivation and perceived health and attractiveness we asked untrained observers to rate the faces of people who had been photographed after a normal night’s sleep and after a night of sleep deprivation. We chose facial photographs as the human face is the primary source of information in social communication. 12 A perceiver’s response to facial cues, signalling the bearer’s emotional state, intentions, and potential mate value, serves to guide actions in social contexts and may ultimately promote survival. 13 14 15 We hypothesised that untrained observers would perceive sleep deprived people as more tired, less healthy, and less attractive compared with after a normal night’s sleep.

Methods

Using an experimental design we photographed the faces of 23 adults (mean age 23, range 18-31 years, 11 women) between 14.00 and 15.00 under two conditions in a balanced design: after a normal night’s sleep (at least eight hours of sleep between 23.00-07.00 and seven hours of wakefulness) and after sleep deprivation (sleep 02.00-07.00 and 31 hours of wakefulness). We advertised for participants at four universities in the Stockholm area. Twenty of 44 potentially eligible people were excluded. Reasons for exclusion were reported sleep disturbances, abnormal sleep requirements (for example, sleep need outside the 7-9 hour range), health problems, or unavailability on study days (the main reason). We also excluded smokers and those who had consumed alcohol within two days of the protocol. Overall, we enrolled 12 women and 12 men; one woman failed to participate in both conditions, leaving 23 participants.

The participants slept in their own homes. Sleep times were confirmed with sleep diaries and text messages. The sleep diaries (Karolinska sleep diary) included information on sleep latency, quality, duration, and sleepiness. Participants sent a text message (SMS) to the research assistant at bedtime and when they got up on the night before sleep deprivation. They had been instructed not to nap. During the normal sleep condition the participants’ mean duration of sleep, estimated from sleep diaries, was 8.45 (SE 0.20) hours.

The sleep deprivation condition started with a restriction of sleep to five hours in bed; the participants sent text messages when they went to sleep and when they woke up. The mean duration of sleep during this night, estimated from sleep diaries and text messages, was 5.06 (SE 0.04) hours. For the following night of total sleep deprivation, the participants were monitored in the sleep laboratory at all times. Thus, for the sleep deprivation condition, participants came to the laboratory at 22.00 (after 15 hours of wakefulness) to be monitored, and stayed awake for a further 16 hours. We therefore did not observe the participants during the first 15 hours of wakefulness, when they had had slightly restricted sleep, but had good control over the last 16 hours of wakefulness, when sleepiness increased in magnitude. For the sleep condition, participants came to the laboratory at 12.00 (after five hours of wakefulness). They were kept indoors for two hours before being photographed to avoid the effects of exposure to sunlight and the weather.

We had a series of five or six photographs (resolution 3872×2592 pixels) taken in a well lit room with a constant white balance (colour temperature 4200 K; Nikon D80, Nikon, Tokyo). The white balance was set differently during the two days of the study and affected seven photographs (four taken during sleep deprivation and three during a normal night’s sleep); removing these participants from the analyses did not affect the results. The distance from camera to head was fixed, as was the focal length, within 14 mm (between 44 and 58 mm). To ensure a fixed surface area of each face on the photograph, the focal length was adapted to the head size of each participant.

For the photo shoot, participants wore no makeup, had their hair loose (combed backwards if long), underwent similar cleaning or shaving procedures for both conditions, and were instructed to “sit with a straight back and look straight into the camera with a neutral, relaxed facial expression.” Although the photographer was not blinded to the sleep conditions, she followed a highly standardised procedure during each photo shoot, including minimal interaction with the participants. A blinded rater chose the most typical photograph from each series of photographs. This process resulted in 46 photographs; two (one from each sleep condition) of each of the 23 participants. This part of the study took place between June and September 2007.

In October 2007 the photographs were presented at a fixed interval of six seconds in a randomised order to 65 observers (mainly students at the Karolinska Institute, mean age 30 (range 18-61) years, 40 women), who were unaware of the conditions of the study. They rated the faces for attractiveness (very unattractive to very attractive), health (very sick to very healthy), and tiredness (not at all tired to very tired) on a 100 mm visual analogue scale. After every 23 photographs a brief intermission was allowed, including a working memory task lasting 23 seconds to prevent the faces being memorised. To ensure that the observers were not primed to tiredness when rating health and attractiveness they rated the photographs for attractiveness and health in the first two sessions and tiredness in the last. To avoid the influence of possible order effects we presented the photographs in a balanced order between conditions for each session.

Statistical analyses

Data were analysed using multilevel mixed effects linear regression, with two crossed independent random effects accounting for random variation between observers and participants using the xtmixed procedure in Stata 9.2. We present the effect of condition as a percentage of change from the baseline condition as the reference using the absolute value in millimetres (rated on the visual analogue scale). No data were missing in the analyses.
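The authors fitted this model with Stata 9.2’s xtmixed procedure. Purely as an illustration, a roughly equivalent crossed random effects model can be sketched in Python with statsmodels, which handles crossed effects through variance components; the file and column names below are hypothetical, not the authors’ data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per rating, with columns
# "rating" (mm on the 100 mm visual analogue scale), "condition"
# (0 = normal sleep, 1 = sleep deprived), "observer", and "participant".
ratings = pd.read_csv("ratings.csv")
ratings["all_obs"] = 1  # one dummy group, so the two random effects are crossed

model = smf.mixedlm(
    "rating ~ condition",
    data=ratings,
    groups="all_obs",
    re_formula="0",  # no random intercept for the dummy group itself
    vc_formula={
        "observer": "0 + C(observer)",        # random variation between observers
        "participant": "0 + C(participant)",  # random variation between participants
    },
)
result = model.fit()
print(result.summary())
```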

Results

Sixty five observers rated each of the 46 photographs for attractiveness, health, and tiredness: 138 ratings by each observer and 2990 ratings for each of the three factors rated. When sleep deprived, people were rated as less healthy (visual analogue scale scores, mean 63 (SE 2) v 68 (SE 2)), more tired (53 (SE 3) v 44 (SE 3)), and less attractive (38 (SE 2) v 40 (SE 2); P<0.001 for all) than after a normal night’s sleep (table 1). Compared with the normal sleep condition, perceptions of health and attractiveness in the sleep deprived condition decreased on average by 6% and 4%, respectively, and tiredness increased by 19%.

Table 1  Multilevel mixed effects regression of the effect of sleep deprivation on how people are perceived with respect to attractiveness, health, and tiredness

A 10 mm increase in tiredness was associated with a −3.0 mm change in health, a 10 mm increase in health increased attractiveness by 2.4 mm, and a 10 mm increase in tiredness reduced attractiveness by 1.2 mm (table 2). Expressed as correlations, perceived attractiveness was positively associated with perceived health (r=0.42, fig 1) and negatively with perceived tiredness (r=−0.28, fig 1). In addition, the average decrease (for each face) in attractiveness as a result of sleep deprivation was associated with changes in tiredness (r=−0.53, n=23, P=0.03) and in health (r=0.50, n=23, P=0.01). Moreover, a strong negative association was found between the respective perceptions of tiredness and health (r=−0.54, fig 1). Figure 2 shows an example of observer rated faces.

Table 2  Associations between health, tiredness, and attractiveness

Fig 1  Relations between health, tiredness, and attractiveness of 46 photographs (two each of 23 participants) rated by 65 observers on 100 mm visual analogue scales, with variation between observers removed using empirical Bayes estimates


Fig 2  Participant after a normal night’s sleep (left) and after sleep deprivation (right). Faces were presented in a counterbalanced order

To evaluate the mediation effects of sleep loss on attractiveness and health, tiredness was added to the models presented in table 1, following established recommendations. 16 The effect of sleep loss was significantly mediated by tiredness for both health (P<0.001) and attractiveness (P<0.001). When tiredness was added to the model (table 1) with an estimated coefficient of −2.9 (SE 0.1; P<0.001), the independent effect of sleep loss on health decreased from −4.2 to −1.8 (SE 0.5; P<0.001). The effect of sleep loss on attractiveness decreased from −1.6 (table 1) to −0.62 (SE 0.4; P=0.133), with tiredness estimated at −1.1 (SE 0.1; P<0.001). The same approach was applied to the model of attractiveness and health (table 2): the association decreased from 2.4 to 2.1 (SE 0.1; P<0.001), with tiredness estimated at −0.56 (SE 0.1; P<0.001).
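The paper follows multilevel mediation recommendations. 16 As a simplified sketch of the comparison-of-coefficients logic only (ignoring the crossed random effects above, with hypothetical column names), the mediation check might look like:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format ratings with columns "health", "tiredness",
# and "condition" (0 = rested, 1 = sleep deprived)
ratings = pd.read_csv("ratings.csv")

# Total effect of sleep loss on rated health
total = smf.ols("health ~ condition", data=ratings).fit()

# Add the proposed mediator: if tiredness mediates the effect,
# the coefficient on condition should shrink towards zero
adjusted = smf.ols("health ~ condition + tiredness", data=ratings).fit()

print(total.params["condition"], adjusted.params["condition"])
```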

Discussion

Sleep deprived people are perceived as less attractive, less healthy, and more tired compared with when they are well rested. Apparent tiredness was strongly related to looking less healthy and less attractive, and the mediation analyses indicated that a large part of the observed effects on appearing healthy and attractive were mediated by looking tired. The fact that untrained observers detected the effects of sleep loss in others not only provides evidence for a perceptual ability not previously subjected to experimental control, but also supports the notion that sleep history gives rise to socially relevant signals that provide information about the bearer. The adaptiveness of an ability to detect sleep related facial cues resonates well with other research showing that, in the long term, small deviations from the average sleep duration are associated with an increased risk of health problems and decreased longevity. 8 17 Indeed, even a few hours of sleep deprivation inflict an array of physiological changes (in neural, endocrinological, immunological, and cellular functioning) that, if sustained, are relevant for long term health. 7 18 19 20 Here, we show that such physiological changes are paralleled by detectable facial changes.

These results relate to photographs taken in an artificial setting and presented to the observers for only six seconds. It is likely that the effects reported here would be larger in real life person to person situations, when overt behaviour and interactions add further information. Blink interval and blink duration are known indicators of sleepiness, 21 and trained observers can reliably evaluate the drowsiness of drivers by watching their videotaped faces. 22 In addition, a few of the people were perceived as healthier, less tired, and more attractive in the sleep deprived condition. It remains to be evaluated in follow-up research whether this is due to random error in judgments or is associated with specific characteristics of the observers or of the sleep deprived people they judge. Nevertheless, we believe that the present findings can be generalised to a wide variety of settings, although further studies will have to investigate the impact in clinical and other social situations.

Importantly, our findings suggest a prominent role for sleep history in several domains of interpersonal perception and judgment in which it has previously not been considered important, such as clinical judgment. In addition, because attractiveness motivates sexual behaviour, collaboration, and superior treatment, 13 sleep loss may have consequences in other social contexts. For example, it has been proposed that facial cues perceived as attractive signal good health, and that this recognition has been selected evolutionarily to guide the choice of mate and successful transmission of genes. 13 The fact that good sleep supports a healthy look, and poor sleep the reverse, may be of particular relevance in the medical setting, where estimates of health are an essential part of practice. It is possible that people with sleep disturbances, clinical or otherwise, would be judged as more unhealthy, whereas those who have had an unusually good night’s sleep may be perceived as rather healthy. Further studies are needed to investigate the effects of acute reductions of sleep less drastic than the deprivation used in the present investigation, as well as long term clinical effects.

Conclusions

People are capable of detecting sleep loss related facial cues, and these cues modify judgments of another’s health and attractiveness. These conclusions agree well with existing models describing a link between sleep and good health, 18 23 as well as a link between attractiveness and health. 13 Future studies should focus on the relevance of these facial cues in clinical settings. These could investigate whether clinicians are better than the average population at detecting sleep or health related facial cues, and whether patients with a clinical diagnosis exhibit more tiredness and are less healthy looking than healthy people. Perhaps the more successful doctors are those who pick up on these details and act accordingly.

Taken together, our results provide important insights into judgments about health and attractiveness that are reminiscent of the anecdotal wisdom harboured in Bell’s words, and in the colloquial notion of “beauty sleep.”

What is already known on this topic

Short or disturbed sleep and fatigue constitute major risk factors for health and safety

Complaints of short or disturbed sleep are common among patients seeking healthcare

The human face is the main source of information for social signalling

What this study adds

The facial cues of sleep deprived people are sufficient for others to judge them as more tired, less healthy, and less attractive, lending the first scientific support to the concept of “beauty sleep”

By affecting doctors’ general perception of health, the sleep history of a patient may affect clinical decisions and diagnostic precision

Cite this as: BMJ 2010;341:c6614

We thank B Karshikoff for support with data acquisition and M Ingvar for comments on an earlier draft of the manuscript, both without compensation and working at the Department for Clinical Neuroscience, Karolinska Institutet, Sweden.

Contributors: JA designed the data collection, supervised and monitored data collection, wrote the statistical analysis plan, carried out the statistical analyses, obtained funding, drafted and revised the manuscript, and is guarantor. TS designed and carried out the data collection, cleaned the data, drafted and revised the manuscript, and had final approval of the manuscript. JA and TS contributed equally to the work. MI wrote the statistical analysis plan, carried out the statistical analyses, drafted the manuscript, and critically revised the manuscript. EJWVS provided statistical advice, advised on data handling, and critically revised the manuscript. AO provided advice on the methods and critically revised the manuscript. ML provided administrative support, drafted the manuscript, and critically revised the manuscript. All authors approved the final version of the manuscript.

Funding: This study was funded by the Swedish Society for Medical Research, Rut and Arvid Wolff’s Memory Fund, and the Osher Center for Integrative Medicine.

Competing interests: All authors have completed the Unified Competing Interest form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: no support from any company for the submitted work; no financial relationships with any companies that might have an interest in the submitted work in the previous 3 years; no other relationships or activities that could appear to have influenced the submitted work.

Ethical approval: This study was approved by the Karolinska Institutet’s ethical committee. Participants were compensated for their participation.

Participant consent: Participant’s consent obtained.

Data sharing: Statistical code and dataset of ratings are available from the corresponding author at john.axelsson{at}ki.se .

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode .

References

1. Deten A, Volz HC, Clamors S, Leiblein S, Briest W, Marx G, et al. Hematopoietic stem cells do not repair the infarcted mouse heart. Cardiovasc Res 2005;65:52-63.
2. Doyle AC. The case-book of Sherlock Holmes: selected stories. Wordsworth, 1993.
3. Lieberman MD, Gaunt R, Gilbert DT, Trope Y. Reflection and reflexion: a social cognitive neuroscience approach to attributional inference. Adv Exp Soc Psychol 2002;34:199-249.
4. Drummond SPA, Brown GG, Gillin JC, Stricker JL, Wong EC, Buxton RB. Altered brain response to verbal learning following sleep deprivation. Nature 2000;403:655-7.
5. Harrison Y, Horne JA. The impact of sleep deprivation on decision making: a review. J Exp Psychol Appl 2000;6:236-49.
6. Huber R, Ghilardi MF, Massimini M, Tononi G. Local sleep and learning. Nature 2004;430:78-81.
7. Spiegel K, Leproult R, Van Cauter E. Impact of sleep debt on metabolic and endocrine function. Lancet 1999;354:1435-9.
8. Kripke DF, Garfinkel L, Wingard DL, Klauber MR, Marler MR. Mortality associated with sleep duration and insomnia. Arch Gen Psychiatry 2002;59:131-6.
9. Olson LG, Ambrogetti A. Waking up to sleep disorders. Br J Hosp Med (Lond) 2006;67:118, 20.
10. Rajaratnam SM, Arendt J. Health in a 24-h society. Lancet 2001;358:999-1005.
11. Ranjbaran Z, Keefer L, Stepanski E, Farhadi A, Keshavarzian A. The relevance of sleep abnormalities to chronic inflammatory conditions. Inflamm Res 2007;56:51-7.
12. Haxby JV, Hoffman EA, Gobbini MI. The distributed human neural system for face perception. Trends Cogn Sci 2000;4:223-33.
13. Rhodes G. The evolutionary psychology of facial beauty. Annu Rev Psychol 2006;57:199-226.
14. Todorov A, Mandisodza AN, Goren A, Hall CC. Inferences of competence from faces predict election outcomes. Science 2005;308:1623-6.
15. Willis J, Todorov A. First impressions: making up your mind after a 100-ms exposure to a face. Psychol Sci 2006;17:592-8.
16. Krull JL, MacKinnon DP. Multilevel modeling of individual and group level mediated effects. Multivariate Behav Res 2001;36:249-77.
17. Ayas NT, White DP, Manson JE, Stampfer MJ, Speizer FE, Malhotra A, et al. A prospective study of sleep duration and coronary heart disease in women. Arch Intern Med 2003;163:205-9.
18. Bryant PA, Trinder J, Curtis N. Sick and tired: does sleep have a vital role in the immune system? Nat Rev Immunol 2004;4:457-67.
19. Cirelli C. Cellular consequences of sleep deprivation in the brain. Sleep Med Rev 2006;10:307-21.
20. Irwin MR, Wang M, Campomayor CO, Collado-Hidalgo A, Cole S. Sleep deprivation and activation of morning levels of cellular and genomic markers of inflammation. Arch Intern Med 2006;166:1756-62.
21. Schleicher R, Galley N, Briest S, Galley L. Blinks and saccades as indicators of fatigue in sleepiness warnings: looking tired? Ergonomics 2008;51:982-1010.
22. Wierwille WW, Ellsworth LA. Evaluation of driver drowsiness by trained raters. Accid Anal Prev 1994;26:571-81.
23. Horne J. Why we sleep—the functions of sleep in humans and other mammals. Oxford University Press, 1988.

Open access | Published: 11 December 2020

Quantifying and addressing the prevalence and bias of study designs in the environmental and social sciences

  • Alec P. Christie   ORCID: orcid.org/0000-0002-8465-8410 1 ,
  • David Abecasis   ORCID: orcid.org/0000-0002-9802-8153 2 ,
  • Mehdi Adjeroud 3 ,
  • Juan C. Alonso   ORCID: orcid.org/0000-0003-0450-7434 4 ,
  • Tatsuya Amano   ORCID: orcid.org/0000-0001-6576-3410 5 ,
  • Alvaro Anton   ORCID: orcid.org/0000-0003-4108-6122 6 ,
  • Barry P. Baldigo   ORCID: orcid.org/0000-0002-9862-9119 7 ,
  • Rafael Barrientos   ORCID: orcid.org/0000-0002-1677-3214 8 ,
  • Jake E. Bicknell   ORCID: orcid.org/0000-0001-6831-627X 9 ,
  • Deborah A. Buhl 10 ,
  • Just Cebrian   ORCID: orcid.org/0000-0002-9916-8430 11 ,
  • Ricardo S. Ceia   ORCID: orcid.org/0000-0001-7078-0178 12 , 13 ,
  • Luciana Cibils-Martina   ORCID: orcid.org/0000-0002-2101-4095 14 , 15 ,
  • Sarah Clarke 16 ,
  • Joachim Claudet   ORCID: orcid.org/0000-0001-6295-1061 17 ,
  • Michael D. Craig 18 , 19 ,
  • Dominique Davoult 20 ,
  • Annelies De Backer   ORCID: orcid.org/0000-0001-9129-9009 21 ,
  • Mary K. Donovan   ORCID: orcid.org/0000-0001-6855-0197 22 , 23 ,
  • Tyler D. Eddy 24 , 25 , 26 ,
  • Filipe M. França   ORCID: orcid.org/0000-0003-3827-1917 27 ,
  • Jonathan P. A. Gardner   ORCID: orcid.org/0000-0002-6943-2413 26 ,
  • Bradley P. Harris 28 ,
  • Ari Huusko 29 ,
  • Ian L. Jones 30 ,
  • Brendan P. Kelaher 31 ,
  • Janne S. Kotiaho   ORCID: orcid.org/0000-0002-4732-784X 32 , 33 ,
  • Adrià López-Baucells   ORCID: orcid.org/0000-0001-8446-0108 34 , 35 , 36 ,
  • Heather L. Major   ORCID: orcid.org/0000-0002-7265-1289 37 ,
  • Aki Mäki-Petäys 38 , 39 ,
  • Beatriz Martín 40 , 41 ,
  • Carlos A. Martín 8 ,
  • Philip A. Martin 1 , 42 ,
  • Daniel Mateos-Molina   ORCID: orcid.org/0000-0002-9383-0593 43 ,
  • Robert A. McConnaughey   ORCID: orcid.org/0000-0002-8537-3695 44 ,
  • Michele Meroni 45 ,
  • Christoph F. J. Meyer   ORCID: orcid.org/0000-0001-9958-8913 34 , 35 , 46 ,
  • Kade Mills 47 ,
  • Monica Montefalcone 48 ,
  • Norbertas Noreika   ORCID: orcid.org/0000-0002-3853-7677 49 , 50 ,
  • Carlos Palacín 4 ,
  • Anjali Pande 26 , 51 , 52 ,
  • C. Roland Pitcher   ORCID: orcid.org/0000-0003-2075-4347 53 ,
  • Carlos Ponce 54 ,
  • Matt Rinella 55 ,
  • Ricardo Rocha   ORCID: orcid.org/0000-0003-2757-7347 34 , 35 , 56 ,
  • María C. Ruiz-Delgado 57 ,
  • Juan J. Schmitter-Soto   ORCID: orcid.org/0000-0003-4736-8382 58 ,
  • Jill A. Shaffer   ORCID: orcid.org/0000-0003-3172-0708 10 ,
  • Shailesh Sharma   ORCID: orcid.org/0000-0002-7918-4070 59 ,
  • Anna A. Sher   ORCID: orcid.org/0000-0002-6433-9746 60 ,
  • Doriane Stagnol 20 ,
  • Thomas R. Stanley 61 ,
  • Kevin D. E. Stokesbury 62 ,
  • Aurora Torres 63 , 64 ,
  • Oliver Tully 16 ,
  • Teppo Vehanen   ORCID: orcid.org/0000-0003-3441-6787 65 ,
  • Corinne Watts 66 ,
  • Qingyuan Zhao 67 &
  • William J. Sutherland 1 , 42  

Nature Communications volume 11, Article number: 6377 (2020)

Subjects: Environmental impact, Scientific community, Social sciences

Building trust in science and evidence-based decision-making depends heavily on the credibility of studies and their findings. Researchers employ many different study designs that vary in their risk of bias to evaluate the true effect of interventions or impacts. Here, we empirically quantify, on a large scale, the prevalence of different study designs and the magnitude of bias in their estimates. Randomised designs and controlled observational designs with pre-intervention sampling were used by just 23% of intervention studies in biodiversity conservation, and 36% of intervention studies in social science. We demonstrate, through pairwise within-study comparisons across 49 environmental datasets, that these types of designs usually give less biased estimates than simpler observational designs. We propose a model-based approach to combine study estimates that may suffer from different levels of study design bias, discuss the implications for evidence synthesis, and how to facilitate the use of more credible study designs.


Introduction

The ability of science to reliably guide evidence-based decision-making hinges on the accuracy and credibility of studies and their results 1 , 2 . Well-designed, randomised experiments are widely accepted to yield more credible results than non-randomised, ‘observational studies’ that attempt to approximate and mimic randomised experiments 3 . Randomisation is a key element of study design that is widely used across many disciplines because of its ability to remove confounding biases (through random assignment of the treatment or impact of interest 4 , 5 ). However, ethical, logistical, and economic constraints often prevent the implementation of randomised experiments, whereas non-randomised observational studies have become popular as they take advantage of historical data for new research questions, larger sample sizes, less costly implementation, and more relevant and representative study systems or populations 6 , 7 , 8 , 9 . Observational studies nevertheless face the challenge of accounting for confounding biases without randomisation, which has led to innovations in study design.

We define ‘study design’ as an organised way of collecting data. Importantly, we distinguish between data collection and statistical analysis (as opposed to other authors 10 ) because of the belief that bias introduced by a flawed design is often much more important than bias introduced by statistical analyses. This was emphasised by Light, Singer & Willet 11 (p. 5): “You can’t fix by analysis what you bungled by design…”; and Rubin 3 : “Design trumps analysis.” Nevertheless, the importance of study design has often been overlooked in debates over the inability of researchers to reproduce the original results of published studies (so-called ‘reproducibility crises’ 12 , 13 ) in favour of other issues (e.g., p-hacking 14 and Hypothesizing After Results are Known or ‘HARKing’ 15 ).

To demonstrate the importance of study designs, we can use the following decomposition of estimation error 16:

    Estimation error = (design bias) + (modelling bias) + (statistical noise)    (1)

This demonstrates that even if we improve the quality of modelling and analysis (to reduce modelling bias through a better bias-variance trade-off 17) or increase sample size (to reduce statistical noise), we cannot remove the intrinsic bias introduced by the choice of study design (design bias) unless we collect the data in a different way. The importance of study design in determining the levels of bias in study results therefore cannot be overstated.

For the purposes of this study we consider six commonly used study designs; differences and connections can be visualised in Fig.  1 . There are three major components that allow us to define these designs: randomisation, sampling before and after the impact of interest occurs, and the use of a control group.

Fig. 1  A hypothetical study set-up in which the abundance of birds in three impact and control replicates (e.g., fields represented by blocks in a row) is monitored before and after an impact (e.g., ploughing) that occurs in year zero. Different colours represent each study design and illustrate how replicates are sampled. Approaches for calculating an estimate of the true effect of the impact for each design are also shown, along with synonyms from different disciplines.

Of the non-randomised observational designs, the Before-After Control-Impact (BACI) design uses a control group and samples before and after the impact occurs (i.e., in the ‘before-period’ and the ‘after-period’). Its rationale is to explicitly account for pre-existing differences between the impact group (exposed to the impact) and control group in the before-period, which might otherwise bias the estimate of the impact’s true effect 6 , 18 , 19 .

The BACI design improves upon several other commonly used observational study designs, of which there are two uncontrolled designs: After, and Before-After (BA). An After design monitors an impact group in the after-period, while a BA design compares the state of the impact group between the before- and after-periods. Both designs can be expected to yield poor estimates of the impact’s true effect (large design bias; Equation (1)) because changes in the response variable could have occurred without the impact (e.g., due to natural seasonal changes; Fig.  1 ).

The other observational design is Control-Impact (CI), which compares the impact group and control group in the after-period (Fig.  1 ). This design may suffer from design bias introduced by pre-existing differences between the impact group and control group in the before-period; bias that the BACI design was developed to account for 20 , 21 . These differences have many possible sources, including experimenter bias, logistical and environmental constraints, and various confounding factors (variables that change the propensity of receiving the impact), but can be adjusted for through certain data pre-processing techniques such as matching and stratification 22 .

Among the randomised designs, the most commonly used are counterparts to the observational CI and BACI designs: Randomised Control-Impact (R-CI) and Randomised Before-After Control-Impact (R-BACI) designs. The R-CI design, often termed ‘Randomised Controlled Trials’ (RCTs) in medicine and hailed as the ‘gold standard’ 23 , 24 , removes any pre-impact differences in a stochastic sense, resulting in zero design bias (Equation ( 1 )). Similarly, the R-BACI design should also have zero design bias, and the impact group measurements in the before-period could be used to improve the efficiency of the statistical estimator. No randomised equivalents exist of After or BA designs as they are uncontrolled.
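To make the estimators summarised in Fig. 1 concrete, here is a minimal sketch (with assumed column names, not the authors’ code) of the point estimates that the observational designs compute from group means:

```python
import pandas as pd

# Hypothetical BACI dataset: columns "response", "group"
# ("impact"/"control"), and "period" ("before"/"after")
baci = pd.read_csv("baci_dataset.csv")
m = baci.groupby(["group", "period"])["response"].mean()

after_est = m["impact", "after"]                        # After design
ba_est = m["impact", "after"] - m["impact", "before"]   # BA design
ci_est = m["impact", "after"] - m["control", "after"]   # CI design
baci_est = ba_est - (m["control", "after"] - m["control", "before"])  # BACI (DiD)
print(after_est, ba_est, ci_est, baci_est)
```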

It is important to briefly note that there is debate over two major statistical methods that can be used to analyse data collected using BACI and R-BACI designs, and which is superior at reducing modelling bias 25 (Equation (1)). These statistical methods are: (i) Differences in Differences (DiD) estimator; and (ii) covariance adjustment using the before-period response, which is an extension of Analysis of Covariance (ANCOVA) for generalised linear models — herein termed ‘covariance adjustment’ (Fig.  1 ). These estimators rely on different assumptions to obtain unbiased estimates of the impact’s true effect. The DiD estimator assumes that the control group response accurately represents the impact group response had it not been exposed to the impact (‘parallel trends’ 18 , 26 ) whereas covariance adjustment assumes there are no unmeasured confounders and linear model assumptions hold 6 , 27 .
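The two estimators might be sketched as regressions as follows; this is an illustration under assumed column names, not the authors’ implementation:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Same hypothetical BACI dataset as above, plus a "replicate" ID column
baci = pd.read_csv("baci_dataset.csv")

# (i) DiD: the effect is the group x period interaction coefficient,
# relying on the parallel-trends assumption
did = smf.ols("response ~ C(group) * C(period)", data=baci).fit()

# (ii) Covariance adjustment: regress the after-period response on group,
# controlling for the before-period response as a lagged covariate
wide = (baci.pivot_table(index=["replicate", "group"],
                         columns="period", values="response")
            .reset_index())
ca = smf.ols("after ~ C(group) + before", data=wide).fit()

print(did.params, ca.params)
```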

From both theory and Equation (1), with similar sample sizes, randomised designs (R-BACI and R-CI) are expected to be less biased than controlled, observational designs with sampling in the before-period (BACI), which in turn should be superior to observational designs without sampling in the before-period (CI) or without a control group (BA and After designs 7 28). Between randomised designs, we might expect an R-BACI design to perform better than an R-CI design because utilising extra data from before the impact may improve the efficiency of the statistical estimator by explicitly characterising pre-existing differences between the impact group and control group.

Given the likely differences in bias associated with different study designs, concerns have been raised over the use of poorly designed studies in several scientific disciplines 7 , 29 , 30 , 31 , 32 , 33 , 34 , 35 . Some disciplines, such as the social and medical sciences, commonly undertake direct comparisons of results obtained by randomised and non-randomised designs within a single study 36 , 37 , 38 or between multiple studies (between-study comparisons 39 , 40 , 41 ) to specifically understand the influence of study designs on research findings. However, within-study comparisons are limited in their scope (e.g., a single study 42 , 43 ) and between-study comparisons can be confounded by variability in context or study populations 44 . Overall, we lack quantitative estimates of the prevalence of different study designs and the levels of bias associated with their results.

In this work, we aim to first quantify the prevalence of different study designs in the social and environmental sciences. To fill this knowledge gap, we take advantage of summaries for several thousand biodiversity conservation intervention studies in the Conservation Evidence database 45 ( www.conservationevidence.com ) and social intervention studies in systematic reviews by the Campbell Collaboration ( www.campbellcollaboration.org ). We then quantify the levels of bias in estimates obtained by different study designs (R-BACI, R-CI, BACI, BA, and CI) by applying a hierarchical model to approximately 1000 within-study comparisons across 49 raw environmental datasets from a range of fields. We show that R-BACI, R-CI and BACI designs are poorly represented in studies testing biodiversity conservation and social interventions, and that these types of designs tend to give less biased estimates than simpler observational designs. We propose a model-based approach to combine study estimates that may suffer from different levels of study design bias, discuss the implications for evidence synthesis, and how to facilitate the use of more credible study designs.

Results

Prevalence of study designs

We found that the biodiversity-conservation (Conservation Evidence) and social-science (Campbell Collaboration) literatures had similarly high proportions of intervention studies that used CI designs and After designs, but low proportions that used R-BACI, BACI, or BA designs (Fig. 2). Slightly higher proportions of intervention studies used R-CI designs in social-science systematic reviews than in the biodiversity-conservation literature (Fig. 2). The R-BACI, R-CI, and BACI designs made up 23% of intervention studies for biodiversity conservation, and 36% of intervention studies for social science.

Fig. 2  Intervention studies from the biodiversity-conservation literature were screened from the Conservation Evidence database (n=4260 studies) and studies from the social-science literature were screened from 32 Campbell Collaboration systematic reviews (n=1009 studies; note that studies excluded by these reviews based on their study design were still counted). Percentages for the social-science literature were calculated for each systematic review (blue data points) and then averaged across all 32 systematic reviews (blue bars and black vertical lines represent means and 95% confidence intervals, respectively). Percentages for the biodiversity-conservation literature are absolute values (green bars) calculated from the entire Conservation Evidence database (after excluding any reviews). Source data are provided as a Source Data file. BA = before-after; CI = control-impact; BACI = before-after-control-impact; R-BACI = randomised BACI; R-CI = randomised CI.

Influence of different study designs on study results

In non-randomised datasets, we found that estimates of BACI (with covariance adjustment) and CI designs were very similar, while the point estimates for most other designs often differed substantially in their magnitude and sign. We found similar results in randomised datasets for R-BACI (with covariance adjustment) and R-CI designs. For ~30% of responses, in both non-randomised and randomised datasets, study design estimates differed in their statistical significance (i.e., p<0.05 versus p≥0.05), except for estimates of (R-)BACI (with covariance adjustment) and (R-)CI designs (Table 1; Fig. 3). It was rare for the 95% confidence intervals of different designs’ estimates not to overlap, except when comparing estimates of BA designs with those of (R-)BACI (with covariance adjustment) and (R-)CI designs (Table 1). It was even rarer for estimates of different designs to have significantly different signs (i.e., one estimate with entirely negative confidence intervals versus one with entirely positive confidence intervals; Table 1, Fig. 3). Overall, point estimates often differed greatly in their magnitude and, to a lesser extent, their sign between study designs, but did not differ as greatly once the uncertainty around point estimates was accounted for, except in terms of statistical significance.

Fig. 3  t statistics were obtained from two-sided t tests of estimates obtained by each design for different responses in each dataset, using generalised linear models (see Methods). For randomised datasets, BACI and CI axis labels refer to R-BACI and R-CI designs (denoted by ‘R-’). DiD = differences in differences; CA = covariance adjustment. Lines at t-statistic values of 1.96 denote boundaries between cells, and colours of points indicate differences in direction and statistical significance (p<0.05): grey = same sign and significance; orange = same sign but different significance; red = different sign and significance. Numbers refer to the number of responses in each cell. Source data are provided as a Source Data file. BA = before-after; CI = control-impact; BACI = before-after-control-impact.

Levels of bias in estimates of different study designs

We modelled study design bias using a random effect across datasets in a hierarchical Bayesian model; σ is the standard deviation of the bias term, and assuming bias is randomly distributed across datasets and is on average zero, larger values of σ will indicate a greater magnitude of bias (see Methods). We found that, for randomised datasets, estimates of both R-BACI (using covariance adjustment; CA) and R-CI designs were affected by negligible amounts of bias (very small values of σ; Table  2 ). When the R-BACI design used the DiD estimator, it suffered from slightly more bias (slightly larger values of σ), whereas the BA design had very high bias when applied to randomised datasets (very large values of σ; Table  2 ). There was a highly positive correlation between the estimates of R-BACI (using covariance adjustment) and R-CI designs (Ω[R-BACI CA, R-CI] was close to 1; Table  2 ). Estimates of R-BACI using the DiD estimator were also positively correlated with estimates of R-BACI using covariance adjustment and R-CI designs (moderate positive mean values of Ω[R-BACI CA, R-BACI DiD] and Ω[R-BACI DiD, R-CI]; Table  2 ).

For non-randomised datasets, controlled designs (BACI and CI) were substantially less biased (far smaller values of σ) than the uncontrolled BA design (Table  2 ). A BACI design using the DiD estimator was slightly less biased than the BACI design using covariance adjustment, which was, in turn, slightly less biased than the CI design (Table  2 ).

Standard errors estimated by the hierarchical Bayesian model were reasonably accurate for the randomised datasets (see λ in Methods and Table  2 ), whereas there was some underestimation of standard errors and lack-of-fit for non-randomised datasets.

Discussion

Our approach provides a principled way to quantify the levels of bias associated with different study designs. We found that randomised study designs (R-BACI and R-CI) and observational BACI designs are poorly represented in the environmental and social sciences; collectively, descriptive case studies (the After design), the uncontrolled, observational BA design, and the controlled, observational CI design made up a substantially greater proportion of intervention studies (Fig. 2). And yet R-BACI, R-CI, and BACI designs were found to be quantifiably less biased than other observational designs.

As expected, the R-CI and R-BACI designs (using a covariance adjustment estimator) performed well; the R-BACI design using a DiD estimator performed slightly less well, probably because differencing the pre-impact data may introduce additional statistical noise compared with covariance adjustment, which controls for these data using a lagged regression variable. Of the observational designs, the BA design performed very poorly (both when analysing randomised and non-randomised data), as expected: being uncontrolled, it is prone to severe design bias. 7 28 The CI design also tended to be more biased than the BACI design (using a DiD estimator) owing to pre-existing differences between the impact and control groups. For BACI designs, we recommend that the underlying assumptions of DiD and CA estimators are carefully considered before choosing to apply them to data collected for a specific research question. 6 27 Their levels of bias were negligibly different, and their known bracketing relationship suggests they will typically give estimates with the same sign, although their tendency to over- or underestimate the true effect will depend on how well the underlying assumptions of each are met (most notably, parallel trends for DiD and no unmeasured confounders for CA; see Introduction). 6 27 Overall, these findings demonstrate the power of large within-study comparisons to directly quantify differences in the levels of bias associated with different designs.

We must acknowledge that the assumptions of our hierarchical model (that the bias for each design (j) is on average zero and normally distributed) cannot be verified without gold standard randomised experiments and that, for observational designs, the model was overdispersed (potentially due to underestimation of statistical error by GLM(M)s or positively correlated design biases). The exact values of our hierarchical model should therefore be treated with appropriate caution, and future research is needed to refine and improve our approach to quantify these biases more precisely. Responses within datasets may also not be independent as multiple species could interact; therefore, the estimates analysed by our hierarchical model are statistically dependent on each other, and although we tried to account for this using a correlation matrix (see Methods, Eq. ( 3 )), this is a limitation of our model. We must also recognise that we collated datasets using non-systematic searches 46 , 47 and therefore our analysis potentially exaggerates the intrinsic biases of observational designs (i.e., our data may disproportionately reflect situations where the BACI design was chosen to account for confounding factors). We nevertheless show that researchers were wise to use the BACI design because it was less biased than CI and BA designs across a wide range of datasets from various environmental systems and locations. Without undertaking costly and time-consuming pre-impact sampling and pilot studies, researchers are also unlikely to know the levels of bias that could affect their results. Finally, we did not consider sample size, but it is likely that researchers might use larger sample sizes for CI and BA designs than BACI designs. This is, however, unlikely to affect our main conclusions because larger sample sizes could increase type I errors (false positive rate) by yielding more precise, but biased estimates of the true effect 28 .

Our analyses provide several empirically supported recommendations for researchers designing future studies to assess an impact of interest. First, using a controlled and/or randomised design (if possible) was shown to strongly reduce the level of bias in study estimates. Second, when observational designs must be used (as randomisation is not feasible or too costly), we urge researchers to choose the BACI design over other observational designs—and when that is not possible, to choose the CI design over the uncontrolled BA design. We acknowledge that limited resources, short funding timescales, and ethical or logistical constraints 48 may force researchers to use the CI design (if randomisation and pre-impact sampling are impossible) or the BA design (if appropriate controls cannot be found 28 ). To facilitate the usage of less biased designs, longer-term investments in research effort and funding are required 43 . Far greater emphasis on study designs in statistical education 49 and better training and collaboration between researchers, practitioners and methodologists, is needed to improve the design of future studies; for example, potentially improving the CI design by pairing or matching the impact group and control group 22 , or improving the BA design using regression discontinuity methods 48 , 50 . Where the choice of study design is limited, researchers must transparently communicate the limitations and uncertainty associated with their results.

Our findings also have wider implications for evidence synthesis, specifically the exclusion of certain observational study designs from syntheses (the ‘rubbish in, rubbish out’ concept 51 52). We believe that observational designs should be included in systematic reviews and meta-analyses, but that careful adjustments are needed to account for their potential biases. Exclusion of observational studies often results from subjective, checklist-based ‘risk of bias’ or quality assessments of studies (e.g., AMSTAR 2 53, ROBINS-I 54, or GRADE 55) that are not data-driven and often neglect to identify the actual direction, or quantify the magnitude, of possible bias introduced by observational studies when rating the quality of a review’s recommendations. We also found that only a small proportion of studies used randomised designs (R-CI or R-BACI) or observational BACI designs (Fig. 2), suggesting that systematic reviews and meta-analyses risk excluding a substantial proportion of the literature and limiting the scope of their recommendations if such exclusion criteria are used 32 56 57. This problem is compounded by the fact that, at least in conservation science, studies using randomised or BACI designs are strongly concentrated in Europe, Australasia, and North America 31. Systematic reviews that rely on these few types of study designs are therefore likely to fail to provide decision makers outside these regions with the locally relevant recommendations they prefer 58. The Covid-19 pandemic has highlighted the difficulties of making locally relevant evidence-based decisions using studies conducted in different countries with different demographics and cultures, and on patients of different ages, ethnicities, genetics, and underlying health issues 59. This problem is also acute for decision makers working on biodiversity conservation in tropical regions, where the need for conservation is arguably greatest (i.e., where most of Earth’s biodiversity exists 60) but where they either have to rely on very few well-designed studies that are not locally relevant (i.e., have low generalisability), or on more studies that are locally relevant but less well designed 31 32. Either option could lead decision makers to take ineffective or inefficient decisions. In the long term, improving the quality and coverage of scientific evidence and evidence syntheses across the world will help solve these issues, but shorter-term solutions for synthesising patchy evidence bases are required.

Our work furthers sorely needed research on how to combine evidence from studies that vary greatly in their design. Our approach is an alternative to conventional meta-analyses which tend to only weight studies by their sample size or the inverse of their variance 61 ; when studies vary greatly in their study design, simply weighting by inverse variance or sample size is unlikely to account for different levels of bias introduced by different study designs (see Equation (1)). For example, a BA study could receive a larger weight if it had lower variance than a BACI study, despite our results suggesting a BA study usually suffers from greater design bias. Our model provides a principled way to weight studies by both their variance and the likely amount of bias introduced by their study design; it is therefore a form of ‘bias-adjusted meta-analysis’ 62 , 63 , 64 , 65 , 66 . However, instead of relying on elicitation of subjective expert opinions on the bias of each study, we provide a data-driven, empirical quantification of study biases – an important step that was called for to improve such meta-analytic approaches 65 , 66 .

Future research is needed to refine our methodology, but our empirically grounded form of bias-adjusted meta-analysis could be implemented as follows: (1) collate studies for the same true effect, their effect size estimates, standard errors, and the type of study design; (2) enter these data into our hierarchical model, where effect size estimates share the same intercept (the true causal effect), a random effect term due to design bias (whose variance is estimated by the method we used), and a random effect term for statistical noise (whose variance is estimated by the reported standard error of studies); (3) fit this model and estimate the shared intercept/true effect. Heuristically, this can be thought of as weighting studies by both their design bias and their sampling variance, and it could be implemented on a dynamic meta-analysis platform (such as metadataset.com 67). This approach has substantial potential to develop evidence synthesis in fields (such as biodiversity conservation 31 32) with patchy evidence bases, where reliably synthesising findings from studies that vary greatly in their design is a fundamental and unavoidable challenge.
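As a rough sketch of steps 1-3, the following fits a simplified version of such a model in Python with PyMC on fabricated numbers. It assigns one zero-centred bias term per study with a design-specific scale σ, and omits parts of the paper’s actual model (e.g., the correlation matrix Ω between design biases and the per-dataset structure):

```python
import numpy as np
import pymc as pm

# Fabricated toy inputs: six estimates of one true effect from studies
# using three designs (0 = BACI, 1 = CI, 2 = BA)
est = np.array([0.42, 0.35, 0.55, 0.20, 0.90, -0.10])
se = np.array([0.10, 0.12, 0.20, 0.25, 0.30, 0.35])
design = np.array([0, 0, 1, 1, 2, 2])

with pm.Model() as bias_adjusted_meta:
    effect = pm.Normal("true_effect", mu=0.0, sigma=2.0)   # shared intercept
    # One bias scale per design; larger sigma means more design bias
    sigma = pm.HalfNormal("sigma_design", sigma=1.0, shape=3)
    # Per-study design bias, centred on zero as assumed in the paper
    bias = pm.Normal("bias", mu=0.0, sigma=sigma[design], shape=len(est))
    # Observed estimate = true effect + design bias + statistical noise
    pm.Normal("obs", mu=effect + bias, sigma=se, observed=est)
    idata = pm.sample(draws=2000, tune=2000, target_accept=0.9)

print(idata.posterior["true_effect"].mean().item())
```

Heuristically, studies from designs with large estimated σ contribute less to the posterior for the shared true effect, which is the bias-adjusted analogue of inverse-variance weighting.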

Our study has highlighted an often overlooked aspect of debates over scientific reproducibility: that the credibility of studies is fundamentally determined by study design. Testing the effectiveness of conservation and social interventions is undoubtedly of great importance given the current challenges facing biodiversity and society in general and the serious need for more evidence-based decision-making 1 , 68 . And yet our findings suggest that quantifiably less biased study designs are poorly represented in the environmental and social sciences. Greater methodological training of researchers and funding for intervention studies, as well as stronger collaborations between methodologists and practitioners is needed to facilitate the use of less biased study designs. Better communication and reporting of the uncertainty associated with different study designs is also needed, as well as more meta-research (the study of research itself) to improve standards of study design 69 . Our hierarchical model provides a principled way to combine studies using a variety of study designs that vary greatly in their risk of bias, enabling us to make more efficient use of patchy evidence bases. Ultimately, we hope that researchers and practitioners testing interventions will think carefully about the types of study designs they use, and we encourage the evidence synthesis community to embrace alternative methods for combining evidence from heterogeneous sets of studies to improve our ability to inform evidence-based decision-making in all disciplines.

Quantifying the use of different designs

We compared the use of different study designs in the literature that quantitatively tested interventions between the fields of biodiversity conservation (4,260 studies collated by Conservation Evidence 45) and the social sciences (1,009 studies found by 32 systematic reviews produced by the Campbell Collaboration: www.campbellcollaboration.org).

Conservation Evidence is a database of intervention studies, each of which has quantitatively tested a conservation intervention (e.g., sowing strips of wildflower seeds on farmland to benefit birds), and which is continuously updated through comprehensive, manual searches of conservation journals across a wide range of fields in biodiversity conservation (e.g., amphibian, bird, peatland, and farmland conservation 45). To obtain the proportion of studies that used each design, we extracted the type of study design from each study in the database in 2019; the study design was determined using a standardised set of criteria, and reviews were not included (Table 3). We checked whether the designs reported in the database accurately reflected the designs in the original publications and found that, for a random subset of 356 studies, 95.1% were accurately described.

Each systematic review produced by the Campbell Collaboration collates and analyses studies that test a specific social intervention; we collated systematic reviews that tested a variety of social interventions across several fields in the social sciences, including education, crime and justice, international development, and social welfare (Supplementary Data 1). We retrieved systematic reviews produced by the Campbell Collaboration by searching their website ( www.campbellcollaboration.org ) for reviews published between 2013 and 2019 (as of 8th September 2019); we limited the date range because we could not go through every review. As we were interested in the use of study designs in the wider social-science literature, we only considered reviews (32 in total) that contained sufficient information on the number of included and excluded studies that used different study designs. Studies may be excluded from systematic reviews for several reasons, such as their relevance to the scope of the review (e.g., testing a relevant intervention) and their study design. We only considered studies if the sole reason for their exclusion from the systematic review was their study design – i.e., the review clearly reported that the study was excluded because it used a particular study design, and not for any other reason, such as its relevance to the review’s research questions. We calculated the proportion of studies that used each design in each systematic review (using the same criteria as for the biodiversity-conservation literature; see Table 3) and then averaged these proportions across all systematic reviews, as sketched below.
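A minimal sketch of this two-stage calculation (per-review proportions, then an unweighted average across reviews) might look as follows; the data frame, review labels, and design labels are all hypothetical.

```r
# Sketch (hypothetical data): proportion of studies using each design within
# each review, then averaged across reviews.
library(dplyr)
library(tidyr)

studies <- data.frame(
  review = c("R1", "R1", "R1", "R2", "R2"),
  design = c("RCT", "BA", "RCT", "CI", "RCT")
)

studies %>%
  count(review, design) %>%                         # studies per design within each review
  complete(review, design, fill = list(n = 0)) %>%  # designs unused in a review count as 0
  group_by(review) %>%
  mutate(prop = n / sum(n)) %>%                     # per-review proportions
  group_by(design) %>%
  summarise(mean_prop = mean(prop))                 # averaged across all reviews
```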

Within-study comparisons of different study designs

We wanted to make direct within-study comparisons between the estimates obtained by different study designs (e.g., see refs. 38,70,71 for single within-study comparisons) for many different studies. If a dataset contains data collected using a BACI design, subsets of these data can be used to mimic the use of other study designs (a BA design, using only data for the impact group; and a CI design, using only data collected after the impact occurred). Similarly, if data were collected using an R-BACI design, subsets of these data can be used to mimic the use of a BA design and an R-CI design. Collecting BACI and R-BACI datasets would therefore allow us to make direct within-study comparisons of the estimates obtained by these designs.
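For example, under assumed column names Period and Treatment, the design-mimicking subsets of a BACI dataset could be constructed as in the sketch below (the file name is hypothetical).

```r
# Sketch: subsets of a BACI dataset that mimic simpler designs (column names
# Period and Treatment are assumptions about the data layout).
baci <- read.csv("baci_dataset.csv")          # hypothetical BACI dataset

ba <- subset(baci, Treatment == "Impact")     # BA: impact group only, Before vs After
ci <- subset(baci, Period == "After")         # CI: after-period only, Impact vs Control
```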

We collated BACI and R-BACI datasets by searching the Web of Science Core Collection 72, which included the following citation indexes: Science Citation Index Expanded (SCI-EXPANDED) 1900-present; Social Sciences Citation Index (SSCI) 1900-present; Arts & Humanities Citation Index (A&HCI) 1975-present; Conference Proceedings Citation Index - Science (CPCI-S) 1990-present; Conference Proceedings Citation Index - Social Science & Humanities (CPCI-SSH) 1990-present; Book Citation Index - Science (BKCI-S) 2008-present; Book Citation Index - Social Sciences & Humanities (BKCI-SSH) 2008-present; Emerging Sources Citation Index (ESCI) 2015-present; Current Chemical Reactions (CCR-EXPANDED) 1985-present (includes Institut National de la Propriete Industrielle structure data back to 1840); Index Chemicus (IC) 1993-present. The following search terms were used: [‘BACI’] OR [‘Before-After Control-Impact’], and the search was conducted on 18th December 2017. Our search returned 674 results, which we then refined by selecting only ‘Article’ as the document type and using only the following Web of Science Categories: ‘Ecology’, ‘Marine Freshwater Biology’, ‘Biodiversity Conservation’, ‘Fisheries’, ‘Oceanography’, ‘Forestry’, ‘Zoology’, ‘Ornithology’, ‘Biology’, ‘Plant Sciences’, ‘Entomology’, ‘Remote Sensing’, ‘Toxicology’ and ‘Soil Science’. This left 579 results, which we then restricted to articles published since 2002 (15 years prior to the search) to give us a realistic opportunity to obtain the raw datasets, reducing this number to 542. We were able to access the abstracts of 521 studies and excluded any that did not test the effect of an environmental intervention or threat using an R-BACI or BACI design with response measures related to the abundance (e.g., density, counts, biomass, cover), reproduction (reproductive success) or size (body length, body mass) of animals or plants. Many studies did not test a relevant metric (e.g., they measured species richness), did not use a BACI or R-BACI design, or did not test the effect of an intervention or threat; this left 96 studies, for which we contacted all corresponding authors to ask for the raw dataset. We were able to fully access 54 raw datasets, but upon closer inspection we found that three of these datasets either did not use a BACI design, did not use the metrics we specified, or did not provide sufficient data for our analyses. This left 51 datasets in total that we used in our preliminary analyses (Supplementary Data 2).

All the datasets were originally collected to evaluate the effect of an environmental intervention or impact. Most of them contained multiple response variables (e.g., different measures for different species, such as the abundance or density of species A, B, and C). Within a dataset, we use the term “response” to refer to the estimation of the true effect of an impact on one response variable. There were 1,968 responses in total across the 51 datasets. We then excluded 932 responses (resulting in the exclusion of one dataset) where one or more of the four time-period and treatment subsets (Before Control, Before Impact, After Control, and After Impact data) consisted entirely of zero measurements, or where two or more of these subsets had more than 90% zero measurements. We also excluded one further dataset, as it was the only one that did not contain repeated measurements at sites in both the before- and after-periods. These exclusions were necessary to generate reliable standard errors when modelling these data. We modelled the remaining 1,036 responses from across 49 datasets (Supplementary Table 1).
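The exclusion rule for zero-heavy responses could be expressed as a small helper like the following sketch (the column names y, Period, and Treatment are assumptions):

```r
# Sketch of the exclusion rule: drop a response if any of the four
# period-by-treatment subsets is all zeros, or if two or more subsets
# have more than 90% zero measurements.
exclude_response <- function(d) {
  subsets   <- split(d$y, interaction(d$Period, d$Treatment))  # BC, BI, AC, AI
  prop_zero <- sapply(subsets, function(x) mean(x == 0))
  any(prop_zero == 1) || sum(prop_zero > 0.9) >= 2
}
```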

We applied each study design to the appropriate components of each dataset using Generalised Linear Models (GLMs 73,74) because of their generality and ability to implement the statistical estimators of many different study designs. The model structure of each GLM was adjusted for each response in each dataset based on the study design specified, the response measure, and the dataset structure (Supplementary Table 2). We quantified the effect of the time period for the BA design (After vs Before the impact) and the effect of the treatment type for the CI and R-CI designs (Impact vs Control) on the response variable (Supplementary Table 2). For the BACI and R-BACI designs, we implemented two statistical estimators: (1) a DiD estimator that estimated the true effect using an interaction term between time and treatment type; and (2) a covariance adjustment estimator that estimated the true effect using a term for the treatment type with a lagged variable (Supplementary Table 2).
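The sketch below illustrates how these estimators might be written as GLM formulas for a count measure; the column names are assumptions, and the exact specifications are those in Supplementary Table 2. The objects baci, ba, and ci are the design-mimicking subsets from the earlier sketch.

```r
# Sketch of the estimators for each design (poisson family shown for a
# count measure; ba, ci, and baci are the subsets constructed above).
m_ba  <- glm(y ~ Period,    family = poisson(link = "log"), data = ba)  # After vs Before
m_ci  <- glm(y ~ Treatment, family = poisson(link = "log"), data = ci)  # Impact vs Control

# BACI difference-in-differences: the Period:Treatment interaction
# term estimates the true effect
m_did <- glm(y ~ Period * Treatment, family = poisson(link = "log"), data = baci)

# BACI covariance adjustment: treatment effect in the after-period data,
# adjusting for a lagged covariate (here, each site's before-period mean)
before_means <- aggregate(y ~ Site, data = subset(baci, Period == "Before"), FUN = mean)
names(before_means)[2] <- "lag_mean"
after <- merge(subset(baci, Period == "After"), before_means, by = "Site")
m_cov <- glm(y ~ Treatment + lag_mean, family = poisson(link = "log"), data = after)
```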

As there were large numbers of responses, we used general a priori rules to specify models for each response; this may have led to some model misspecification, but it was unlikely to have substantially affected our pairwise comparisons of estimates obtained by different designs. The error family of each GLM was specified based on the nature of the measure used and on preliminary data exploration: count measures (e.g., abundance) = poisson; density measures (e.g., biomass or abundance per unit area) = quasipoisson, as data for these measures tended to be overdispersed; percentage measures (e.g., percentage cover) = quasibinomial; and size measures (e.g., body length) = gaussian.
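These rules could be encoded as a small lookup, as in the following sketch (the measure labels are assumptions; a log link is used throughout, as described below):

```r
# Sketch: mapping measure types to GLM error families per the rules above.
choose_family <- function(measure) {
  switch(measure,
         count      = poisson(link = "log"),
         density    = quasipoisson(link = "log"),
         percentage = quasibinomial(link = "log"),
         size       = gaussian(link = "log"))
}
```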

We treated each year or season in which data were collected as an independent observation because the implementation of a seasonal term in the models is likely to vary on a case-by-case basis; this depends on the research questions posed by each study and was not feasible for us to consider given the large number of responses we were modelling. The log link function was used for all models to generate a standardised log response ratio as the estimate of the true effect for each response; a fixed-effect coefficient (a variable named treatment status; Supplementary Table 2) was used to estimate the log response ratio 61. If the response had at least ten ‘sites’ (independent sampling units) and two measurements per site on average, we used the random effects of subsample (replicates within a site) nested within site to capture the dependence within sites and subsamples (i.e., a Generalised Linear Mixed Model, or GLMM 73,74, was implemented instead of a GLM); otherwise we fitted a GLM with only the fixed effects (Supplementary Table 2).
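A sketch of this decision rule using lme4 follows; the column names (Site, Subsample) and the variable name treatment_status are assumptions, and baci is the dataset from the earlier sketch.

```r
# Sketch of the model-structure rule: nested random effects of subsample
# within site when there are >= 10 sites and >= 2 measurements per site
# on average; otherwise a plain GLM with fixed effects only.
library(lme4)

n_sites      <- length(unique(baci$Site))
obs_per_site <- nrow(baci) / n_sites

if (n_sites >= 10 && obs_per_site >= 2) {
  m <- glmer(y ~ treatment_status + (1 | Site/Subsample),
             family = poisson(link = "log"), data = baci)
} else {
  m <- glm(y ~ treatment_status, family = poisson(link = "log"), data = baci)
}
coef(summary(m))   # the treatment_status coefficient is the log response ratio
```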

We fitted all models using R version 3.5.1 75 and the packages lme4 76 and MASS 77. Code to replicate all analyses is available (see Data and Code Availability). We compared the estimates obtained using each study design (both as point estimates and as estimates with associated standard errors) by their magnitude and sign.

A model-based quantification of the bias in study design estimates

We used a hierarchical Bayesian model motivated by the decomposition in Equation (1) to quantify the bias in different study design estimates. This model takes the estimated effects of impacts and their standard errors as inputs. Let \(\hat{\beta}_{ij}\) be the true effect estimator in response \(i\) using design \(j\) and \(\hat{\sigma}_{ij}\) be its estimated standard error from the corresponding GLM or GLMM. Our hierarchical model assumes:

\[ \hat{\beta}_{ij} = \beta_i + \gamma_{ij} + \varepsilon_{ij}, \qquad \gamma_{ij} \sim N(0, \sigma_j^2), \]

where \(\beta_i\) is the true effect for response \(i\), \(\gamma_{ij}\) is the bias of design \(j\) in response \(i\), and \(\varepsilon_{ij}\) is the sampling noise of the statistical estimator. Although \(\gamma_{ij}\) technically incorporates both the design bias and any misspecification (modelling) bias due to using GLMs or GLMMs (Equation (1)), we expect the modelling bias to be much smaller than the design bias 3,11. We assume the statistical errors \(\varepsilon_i\) within a response are related to the estimated standard errors through the following joint distribution:

\[ \varepsilon_i \sim N\big(0, \; \lambda^2 \, \mathrm{diag}(\hat{\sigma}_i) \, \Omega \, \mathrm{diag}(\hat{\sigma}_i)\big), \]

where \(\Omega\) is the correlation matrix for the different estimators in the same response and \(\lambda\) is a scaling factor to account for possible over- or under-estimation of the standard errors.

This model effectively quantifies the bias of design \(j\) through \(\sigma_j\) (larger values = more bias), while accounting for within-response correlations using the correlation matrix \(\Omega\) and for possible under-estimation of the standard errors using \(\lambda\). We ensured that the prior distributions we used had very large variances so that they would have a very small effect on the posterior distribution; accordingly, we placed disperse priors on the variance parameters \(\sigma_j\) and \(\lambda\).

We fitted the hierarchical Bayesian model in R version 3.5.1 using the Bayesian inference package rstan 78.
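For concreteness, the following rstan sketch implements a simplified version of this model. It assumes every response has an estimate from all J designs (the real data are unbalanced across designs) and uses illustrative disperse priors rather than the exact specification fitted here.

```r
# Simplified rstan sketch of the hierarchical bias model (balanced case;
# priors and all names are illustrative assumptions).
library(rstan)

stan_code <- "
data {
  int<lower=1> I;                    // responses
  int<lower=1> J;                    // design estimators
  matrix[I, J] beta_hat;             // effect size estimates
  matrix<lower=0>[I, J] se_hat;      // their estimated standard errors
}
parameters {
  vector[I] beta;                    // true effects
  matrix[I, J] gamma;                // design biases
  vector<lower=0>[J] sigma;          // design-bias SDs (larger = more biased design)
  real<lower=0> lambda;              // scaling for over/under-estimated SEs
  corr_matrix[J] Omega;              // within-response error correlation
}
model {
  beta ~ normal(0, 100);             // disperse priors (illustrative only)
  sigma ~ cauchy(0, 5);
  lambda ~ cauchy(0, 5);
  Omega ~ lkj_corr(1);
  for (i in 1:I) {
    matrix[J, J] D = diag_matrix(to_vector(se_hat[i]));
    to_vector(gamma[i]) ~ normal(0, sigma);
    to_vector(beta_hat[i]) ~ multi_normal(beta[i] + to_vector(gamma[i]),
                                          square(lambda) * D * Omega * D);
  }
}
"

# beta_hat and se_hat are assumed to be I x J matrices of design estimates
fit <- stan(model_code = stan_code,
            data = list(I = nrow(beta_hat), J = ncol(beta_hat),
                        beta_hat = beta_hat, se_hat = se_hat))
print(fit, pars = c("sigma", "lambda"))
```

The posterior for sigma then ranks the designs by their estimated bias, and lambda indicates whether the reported standard errors were systematically over- or under-estimated.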

Data availability

All data analysed in the current study are available from Zenodo, https://doi.org/10.5281/zenodo.3560856 .  Source data are provided with this paper.

Code availability

All code used in the current study is available from Zenodo, https://doi.org/10.5281/zenodo.3560856 .

Donnelly, C. A. et al. Four principles to make evidence synthesis more useful for policy. Nature 558 , 361–364 (2018).


McKinnon, M. C., Cheng, S. H., Garside, R., Masuda, Y. J. & Miller, D. C. Sustainability: map the evidence. Nature 528 , 185–187 (2015).

Rubin, D. B. For objective causal inference, design trumps analysis. Ann. Appl. Stat. 2 , 808–840 (2008).


Peirce, C. S. & Jastrow, J. On small differences in sensation. Mem. Natl Acad. Sci. 3 , 73–83 (1884).

Fisher, R. A. Statistical methods for research workers . (Oliver and Boyd, 1925).

Angrist, J. D. & Pischke, J.-S. Mostly harmless econometrics: an empiricist’s companion . (Princeton University Press, 2008).

de Palma, A. et al . Challenges with inferring how land-use affects terrestrial biodiversity: study design, time, space and synthesis. in Next Generation Biomonitoring: Part 1 163–199 (Elsevier Ltd., 2018).

Sagarin, R. & Pauchard, A. Observational approaches in ecology open new ground in a changing world. Front. Ecol. Environ. 8 , 379–386 (2010).


Shadish, W. R., Cook, T. D. & Campbell, D. T. Experimental and quasi-experimental designs for generalized causal inference . (Houghton Mifflin, 2002).

Rosenbaum, P. R. Design of observational studies . vol. 10 (Springer, 2010).

Light, R. J., Singer, J. D. & Willett, J. B. By design: Planning research on higher education. By design: Planning research on higher education . (Harvard University Press, 1990).

Ioannidis, J. P. A. Why most published research findings are false. PLOS Med. 2 , e124 (2005).


Open Science Collaboration. Estimating the reproducibility of psychological science. Science 349 , aac4716 (2015).


John, L. K., Loewenstein, G. & Prelec, D. Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol. Sci. 23 , 524–532 (2012).


Kerr, N. L. HARKing: hypothesizing after the results are known. Personal. Soc. Psychol. Rev. 2 , 196–217 (1998).

Zhao, Q., Keele, L. J. & Small, D. S. Comment: will competition-winning methods for causal inference also succeed in practice? Stat. Sci. 34 , 72–76 (2019).


Friedman, J., Hastie, T. & Tibshirani, R. The Elements of Statistical Learning . vol. 1 (Springer series in statistics, 2001).

Underwood, A. J. Beyond BACI: experimental designs for detecting human environmental impacts on temporal variations in natural populations. Mar. Freshw. Res. 42 , 569–587 (1991).

Stewart-Oaten, A. & Bence, J. R. Temporal and spatial variation in environmental impact assessment. Ecol. Monogr. 71 , 305–339 (2001).

Eddy, T. D., Pande, A. & Gardner, J. P. A. Massive differential site-specific and species-specific responses of temperate reef fishes to marine reserve protection. Glob. Ecol. Conserv. 1 , 13–26 (2014).

Sher, A. A. et al. Native species recovery after reduction of an invasive tree by biological control with and without active removal. Ecol. Eng. 111 , 167–175 (2018).

Imbens, G. W. & Rubin, D. B. Causal Inference in Statistics, Social, and Biomedical Sciences . (Cambridge University Press, 2015).

Greenhalgh, T. How to read a paper: the basics of Evidence Based Medicine . (John Wiley & Sons, Ltd, 2019).

Salmond, S. S. Randomized Controlled Trials: Methodological Concepts and Critique. Orthopaedic Nursing 27 , (2008).

Geijzendorffer, I. R. et al. How can global conventions for biodiversity and ecosystem services guide local conservation actions? Curr. Opin. Environ. Sustainability 29 , 145–150 (2017).

Dimick, J. B. & Ryan, A. M. Methods for evaluating changes in health care policy. JAMA 312 , 2401 (2014).


Ding, P. & Li, F. A bracketing relationship between difference-in-differences and lagged-dependent-variable adjustment. Political Anal. 27 , 605–615 (2019).

Christie, A. P. et al. Simple study designs in ecology produce inaccurate estimates of biodiversity responses. J. Appl. Ecol. 56 , 2742–2754 (2019).

Watson, M. et al. An analysis of the quality of experimental design and reliability of results in tribology research. Wear 426–427 , 1712–1718 (2019).

Kilkenny, C. et al. Survey of the quality of experimental design, statistical analysis and reporting of research using animals. PLoS ONE 4 , e7824 (2009).

Christie, A. P. et al. The challenge of biased evidence in conservation. Conserv. Biol. https://doi.org/10.1111/cobi.13577 (2020).

Christie, A. P. et al. Poor availability of context-specific evidence hampers decision-making in conservation. Biol. Conserv. 248 , 108666 (2020).

Moscoe, E., Bor, J. & Bärnighausen, T. Regression discontinuity designs are underutilized in medicine, epidemiology, and public health: a review of current and best practice. J. Clin. Epidemiol. 68 , 132–143 (2015).

Goldenhar, L. M. & Schulte, P. A. Intervention research in occupational health and safety. J. Occup. Med. 36 , 763–778 (1994).


Junker, J. et al. A severe lack of evidence limits effective conservation of the World’s primates. BioScience https://doi.org/10.1093/biosci/biaa082 (2020).

Altindag, O., Joyce, T. J. & Reeder, J. A. Can Nonexperimental Methods Provide Unbiased Estimates of a Breastfeeding Intervention? A Within-Study Comparison of Peer Counseling in Oregon. Evaluation Rev. 43 , 152–188 (2019).

Chaplin, D. D. et al. The Internal And External Validity Of The Regression Discontinuity Design: A Meta-Analysis Of 15 Within-Study Comparisons. J. Policy Anal. Manag. 37 , 403–429 (2018).

Cook, T. D., Shadish, W. R. & Wong, V. C. Three conditions under which experiments and observational studies produce comparable causal estimates: New findings from within-study comparisons. J. Policy Anal. Manag. 27 , 724–750 (2008).

Ioannidis, J. P. A. et al. Comparison of evidence of treatment effects in randomized and nonrandomized studies. J. Am. Med. Assoc. 286 , 821–830 (2001).

dos Santos Ribas, L. G., Pressey, R. L., Loyola, R. & Bini, L. M. A global comparative analysis of impact evaluation methods in estimating the effectiveness of protected areas. Biol. Conserv. 246 , 108595 (2020).

Benson, K. & Hartz, A. J. A Comparison of Observational Studies and Randomized, Controlled Trials. N. Engl. J. Med. 342 , 1878–1886 (2000).

Smokorowski, K. E. et al. Cautions on using the Before-After-Control-Impact design in environmental effects monitoring programs. Facets 2 , 212–232 (2017).

França, F. et al. Do space-for-time assessments underestimate the impacts of logging on tropical biodiversity? An Amazonian case study using dung beetles. J. Appl. Ecol. 53 , 1098–1105 (2016).

Duvendack, M., Hombrados, J. G., Palmer-Jones, R. & Waddington, H. Assessing ‘what works’ in international development: meta-analysis for sophisticated dummies. J. Dev. Effectiveness 4 , 456–471 (2012).

Sutherland, W. J. et al. Building a tool to overcome barriers in research-implementation spaces: The Conservation Evidence database. Biol. Conserv. 238 , 108199 (2019).

Gusenbauer, M. & Haddaway, N. R. Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res. Synth. Methods 11 , 181–217 (2020).

Konno, K. & Pullin, A. S. Assessing the risk of bias in choice of search sources for environmental meta‐analyses. Res. Synth. Methods 11 , 698–713 (2020).


Butsic, V., Lewis, D. J., Radeloff, V. C., Baumann, M. & Kuemmerle, T. Quasi-experimental methods enable stronger inferences from observational data in ecology. Basic Appl. Ecol. 19 , 1–10 (2017).

Brownstein, N. C., Louis, T. A., O’Hagan, A. & Pendergast, J. The role of expert judgment in statistical inference and evidence-based decision-making. Am. Statistician 73 , 56–68 (2019).


Hahn, J., Todd, P. & Klaauw, W. Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica 69 , 201–209 (2001).

Slavin, R. E. Best evidence synthesis: an intelligent alternative to meta-analysis. J. Clin. Epidemiol. 48 , 9–18 (1995).

Slavin, R. E. Best-evidence synthesis: an alternative to meta-analytic and traditional reviews. Educ. Researcher 15 , 5–11 (1986).

Shea, B. J. et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ (Online) 358 , 1–8 (2017).


Sterne, J. A. C. et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ 355 , i4919 (2016).

Guyatt, G. et al. GRADE guidelines: 11. Making an overall rating of confidence in effect estimates for a single outcome and for all outcomes. J. Clin. Epidemiol. 66 , 151–157 (2013).

Davies, G. M. & Gray, A. Don’t let spurious accusations of pseudoreplication limit our ability to learn from natural experiments (and other messy kinds of ecological monitoring). Ecol. Evolution 5 , 5295–5304 (2015).

Lortie, C. J., Stewart, G., Rothstein, H. & Lau, J. How to critically read ecological meta-analyses. Res. Synth. Methods 6 , 124–133 (2015).

Gutzat, F. & Dormann, C. F. Exploration of concerns about the evidence-based guideline approach in conservation management: hints from medical practice. Environ. Manag. 66 , 435–449 (2020).

Greenhalgh, T. Will COVID-19 be evidence-based medicine’s nemesis? PLOS Med. 17 , e1003266 (2020).


Barlow, J. et al. The future of hyperdiverse tropical ecosystems. Nature 559 , 517–526 (2018).

Gurevitch, J. & Hedges, L. V. Statistical issues in ecological meta‐analyses. Ecology 80 , 1142–1149 (1999).

Stone, J. C., Glass, K., Munn, Z., Tugwell, P. & Doi, S. A. R. Comparison of bias adjustment methods in meta-analysis suggests that quality effects modeling may have less limitations than other approaches. J. Clin. Epidemiol. 117 , 36–45 (2020).

Rhodes, K. M. et al. Adjusting trial results for biases in meta-analysis: combining data-based evidence on bias with detailed trial assessment. J. R. Stat. Soc.: Ser. A (Stat. Soc.) 183 , 193–209 (2020).


Efthimiou, O. et al. Combining randomized and non-randomized evidence in network meta-analysis. Stat. Med. 36 , 1210–1226 (2017).


Welton, N. J., Ades, A. E., Carlin, J. B., Altman, D. G. & Sterne, J. A. C. Models for potentially biased evidence in meta-analysis using empirically based priors. J. R. Stat. Soc. Ser. A (Stat. Soc.) 172 , 119–136 (2009).

Turner, R. M., Spiegelhalter, D. J., Smith, G. C. S. & Thompson, S. G. Bias modelling in evidence synthesis. J. R. Stat. Soc.: Ser. A (Stat. Soc.) 172 , 21–47 (2009).

Shackelford, G. E. et al. Dynamic meta-analysis: a method of using global evidence for local decision making. bioRxiv 2020.05.18.078840, https://doi.org/10.1101/2020.05.18.078840 (2020).

Sutherland, W. J., Pullin, A. S., Dolman, P. M. & Knight, T. M. The need for evidence-based conservation. Trends Ecol. Evol. 19 , 305–308 (2004).

Ioannidis, J. P. A. Meta-research: Why research on research matters. PLOS Biol. 16 , e2005468 (2018).


LaLonde, R. J. Evaluating the econometric evaluations of training programs with experimental data. Am. Econ. Rev. 76 , 604–620 (1986).

Long, Q., Little, R. J. & Lin, X. Causal inference in hybrid intervention trials involving treatment choice. J. Am. Stat. Assoc. 103 , 474–484 (2008).


Thomson Reuters. ISI Web of Knowledge. http://www.isiwebofknowledge.com (2019).

Stroup, W. W. Generalized linear mixed models: modern concepts, methods and applications . (CRC press, 2012).

Bolker, B. M. et al. Generalized linear mixed models: a practical guide for ecology and evolution. Trends Ecol. Evol. 24 , 127–135 (2009).

R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing (2019).

Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67 , 1–48 (2015).

Venables, W. N. & Ripley, B. D. Modern Applied Statistics with S . (Springer, 2002).

Stan Development Team. RStan: the R interface to Stan. R package version 2.19.3 (2020).


Acknowledgements

We are grateful to the following people and organisations for contributing datasets to this analysis: P. Edwards, G.R. Hodgson, H. Welsh, J.V. Vieira, authors of van Deurs et al. 2012, T. M. Grome, M. Kaspersen, H. Jensen, C. Stenberg, T. K. Sørensen, J. Støttrup, T. Warnar, H. Mosegaard, Axel Schwerk, Alberto Velando, Dolores River Restoration Partnership, J.S. Pinilla, A. Page, M. Dasey, D. Maguire, J. Barlow, J. Louzada, Jari Florestal, R.T. Buxton, C.R. Schacter, J. Seoane, M.G. Conners, K. Nickel, G. Marakovich, A. Wright, G. Soprone, CSIRO, A. Elosegi, L. García-Arberas, J. Díez, A. Rallo, Parks and Wildlife Finland, Parc Marin de la Côte Bleue. Author funding sources: T.A. was supported by the Grantham Foundation for the Protection of the Environment, Kenneth Miller Trust and Australian Research Council Future Fellowship (FT180100354); W.J.S. and P.A.M. were supported by Arcadia, MAVA, and The David and Claudia Harding Foundation; A.P.C. was supported by the Natural Environment Research Council via Cambridge Earth System Science NERC DTP (NE/L002507/1); D.A. was funded by Portugal national funds through the FCT – Foundation for Science and Technology, under the Transitional Standard – DL57 / 2016 and through the strategic project UIDB/04326/2020; M.A. acknowledges Koniambo Nickel SAS, and particularly Gregory Marakovich and Andy Wright; J.C.A. was funded through by Dirección General de Investigación Científica, projects PB97-1252, BOS2002-01543, CGL2005-04893/BOS, CGL2008-02567 and Comunidad de Madrid, as well as by contract HENARSA-CSIC 2003469-CSIC19637; A.A. was funded by Spanish Government: MEC (CGL2007-65176); B.P.B. was funded through the U.S. Geological Survey and the New York City Department of Environmental Protection; R.B. was funded by Comunidad de Madrid (2018-T1/AMB-10374); J.A.S. and D.A.B. were funded through the U.S. Geological Survey and NextEra Energy; R.S.C. was funded by the Portuguese Foundation for Science and Technology (FCT) grant SFRH/BD/78813/2011 and strategic project UID/MAR/04292/2013; A.D.B. was funded through the Belgian offshore wind monitoring program (WINMON-BE), financed by the Belgian offshore wind energy sector via RBINS—OD Nature; M.K.D. was funded by the Harold L. Castle Foundation; P.M.E. was funded by the Clackamas County Water Environment Services River Health Stewardship Program and the Portland State University Student Watershed Research Project; T.D.E., J.P.A.G. and A.P. were supported by funding from the New Zealand Department of Conservation (Te Papa Atawhai) and from the Centre for Marine Environmental & Economic Research, Victoria University of Wellington, New Zealand; F.M.F. was funded by CNPq-CAPES grants (PELD site 23 403811/2012-0, PELD-RAS 441659/2016-0, BEX5528/13-5 and 383744/2015-6) and BNP Paribas Foundation (Climate & Biodiversity Initiative, BIOCLIMATE project); B.P.H. was funded by NOAA-NMFS sea scallop research set-aside program awards NA16FM1031, NA06FM1001, NA16FM2416, and NA04NMF4720332; A.L.B. was funded by the Portuguese Foundation for Science and Technology (FCT) grant FCT PD/BD/52597/2014, Bat Conservation International student research fellowship and CNPq grant 160049/2013-0; L.C.M. acknowledges Secretaría de Ciencia y Técnica (UNRC); R.A.M. acknowledges Alaska Fisheries Science Center, NOAA Fisheries, and U.S. Department of Commerce for salary support; C.F.J.M. was funded by the Portuguese Foundation for Science and Technology (FCT) grant SFRH/BD/80488/2011; R.R. 
was funded by the Portuguese Foundation for Science and Technology (FCT) grant PTDC/BIA-BIC/111184/2009, by Madeira’s Regional Agency for the Development of Research, Technology and Innovation (ARDITI) grant M1420-09-5369-FSE-000002 and by a Bat Conservation International student research fellowship; J.C. and S.S. were funded by the Alabama Department of Conservation and Natural Resources; A.T. was funded by the Spanish Ministry of Education with a Formacion de Profesorado Universitario (FPU) grant AP2008-00577 and Dirección General de Investigación Científica, project CGL2008-02567; C.W. was funded by Strategic Science Investment Funding of the Ministry of Business, Innovation and Employment, New Zealand; J.S.K. acknowledges Boreal Peatland LIFE (LIFE08 NAT/FIN/000596), Parks and Wildlife Finland and Kone Foundation; J.J.S.S. was funded by the Mexican National Council on Science and Technology (CONACYT 242558); N.N. was funded by The Carl Tryggers Foundation; I.L.J. was funded by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada; D.D. and D.S. were funded by the French National Research Agency via the “Investment for the Future” program IDEALG (ANR-10-BTBR-04) and by the ALGMARBIO project; R.C.P. was funded by CSIRO and whose research was also supported by funds from the Great Barrier Reef Marine Park Authority, the Fisheries Research and Development Corporation, the Australian Fisheries Management Authority, and Queensland Department of Primary Industries (QDPI). Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. The scientific results and conclusions, as well as any views or opinions expressed herein, are those of the author(s) and do not necessarily reflect those of NOAA or the Department of Commerce.

Author information

Authors and affiliations.

Conservation Science Group, Department of Zoology, University of Cambridge, The David Attenborough Building, Downing Street, Cambridge, CB3 3QZ, UK

Alec P. Christie, Philip A. Martin & William J. Sutherland

Centre of Marine Sciences (CCMar), Universidade do Algarve, Campus de Gambelas, 8005-139, Faro, Portugal

David Abecasis

Institut de Recherche pour le Développement (IRD), UMR 9220 ENTROPIE & Laboratoire d’Excellence CORAIL, Université de Perpignan Via Domitia, 52 avenue Paul Alduy, 66860, Perpignan, France

Mehdi Adjeroud

Museo Nacional de Ciencias Naturales, CSIC, Madrid, Spain

Juan C. Alonso & Carlos Palacín

School of Biological Sciences, University of Queensland, Brisbane, 4072, QLD, Australia

Tatsuya Amano

Education Faculty of Bilbao, University of the Basque Country (UPV/EHU). Sarriena z/g E-48940 Leioa, Basque Country, Spain

Alvaro Anton

U.S. Geological Survey, New York Water Science Center, 425 Jordan Rd., Troy, NY, 12180, USA

Barry P. Baldigo

Universidad Complutense de Madrid, Departamento de Biodiversidad, Ecología y Evolución, Facultad de Ciencias Biológicas, c/ José Antonio Novais, 12, E-28040, Madrid, Spain

Rafael Barrientos & Carlos A. Martín

Durrell Institute of Conservation and Ecology (DICE), School of Anthropology and Conservation, University of Kent, Canterbury, CT2 7NR, UK

Jake E. Bicknell

U.S. Geological Survey, Northern Prairie Wildlife Research Center, Jamestown, ND, 58401, USA

Deborah A. Buhl & Jill A. Shaffer

Northern Gulf Institute, Mississippi State University, 1021 Balch Blvd, John C. Stennis Space Center, Mississippi, 39529, USA

Just Cebrian

MARE – Marine and Environmental Sciences Centre, Dept. Life Sciences, University of Coimbra, Coimbra, Portugal

Ricardo S. Ceia

CFE – Centre for Functional Ecology, Dept. Life Sciences, University of Coimbra, Coimbra, Portugal

Departamento de Ciencias Naturales, Universidad Nacional de Río Cuarto (UNRC), Córdoba, Argentina

Luciana Cibils-Martina

CONICET, Buenos Aires, Argentina

Marine Institute, Rinville, Oranmore, Galway, Ireland

Sarah Clarke & Oliver Tully

National Center for Scientific Research, PSL Université Paris, CRIOBE, USR 3278 CNRS-EPHE-UPVD, Maison des Océans, 195 rue Saint-Jacques, 75005, Paris, France

Joachim Claudet

School of Biological Sciences, University of Western Australia, Nedlands, WA, 6009, Australia

Michael D. Craig

School of Environmental and Conservation Sciences, Murdoch University, Murdoch, WA, 6150, Australia

Sorbonne Université, CNRS, UMR 7144, Station Biologique, F.29680, Roscoff, France

Dominique Davoult & Doriane Stagnol

Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Ankerstraat 1, 8400, Ostend, Belgium

Annelies De Backer

Marine Science Institute, University of California Santa Barbara, Santa Barbara, CA, 93106, USA

Mary K. Donovan

Hawaii Institute of Marine Biology, University of Hawaii at Manoa, Honolulu, HI, 96822, USA

Baruch Institute for Marine & Coastal Sciences, University of South Carolina, Columbia, SC, USA

Tyler D. Eddy

Centre for Fisheries Ecosystems Research, Fisheries & Marine Institute, Memorial University of Newfoundland, St. John’s, Canada

School of Biological Sciences, Victoria University of Wellington, P O Box 600, Wellington, 6140, New Zealand

Tyler D. Eddy, Jonathan P. A. Gardner & Anjali Pande

Lancaster Environment Centre, Lancaster University, LA1 4YQ, Lancaster, UK

Filipe M. França

Fisheries, Aquatic Science and Technology Laboratory, Alaska Pacific University, 4101 University Dr., Anchorage, AK, 99508, USA

Bradley P. Harris

Natural Resources Institute Finland, Manamansalontie 90, 88300, Paltamo, Finland

Department of Biology, Memorial University, St. John’s, NL, A1B 2R3, Canada

Ian L. Jones

National Marine Science Centre and Marine Ecology Research Centre, Southern Cross University, 2 Bay Drive, Coffs Harbour, 2450, Australia

Brendan P. Kelaher

Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland

Janne S. Kotiaho

School of Resource Wisdom, University of Jyväskylä, Jyväskylä, Finland

Centre for Ecology, Evolution and Environmental Changes – cE3c, Faculty of Sciences, University of Lisbon, 1749-016, Lisbon, Portugal

Adrià López-Baucells, Christoph F. J. Meyer & Ricardo Rocha

Biological Dynamics of Forest Fragments Project, National Institute for Amazonian Research and Smithsonian Tropical Research Institute, 69011-970, Manaus, Brazil

Granollers Museum of Natural History, Granollers, Spain

Adrià López-Baucells

Department of Biological Sciences, University of New Brunswick, PO Box 5050, Saint John, NB, E2L 4L5, Canada

Heather L. Major

Voimalohi Oy, Voimatie 23, Voimatie, 91100, Ii, Finland

Aki Mäki-Petäys

Natural Resources Institute Finland, Paavo Havaksen tie 3, 90014 University of Oulu, Oulu, Finland

Fundación Migres CIMA Ctra, Cádiz, Spain

Beatriz Martín

Intergovernmental Oceanographic Commission of UNESCO, Marine Policy and Regional Coordination Section Paris 07, Paris, France

BioRISC, St. Catharine’s College, Cambridge, CB2 1RL, UK

Philip A. Martin & William J. Sutherland

Departamento de Ecología e Hidrología, Universidad de Murcia, Campus de Espinardo, 30100, Murcia, Spain

Daniel Mateos-Molina

RACE Division, Alaska Fisheries Science Center, National Marine Fisheries Service, NOAA, 7600 Sand Point Way NE, Seattle, WA, 98115, USA

Robert A. McConnaughey

European Commission, Joint Research Centre (JRC), Ispra, VA, Italy

Michele Meroni

School of Science, Engineering and Environment, University of Salford, Salford, M5 4WT, UK

Christoph F. J. Meyer

Victorian National Park Association, Carlton, VIC, Australia

Department of Earth, Environment and Life Sciences (DiSTAV), University of Genoa, Corso Europa 26, 16132, Genoa, Italy

Monica Montefalcone

Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden

Norbertas Noreika

Chair of Plant Health, Institute of Agricultural and Environmental Sciences, Estonian University of Life Sciences, Tartu, Estonia

Biosecurity New Zealand – Tiakitanga Pūtaiao Aotearoa, Ministry for Primary Industries – Manatū Ahu Matua, 66 Ward St, PO Box 40742, Wallaceville, New Zealand

Anjali Pande

National Institute of Water & Atmospheric Research Ltd (NIWA), 301 Evans Bay Parade, Greta Point Wellington, New Zealand

CSIRO Oceans & Atmosphere, Queensland Biosciences Precinct, 306 Carmody Road, ST. LUCIA QLD, 4067, Australia

C. Roland Pitcher

Museo Nacional de Ciencias Naturales, CSIC, José Gutiérrez Abascal 2, E-28006, Madrid, Spain

Carlos Ponce

Fort Keogh Livestock and Range Research Laboratory, 243 Fort Keogh Rd, Miles City, Montana, 59301, USA

Matt Rinella

CIBIO-InBIO, Research Centre in Biodiversity and Genetic Resources, University of Porto, Vairão, Portugal

Ricardo Rocha

Departamento de Sistemas Físicos, Químicos y Naturales, Universidad Pablo de Olavide, ES-41013, Sevilla, Spain

María C. Ruiz-Delgado

El Colegio de la Frontera Sur, A.P. 424, 77000, Chetumal, QR, Mexico

Juan J. Schmitter-Soto

Division of Fish and Wildlife, New York State Department of Environmental Conservation, 625 Broadway, Albany, NY, 12233-4756, USA

Shailesh Sharma

University of Denver Department of Biological Sciences, Denver, CO, USA

Anna A. Sher

U.S. Geological Survey, Fort Collins Science Center, Fort Collins, CO, 80526, USA

Thomas R. Stanley

School for Marine Science and Technology, University of Massachusetts Dartmouth, New Bedford, MA, USA

Kevin D. E. Stokesbury

Georges Lemaître Earth and Climate Research Centre, Earth and Life Institute, Université Catholique de Louvain, 1348, Louvain-la-Neuve, Belgium

Aurora Torres

Center for Systems Integration and Sustainability, Department of Fisheries and Wildlife, 13 Michigan State University, East Lansing, MI, 48823, USA

Natural Resources Institute Finland, Latokartanonkaari 9, 00790, Helsinki, Finland

Teppo Vehanen

Manaaki Whenua – Landcare Research, Private Bag 3127, Hamilton, 3216, New Zealand

Corinne Watts

Statistical Laboratory, Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Wilberforce Road, Cambridge, CB3 0WB, UK

Qingyuan Zhao


Contributions

A.P.C., T.A., P.A.M., Q.Z., and W.J.S. designed the research; A.P.C. wrote the paper; D.A., M.A., J.C.A., A.A., B.P.B, R.B., J.B., D.A.B., J.C., R.S.C., L.C.M., S.C., J.C., M.D.C, D.D., A.D.B., M.K.D., T.D.E., P.M.E., F.M.F., J.P.A.G., B.P.H., A.H., I.L.J., B.P.K., J.S.K., A.L.B., H.L.M., A.M., B.M., C.A.M., D.M., R.A.M, M.M., C.F.J.M.,K.M., M.M., N.N., C.P., A.P., C.R.P., C.P., M.R., R.R., M.C.R., J.J.S.S., J.A.S., S.S., A.A.S., D.S., K.D.E.S., T.R.S., A.T., O.T., T.V., C.W. contributed datasets for analyses. All authors reviewed, edited, and approved the manuscript.

Corresponding author

Correspondence to Alec P. Christie.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Peer review information Nature Communications thanks Casper Albers, Samuel Scheiner, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information, Peer Review File, Description of Additional Supplementary Information, Supplementary Data 1, Supplementary Data 2, and Source Data accompany this article.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Christie, A.P., Abecasis, D., Adjeroud, M. et al. Quantifying and addressing the prevalence and bias of study designs in the environmental and social sciences. Nat Commun 11 , 6377 (2020). https://doi.org/10.1038/s41467-020-20142-y


Received : 29 January 2020

Accepted : 13 November 2020

Published : 11 December 2020

DOI : https://doi.org/10.1038/s41467-020-20142-y




PLOS ONE

Can emotional intelligence be improved? A randomized experimental study of a business-oriented EI training program for senior managers

Raquel Gilar-Corbi, Teresa Pozo-Rico, Bárbara Sánchez, Juan-Luís Castejón


Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected] (RGC); [email protected] (TPR)

Received 2018 Sep 11; Accepted 2019 Oct 9; Collection date 2019.

This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Purpose: This article presents the results of a training program in emotional intelligence.

Design/methodology/approach: Emotional intelligence (EI) involves two important competencies: (1) the ability to recognize feelings and emotions in oneself and others, and (2) the ability to use that information to resolve conflicts and problems to improve interactions with others. We provided a 30-hour Training Course on Emotional Intelligence (TCEI) for 54 senior managers of a private company. A pretest-posttest design with a control group was adopted.

Findings: EI assessed using mixed and ability-based measures can be improved after training.

Originality/value: The study’s results revealed that EI can be improved within business environments. Results and implications of including EI training in professional development plans for private organizations are discussed.

Introduction

This research study focused on EI training in business environments. Accordingly, the aim of the study was to examine the effectiveness of an original EI training program in improving the EI of senior managers. In this article, we delineate the principles and methodology of an EI training program that was conducted to improve the EI of senior managers of a private company. The article begins with a brief introduction to the main models of EI described in the existing scientific literature. This is followed by a description of the EI training program that was conducted in the present study and a presentation of results regarding its effectiveness in improving EI. Finally, the present findings are discussed in relation to the existing empirical literature, and the limitations and conclusions of the present study are articulated.

Defining EI

Various models of emotional intelligence (EI) have been proposed. The existing scientific literature offers three main models of EI: mixed, ability, and trait models. First, mixed models conceptualize EI as a combination of emotional skills and personality dimensions such as assertiveness and optimism [1, 2]. Thus, according to the Bar-On model [3], emotional-social intelligence (ESI) is a multifactorial set of competencies, skills, and facilitators that determine how people express and understand themselves, understand and relate to others, and respond to daily situations. The construct of ESI consists of 10 key components (i.e., self-regard, interpersonal relationships, impulse control, problem solving, emotional self-awareness, flexibility, reality-testing, stress tolerance, assertiveness, and empathy) and five facilitators (optimism, self-actualization, happiness, independence, and social responsibility). Emotionally and socially intelligent people accept and understand their emotions; they are also capable of expressing themselves assertively, being empathetic, cooperating with and relating to others in an appropriate manner, managing stressful situations and changes successfully, solving personal and interpersonal problems effectively, and having an optimistic perspective toward life.

Second, ability models of EI focus on the processing of emotional information and related abilities [3]. Accordingly, Mayer and Salovey [4] have conceptualized EI as a type of social intelligence that entails the ability to manage and understand one’s own and others’ emotions. Indeed, this implies that EI also entails the ability to use emotional information to manage thoughts and actions in an adaptive manner [5].

Third, the trait EI approach understands EI as a set of emotion-related self-perceptions and dispositions [6]. According to trait models, EI refers to self-perceptions and dispositions that can be incorporated into fundamental taxonomies of personality. Therefore, according to Petrides and Furnham [7], trait EI is partially determined by several dimensions of personality and can be situated within the lower levels of personality hierarchies. However, it is a distinct construct that can be differentiated from other personality constructs. In addition, the construct of trait EI includes various personality dispositions as well as the self-perceived aspects of social intelligence, personal intelligence, and ability EI. The following facets are subsumed by the construct of trait EI: adaptability; assertiveness; emotion perception (self and others), expression, management (others), and regulation; impulsiveness (low); relationships; self-esteem; self-motivation; social awareness; stress management; trait empathy; happiness; and optimism [7].

Finally, as Hodzic et al. [8] have indicated, most existing definitions of EI permit the conclusion that EI is a measurable individual characteristic that refers to a way of experiencing and processing emotions and emotional information. It is noteworthy that these models are not mutually exclusive [7].

Effects of EI on different outcomes

EI has been found to be related to workplace performance in highly demanding work environments (see, e.g., [9]). Consequently, companies, entities, and organizations tend to recognize the importance of EI, promote it on a daily basis to facilitate career growth, and recruit those who possess this ability [10].

With regard to research that has examined the EI-performance link, Van Rooy and Viswesvaran [11] conducted a meta-analytic study to examine the predictive power of EI in the workplace. They found that approximately 5% of the variance in workplace performance was explained by EI, a percentage that is large enough to generate savings and promote improvements within organizations. In addition, the authors concluded that further in-depth investigations are needed to comprehensively understand the construct of EI.

However, the EI-performance link must be interpreted with caution. Specifically, Joseph and Newman [12] examined emotional competence in the workplace and found that EI predicts performance among those with high emotional labor jobs but not among their counterparts with low emotional labor jobs. In addition, they indicated that further research is required to delineate the relationship between EI and actual job performance, gender and race differences in EI, and the utility of different types of EI measures that are based on ability or mixed models in training and selection. Accordingly, Pérez-González and Qualter [13] have underscored the need for emotional education. Further, Brasseur et al. [14] found that better job performance is related to EI, especially among those with jobs for which interpersonal contact is very important.

It is noteworthy that EI is positively related to job satisfaction. Accordingly, Chiva and Alegre [ 15 ] found that there was an indirect positive relationship between self-reported EI (i.e., as per mixed models) and job satisfaction. A total of 157 workers from several companies participated in this study. These findings suggest that people with higher levels of EI are more satisfied with their jobs and demonstrate a greater capacity for learning than their counterparts with lower levels of EI.

Similarly, Sener, Demirel, and Sarlak [16] adopted a mixed model of EI and examined its effect on job satisfaction in a sample of 80 workers. They found that individuals with strong emotional and social competencies demonstrated greater self-control and were able to manage and understand their own and others’ emotions in an intelligent and adaptive manner in their personal and professional lives.

In addition, EI (i.e., as per mixed models) predicts job success because it influences one’s ability to deal with environmental demands and pressures [ 17 ]. Therefore, it has been contended that several components of EI (i.e., as per mixed models) contribute to success and productivity in the workplace [ 18 ]; future research studies should extend this line of inquiry. Several studies have shown that people with high levels of ability EI communicate in an interesting and assertive manner, which in turn makes others feel more comfortable in the workplace [ 19 ]. In addition, it has been contended that EI (i.e., as per mixed models) plays a valuable role in group development because effective teamwork occurs when team members possess knowledge about the strengths and weaknesses of others and the ability to use these strengths when necessary [ 15 , 20 ]. It is especially important for senior managers to demonstrate high levels of EI because they play a predominant role in team management, leadership, and organizational development.

Finally, studies that have examined the relationship between EI and wellbeing have found that ability EI is a predictor of professional success, wellbeing, and socially relevant outcomes [21–23]. Extending this line of inquiry, Slaski and Cartwright [24] investigated the relationship between EI and the quality of working life among middle managers and found that higher levels of EI are related to better performance, health, and wellbeing.

EI and leadership

The actions of organizational leaders play a crucial role in modulating the emotional experiences of employees [25]. Accordingly, Thiel, Connelly, and Griffith [26] found that, within the workplace, emotions affect critical cognitive tasks, including information processing and decision making. In addition, the authors have contended that leadership plays a key role in helping subordinates manage their emotions. In another study, Batool [27] found that the EI of leaders has a positive impact on the stress management, motivation, and productivity of employees.

Gardner and Stough [ 28 ] further investigated the relationship between leadership and EI among senior managers and found that leaders’ management of positive and negative emotions had a beneficial impact on motivation, optimism, innovation, and problem resolution in the workplace. Therefore, the EI of directors and managers is expected to be positively correlated with employees’ work motivation and achievement.

Additionally, EI competencies are involved in the following activities: choosing organizational objectives, planning and organizing work activities, maintaining cooperative interpersonal relationships, and receiving the support that is necessary to achieve organizational goals [29]. In this regard, some authors have provided compelling theoretical arguments in favor of the existence of a relationship between EI and leadership [30–34]. Indeed, several studies [30–34] show that EI is a core variable that is positively related to effective and transformational leadership, which in turn has desirable effects on job performance and workplace attitudes.

Further, people with high levels of EI are more capable of regulating their emotions to reduce work stress [ 35 ]; thus, it is necessary to emphasize the importance of EI in order to meet the workplace challenges of the 21st century.

In conclusion, EI competencies are considered to be key qualities that individuals who occupy management positions must possess [ 36 ]. Further, EI transcends managerial hierarchies when an organization flourishes [ 37 ]. Finally, emotionally intelligent managers tend to create a positive work environment that improves the job satisfaction of employees [ 38 ].

EI trainings

Past studies have shown that training improves the EI of students [ 22 , 39 , 40 – 44 ], employees [ 45 – 47 ], and managers [ 48 – 52 ]. More specifically, within the academic context, Nelis et al. [ 22 ] found that group-based EI training significantly improved emotion identification and management skills. In another study, Nelis et al. [ 39 ] found that EI training significantly improved emotion regulation and comprehension and general emotional skills. It also had a positive impact on psychological wellbeing, subjective perceptions of health, quality of social relations, and employability. Similarly, several studies that have been conducted within the workplace have shown that EI can be improved through training [ 45 – 52 ] and have underscored the key role that it plays in effective performance [ 53 , 54 ].

In addition, two relevant meta-analyses [8, 55] concluded that there has been an increase in research interest in EI, in recognition of its influence on various aspects of people’s lives, and in the number of interventions that aim to improve EI. Relatedly, Kotsou et al. [55] and Hodzic et al. [8] reviewed the findings of past studies that have examined the effects of EI training to explore whether such training programs do indeed improve EI.

First, Hodzic et al. [ 8 ] concluded that EI training has a moderate effect on EI and that interventions that are based on ability models of EI have the largest effects. In addition, the improvements that had resulted from these interventions were found to have been temporally sustained.

Second, Kotsou et al.'s [55] systematic review of the literature on the effectiveness of EI training concluded that more rigorous, controlled studies are needed before firm conclusions can be drawn about whether training improves ability EI. Studies that adopted mixed models of EI more consistently found that training improves EI. The review also indicated that EI training enhances teamwork, conflict management, employability, job satisfaction, and work performance.

Finally, it is necessary to identify and address the limitations of past interventions in future studies to improve their quality and effectiveness.

Purpose of the study

In Kotsou et al.'s [55] systematic review of published interventions to improve EI in adults, only one of the five studies conducted with managers involved middle managers; that study lacked randomization, used an inactive control group, took no measures immediately after the training, and performed a single evaluation six months later. Of the other four studies in the review, only one used a control group (an inactive one), one employed randomization, and two performed follow-up measures six months after the intervention.

The two meta-analyses confirmed and identified several gaps that we have tried to overcome in the present study. For this reason, we propose to deepen the assessment of EI training for senior managers, aiming to overcome most of the limitations noted by Kotsou et al. [55] and Hodzic et al. [8] by implementing the following: 1) including a control group (waiting list group); 2) conducting follow-up measurements (12 months later); 3) employing an experimental design; 4) including a workshop approach with group discussions and interactive participation; 5) identifying specific individual differences (i.e., age, gender) that might determine the effects of the intervention; and 6) using both self-report and ability measures. Accordingly, two different ways of evaluating EI were selected in this study to assess the emotional competencies applied within the labor and business world to solve practical problems: the EQ-i questionnaire [2], based on mixed models, which provides a self-perceived index of EI, and the Situational Test of Emotion Management (STEM) and the Situational Test of Emotional Understanding (STEU) [56], based on the ability model. By including two different EI measures, we aim to obtain a more reliable validation of the intervention.

Therefore, the objective of our study was to investigate whether EI can be improved among employees who occupy senior management positions in a private company. Thus, the research hypothesis was that participation in the designed program would improve EI among senior managers.

EI training development

The Course on Emotional Intelligence (TCEI) was created to provide senior managers with emotional knowledge and practical emotional skills so that they can transfer their new understanding to teamwork and find solutions to real company problems and challenges. In this way, the TCEI prepares workers to use the emotional learning resources appropriate to each work situation. The TCEI combines face-to-face work sessions with complementary training delivered through an e-learning platform. For more details, see S1 Appendix.

According to Mikolajczak [57], three interrelated levels of emotional intelligence can be differentiated: a) conceptual-declarative emotion knowledge, b) emotion-related abilities, and c) emotion-related dispositions. The TCEI aims to develop emotional skills, which sit at the second level of Mikolajczak's model. The present study uses both mixed-model and ability-model measures to assess EI, which makes it possible to assess this second level. Pérez-González and Qualter [13] likewise suggest that activities related to ability EI should be included in emotional education programs.

Thus, this EI program was designed to allow senior managers to use their understanding and management of emotions as a strategy for facing the challenges of their work environment and managing their workgroups. Following the recommendation of Pérez-González and Qualter, the training intervention methodology is founded on the DAPHnE key practices [13]. It is important to emphasize that this training is grounded in practicality: it is based on the resolution of real cases, uses participative teaching-learning techniques and cooperative learning, and promotes the transfer of all aspects of EI to situations that can occur in the workplace. The e-learning system on the Moodle platform also provides added value, since it creates an environment of exposure to professional experiences and continuous training. This pedagogical approach, based on skills training and mediated through e-learning, emerged in the 1990s when business organizations sought to create environments better suited to managing large groups of employees. After its success, it began to be used in other contexts, including higher education and organizational development [58-60].

Finally, in order to justify the chosen training, it is important to note that the following official competencies for senior managers have been designated by the company:

Supervise the staff and guarantee optimum employee performance by fostering a motivational working environment where employees receive the appropriate support and respect and their initiatives are given the consideration they deserve.

Make decisions and promote clear goals, efficient leadership, competitive compensation, and acknowledgment of the employees’ achievements.

Justify their decisions to executives and directors, explaining how they have ensured training by creating opportunities for appropriate professional development for all employees and how they have facilitated conditions for a better balance in achieving the company’s objectives.

In conclusion, considering the above-mentioned professional competencies required, senior managers were selected as participants in this study since they need to possess and apply aspects related to EI in order to accomplish their leadership and staff management responsibilities.

Participants

The participating company was an international firm with almost 175 years of history that holds a leading position in the natural gas value chain, from the source of supply to market, including supply, liquefaction, shipping, regasification, and distribution. The company is present in over 30 countries around the world.

This study involved a sample of 54 senior managers from a company in a European country. The sample was drawn from the entire population of senior managers within the company using a stratified random sampling procedure that took gender into account, so that 50% of each gender was selected.

The mean age of participants was 37.61 years (standard deviation = 8.55) and the percentage of female senior managers was 50%. For evaluation purposes, these employees were randomly divided into two groups: the experimental group ( n = 26; mean age = 35.57 (7.54); 50% women) and the control group ( n = 28; mean age = 39.50 (9.11); 50% women). The control group received EI training after the last data collection.

Initially, a group of senior managers from the company was selected to participate in the study, as these employees require a particular command of EI given the competencies assigned to their professional category. In all cases, informed consent was requested for their participation in the study.

Assignment of participants to the experimental or control condition was performed using a random-number program. In addition, to avoid the Hawthorne effect, participants were not told whether they had been assigned to the experimental or control group; only their consent to participate in research on the development of EI was requested. Participants in the control group completed the same evaluations as the training group but were not exposed to the training.
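To make the allocation procedure concrete, the sketch below shows one way a gender-stratified random split could be implemented in Python. It is an illustrative reconstruction, not the study's actual random-number program; the function and field names are invented.

import random

def assign_groups(participants, seed=42):
    """Randomly split participants into experimental and control groups,
    stratified by gender so that each group preserves the gender balance."""
    rng = random.Random(seed)
    groups = {"experimental": [], "control": []}
    for gender in {p["gender"] for p in participants}:
        stratum = [p for p in participants if p["gender"] == gender]
        rng.shuffle(stratum)                # randomize order within the stratum
        half = len(stratum) // 2
        groups["experimental"].extend(stratum[:half])
        groups["control"].extend(stratum[half:])
    return groups

# Toy roster of 54 managers, half women and half men
sample = [{"id": i, "gender": "F" if i % 2 else "M"} for i in range(54)]
allocation = assign_groups(sample)
print(len(allocation["experimental"]), len(allocation["control"]))  # 26 28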

The scales were administered during the pretest phase (Time 1) on an online platform for the experimental and control groups. On average, approximately 90 minutes were needed to complete the tests.

After the data were collected in the pre-test phase, only the experimental group participated in the TCEI over seven weeks, and they received a diploma.

Later, the scales were administered during the posttest phase (Time 2), and the same data were collected one year later (Time 3). A one-year lapse was chosen because all training programs carried out in this company are re-evaluated one year later to determine whether improvements in employees' skills have been maintained, reflecting a clear commitment to monitoring results. Other studies have used similar reevaluations: Nelis et al. [22, 39] evaluated whether trait EI could be improved and whether the changes persisted by performing three assessments, prior to the intervention, at its end, and six months later. As Kirkpatrick [61] recommends, research on the effectiveness of training should include a long-term assessment of skills transfer.

Finally, it is important to note that all participants were properly informed about the investigation and gave their written consent. The study was approved by the University of Alicante Ethics Committee (UA-2015-07-06), and all methods were performed in accordance with the relevant guidelines and regulations.

As mentioned before, two different ways of defining and evaluating EI were selected for this study: (1) EQ-i, based on mixed models, and (2) the STEM/STEU questionnaires, based on the ability model of EI.

The Emotional Quotient Inventory [ 2 ]

To measure EI based on the mixed models, the short version of the EQ-i was used, which comprises 51 self-referencing statements and requires subjects to rate the extent to which they agree or disagree with each statement on a five-point Likert scale (1 = strongly disagree; 5 = strongly agree). An example item is the following: "In handling situations that arise, I try to think of as many approaches as I can." The EQ-i comprises five factors (Intrapersonal EI and Self-Perception, Interpersonal EI, Adaptability and Decision Making, General Mood and Self-Expression, and Stress Management) plus a Total EQ-i score, which serves as a global EI measure. The author of this instrument reports Cronbach's alphas ranging from .69 to .86 for the five subscales [2, 62]; in the present sample of senior managers, the Cronbach's alpha of the Emotional Quotient Inventory was .80.
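Because internal consistency is reported for each instrument, a brief sketch of the Cronbach's alpha computation may be helpful. This is the generic formula, not the authors' code, and the simulated data exist only to make the snippet runnable.

import numpy as np

def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score),
    where items is an (n_respondents x k_items) array of Likert responses."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)          # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)      # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
demo = rng.integers(1, 6, size=(54, 51))  # 54 respondents, 51 EQ-i items
print(round(cronbach_alpha(demo), 2))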

Situational Test of Emotional Understanding (STEU) and Situational Test of Emotion Management (STEM) [ 63 ]

Two tests were used to measure EI based on the ability model. Emotion understanding was evaluated with the short version of the Situational Test of Emotional Understanding (STEU) [63]. This test comprises 25 items, each presenting an emotional situation (decontextualized, workplace-related, or private-life-related). For each item, participants must choose the emotion that the described situation would most likely elicit. The Cronbach's alpha of the STEU is .83 [63]; in the present sample of senior managers it was .86. An example item is the following: "An unwanted situation becomes less likely or stops altogether. The person involved is most likely to feel: (a) regret, (b) hope, (c) joy, (d) sadness, (e) relief" (in this case, the correct answer is "relief").

On the other hand, emotion management was evaluated with the short version of the Situational Test of Emotion Management (STEM) [63], a 20-item situational judgment test (SJT) that uses hypothetical behavioral scenarios followed by a set of possible responses. Respondents must choose the option they would most likely select in a "real" situation. The Cronbach's alpha of the STEM is .68 [63]; in the present sample of senior managers it was .84. An example item is the following: "Pete has specific skills that his workmates do not, and he feels that his workload is higher because of it. What action would be the most effective for Pete? (a) Speak to his boss about this; (b) Start looking for a new job; (c) Be very proud of his unique skills; (d) Speak to his workmates about this."
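As a rough sketch of how such situational judgment items are typically scored, the snippet below compares responses against an answer key; the key shown is invented for illustration and is not the actual STEU/STEM key.

def sjt_score(responses, key):
    """Return the proportion of items answered with the keyed correct option."""
    correct = sum(r == k for r, k in zip(responses, key))
    return correct / len(key)

key = ["e", "a", "c"]              # hypothetical correct options
responses = ["e", "b", "c"]        # one respondent's choices
print(sjt_score(responses, key))   # 2 of 3 correct -> 0.666...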

TCEI content and organization

The program schedule spanned seven weeks with a face-to-face session of 95 minutes each week, which was delivered by one of the researchers specifically trained for this purpose. All the experimental group participants were taught together in these sessions. The content of each session was the following:

1st Session : Introduction. The objectives and methodology of the training were explained to participants.

2nd Session : Intrapersonal EI and self-perception. Trainees learned to identify their own emotions.

3rd Session : Interpersonal EI. Participants learned to identify others’ emotions.

4th Session : Adaptability and decision making. The objective was to improve trainees’ ability to identify and understand the impact that their own feelings can have on thoughts, decisions, behavior, and work performance resulting in better decisions and workplace adaptability.

5th Session : General mood and self-expression. Trainees worked on expressing their emotions and improving their skills to effectively control their mood.

6th Session : Stress management. Participants learned EI skills to manage stress effectively.

7th Session : Emotional understanding and emotion management. Trainees learned skills to effectively manage their emotions as well as skills that influence the moods and emotions of others.

In addition, access to the virtual environment (Moodle platform) was required after each face-to-face session. The time spent in the platform was registered, with a minimum of five hours required per week.

The virtual environment allowed the researcher to review all the content completed in each face-to-face session.

All of the EI abilities included in the virtual part of the training had previously been used in the face-to-face part; thus, virtual training is simply a method for consolidating EI knowledge. In fact, the virtual environment has the same function as completing a workbook about the information presented during the face-to-face session. The added advantage of working in an e-learning environment is that all of the trainees are connected and can share their tasks and progress with others. At times, in addition to reviewing the contents of the previous session, the e-learning environment also introduces important terms for the next session, following the principles of the well-known flipped-classroom methodology. In short, the following activities were carried out through the Moodle platform to consolidate participants' knowledge:

1st Session: Participants were informed that e-learning would be part of the training in order to consolidate EI knowledge.

2nd Session: Participants explored the skills of Intrapersonal EI and self-perception in the virtual environment through discussion forums.

3rd Session: Participants learned the skills of identifying others’ emotions and utilizing this emotional information for decision-making. This information was summarized in the virtual environment through discussion forums.

4th Session: Participants sharpened their adaptability and decision-making skills by producing innovative ideas and using critical thinking to assess the impact that their own feelings can have on others' work performance. Trainees also learned how to express their own emotions, and how to effectively control their mood, through the resolution of practical cases in the virtual environment; these cases required innovative ideas and critical thinking in order to make better decisions in emotionally impactful situations. In addition, trainees used the forum to reflect on why their own emotional regulation is important for ensuring long-term workplace adaptability.

5th Session: Verbal quiz, discussion, and forum contribution. Trainees participated in an online debate about key emotional skills in order to understand how to apply them in a real work environment. In particular, the debate focused on regulating self-expression and balancing the general mood during difficult situations within the company. In this way, participants identified the skills required to manage stress effectively and maintain a positive mood. A discussion about common stressful situations at work was carried out in the virtual environment, and strategies for regulating mood during critical work situations were shared.

6th Session: Discussion of ideas related to EI. Trainees participated in an online debate about key emotional skills in order to understand how to apply stress management skills in a real work environment. They shared previous work experiences in which stress was a significant challenge and considered emotionally intelligent ways to reduce stress and maintain balance in their lives as senior managers.

7th Session: Participants concluded the training on target strategies to effectively manage their emotions as well as skills that influence the moods and emotions of others. This session, therefore, was a period for feedback where brief answers to specific doubts were provided. In addition, the outcomes of the training were established by the participants. Finally, senior managers were encouraged to stay connected through the Moodle platform in order to resolve future challenges together using the EI skills learned and internalized during the training period.

Data analysis

An experimental pretest-posttest design with a control group was adopted. Under this design, multivariate (MANOVA) and univariate (ANOVA) repeated-measures analyses of variance were performed, in which the dependent variables were treated as within-subjects measures and group served as the between-subjects variable. All statistical analyses were conducted using SPSS statistical software, version 21.0 (IBM, Armonk, USA).
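As an illustration of this analytic design, the sketch below runs a mixed Group x Time ANOVA in Python using the open-source pandas and pingouin libraries instead of SPSS. The data are simulated and the column names are illustrative; the snippet only demonstrates the structure of the analysis, not the study's results.

import numpy as np
import pandas as pd
import pingouin as pg

# Simulate 54 subjects measured at three time points, with a toy
# training effect applied to the experimental group only
rng = np.random.default_rng(1)
rows = []
for subj in range(54):
    group = "experimental" if subj < 26 else "control"
    for t, time in enumerate(["t1", "t2", "t3"]):
        gain = 8 * t if group == "experimental" else 0
        rows.append({"subject": subj, "group": group, "time": time,
                     "score": 100 + gain + rng.normal(0, 5)})
df = pd.DataFrame(rows)

# Mixed ANOVA: 'time' varies within subjects, 'group' between subjects;
# the Interaction row tests the effect of the training over time
aov = pg.mixed_anova(data=df, dv="score", within="time",
                     subject="subject", between="group")
print(aov[["Source", "F", "p-unc", "np2"]])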

First, normality analysis indicated that the sample followed a normal distribution. The results of Box's M test did not show homogeneity of the variance-covariance matrices for the EQ-i Total Scale (M = 59.29; F = 9.26, p < .001) or the STEM/STEU (M = 231.01; F = 36.07, p < .001). However, Hair et al. [64] have stated that when the control and experimental groups are of equal size, as was the case in this study, the effects of violating this assumption tend to be mitigated.

Second, to test whether there was any significant difference between the experimental and control groups at pretest, Student's t-tests were performed on the means of all measured variables (Table 1). Table 1 shows that there were no significant differences at pretest, suggesting that both groups began from comparable baselines.

Table 1. Student’s t-test of differences in means (t1, t2, t3).

Note. t1 = pretest; t2 = posttest; t3 = follow-up.

1 = direct score
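A minimal sketch of this kind of baseline comparison, using simulated pretest scores rather than the study's data, might look as follows:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
pre_experimental = rng.normal(100, 10, size=26)  # toy pretest scores
pre_control = rng.normal(100, 10, size=28)
t, p = stats.ttest_ind(pre_experimental, pre_control)
print(f"t = {t:.2f}, p = {p:.3f}")  # a non-significant p suggests comparable groups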

We therefore concluded that the two groups of workers could not be distinguished by EI level before the TCEI program. In addition, the mean age of each group was analyzed, and no baseline differences were found between the two groups.

To assess the impact of the program on EI, the scores obtained by both groups were compared before its implementation (pretest–Time 1) and shortly after the program was delivered (posttest–Time 2), as well as one year later (follow-up–Time 3). Group membership was the independent factor or variable, and the scores obtained by the subjects regarding EI were the criteria or dependent variables.

Two control variables, gender and age, were included in the analysis because they could affect the results. However, neither showed a statistically significant effect on any of the variables assessed (p ≥ .50 in all cases).

Regarding the implementation of the program, Table 2 presents the test results for intra-subject effects, which showed significant Group x Time interaction for all variables except for Adaptability.

Table 2. Summary of intra- and inter-subject univariate ANOVA.

The observed power was highest in the key scales: 1.00 for the STEU/STEM and Total EQ-i. Regarding the subscales, the observed power was also 1.00 for the Intrapersonal, Stress Management, and General Mood subscales; on the other hand, the observed power for the Interpersonal and the Adaptability subscales was .66 and .55, respectively.

Similarly, the effect size (η²), the proportion of total variability attributable to the Group x Time interaction, was high for the key scales: ≥ .71 for the STEU/STEM and .82 for the Total EQ-i. For the subscales, η² was .44 for Intrapersonal, .07 for Interpersonal, .32 for Stress Management, .05 for Adaptability, and .26 for General Mood.
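For reference, partial eta squared is obtained from the ANOVA sums of squares as SS_effect / (SS_effect + SS_error); the values below are invented solely to illustrate the computation.

def partial_eta_squared(ss_effect, ss_error):
    """Proportion of variance attributable to an effect, given its sum of
    squares and the corresponding error sum of squares."""
    return ss_effect / (ss_effect + ss_error)

print(round(partial_eta_squared(820.0, 180.0), 2))  # -> 0.82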

To further explain these results, complementary analyses were performed. On the one hand, as shown in Table 1, we compared the means of the experimental and control groups at T2 and T3. The results revealed significant differences between the groups for all variables at both moments, except for the Interpersonal variable, for which the experimental group obtained higher scores at both moments without the differences reaching statistical significance. This could explain the small effect size obtained for this variable.

In addition, the Adaptability variable showed a statistically significant difference between the groups at T2, with the control group scoring higher, whereas at T3 the experimental group obtained higher Adaptability scores, although this difference was not statistically significant. This could explain why the interaction was not significant and why the effect size for this variable was small.

In order to compare differences between moments T1, T2, and T3, the marginal means were analyzed for both groups (experimental and control) per moment and variable ( Table 3 ).

Table 3. Marginal means comparing t1-t2, t1-t3, and t2-t3.

Note. EG = experimental group; CG = control group; t1 = pretest; t2 = posttest; t3 = follow-up.

In general, in the experimental group, there was a significant improvement between moments T1 and T2 in all variables, except Interpersonal and Adaptability, which did not present changes at any of the three moments (T1, T2, T3). On the other hand, scores remained without significant changes regarding all variables between moments T2 and T3, except in the case of STEU and STEM, in which the scores continued to improve between moments T2 and T3.

In the control group, the results were the same as in the experimental group concerning the Interpersonal and Adaptability variables. However, with regards to other variables, the trend was inverse to the experimental group between moments T1 and T2; in this case, there was a significant decrease in the scores between these two moments in the rest of the variables. Between moments T2 and T3, the scores remained without significant changes in all the variables measured with the EQ-i. In the case of variables measured with the ability test, there was a significant decrease in the STEU scores between moments T2 and T3, whereas the STEM scores remained without significant changes.

Figs 1 – 3 show the scores obtained in the EQ-i total scale and STEM/STEU total scales by both groups at Times 1, 2, and 3. At Times 2 and 3, the experimental group, which had received the EI training, had an increase in its scores, whereas the control group did not present any substantial change in scores.

Fig 1. Total EQ-i performance of the groups at pretest (Time 1), posttest (Time 2), and one year later (Time 3).

Fig 2. STEU performance of the groups at pretest (Time 1), posttest (Time 2), and one year later (Time 3).

Fig 3. STEM performance of the groups at pretest (Time 1), posttest (Time 2), and one year later (Time 3).

The objective of this study was to examine the effectiveness of an EI training program among the senior managers (N = 54) of a private company. Consistent with the recommendations of Pérez-González and Qualter [13], Hodzic et al. [8], and Kotsou et al. [55], we aimed to contribute new research findings and extend the existing literature on the effectiveness of EI training in the workplace. The main findings revealed that the training's effects on intrapersonal EI, self-perception, general mood, self-expression, and stress management were maintained after its completion, whereas improvements in emotional understanding and emotion management strengthened over time. However, training did not produce similar improvements across all variables: it had a nonsignificant impact on interpersonal and adaptability skills.

Theoretical implications of the study

With regard to the theoretical implications of the present findings, the observed effectiveness of the TCEI, which was conducted using an innovative methodology that entailed face-to-face training and a virtual campus support system among senior managers, extends the existing literature on the development of EI training programs.

The training program conducted as part of this study failed to improve two dimensions of EI: interpersonal and adaptability skills. There are two possible explanations. First, high-quality training that addresses all dimensions of EI is necessary to produce large effects; the time and exercises devoted to these two dimensions may therefore need to be redefined. Accordingly, the third and fourth sessions of this training (interpersonal and adaptability skills, respectively) could be enriched by adding new activities and by including long-term evaluation of the transfer of these skills to real workplace situations in which they are required to resolve challenges. Indeed, allocating more time and exercises to these topics might have given participants greater practice in interpersonal and adaptability skills in regular and virtual classroom settings before applying them in the workplace.

On the other hand, changes in these two dimensions of EI may not be detectable immediately after the completion of the training or even a year later. Similarly, the studies reviewed by Kotsou et al. [55] indicated that improvements in EI may not be detectable immediately or shortly after the completion of an intervention, and the review's conclusions suggest that shorter training programs do not improve some dimensions of EI. Therefore, more intensive training and a longer gap between the completion of training and assessment (i.e., more than a year) may yield significant results for these two dimensions. Indeed, other studies have used longer gaps, such as more than two years [40] and yearly evaluations across three years [47].

In any case, the present findings suggest that the proposed training intervention is effective in improving some dimensions of EI. In particular, senior managers who received EI training demonstrated significant improvements in their ability to perceive, understand, and accept their own and others’ emotions in an effective way, be self-reliant, achieve personal goals, manage stress, have a positive attitude, and control and manage emotions; these findings are consistent with those of past studies that have aimed to improve EI by providing training in workplaces [ 45 – 52 ].

The largest effects emerged for the total scores for EI (as per mixed models; total EQ-i), followed by emotion management (STEM) and understanding (STEU), intrapersonal aspects, stress management, and finally, general mood. Moreover, improvements in emotional understanding and emotion management that had resulted from the training intervention had strengthened over time.

Similarly, several researchers have indicated that EI plays a key role in leadership development and success in the workplace [ 65 , 66 ]. The behaviors of managers shape critical stages of their subordinates’ careers as well as the provision of optimal training and promotion [ 67 , 68 ]. Given the unique significance that EI and optimal leadership bears to this group of professionals, the present study aimed to improve the EI of senior managers.

In sum, the proposed program is a training intervention that can be used to enhance the EI of senior managers because, as the previously articulated extensive literature review has demonstrated, EI plays a key role within work environments. Therefore, the present findings suggest that the TCEI is an effective training program that can improve the ability to identify one’s own and others’ emotions as well as identify and understand the impact of one’s feelings on thoughts, decisions, behaviors, and performance at work.

Practical implications

The present findings serve as empirical evidence of the effectiveness of the training program that was conducted in the present study in improving key dimensions of EI that foster the emotional skills that are both necessary and desirable in the workplace. Accordingly, the present findings have practical implications because they support the future use of the EI training program that was used in the present study. In this regard, the present findings revealed that EI training can promote the emotional development of senior managers.

In addition, the methodology of the training program is noteworthy because it required participants to use communication and work as a group to solve real practical problems that necessitate the application of EI skills in the workplace. Similarly, the use of face-to-face training alongside an e-learning platform helped participants acquire the ability to learn independently as well as synergically (i.e., with other senior managers). This encouraged the group to reflect on their knowledge about EI and apply their EI skills to handle workplace challenges.

It is important to emphasize that there were significant temporal changes in the scores on the measures of emotional understanding and emotion management; in other words, the scores continued to improve a year after the completion of training. It is interesting to note that the methodology of the last training session was unique because it involved the creation of a "life and career roadmap" and a "commitment to growth and development." We believe that these exercises were responsible for the continued improvement in important EI skills over time observed in the present study.

This finding has important practical implications because it underscores the importance of requiring senior managers to indicate their commitment to the transfer of knowledge. Indeed, the roadmap defines the results that are expected to follow the implementation of the learned emotional strategies and verifies the achievement of these results. In addition, all managers signed an online contract to indicate their commitment to remain connected through the virtual campus support system to resolve any conflicts that may arise within the company in an emotionally intelligent manner.

We believe that the method of learning that our intervention entailed is more effective than conventionally used methods. Further, the uniqueness of this method may have contributed to the observed change in scores because it allowed frustrated senior managers to share their unresolved issues. Finally, by practicing emotional understanding and emotion management during the training, participants created a plan of action and implemented their solutions using EI strategies.

In addition, we believe that signing the online contract helped them understand their responsibilities and the impact that their emotional understanding and emotion management can have on the organization. The fact that their scores on measures of emotional understanding and emotion management continued to increase over time indicates that the subjects had acquired these skills and that, once they had acquired them, they continued to develop them. Similarly, Kotsou et al. [ 55 ] also found that training resulted in stable improvements in EI. In addition to providing their participants with EI tools and skills as a part of their training, they also motivated them to apply these skills and use these tools in the future.

Taken together, the present findings have promising practical implications. Specifically, the findings suggest that a training methodology that facilitates knowledge transfer (i.e., application of knowledge about EI in the management of workplace challenges) can enhance the following dimensions of EI: emotional understanding, emotion management, self-perception (through training activities that pertain to self-regard, self-actualization, and emotional self-awareness), decision making (through training activities that pertain to problem solving, reality testing, and impulse control), self-expression (through training activities that pertain to emotional expression, assertiveness, and independence), and stress management (through training activities that pertain to flexibility, stress tolerance, and optimism).

Limitations and future studies

The present study has several limitations that should be acknowledged. First, we included only age and gender as control variables and omitted other individual differences that could have influenced the results. Future researchers should therefore define and examine the role of individual differences in the effects of EI training in greater detail. In addition, in accordance with Kotsou et al.'s [55] and Hodzic et al.'s [8] suggestions, detailed behavioral indicators should be examined because they may play a crucial role in the effectiveness of EI training. Another limitation is that the intervention program was conducted in only one company; future studies should implement this program in different companies and across varied business contexts. The present results make it apparent that further refinements are needed to address these limitations.

Another limitation of the present study is that it did not assess the effect that improvements in EI can have on other variables. Accordingly, recommendations for further research include the determination of whether improvements in EI that result from training lead to improvements in other variables such as job satisfaction and performance and successful leadership, in accordance with the results of other research studies [ 69 – 72 ]. Thus, future research studies must consider these possibilities when they examine whether the TCEI has the potential to produce all the aforementioned outcomes at an organizational level. Furthermore, the intervention can be redesigned in such a manner that it yields specific performance outcomes. Further, longitudinal studies on the effectiveness of EI training must be conducted across several sectors and countries.

Finally, senior managers define and direct the careers of the rest of a company's personnel; therefore, future research should examine how EI training can be used to promote its previously observed desirable effects, such as good leadership behaviors, effective cooperation, and teamwork [29, 31, 34-38, 69]. This is an interesting line of inquiry for future researchers.

Conclusions

In conclusion, the present findings contribute to existing knowledge on the development of EI: the training program improved many dimensions of senior managers' EI. More specifically, the effects of EI training on senior managers' emotional skills were maintained over time, and the corresponding effects on emotional understanding and emotion management had strengthened at the one-year follow-up. Finally, implementing this intervention in organizational settings can nurture and promote a sense of fulfillment among employees.

Supporting information

Data underlying the findings described.

TCEI planning schedule.

Acknowledgments

This research was supported by the Spanish Ministry of Economy and Competitiveness (EDU2015-64562-R).

Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

This research was supported by the Spanish Ministry of Economy and Competitiveness (EDU2015-64562-R) to R.G-C. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

  • 1. Goleman D. Emotional intelligence: Why it can matter more than IQ. Learning. 1996; 24(6), 49–50.
  • 2. Bar-On R. EQ-i: Bar-On Emotional Quotient Inventory: A measure of emotional intelligence: Technical manual. North Tonawanda: Multi-Health Systems; 2002.
  • 3. Bar-On R. The Bar-On model of emotional-social intelligence (ESI). Psicothema. 2006; 18, 13–25. Available from: http://www.redalyc.org/articulo.oa?id=72709503
  • 4. Mayer J. D., Salovey P., and Caruso D. R. Emotional intelligence: Theory, findings, and implications. Psychological Inquiry. 2004; 15, 197–215.
  • 5. Mayer J. D., and Salovey P. The intelligence of emotional intelligence. Intelligence. 1993; 17, 433–442.
  • 6. Petrides K. V., Pita R., and Kokkinaki F. The location of trait emotional intelligence in personality factor space. British Journal of Psychology. 2007; 98(2), 273–289. 10.1348/000712606X120618
  • 7. Petrides K. V., and Furnham A. Trait emotional intelligence: Psychometric investigation with reference to established trait taxonomies. European Journal of Personality. 2001; 15, 425–448.
  • 8. Hodzic S., Scharfen J., Ripoll P., Holling H., and Zenasni F. How efficient are emotional intelligence trainings: A meta-analysis. Emotion Review. 2018; 10(2), 138–148. 10.1177/1754073917708613
  • 9. Schutte N. S., Malouff J. M., and Thorsteinsson E. B. Increasing emotional intelligence through training: Current status and future directions. International Journal of Emotional Education. 2013; 5(1), 56.
  • 10. Moon T. W., and Hur W. M. Emotional intelligence, emotional exhaustion, and job performance. Social Behavior and Personality. 2011; 39(8), 1087–1096. 10.2224/sbp.2011.39.8.1087
  • 11. Van Rooy D. L., and Viswesvaran C. Emotional intelligence: A meta-analytic investigation of predictive validity and nomological net. Journal of Vocational Behavior. 2004; 65, 71–95. 10.1016/S0001-8791(03)00076-9
  • 12. Joseph D. L., and Newman D. A. Emotional intelligence: An integrative meta-analysis and cascading model. Journal of Applied Psychology. 2010; 95, 54–78. 10.1037/a0017286
  • 13. Pérez-González J. C., and Qualter P. Emotional intelligence and emotional education in school years. In Dacre Pool L., and Qualter P., editors. An Introduction to Emotional Intelligence. Chichester: Wiley; 2018. pp. 81–104.
  • 14. Brasseur S., Grégoire J., Bourdu R., and Mikolajczak M. The Profile of Emotional Competence (PEC): Development and validation of a self-reported measure that fits dimensions of emotional competence theory. PLOS ONE. 2013; 8(5), e62635. 10.1371/journal.pone.0062635
  • 15. Chiva R., and Alegre J. Emotional intelligence and job satisfaction: The role of organizational learning capability. Personnel Review. 2008; 37(5–6), 680–701. 10.1108/00483480810906900
  • 16. Sener E., Demirel O., and Sarlak K. The effect of the emotional intelligence on job satisfaction. Connecting Health and Humans. 2009; 146, 710–711. 10.3233/978-1-60750-024-7-710
  • 17. Moreno-Jimenez B., Blanco-Donoso L. M., Aguirre-Camacho A., de Rivas S., and Herrero M. Social skills for the new organizations. Behavioral Psychology-Psicologia Conductual. 2014; 22, 585–602.
  • 18. Kinman G., and Grant L. Exploring stress resilience in trainee social workers: The role of emotional and social competencies. British Journal of Social Work. 2011; 41, 261–275.
  • 19. Froman L. Positive psychology in the workplace. Journal of Adult Development. 2010; 17, 59–69.
  • 20. Humphrey R. H., Pollack J. M., and Hawver T. Leading with emotional labor. Journal of Managerial Psychology. 2008; 23, 151–168.
  • 21. Mayer J. D., Salovey P., and Caruso D. R. Emotional intelligence: New ability or eclectic traits? American Psychologist. 2008; 63, 503–517. 10.1037/0003-066X.63.6.503
  • 22. Nelis D., Kotsou I., Quoidbach J., Hansenne M., Weytens F., Dupuis P., et al. Increasing emotional competence improves psychological and physical well-being, social relationships, and employability. Emotion. 2011; 11(2), 354–366. 10.1037/a0021554
  • 23. Hur Y., van den Berg P. T., and Wilderom C. P. M. Transformational leadership as a mediator between emotional intelligence and team outcomes. Leadership Quarterly. 2011; 22(4), 591–603. 10.1016/j.leaqua.2011.05.002
  • 24. Slaski M., and Cartwright S. Health, performance and emotional intelligence: An exploratory study of retail managers. Stress and Health: Journal of the International Society for the Investigation of Stress. 2002; 18(2), 63–68.
  • 25. Kaplan S., Cortina J., Ruark G., LaPort K., and Nicolaides V. The role of organizational leaders in employee emotion management: A theoretical model. The Leadership Quarterly. 2014; 25(3), 563–580.
  • 26. Thiel C. E., Connelly S., and Griffith J. A. Leadership and emotion management for complex tasks: Different emotions, different strategies. The Leadership Quarterly. 2012; 23(3), 517–533.
  • 27. Batool B. F. Emotional intelligence and effective leadership. Journal of Business Studies Quarterly. 2013; 4(3), 84.
  • 28. Gardner L., and Stough C. Examining the relationship between leadership and emotional intelligence in senior level managers. Leadership and Organization Development Journal. 2002; 23(2), 68–78.
  • 29. Sunindijo R. Y., Hadikusumo B. H. W., and Ogunlana S. Emotional intelligence and leadership styles in construction project management. Journal of Management in Engineering. 2007; 23, 166–170.
  • 30. Cooper R. K. Applying emotional intelligence in the workplace. Training and Development. 1997; 51, 31–42.
  • 31. George J. M. Emotions and leadership: The role of emotional intelligence. Human Relations. 2000; 53(8), 1027–1055. 10.1177/0018726700538001
  • 32. Wong C. S., and Law K. S. The effects of leader and follower emotional intelligence on performance and attitude: An exploratory study. The Leadership Quarterly. 2002; 13(3), 243–274.
  • 33. Palmer B., Walls M., Burgess Z., and Stough C. Emotional intelligence and effective leadership. Leadership and Organization Development Journal. 2001; 22(1), 5–10.
  • 34. Yammarino F. J., Spangler W. D., and Bass B. M. Transformational leadership and performance: A longitudinal investigation. The Leadership Quarterly. 1993; 4(1), 81–102.
  • 35. Cha J., Cichy R. F., and Kim S. H. The contribution of emotional intelligence to social skills and stress management skills among automated foodservice industry executives. Journal of Human Resources in Hospitality and Tourism. 2008; 8(1), 15–31.
  • 36. Dulewicz V., and Higgs M. Leadership at the top: The need for emotional intelligence in organizations. The International Journal of Organizational Analysis. 2003; 11(3), 193–210.
  • 37. Goleman D., Boyatzis R., and McKee A. The New Leaders: Transforming the Art of Leadership into the Science of Results. London: Little, Brown; 2002.
  • 38. Goleman D., and Boyatzis R. Social intelligence and the biology of leadership. Harvard Business Review. 2008; 86, 74–85.
  • 39. Nelis D., Quoidbach J., Mikolajczak M., and Hansenne M. Increasing emotional intelligence: (How) is it possible? Personality and Individual Differences. 2009; 47(1), 36–41.
  • 40. Boyatzis R. E., and Saatcioglu A. A 20-year view of trying to develop emotional, social and cognitive intelligence competencies in graduate management education. Journal of Management Development. 2008; 27, 92–108. 10.1108/02621710810840785
  • 41. Clarke N. Developing emotional intelligence abilities through team-based learning. Human Resource Development Quarterly. 2010; 21(2), 119–138. 10.1002/hrdq.20036
  • 42. Dacre Pool L., and Qualter P. Improving emotional intelligence and emotional self-efficacy through a teaching intervention for university students. Learning and Individual Differences. 2012; 22(3), 306–312. 10.1016/j.lindif.2012.01.010
  • 43. Gilar-Corbi R., Pozo-Rico T., and Castejon-Costa J. L. Improving emotional intelligence in higher education students: Testing program effectiveness in three countries. Educacion XX1. 2019; 22(1), 161–187. 10.5944/educXX1.1988044
  • 44. Pool L. D., and Qualter P. Improving emotional intelligence and emotional self-efficacy through a teaching intervention for university students. Learning and Individual Differences. 2012; 22(3), 306–312. 10.1016/j.lindif.2012.01.010
  • 45. Beigi M., and Shirmohammadi M. Effects of an emotional intelligence training program on service quality of bank branches. Managing Service Quality. 2011; 21(5), 552–567. 10.1108/09604521111159825
  • 46. Turner R., and Lloyd-Walker B. Emotional intelligence (EI) capabilities training: Can it develop EI in project teams? International Journal of Managing Projects in Business. 2008; 1(4), 512–534. 10.1108/17538370810846450
  • 47. Dugan J. W., Weatherly R. A., Girod D. A., Barber C. E., and Tsue T. T. A longitudinal study of emotional intelligence training for otolaryngology residents and faculty. JAMA Otolaryngology Head Neck Surgery. 2014; 140(8), 720–726. 10.1001/jamaoto.2014.1169
  • 48. Cherniss C., Grimm L. G., and Liautaud J. P. Process-designed training: A new approach for helping leaders develop emotional and social competence. Journal of Management Development. 2010; 29(5), 413–431. 10.1108/02621711011039196
  • 49. Clarke N. The impact of a training program designed to target the emotional intelligence abilities of project managers. International Journal of Project Management. 2010; 28(5), 461–468. 10.1016/j.ijproman.2009.08.004
  • 50. Dulewicz V., and Higgs M. Can emotional intelligence be developed? International Journal of Human Resource Management. 2004; 15, 95–111. 10.1080/0958519032000157366
  • 51. Slaski M., and Cartwright S. Emotional intelligence training and its implications for stress, health and performance. Stress and Health. 2003; 19(4), 233–239. 10.1002/smi.979
  • 52. Daus C. S., Cage T., Cooper C. L., and Ashkanasy N. M. Learning to face emotional intelligence: Training and workplace applications. Research Companion to Emotion in Organizations. 2008; 245–260.
  • 53. Coté S., and Miners C. T. H. Emotional intelligence, cognitive intelligence, and job performance. Administrative Science Quarterly. 2006; 51, 1–28.
  • 54. Sy T., Tram S., and O'Hara L. A. Relation of employee and manager emotional intelligence to job satisfaction and performance. Journal of Vocational Behavior. 2006; 68(3), 461–473. 10.1016/j.jvb.2005.10.003
  • 55. Kotsou I., et al. Improving emotional intelligence: A systematic review of existing work and future challenges. Emotion Review. 2018; 10.1177/1754073917735902
  • 56. MacCann C., and Roberts R. D. New paradigms for assessing emotional intelligence: Theory and data. Emotion. 2008; 8(4), 540–551. 10.1037/a0012746
  • 57. Mikolajczak M. Moving beyond the ability-trait debate: A three level model of emotional intelligence. E-Journal of Applied Psychology. 2009; 5, 25–32.
  • 58. Halasz G., and Michel A. Key competences in Europe: Interpretation, policy formulation and implementation. European Journal of Education. 2011; 46(3), 289–306.
  • 59. Staudel T. Key competences for apprentices. International Journal of Psychology. 2008; 43(3–4), 336.
  • 60. Clarke N. Emotional intelligence and its relationship to transformational leadership and key project manager competences. Project Management Journal. 2010; 41(2), 5–20.
  • 61. Kirkpatrick D. Evaluating training programs: The four levels (2nd ed.). San Francisco, CA: Berrett-Koehler Publishers; 1998.
  • 62. Bar-On R. The Bar-On Emotional Quotient Inventory (EQ-i): Rationale, description and summary of psychometric properties. In Geher G., editor. Measuring emotional intelligence: Common ground and controversy. Hauppauge, NY: Nova Science Publishers; 2004. pp. 111–142.
  • 63. MacCann C., and Roberts R. D. The brief assessment of emotional intelligence: Short forms of the Situational Test of Emotional Understanding (STEU) and Situational Test of Emotion Management (STEM). Princeton, NJ: Educational Testing Service; 2008.
  • 64. Hair J. F., Anderson R. E., Tatham R. L., and Black W. C. Análisis Multivariante, 5ª ed. Madrid: Prentice Hall Iberia; 1999.
  • 65. Wolff S. B., Pescosolido A. T., and Druskat V. U. Emotional intelligence as the basis of leadership emergence in self-managing teams. Leadership Quarterly. 2002; 13(5), 505–522. 10.1016/S1048-9843(02)00141-8
  • 66. Buono A. F. Primal leadership: Realizing the power of emotional intelligence. Leadership Quarterly. 2003; 14(3), 353–356.
  • 67. Rosete D., and Ciarrochi J. Emotional intelligence and its relationship to workplace performance outcomes of leadership effectiveness. Leadership and Organization Development Journal. 2005; 26(5), 388–399.
  • 68. Goleman D. What makes a leader? Harvard Business Review. 2004; 82(1), 82–94.
  • 69. Kerr R., Garvin J., Heaton N., and Boyle E. Emotional intelligence and leadership effectiveness. Leadership and Organization Development Journal. 2006; 27(4), 265–279.
  • 70. Carmeli A. The relationship between emotional intelligence and work attitudes, behavior and outcomes: An examination among senior managers. Journal of Managerial Psychology. 2003; 18(8), 788–813.
  • 71. Pescosolido A. T. Emergent leaders as managers of group emotion. The Leadership Quarterly. 2002; 13(5), 583–599.
  • 72. Foster C., and Roche F. Integrating trait and ability EI in predicting transformational leadership. Leadership and Organization Development Journal. 2014; 35, 316–334.


Determining the level of evidence: Nonexperimental research designs

Affiliation.

  • 1 Amy Glasofer is a nurse scientist at Virtua Center for Learning in Mt. Laurel, N.J., and Ann B. Townsend is an adult NP with The Nurse Practitioner Group, LLC.
  • PMID: 33953103
  • DOI: 10.1097/01.NURSE.0000731852.39123.e1

To support evidence-based nursing practice, the authors provide guidelines for appraising research based on quality, quantity, and consistency. This article, the second of a three-part series, focuses on nonexperimental research designs.

Copyright © 2021 Wolters Kluwer Health, Inc. All rights reserved.



Quasi-experimental study designs series—paper 5: a checklist for classifying studies evaluating the effects on health interventions—a taxonomy without labels

Barnaby C. Reeves, George A. Wells, Hugh Waddington


Corresponding author. Tel.: +4401173423143; fax: +4401173423288. [email protected]

Accepted 2017 Feb 6.

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

The aim of the study was to extend a previously published checklist of study design features to include study designs often used by health systems researchers and economists. Our intention is to help review authors in any field to set eligibility criteria for studies to include in a systematic review that relate directly to the intrinsic strength of the studies in inferring causality. We also seek to clarify key equivalences and differences in terminology used by different research communities.

Study Design and Setting

Expert consensus meeting.

The checklist comprises seven questions, each with a list of response items, addressing: clustering of an intervention as an aspect of allocation or due to the intrinsic nature of the delivery of the intervention; for whom, and when, outcome data are available; how the intervention effect was estimated; the principle underlying control for confounding; how groups were formed; the features of a study carried out after it was designed; and the variables measured before intervention.
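To illustrate how a review author might record answers to these seven questions when screening studies, the hypothetical Python sketch below encodes one study's design features; the field names paraphrase the question topics and are not the checklist's exact wording, and the example values are invented.

from dataclasses import dataclass

@dataclass
class StudyDesignFeatures:
    clustering: str               # clustering via allocation or intervention delivery
    outcome_data: str             # for whom, and when, outcome data are available
    effect_estimation: str        # how the intervention effect was estimated
    confounding_control: str      # principle underlying control for confounding
    group_formation: str          # how comparison groups were formed
    post_design_features: str     # features of the study carried out after it was designed
    baseline_variables: str       # variables measured before intervention

study = StudyDesignFeatures(
    clustering="allocation applied to family practices",
    outcome_data="same individuals measured before and after intervention",
    effect_estimation="between-group difference in change over time",
    confounding_control="matching on baseline covariates",
    group_formation="nonrandom, chosen by care provider units",
    post_design_features="analysis specified after data collection",
    baseline_variables="age, sex, baseline outcome",
)
print(study.group_formation)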

The checklist clarifies the basis of credible quasi-experimental studies, reconciling different terminology used in different fields of investigation and facilitating communications across research communities. By applying the checklist, review authors' attention is also directed to the assumptions underpinning the methods for inferring causality.

Keywords: Health care, Health system, Evaluation, Study design, Quasi-experimental, Nonrandomized

What is new?

Evaluations of health system interventions have features that differ from, and are described differently than, those of evaluations of health care interventions.

An existing checklist of features has been extended to characterize: nesting of data in organizational clusters, for example, service providers; number of outcome measurements and whether outcomes were measured in the same or different individuals; whether the effects of an intervention are estimated by change over time or between groups; and the intrinsic ability of the analysis to control for confounding.

Evaluations of health care and health system interventions have features that affect their credibility with respect to establishing causality but which are not captured by study design labels.

With respect to inferring causality, review authors need to consider these features to discriminate “strong” from “weak” designs.

Review authors can define eligibility criteria for a systematic review with reference to these study design features, but applying the checklist does not obviate the need for a careful risk of bias assessment.

1. Introduction

There are difficulties in drawing up a taxonomy of study designs to evaluate health care interventions or systems that do not use randomization [1]. To avoid the ambiguities of study design labels, a checklist of design features has been proposed by the Cochrane Non-Randomized Studies Methods Group (including B.C.R. and G.A.W.) to classify nonrandomized studies of health care interventions on the basis of what researchers did [1], [2]. The checklist includes items about: whether a study made a comparison and, if yes, how comparison groups were formed; the timing of key elements of a study in relation to its conduct; and variables compared between intervention and comparator groups [1], [2]. The checklist was created primarily from the perspective of health care evaluation, that is, the kinds of intervention most commonly considered in Cochrane reviews of interventions.

The checklist works well in principle for study designs in which the allocation mechanism applies to individual participants, although it does not characterize unit of allocation issues arising from the mechanism of allocation (clustering by the practitioner or organizational unit on which allocation is based), unit of treatment issues arising from the organizational hierarchy through which the intervention is provided, or unit of analysis issues arising from the unit at which data are collected and analyzed (whether patient, practitioner, or organizational aggregate). Most health interventions are delivered by discrete care provider units, typically organized hierarchically (e.g., hospitals, family practices, practitioners); this makes clustering important, except when allocation is randomized, because interventions are chosen by care provider units in complex ways. A modified checklist was also suggested for cluster-allocated designs (diverse study designs in which the allocation mechanism applies to groups of participants) [1], [2], often used to evaluate interventions applied at the level of the group (e.g., disease prevention, health education, health policy), but the authors acknowledged that this checklist had not been well piloted.

There are three key challenges when trying to communicate study designs that do not use randomization to evaluate the effectiveness of interventions. First, study design labels are diverse or ambiguous, especially for cluster-allocated designs; moreover, there are key differences between research fields in the way that similar designs are conceived. Second, some study designs are, in fact, strategies for analysis rather than designs per se. Terms such as quasi-experimental, natural experiment, and observational cause particular ambiguity. The current checklist does not explicitly consider designs/analyses commonly used in health systems research (including so-called “credible quasi-experimental studies” [3], [4]), often taking advantage of large administrative or other available data sets, and in other cases using data purposely collected as part of prospective designs where random assignment is not feasible. Third, and important with respect to the motivation for this paper, differences of opinion exist between health care and health systems researchers about the extent to which some studies are “as good as” randomized trials when well conducted; it is not clear whether this is because common designs are described with different labels or whether there are substantive differences. Therefore, our primary aim in this paper is to revise the checklist to overcome these limitations.

Specific objectives were (1) to include a question to capture information about clustering; and (2) to extend the checklist to include study designs often used by health systems researchers and econometricians in a way that deals with the design/analysis challenge. We intended that the revised checklist should be able to resolve differences in opinion about the extent to which causality can be inferred from nonrandomized studies with different design features, improving communication between different health research communities. We did not intend that the checklist should be used as a tool to assess risk of bias, which can vary across studies with the same design features.

The paper is structured in three parts. Part 1 sets out designs currently used for health systems evaluations, illustrating their use through inclusion of different designs/analyses in a recent systematic review. Part 2 describes designs used for health intervention/program evaluations. Part 3 clarifies some of the ambiguities of study design labels using the proposed design feature framework.

2. Part 1: “quasi-experimental” studies considered by health system researchers and health economists

Health systems researchers and health economists use a wide range of “quasi-experimental” approaches to estimate causal effects of health care interventions. Some methods are considered stronger than others in estimating an unbiased causal relationship. “Credible quasi-experimental studies” are ones that “estimate a causal relationship using exogenous variation in the exposure of interest which is not usually directly controlled by the researcher.” This exogenous variation refers to variation determined outside the system of relationships that are of interest and in some situations may be considered “as good as random” variation [3], [4], [5]. Credible quasi-experimental approaches are based on assignment to treatment and control that is not controlled by the investigators, and the term can be applied to different assignment rules; allocation to treatment and control is by definition not randomized, although some approaches are based on identifying a source of variation in an exposure of interest that is assumed to be random (or exogenous). In the present context, they are considered to use rigorous designs and methods of analysis which can enable studies to adjust for unobservable sources of confounding [6] and are identical to the union of “strong” and “weak” quasi-experiments as defined by Rockers et al. [4].

Credible quasi-experimental methods use assignment rules which are either known or can be modeled statistically, including methods based on a threshold on a continuous scale (or an ordinal scale with a minimum number of units), such as a test score (regression discontinuity design), or another form of “exogenous variation” arising, for example, from geographical or administrative boundaries or from assignment rules that have gone wrong (natural experiments). Quasi-experimental methods are also applied when assignment is self-selected by program administrators or by beneficiaries themselves [7], [8]. Credible methods commonly used to identify causation among self-selected groups include instrumental variable estimation (IVE), difference studies [including difference in differences (DID)], and, to a lesser extent, propensity score matching (PSM), where individuals or groups are matched on preexisting characteristics measured at baseline, and interrupted time series (ITS). Thumbnail sketches of these and other designs used by health system researchers are described in Box 1. It should be noted that the sketches of study types used by health program evaluators are not exhaustive. For example, pipeline studies, in which treatment is withheld temporarily in one group until outcomes are measured (and the time of treatment is not randomly allocated), are also used.

Box 1. Thumbnail sketches of quasi-experimental studies used in program evaluations of CCT programs.

Quasi-experimental methods are used increasingly to evaluate programs in health systems research. Gaarder et al. [11], Baird et al. [12], and Kabeer and Waddington [13] have published reviews incorporating quasi-experimental studies on conditional cash transfer (CCT) programs, which make welfare benefits conditional upon beneficiaries taking specified actions like attending a health facility during the pre/post-natal period or enrolling children in school. Other reviews including quasi-experimental studies have evaluated health insurance schemes [14], [15] and maternal and child health programs [16]. Other papers in this themed issue of the Journal of Clinical Epidemiology describe how quasi-experimental studies can be identified for evidence synthesis [17], how data are best collected from quasi-experimental studies [18], and how the global capacity for including quasi-experimental studies in evidence synthesis can best be expanded [19], [20]. In this paper, we use studies from the reviews on the effects of CCT programs to illustrate the wide range of quasi-experimental methods used to quantify causal effects of the programs (Table 1).

Table 1. Experimental and quasi-experimental approaches applied in studies evaluating the effects of conditional cash transfer (CCT) programs

Sources: reviews of CCTs by Gaarder et al. [11], Baird et al. [12], and Kabeer and Waddington [13].

Some of the earliest CCT programs randomly assigned clusters (communities of households) and used longitudinal household survey data collected by researchers to estimate the effects of CCTs on the health of both adults and children [21]. The design and analysis of a cluster-randomized controlled trial of this kind is familiar to health care researchers [29].

In other cases, it was not possible to assign beneficiaries randomly. In Jamaica's PATH program [22], benefits were allocated to people with scores below a criterion level on a multidimensional deprivation index, and the effects of the program were estimated using a regression discontinuity analysis. This study involved recruiting a cohort of participants being considered for benefits, to whom a policy decision was applied (i.e., assign benefits or not on the basis of the specified deprivation threshold). In such studies, by assigning the intervention on the basis of a cutoff value for a covariate, the assignment mechanism (usually correlated with the outcome of interest) is completely known and can provide a strong basis for inferences, although usually in a less efficient manner than in randomized controlled trials (RCTs). The treatment effect is estimated as the difference (“discontinuity”) between two predictions of the outcome based on the covariate (the average treatment effect at the cutoff): one for individuals just above the covariate cutoff (control group) and one for individuals just below the cutoff (intervention group) [30]. The covariate is often a test score (e.g., to decide who receives a health or education intervention) [31] but can also be distance from a geographic boundary [32]. Challenges of this design include assignment that is determined approximately, but not perfectly, by the cutoff [33], and circumstances in which participants may be able to control factors determining their assignment status, such as their score or location.
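To make the regression discontinuity logic concrete, the sketch below estimates the discontinuity at a cutoff by fitting a local linear regression on each side and differencing the two predictions at the threshold. It is a minimal illustration on synthetic data, not the analysis used in the PATH evaluation; the cutoff, bandwidth, and variable names are all assumptions.

```python
# Minimal sharp regression discontinuity sketch on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n, cutoff, bandwidth = 2000, 50.0, 10.0
score = rng.uniform(0, 100, n)                 # assignment covariate (illustrative)
treated = score < cutoff                       # benefits assigned below the threshold
y = 2.0 + 0.03 * score + 1.5 * treated + rng.normal(0, 1, n)  # true effect = 1.5

def intercept_at_cutoff(mask):
    # Local linear fit of outcome on (score - cutoff); the intercept is the
    # predicted outcome exactly at the cutoff for that side.
    slope, intercept = np.polyfit(score[mask] - cutoff, y[mask], 1)
    return intercept

below = treated & (score > cutoff - bandwidth)   # intervention side, within bandwidth
above = ~treated & (score < cutoff + bandwidth)  # control side, within bandwidth
effect = intercept_at_cutoff(below) - intercept_at_cutoff(above)
print(f"estimated effect at the cutoff: {effect:.2f}")  # close to 1.5
```

In practice, the bandwidth choice and functional form matter a great deal; Lee and Lemieux [30] give a full treatment.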

As with health care evaluation, many studies in health systems research combine multiple methods. In Ecuador's Bono de Desarrollo Humano program, leakages in implementation caused ineligible families to receive the program, compromising the original discontinuity assignment. To compensate for this problem, the effects of the program were estimated as a “fuzzy discontinuity” using IVE [23]. An instrument (in this case, a dichotomous variable taking the value of 1 or 0 depending on whether the participating family had a value on a proxy means test below or above the cutoff used to determine eligibility for the program) must be associated with the assignment of interest, unrelated to potential confounding factors, and related to the outcome of interest only by virtue of its relationship with the assignment of interest (and not, e.g., through eligibility for another program which may affect the outcome of interest). If these conditions hold, then an unbiased effect of assignment can be estimated using two-stage regression methods [10]. The challenge lies not in the analysis itself (although such analyses are, typically, inefficient) but in demonstrating that the conditions for having a good instrument are met.
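The two-stage regression mentioned above can be sketched in a few lines: the first stage projects the (possibly confounded) program receipt onto the instrument, and the second stage regresses the outcome on the fitted values. This is a minimal illustration with synthetic data and a binary instrument; it is not the Bono de Desarrollo Humano analysis, and all names and parameter values are assumptions.

```python
# Minimal two-stage least squares (2SLS) sketch with a binary instrument.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
z = (rng.uniform(size=n) < 0.5).astype(float)   # instrument: below/above cutoff
u = rng.normal(size=n)                          # unobserved confounder
d = ((0.8 * z + 0.5 * u + rng.normal(size=n)) > 0.5).astype(float)  # receipt
y = 1.0 + 2.0 * d + u + rng.normal(size=n)      # true effect of receipt = 2.0

X = np.column_stack([np.ones(n), d])            # endogenous regressor
Z = np.column_stack([np.ones(n), z])            # instruments

# Stage 1: project the endogenous regressor onto the instruments.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
# Stage 2: regress the outcome on the fitted values.
beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
naive = np.linalg.lstsq(X, y, rcond=None)[0]    # confounded OLS, for comparison
print(f"naive OLS vs 2SLS effect: {naive[1]:.2f} vs {beta[1]:.2f}")
```

The naive OLS estimate is biased upward by the confounder, while the 2SLS estimate recovers the true effect, illustrating why the quality of the instrument, not the regression mechanics, is the hard part.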

In the case of Bolsa Alimentação in Brazil, a computer error led eligible participants whose names contained nonstandard alphabetical characters to be excluded from the program. Because there are no reasons to believe that these individuals would have had systematically different characteristics to others, the exclusion of individuals was considered “as good as random” (i.e., a true natural experiment based on quasi-random assignment) [9].

Comparatively few studies in this review used ITS estimation, and we are not aware of any studies in this literature that have been able to draw on sufficiently long time series, with longitudinal data for individual units of observation, for the design to qualify as “as good as randomized.” An evaluation of Nepal's Safe Delivery Incentive Programme (SDIP) drew on multiple cohorts of eligible households before and after implementation over a 7-year period [24]. The outcome (neonatal mortality) for each household was available at points in time that could be related to the inception of the program. Unfortunately, comparison group data were not available for nonparticipants, so an analysis of secular trends due to general improvements in maternal and child health care (i.e., not due to SDIP) was not possible. However, the authors were able to implement a regression “placebo test” (sometimes called a “negative control”), in which SDIP treatment was linked to an outcome (use of antenatal care) that was not expected to be affected by the program; the rationale was that the absence of an estimated spike in antenatal care at the time of the expected change in mortality would suggest that other confounding factors were not at play. But ultimately, because of the lack of comparison group data, the authors themselves note that the study is only able to provide “plausible evidence of an impact” rather than probabilistic evidence (p. 224).
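For readers unfamiliar with ITS estimation, the segmented regression below shows the basic idea: a pre-intervention level and trend, plus a level change and a slope change at the inception point. The data, inception date, and coefficient values are synthetic assumptions, not the SDIP analysis.

```python
# Minimal segmented (interrupted time series) regression sketch.
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(84)                      # 7 years of monthly observations
t0 = 48                                # programme inception
post = (t >= t0).astype(float)
y = 10 - 0.02 * t - 1.2 * post - 0.05 * post * (t - t0) + rng.normal(0, 0.5, 84)

# Columns: intercept, pre-existing trend, level change, slope change.
X = np.column_stack([np.ones_like(t), t, post, post * (t - t0)])
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"level change at inception: {b[2]:.2f}, slope change: {b[3]:.3f}")
# A "placebo test" repeats the same regression on an outcome the programme
# should not affect; a level change near zero there supports the design.
```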

Individual-level DID analyses use participant-level panel data (i.e., information collected in a consistent manner over time for a defined cohort of individuals). The Familias en Accion program in Colombia was evaluated using a DID analysis, where eligible and ineligible administrative clusters were matched initially using propensity scores. The effect of the intervention was estimated as the difference between groups of clusters that were or were not eligible for the intervention, taking into account the propensity scores on which they were matched [25]. DID analysis is only a credible method when we expect unobservable factors which determine outcomes to affect both groups equally over time (the “common trends” assumption). In the absence of common trends across groups, it is not possible to attribute the growth in the outcome to the program using the DID analysis. The problem is that we rarely have multiple-period baseline data to compare variation between groups in outcomes over time before implementation, so the assumption is not usually verifiable. In such cases, placebo tests on outcomes which are related to possible confounders, but not the program of interest, can be investigated (see also above). Where multiple-period baseline data are available, it may be possible to test for common trends directly and, where common trends in outcome levels are not supported, undertake a “difference-in-difference-in-differences” (DDD) analysis. In Cambodia, the evaluators used a DDD analysis to evaluate the Cambodia Education Sector Support Project, overcoming the observed lack of common trends in preprogram outcomes between beneficiaries and nonbeneficiaries [26].
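A two-group, two-period version of the DID estimator makes the common trends logic explicit: differencing each group's change over time removes group-level, time-invariant confounding, and differencing the two changes removes the shared trend. The sketch below uses synthetic data with an assumed common trend and is illustrative only.

```python
# Minimal two-group, two-period difference-in-differences sketch.
import numpy as np

rng = np.random.default_rng(3)
n = 1000
# Baseline (y0) and follow-up (y1) outcomes; common time trend = 0.3,
# true programme effect = 0.9, group-level baseline difference = 1.0.
y0_treat = rng.normal(5.0, 1, n); y1_treat = rng.normal(5.0 + 0.3 + 0.9, 1, n)
y0_ctrl  = rng.normal(4.0, 1, n); y1_ctrl  = rng.normal(4.0 + 0.3, 1, n)

did = (y1_treat.mean() - y0_treat.mean()) - (y1_ctrl.mean() - y0_ctrl.mean())
print(f"difference-in-differences estimate: {did:.2f}")   # close to 0.9
```

If the two groups had different underlying trends, the second difference would absorb the programme effect and the trend difference together, which is exactly why the common trends assumption is decisive.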

As in the case of Attanasio et al. above [25], difference studies are usually made more credible when combined with methods of statistical matching, because such studies are restricted to (or weighted by) individuals and groups with similar probabilities of participation based on observed characteristics—that is, observations “in the region of common support.” However, where panel or multiple time series cohort data are not available, statistical matching methods are often used alone. By contrast with the above examples, a conventional cohort study design was used to evaluate Tekoporã in Paraguay, relying on PSM and propensity-weighted regression analysis of beneficiaries and nonbeneficiaries at entry into the cohort to control for confounding [27]. Similarly, for Bolsa Familia in Brazil, evaluators applied PSM to cross-sectional (census) data [28]. Variables used to match observations in the treatment and comparison groups should not be determined by program participation and are therefore best collected at baseline. However, this type of analysis alone does not satisfy the criterion of enabling adjustment for unobservable sources of confounding, because it cannot rule out confounding of health outcomes data by unmeasured factors, even when participants are well characterized at baseline.
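The matching step can be sketched as follows: estimate propensity scores with a logistic regression on baseline covariates, then pair each treated unit with the nearest control on the score. This assumes scikit-learn is available; the data are synthetic, and, as the text cautions, the estimate adjusts only for the observed covariates.

```python
# Minimal 1:1 nearest-neighbour propensity score matching sketch.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 4000
x = rng.normal(size=(n, 2))                     # baseline covariates
p_treat = 1 / (1 + np.exp(-(x[:, 0] + 0.5 * x[:, 1])))
d = rng.uniform(size=n) < p_treat               # self-selected participation
y = 1.0 * d + x[:, 0] + rng.normal(size=n)      # true effect = 1.0

ps = LogisticRegression().fit(x, d).predict_proba(x)[:, 1]  # propensity scores
treated, controls = np.where(d)[0], np.where(~d)[0]
# Match each treated unit to the control with the closest propensity score.
matches = controls[np.abs(ps[controls][None, :] - ps[treated][:, None]).argmin(axis=1)]
att = (y[treated] - y[matches]).mean()          # effect of treatment on the treated
print(f"matched ATT estimate: {att:.2f}")       # close to 1.0
```

Because the score here depends only on the measured covariates, any confounder that is not in `x` passes straight through the matching, which is the limitation the paragraph above describes.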

3. Part 2: “quasi-experimental” designs used by health care evaluation researchers

The term “quasi-experimental” is also used by health care evaluation and social science researchers to describe studies in which assignment is nonrandom and influenced by the researchers. At first appearance, many of the designs seem similar, although they are often labeled differently. Although an assignment rule may be known, it may not be exploitable in the way described above for health system evaluations; for example, quasi-random allocation may be biased because of a lack of concealment, even when the allocation rule is “as good as random.”

Researchers also use more conventional epidemiological designs, sometimes called observational, that exploit naturally occurring variation. Sometimes, the effects of interventions can be estimated in these cohorts using instrumental variables (prescribing preference, surgical volume, geographic variation, distance from a health care facility), quantifying the effects of an intervention in a way that is considered to be unbiased [34], [35], [36]. Instrumental variable estimation using data from a randomized controlled trial to estimate the effect of treatment in the treated, when there is substantial nonadherence to the allocated intervention, is a particular instance of this approach [37], [38].

Nonrandomized study design labels commonly used by health care evaluation researchers include: nonrandomized controlled trial, controlled before-and-after study (CBA), interrupted time series study (ITS) and controlled interrupted time series study (CITS), prospective, retrospective, or historically controlled cohort studies (PCS, RCS, and HCS, respectively), nested case–control study, case–control study, cross-sectional study, and before-after study. Thumbnail sketches of these study designs are given in Box 2. In addition, researchers sometimes report findings for uncontrolled cohorts or individuals (“case” series or reports), which only describe outcomes after an intervention [54]; these are not considered further because such studies do not collect data for an explicit comparator. It should be noted that these sketches are the authors' interpretations of the labels; studies that other researchers describe using these labels may not conform to these descriptions.

Box 2. Thumbnail sketches of quasi-experimental study designs used by health care evaluation researchers.

The designs can have diverse features, despite having the same label. Particular features are often chosen to address the logistical challenges of evaluating particular research questions and settings. Therefore, it is not possible to illustrate them with examples drawn from a single review as in part 1; instead, studies exemplifying each design are cited across a wide range of research questions and settings. The converse also occurs, that is, study design labels are often inconsistently applied. This can present great difficulties when trying to classify studies, for example, to describe eligibility for inclusion in a review. Relying on the study design labels used by primary researchers themselves to describe their studies can lead to serious misclassifications.

For some generic study designs, there are distinct study types. For example, a cohort study can study intervention and comparator groups concurrently, with information about the intervention and comparator collected prospectively (PCS) or retrospectively (RCS), or study one group retrospectively and the other group prospectively (HCS). These different kinds of cohort study are conventionally distinguished according to the time when intervention and comparator groups are formed, in relation to the conception of the study. In our view, some studies are incorrectly termed PCS when data are collected prospectively, for example, for a clinical database, but the definitions of intervention and comparator required for the evaluation are applied retrospectively; such a study should be classified as an RCS.

4. Part 3: study design features and their role in disambiguating study design labels

Some of the study designs described in parts 1 and 2 may seem similar, for example, DID and CBA, although they are labeled differently. Some other study design labels, for example, CITS/ITS, are used in both types of literature. In our view, these labels obscure some of the detailed features of the study designs that affect the robustness of causal attribution. Therefore, we have extended the checklist of features to highlight these differences. Where researchers use the same label to describe studies with subtly different features, we do not intend to imply that one or other use is incorrect; we merely wish to point out that studies referred to by the same labels may differ in ways that affect the robustness of an inference about the causal effect of the intervention of interest.

The checklist now includes seven questions (Table 2). The table also sets out our responses for the range of study designs described in Box 1 and Box 2. The response “possibly” (P) is prevalent in the table, even given the descriptions in these boxes. We regard this as evidence of the ambiguity/inadequate specificity of the study design labels.

Table 2. Quasi-experimental taxonomy features checklist

Abbreviations: RCT, randomized controlled trial; Q-RCT, quasi-randomized controlled trial; IV, instrumental variable; RD, regression discontinuity; CITS, controlled interrupted time series; ITS, interrupted time series; DID, difference-in-differences; CBA, controlled before-and-after study; NRCT, nonrandomized controlled trial; PCS, prospective cohort study; RCS, retrospective cohort study; HCS, historically controlled study; NCC, nested case–control study; CC, case–control study; XS, cross-sectional study; BA, before-after study; Y, yes; N, no; P, possibly; na, not applicable.

Cells in the table are completed with respect to the thumbnail sketches of the corresponding designs described in Box 1 and Box 2.

This row describes “explicit” clustering. In randomized controlled trials, participants can be allocated individually or by virtue of “belonging” to a cluster such as a primary care practice or a village.

This row describes “implicit” clustering. In randomized controlled trials, participants can be allocated individually but with the intervention being delivered in clusters (e.g., group cognitive therapy); similarly, in a cluster-randomized trial (by general practice), the provision of an intervention could also be clustered by therapist, with several therapists providing “group” therapy.

A study should be classified as “yes” for this feature, even if it involves comparing the extent of change over time between groups.

For (nested) case–control studies, group refers to the case/control status of an individual.

The distinction between these options is to do with the exogeneity of the allocation; hence, designs further to the right in the table are more likely to involve allocation by some non-exogenous agent.

Question 1 is new and addresses the issue of clustering, either by design or through the organizational structure responsible for delivering the intervention (Box 3). This question avoids the need for separate checklists for designs based on assigning individuals and clusters. A “yes” response can be given to more than one response item; the two types of clustering may both occur in a single study, and implicit clustering can occur in an individually allocated nonrandomized study.

Box 3. Clustering in studies evaluating the effects of health system or health care interventions.

Clustering is a potentially important consideration in both RCTs and nonrandomized studies. Clusters exist when observations are nested within higher-level organizational units or structures through which an intervention is implemented or data are collected; typically, observations within clusters will be more similar with respect to outcomes of interest than observations between clusters. Clustering is a natural consequence of many methods of nonrandomized assignment/designation because of the way in which many interventions are implemented. Analyses of clustered data that do not take clustering into account will tend to overestimate the precision of effect estimates.
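The precision loss from ignoring clustering is conventionally quantified by the Kish design effect; the standard formula is added here for reference and is not part of the original checklist:

```latex
% Design effect: variance inflation when clustered observations are analyzed
% as if independent (m = average cluster size, \rho = intracluster correlation).
\[
\mathrm{DEFF} = 1 + (m - 1)\,\rho
\]
% Example: m = 20 patients per practice and \rho = 0.05 give
% DEFF = 1 + 19(0.05) = 1.95, nearly halving the effective sample size.
```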

Clustering occurs when implementation of an intervention is explicitly at the level of a cluster/organizational unit (as in a cluster-randomized controlled trial, in which each cluster is explicitly allocated to control or intervention). Clustering can also arise implicitly, from naturally occurring hierarchies in the data set being analyzed, that reflect clusters that are intrinsically involved in the delivery of the intervention or comparator. Both explicit and implicit clustering can be present in a single study.

Examples of types of cluster

Practitioner (surgeon; therapist, family doctor; teacher; social worker; probation officer; etc.).

Organizational unit [general practice, hospital (ward), community care team; school, etc.].

Social unit (family unit; network of individuals clustered in some nongeographic network, etc.).

Geographic area (health region; city jurisdiction; small electoral district, etc.).

“Explicit” clustering

Clustering arising from allocation/formation of groups; clusters can contain only intervention or control observations.

“Implicit” clustering

Clustering arising from naturally occurring hierarchies of units of analysis in the data set being analyzed to answer the research question.

Clusters can contain intervention and control observations in varying proportions.

Factors associated with designation as intervention or control may vary by cluster.

No clustering

Designation of an observation as intervention or control is only influenced by the characteristics of the observation (e.g., patient choice to self-medicate with an over-the-counter medication; natural experiment in which allocation of individuals is effectively random, as in the case of Bolsa Alimentação, where a computer error led to the allocation to intervention or comparator [9]).

Question 1 in the checklist distinguishes individual allocation, cluster allocation (explicit clustering), and clustering due to the organizational hierarchy involved in the delivery of the interventions being compared (implicit clustering). Users should respond factually, that is, with respect to the presence of clustering, without making a judgment about the likely importance of clustering (degree of dependence between observations within clusters).

Questions 2–4 are also new, replacing the first question (“Was there a relevant comparison?”) in the original checklist [1], [2]. These questions are designed to tease apart the nature of the research question and the basis for inferring causality.

Question 2 classifies studies according to the number of times outcome assessments were available. In each case, the response items distinguish whether or not the outcome is assessed in the same or different individuals at different times. Only one response item can be answered “yes.”

Treatment effects can be estimated as changes over time or between groups. Question 3 aims to classify studies according to the parameter being estimated. Response items distinguish changes over time for the same or different individuals. Only one response item can be answered “yes.”

Question 4 asks about the principle through which the primary researchers aimed to control for confounding. Three response items distinguish methods that:

control in principle for any confounding in the design, that is, by randomization, IVE, or regression discontinuity;

control in principle for time invariant unobserved confounding, that is, by comparing differences in outcome from baseline to end of study, using longitudinal/panel data for a constant cohort; or

control for confounding only by known and observed covariates (either by estimating treatment effects in “adjusted” statistical analyses or in the study design by restricting enrollment, matching and/or stratified sampling on known, and observed covariates).

The choice between these items (again, only one can be answered “yes”) is key to understanding the basis for inferring causality.

Questions 5–7 are essentially the same as in the original checklist [1] , [2] . Question 5 asks about how groups (of individuals or clusters) were formed because treatment effects are most frequently estimated from between group comparisons. An additional response option, namely by a forcing variable, has been included to identify credible quasi-experimental studies that use an explicit rule for assignment based on a threshold for a variable measured on a continuous or ordinal scale or in relation to a spatial boundary. When answering “yes” to this item, the review author should also identify the nature of the variable by answering “yes” to another item. Possible assignment rules are identified: the action of researchers, time differences, location differences, health care decision makers/practitioners, policy makers, on the basis of the outcome, or some other process. Other, nonexperimental, study designs should be classified by the method of assignment (same list of variables) but without there being an explicit assignment rule.

Question 6 asks about important features of a study in relation to the timing of their implementation. Studies are classified according to whether three key steps were carried out after the study was designed, namely: acquisition of source data to characterize individuals/clusters before intervention; actions or choices leading to an individual or cluster becoming a member of a group; and the assessment of outcomes. One or more of these items can be answered “yes,” as would be the case for all steps in a conventional RCT.

Question 7 asks about the variables that were measured and available to control for confounding in the analysis. The two broad classes of variables that are important are the identification and collection of potential confounder variables and baseline assessment of the outcome variable(s). The answers to this question will be less important if the researchers of the original study used a method to control for any confounding, that is, used a credible quasi-experimental design.
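As a concrete illustration of how a review author might record responses, the sketch below encodes one simplified answer per question. The real checklist has multiple response items per question, and the field names merely paraphrase the questions, so this structure is an assumption rather than the article's official format.

```python
# Hypothetical, simplified record of the seven checklist questions for one study.
from dataclasses import dataclass
from typing import Literal

Response = Literal["Y", "N", "P", "na"]   # yes / no / possibly / not applicable

@dataclass
class ChecklistRecord:
    q1_clustering: Response             # explicit or implicit clustering present?
    q2_outcome_timing: Response         # outcomes at multiple times, same individuals?
    q3_effect_parameter: Response       # effect estimated as change over time?
    q4_confounding_principle: Response  # confounding controlled in the design?
    q5_group_formation: Response        # groups formed by an explicit rule?
    q6_timing_of_steps: Response        # key steps carried out after study design?
    q7_baseline_variables: Response     # confounders/baseline outcomes measured?

# e.g., a regression discontinuity study controls for confounding by design:
rd_study = ChecklistRecord("N", "N", "N", "Y", "Y", "P", "Y")
```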

The health care evaluation community has historically been much more difficult to win around to the potential value of nonrandomized studies for evaluating interventions. We think the checklist helps to explain why: when their features are examined carefully, designs used in health care evaluation do not often control for unobservables. To the extent that these features are immutable, the skepticism is justified. However, to the extent that studies may be possible with features that promote the credibility of causal inference, health care evaluation researchers may be missing an opportunity to provide high-quality evidence.

Reflecting on the circumstances of nonrandomized evaluations of health care and health system interventions may provide some insight into why these different groups have disagreed about the credibility of effects estimated in quasi-experimental studies. The checklist shows that credible quasi-experimental studies gain credibility from using high-quality longitudinal/panel data; such data characterizing health care are rare, leading to evaluations that “make do” with the data available in existing information systems.

The risk of confounding in health care settings is inherently greater because participants' characteristics are fundamental to choices about interventions in usual care; mitigating this risk requires high-quality clinical data to characterize participants at baseline and, for pharmaco-epidemiological studies about safety, often over time. Important questions about health care for which quasi-experimental methods of evaluation are typically considered often concern the outcome of discrete episodes of care, usually binary, rather than long-term outcomes for a cohort of individuals; this can lead to a focus on the invariant nature of the organizations providing the care rather than the varying nature of the individuals receiving care. These contrasts are apparent, for example, between DID studies using panel data to evaluate an intervention such as a CCT among individuals and CBA studies of an intervention implemented at an organizational level that study multiple cross-sections of health care episodes, or between credible and less credible interrupted time series.

A recent article in the field of hospital epidemiology also highlights various features of what it terms quasi-experimental designs [56]. That list of features appears to be aimed at researchers designing a quasi-experimental study, acting more as a prompt (e.g., “consider options for …”) than as a checklist for a researcher appraising a study in order to communicate clearly to others about the nature of a published study, which is our perspective (e.g., that of a review author). There is some overlap with our checklist, but the list also includes several study attributes intended to reduce the risk of bias, for example, blinding. By contrast, we consider that an assessment of the risk of bias in a study is essential and needs to be carried out as a separate task.

5. Conclusion

The primary intention of the checklist is to help review authors to set eligibility criteria for studies to include in a review that relate directly to the intrinsic strength of the studies in inferring causality. The checklist should also illuminate the debate between researchers in different fields about the strength of studies with different features, a debate which has to date been somewhat obscured by the use of different terminology by researchers working in different fields of investigation. Furthermore, where disagreements persist, the checklist should allow researchers to inspect the basis for these differences, for example, the principle through which researchers aimed to control for confounding, and to shift their attention to clarifying the basis for their respective responses to particular items.

Acknowledgments

Authors' contributions: All three authors collaborated to draw up the extended checklist. G.A.W. prepared the first draft of the paper. H.W. contributed text for Part 1. B.C.R. revised the first draft and created the current structure. All three authors approved submission of the final manuscript.

Funding: B.C.R. is supported in part by the U.K. National Institute for Health Research Bristol Cardiovascular Biomedical Research Unit. H.W. is supported by 3ie.

  • 1. Reeves BC, Deeks JJ, Higgins JPT, Wells GA. Chapter 13: including non-randomized studies. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Version 5.0.2 (updated September 2009). The Cochrane Collaboration; 2009. Available at www.cochrane-handbook.org.
  • 2. Higgins J.P.T., Ramsay C., Reeves B.C., Shea B., Valentine J., Tugwell P. Issues relating to study design and risk of bias when including non-randomized studies in systematic reviews on the effects of interventions. Res Synth Methods. 2013;4:12–25. doi: 10.1002/jrsm.1056.
  • 3. Rockers P.C., Røttingen J.A., Shemilt I., Tugwell P., Bärnighausen T. Inclusion of quasi-experimental studies in systematic reviews of health systems research. Health Policy. 2015;19:511–521. doi: 10.1016/j.healthpol.2014.10.006.
  • 4. Bärnighausen T., Tugwell P., Røttingen J.A., Shemilt I., Rockers P., Geldsetzer P. Quasi-experimental study designs series - Paper 4: uses and value. J Clin Epidemiol. 2017;89:21–29. doi: 10.1016/j.jclinepi.2017.03.012.
  • 5. Bärnighausen T., Oldenburg C., Tugwell P., Bommer C., Ebert C., Barreto M. Quasi-experimental study designs series - Paper 7: assessing the assumptions. J Clin Epidemiol. 2017;89:53–66. doi: 10.1016/j.jclinepi.2017.02.017.
  • 6. Waddington H., Aloe A.M., Becker B.J., Djimeu E.W., Hombrados J.G., Tugwell P. Quasi-experimental study designs series - Paper 6: risk of bias assessment. J Clin Epidemiol. 2017;89:43–52. doi: 10.1016/j.jclinepi.2017.02.015.
  • 7. Cook T.D., Campbell D.T. Quasi-experimentation: design and analysis issues for field settings. Boston: Houghton Mifflin; 1979.
  • 8. Shadish W.R., Cook T.D., Campbell D.T. Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin; 2002.
  • 9. Morris S.S., Olinto P., Flores R., Nilson E.A., Figueiro A.C. Conditional cash transfers are associated with a small reduction in the rate of weight gain of preschool children in northeast Brazil. J Nutr. 2004;134:2336–2341. doi: 10.1093/jn/134.9.2336.
  • 10. Greenland S. An introduction to instrumental variables for epidemiologists. Int J Epidemiol. 2000;29:722–729. doi: 10.1093/ije/29.4.722.
  • 11. Gaarder M.M., Glassman A., Todd J.E. Conditional cash transfer programmes: opening the black box. Conditional cash transfers and health: unpacking the causal chain. J Development Effectiveness. 2010;2:6–50.
  • 12. Baird S., Ferreira F.H.G., Özler B., Woolcock M. Relative effectiveness of conditional and unconditional cash transfers for schooling outcomes in developing countries: a systematic review. Campbell Syst Rev. 2013:8. http://dx.doi.org/10.4073/csr.2013.8.
  • 13. Kabeer N., Waddington H. Economic impacts of conditional cash transfer programmes: a systematic review and meta-analysis. J Development Effectiveness. 2015;7:290–303.
  • 14. Acharya A., Vellakkal S., Taylor F., Masset E., Satija A., Burke M. Impact of national health insurance for the poor and the informal sector in low and middle-income countries: a systematic review. London: EPPI-Centre, Social Science Research Unit, Institute of Education, University of London; 2012.
  • 15. Giedion U., Alfonso E.A., Díaz Y. The impact of universal coverage schemes in the developing world: a review of the existing evidence. UNICO Studies Series 25. Washington, DC: The World Bank; 2013.
  • 16. Tanner J.C., Aguilar Rivera A.M., Candland T., Galdo V., Manang F., Trichler R. Delivering the millennium development goals to reduce maternal and child mortality: a systematic review of impact evaluation evidence. Washington, DC: Independent Evaluation Group, World Bank; 2013. Available at: https://ieg.worldbankgroup.org/Data/Evaluation/files/mch_eval_updated2.pdf. Accessed April 12, 2017.
  • 17. Glanville J., Eyers J., Jones A.M., Shemilt I., Wang G., Johansen M. Quasi-experimental study designs series - Paper 8: identifying quasi-experimental studies to inform systematic reviews. J Clin Epidemiol. 2017;89:67–76. doi: 10.1016/j.jclinepi.2017.02.018.
  • 18. Aloe A.M., Becker B.J., Duvendack M., Valentine J.C., Shemilt I., Waddington H. Quasi-experimental study designs series - Paper 9: collecting data from quasi-experimental studies. J Clin Epidemiol. 2017;89:77–83. doi: 10.1016/j.jclinepi.2017.02.013.
  • 19. Lavis J.N., Bärnighausen T., El-Jardali F. Quasi-experimental study designs series - Paper 11: supporting the production and use of health systems research syntheses that draw on quasi-experimental study designs. J Clin Epidemiol. 2017;89:92–97. doi: 10.1016/j.jclinepi.2017.03.014.
  • 20. Rockers P.C., Tugwell P., Grimshaw J., Oliver S., Atun R., Røttingen J.A. Quasi-experimental study designs series - Paper 12: strengthening global capacity for evidence synthesis of quasi-experimental health systems research. J Clin Epidemiol. 2017;89:98–105. doi: 10.1016/j.jclinepi.2016.03.034.
  • 21. Gertler P.J., Boyce S. An experiment in incentive-based welfare: the impact of PROGRESA on health in Mexico. 2001. Available at https://web.warwick.ac.uk/res2003/papers/Gertler.pdf.
  • 22. Levy D., Ohls J. Evaluation of Jamaica's PATH program: final report. Washington, DC: Mathematica Policy Research, Inc; 2007.
  • 23. Oosterbeek H., Ponce J., Schady N. The impact of cash transfers on school enrolment: evidence from Ecuador. Policy Research Working Paper No. 4645. World Bank; 2008. Available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1149578.
  • 24. Powell-Jackson T., Neupane B.D., Tiwari S., Tumbahangphe K., Manandhar D., Costello A.M. The impact of Nepal's national incentive programme to promote safe delivery in the district of Makwanpur. Adv Health Econ Health Serv Res. 2009;21:221–249.
  • 25. Attanasio O., Battistin E., Fitzsimons E., Mesnard A., Vera-Hernández M. How effective are conditional cash transfers? Evidence from Colombia. Briefing Note No. 54. The Institute for Fiscal Studies; 2005. Available at http://www.ifs.org.uk/bns/bn54.pdf.
  • 26. Filmer D., Schady N. Getting girls into school: evidence from a scholarship program in Cambodia. Econ Development Cult Change. 2008;56:581–617.
  • 27. Soares F.V., Ribas R.P., Hirata G.I. Achievements and shortfalls of conditional cash transfers: impact evaluation of Paraguay's Tekoporã programme. Publications 3, International Policy Centre for Inclusive Growth; 2008. Available at https://www.unicef.org/socialpolicy/files/Achievement_and_shortfalls_of_conditional_cash_transfers.pdf.
  • 28. Cardoso E., Souza A.P. The impact of cash transfers on child labor and school enrollment in Brazil. Working Paper #04-W07. Vanderbilt University; 2003.
  • 29. Ukoumunne O.C., Gulliford M.C., Chinn S., Sterne J.A.C., Burney P.G.J. Methods for evaluating area-wide and organisation-based interventions in health and health care: a systematic review. Health Technol Assess. 1999;3:1–92.
  • 30. Lee D.S., Lemieux T. Regression discontinuity designs in economics. J Econ Lit. 2010;48:281–355.
  • 31. Moss B., Yeaton W. Shaping policies related to developmental education: an evaluation using the regression-discontinuity design. Educ Eval Policy Anal. 2006;28:215–229.
  • 32. Arcand J.L., Djimeu Wouabe E. Teacher training and HIV/AIDS prevention in West Africa: regression discontinuity design evidence from the Cameroon. Health Econ. 2010;19:36–54. doi: 10.1002/hec.1643.
  • 33. Valentine J.C., Thompson S.G. Issues relating to the inclusion of non-randomized studies in systematic reviews on the effects of interventions. Res Synth Methods. 2013;4:26–35. doi: 10.1002/jrsm.1064.
  • 34. Boef A.G., Souverein P.C., Vandenbroucke J.P., van Hylckama Vlieg A., de Boer A., le Cessie S. Instrumental variable analysis as a complementary analysis in studies of adverse effects: venous thromboembolism and second-generation versus third-generation oral contraceptives. Pharmacoepidemiol Drug Saf. 2016;25(3):317–324. doi: 10.1002/pds.3956.
  • 35. Pezzin L.E., Laud P., Yen T.W., Neuner J., Nattinger A.B. Reexamining the relationship of breast cancer hospital and surgical volume to mortality: an instrumental variable analysis. Med Care. 2015;53:1033–1039. doi: 10.1097/MLR.0000000000000439.
  • 36. Bekelis K., Missios S., Coy S., Singer R.J., MacKenzie T.A. New York state: comparison of treatment outcomes for unruptured cerebral aneurysms using an instrumental variable analysis. J Am Heart Assoc. 2015;4(7). doi: 10.1161/JAHA.115.002190.
  • 37. Goldsmith L.P., Lewis S.W., Dunn G., Bentall R.P. Psychological treatments for early psychosis can be beneficial or harmful, depending on the therapeutic alliance: an instrumental variable analysis. Psychol Med. 2015;45:2365–2373. doi: 10.1017/S003329171500032X.
  • 38. Reeves B.C., Pike K., Rogers C.A., Brierley R.C., Stokes E.A., Wordsworth S. A multicentre randomised controlled trial of Transfusion Indication Threshold Reduction on transfusion rates, morbidity and health-care resource use following cardiac surgery (TITRe2). Health Technol Assess. 2016;20:1–260. doi: 10.3310/hta20600.
  • 39. Aiken A.M., Davey C., Hargreaves J.R., Hayes R.J. Re-analysis of health and educational impacts of a school-based deworming programme in western Kenya: a pure replication. Int J Epidemiol. 2015;44:1572–1580. doi: 10.1093/ije/dyv127.
  • 40. Holloway K.A., Gautam B.R., Reeves B.C. The effects of different kinds of user fees on prescribing quality in rural Nepal. J Clin Epidemiol. 2001;54:1065–1071. doi: 10.1016/s0895-4356(01)00380-8.
  • 41. Collin S., Reeves B.C., Hendy J., Fulop N., Hutchings A., Priedane E. Computerised physician order entry (CPOE) and picture archiving and communication systems (PACS) implementation in the NHS: a quantitative before-and-after study. BMJ. 2008;337:a939. doi: 10.1136/bmj.a939.
  • 42. Carey M.E., Mandalia P.K., Daly H., Gray L.J., Hale R., Martin Stacey L. Increasing capacity to deliver diabetes self-management education: results of the DESMOND lay educator non-randomized controlled equivalence trial. Diabet Med. 2014;31:1431–1438. doi: 10.1111/dme.12483.
  • 43. Skre I., Friborg O., Breivik C., Johnsen L.I., Arnesen Y., Wang C.E. A school intervention for mental health literacy in adolescents: effects of a non-randomized cluster controlled trial. BMC Public Health. 2013;13:873. doi: 10.1186/1471-2458-13-873.
  • 44. Campbell S.M., Reeves D., Kontopantelis E., Sibbald B., Roland M. Effects of pay for performance on the quality of primary care in England. N Engl J Med. 2009;361:368–378. doi: 10.1056/NEJMsa0807651.
  • 45. Grijalva C.G., Nuorti J.P., Arbogast P.G., Martin S.W., Edwards K.M., Griffin M.R. Decline in pneumonia admissions after routine childhood immunisation with pneumococcal conjugate vaccine in the USA: a time-series analysis. Lancet. 2007;369:1179–1186. doi: 10.1016/S0140-6736(07)60564-9.
  • 46. Steinbach R., Perkins C., Tompson L., Johnson S., Armstrong B., Green J. The effect of reduced street lighting on road casualties and crime in England and Wales: controlled interrupted time series analysis. J Epidemiol Community Health. 2015;69:1118–1124. doi: 10.1136/jech-2015-206012.
  • 47. Langham J., Reeves B.C., Lindsay K.W., van der Meulen J.H., Kirkpatrick P.J., Gholkar A.R., for the steering group for the national study of subarachnoid haemorrhage. Variation in outcome after subarachnoid haemorrhage: a study of neurosurgical units in UK and Ireland. Stroke. 2009;40:111–118. doi: 10.1161/STROKEAHA.108.517805.
  • 48. Murphy G.J., Reeves B.C., Rogers C.A., Rizvi S.I.A., Culliford L., Angelini G.D. Increased mortality, post-operative morbidity and cost following red blood cell transfusion in patients having cardiac surgery. Circulation. 2007;116:2544–2552. doi: 10.1161/CIRCULATIONAHA.107.698977.
  • 49. Sacks H.S., Chalmers T.C., Smith H. Randomized versus historical controls for clinical trials. Am J Med. 1982;72:233–239. doi: 10.1016/0002-9343(82)90815-4.
  • 50. Vučković B.A., van Rein N., Cannegieter S.C., Rosendaal F.R., Lijfering W.M. Vitamin supplementation on the risk of venous thrombosis: results from the MEGA case-control study. Am J Clin Nutr. 2015;101:606–612. doi: 10.3945/ajcn.114.095398.
  • 51. Graham D.J., Campen D., Hui R., Spence M., Cheetham C., Levy G. Risk of acute myocardial infarction and sudden cardiac death in patients treated with cyclo-oxygenase 2 selective and non-selective non-steroidal anti-inflammatory drugs: nested case-control study. Lancet. 2005;365:475–481. doi: 10.1016/S0140-6736(05)17864-7.
  • 52. Maini R., Van den Bergh R., van Griensven J., Tayler-Smith K., Ousley J., Carter D. Picking up the bill - improving health-care utilisation in the Democratic Republic of Congo through user fee subsidisation: a before and after study. BMC Health Serv Res. 2014;14:504. doi: 10.1186/s12913-014-0504-6.
  • 53. Hopkins C., Browne J.P., Slack R., Lund V., Topham J., Reeves B. The national comparative audit of surgery for nasal polyposis and chronic rhinosinusitis. Clin Otolaryngol. 2006;31:390–398. doi: 10.1111/j.1749-4486.2006.01275.x.
  • 54. Dekkers O.M., Egger M., Altman D.G., Vandenbroucke J.P. Distinguishing case series from cohort studies. Ann Intern Med. 2012;156:37–40. doi: 10.7326/0003-4819-156-1-201201030-00006.
  • 55. Agot K.E., Ndinya-Achola J.O., Kreiss J.K., Weiss N.S. Risk of HIV-1 in rural Kenya: a comparison of circumcised and uncircumcised men. Epidemiology. 2004;15:157–163. doi: 10.1097/01.ede.0000112220.16977.82.
  • 56. Schweizer M.L., Braun B.I., Milstone A.M. Research methods in healthcare epidemiology and antimicrobial stewardship - quasi-experimental designs. Infect Control Hosp Epidemiol. 2016;37:1135–1140. doi: 10.1017/ice.2016.117.

Experimental (Empirical) Research Articles


How Can I Find Experimental (Empirical) Articles?

Many of the recommended databases in this research guide contain scholarly experimental articles (also known as empirical articles or research studies or primary research). Search in databases like: 

  • APA PsycInfo ​
  • ScienceDirect

Because those databases are rich in scholarly experimental articles, any well-structured search that you enter will retrieve experimental/empirical articles. These searches, for example, will retrieve many experimental/empirical articles (a small helper for composing searches like these programmatically is sketched after this list):

  • caffeine AND "reaction time"
  • aging AND ("cognitive function" OR "cognitive ability")
  • "child development" AND play

Experimental (Empirical) Articles: How Will I Know One When I See One?

Scholarly experimental articles: to conduct and publish an experiment, an author or team of authors designs an experiment, gathers data, then analyzes the data and discusses the results of the experiment. A published experiment or research study will therefore look very different from other types of articles (newspaper stories, magazine articles, essays, etc.) found in our library databases.

In fact, newspapers, magazines, and websites written by journalists report on psychology research all the time, summarizing published experiments in non-technical language for the general public. That kind of article can be interesting to read (and can even lead you to look up the original experiment published by the researchers themselves), but to write a research paper about a psychology topic you should generally use experimental articles written by the researchers. The following guidelines will help you recognize an experimental article, written by the researchers themselves and published in a scholarly journal.

Structure of an Experimental Article. Typically, an experimental article has the following sections:

  • Abstract: the author summarizes her article
  • Introduction: the author discusses the general background of her research topic; often, she will present a literature review, that is, summarize what other experts have written on this particular research topic
  • Method: the author describes the experiment she designed and conducted
  • Results: the author presents the data she gathered during her experiment
  • Discussion: the author offers ideas about the importance and implications of her research findings, and speculates on future directions that similar research might take
  • References: the author gives a list of the sources she used in her paper

Look for articles structured in that way: they will be experimental/empirical articles.

Also, experimental/empirical articles are written in very formal, technical language (even the titles of the articles sound complicated!) and will usually contain numerical data presented in tables. 

As noted above, when you search in a database like APA PsycInfo, it's really easy to find experimental/empirical articles, once you know what you're looking for. Just in case, though, here is a shortcut that might help:

First, do your keyword search, for example:

[Screenshot: search menu in APA PsycInfo]

In the results screen, on the left-hand side, scroll down until you see "Methodology." You can use that menu to refine your search by limiting the articles to empirical studies only:

[Screenshot: Methodology menu in APA PsycInfo]

You can learn more about searching in APA PsycInfo, including advanced search limiters like methodology, age group, etc., from this APA guide.



  9. The past, present, and future of experimental methods in the social

    In the midst of the current causal revolution, experimental methods are increasingly embraced across the social sciences. We first document the growth in the use of the experimental method and then overview the current state of the field along with suggestions for future research. Our review covers the core features of experiments that ...

  10. Quantifying and addressing the prevalence and bias of study designs in

    An analysis of the quality of experimental design and reliability of results in tribology research. Wear 426-427 , 1712-1718 (2019). Article CAS Google Scholar

  11. Experimental Research Design

    However, the term "research design" typically does not refer to the issues discussed above. The term "experimental research design" is centrally concerned with constructing research that is high in causal (or internal) validity. Causal validity concerns the accuracy of statements regarding cause and efect relationships.

  12. Google Scholar

    Google Scholar provides a simple way to broadly search for scholarly literature. Search across a wide variety of disciplines and sources: articles, theses, books, abstracts and court opinions. Advanced search. Find articles. with all of the words. with the exact phrase. with at least one of the words. without the ...

  13. Experimental Study Designs

    Click on the article title to read more. Corresponding Author. Dana P. Turner MSPH, PhD [email protected] Department of Anesthesia, Critical Care and Pain Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA

  14. Can emotional intelligence be improved? A randomized experimental study

    Abstract. Purpose: This article presents the results of a training program in emotional intelligence. Design/methodology/approach: Emotional Intelligence (EI) involves two important competencies: (1) the ability to recognize feelings and emotions in oneself and others, and (2) the ability to use that information to resolve conflicts and problems to improve interactions with others.

  15. Use of Quasi-Experimental Research Designs in Education Research

    In the past few decades, we have seen a rapid proliferation in the use of quasi-experimental research designs in education research. This trend, stemming in part from the "credibility revolution" in the social sciences, particularly economics, is notable along with the increasing use of randomized controlled trials in the strive toward rigorous causal inference.

  16. Determining the level of evidence: Nonexperimental research ...

    Humans. Research Design*. To support evidence-based nursing practice, the authors provide guidelines for appraising research based on quality, quantity, and consistency. This article, the second of a three-part series, focuses on nonexperimental research designs.

  17. Quasi-experimental study designs series—paper 5: a checklist for

    There is a new article in the field of hospital epidemiology which also highlights various features of what it terms as quasi-experimental designs . The list of features appears to be aimed at researchers designing a quasi-experimental study, acting more as a prompt (e.g., "consider options for …") rather than as a checklist for a ...

  18. Quantitative Research Excellence: Study Design and Reliable and Valid

    Share access to this article. Sharing links are not relevant where the article is open access and not available if you do not have a subscription. For more information view the Sage Journals article sharing page.

  19. Free APA Journal Articles

    Free APA Journals. ™. Articles. Recently published articles from subdisciplines of psychology covered by more than 90 APA Journals™ publications. For additional free resources (such as article summaries, podcasts, and more), please visit the Highlights in Psychological Research page. Basic / Experimental Psychology.

  20. Experimental (Empirical) Research Articles

    Many of the recommended databases in this research guide contain scholarly experimental articles (also known as empirical articles or research studies or primary research). ... Because those databases are rich in scholarly experimental articles, any well-structured search that you enter will retrieve experimental/empirical articles. These ...