Year: 2016 | Volume: 7 | Issue: 1 | Page: 81
Which statistical hypothesis test should I apply? A simple guide for beginners
Inaamul Haq1, Aanisa Nazir2
1 Department of Community Medicine, Government Medical College, Srinagar, Jammu and Kashmir, India
2 Center for DNA Fingerprinting and Diagnostics, Laboratory of Bacterial Genetics, Hyderabad, Telangana, India
Date of Submission: 28-Oct-2015
Date of Acceptance: 17-Apr-2016
Date of Web Publication: 08-Jun-2016
Department of Community Medicine, Government Medical College, Srinagar - 190 010, Jammu and Kashmir
Source of Support: None, Conflict of Interest: None
How to cite this article:
Haq I, Nazir A. Which statistical hypothesis test should I apply? A simple guide for beginners. Int J Prev Med 2016;7:81
Introduction
Researchers in the field of medical and allied sciences (MAS) are generally mathematically faint at heart. The evolution from a student to a researcher is abrupt. Even though statistical methods are taught to an MAS student at the postgraduate level in most Indian universities, practical application of the same is mostly lacking. Once these students are thrown into the realm of research, they fail to apply even the basic methods in analytical statistics. A researcher works hard and honestly to collect good data but may report the wrong findings because an incorrect statistical test was used for data analysis. With this study, we intend to present a simpler way of answering the common question, "Which statistical hypothesis test should I apply?"
The building blocks
There are three essential aspects which we must understand before we can start analyzing any data.
The type of variable used
"Any aspect of an individual that is measured, like blood pressure, or recorded, like age or sex, is called a variable."[1] A "variable" may take different values in different individuals (or animals, objects, organisms, and populations) or in an individual at different times. Common examples include age, sex, weight, height, caste, religion, income, education, colony count, bacterial strain, antibiotic sensitivity, bacterial motility, bacterial morphology, and bacterial growth rate. A researcher must be very clear about the variable(s) being recorded or measured in the research. For the purpose of this study, we shall define four types of variables: "dichotomous," "polychotomous," "ordinal score," and "scale."
Dichotomous: Dichotomous variables have only two categories or levels. Variables with values such as "yes" or "no," "present" or "absent," "test" or "control" are dichotomous. Gender is a dichotomous variable with two values - "male" and "female." Disease status (present/absent), exposure status (present/absent), residence (urban/rural), antibiotic sensitivity (sensitive/resistant), type of bacterial strain (wild type/mutant), and motility (motile/immotile) are examples of dichotomous variables.
Polychotomous: Polychotomous variables are categorical variables with more than two categories. Examples include caste, religion, blood group, political affiliation, type of organism, socioeconomic status, severity of disease or symptom, and Likert scale. The categories of a polychotomous variable may or may not have an inherent order.
Ordinal score: Sometimes, polychotomous variables have more than just a few ordered categories based on a ranking or scoring system. A typical example is the commonly used scale for measuring pain, the visual analog scale.[2] It has at least 10 ordered categories. Other examples include the Clinical Global Impression[3] and the Yale-Brown Obsessive Compulsive Scale.[4] For this study, we shall consider all polychotomous variables with seven or more ordered categories as "ordinal scores."
Scale: Scale variables are quantitative variables. The characteristic that the variable indicates can be counted or measured. The measured ones usually have some unit of measurement (years, min, kg, cm, g, etc.). Examples include weight, height, age, clinical biochemistry parameters, pulse rate, bacterial doubling time, and bacterial colony size (in mm).
Sometimes, the way a variable has been measured may differ from how it is recorded or analyzed. Age is usually measured as a scale variable but may later be grouped into "age groups" at the time of analysis. In such instances, variables such as age will be treated as if they were polychotomous.
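The four definitions above can be written down as a small decision rule. The following Python sketch is only an illustration of those definitions: the function name `classify_variable`, its parameters, and the age cut-points are our own assumptions and do not come from any statistical package.

```python
def classify_variable(n_categories=None, ordered=False):
    """Return the variable type as defined in this article.

    n_categories=None means the variable is counted or measured
    (quantitative) rather than recorded in categories.
    """
    if n_categories is None:
        return "scale"
    if n_categories < 2:
        raise ValueError("a variable needs at least two categories")
    if n_categories == 2:
        return "dichotomous"
    # Seven or more ordered categories are treated as an ordinal score.
    if ordered and n_categories >= 7:
        return "ordinal score"
    return "polychotomous"


def age_group(age_years):
    """Group age into bands (hypothetical cut-points, for illustration only)."""
    if age_years < 18:
        return "<18"
    if age_years < 60:
        return "18-59"
    return "60+"


# Age measured in years is a scale variable ...
age_as_measured = classify_variable()
# ... but once grouped into three ordered bands it is treated as polychotomous.
ages = [12, 25, 47, 63, 71]
bands = {age_group(a) for a in ages}
age_as_grouped = classify_variable(len(bands), ordered=True)

print(age_as_measured, "->", age_as_grouped)
print(classify_variable(2))                 # e.g., sensitive/resistant
print(classify_variable(4))                 # e.g., blood group
print(classify_variable(10, ordered=True))  # e.g., a 10-category pain scale
```

The grouped-age case shows why the type depends on how a variable is analyzed, not only on how it was measured.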
Identification of the question to be analyzed
A researcher should be very clear about the question for which an answer is being sought. An MAS researcher usually wants to (a) explore possible associations between variables (find out the relationship between variables), (b) determine differences between two or more groups or treatments, or (c) predict an outcome given one or more exposure variables. To apply the correct statistical test, a researcher must be able to frame his/her question clearly. "Is there an association (or relationship) between gender and exam performance?", "Is there a relationship between type of bacterial strain and sensitivity to an antibiotic?", "Is there a difference in hospital stay between patients treated with operation 'A' versus operation 'B'?", and "Is there a difference in the pain score of patients before and after giving an analgesic?" are some questions to which an MAS researcher might want an answer.
Identifying variables in the question
Most of the questions framed by an MAS researcher relate two variables. The trick to choosing the correct statistical test is to identify the two variables in the question. The question "Is there an association between gender and exam performance?" has two variables: (1) gender, a dichotomous variable and (2) exam performance, a scale variable (when measured as the marks obtained in an exam). The question "Is there a difference in hospital stay between patients treated with operation 'A' versus operation 'B'?" has two variables: (1) hospital stay, a scale variable and (2) type of operation, a dichotomous variable with two values - operation "A" and operation "B."
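As a rough illustration of this step, the two example questions can be recorded as pairs of typed variables. The sketch below is a minimal one; the class names `Variable` and `Question` and the method `variable_types` are hypothetical and are introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str
    kind: str  # "dichotomous", "polychotomous", "ordinal score", or "scale"


@dataclass(frozen=True)
class Question:
    text: str
    first: Variable
    second: Variable

    def variable_types(self):
        """Return the two variable types in a fixed (alphabetical) order."""
        return tuple(sorted((self.first.kind, self.second.kind)))


q1 = Question(
    "Is there an association between gender and exam performance?",
    Variable("gender", "dichotomous"),
    Variable("exam performance (marks)", "scale"),
)
q2 = Question(
    "Is there a difference in hospital stay between operation 'A' and 'B'?",
    Variable("hospital stay", "scale"),
    Variable("type of operation", "dichotomous"),
)

for q in (q1, q2):
    print(q.variable_types(), "-", q.text)
```

Although the two questions are worded differently, one as an association and the other as a difference, both reduce to the same pair of variable types: one dichotomous and one scale variable.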
Choosing the correct statistical test
Once the researcher is ready with the building blocks (the question to be answered, the two variables, and their type), it becomes easy to choose the correct statistical test. [Table 1] and [Table 2] provide an easy guide.
Table 1: Choosing the statistical test, between-subjects design, and no repeated measures
Table 2: Choosing a statistical test, within-subjects design, and repeated measures
In a between-subjects design, measurements are made on two or more different groups of "cases" (individuals, animals, objects, organisms, etc.). In a within-subjects design, measurements are made on the same cases at two (before and after treatment) or more (baseline, at time t1, and at time t2) different times or through two or more different techniques (e.g., ELISA and Western Blot).
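The difference between the two designs shows up in how the data are laid out and compared. The sketch below uses small hypothetical datasets and two standard statistics for a scale outcome: Welch's t statistic for two independent groups and the paired t statistic for repeated measurements on the same cases. These are offered as common choices under our own assumptions; the contents of Table 1 and Table 2 are not reproduced in this text, so the sketch should not be read as restating them. The function names `welch_t` and `paired_t` are ours, and the datasets are far too small for real analysis.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Between-subjects: two different groups of cases
# (hypothetical hospital stay in days after operation A or B).
stay_a = [5, 7, 6, 8, 6]
stay_b = [9, 8, 10, 7, 11]

# Within-subjects: the same five cases measured before and after treatment
# (hypothetical systolic blood pressure in mmHg).
bp_before = [150, 142, 160, 155, 148]
bp_after = [140, 138, 150, 149, 141]


def welch_t(x, y):
    """t statistic for two independent groups, unequal variances allowed."""
    standard_error = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / standard_error


def paired_t(before, after):
    """t statistic computed on the per-case differences (before - after)."""
    if len(before) != len(after):
        raise ValueError("paired data need one 'after' value per 'before' value")
    diffs = [b - a for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


print("between-subjects t:", round(welch_t(stay_a, stay_b), 3))
print("within-subjects t:", round(paired_t(bp_before, bp_after), 3))
```

In the between-subjects case the two lists are unrelated and may differ in length. In the within-subjects case each position in the two lists refers to the same case, so the analysis is done on the differences.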
Discussion
This write-up has been written for beginners in research. It is not an all-encompassing text on statistical methods. We have omitted tests used for prediction because we consider them beyond the scope of a beginner in research. Discussing regression analysis would be akin to opening Pandora's box.
The tests mentioned in this write-up need to be used with caution. Most of these tests are based on certain assumptions which a beginner may find difficult to understand and test. In general, when we have a large dataset (more than 30 or 40 "cases"),[5] most of the test assumptions are met and the tests can be used effectively. Problems might arise with small datasets (<30 "cases"). Hence, caution should be exercised in "small dataset" situations, and expert advice should be sought.
With the advent of computers and advances in computing and the internet, researchers can access free online sources for performing most of the tests mentioned in [Table 1] and [Table 2]. Open Source Epidemiologic Statistics for Public Health is one such source, available at http://www.openepi.com/Menu/OE_Menu.htm. Other useful sites are VassarStats (http://vassarstats.net/) and GraphPad QuickCalcs (http://www.graphpad.com/quickcalcs/).
References
1. Kirkwood BR, Sterne JA. Essential Medical Statistics. 2nd ed. Oxford: Wiley-Blackwell; 2003. p. 501.
2. Wewers ME, Lowe NK. A critical review of visual analogue scales in the measurement of clinical phenomena. Res Nurs Health 1990;13:227-36.
3. Busner J, Targum SD. The clinical global impressions scale: Applying a research tool in clinical practice. Psychiatry (Edgmont) 2007;4:28-37.
4. Goodman WK, Price LH, Rasmussen SA, Mazure C, Fleischmann RL, Hill CL, et al. The Yale-Brown Obsessive Compulsive Scale. I. Development, use, and reliability. Arch Gen Psychiatry 1989;46:1006-11.
5. Daniel WW. Biostatistics: A Foundation for Analysis in the Health Sciences. 9th ed. New York: John Wiley & Sons; 2009. p. 783.