Multi-level Data Modeling (MLM)

by DAN CALLOWAY
Published 14 September 2010

WEAVERVILLE, NC – Multilevel modeling (MLM) is arguably one of the two most widely employed means of statistical analysis in the social and behavioral sciences (Vogt, 2007). Although MLM has been widely used in sociology, its use in education is also widely known and accepted; in educational research it is usually referred to as hierarchical linear modeling (HLM). Other labels for MLM include random effects models, mixed effects models (used primarily in economics), random coefficient regression models, and covariance components models (Vogt, pp. 214 – 215).

Educational and sociological research lend themselves well to MLM. A hypothetical study that I propose for a two-level MLM would model students' academic achievement in reading comprehension in high school within geographical regions of the U.S. At level one, the students' individual characteristics, such as personality traits, skills, attention span, and parental support, would need to be modeled. At level two, the geographical region of the U.S. where each student attends high school (North, South, East, Northeast, Midwest, or West) would be identified. Thus, the independent variables in the study would be personality traits, skills, attention span, parental support, and geographical location of high school attendance in the U.S. The dependent or outcome variable would be academic achievement in reading comprehension. A hierarchical structure is justified in this two-level MLM because students are nested within regions. The level-one variables are characteristics that vary from student to student. The level-two variable, the region where the students attend high school, is shared by every student in that region and operates separately from their individual characteristics. Students in the same region are therefore likely to resemble one another more than students from different regions, which violates the independence assumption of ordinary single-level regression.
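The two levels described above can be written as a generic two-level random-intercept model. This is a standard textbook formulation, not a specification taken from Vogt. The symbols are assumptions introduced for illustration: $i$ indexes students, $j$ indexes regions, and $X_{pij}$ are $P$ level-one predictors entered as dummy codes. Treating reading-comprehension achievement as a numeric score is also a simplifying assumption.

```latex
\begin{aligned}
\text{Level 1 (student } i \text{ in region } j\text{):}\quad
  & Y_{ij} = \beta_{0j} + \sum_{p=1}^{P} \beta_{p} X_{pij} + r_{ij},
  && r_{ij} \sim N(0, \sigma^2) \\
\text{Level 2 (region } j\text{):}\quad
  & \beta_{0j} = \gamma_{00} + u_{0j},
  && u_{0j} \sim N(0, \tau_{00}) \\
\text{Intraclass correlation:}\quad
  & \rho = \frac{\tau_{00}}{\tau_{00} + \sigma^2}
\end{aligned}
```

Here $\beta_{0j}$ is the intercept for region $j$, $\gamma_{00}$ is the overall intercept, and $u_{0j}$ is region $j$'s deviation from it. The quantity $\rho$ is the share of outcome variance that lies between regions rather than between students within a region.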

In the proposed two-level MLM study, the structure would consist of two separate levels: (1) individual students (and their associated characteristics) at level one, and (2) the geographical region of the U.S. where the students attend high school at level two. At level one, the IV of personality trait would use the labels 1 = introvert, 2 = extrovert, 3 = motivated, and 4 = non-motivated. The IV of skills would use the labels 1 = beginner, 2 = intermediate, and 3 = advanced. The IV of attention span would be labeled 1 = can concentrate on any given task for 10 minutes or less, and 2 = can concentrate on any given task for more than 10 minutes. Finally, the IV of parental support would be labeled 1 = full parental support, 2 = some parental support, and 3 = no parental support. At level two, the IV of geographical region would be labeled 1 = North, 2 = South, 3 = East, 4 = West, 5 = Northeast, and 6 = Midwest. The outcome or criterion variable, academic achievement in reading comprehension, would be labeled 1 = Poor, 2 = Fair, 3 = Good, and 4 = Excellent.
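One way to check whether the regional level matters is to estimate the intraclass correlation (ICC) of the coded outcome. The ICC is the share of variance in the reading-comprehension codes (1 = Poor to 4 = Excellent) that lies between regions. The sketch below uses the one-way ANOVA estimator ICC(1). It assumes an equal number of students per region. The function name `icc1` and the sample scores are hypothetical.

```python
from statistics import mean


def icc1(groups: dict[int, list[float]]) -> float:
    """ICC(1) from a one-way ANOVA, assuming equal group sizes.

    groups maps a region code (1 = North ... 6 = Midwest) to the
    outcome codes (1 = Poor ... 4 = Excellent) of its students.
    """
    sizes = {len(scores) for scores in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal group sizes")
    n = sizes.pop()          # students per region
    k = len(groups)          # number of regions
    grand = mean([s for scores in groups.values() for s in scores])

    # Between-groups mean square: spread of region means around the grand mean.
    ss_between = sum(n * (mean(s) - grand) ** 2 for s in groups.values())
    ms_between = ss_between / (k - 1)

    # Within-groups mean square: spread of students around their own region mean.
    ss_within = sum(sum((x - mean(s)) ** 2 for x in s) for s in groups.values())
    ms_within = ss_within / (k * (n - 1))

    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)


# Hypothetical outcome codes for three regions, three students each.
example = {1: [1, 2, 3], 2: [4, 4, 3], 3: [2, 3, 2]}
print(round(icc1(example), 3))  # prints 0.516
```

An ICC near 0 would suggest that region adds little. A clearly positive ICC supports modeling region as a separate level. The ICC(1) estimator can come out negative in small samples, and such values are commonly read as zero.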


Reference:

Vogt, W. P. (2007). Quantitative Research Methods for Professionals (Custom ed., p. 334). Boston, MA: Allyn & Bacon.

by DAN CALLOWAY
Published 9 September 2010

WEAVERVILLE, NC - A variable construct that would be difficult to measure directly is intelligence. However, using factor analysis as outlined in Vogt (2007) and discussed in Darlington (2010), one might learn more about the degree of intelligence in humans by using a set of multiple indicators, such as math, verbal, and spatial skills. These three indicators are much easier to measure, using standard tests developed to evaluate each skill level.

The math indicator could be a simple 50-question multiple-choice test with responses a, b, or c, where exactly one response to each question is the correct solution. With scores expressed on a 0 – 100 scale, the result could be converted to a categorical value. Scores of 0 – 33 correspond to poor math ability (assigned a value of 0), scores of 34 – 66 to intermediate ability (assigned a value of 1), and scores of 67 – 100 to advanced ability (assigned a value of 2). Similarly, verbal skills could be measured with a 50-question multiple-choice test (responses a, b, or c) covering tense, case, spelling, and grammar. It would use the same scale: 0 – 33 for poor verbal skills (value 0), 34 – 66 for intermediate verbal skills (value 1), and 67 – 100 for advanced verbal skills (value 2).

Spatial skills might be a little more difficult to measure, but could reasonably be assessed with the same 50-question multiple-choice format (responses a, b, or c). In this test, however, 3-dimensional diagrams would be required to represent the spatial relationships among the elements of each question and its correct response. Here again, a scale of 0 – 33 (value 0), 34 – 66 (value 1), and 67 – 100 (value 2) would represent poor, intermediate, and advanced spatial skills of the individual being tested. The overall measure of intelligence would be determined by the equation:

I = 2 (M + V) + S ,

where M, V, and S are the categorical values (0, 1, or 2) assigned to the math, verbal, and spatial indicators, respectively, based on the subject's overall score on each test. I is the measure of intelligence of the subject, ranging from 0 to 10. A value of 0 corresponds to a value of 0 in all three skills, and 10 corresponds to the maximum value of 2 in each (2 × (2 + 2) + 2 = 10). As the equation shows, math and verbal skills are each weighted twice as heavily as spatial skills in determining the overall measure of intelligence of the individual being tested.
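The scoring rule above can be sketched as two small functions. This assumes integer test scores on the 0 – 100 scale; the function names are hypothetical.

```python
def skill_value(score: int) -> int:
    """Map a 0-100 test score to the categorical value 0, 1, or 2."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score <= 33:
        return 0   # poor
    if score <= 66:
        return 1   # intermediate
    return 2       # advanced


def intelligence_index(math_score: int, verbal_score: int, spatial_score: int) -> int:
    """I = 2(M + V) + S, ranging from 0 to 10."""
    m = skill_value(math_score)
    v = skill_value(verbal_score)
    s = skill_value(spatial_score)
    return 2 * (m + v) + s


print(intelligence_index(80, 50, 20))  # M = 2, V = 1, S = 0, so I = 6
```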

The role of factor analysis in the above example is to find patterns in the correlations among the variables. These patterns are used to cluster the variables into groups, referred to as factors, which can then be treated as new composite variables (Vogt, 2007). A correlation matrix among the three indicators of math, verbal, and spatial skills would therefore need to be developed. It would show which indicators were highly correlated and which were not. Highly correlated indicators would be included in the factor being considered, and weakly correlated indicators would be excluded from it.
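The correlation step described above can be sketched as follows. It computes Pearson correlations among the three indicator scores and flags pairs above a cutoff. It stops short of a full factor analysis, since it performs no factor extraction or rotation. The scores and the 0.5 cutoff are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(data: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Correlation for every pair of indicators (upper triangle only)."""
    return {(a, b): pearson(data[a], data[b]) for a, b in combinations(data, 2)}


# Hypothetical 0-100 scores for five subjects.
scores = {
    "math":    [90, 70, 55, 40, 20],
    "verbal":  [85, 75, 50, 45, 25],
    "spatial": [30, 80, 45, 60, 50],
}
for (a, b), r in correlation_matrix(scores).items():
    flag = "high" if abs(r) >= 0.5 else "low"
    print(f"{a}-{b}: r = {r:.2f} ({flag})")
```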

References:

Darlington, R. B. (2010). Factor Analysis. Retrieved from http://www.psych.cornell.edu/Darlington/factor.htm.

Vogt, W. P. (2007). Quantitative Research Methods for Professionals (Custom ed., p. 334). Boston, MA: Allyn & Bacon.

Regression Analysis

by DAN CALLOWAY
Published on 29 August 2010

WEAVERVILLE, NC – Regression analysis is a statistical process used to examine how well a set of independent variables explains or predicts the dependent variable in a study, and why any one independent variable alone may not fully do so. The researcher looks to answer three basic questions (Vogt, 2007, pp. 145, 147):

(1) What is the total contribution of all independent variables together?
(2) What is the comparative importance of the different variables?
(3) What role does a particular independent variable play, independently of the effects that the other independent variables have on the dependent or outcome variable?

The researcher must decide whether to use all the predictor (independent) variables together to predict the dependent variable, or to explain the separate effect of each independent variable. In other words, the questions researchers ask of regression analyses are shaped by the goals of their research and not by the technicalities or complexities of the computations (Vogt, p. 147).
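The three questions can be made concrete with a two-predictor least-squares fit. The sketch below is a plain-Python illustration on hypothetical, noise-free data. The function name and values are assumptions and do not come from Vogt. R² addresses question (1). Standardized (beta) coefficients give one common, though debated, answer to question (2). Each raw slope is estimated holding the other predictor constant, which addresses question (3).

```python
from math import sqrt


def ols_two_predictors(x1: list[float], x2: list[float], y: list[float]) -> dict[str, float]:
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via centered sums of squares."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2          # zero if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det  # slope of x1, holding x2 constant
    b2 = (s11 * s2y - s12 * s1y) / det  # slope of x2, holding x1 constant
    b0 = my - b1 * m1 - b2 * m2

    # (1) Total contribution: R^2, the share of variance in y explained.
    fitted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    sst = sum(a * a for a in dy)
    r2 = 1 - sse / sst

    # (2) Comparative importance: standardized (beta) coefficients.
    beta1 = b1 * sqrt(s11 / sst)
    beta2 = b2 * sqrt(s22 / sst)
    return {"b0": b0, "b1": b1, "b2": b2, "r2": r2, "beta1": beta1, "beta2": beta2}


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]  # hypothetical exact relationship
fit = ols_two_predictors(x1, x2, y)
print(fit)
```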

Giving consideration to Project 2, I would use regression analysis to answer the three basic questions discussed earlier. I would use it to determine the total contribution of all the independent variables in my problem statement taken together, and to identify the comparative importance of the different variables chosen. I would also use it to investigate the role of each predictor variable in predicting the outcome variable independently of the effects of the other predictor variables. In my regression analysis, the choice of independent and dependent variables would be predicated on which variables were predictors and which were outcomes in the problem statement. Variables identified as predictors, whose values were allowed to vary independently, would be selected as the independent variables (IVs). The variable(s) dependent on the effects of the predictors would be classified as the dependent or outcome variable(s) (DVs).

When conducting research, the researcher could reasonably suspect that important variables (such as mediating variables) have been omitted from the problem under study if the existing predictor variables are not able to fully explain the outcome or criterion variable. Regression analysis is a good means of detecting that important variables may have been omitted, especially if the regression coefficient of the focal IV changes markedly once control variables, or controls together with mediators, are added to the model. If the current predictor variables are inadequate to explain the outcome variable, then it can logically be assumed that other, as yet unidentified, predictor variables are playing a role. They may act either through their interaction with other independent variables or through their own direct effect on the outcome variable (Vogt, 2007).
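The sketch below illustrates how a focal coefficient can shift when a relevant variable is added. It fits the focal IV alone and then together with a correlated control, using hypothetical, noise-free data; the variable names and values are invented for illustration. Because the omitted control is correlated with the focal IV, the simple slope absorbs part of the control's effect. Adding the control recovers the focal effect.

```python
def slope(x: list[float], y: list[float]) -> float:
    """Simple-regression slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def partial_slopes(x1: list[float], x2: list[float], y: list[float]) -> tuple[float, float]:
    """Slopes of x1 and x2 when both are in the model (centered normal equations)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


focal = [1, 2, 3, 4, 5]       # hypothetical focal IV
control = [1, 3, 2, 5, 4]     # hypothetical control, correlated with the focal IV
outcome = [a + 2 * c for a, c in zip(focal, control)]  # true focal effect is 1

print(slope(focal, outcome))                          # focal IV alone: 2.6
print(partial_slopes(focal, control, outcome)[0])     # with the control added: 1.0
```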

Thus, the research problem I have identified is: “I would like to investigate whether there is a positive correlation between sexual and physical abuse of a child in his/her early childhood development and whether s/he was raised in a loving or abusive single-parent or traditional mixed parental environment, and the propensity of the child to become a criminal outcast in his/her adolescent or adult life as viewed by society.” The independent or predictor variables identified are: environmental upbringing, gender, ethnicity, age, and parental guidance. The outcome or criterion variable identified is adolescent or adult criminal affiliation.

Reference:

Vogt, W. P. (2007). Quantitative Research Methods for Professionals (Custom ed., p. 334). Boston: Pearson Education, Inc.


Analysis of Variance (ANOVA)

by DAN CALLOWAY
Published 13 August 2010

WEAVERVILLE, NC – The t-test is a statistical test for two means. It is most often used to compare the means of two experimental conditions, and two applications can be distinguished. The first applies to experiments with two independent groups, that is, when subjects are assigned at random either to two experimental groups or to an experimental group and a control group. The second applies to related samples, as when the same subjects are tested under both conditions. For independent groups, the null hypothesis is that the population mean scores are equal for the two conditions. In other words, there would be no difference if we could compare the entire population of scores for the experimental and control conditions: H0: μE – μC = 0, where μE and μC represent the population means of the experimental and control groups, respectively. The alternative hypothesis would be Ha: μE – μC ≠ 0. If the direction of the difference were predicted ahead of time, a one-tailed version of Ha would be stated as μE – μC > 0. The t-test assumes that the scores in each condition are normally distributed and that the two distributions have equal variance (Levin, 1999).
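For two independent groups with equal variances, the t statistic divides the difference between the sample means by the standard error of that difference, built from a pooled variance. The sketch below computes t and its degrees of freedom with Python's standard library. It does not compute a p-value, because the standard library has no t distribution; t would be compared with a critical value for df = nE + nC − 2. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(experimental: list[float], control: list[float]) -> tuple[float, int]:
    """Independent-samples t statistic assuming equal variances (pooled)."""
    n_e, n_c = len(experimental), len(control)
    # Pooled variance: average of the two sample variances, weighted by their df.
    sp2 = ((n_e - 1) * variance(experimental)
           + (n_c - 1) * variance(control)) / (n_e + n_c - 2)
    se = sqrt(sp2 * (1 / n_e + 1 / n_c))   # standard error of the mean difference
    t = (mean(experimental) - mean(control)) / se
    return t, n_e + n_c - 2


t, df = pooled_t([4, 5, 6], [1, 2, 3])   # hypothetical scores
print(round(t, 4), df)
```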

When more than two means are analyzed, such as when an independent variable has more than two levels, the t-test becomes inappropriate and gives way to a statistical test of more than two means: analysis of variance, or ANOVA. The test associated with ANOVA is the F-test. There are two versions of ANOVA and its associated statistic. The first is the one-way ANOVA, in which there is only one independent variable, although that variable may have many levels, as when comparing various dosages of a drug or varying amounts of reinforcement (Levin, 1999). In this case, the null hypothesis is H0: μ1 = μ2 = . . . = μk, where μi is the population mean for level i, and the alternative hypothesis states that the population means are not all equal. ANOVA can be viewed as an extension of the t-test, which can compare only two means at a time. If, for example, there were eight groups to be compared and one used a t-test to compare every possible pair of groups, one would have to conduct 28 different t-tests. If each individual t-test were conducted at α = 0.05, the 28 tests could easily produce one or more Type I errors. If every null hypothesis were true, the expected number of Type I errors would be 28 × 0.05 = 1.40. A single analysis of variance is therefore preferred, because it tests all the means at once while holding the overall probability of a Type I error at α.
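The arithmetic in the eight-group example can be checked with a short sketch. It counts the pairwise tests, computes the expected number of Type I errors under true null hypotheses, and adds the familywise probability of at least one Type I error. That familywise figure assumes the tests are independent. Pairwise tests that share groups are not strictly independent, so it is only an approximation. The function name is hypothetical.

```python
from math import comb


def pairwise_error_summary(groups: int, alpha: float = 0.05) -> tuple[int, float, float]:
    tests = comb(groups, 2)                 # number of pairwise t-tests
    expected_errors = tests * alpha         # expected Type I errors if every H0 is true
    familywise = 1 - (1 - alpha) ** tests   # P(at least one), assuming independent tests
    return tests, expected_errors, familywise


tests, expected, fwer = pairwise_error_summary(8)
print(tests, round(expected, 2), round(fwer, 3))  # prints 28 1.4 0.762
```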

The F-test in ANOVA is based on the ratio of two sample variances, $F = s_1^2 / s_2^2$; hence the term analysis of variance. In the case of a one-way ANOVA, the variance term in the numerator is referred to as the between-groups variance because it is a measure of how much the k different group means vary from one another. The variance term in the denominator is called the within-groups variance because it is a measure of the average variance of scores within each experimental condition. Thus, the denominator is a measure of sampling error, while the numerator is a measure of sampling error plus any differences between experimental conditions that go beyond sampling error (Levin, 1999).
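The between-groups and within-groups variances described above can be computed directly. The sketch below returns F as the ratio of the between-groups mean square to the within-groups mean square, together with its two degrees of freedom. It uses only the standard library and does not compute a p-value. The function name and scores are hypothetical.

```python
from statistics import mean


def one_way_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """F = MS_between / MS_within for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean([x for g in groups for x in g])
    # Between-groups: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-groups: how far each score lies from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


print(one_way_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```

With only two groups, the F from this function equals the square of the pooled-variance t statistic. For example, the groups [4, 5, 6] and [1, 2, 3] give F = 13.5, and t ≈ 3.674 for the same data.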

Reference:

Levin, I. P. (1999). Relating Statistics and Experimental Design: An Introduction (p. 90). Thousand Oaks, CA: Sage Publications, Inc.

Standard Error and Sample Size

by DAN CALLOWAY
Published 4 August 2010

WEAVERVILLE, NC - In this article, I used the Standard Error and Sample Size calculator (Lowry, 2010) to evaluate how varying the sample size and standard deviation affects the standard error (SE) of the mean. The initial values I selected for the mean, standard deviation (SD), and sample size (see Chart No. 1) were 25, 2.3, and 50, respectively. The calculation showed a range of 22.7 to 27.3 about the population mean of 25, which is ±1 SD (±2.3) about the population mean. With a sample size of 50, the corresponding range for the sample mean was 24.6747 to 25.3253, that is, an SE of ±0.3253. When I increased the sample size from 50 to 200 (see Chart No. 2), the range for the sample mean narrowed to 24.8374 to 25.1626, with an SE of ±0.1626. Multiplying the sample size by a factor of 4 thus multiplied the SE by a factor of ½. From this exercise, it appears that the relationship between SE and n (sample size) is expressed by the formula:

$\Delta \mathrm{SE} = 1 / (\Delta n)^{1/2}$   [a]

where Δ represents the multiplicative change in the quantity under consideration. Using the equation above, changing the sample size, n, by a factor of 4 results in a corresponding change in SE by a factor of ½. Next, I returned to my original values in Chart No. 1 for the hypothetical mean, SD, and sample size, n, and doubled the SD. The resulting SE was ±0.6505 (see Chart No. 3). Thus, the effect of doubling the SD was 0.6505 / 0.3253 ≈ 2, a doubling of the SE. It therefore appears that the relationship between SE and SD can be expressed by the formula:

$\Delta \mathrm{SE} = \Delta \mathrm{SD}$   [b]

where Δ represents the multiplicative change in the quantity under consideration. To summarize, we can deduce from [a] and [b] that changing the sample size by a factor k changes the SE by a factor of $1/\sqrt{k}$. Changing the SD by a factor m changes the SE by the same factor m. Both results follow from the formula for the standard error of the mean, $\mathrm{SE} = \mathrm{SD}/\sqrt{n}$; for example, $2.3/\sqrt{50} \approx 0.3253$. Increasing the sample size therefore reduces the SE, whereas increasing the SD increases the SE proportionally. If we treat n as a continuous variable, we can use calculus to find the rate of change of SE with respect to n:

$\frac{d\,\mathrm{SE}}{dn} = \frac{d}{dn}\left(\mathrm{SD} \cdot n^{-1/2}\right) = -\frac{\mathrm{SD}}{2\,n^{3/2}} = -\frac{\mathrm{SE}}{2n}.$

Thus the relative rate of change of SE with respect to n is $-1/(2n)$. The SE falls as n grows, but each additional subject reduces it by a smaller amount. Likewise, for equation [b]:

$\frac{d\,\mathrm{SE}}{d\,\mathrm{SD}} = \frac{1}{\sqrt{n}} = \frac{\mathrm{SE}}{\mathrm{SD}}.$

Thus the SE changes in direct proportion to the SD: a 1% change in SD produces a 1% change in SE, whereas a 1% increase in n produces only about a 0.5% decrease in SE. In proportional terms, then, the SE is more sensitive to the SD than to n. In practice, however, the SD is largely a property of the population being studied (and of the precision of the measurements) and is seldom under the researcher's direct control, whereas the sample size is. In light of this information, I would opt to increase the sample size before trying to adjust the SD. The returns diminish, however: halving the SE requires quadrupling n.
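The calculator results and the proportional sensitivities above can be reproduced with a short sketch. It computes SE = SD / √n for the three charts. It then estimates each elasticity (relative change in SE per relative change in n or SD) by a central finite difference. The function names and the step size `h` are assumptions made for illustration.

```python
from math import sqrt


def standard_error(sd: float, n: float) -> float:
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / sqrt(n)


def elasticity(f, x: float, h: float = 1e-6) -> float:
    """Relative change in f per relative change in x (central difference)."""
    return (f(x * (1 + h)) - f(x * (1 - h))) / (2 * h * f(x))


mean_value = 25.0
for label, sd, n in [("Chart No. 1", 2.3, 50), ("Chart No. 2", 2.3, 200), ("Chart No. 3", 4.6, 50)]:
    se = standard_error(sd, n)
    print(f"{label}: SE = {se:.4f}, range {mean_value - se:.4f} to {mean_value + se:.4f}")

print(elasticity(lambda n: standard_error(2.3, n), 50.0))   # close to -0.5
print(elasticity(lambda sd: standard_error(sd, 50), 2.3))   # close to 1.0
```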

Chart No. 1

Chart No. 2

Chart No. 3


Reference:

Lowry, R. (2010). Standard error of sample means. Retrieved 4 August 2010 from http://faculty.vassar.edu/lowry/dist.html


Dan Calloway
