TECHNICAL REPORT ON A VOTE-COUNTING META-ANALYSIS OF THE BIRTH-ORDER LITERATURE (1940-1999)
Frank J. Sulloway
Department of Psychology
and Institute for Personality and Social Research,
University of California, Berkeley
This "Technical Report" is divided into six sections:
This technical report is an expanded version of a discussion and analysis presented in Sulloway (2000).
I previously published the results of a vote-counting meta-analysis of the birth-order literature (Sulloway, 1995). Because I first set forth my conclusions as part of a peer commentary in which I faced constraints on space, I did not provide full details about my methods. In hindsight, omission of these meta-analytic details was unfortunate because a fuller discussion of these methods might have prevented several misunderstandings.
All of the research results that I employed in my meta-analysis were tabulated from studies listed in Part 1 of Ernst and Angst's (1983:93-189) review of the birth-order literature, which covers the period from 1940 to 1980. The present meta-analysis includes these previously analyzed results, together with findings reported in Part 2 of their book (1983:243-82), as well as some findings that were not previously coded by me (for example, results that could not be coded correctly without consulting unpublished dissertations). In this technical report I have also included a meta-analytic assessment of the birth-order literature published between 1981 and 1999.
Although Modell (1997), Harris (1998), and Townsend (1997, 2000) have taken issue with some aspects of my meta-analytic conclusions, it is important to note that none of these investigators has attempted to replicate my procedures as I actually described them. Furthermore, all three investigators ignored the full implications of my statement that my tallies were based on "findings" rather than "studies" (Sulloway, 1996: Table 3). Meta-analytic tallies exclusively of "studies" make no sense in the kind of analysis that is generally recommended for these sorts of data. Findings are what matter. This is because studies often report multiple findings, which are sometimes based on more than one subject pool and which frequently involve diverse behavioral attributes that, for theoretical reasons, need to be kept separate in any formal statistical analysis of the data (Hunter and Schmidt, 1990:479-82).
Distinguishing Meta-Analytic "Findings," "Outcomes," and "Studies"
The need to distinguish individual findings (and meta-analytic outcomes of psychologically related findings) from studies is especially important given the goals of my meta-analysis, which sought to test specific hypotheses about sibling strategies as they relate to the Five Factor Model of personality (Costa and McCrae, 1992; McCrae and John, 1992). For example, if a given study reports that firstborns and laterborns scored as expected on one personality dimension, but scored contrary to expectation on another dimension, I counted one confirmation and one opposed finding under the two relevant dimensions (Sulloway, 1998b). By contrast, Harris (1998, 2001) and Townsend (2000) have classified a number of studies involving multiple results about disparate aspects of "personality" as single outcomes (but for which hypothesis?).1 Since these two commentators were not attempting to test any of my five hypotheses, they apparently did not understand the need for examining the original publications in order to be clear on this issue. This is why their tables of birth-order results contain no indication of the specific dimensions of the Five Factor Model of personality that apply to these reported findings, which are theoretically ambiguous and hence useless, as is, for hypothesis testing. Similarly, Harris and Townsend have not conducted statistical tests on their own tallies, even though such formal tests are necessary to validate or refute my five hypotheses. Harris and Townsend, then, have engaged only in counting, not in a formal meta-analysis.
Let us consider a specific example to illustrate my point. In the instance of a study by Price (1969), Harris (1998, 2001) and Townsend (2000) have each coded a single result, even though Ernst and Angst (1983) mention 11 significant findings relevant to at least three different dimensions of the Five Factor Model. The original article by Price (1969), however, reports 30 significant personality findings out of 74 results, which I have scored by Bonferroni-like procedures into five separate "outcomes"--one for each personality dimension of the Five Factor Model. Thus, Harris's and Townsend's single tally from this one study, which they evidently did not consult, omits almost all of the published findings and consequently ignores four-fifths of the scorable meta-analytic outcomes. In short, tallies by study are inappropriate whenever a study reports findings pertaining to two or more dimensions of personality, something that occurs in more than forty percent of the approximately one hundred and thirty relevant birth-order studies.
To be clear, then, individual publications may contain more than one "study," which is defined as an analysis involving a distinct population of subjects. In addition, each study may contain multiple "findings." Whenever appropriate (that is, in cases where multiple findings relate to the same dimension of personality within the Five Factor Model), these findings need to be grouped into single behavioral "outcomes." Given my own use of meta-analytic techniques for consolidating theoretically similar personality measures, more than nine hundred individual birth-order "findings" are encompassed by the more than two hundred meta-analytic "outcomes" associated with the birth-order literature published between 1940 and 1980 (see the Appendix).
The Counting of Duplicate Results by Harris and Townsend
Another problem with Harris's and Townsend's tallies is that they include numerous results from Ernst and Angst's (1983) review that duplicate previous listings for the same studies and personality dimensions. For Helen Koch's (1955a-1957) pioneering study, for example, Townsend lists 8 different results, even though formal hypothesis testing dictates that there can be only 5 meta-analytic outcomes--one for each dimension of the Five Factor Model. For the two different samples studied by Macbeth (1975), Townsend counts 18 distinct outcomes, 8 more than are theoretically possible if one is engaging in formal testing of my five hypotheses. Moreover, Townsend includes various additional results in his tallies (such as vocational interests, perceived favorableness of birth-order positions, and feeling favored by parents) that have nothing to do with the five hypotheses about personality that he ought to have been testing if he was indeed attempting to replicate, as he claims, my own meta-analysis. Harris's (1998, 2001) tallies reflect similar methodological inadequacies.
In short, a proper test of my hypotheses requires conducting five separate meta-analyses--one for each dimension of the Five Factor Model. Within each personality dimension, meta-analytic outcomes assessed in this manner are statistically independent, thereby allowing five separate statistical tests. None of my critics has acknowledged this fundamental point, which is why none of them has replicated my meta-analysis as it was actually described and conducted. These "replications" were bound to fail because, from the very start, they were never really designed to be true reassessments of my own methods, hypotheses, or associated empirical findings.
Reporting Errors in Ernst and Angst (1983)
There is another reason why Modell, Harris, and Townsend obtained different results from my own: their tallies contain substantial errors. None of these investigators consulted the original birth-order literature, relying instead on Ernst and Angst's (1983) sometimes flawed summaries.2 Although Ernst and Angst's literature review represents a noteworthy and useful achievement, it was not undertaken with formal meta-analytic criteria in mind and is incomplete, for this and other reasons, about certain relevant technical details. It is customary, moreover, for researchers who are performing a meta-analysis of a particular literature to actually read this literature. In the process of attempting to verify Ernst and Angst's tabulations during my own review of the literature, I encountered errors, omissions, and other inconsistencies that I corrected before tallying my results (Sulloway, 1997:472).3
A brief overview will suffice to indicate the kinds of discrepancies that exist between Ernst and Angst's (1983) informal summaries and the actual contents of the literature. In more than fifty instances, Ernst and Angst have reported "null" findings that, on closer inspection, prove to be statistically significant or to come from studies in which other significant findings were overlooked. Conversely, Ernst and Angst have sometimes overlooked null findings or mistakenly reported null findings as being significant. There are some studies in which Ernst and Angst have reported findings that, in the original publications, are not accompanied by formal statistics and so cannot be properly assessed in meta-analytic terms. In addition, I have found many studies in which the status of controls was incorrectly reported by Ernst and Angst. These errors are virtually all preserved in Harris's (1998, 2001) and Townsend's (2000, Appendix A) tallies, as well as in Townsend's pervasively flawed attempts to address this problem (in his Appendix C). (See below, "Additional Examples of Errors and Omissions by Townsend.")
In connection with the preparation of this technical report, I and a second investigator (Christopher Davis) examined all of the relevant birth-order publications for which controls are reported by Ernst and Angst (1983). (We omitted one unpublished dissertation [Dean, 1947], which we were unable to consult, from our analysis.) Upon completing our independent reviews, using formal coding protocols on which we recorded pertinent statistical information as well as reporting errors, we then attempted to reconcile any differences in our codings of the results. In the case of studies containing particularly problematic statistical issues, we sought the assistance of two statisticians at the Center for Advanced Study in the Behavioral Sciences (Lynn Gale and Lincoln Moses). Studies included in this meta-analysis possess at least one valid control, for either sibship size or social class.
Hypotheses Being Tested
In this meta-analysis, the hypotheses being tested (and outcomes being coded) are based on the Five Factor Model of personality (Costa and McCrae, 1992). These hypotheses include that firstborns are more conscientious and neurotic than laterborns; and that laterborns are more agreeable and open to experience than firstborns. Additionally, firstborns are expected to be more extraverted than laterborns in the sense of being assertive, whereas laterborns are expected to be more extraverted than firstborns in the sense of being fun-loving, affectionate, excitement seeking, energetic, and sociable (and on composite measures of extraversion).4 These five hypotheses are derived from a theory about sibling strategies, as well as a theory about parental investment (Sulloway, 1996, 2002; Hertwig, Davis, and Sulloway, 2002).
Formal Classification of Meta-Analytic Findings
After I had classified my own meta-analytic outcomes according to the Five Factor Model (Costa and McCrae, 1992), Oliver John (University of California, Berkeley) independently classified the same findings and outcomes in order that we might determine the reliability of such classifications. The concordance between our respective classifications is 88 percent, which indicates that findings can be reliably classified according to this model of personality. After we had conducted our respective classifications, Oliver John and I reconciled, where possible, any differences of opinion. In those cases where differences of opinion could not be reconciled, we excluded the relevant findings from further analysis. Classifications of findings by personality dimension, which are included in the Appendix to this report, are based on these joint assessments.
Personality has been construed somewhat broadly in our analysis. Although some of the classified findings are not strictly personality traits, they do represent behaviors, motivations, or mental processes that are closely linked to personality traits within the Five Factor Model. For example, we judged task orientation, need for achievement, scholastic achievement (but not IQ), and awareness of parental authority and control as all being reasonably faithful expressions of Conscientiousness. Similarly, we considered membership in clubs and organizations, popularity, and willingness to engage in dangerous activities as indicators of Extraversion. We were unable, however, to classify perceptions of "parental discipline" or "admitting readily to faults"--findings that we felt could each be categorized under more than one personality dimension. We were more confident still in excluding from our analysis findings about trait oppositeness among siblings, choice of college major, and parents' perceptions about the favorableness of birth-order positions--results that we considered as having little or nothing to do with the five hypotheses being tested here.
The "Scoring Method"
Findings in the Appendix have been coded using the "scoring method." This method codes each study, for each relevant personality dimension, on a five-step scale (that is, as full confirmations, partial confirmations, nulls, partially opposed outcomes, and fully opposed outcomes). Under the scoring method, a full confirmation (+2) combined with a null effect (0) is scored as a partial confirmation (+1), whereas a full confirmation (+2) combined with a fully opposed outcome (-2) is scored as a null or neutral outcome (0). Using the scoring method, interaction effects are scorable either as partial confirmations or as partial refutations, unless two opposing effects cancel each other out, in which case the outcome has been scored as a null result. Findings that involve interaction effects at more than one level have been coded only for the lowest-level effect, and interaction effects have not been coded above the level of three-way effects. Ignoring interaction effects and restricting codings to main effects does not appreciably alter the general statistical conclusions that I reached previously, or that are presented in this report (see below, "Disregarding Interaction Effects").5
In instances of studies involving two or more findings on the same personality dimension, scoring has been determined by Bonferroni-like methods, including a formal set of rules for combining multiple results from the same study.6 For each personality dimension, a study has been considered as providing a full confirmation (or fully opposed outcome) if at least 2 significant positive (or negative) findings are present out of a maximum of 7 tests; if 3 significant positive or negative findings are present out of a maximum of 15 tests; or if 4 significant positive or negative findings are present out of a maximum of 25 tests. Whenever positive and negative findings are simultaneously present for the same dimension of personality, the net number of positive and negative findings has been used to decide the study outcome according to the scoring rule. Studies with net confirming or opposed findings on a given personality dimension, but which do not qualify as fully confirming or opposing outcomes, have been scored as partial outcomes, subject to the following qualification. For studies producing only a single significant result for a particular personality dimension, no more than 12 null results can be present for a study to be counted as a partial confirmation (or partially opposed outcome). Otherwise, the result has been counted as a null.
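The combining rule just described can be sketched in code. The Python function below is a minimal illustration under stated assumptions, not the coding protocol actually used for the Appendix: the function name and its arguments are hypothetical, the thresholds are read as applying to the net count of significant findings, and because the rule as quoted gives no threshold for a single test or for more than 25 tests, the sketch declines to score those cases.

```python
def score_outcome(n_pos: int, n_neg: int, n_tests: int) -> int:
    """Score one study on one personality dimension as +2, +1, 0, -1, or -2.

    n_pos / n_neg: significant findings in the predicted / opposite direction.
    n_tests: all findings (significant or null) reported for that dimension.
    Hypothetical helper; the thresholds follow the rule as quoted in the text.
    """
    if n_tests < 2 or n_tests > 25:
        # Assumption: the quoted rule only covers 2 to 25 tests per dimension.
        raise ValueError("rule as quoted covers 2 to 25 tests per dimension")
    if n_pos < 0 or n_neg < 0 or n_pos + n_neg > n_tests:
        raise ValueError("inconsistent counts")

    net = n_pos - n_neg
    if net == 0:
        return 0  # no net trend, or opposing findings cancel
    sign = 1 if net > 0 else -1

    # Net significant findings required for a full outcome.
    if n_tests <= 7:
        needed = 2
    elif n_tests <= 15:
        needed = 3
    else:
        needed = 4
    if abs(net) >= needed:
        return 2 * sign

    # A lone significant finding among more than 12 nulls counts as a null.
    n_sig = n_pos + n_neg
    n_null = n_tests - n_sig
    if n_sig == 1 and n_null > 12:
        return 0
    return sign  # partial confirmation or partially opposed outcome


print(score_outcome(2, 0, 7))   # 2: two confirmations out of seven tests
print(score_outcome(2, 0, 8))   # 1: eight tests require three confirmations
print(score_outcome(1, 0, 14))  # 0: one significant finding among 13 nulls
```

Other readings of the cutoffs are possible; the point of the sketch is only that the rule is mechanical once the counts per study and dimension are known.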
In treating the amalgamation of multiple findings into single outcomes for each dimension of the Five Factor Model, the main point is that there is no single "right" way to combine such multiple findings from the same study. Hence another researcher may choose another scoring method or another set of cutoff points that are consistent and sensible. Nevertheless, my own analyses, using a variety of different scoring rules, suggest that the conclusions will be the same when proper statistical procedures are employed in analyzing the observed distributional trends for outcomes within each individual personality dimension (see below, "Other Scoring Rules").
Statistical Testing of Independent Meta-analytic Outcomes
The scoring method allows each study to be reduced to a single outcome for each personality dimension of the Five Factor Model. Within each personality dimension, scored outcomes are statistically independent of one another and can therefore be analyzed by standard statistical tests. Formal statistical tests of the validity of these five hypotheses are based on t-tests involving the overall distribution of results around the null outcome, using the five-step scoring method.
Another advantage to using this scoring method is that different weights can be assigned to the five possible outcomes when one conducts t-tests (for example, +2, +1, 0, -1, -2). Weights can also be assigned to reflect the presence and quality of control variables or other attributes of studies. For this and other reasons, the scoring method is more informative than the standard "vote-counting" method of meta-analysis.
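A one-sample t-test of scored outcomes against the null outcome can be sketched as follows. This is an illustration only: the function name is hypothetical, and the input is the pooled tally for 1940-1980 given in this report (57 full confirmations, 42 partial confirmations, 112 nulls, 17 partially opposed outcomes, and 2 fully opposed outcomes), whereas the formal tests are run separately for each personality dimension using the counts in Table 1, which are not reproduced here.

```python
import math
import statistics


def one_sample_t(counts: dict[int, int]) -> tuple[float, float, int]:
    """One-sample t statistic for H0: mean outcome score equals 0.

    counts maps an outcome weight (for example +2 ... -2) to the number of
    outcomes that received it. Returns (mean score, t statistic, df).
    """
    # Expand the tally into one score per outcome.
    scores = [weight for weight, n in counts.items() for _ in range(n)]
    mean = statistics.mean(scores)
    std_err = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, mean / std_err, len(scores) - 1


# Pooled tally, used here purely as sample input (an assumption of this
# sketch; the report's own tests are per dimension).
pooled = {2: 57, 1: 42, 0: 112, -1: 17, -2: 2}
mean, t, df = one_sample_t(pooled)
print(f"mean = {mean:.3f}, t = {t:.2f}, df = {df}")
```

Changing the dictionary keys changes the weights, so the same function can be used to check how sensitive the statistic is to the choice of weighting.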
META-ANALYTIC RESULTS (1940-1980)
When scored by the five-step method described earlier, the findings for the pre-1981 literature include a total of 230 scorable outcomes, which are based on more than nine hundred individual findings and are drawn from 122 different publications containing 126 separate studies. These 230 outcomes include 57 full confirmations, 42 partial confirmations, 112 nulls, 17 partially opposed outcomes, and 2 fully opposed outcomes. For my five hypotheses as a whole, the ratio of fully favorable outcomes to fully opposed outcomes is 28.5 to 1 (Table 1).7 Overall, firstborns tend to be more conscientious and neurotic than laterborns, whereas laterborns tend to be more agreeable and open to experience than firstborns. Finally, laterborns are more extraverted than firstborns in most respects encompassed by this personality dimension, but (as expected) firstborns are more assertive than laterborns, which is also a recognized facet of extraversion (Costa and McCrae, 1992). (For the individual studies and scored outcomes on which Table 1 is based, see Table 2.)
Using the scoring method, the median number of findings per outcome is 2, and the mean number of findings per outcome is 4.2. Of the 36 outcomes in this meta-analysis that reflect multiple findings condensed into fully favorable (or opposed) outcomes via the scoring method, the average net proportion of significant results is 63 percent. Of the 59 outcomes in this meta-analysis that reflect multiple findings condensed into partially favorable (or opposed) outcomes, the average net proportion of significant results is 30 percent. Of the 48 outcomes in this meta-analysis that reflect multiple findings condensed into null outcomes, the average net proportion of significant results is less than 1 percent.
The scoring method yields the same general conclusions as the method I employed previously (Sulloway, 1995, 1996). In addition, these conclusions appear to be statistically robust when one employs a wide variety of weights in the t-tests, including weights for the N of each study. These results are also robust when a variety of different cutoff criteria are employed for classifying, via the scoring method, multiple outcomes along the five-step scale described here (see below, "Other Scoring Rules").
Fully Controlled Outcomes
Of these 230 outcomes, 96 are doubly controlled for social class and sibship size. These 96 outcomes comprise 26 full confirmations, 19 partial confirmations, 42 nulls, 7 partial refutations, and 2 full refutations. For these doubly controlled results, the ratio of fully favorable outcomes to fully opposed outcomes is 13 to 1. In addition, and contrary to Townsend's (2000) wildly mistaken claims, the 96 outcomes from 50 studies that are controlled for both sibship size and social class are no less supportive of my hypotheses than are the 134 other outcomes controlled for only one or the other of these two variables. More specifically, doubly controlled outcomes exhibit 47 percent fully or partially favorable outcomes. For the remaining 134 outcomes, the overall rate of fully or partially favorable outcomes is 40 percent.8
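The percentages in this comparison follow directly from the tallies. The short Python check below is a sketch with hypothetical variable names; the breakdown for the 134 remaining outcomes is not reported in the text and is derived here by subtracting the doubly controlled tally from the overall tally, on the assumption that the 96 outcomes are a subset of the 230.

```python
# Tallies as reported in the text.
overall = {"full_conf": 57, "partial_conf": 42, "null": 112,
           "partial_opp": 17, "full_opp": 2}
doubly_controlled = {"full_conf": 26, "partial_conf": 19, "null": 42,
                     "partial_opp": 7, "full_opp": 2}

# Derived by subtraction (assumption: the 96 are a subset of the 230).
singly_controlled = {k: overall[k] - doubly_controlled[k] for k in overall}


def favorable_share(tally: dict[str, int]) -> float:
    """Proportion of fully or partially favorable outcomes in a tally."""
    return (tally["full_conf"] + tally["partial_conf"]) / sum(tally.values())


for label, tally in [("doubly controlled", doubly_controlled),
                     ("singly controlled", singly_controlled),
                     ("all outcomes", overall)]:
    print(f"{label}: n = {sum(tally.values())}, "
          f"favorable = {100 * favorable_share(tally):.0f}%")
```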
Meta-Analytic Results Using Townsend's and Harris's Own Tallies
It is ironic that Harris's and Townsend's flawed attempts at replicating my results, marred as they are by many errors as well as by their failure to consider numerous overlooked findings, strongly support the theory about birth order that they seek to refute. Uncorrected tallies based on the results reported by Ernst and Angst (1983) exhibit significant trends for three of the Big Five personality dimensions, although neither Townsend nor Harris says anything about these robust trends present in their own tallies of the relevant data. Moreover, if Townsend's and Harris's tallies for Extraversion are coded, as they should be, according to the specific hypotheses described in Table 1 (which distinguishes firstborn assertiveness from other, laterborn forms of extraversion), then these tallies confirm four of my five hypotheses about birth order and personality.
The degree to which significant confirming findings are statistically overrepresented in the birth-order literature becomes even more apparent when one takes into account all of the necessary corrections to Townsend's and Harris's lists, together with the numerous findings that they, following Ernst and Angst (1983), overlooked. As Table 1 shows, these corrected tallies exhibit an overrepresentation of positive outcomes for all five dimensions of the Five Factor Model of personality, based on t-tests for the distribution of results around the null outcome.
In short, both the uncorrected meta-analytic results from Harris's (1998, 2001) and Townsend's (2000) summaries of results from Ernst and Angst (1983), and the corrected results based on a comprehensive review of the original literature, agree closely with the pattern of significant findings that I documented previously (Sulloway, 1995, 1996). It is worth noting that the corrected meta-analytic trends summarized in Table 1 reflect the collective conclusions of myself and two other investigators. In addition, this analysis has benefited from extensive statistical advice provided by four statisticians, including two experts in meta-analysis.
All in all, my meta-analytic conclusions about trends in the birth-order literature are remarkably robust when the relevant data--with or without errors, and with or without overlooked findings--are analyzed by any systematic meta-analytic technique. The only way to avoid this conclusion is to avoid formal testing of these hypotheses, as Modell, Harris, and Townsend elected to do in connection with their "replications" of my original study. Being preoccupied with counting instead of meta-analysis, these investigators appear to have missed the proverbial forest for the trees.
The five meta-analytic trends reported in Table 1 are very similar to the five birth-order trends that I obtained in a study of 4,177 subjects who rated themselves and a sibling (Sulloway, 1999). The correlation between the effect sizes obtained via these direct sibling comparisons, and the overall proportion of favorable outcomes reported in Table 1, is .91. In other words, the trends in the two different sets of data are almost identical, with firstborns being more conscientious and slightly more neurotic, with laterborns being more agreeable and open to experience, and with both birth-order groups scoring highly, as predicted, in specific subcategories of extraversion. These results show that a meta-analysis of the birth-order literature in which findings have been classified by myself and an independent judge (Oliver John) yields nearly identical results to those obtained using a survey instrument that has been specially designed to assess birth-order effects according to the Five Factor Model of personality.
Differences in Results by Counting versus by the Scoring Method
Counting findings summarized by Ernst and Angst (1983) leads to different tallies and overall results than those obtained when these formal meta-analytic methods are employed.9 Such discrepancies in outcomes are especially likely if counts are made (1) without regard to classifying findings under the appropriate personality dimensions, (2) without the use of formal techniques for amalgamating related findings from the same study into single "outcomes," and (3) without regard to formal hypothesis testing. Townsend's (2000) tallies, for example, include 190 scorable results, 51 (27 percent) of which are favorable to my hypotheses, as scored by him (with numerous errors uncorrected). If Townsend's tallies are assessed in accordance with the five-step scoring method, we find 43 fully or partially favorable outcomes out of 153 scorable outcomes (or 31 percent). These tallies by Townsend, however, contain numerous errors as well as omissions. When these errors and omissions are corrected, the overall proportion of favorable outcomes for the studies counted by Townsend rises from 31 to 37 percent. Finally, when the studies counted by Townsend are scored in accordance with the specific hypotheses outlined in Table 1 (where findings for extraversion are scored by subcategories of that personality dimension), then the proportion of favorable outcomes rises from 37 to 40 percent. This corrected proportion of favorable outcomes agrees reasonably closely with the 43 percent figure that a coworker and I obtained for the 230 outcomes reported in Table 1, using the same five-step scoring method.
In this connection, let us look more closely at several specific studies mentioned previously, in which Townsend and Harris both overlooked findings and also counted redundant outcomes on the same dimension of personality. I am referring to two studies by Koch (in a series of nine publications--1955a-1957, 1960), one study by Price (1969), and two studies by Macbeth (1974). For the collective results reported by these three researchers, Townsend has counted 3 favorable results out of 29 (or just 10 percent). Based on formal hypothesis testing rather than simple counting, the maximum number of codable outcomes in Townsend's own listings for these five studies is not 29 but only 15 (given the particular findings he has counted and the fact that no study should be coded more than once for each relevant dimension of the Five Factor Model of personality). As a result of counting redundant results, Townsend has included 10 extra null results for the same dimensions of personality on which he has already counted other null results.
In addition, Townsend and Harris have both overlooked a number of findings that Ernst and Angst (1983) did not mention in their incomplete review of these five studies. Some of these results are presented only in Macbeth's (1975) unpublished dissertation, which contains more than a hundred findings, most of which go unmentioned in the incomplete summary published in Dissertation Abstracts International. Similarly, both Harris and Townsend have overlooked approximately seventy relevant findings in the study by Price (1969).10 Townsend and Harris are also inconsistent in how they have coded some of these findings. For example, Harris has counted a single extraversion finding from the study by Price (1969) as being favorable, whereas Townsend (2000) has counted this same result as an opposed finding. (I and two other investigators coded this same result, together with 14 overlooked findings on the same dimension of personality, as a fully favorable outcome.) When all incorrectly coded outcomes in these five studies are rectified, when redundant outcomes are properly eliminated, and when all of the omitted findings are included in the scoring of results, the rate of fully or partially favorable outcomes becomes 6 of 15 outcomes (40 percent)--four times higher than Townsend himself reports.
For these reasons, it is not really a pertinent or valid criticism to note, as Harris (1998) and Townsend (2000) have both done, that my previous codings contain a higher proportion of favorable outcomes compared with the individual findings listed in Ernst and Angst (1983). Although my correction of reporting errors is responsible for some of these differences, it is also important to note that the percentage of favorable outcomes associated with a meta-analytic treatment of these data is inevitably different from that based solely on tallies of individual results, many of which are redundant and should not be counted more than once. With most meta-analytic scoring procedures, the percentage of favorable outcomes tends to be higher than the percentage of favorable findings, precisely because any sound meta-analytic amalgamation of findings, using Bonferroni-like procedures, tends to have this effect when significant trends are present in the particular literature being assessed. By the same token, different methods of scoring outcomes may produce different percentages of confirming and opposed outcomes. It is not, however, the percentage of confirming outcomes alone that dictates a significant or nonsignificant trend for any given personality dimension. Rather, the results of formal hypothesis testing for the presence of such trends are decided by the overall distribution of confirming versus opposed outcomes. Using a wide variety of scoring rules, these distributions are relatively stable and produce similar statistical results in formal tests (see below, "Other Scoring Rules").
Other Scoring Rules
The scoring rule employed in Table 1 was suggested by two statisticians at the Center for Advanced Study in the Behavioral Sciences (Lynn Gale and Lincoln Moses). Because this particular rule is arbitrary, other scoring rules are also possible. Such alternative rules, however, appear to make little difference in the results when formal statistical tests are applied to the distribution of outcomes. Consider the following (more stringent) scoring rule: Score studies reporting more than 50 percent of their net outcomes in a given direction for a given personality dimension as full confirmations/opposed findings; score studies reporting 25 to 50 percent of their net outcomes in a given direction as partial confirmations/opposed findings; and score all other studies reporting multiple outcomes as null results.
Using this scoring rule, there are 44 full confirmations, 39 partial confirmations, 132 null results, 13 partially opposed outcomes, and 2 fully opposed outcomes (for a total of 230 outcomes). For these particular results, the ratio of fully favorable outcomes to fully opposed outcomes is 22 to 1. Moreover, the trends associated with the distribution of these outcomes are statistically significant, in the predicted direction, for all five dimensions of the Five Factor Model of personality.
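The more stringent rule can likewise be written as a short function. This is a hedged sketch: the function name is hypothetical, and "net outcomes in a given direction" is taken to mean the net number of significant findings divided by the number of tests for that dimension, which is an assumption about how the proportion is computed.

```python
def score_by_net_proportion(n_pos: int, n_neg: int, n_tests: int) -> int:
    """Score a multi-finding study as +2, +1, 0, -1, or -2 by net proportion.

    Assumption: net proportion = |n_pos - n_neg| / n_tests.
    More than 50 percent is a full outcome, 25 to 50 percent is a partial
    outcome, and anything lower is a null.
    """
    if n_tests < 2:
        raise ValueError("rule applies to studies reporting multiple findings")
    if n_pos < 0 or n_neg < 0 or n_pos + n_neg > n_tests:
        raise ValueError("inconsistent counts")

    net = n_pos - n_neg
    if net == 0:
        return 0
    sign = 1 if net > 0 else -1

    # Integer comparisons avoid floating-point trouble at the cutoffs.
    if 2 * abs(net) > n_tests:    # more than 50 percent
        return 2 * sign
    if 4 * abs(net) >= n_tests:   # 25 to 50 percent, inclusive
        return sign
    return 0


print(score_by_net_proportion(4, 0, 7))  # 2: four of seven is above 50 percent
print(score_by_net_proportion(2, 0, 7))  # 1: two of seven is about 29 percent
print(score_by_net_proportion(1, 0, 7))  # 0: one of seven is below 25 percent
```

The second example shows why this rule is the more stringent one: two confirmations out of seven tests meet the threshold for a full confirmation under the first scoring rule quoted earlier in this report, but count only as a partial confirmation here.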
The application of different scoring rules to these meta-analytic data underlines an important methodological point. When guided by the goals of hypothesis testing, the purpose of any formal analysis of these data is not just to tally findings, as Townsend and Harris have done, because different scoring rules and methods of data reduction--both of which are required in meta-analysis--inevitably produce somewhat different counts. What does not appear to change much with the application of different scoring rules are the robust statistical trends that are associated with these data. It is these trends, as assessed by formal tests, that ultimately matter in any formal meta-analytic review using vote-counting procedures or the more informative scoring method used here.
Disregarding Interaction Effects
Harris (1998) has objected that the coding of interaction effects may bias the outcome in favor of my hypotheses, since one does not know how many subgroups of subjects researchers may have scrutinized for such interaction effects once these researchers failed to find a significant main effect. This is a legitimate criticism. In response to it, I and a coworker have recoded all of the meta-analytic data in Table 1 in a manner that disregards interaction effects. That is to say, all findings involving interactions (two-way and three-way) have been coded as null (main) effects. It should be pointed out that this methodological procedure overcorrects for the putative problem, because some studies appear to have published all interaction effects that were examined or predicted a priori, so chance interactions were not being selectively reported in these instances.
When the first scoring rule is applied to the outcomes in Table 1, with interaction effects being treated as null outcomes, the results are as follows: 56 full confirmations, 19 partial confirmations, 144 null results, 9 partially opposed outcomes, and 2 fully opposed outcomes (for a total of 230 outcomes). For these results, the ratio of fully favorable outcomes to fully opposed outcomes is 28 to 1. Although the number of null results has increased from 111 to 144, the overall trends in the distributions of these outcomes remain statistically significant, in the predicted direction, for all dimensions of the Five Factor Model of personality. One reason why these trends continue to be statistically significant is that interaction effects influence the scoring of only 14 percent of the outcomes. Moreover, after excluding interaction effects, the overall ratio of favorable to opposed outcomes actually increases, by a factor of 1.3. As a consequence, the statistical findings presented in Table 1, which include interaction effects, appear to provide a nonbiased test of my five hypotheses. Harris's (1998) criticisms about the counting of interaction effects, which she did not subject to any empirical test, are therefore not supported by an examination of the relevant data.
The File Drawer Problem
Another objection that can be made to my analysis of these findings involves the "file drawer problem," or the tendency for researchers to preferentially report, and for journal editors to preferentially publish, significant findings. Owing to this tendency, it is argued, null results are likely to remain unreported in file drawers (Rosenthal, 1987; Rosenthal and Rosnow, 1991). One suggested solution to this problem is to perform statistical tests only on those studies in which outcomes are either fully or partially significant (Hedges and Olkin, 1980), since the presence of significant findings argues against the involvement of the file drawer problem in this specific class of studies. Because my present meta-analysis employs a weighted five-step scoring method, t-tests involving these data place most of the weight on studies containing significant favorable or opposed findings--essentially, the same approach recommended by Hedges and Olkin. These weighted trends for outcomes are statistically significant for all five dimensions of the Five Factor Model of personality, strongly suggesting that the file drawer problem is not a telling objection after all. Other methods of treating the file drawer problem lead to the conclusion that several thousand null results--all controlled--are required to invalidate the significant trends summarized in Table 1 (Rosenthal and Rosnow, 1991).11 Given that only 230 controlled outcomes have emerged from Ernst and Angst's (1983) review of the literature from 1940 to 1980, it seems unlikely that several thousand additional controlled outcomes lie unreported in file drawers.
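One widely used method of this kind is Rosenthal's fail-safe N, which estimates how many unreported null results (averaging Z = 0) would be needed to reduce a combined one-tailed result to p = .05. Whether this is the exact computation behind the figure quoted above is an assumption. The sketch below uses the standard formula, and the Z scores in the example are hypothetical placeholders, not values from Table 1.

```python
from statistics import NormalDist


def fail_safe_n(z_scores, alpha=0.05):
    """Rosenthal's fail-safe N for a list of per-outcome standard normal deviates."""
    k = len(z_scores)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # about 1.645 for one-tailed .05
    return (sum(z_scores) ** 2) / (z_crit ** 2) - k


# Hypothetical example: 20 outcomes, each with Z = 1.0.
example = fail_safe_n([1.0] * 20)
```

Because the numerator grows with the square of the summed Z scores, a few hundred outcomes with a consistent trend can yield a fail-safe N in the thousands.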
Moderator Effects: Within-Family Studies versus Extrafamilial Studies
According to Harris (1998), such consistent meta-analytic findings about birth order are mostly confined to within-family studies. In other words, Harris argues, birth order effects are limited to behavior within the family and do not generally extend to behavior outside the family of origin. This assertion is contradicted by the available evidence. T-tests on the 185 outcomes in Table 1 that do not involve within-family designs or assessments yield significant trends for four of the five dimensions of personality (all dimensions but Neuroticism, where the trend is in the predicted direction). The same result is obtained when interaction effects are disregarded for these data and are treated, in an overly conservative manner, as null effects. In addition, the meta-analytic trends within these extrafamilial data are very similar to those observed in the within-family data (r=.84). In other words, birth-order effects outside the family of origin appear to reflect the same general patterns of behavior as do those associated with behavior within the family. Effect sizes, however, tend to be smaller for extrafamilial behavior, which is not surprising since the behavioral context is different from the context in which these effects originally developed. For this reason, one expects birth-order effects expressed outside of the family of origin to be sensitive to behavioral contexts that are similar to those encountered within the family or that tap familial sentiments (Sulloway, 2001; Salmon and Daly, 1998). Harris (1998) is nevertheless correct in drawing attention to a general difference in results from within-family and extrafamilial studies.
Other methodological features of birth-order studies exert a substantial influence on the proportion of positive results that is typically reported. If we exclude within-family studies from the publications listed in Ernst and Angst (1983), only 26 percent of the test outcomes involving self-report personality data show fully or partially positive results. By contrast, experimental studies and studies documenting life-history data (for example, participation in dangerous sports) yield a 49 percent confirmation rate. Finally, studies that involve comparisons among family members yield the highest confirmation rate (70 percent). The important point to be made here is that these meta-analytic outcomes, all of which differ from expectation under the null hypothesis, also differ significantly among themselves. In particular, within-family and life-history studies together manifest twice the rate of positive outcomes as do self-report studies dealing with behavior outside the family.12
Weighting Studies by Sample Size
In Table 1, I have not attempted to weight studies by sample size, mainly because it makes a negligible difference in the results. According to Harris (1998), studies with small sample sizes are more likely to show null results than studies with large sample sizes. This claim is incorrect. There is almost no correlation between study outcomes and sample size (r =.02, for the 217 outcomes that include a sample size). The explanation for this result is as follows. Smaller studies, when they document significant results, are likely to be published precisely because they have obtained significant results, often with impressive effect sizes (which are generally required to reach statistical significance with small samples). By contrast, smaller studies with null results presumably tend to languish in file drawers (the so-called file drawer problem--Rosenthal, 1987). Larger studies, which are likely to be published regardless of outcome, tend to show a high rate of statistically significant results because the statistical power of such studies is greater than it is in smaller studies. The net consequence of these two opposing tendencies is that published birth-order findings are statistically significant in large studies just about as often as they are in small studies. Hence weighting study outcomes by sample size does not appreciably alter the results reported in Table 1. Weighting studies by sample size is important, however, in meta-analyses involving effect sizes because larger studies tend to report smaller effect sizes, which are presumably closer to the true effect sizes. (As a consequence, effect sizes tend to exhibit a funnel shape when plotted against the sample size of studies, a phenomenon that is typical of the published literature on virtually every behavioral subject.)
This form of meta-analysis does not address itself to the question of effect sizes, and a formal meta-analysis that does so would certainly be of value. The purpose of my own "vote-counting" and "scoring method" meta-analyses of the birth-order literature has been confined to showing that this literature accords with my predictions about behavioral trends in birth-order research, based on the Five Factor Model of personality. Meta-analytic results appear to support these five predictions, which in turn are consistent with a theory about the kinds of sibling strategies that might be expected in competition for parental investment (Sulloway, 1995, 1996, 2001; Hertwig, Davis, and Sulloway, 2002). Moreover, given the large sample sizes that are collectively encompassed by these results, an effect-size meta-analysis, using mean-weighted findings for each outcome, would not likely alter the general conclusions presented here about significant trends in the overall data, although such an analysis would add useful information about the magnitude of these effects.
Elsewhere, I have sought to determine effect sizes for birth order and personality using large samples and survey instruments that are specifically designed to measure the Five Factor Model of personality at the item and dimension levels (Sulloway, 1999, 2001). In my 1999 study, which involved 4,177 subjects who rated themselves and a sibling on a survey instrument composed of 30 bipolar traits specifically designed to map the Five Factor Model of personality, birth order accounted for 4 percent of the overall variance in a scale score of predicted differences for the five personality dimensions. (For a replication of these findings, see Chao, 2001; and for methodologically similar results and effect sizes, see Paulhus, Trapnell, and Chen, 1999.) In ratings of nonsiblings in this same study, birth order accounted for between 1 and 2 percent of the overall variance explained. A reduction in effect sizes in extrafamilial studies compared with within-family studies is typical of the birth-order literature more generally (see above, "Moderator Effects: Within-Family Studies versus Extrafamilial Studies").
Although birth-order research generally involves effect sizes that may be considered "small," especially in extrafamilial studies, it would be a mistake to dismiss such effect sizes as negligible (Cohen, 1988; Rosenthal and Rubin, 1982). Many supposedly small effect sizes in the behavioral sciences involve relationships of considerable theoretical importance, as well as practical significance. Unfortunately, many researchers do not understand the importance of small effect sizes. Judith Harris writes, for example, "A correlation of .19, even if it is significant in the statistical sense, is all but useless" (1998:19). Were, however, an effect of this magnitude to be found in a clinical test of a new drug, this therapeutic result would be equivalent to a medicine that increases the odds of a treated individual surviving a potentially fatal disease by a factor of 1.8. Clearly anyone who dismisses effects of this magnitude as "useless" will also be inclined to believe that the family as a whole has little influence on personality. Such a mistaken statistical inference also demands the conclusion that sex--one of the largest sources of individual differences--has no meaningful influence on personality, because the mean correlations reported on this topic are usually less than .15 (Feingold, 1994; Costa, Terracciano, and McCrae, 2001). Even a seemingly modest correlation of .10, which explains only 1 percent of the variance, is equivalent to a medicine that would increase the odds of survival among treated patients by 38 percent. Most behavioral scientists would consider it noteworthy if laterborns, relative to firstborns, regularly possess 38 percent greater odds of behaving in a certain manner that reflects differences in personality. Even smaller correlations constitute meaningful effect sizes for developmental sources of behavior. 
For example, a correlation of only .05 is equivalent to a medicine that would increase the odds of surviving a deadly disease by 17 percent over the base rate for untreated individuals.
One should also keep in mind that a true correlation of .25 (in a world where such an effect could be perfectly assessed, without error), becomes a measured correlation of .10 when the reliability of the two variables being measured is a respectable .63. Because of imperfect reliability, the real influence of a variable with a measured correlation of .10 will often be equivalent to a medicine that increases the odds of a treated individual surviving a fatal disease by a factor of more than 2.0. Owing to considerations of statistical power, the likelihood of obtaining a statistically significant effect for a known relationship involving a correlation of .10 is less than 50 percent if the sample size is less than 385 subjects. (The median sample size for the birth-order literature reviewed in Table 1 is 297 subjects.) In order to achieve 95 percent confidence that one obtains a significant outcome given a true and perfectly measured correlation of .10, one needs a sample size of 1,294 subjects, which occurs in only 13 percent of the studies listed in Table 1. It is little wonder, as Cohen (1988:79-80) has observed, that documentation of small effect sizes (r≤.10) is rare in the social sciences, because researchers generally do not employ samples that are large enough to distinguish real effects of this magnitude from null effects. Owing to the typical study design, most birth-order effects appear to be just under the radar screen for detectable personality and behavioral differences. These effects do exist, however, and are psychologically meaningful, but they tend to be confused with null effects in the all-too-frequent studies that lack sufficient statistical power to document them.
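The sample sizes quoted above can be approximated with the Fisher z method for the power of a correlation test. The sketch below assumes a two-tailed test at the .05 level and the usual normal approximation. The function name is hypothetical, and the source does not state which power formula was actually used.

```python
from math import atanh, ceil
from statistics import NormalDist


def required_n(r, power, alpha=0.05):
    """Approximate N needed to detect correlation r (two-tailed test).

    Uses the Fisher z approximation:
    N = ((z_alpha + z_power) / atanh(r))**2 + 3.
    """
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_power = norm.inv_cdf(power)           # 0.0 when power is .50
    return ceil(((z_alpha + z_power) / atanh(r)) ** 2 + 3)


n_half = required_n(0.10, power=0.50)   # sample size for 50 percent power
n_high = required_n(0.10, power=0.95)   # sample size for 95 percent power
```

Under these assumptions the two calls give 385 and 1,294 subjects, in line with the figures cited in the text.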
Some birth-order effects--principally those that stem from differences in parental investment--are expected to express themselves as quadratic trends, with middleborns being higher or lower on various traits compared with firstborns and lastborns (Salmon and Daly, 1998; Hertwig, Davis, and Sulloway, 2002). Such quadratic effects involve variance that often goes underreported in birth-order research, because effect sizes--if reported at all--are usually provided only for linear trends without the quadratic component. To the extent that birth-order effects sometimes entail quadratic components, reported effect sizes--either as linear trends or as dichotomizations (firstborns versus laterborns) will underestimate the true magnitude of these effects. The same problem arises for birth-order effects that entail zigzag trends, which are expected to occur when siblings seek to maximize their differences from immediately adjacent siblings within the family (Hertwig, Davis, and Sulloway, 2002).
In interpreting the importance of modest effect sizes, behavioral scientists do well to bear in mind that only about 40 percent of the variance in most personality traits is available for explanation by environmental factors, after genetic and error variance are taken into account (Loehlin, 1992). If birth order turns out to explain just 1 percent of this remaining variance under certain predictable behavioral conditions, only 39 other influences of a similar magnitude are required to account for all of the remaining variance that can be attributed to purely environmental causes of behavior. Given the complexity of human behavior, as well as its myriad sources, who would reasonably expect any single environmental influence to explain substantially more than this (Turkheimer and Waldron, 2000)? The proper research attitude, then, is to point with pride to the documentation of such "small" effects and to appreciate that they are really not so small after all. In short, the results summarized in Table 1 are consonant with what would be expected if birth order explains a modest amount of the variance in personality, thereby showing consistent trends when data are amalgamated according to predictions based on the Five Factor Model of personality.
Overlooked Studies with Controls
In my earlier meta-analysis, I included studies in Ernst and Angst's (1983) review of the literature, regardless of outcome, that I encountered in the course of my own researches that turned out to contain unreported controls (Sulloway, 1995, 1996). The addition of such overlooked studies and outcomes raises a valid methodological objection (Townsend, 2000). Because I did not make a systematic examination of every publication listed in Ernst and Angst (1983) as being uncontrolled, one may object that a nonsystematic addition of such studies could introduce a source of bias. We can formally test for this possibility--something that Townsend, who eschews formal hypothesis testing throughout his critique of my meta-analysis, does not do. By including the 27 overlooked outcomes from the 22 controlled studies that I previously identified in this manner, the proportion of favorable outcomes in my results rises from 43 to 44 percent--a modest deviation that is well within an expected range of random statistical fluctuation. Although these 27 outcomes with overlooked controls are not significantly different from the 230 outcomes that I have counted in Table 1, I have not included them in these results. The only studies with overlooked controls that have been included in Table 1 are studies from a small number of publications in which controls were incorrectly reported by Ernst and Angst (1983) in one place in their book and then correctly reported in another place in their book--see the Appendix ("CE/CE+/-").13
In spite of my previous efforts to identify additional controlled studies, there are further studies that Ernst and Angst (1983) report as being uncontrolled but that are, in fact, controlled. A search that I have conducted of the literature using PsychINFO has identified numerous birth-order publications that are controlled for sibship size or social class but that are not cited in Ernst and Angst's (1983) book. Because I have confined my scoring-method meta-analysis in Table 1 to studies analyzed by Ernst and Angst, I have not attempted to include in this table any additional controlled studies that are known to exist. There is little reason to believe, moreover, that the addition of more studies and outcomes would alter the robust results presented either in Table 1 or below, where this claim is formally tested by surveying such missing studies.
Townsend (2000) claims to have found 26 additional publications with controls, previously listed by Ernst and Angst (1983) as being uncontrolled, that are not already included in my own results. Unfortunately, Townsend presents no information indicating whether the relevant control variables in these studies were applied during formal hypothesis testing (see the Appendix, "CE, CE +/-") or are merely variables mentioned in the study, a serious confusion that also undermines the integrity of his other listings and tallies. In addition, Townsend does not indicate how these controlled studies were identified, a troubling methodological omission that takes on special importance given the fact that Townsend's list of overlooked studies is distinctly anomalous. As scored by him, his list contains significantly more nulls and disconfirming results than are to be found either in Townsend's other 190 scorable tallies drawn from Ernst and Angst (see his Appendix A) or in the 230 outcomes coded in my own Table 1. Moreover, Townsend's anomalous list of pre-1981 studies with overlooked controls is by no means complete. I and a coworker conducted a search of the birth-order literature using PsychINFO and the following key terms: (1) "personality, birth order, and sibship size"; (2) "personality, birth order, and family size"; (3) "personality, birth order, and social class"; (4) "personality, birth order, and socioeconomic status"; and (5) "personality, birth order, and SES." This search turned up 63 relevant publications, of which 34 are included in Ernst and Angst's (1983) bibliography. Of the 34 publications reviewed by Ernst and Angst (1983), 14 are already included in my own meta-analysis, and another 10, overlooked by Townsend, turn out to contain valid personality findings and controls for either sibship size, social class, or both variables.
In addition, Townsend's tallies in his Appendix B are marred by a variety of mistakes, including scoring errors, omissions of relevant findings, and repeated misstatements about controls. According to him, his 26 publications report 35 relevant findings. However, 7 of these 35 findings are not controlled for either sibship size or social class, as independently assessed by me and a coworker according to the guidelines described below (see the Appendix, "CE, CE +/-"). Control variables are mentioned in 6 of these 7 studies; but, in these 6 instances, controls were not employed in any of the statistical tests or present even as "partial controls," using some form of indirect test. When uncontrolled and redundant findings are eliminated from Townsend's Appendix B, when 6 controlled findings that Townsend overlooked are included in the results, and when controlled findings are properly coded as "outcomes"--one per dimension of the Five Factor Model of personality--they become 3 full confirmations, 2 partial confirmations, 21 nulls, 2 partially opposed outcomes, and 1 fully opposed outcome, for a total of 29 results. These 29 outcomes may be contrasted with Townsend's own tallies of the same results: 4 confirmations, 21 nulls, and 10 opposed findings (for a total of 35 findings). As is apparent, most of Townsend's illegitimate results turn out to be opposed findings and account for the fact that his tallies are significantly more unfavorable to my hypotheses than all other controlled findings listed in Ernst and Angst (1983).
Here are a few examples showing how Townsend's own analysis has turned uncontrolled studies into controlled studies, and also showing how his treatment has turned null or confirming outcomes into opposed findings. Townsend interprets as an opposed outcome Dauphinais and Leitner's (1978) finding that laterborn males were more willing than firstborn males to join an encounter-type group--defined by Dauphinais and Leitner as a group promoting "human growth or sensitivity-training." Dauphinais and Leitner, however, explicitly hypothesized that firstborns, whom they expected to express more extreme fears than laterborns, would be more threatened by such encounter groups and hence would be more fearful about joining them. Accordingly, if one accepts the authors' own interpretation of their findings, this result ought to be counted as a full confirmation, not as an "opposed" outcome (as Townsend has counted it). I and Oliver John both independently coded this result on the dimension of Openness to Experience (rather than on Neuroticism), but the outcome is a full confirmation on either personality dimension.
In coding this study by Dauphinais and Leitner (1978), Townsend has made a second error. Apparently following Ernst and Angst (1983), who listed separate effects for males and females, Townsend has counted two different outcomes (an opposed finding and a null--one outcome for each sex). Neither of these two effects, however, is properly controlled for sibship size or social class and neither includes formal statistics. The single partial correlation reported in this study, which is statistically significant and is controlled for sibship size, does not differentiate between the sexes. In short, this study contains only one controlled outcome. When properly interpreted according to the Five Factor Model of personality, this single controlled outcome is a confirmation--not a null finding and an opposed finding, as Townsend claims.
Townsend's interpretation of a publication by Nystul (1974) exemplifies his failure to understand what is a controlled study. Regarding the use of sibship size as a control, Nystul states that his 168 subjects came from families of, ostensibly, two-to-four siblings. An unknown number of these study participants, however, possessed additional siblings six or more years older or younger than other siblings. Because Nystul disregarded these additional siblings in his assessments of birth order and apparently of sibship size, an unknown number of Nystul's firstborns appear to have been laterborns. In addition, nowhere in Nystul's study are his results stratified by sibship size, and sibship size was not included in his analysis of variance model along with birth order. Similarly, although Nystul collected data on socioeconomic status, nowhere in the study are his results stratified on this variable into the three disparate socioeconomic categories that were encompassed by his subjects, and socioeconomic status was also not included in the analysis of variance model. In short, this study is not controlled for sibship size or social class, and its outcome (a null) should not therefore have been included in Townsend's own tallies.
One further example will suffice to illustrate Townsend's frequent inaccuracies about the status of outcomes and controls. According to Townsend, Lichtenwalner and Maxwell (1969) found that firstborns are more creative than laterborns (an opposed finding) in a study that is purportedly controlled for social class. Social-class data were indeed collected by these authors, and subjects were then stratified into two groups (from the middle and lower classes). The authors, however, found that the middle-class children in their sample were significantly more creative than the lower-class children, and they also found that firstborn children in their sample were significantly more likely to come from middle-class families than were laterborn children. Because this study includes neither a direct nor an indirect control for these confounding effects of social class--for example by conducting a formal analysis of variance or by conducting separate statistical tests on the birth-order data that are stratified by social class--the published evidence is indeterminate on whether the reported birth-order finding would continue to be statistically significant if this result had been properly controlled. Because this study also lacks a control for sibship size, its opposed finding should not have been counted by Townsend.
From Table 3 in the study by Lichtenwalner and Maxwell (1969) it is nevertheless possible to calculate, by hand, separate statistical results for the reported birth-order effects, for each social class, something that the authors of this study did not themselves do. When these two tests are conducted, they are both nonsignificant. In addition, when the two effect sizes from these tests are transformed from r to rz, weighted by sample size, and then combined and transformed back to r, this combined result also proves to be nonsignificant. Based on these additional computations, the results of this study can be considered as partially controlled. The outcome of the study, however, is no longer an opposed finding but rather a null result (which is how I and a coworker have counted it).
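The pooling step described above (transform each r to Fisher's z, weight, average, and transform back) can be sketched as follows. The weights are an assumption: the sketch uses n - 3, the inverse of the sampling variance of z, whereas weighting by n itself is a close alternative. The two (r, n) pairs are hypothetical and are not taken from Lichtenwalner and Maxwell's Table 3.

```python
from math import atanh, tanh, sqrt
from statistics import NormalDist


def combine_correlations(results):
    """Combine (r, n) pairs via Fisher's z, weighting each z by n - 3.

    Returns the pooled r and its two-tailed p value.
    """
    weights = [n - 3 for _, n in results]
    z_values = [atanh(r) for r, _ in results]
    z_bar = sum(w * z for w, z in zip(weights, z_values)) / sum(weights)
    z_stat = z_bar * sqrt(sum(weights))   # pooled z divided by its standard error
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return tanh(z_bar), p_two_tailed


# Hypothetical values for two social-class strata (not from the study).
pooled_r, p_value = combine_correlations([(0.15, 35), (0.10, 33)])
```

With strata this small, a pooled correlation of modest size remains nonsignificant, which is the pattern described in the text.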
Townsend's errors about the nature of controls are commonplace because he simply does not understand what is meant by a statistical "control." The presence of a variable in a study, without its inclusion in any of the relevant statistical analyses, does not constitute a control. Of Townsend's 35 findings in his Appendix B, I and a coworker have counted 11 instances in which he has erroneously asserted the presence of a sibship size control and 10 instances in which he has erroneously asserted the presence of a social class control. In all, Townsend has erroneously reported the status of controls in 23 of his 35 findings, including 17 instances in which he has reported controls that are not actually present, and another 6 instances in which he has failed to report valid controls (direct or indirect) that are actually present. All in all, the total number of errors that Townsend has made for these findings, including results that he overlooked, is greater than 40 (Table 3). In short, Townsend appears to lack the technical expertise to make competent judgments about the contents of the birth-order literature.
If we now consider all known studies with overlooked controls (those studies with legitimate controls and outcomes that were found by Townsend, those controlled studies previously identified by me [Sulloway, 1995], and those studies incorrectly classified by Ernst and Angst (1983) that I and a coworker subsequently found via a literature search using PsychINFO), the overall results are as follows: 20 full confirmations, 7 partial confirmations, 36 nulls, 4 partially opposed outcomes, and 1 fully opposed outcome, for a total of 68 outcomes (Table 3). The overall proportion of full and partial confirmations is 40 percent, nearly the same proportion as for the results listed in Table 1 (43 percent). Although there are too few outcomes in these data to conduct meaningful statistical tests for each dimension of the Five Factor Model of personality, we can use the scoring method to amalgamate the overall results into single outcomes for each study and then perform an omnibus test on the resulting trend (t=4.53, N=52, p<.0001).
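The proportion quoted above follows directly from the tallies, and the omnibus test is, in form, a one-sample t-test of per-study scores against zero. The sketch below assumes the five steps are coded +2 to -2, which is a hypothetical coding. The reported t of 4.53 cannot be recomputed here, because it rests on 52 per-study scores that the text does not list, so the scores passed to the function are invented for illustration only.

```python
from math import sqrt
from statistics import mean, stdev

# Tallies for studies with overlooked controls, as reported in the text.
# Keys are the assumed score codes; values are numbers of outcomes.
tallies = {2: 20, 1: 7, 0: 36, -1: 4, -2: 1}

n_outcomes = sum(tallies.values())                         # 68
share_favorable = (tallies[2] + tallies[1]) / n_outcomes   # about 0.40


def one_sample_t(scores):
    """t statistic for the null hypothesis that the mean score is zero."""
    return mean(scores) / (stdev(scores) / sqrt(len(scores)))


# Hypothetical per-study scores, for illustration only.
t_value = one_sample_t([2, 1, 0, 0, 1, -1, 2, 0])
```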
In summary, the outcomes reported in Table 1 are statistically significant for all five dimensions of the Five Factor Model of personality, with or without the addition of all known studies with overlooked controls (Table 3). Townsend fails to appreciate this fact because he fails to undertake formal hypothesis testing in connection with his own flawed and untested claims. These mistaken claims rely on data, moreover, that are seriously marred by numerous scoring errors, the omission of relevant data, fundamental statistical misunderstandings, the illegitimate counting of redundant effects, and, finally, repeated misjudgments about what it means for a study to be controlled.
Additional Examples of Errors and Omissions by Townsend
According to Townsend (2000, Appendix C), the following publications, which I previously informed him were erroneously or incompletely summarized by Ernst and Angst (1983), are correctly summarized by these two investigators. Hence these seven publications, according to Townsend, were correctly counted by him and were incorrectly assessed by me in my previous meta-analysis (Sulloway, 1995, 1996). The publications in question are Biegelsen (1976), Eysenck and Cookson (1970), Krinsky (1963), Macbeth (1975), Sutton-Smith and Rosenberg (1970), Tomeh (1970), and Yando, Zigler, and Litzinger (1975). In every instance, Townsend has made one or more mistakes in his assessments of these seven publications, as well as in one of two studies by Koch (1955a-1957), which is also mentioned in Townsend's (2000) Appendix C. A brief review of these errors is warranted in order to demonstrate the repeated inadequacies in Townsend's treatment of important technical and statistical issues. (The same inadequacies are also present in Harris's (1998, 2001) treatment, which, like Townsend's analysis, relies on Ernst and Angst's often flawed summaries.)14
Biegelsen (1976): Townsend has made multiple mistakes in assessing Biegelsen's two studies. Some of these errors are methodological. Others are due to Townsend's failure to consult the unpublished dissertation, which reports numerous findings that are not mentioned by Ernst and Angst (1983). Where Townsend has counted 5 nulls from only one of the two studies (including 2 redundant nulls for the same dimension of personality), I and a coworker have coded 1 fully favorable outcome, 2 partially favorable outcomes, 4 null outcomes, and 1 partially opposed outcome, for a total of 8 outcomes from both studies, based on more than 20 different findings. Moreover, the summary of Biegelsen's results in Dissertation Abstracts International should have alerted Townsend to the fact that these results cannot be reliably coded from this published summary of the original dissertation. About Biegelsen's two studies, the published abstract states that "while there were instances where birth order correlated in the direction which the hypotheses had predicted, in no instance were the correlations cross validated in both samples. Many of the results which reached a high level of statistical significance . . . did not replicate." If there were "many" highly significant results in each study that did not replicate in the other study, then one cannot safely count as "nulls" all of the results from either study, as Townsend has done. Rather, outcomes for each dimension of the Five Factor Model of personality need to be assessed by some form of scoring rule that weighs the overall proportion of confirming findings against the overall proportion of null and opposed findings. 
Because the published abstract does not provide this information, one must either exclude this dissertation from further analysis (as I did in Sulloway, 1995, 1996) or consult the original source, which in turn necessitates a very different set of codings from those corresponding to the incomplete summaries provided by Ernst and Angst (1983).
Eysenck and Cookson (1969, 1970): This study reports significant findings for extraversion and neuroticism, based on composite measures derived by factor analysis, and these two composite measures are both indirectly controlled for sibship size and social class. Individual scales, however, reveal mostly nonsignificant results for the same sample when controlled for sibship size alone (Eysenck and Cookson, 1970). This discrepancy owes itself in large part to the lower statistical power of the individual tests performed in Eysenck and Cookson (1970), compared with their other published results (Eysenck and Cookson, 1969). Townsend apparently failed to consult the first of these two publications and hence was unaware of the dilemma in coding that is presented by the two sets of conflicting results derived from the same data. In Table 1, I have conservatively coded these two results as nulls, but another investigator could legitimately defend coding them as two favorable outcomes, based on the 1969 publication. Townsend has also failed to note that the results for extraversion are indirectly controlled for social class in Eysenck and Cookson (1970). Finally, Townsend has overlooked a third outcome in this study, which supports the hypothesis that firstborns exhibit greater achievement (conscientiousness) than laterborns.
Koch (1955a, 1955b, 1956a, 1956b, 1956c, 1956d, 1956e, 1957): Publications by this researcher contain substantially more findings than are mentioned by Ernst and Angst (1983), and some of these findings are also reported incorrectly by Ernst and Angst. Moreover, proper scoring of these findings leads to different results than are claimed by Townsend, who counts 8 results where only 5 are theoretically allowable according to the Five Factor Model of personality. Where Townsend has counted 2 confirmations, 5 null results, and 1 opposed finding, application of the scoring method to Koch's more than fifty main effects about personality yields 3 partial confirmations, 1 null outcome, and 1 partially opposed outcome.
Krinsky (1963): Following Ernst and Angst (1983), Townsend has counted a null result and an opposed result from this study. The null result, however, appears to have involved three statistically significant findings that were misreported by Ernst and Angst. Unfortunately, this study does not provide formal statistics, so it is impossible to know for certain which of these various results are statistically significant and which ones merely involve trends in the reported direction. In addition, the study lacks formal controls for sibship size and social class, although the data necessary to apply such controls were collected by Krinsky, a circumstance that led Ernst and Angst (1983) to list this study as being controlled (see the Appendix, "CE/CE+/-"). So this study contains overlooked and ambiguously reported findings, fails to present proper statistics, and lacks true controls. Being statistically ambiguous as well as uncontrolled, this study and its four outcomes should not be included in any formal meta-analysis of the birth-order literature.
Macbeth (1975): I have already discussed some of the errors that Townsend has made with regard to this publication, which contains two different studies. Townsend failed to consult the unpublished doctoral dissertation, which contains more than a hundred findings that were not reported by Ernst and Angst (1983). In addition, Townsend has counted numerous redundant results on the same dimension of personality, tallying 18 results where only 8 are possible if one is actually attempting to test my five hypotheses. The need for consulting the unpublished doctoral dissertation is made clear by the summary of Macbeth's two studies in Dissertation Abstracts International. This summary reports that "several serendipitous birth-order findings [in the first study] failed to replicate with a new sample of 265 undergraduates." Based on this published summary, one must either omit the indeterminate results of the first study, as I did in Sulloway (1995, 1996), or one must consult the unpublished dissertation in order to determine whether the various unreported findings--some of which are statistically significant--affect the scoring of outcomes. Where Townsend, for Macbeth's first study, has counted 8 nulls and 1 opposed finding, a proper coding of the numerous unpublished results from this study using the scoring method leads to 1 full confirmation, 3 nulls, and 1 partially opposed finding. Where Townsend, for Macbeth's second study, has counted 9 nulls, a proper coding of these results, using the scoring method, produces only 4 nulls--one for each of four relevant dimensions of the Five Factor Model of personality.
Sutton-Smith and Rosenberg (1970: 101-102): This study involves the use of the Bene-Anthony Family Relations Test, which Ernst and Angst (1983:94) claimed had produced no significant differences by birth order. Townsend himself has asserted that the results are ambiguous, so he did not count them. However, these results can be unambiguously scored on two different dimensions of the Five Factor Model of personality, as independently judged by myself and two other researchers. On two of four relevant Bene-Anthony scales, laterborns received significantly more negative interactions from their older siblings and also were less likely to be judged as manifesting negative behaviors themselves. With the concurrence of Oliver John, I and a coworker coded these results as a confirmation of the hypothesis that laterborns are more agreeable than firstborns. In a separate set of findings, firstborns were found to be more parent-oriented than laterborns on four of eight measures, which we scored, like other studies of this nature, as a confirmation of the hypothesis that firstborns are more dependent on parents and parental authority (Openness to Experience--Costa and McCrae's Values facet).
Sutton-Smith and Rosenberg (1970:113): Townsend has counted a confirmation and two null results from this study. The confirmation, which involves identification with parents, is supported by a formal statistical test, so I and a coworker included this result in our meta-analysis. None of the other results from eight different personality scales is accompanied by formal statistical tests, and no such tests can be conducted from the data actually provided. For this reason, I and a coworker did not count any of these additional personality findings, which may or may not contain statistically significant results. By contrast, Townsend counted two null results from these ambiguous data--one for "aggression" and one for "personality" in general. Had any of the findings about "personality" been legitimately codable, however, they ought to have been assessed on the four different dimensions of personality that pertain to these data. ("Personality" is not a meaningful category in the Five Factor Model.) In addition, Townsend's category for "personality" is partially redundant, given the other results he has counted from the same study for "aggression" (Agreeableness).
Tomeh (1969, 1970, 1971, 1972): Townsend has failed to consider numerous findings from the original publications that went unreported by Ernst and Angst (1983). A proper coding of this study requires reanalysis of some of the data, contrasting orientation toward parents with orientation toward others. For each of three socioeconomic groups, I and a coworker computed effect sizes for each of 12 relevant measures, which we consolidated by transforming r to rz, weighting these values by sample size, and then transforming the weighted rz back to r. Where Townsend has counted 1 null finding relating to orientation to parents, we coded 12 different effects (5 of which were statistically significant, yielding a full confirmation by the scoring method). In addition, Townsend has counted a second finding relating to extraversion as a favorable outcome even though the two distinct findings on this subject, after being controlled for social class and sibship size, turn out to be nulls and ought to have been counted as a single null outcome. Lastly, Townsend has failed to recognize that some of the data in these four publications are controlled for sibship size as well as social class.
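The consolidation step described above (transforming r to rz, weighting by sample size, and transforming the weighted rz back to r) can be sketched as follows. This is only an illustration of the general technique, not the computation actually performed on Tomeh's data: the function name and the example correlations and sample sizes are hypothetical, and weighting each rz by n (rather than by n - 3, which is also common) is an assumption.

```python
import math

def weighted_mean_r(rs, ns):
    """Combine correlations via Fisher's r-to-z transform, weighting by sample size."""
    if len(rs) != len(ns) or not rs:
        raise ValueError("rs and ns must be non-empty and of equal length")
    # Fisher's transform: rz = atanh(r)
    zs = [math.atanh(r) for r in rs]
    # Sample-size-weighted mean of the transformed values (assumed weight: n)
    z_bar = sum(n * z for n, z in zip(ns, zs)) / sum(ns)
    # Back-transform the weighted mean rz to r
    return math.tanh(z_bar)

# Hypothetical values for illustration only (not taken from Tomeh's publications).
example_rs = [0.10, 0.20, 0.30]
example_ns = [100, 200, 100]
combined_r = weighted_mean_r(example_rs, example_ns)
print(round(combined_r, 3))
```

Averaging in the rz metric rather than averaging raw correlations is the conventional choice because the sampling distribution of rz is closer to normal.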
Yando, Zigler, and Litzinger (1975): Townsend has overlooked at least four relevant findings from this publication that were not reported by Ernst and Angst (1983). On the subject of behavioral wariness, Yando et al. conclude: "First-born were found to be more wary than later-born on the cosatiation and Placing Task measures" (p. 108). In addition, Townsend counted as a null result a series of findings about reinforcement that in some cases are statistically significant and partially confirmatory. When Yando et al. (1975) analyzed their data in a three-way analysis of variance (birth order by reinforcement condition by game type), they obtained a main effect of borderline significance (p<.10)--firstborns were more susceptible to social reinforcement. When the only children in the sample were excluded from this analysis, this main effect became statistically significant. In another task there was also a significant difference by birth order, although this outcome is mostly attributable to the only children in the sample and is appropriately scored as a null result. As Yando et al. themselves concluded about their overall results: "The findings of the present study clearly indicate that birth order makes a difference in children's responsiveness to social reinforcement" (p. 109). Curiously, Townsend (2000, Appendix C) passes over this key sentence summarizing Yando et al.'s significant empirical findings and instead quotes the next sentence of the article in which these authors urge caution in how one ought to interpret their overall empirical findings: "The clear [that is, statistically significant] effects of birth order appear to be so vitiated by other variables, such as the number of siblings, the last-born phenomenon, and possibly the time between sibling births, that no simple birth order formulation does justice to the complexity involved" (pp. 109-110). 
This statement about possible causal mechanisms, which makes sense in the overall context of the article, is misleading when quoted in isolation. The time interval between sibling births, for example, was not even investigated by Yando et al., and the "last-born phenomenon" that these authors mention does not in any way negate the significant effects reported for birth order and reinforcement. Because Yando et al.'s empirical findings about birth-order and social reinforcement were fully controlled for sibship size, these findings are legitimately countable in a meta-analysis no matter what other causal mechanisms these investigators speculated might be contributing to the observed effects. Townsend has failed to understand that meta-analysis involves the coding of empirical results, not speculative discussions about empirical results.
In the case of three other studies in his Appendix C, Townsend (2000) has acknowledged that my previous warnings to him about methodological problems, errors, or omissions were correct and hence that he is in error in his own assessments of these studies in his Appendix A. To sum up, of the numerous publications about which Townsend was previously warned and which he claims to have personally examined, he has made at least one scoring error (and sometimes more) for each study in his own Appendix A, and he has recognized only three of the errors associated with these numerous studies in his Appendix C. At the level of "outcomes," which Townsend has consistently failed to distinguish from multiple "findings" in his own analysis, Townsend's results are even more problematic because his tallies include many redundant findings that should not have been counted more than once for any given personality dimension. In short, Townsend's (2000) claim that the reporting errors I previously identified "were mostly Sulloway's, not Ernst and Angst's" is incorrect. This conclusion is reinforced, moreover, by testimony from one of the authors of this study. According to Cécile Ernst, who performed the literature review in Part 1 of Ernst and Angst (1983), there are "many" errors and omissions in her own informal assessments of this literature (personal communication, 30 October 2000). In short, Townsend's incorrect claims about the source of errors in Ernst and Angst's (1983) literature review appear to reflect an uncritical reliance on incomplete and sometimes inaccurate summaries of the birth-order literature, as well as a pervasive lack of technical competence in Townsend's own understanding of statistical conventions and data.
There are many more studies in which Townsend makes reporting errors that are similar to the ones I have just reviewed. These numerous errors on Townsend's part are summarized via "error codes" in the Appendix to this report (see also Table 2). Of the more than three hundred results that either Townsend or I and a coworker have coded, and that are classifiable as personality traits, we disagree on the status of more than half of them. The Appendix (Table 2) provides one or more explanations for each of these discrepancies in our respective assessments.
META-ANALYTIC RESULTS (1981-1999)
Using PsycINFO, I have conducted a literature search for birth-order studies published between 1981 and the end of 1999. Using the search phrase "birth order," this literature search retrieved 1,169 publications. In order to obtain a more manageable sampling of these publications, I conducted four additional searches using the keywords (1) "birth order and family size," (2) "ordinal position and family size," (3) "birth order and sibship size," and (4) "ordinal position and sibship size." Inclusion of the keywords "family size" and "sibship size" was intended to increase the likelihood that studies retrieved by this method would be controlled, at minimum, for sibship size. Altogether, these search protocols resulted in the retrieval of 225 publications. I and a coworker (Christopher Davis) then read through all of the publication abstracts. We selected for further analysis all journal articles in the English language that appeared to contain personality data. We excluded from consideration review articles lacking original data, as well as any publications that discussed exclusively psychiatric disorders, such as anorexia nervosa and schizophrenia. We found 94 publications that met these preliminary inclusion criteria.
The results from these 94 publications were then independently analyzed and coded by myself and Davis. Independent classifications of findings were also made by myself, Davis, and Oliver John. After we had each finished our independent assessments, we attempted to reconcile any differences in our codings and classifications within the Five Factor Model of personality. Studies for which there was not full agreement on classification status, or that were judged to contain data unrelated to the Five Factor Model of personality, or that did not contain proper controls for sibship size or social class, were not included in our subsequent analysis. After the completion of this stage in our review process, 56 publications remained containing 62 controlled studies, 363 findings, and 114 outcomes. [15]
For the period from 1981 to the end of 1999, which was not covered by Ernst and Angst's (1983) literature review, the distribution of outcomes for these 62 studies controlled for sibship size or social class is as follows: 20 full confirmations, 21 partial confirmations, 64 nulls, 4 partial refutations, and 5 full refutations, for a total of 114 tests of my hypotheses. The ratio of fully favorable outcomes to fully opposed outcomes is 4 to 1. These 114 outcomes are controlled for sibship size (N=47), social class (N=8), or both variables (N=59). The overall proportion of full and partial confirmations is 36 percent, modestly lower than the rate of such confirmations for studies published before 1981 (43 percent). A full list of these birth-order results retrieved by my search of the literature from 1981 to 1999 is presented in Table 4 (see the Appendix).
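The arithmetic behind these summary figures can be checked directly from the counts given above. The short sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
# Outcome counts for 1981-1999 as reported in the text, keyed by outcome code.
counts = {+2: 20, +1: 21, 0: 64, -1: 4, -2: 5}

total = sum(counts.values())
# Ratio of fully favorable (+2) to fully opposed (-2) outcomes
full_ratio = counts[+2] / counts[-2]
# Share of full and partial confirmations among all outcomes
confirm_share = (counts[+2] + counts[+1]) / total

# Breakdown by type of control, also as reported in the text.
controls = {"sibship size": 47, "social class": 8, "both": 59}
assert sum(controls.values()) == total

print(total, full_ratio, round(100 * confirm_share))  # prints: 114 4.0 36
```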
When examined by individual dimensions of the Five Factor Model of personality, the trends for the predicted relationships are statistically significant only for Conscientiousness, which has the largest number of outcomes (N=34, t=6.33, p<.0001). The trends for the other four dimensions are all in the expected direction, however. If each study is coded only once for the overall proportion of confirming or opposed findings, using the scoring method, the trend for the data as a whole is also statistically significant (N=62, t=5.01, p<.0001). When these data are added to the birth-order data from 1940 to 1980, the results for all five dimensions of the Five Factor Model of personality continue to confirm the hypothesized trends.
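This passage does not spell out which test produced the t values, so the following is only one plausible reading: a one-sample t test of the outcome codes (+2 to -2) against a null mean of zero. As an assumption for illustration, it is applied to the pooled distribution of 114 outcomes reported for 1981-1999, a quantity for which the report gives no t statistic; the result should therefore not be expected to match the per-dimension or per-study values quoted above.

```python
import math
import statistics

# Pooled 1981-1999 outcome distribution as reported in the text.
counts = {2: 20, 1: 21, 0: 64, -1: 4, -2: 5}
# Expand the counts into one outcome code per outcome.
codes = [code for code, k in counts.items() for _ in range(k)]

n = len(codes)
mean = statistics.mean(codes)
sd = statistics.stdev(codes)           # sample SD (n - 1 denominator)
t_stat = mean / (sd / math.sqrt(n))    # H0 (assumed): mean outcome code is 0
df = n - 1

print(n, round(mean, 3), round(t_stat, 2), df)
```

A positive t under this reading indicates that confirming outcomes outweigh opposed ones by more than chance would predict; the same function could be applied to the codes for a single personality dimension.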
Harris's (1998) Survey of Article Abstracts
These collective results for the post-1980 literature disagree substantially with those reported by Harris (1998), both in their totals and in the overall distribution of outcomes relative to the null hypothesis. Harris's own counts are: 7 favorable, 5 mixed (direction unspecified), 20 nulls, 6 unfavorable, and 12 "unclear," for a total of 33 scorable outcomes with unknown controls (compared with at least 114 controlled outcomes in 62 studies that can be documented by actually examining the original literature). For Harris's data, the ratio of confirming to opposed findings is only 1.2 to 1, compared with a ratio of 4.0 to 1 found by me and a second investigator. It should be noted that Harris appears to have conducted her assessments solely by reading article abstracts available on the Internet from PsycINFO. Moreover, Harris does not seem to have checked any of these studies for the presence of controls for sibship size or social class, or taken this information into account in evaluating her results. Because Harris's tallies are grossly incomplete, lack information about controls, and apparently lump together findings from different dimensions of the Five Factor Model, they cannot be used to test any of the five hypotheses being assessed here and are therefore irrelevant to an evaluation of my previously published findings (1995, 1996). [16]
Overall, the birth-order literature from 1940 to the end of 1999 shows significant trends for all five dimensions of the Five Factor Model of personality, although the trend for Neuroticism is somewhat weaker than for the other four personality dimensions. Compared with laterborns, firstborns tend to be more conscientious, extraverted (in the sense of being assertive), and neurotic, whereas laterborns, compared with firstborns, tend to be more agreeable, open to experience, and extraverted (in the sense of being sociable and fun-loving). Discrepancies in the tallies reached by different investigators, whether caused by different methods of analysis or by a failure to examine the original literature, should not be confused with the more important issue of whether statistically significant trends emerge when tallies of independent outcomes are subjected to hypothesis testing using formal statistical methods. When one conducts five separate meta-analyses on the available birth-order data--one for each dimension of the Five Factor Model of personality--significant and robust trends emerge by any generally accepted method of analysis.
These meta-analytic trends hold for the literature published between 1940 and 1980, as well as for the more recent literature published between 1981 and 1999, as independently assessed by two investigators. Altogether, the literature from 1940 to 1999 presents at least 412 partially or fully controlled meta-analytic outcomes, which in turn have been coded from more than 1,400 individual findings culled from 240 separate studies. Overall, this literature contains 12 times as many fully confirming outcomes as there are fully opposed outcomes. Instituting full controls for sibship size and social class, as well as for sample size and for the nature of the sample, does not appreciably alter these general conclusions. Although effect sizes for these results are generally modest, they are by no means negligible if one understands the real-life importance of "small" effect sizes (Cohen, 1988; Rosenthal and Rosnow, 1991; Rosenthal, Rosnow, and Rubin, 2000). Within-family effects, such as those found when siblings rate one another, are somewhat larger than when birth-order differences are assessed outside the family context, especially in self-report data.
If one compares the five trends in the meta-analytic outcomes in Table 1 with the five trends that I published in Sulloway (1995, 1996)--scoring the results for Extraversion in the same manner--the resulting correlation between trends is .95. This comparison shows that the method of analysis that I employed in my earlier meta-analysis, based on a smaller and less complete subset of studies and findings, yields nearly identical conclusions. In sum, although my previous vote-counting meta-analysis of the birth-order literature (Sulloway, 1995, 1996) is vulnerable to several technical criticisms, none of these criticisms appears to make an appreciable difference in the overall results.
The Role of the Family
Based on twin studies, the shared environment appears to explain at least 5 percent of the variance in personality during adulthood (Dunn and Plomin, 1990; Loehlin, 1992: Table 3.20). On first consideration, 5 percent may seem like a negligible effect, leading some commentators to claim that the family exerts little or no influence on personality (Rowe, 1994; Harris, 1998; Pinker, 2002). Nevertheless, 5 percent of the variance is equivalent to a correlation of .22, which is in turn equivalent to a medicine that increases a person's likelihood of surviving a deadly disease by a factor of 2.0 (the odds ratio). Expressed another way, children who grow up with parents who are above average in conscientiousness or agreeableness are, for purely environmental reasons, twice as likely to be in the top half of the population distribution on these two aspects of personality compared with children who have not grown up with such parents. Such psychological effects are hardly negligible and exceed the influence that sex differences have on personality. Moreover, the family's shared environment appears to exert somewhat greater influence on attitudes and values than it does on personality (Dunn and Plomin, 1990). Especially over time, attitudes and values exert far-reaching effects on people's lives.
These shared environmental influences do not begin to take into account the ways in which countless interactions within the family cause siblings to become different rather than the same. These alternative kinds of within-family influences--the numerous causes of which are difficult to identify and measure--show up as variance attributable to the nonshared environment. This variance is generally believed to be at least seven times greater than the total amount of variance attributable to the shared environment (Dunn and Plomin, 1990; Loehlin, 1992: Table 3.20). The real insight from research in behavioral genetics is not that the family has little influence on personality. Rather, the most important conclusion from this research appears to be that the bulk of the influence of the family environment (including parents) is not shared by siblings. And why should it be? Parents react differently to each of their offspring, because offspring themselves are different. Similarly, the environment of each sibling includes other siblings (but not the self), so each sibling's within-family environment is necessarily different. Furthermore, siblings experience the exact same events, such as divorce or the loss of a parent, at different ages, so such experiences are not truly shared in the same way. In short, the family is not primarily a shared environment, as is misleadingly implied by the assertion that the family exerts little or no influence on personality.
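The two variance figures quoted above (a shared environment explaining at least 5 percent of the variance, and a nonshared environment at least seven times larger) can be restated numerically. The sketch below shows only the standard conversion from variance explained to a correlation (r is the square root of the proportion of variance) and the implied lower bound for the nonshared share; the variable names are illustrative.

```python
import math

shared_variance = 0.05                      # "at least 5 percent", from the text
r_equivalent = math.sqrt(shared_variance)   # variance explained = r squared

nonshared_multiple = 7                      # "at least seven times greater", from the text
nonshared_variance = nonshared_multiple * shared_variance

print(round(r_equivalent, 2), round(nonshared_variance, 2))  # prints: 0.22 0.35
```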
On the whole, then, the family appears to influence both personality and social values in significant and nonnegligible ways--both through the shared and especially through the nonshared environments. Occasional claims to the contrary are based on (1) a failure to acknowledge that the influence of the family is not limited to the influence of the shared environment; (2) a failure to appreciate that most important experiences within the family are not (and cannot be) shared; and finally (3) a failure to appreciate that so-called "small" effects--when expressed in terms of the problematic concept of "variance explained"--are often very impressive when expressed in metrics that are more representative and intuitively understandable. In short, the environment, including the family environment, is still a big player in shaping human behavior. It just works differently than we had previously thought.
As for the influence of birth order, it is only one of many aspects of overall family dynamics by which parental investment, parental values, parent-child interactions, sibling interactions, and family niches all shape personality as children are growing up. The meta-analytic review of the birth-order literature presented in this report is consistent with this overall, multifaceted perspective.
I provide details in this Appendix about the specific studies and findings that I have employed in my scoring-method meta-analyses. The various publications, studies, and outcomes employed in this meta-analysis are listed along with various reporting errors and omissions by Ernst and Angst (1983) that I and a coworker encountered as we examined the original publications. This information is presented in abbreviated form using special letter codes. To understand these abbreviated codes, it is first necessary to consult a "Key," which follows.
Ernst and Angst's (1983) reported results, as summarized by Townsend's tallies (2000, Appendix A), are also itemized in my Appendix to this report. Like the tallies by Harris (1998, 2001), Townsend's counts were compiled directly from Ernst and Angst's (1983) summaries of the birth-order literature rather than from the original publications. For this reason, Townsend's listings agree closely with those of Harris (1998, 2001), which reflect mostly the same outcomes as well as the same errors and omissions.
Key to Error Codes and Abbreviations
What follows in this section is a description of the error codes and abbreviations used in the presentation of birth-order data in Table 2. The various coded designations for errors should be considered only as a general guide to the existence of these problems. Interested researchers should consult the relevant publications and unpublished dissertations for themselves.
Correct Outcomes (+2, +1, 0, -1, -2, 88, and 99):
Under the heading "Correct Outcome," the codes +2, +1, 0, -1, and -2 correspond to assignments of study outcomes made in accordance with the five-step scoring method. The use of "99" indicates outcomes that cannot, or should not, be coded because of missing, ambiguous, redundant, or erroneous information about a reported finding. Outcomes have been scored only once for each relevant personality dimension in each study. For this reason, the outcome that is reported codes not only for the "Findings" listed for that particular record in Table 2 but also for all other relevant findings that are reported in the same study, some of which may be mentioned under separate records (where redundant outcomes are coded 99).
Under the heading "Townsend," I have employed the codings +2, 0, and -2 (confirmatory, null, and opposed) to indicate the outcomes claimed by Townsend (2000), based on Ernst and Angst's (1983) incomplete and sometimes inaccurate summaries of the birth-order literature. Findings that Townsend has designated as "ambiguous" are coded 88. The numerous findings or outcomes that either Townsend or Ernst and Angst (1983) fail to report are coded 99.
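For readers who wish to tabulate Table 2 themselves, the coding conventions just described can be expressed as a small tally routine. This is a sketch under stated assumptions: the constant and function names are hypothetical, the example records are invented for illustration, and the only rules taken from the text are that +2 to -2 are scorable outcomes, 88 marks findings designated "ambiguous," and 99 marks outcomes that are unreported or should not be coded.

```python
from collections import Counter

SCORABLE = {2, 1, 0, -1, -2}   # outcome codes that enter the tallies
AMBIGUOUS = 88                 # designated "ambiguous" (Townsend column)
NOT_CODABLE = 99               # missing, redundant, erroneous, or unreported

def tally(outcomes):
    """Count scorable outcome codes, skipping the special codes 88 and 99."""
    return Counter(code for code in outcomes if code in SCORABLE)

# Hypothetical column of codes for illustration only (not from Table 2).
records = [2, 0, 99, 1, 88, 0, -1, 99]
result = tally(records)
print(sorted(result.items()))
```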
Under the heading "Big Five": Agreeableness in the Five Factor Model of personality.
Under the heading "Big Five": Conscientiousness in the Five Factor Model of personality.
"Control Error" in reporting by Ernst and Angst (1983): Such errors involve instances in which the presence of controls for sibship size or social class (or both variables) are incorrectly or inconsistently reported by Ernst and Angst (1983), or are not reported at all. The status of controls is not always straightforward, so I and a coworker have adopted the following four-step coding scheme in this meta-analysis: (0) studies without any mention of controls; (1) studies in which controls, although mentioned as variables that were included in the study, were not employed in any of the relevant statistical tests; (2) studies in which sibship size and social class were "partially controlled" (for example, by testing for birth-order effects in different socioeconomic subsamples or sibship sizes); (3) studies that were "fully controlled" by the inclusion of control variables in a formal analysis of variance or multiple regression model, or by confinement of the analysis to subjects from, say, two-child families. Studies in which a control variable was not included in a formal analysis of variance, but in which this variable was shown to be unrelated to the dependent variable in a bivariate analysis, have been considered "partially controlled." Codes indicating the status of controls (0-3) appear in my list of findings under the columns "SS" (for sibship size) and "SES" (for social class). Only studies coded "2" or "3" have been considered "controlled" for a given dependent variable in this meta-analysis. By contrast, Ernst and Angst (1983), Townsend (2000), and Harris (1998, 2001) sometimes mistakenly consider studies coded "0" or "1" as being controlled.
Studies incorrectly reported by Ernst and Angst (1983) as being controlled for one or both control variables (CE-) are not included in the meta-analytic results summarized in Table 1. Conversely, studies incorrectly reported as being uncontrolled for both control variables (CE+), as well as studies for which no information about controls is supplied, are included only if the same study is correctly reported elsewhere by Ernst and Angst as possessing at least one control. Studies sometimes contain reporting errors about the status of controls that do not affect meta-analytic codings (CE), and these studies are included in the meta-analysis. Also, some studies contain errors in the reporting of controls for some findings but not others, as individually distinguished in the Appendix.
Although numerous other studies with overlooked control variables are not included with the results summarized in Table 1, these studies have been considered in other analyses. The addition of these studies and outcomes to the results summarized in Table 1 makes no difference in my general conclusions about birth order and its relationship to the Five Factor Model of personality (see above, "Overlooked Studies with Controls" and Table 3).
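The four-step coding scheme for controls (0-3) described in this entry, together with the rule that only codes "2" and "3" count as "controlled," can be summarized in code. The class and function names below are hypothetical. Treating an outcome as controlled when at least one of the "SS" or "SES" columns is coded 2 or 3 is my reading of the report's phrase "controlled for sibship size or social class," not a rule stated in this Key.

```python
from enum import IntEnum

class ControlStatus(IntEnum):
    NOT_MENTIONED = 0        # no mention of controls
    MENTIONED_NOT_USED = 1   # control variable collected but not used in tests
    PARTIAL = 2              # "partially controlled"
    FULL = 3                 # "fully controlled"

def is_controlled(status):
    """Only codes 2 and 3 count as controlled for a given variable."""
    return status >= ControlStatus.PARTIAL

def outcome_is_controlled(ss, ses):
    """Assumed rule: at least one of SS (sibship size) or SES (social class) is controlled."""
    return is_controlled(ss) or is_controlled(ses)

print(outcome_is_controlled(ControlStatus.MENTIONED_NOT_USED, ControlStatus.PARTIAL))
```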
Personality traits that could not be confidently classified on any one dimension of the Five Factor Model.
Properly assessed via the scoring method and according to the Five Factor Model of personality, no personality dimension may be scored more than once in a given study. In connection with Townsend's own results, "D-" generally indicates instances of duplicate or redundant findings for the same study and personality dimension. In some instances, "D-" also signifies results that, although not identical to other counted results, cannot be scored as such because their scoring is affected by the collective scoring of other results reported for the same dimension of personality.
Under the heading "Big Five": Extraversion in the Five Factor Model of personality.
Findings not mentioned by Ernst and Angst (1983), either in whole or in part, but reported in the original literature reviewed by them, including unpublished doctoral dissertations.
Under the heading "Big Five": Neuroticism in the Five Factor Model of personality.
No N is known for the study (also indicated as N=99--the code for missing data). There are 9 such studies reporting 14 codable outcomes via the scoring method. Ns are sometimes available for those studies for which Ernst and Angst (1983) do not report this information. In my previous meta-analysis of the studies listed in Ernst and Angst's tables (1983:93-189), I excluded all studies that did not report an N (Sulloway, 1995, 1996). These studies are included, however, in the present meta-analysis. Owing to this and other methodological considerations, the total number of studies that I previously analyzed in my vote-counting meta-analysis is fewer than the number Harris (1998) and Townsend (2000) included in their own tallies of this literature. My present scoring-method meta-analysis includes 230 outcomes from 126 studies.
Although I have not attempted to indicate this information with error codes, many of the Ns reported by Ernst and Angst (1983) are in error. One must often distinguish between the Ns reported for a given study and the ns involved in those subpopulations that have actually been employed in controlled statistical tests. Owing to a methodological flaw and a transcription error, the total N that I previously reported in my earlier meta-analysis (Sulloway, 1995, 1996) is incorrect, as Harris (1998) has pointed out. The correct unique N for subjects in my original meta-analysis is approximately 67,000. The unique N for the 230 controlled outcomes reported here, in Table 1, is approximately 80,000 (see note 17).
Not a personality trait and hence excluded from the meta-analysis.
"Not verified." I have been unable to verify findings from one unpublished dissertation (Dean, 1947) and so have not included them in my meta-analytic totals. Unpublished doctoral dissertations sometimes contain results that go unreported by Ernst and Angst (1983). In my previous meta-analysis (Sulloway, 1995), I omitted a number of unpublished dissertations from my analysis, since I had not personally examined them and had noted potential problems in coding results based on the summaries presented in Dissertation Abstracts International. My caution in this regard proved to be well advised. In most cases, subsequent examination of these unpublished dissertations resulted in substantially different scorings of the results compared with the scorings one would make based on the incomplete findings reported by Ernst and Angst (1983).
Under the heading "Big Five": Openness to Experience in the Five Factor Model of personality.
The status of social class controls, coded from 0 to 3 (see above, "CE, CE+/-"). For a study to be controlled for social class, it was required to include individual data on this variable. Hence descriptions of subjects coming, for example, from schools that are characterized as having predominantly "middle class" students were not considered as controlled if individual data were not used in the actual statistical analyses. In such cases, the absence of formal controls can substantially affect statistical outcomes.
"Statistical and Other Problems": Findings that involve problems of statistical interpretation, validity, or inaccurate reporting of study outcomes. "SP-" indicates findings that had to be dropped from the meta-analysis. In these instances, either formal statistics are not provided in the study, or the statistics cannot be properly interpreted owing to ambiguous or insufficient information. In other cases (SP), the statistical outcome is ambiguous or misreported by Ernst and Angst (1983), or is reported incompletely, but can be coded correctly from information provided in the original study. For example, Ernst and Angst (1983) report findings from some studies as being statistically significant that are not so, and they report other findings as being null outcomes when they are in fact statistically significant.
The status of sibship size controls, coded from 0 to 3 (see above, "CE, CE+/-").
In Table 3, findings overlooked by Townsend (2000, Appendix B).
Under the heading "Big Five": Unclassifiable outcomes within the Five Factor Model of personality; also, findings that do not involve personality traits. See also "CL-" and "NT-" (above).
Click here to access Table 1, summarizing the meta-analytic results from 1940 to 1980.
Click here to access the full meta-analytic data for Table 1 (Table 2).
Click here to access a list of controlled birth-order studies (1940-1980) overlooked by Ernst and Angst (1983) (Table 3). In the case of studies with findings tallied by Townsend (2000, Appendix B), my "Error Codes" list various errors made by him in his own listings. In the case of controlled findings overlooked by Ernst and Angst (1983) as well as by Townsend, "Error Codes" refer to reporting errors made by Ernst and Angst.
Click here to access Table 4, which contains meta-analytic data from 1981 to 1999.
1. I have not been able to determine what procedures Modell (1997) employed in his own tallies, although he has acknowledged that he previously overlooked my own statement that I had counted "findings" rather than "studies" (personal communication).
2. In response to my warnings, Townsend (2000) subsequently examined some of these publications, but his evident lack of expertise in statistical matters has caused him to make numerous errors in his interpretation of these studies. For these reasons, Townsend has failed to correct most of the errors contained in his own tabulations (see above, "Additional Examples of Errors and Omissions by Townsend," "Overlooked Studies with Controls," and the Appendix to this report).
3. To their credit, Ernst and Angst (1983:ix) explicitly warn readers about some of these problems, such as the omission of some published findings from their review and discussions. Although I corrected some errors and identified some overlooked findings in my earlier meta-analysis (Sulloway, 1995), I did not detect all such instances or record every one that I did detect in an informal list that I kept of these problems (Sulloway, 1998a). Based on a subsequent review of the literature, when I decided to make a systematic record of such reporting errors and omissions, I came to the conclusion that the only way to be sure that all of these problems were properly identified was to have two people independently review this literature, using the coding procedures described in the text.
4. Other predictions can be made for some personality dimensions in which findings by birth order are expected to be heterogeneous by subdimensions. For example, firstborns are expected to be more open to experience than laterborns in ways that reflect intellectuality and cultural interests, whereas laterborns are expected to be more open to experience than firstborns in ways that reflect unconventionality. Owing to the difficulty of identifying clear measures of "openness in intellect" (Costa and McCrae, 1992), I have not attempted to distinguish this aspect of Openness to Experience from others that are reflected in studies reported in the Appendix, although I have done so elsewhere using a survey instrument specifically designed to facilitate this distinction (Sulloway, 1999). Similarly, one could also attempt to distinguish the predicted tendency for firstborns to be more anxious than laterborns (Neuroticism) from the predicted tendencies for laterborns to be more self-conscious and impulsive than firstborns. Again, I have not attempted to observe these distinctions here, although I have done so elsewhere, using a survey instrument that specifically allows for them (Sulloway, 1999). Finally, the Activity facet of Extraversion presents potential problems of classification not considered here. To the extent that this facet is a marker for energetic activity, it can, in some behavioral contexts, reflect Conscientiousness, in which case it is expected to be a firstborn attribute (Sulloway, 1999). Similarly, if Activity reflects assertiveness, it is also expected to be a firstborn attribute. To the extent, however, that Activity reflects positive affect (for example, enthusiasm or zealousness), it is expected to be a laterborn attribute. Based on these considerations, one could reasonably omit the Activity facet in attempting to score Extraversion. 
In the very few instances in which individual findings relevant to this particular facet were encountered in this meta-analysis, they made no difference in the overall scoring of outcomes.
5. This method of scoring outcomes overcomes a deficiency in my previous meta-analysis (Sulloway, 1995). In that earlier treatment I used Bonferroni-like techniques to reduce studies with many findings, such as the studies by Koch (1955a-1957) and Price (1969), to single outcomes for each personality dimension of the Five Factor Model. However, in instances in which I encountered studies with only a few findings of a disparate nature--for example, a null result and either a confirming or an opposed result from the same sample--I counted each outcome, since my previous method of analysis had no provision for scoring such results as a single "partial" outcome. The same procedure was followed in the assessment of interaction effects. For these reasons, a modest proportion of my previous results were not statistically independent, although they were also not positively correlated with other findings on the same personality dimension. To a much greater extent, this same problem of nonindependent outcomes is associated with the counts by Harris (1998) and by Townsend (2000). There is nothing improper about tallying findings that are not fully independent if, as a preliminary step of research, one wishes to ascertain whether any general trends exist in the overall data. It is technically improper, however, to apply statistical tests to these data. In large part because of the robustness of the overall trends in the birth-order research discussed in this report, this technical limitation does not actually make any appreciable difference in the overall statistical conclusions that I previously reached.
6. In addition to our use of the scoring method, a coworker and I have occasionally employed other standard statistical techniques to reduce multiple findings to single outcomes for each dimension of the Five Factor Model (Rosenthal, 1991; Rosenthal and Rosnow, 1991; Rosenthal, Rosnow, and Rubin, 2000). Because the first of two studies by Koch (1955a-1957) includes more than fifty dependent variables, as well as all possible interaction effects between birth order and three other variables (namely, sex, sex of sibling, and three categories of age spacing), we restricted our scoring of outcomes in this study to main effects. In a separate analysis, we have also scored and statistically analyzed all other studies for main effects alone (see above, "Disregarding Interaction Effects").
7. If one confines statistical analyses to those studies that report data for only one dimension of the Five Factor Model, statistical tests on these fully independent outcomes can also be conducted. (See further, Table 1, note f.) It is also possible to conduct an omnibus statistical test in which each study is coded only once, using the scoring method to amalgamate multiple outcomes on different dimensions of the Five Factor Model of personality into single "study outcomes." This omnibus test yields significant results (t=8.08, N=126, p<.0001)--which is consistent with the fact that individual tests for each of my five hypotheses also yield significant results (Table 1).
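The form of the omnibus test mentioned in note 7 can be illustrated with a one-sample t-test on per-study scores. The sketch below is an assumption about the general shape of such a test, not a reconstruction of the analysis behind the reported t=8.08. The scoring scale (+1 confirming, 0 null, -1 refuting, with fractional values for "partial" outcomes), the null value of zero, and the function name one_sample_t are all hypothetical, and the scores shown are toy values rather than data from Table 1.

```python
import math
import statistics


def one_sample_t(scores, mu0=0.0):
    """Return (t, df) for a one-sample t-test of the mean score against mu0."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two study outcomes")
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)      # sample standard deviation (n - 1)
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1


if __name__ == "__main__":
    # Toy per-study outcomes on the assumed -1..+1 scale (not real data).
    toy_scores = [1, 1, 0, -1, 1, 0, 1, 1]
    t, df = one_sample_t(toy_scores)
    print(f"t = {t:.3f}, df = {df}")
```

Because each study contributes exactly one score, the inputs to this kind of test are independent of one another, which is the property note 7 relies on.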
8. In Townsend's analysis (2000, Table 6), fully controlled outcomes are somehow transformed into 23 confirmations, 153 nulls, and 25 refutations, for a total of 201 results. However, neither in Townsend's own tallies nor in Ernst and Angst (1983) are there anywhere near 201 doubly controlled results (or 153 doubly controlled nulls), so his claims are grossly mistaken. By my count, there are just 67 doubly controlled outcomes in Townsend's Appendix A--namely, 18 confirmations, 40 nulls, and 9 refutations (as scored by him, and with controls classified by him--not always correctly). Note that these 67 outcomes, even with the many errors and methodological problems they contain, generally favor my five hypotheses. So how did Townsend conclude that there were 153 controlled null results when there are actually only 40 in his own tallies of these data? In his Table 6, Townsend appears to have classified more than a hundred studies lacking dual controls as "null" outcomes, no matter what their actual outcomes--a curious and untenable interpretation of what is meant by a null finding among controlled studies.
9. Meta-analytic assessments will tend to differ from tallies of raw findings for two reasons. When a study contains findings relevant to two or more dimensions of the Five Factor Model of personality, meta-analytic methods lead to an increase in outcomes relative to the method of counting findings by study that is sometimes employed by Townsend (2000) and Harris (1998). By contrast, whenever multiple findings are present in the same study for the same personality dimension, meta-analytic methods lead to a reduction in outcomes compared with simple counting.
10. In Sulloway (1995, 1996) I previously coded only 16 of 30 significant findings that are listed with formal statistics in Price's (1969) Table 3. However, the Appendix to Price's article lists 90 results, none of which are accompanied by formal tests of statistical significance. When formal tests are applied to each of the results reported in this Appendix, another 14 significant findings relating to personality are identifiable. Like the study by Price (1969), many other birth-order publications contain more relevant data than at first meets the eye, and many studies contain multiple problems that affect codings. In addition, if a study is disqualified from meta-analytic consideration for one problem (such as a lack of adequate statistics), it is easy to miss other methodological problems that might otherwise have been detected. It is partly for these reasons that I have conducted the present meta-analysis with a coworker, who has independently checked all of the same studies. In the process, we rectified any discrepancies between our respective codings. We also detected some errors in my previous codings, as well as findings I had previously missed. The additional corrections and overlooked findings reported in the Appendix do not affect the general conclusions that I reached in my earlier vote-counting meta-analysis (Sulloway, 1995, 1996), although the methodology employed here provides these previous findings with a more statistically robust foundation.
11. Harris (2000) has correctly noted a methodological shortcoming in my previous treatment of the file drawer problem, namely, the nonindependence of some of the data in the overall test that I conducted for all five dimensions of the Five Factor Model of personality (Sulloway, 1998b). The present analysis remedies this problem. It is also worth noting that birth-order studies reporting more than one finding, which are not statistically independent of one another, have the same mean rate of significant confirmations as do studies that report only a single finding, which are independent of one another. This fact strongly suggests that the nonindependence of findings in some birth-order studies only negligibly affects estimates of the number of null findings in file drawers that are required to invalidate existing significant findings (Rosenthal and Rosnow, 1991).
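For readers unfamiliar with file-drawer estimates, the following sketch shows one common formulation, Rosenthal's fail-safe N: the number of unpublished null results that would pull the Stouffer combined Z down to the one-tailed .05 threshold. This report does not state the exact computation that was used, so the formula choice is an assumption, the function name fail_safe_n is hypothetical, and the Z values are toy inputs.

```python
import math


def stouffer_z(z_scores):
    """Combined Z for k independent results: sum(Z) / sqrt(k)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def fail_safe_n(z_scores, z_crit=1.645):
    """Rosenthal's fail-safe N: (sum Z)^2 / z_crit^2 - k.

    z_crit = 1.645 corresponds to one-tailed p = .05 (1.645^2 is about 2.706).
    """
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_crit ** 2 - k


if __name__ == "__main__":
    toy_z = [2.0, 2.0, 2.0, 2.0]  # illustrative values only
    print(f"combined Z = {stouffer_z(toy_z):.2f}")
    print(f"fail-safe N = {fail_safe_n(toy_z):.1f}")
```

The formula assumes independent Z values, which is why the nonindependence discussed in note 11 matters for this kind of estimate.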
12. In this analysis of the moderating effect of study type, studies were categorized as including real-life data (such as sports activities or educational achievements) even if these data were obtained through self-report. Studies involving within-family data include those with personality or behavioral ratings of offspring made by parents, ratings of siblings by siblings or other people, and studies involving identification with parents or parental values. Studies employing self-report, but involving within-family data, have also been classified as within-family studies.
13. In my previous meta-analysis (Sulloway, 1995, 1996), I added 22 outcomes from studies with overlooked controls, 11 of which were confirming. Although these outcomes were somewhat more favorable to my hypotheses than other outcomes, the deviation from the expected proportion of confirming outcomes is well within chance expectations. If just 3 of these 22 outcomes had been nulls instead of confirmations, these additions would have matched the confirmation rate of the other 174 outcomes. More to the point, deleting all studies with overlooked controls from my previous meta-analysis does not affect the outcomes of any of the relevant statistical tests that I conducted (Sulloway, 1995, 1996). In Table 1, I have set aside all of these studies with overlooked controls in order to eliminate any objections on this point.
A similar methodological issue arises if, in a meta-analytic treatment, one includes overlooked birth-order findings from studies in which other findings were only partially reported by Ernst and Angst (1983). It is therefore worth noting that the findings and outcomes that I previously added to Sulloway (1995) were not significantly different from those reported by Ernst and Angst. (The modest difference in confirmation rates was 38 percent for the studies in which I included overlooked findings and 36 percent for the studies in which I did not include such findings.) In the present meta-analysis, this methodological issue has been handled by systematically including all known findings from all relevant publications, as independently assessed by two investigators.
14. Townsend's (2000) Appendix C is based on a list of errors and omissions in Ernst and Angst's (1983) literature review that I sent to him (Sulloway, 1998a). Townsend (2000) is mistaken when he states that this list of errors is complete, and hence that there are no other errors or omissions in Ernst and Angst's (1983) review of controlled studies that need to be corrected. (For this reason, Townsend has failed to check most of these publications himself for possible problems.) Townsend appears to have misunderstood my own comments on this subject. The list of problems that I sent to Townsend was a complete copy of a list that I had assembled, based on an informal record that I had previously kept of the most important errors and inconsistencies I had noticed in Ernst and Angst's listings (Sulloway, 1998b). For reasons discussed in notes 3 and 10, this record did not constitute a full listing of the problems actually present in Ernst and Angst's review.
15. In Table 4, results from 10 unpublished dissertations have been coded from the details provided in Dissertation Abstracts International. (In none of these 10 instances, however, did the summaries of findings suggest any potential scoring inconsistencies based on the data actually present in the unpublished dissertations.) Codings based on the unpublished dissertations, where more information may be available, may nevertheless differ from those based on the published summaries. Omission of these 10 dissertations from the meta-analysis does not appreciably affect the overall results.
16. Christopher Davis and I also attempted to replicate Harris's (1998) findings for the post-1980 period using the specific method she describes for retrieving and analyzing article abstracts. We found many more findings (and scorable outcomes) than Harris reports, and we also found a much higher proportion of favorable findings than she reports, consistent with our other results, reported here, for the original literature (as opposed to article abstracts). Because these meta-analytic results for article abstracts are based on incomplete summaries of the original literature, I have not included them here in any formal analysis.
17. In Sulloway (1995, 1996) my total for the number of subjects inadvertently included some subjects more than once, since this sample size was derived by summing the Ns for individual outcomes rather than for individual studies. Because Ns were not involved in any of my formal statistical analyses, this error did not affect any of my previous statistical conclusions about the relationship between birth order and the Five Factor Model of personality.
I thank Lynn Gale and Lincoln Moses, statisticians at the Center for Advanced Study in the Behavioral Sciences (Stanford, California), for helpful advice in connection with this study. For additional advice on statistical and meta-analytic questions, I thank Robert Rosenthal, Donald Rubin, Michael Shanahan, and Sanjay Srivastava. Oliver John provided independent classifications of the meta-analytic data in the Appendix to this report according to the Five Factor Model of personality, and Christopher Davis checked and independently coded all of the pre-1981 birth-order data and also provided independent assessments of the post-1980 data. Portions of this technical report have appeared in Sulloway (2000) and are used here by permission of the Association for Politics and the Life Sciences.
Biegelsen, C. E. (1976). The relationship of being first or later born and vocational, academic and personality variables. Unpublished doctoral dissertation, Washington University. Abstracted in Dissertation Abstracts International, 37-B (1976):1871-72.
Chao, M. (2001). The birth-order controversy: Within-family effects and their generalizability. Honors thesis, University of California, Berkeley.
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum.
Costa, P. T., Jr., and McCrae, R. R. (1992). NEO PI-R Professional Manual. Odessa, FL: Psychological Assessment Resources.
Costa, P. T., Jr., Terracciano, A., and McCrae, R. R. (2001). Gender differences in personality traits across cultures: Robust and surprising findings. Journal of Personality and Social Psychology, 81:322-31.
Dauphinais, S., and Leitner, D. W. (1978). Effect of birth order, sex, and family size on affiliation with encounter groups. Psychological Reports, 42:673-74.
Dean, D. A. (1947). The relation of ordinal position to personality in young children. Master's thesis, State University of Iowa, Iowa.
Dunn, J., and Plomin, R. (1990). Separate Lives: Why Siblings Are So Different. New York: Basic Books.
Eaves, L. J., Eysenck, H. J., and Martin, N. G. (1989). Genes, Culture and Personality: An Empirical Approach. London and San Diego: Academic Press.
Ernst, C., and Angst, J. (1983). Birth Order: Its Influence on Personality. Berlin and New York: Springer-Verlag.
Eysenck, H. J., and Cookson, D. (1969). Personality in primary school children: 2.--Teachers' Ratings. British Journal of Educational Psychology, 39:123-30.
Eysenck, H. J., and Cookson, D. (1970). Personality in primary school children: 3.--Family background. British Journal of Educational Psychology, 40:117-31.
Feingold, A. (1994). Gender differences in personality: A meta-analysis. Psychological Bulletin, 116:429-56.
Harris, J. R. (1998). The Nurture Assumption: Why Children Turn Out the Way They Do. New York: Free Press.
Harris, J. R. (2000). Personality and birth order: Explaining the difference between siblings. Politics and the Life Sciences.
Harris, J. R. (2001). The 179 "studies" (actually, findings) in Ernst & Angst's (1983) text and tables. Unpublished manuscript (http://home.att.net/~xchar/tna/birth-order/index.htm#findings).
Hedges, L. V., and Olkin, I. (1980). Vote-counting methods in research synthesis. Psychological Bulletin, 88:359-69.
Hertwig, R., Davis, J. N., and Sulloway, F. J. (2002). Parental investment: How an equity motive can produce inequality. Psychological Bulletin, 128:728-45.
Hunter, J. E., and Schmidt, F. L. (1990). Methods of Meta-Analysis: Correcting Error and Bias in Research Findings. Newbury Park, CA: Sage.
Koch, H. (1955a). Some personality correlates of sex, sibling position, and sex of sibling among five- and six-year-old children. Genetic Psychology Monographs, 52:3-50.
Koch, H. (1955b). The relation of certain family constellation characteristics and the attitudes of children toward adults. Child Development, 26:13-40.
Koch, H. (1956a). Attitudes of young children toward their peers as related to certain characteristics of their siblings. Psychological Monographs, 70:1-41.
Koch, H. (1956b). Children's work attitudes and sibling characteristics. Child Development, 27:289-310.
Koch, H. (1956c). Sibling influence on children's speech. Journal of Speech and Hearing Disorders, 21:322-28.
Koch, H. (1956d). Sissiness and tomboyishness in relation to sibling characteristics. Journal of Genetic Psychology, 88:231-44.
Koch, H. (1956e). Some emotional attitudes of the young child in relation to characteristics of his sibling. Child Development, 27:393-426.
Koch, H. (1957). The relation in young children between characteristics of their playmates and certain attributes of their siblings. Child Development, 28:175-202.
Koch, H. (1960). The relation of certain formal attributes of siblings to attitudes held toward each other and toward their parents. Monograph of the Society for Research in Child Development, 25:1-124.
Krinsky, S. G. (1963). The relationships among birth order, dimensions of independence-dependence and choice of a scientific career. In W. W. Cooley, ed., Career Development of Scientists, pp. 157-70. Cambridge: Harvard Graduate School of Education.
Lichtenwalner, J., and Maxwell, J. W. (1969). The relationship of birth order and socioeconomic status to the creativity of preschool children. Child Development, 40:1241-47.
Loehlin, J. C. (1992). Genes and Environment in Personality Development. Newbury Park, CA: Sage Publications.
Macbeth, B. L. (1975). Birth order, personality, and scholastic aptitude. Unpublished doctoral dissertation, Department of Psychology, University of Oregon. Abstracted in Dissertation Abstracts International, 36-B (1976):4757.
McCrae, R. R., and John, O. P. (1992). An introduction to the five-factor model and its applications. Journal of Personality, 60:175-215.
Modell, J. (1997). Family niche and intellectual bent. Review of Born to Rebel. Science, 275:624.
Nystul, M. S. (1974). The effects of birth order and sex on self-concept. Journal of Individual Psychology, 30:211-15.
Paulhus, D. L., Trapnell, P. D., and Chen, D. (1999). Birth order effects on personality and achievement within families. Psychological Science, 10:482-88.
Pinker, S. (2002). The Blank Slate: The Modern Denial of Human Nature. New York: Viking.
Price, J. (1969). Personality differences within families: Comparisons of adult brothers and sisters. Journal of Biosocial Science, 1:117-205.
Rosenthal, R. (1984). Meta-Analytic Procedures for Social Research. Revised ed. Newbury Park, CA: Sage.
Rosenthal, R. (1987). Judgment Studies: Design, Analysis, and Meta-Analysis. Cambridge: Cambridge University Press.
Rosenthal, R., and Rosnow, R. L. (1991). Essentials of Behavioral Research: Methods and Data Analysis. Boston: McGraw Hill.
Rosenthal, R., Rosnow, R. L., and Rubin, D. B. (2000). Contrasts and Effect Sizes in Behavioral Research: A Correlational Approach. Cambridge: Cambridge University Press.
Rosenthal, R., and Rubin, D. (1982). A simple, general purpose display of magnitude of experimental effect. Journal of Educational Psychology, 74:708-712.
Rowe, D. C. (1994). The Limits of Family Influence: Genes, Experience, and Behavior. New York and London: Guilford Press.
Salmon, C., and Daly, M. (1998). Birth order and familial sentiment: Middleborns are different. Evolution and Human Behavior, 19:299-312.
Sulloway, F. J. (1995). Birth order and evolutionary psychology: A meta-analytic overview. Psychological Inquiry, 6:75-80.
Sulloway, F. J. (1996). Born to Rebel: Birth Order, Family Dynamics, and Creative Lives. New York: Pantheon.
Sulloway, F. J. (1997). Born to Rebel: Birth Order, Family Dynamics, and Creative Lives. Revised paperback edition. New York: Vintage.
Sulloway, F. J. (1998a). A list of "Errors and Inconsistencies in Ernst and Angst's Literature Review." Unpublished manuscript.
Sulloway, F. J. (1998b). Birth Order and The Nurture Misassumption: A Reply to Judith Harris. Edge, 47. Http://www.edge.org/documents/archive/edge47.html.
Sulloway, F. J. (1999). Birth order. Encyclopedia of Creativity, 1:189-202. Edited by Mark A. Runco and Steven R. Pritzker. San Diego: Academic Press.
Sulloway, F. J. (2000). Born to Rebel and its critics. Politics and the Life Sciences, 19:181-202.
Sulloway, F. J. (2001). Birth order, sibling competition, and human behavior. In Conceptual Challenges in Evolutionary Psychology: Innovative Research Strategies, pp. 39-83. Edited by Harmon R. Holcomb III. Dordrecht and Boston: Kluwer Academic Publishers.
Sutton-Smith, B., and Rosenberg, B. G. (1970). The Sibling. New York: Holt, Rinehart and Winston.
Tomeh, A. K. (1969). Birth order and kinship affiliations. Journal of Marriage and the Family, 31:19-26.
Tomeh, A. K. (1970). Birth order and friendship associations. Journal of Marriage and the Family, 32:60-69.
Tomeh, A. K. (1971). Birth order and family influences in the Middle East. Journal of Comparative Family Studies, 2:88-106.
Tomeh, A. K. (1972). Birth order and dependence patterns of college students in Lebanon. Journal of Marriage and the Family, 34:361-74.
Turkheimer, E., and Waldron, M. (2000). Nonshared environment: A theoretical, methodological, and quantitative review. Psychological Bulletin, 126:78-108.
Townsend, F. (2000). Birth order and rebelliousness: Reconstructing the research in Born to Rebel. Politics and the Life Sciences, 19:135-156.
Yando, R., Zigler, E., and Litzinger, S. (1975). A further investigation of the effects of birth order and number of siblings in determining children's responsiveness to social reinforcement. Journal of Psychology, 89:95-111.
©2002 Frank J. Sulloway