Wednesday, June 26, 2013

WHAT YOU NEED TO KNOW AS A YOUNG BOY OR GIRL ABOUT SEX EDUCATION

The effectiveness of school-based sex education programs in the promotion of abstinent behavior: a meta-analysis

Mónica Silva
Escuela de Administración, Pontificia Universidad Católica de Chile, Vicuña Mackenna 4860, Santiago, Chile

Abstract

This review presents the findings from controlled school-based sex education interventions published in the last 15 years in the US. The effects on abstinent behavior reported in 12 controlled studies were included in the meta-analysis. The results indicated a very small overall effect of the interventions on abstinent behavior. Moderator analysis could only be pursued partially because of limited information in the primary research studies. Parental participation in the program, age of the participants, virginity status of the sample, grade level, percentage of females, scope of the implementation and year of publication of the study were associated with variations in effect sizes for abstinent behavior in univariate tests. However, only parental participation and percentage of females were significant in the weighted least-squares regression analysis. The richness of a meta-analytic approach appears limited by the quality of the primary research. Unfortunately, most of the research does not employ designs that provide conclusive evidence of program effects. Suggestions to address this limitation are provided.

Introduction

Sexually active teenagers are a matter of serious concern. In the past decades many school-based programs have been designed for the sole purpose of delaying the initiation of sexual activity. There seems to be a growing consensus that schools can play an important role in providing youth with a knowledge base which may allow them to make informed decisions and help them shape a healthy lifestyle (St Leger, 1999). The school is the only institution in regular contact with a sizable proportion of the teenage population (Zabin and Hirsch, 1988), with virtually all youth attending it before they initiate sexual risk-taking behavior (Kirby and Coyle, 1997).
Programs that promote abstinence have become particularly popular with school systems in the US (Gilbert and Sawyer, 1994) and even with the federal government (Sexual abstinence program has a $250 million price tag, 1997). These are referred to in the literature as abstinence-only or value-based programs (Repucci and Herman, 1991). Other programs—designated in the literature as safer-sex, comprehensive, secular or abstinence-plus programs—additionally espouse the goal of increasing the use of effective contraception. Although abstinence-only and safer-sex programs differ in their underlying values and assumptions regarding the aims of sex education, both types of programs strive to foster decision-making and problem-solving skills in the belief that, through adequate instruction, adolescents will be better equipped to act responsibly in the heat of the moment (Repucci and Herman, 1991). Nowadays most safer-sex programs encourage abstinence as a healthy lifestyle, and many abstinence-only programs have evolved into 'abstinence-oriented' curricula that also include some information on contraception. For most programs currently implemented in the US, a delay in the initiation of sexual activity constitutes a positive and desirable outcome, since the likelihood of responsible sexual behavior increases with age (Howard and Mitchell, 1993).
Even though abstinence is a valued outcome of school-based sex education programs, the effectiveness of such interventions in promoting abstinent behavior is still far from settled. Most of the articles published on the effectiveness of sex education programs follow the literary format of traditional narrative reviews (Quinn, 1986; Kirby, 1989, 1992; Visser and van Bilsen, 1994; Jacobs and Wolf, 1995; Kirby and Coyle, 1997). Two exceptions are the quantitative overviews by Frost and Forrest (Frost and Forrest, 1995) and Franklin et al. (Franklin et al., 1997).
In the first review (Frost and Forrest, 1995), the authors selected only five rigorously evaluated sex education programs and estimated their impact on delaying sexual initiation. They used non-standardized measures of effect size, calculated descriptive statistics to represent the overall effect of these programs and concluded that the selected programs delayed the initiation of sexual activity. In the second review, Franklin et al. conducted a meta-analysis of the published research on community-based and school-based adolescent pregnancy prevention programs and, contrary to the conclusions of Frost and Forrest, reported a non-significant effect of the programs on sexual activity (Franklin et al., 1997).
The discrepancy between these two quantitative reviews may result from the decision by Franklin et al. to include weak designs, which do not allow for reasonable causal inferences. However, given that recent evidence indicates that weaker designs yield higher estimates of intervention effects (Guyatt et al., 2000), the inclusion of weak designs should have translated into higher effects for the Franklin et al. review, not smaller ones. Given the discrepant results of these two recent quantitative reviews, there is a need to clarify the extent of the impact of school-based sex education on abstinent behavior and to explore the specific features of the interventions that are associated with variability in effect sizes.

Purpose of the study

The present study was a meta-analytic review of the research literature on the effectiveness of school-based sex education programs implemented in the US in the past 15 years, in the wake of the AIDS epidemic, in promoting abstinent behavior. The goals were to: (1) synthesize the effects of controlled school-based sex education interventions on abstinent behavior, (2) examine the variability in effects among studies and (3) explain the variability in effects between studies in terms of selected moderator variables.

Literature search and selection criteria

The first step was to locate as many studies conducted in the US as possible that dealt with the evaluation of sex education programs and which measured abstinent behavior subsequent to an intervention.
The primary sources for locating studies were four reference databases: ERIC, PsycLIT, MEDLINE and the Social Science Citation Index. Branching from the bibliographies and reference lists of articles located through the original search provided another means of locating studies.
The process for the selection of studies was guided by four criteria, some of which have been employed by other authors as a way to orient and confine the search to the relevant literature (Kirby et al., 1994). The criteria to define eligibility of studies were the following.
  1. Interventions had to be geared to normal adolescent populations attending public or private schools in the US and report on some measure of abstinent behavior: delay in the onset of intercourse, reduction in the frequency of intercourse or reduction in the number of sexual partners. Studies that reported on interventions designed for cognitively handicapped, delinquent, emotionally disturbed or institutionalized adolescents, or for school dropouts, were excluded from the present review, since they address populations with different needs and characteristics. Community interventions which recruited participants from clinical or out-of-school populations were also excluded for the same reasons.
  2. Studies had to be either experimental or quasi-experimental in nature, excluding three designs that do not permit strong tests of causal hypotheses: the one-group post-test-only design, the post-test-only design with non-equivalent groups and the one-group pre-test–post-test design (Cook and Campbell, 1979). The presence of an independent and comparable 'no intervention' control group—comparable in demographic variables and in baseline measures of sexual activity—was required for a study to be included in this review.
  3. Studies had to be published between January 1985 and July 2000. A time period restriction was imposed because of cultural changes that occur in society—such as the AIDS epidemic—which might significantly impact the adolescent cohort and alter patterns of behavior and consequently the effects of sex education interventions.
  4. Studies had to be published in a peer-reviewed journal. The reasons for this criterion are 3-fold. First, there have been many reports published in newspapers or advocacy newsletters claiming that specific sex education programs have a dramatic impact on one or more outcome variables, yet when these reports have been investigated, they were often found lacking in valid empirical evidence (Kirby et al., 1994; Frost and Forrest, 1995). Second, unpublished studies are hard to locate, and the uneven quality of unpublished research makes it doubtful that the cost of retrieving it would be a worthwhile investment. This is not to say that all conference papers are defective or that all journal articles are free of weaknesses. However, regardless of varying standards of review rigor and publication criteria between journals, published articles have at least survived some form of refereeing and editing process (Dunkin, 1996). Finally, an added advantage of including only published articles is that it helps reduce the risk of data dependence. The probability of duplication is likely to increase when dissertations and conference papers—which often constitute earlier drafts of published studies—are included. Even considering only published studies, it may be difficult to detect duplication: the same data set, or a subset of it, may be used repeatedly in several studies, published in different journals, with different main authors, and without any reference to the original data source. Published studies which were known or suspected to have employed the same database were only included once.1

Coding of the studies for exploration of moderators

The exploration of study characteristics or features that may be related to variations in the magnitude of effect sizes across studies is referred to as moderator analysis. A moderator variable is one that informs about the circumstances under which the magnitude of effect sizes varies (Miller and Pollock, 1994). The information retrieved from the articles for potential inclusion as moderators in the data analysis was categorized in two domains: demographic characteristics of the participants in the sex education interventions and characteristics of the program.
Demographic characteristics included the following variables: the percentage of females, the percentage of whites, the virginity status of participants, mean (or median) age and a categorization of the predominant socioeconomic status of participating subjects (low or middle class) as reported by the authors of the primary study.
In terms of the characteristics of the programs, the features coded were: the type of program (whether the intervention was comprehensive/safer-sex or abstinence-oriented), the type of monitor who delivered the intervention (teacher/adult monitor or peer), the length of the program in hours, the scope of the implementation (large-scale versus small-scale trial), the time elapsed between the intervention and the post-intervention outcome measure (expressed as number of days), and whether parental participation (beyond consent) was a component of the intervention.
The type of sex education intervention was defined as abstinence-oriented if the explicit aim was to encourage abstinence as the primary method of protection against sexually transmitted diseases and pregnancy, either totally excluding units on contraceptive methods or, if including contraception, portraying it as a less effective method than abstinence. An intervention was defined as comprehensive or safer-sex if it included a strong component on the benefits of use of contraceptives as a legitimate alternative method to abstinence for avoiding pregnancy and sexually transmitted diseases.
A study was considered to be a large-scale trial if the intervention group consisted of more than 500 students.
Finally, year of publication was also analyzed to assess whether changes in the effectiveness of programs across time had occurred.
The decision to record information on all the above-mentioned variables for their potential role as moderators of effect sizes was based in part on theoretical considerations and in part on the empirical evidence of the relevance of such variables in explaining the effectiveness of educational interventions. A limitation to the coding of these and of other potentially relevant and interesting moderator variables was the scantiness of information provided by the authors of primary research. Not all studies described the features of interest for this meta-analysis. For parental participation, no missing values were present because a decision was made to code all interventions which did not specifically report that parents had participated—either through parent–youth sessions or homework assignments—as non-participation. However, for the rest of the variables, no similar assumptions seemed appropriate, and therefore if no pertinent data were reported for a given variable, it was coded as missing (see Table I).
Table I.
Description of moderator variables

Decisions related to the computation of effect sizes

Once the pool of studies which met the inclusion criteria was located, the studies were examined in an attempt to retrieve the size of the effect associated with each intervention. Since most of the studies did not report any effect size, it had to be estimated from the significance level and inferential statistics, with formulae provided by Rosenthal (Rosenthal, 1991) and Holmes (Holmes, 1984). When provided, the exact value of the test statistic or the exact probability was used in the calculation of the effect size.
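As an illustration of what such estimation involves, the sketch below implements the standard Rosenthal (1991) conversions from a reported t statistic, 1-df χ2 or exact P value to a standardized mean difference d. This is a minimal Python sketch, not the review's actual computations; the function names and example values are hypothetical.

```python
# Minimal sketch of standard conversions from reported statistics to d
# (Rosenthal, 1991). Illustrative only; not the review's actual code.
import math
from scipy.stats import norm

def r_to_d(r):
    """Convert a correlation-type effect size r to d."""
    return 2 * r / math.sqrt(1 - r ** 2)

def d_from_t(t, df):
    """d from an independent-samples t statistic."""
    return 2 * t / math.sqrt(df)

def d_from_chi2(chi2_val, n):
    """d from a 1-df chi-square: r = sqrt(chi2 / N), then r -> d."""
    return r_to_d(math.sqrt(chi2_val / n))

def d_from_p(p, n):
    """d from an exact one-tailed P value via the normal deviate z."""
    z = norm.ppf(1 - p)
    return r_to_d(z / math.sqrt(n))

# Hypothetical example: a study reporting t(198) = 2.10
print(round(d_from_t(2.10, 198), 3))  # ~0.298
```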
In order to avoid data dependence, a conservative strategy of including only one finding per study was employed in this review. When multiple variations of interventions were tested, the effect size was calculated for the most successful of the treatment groups. This decision rests on the assumption that, should the program be implemented in the future, the most effective mode of intervention would be chosen. Similarly, to ensure the independence of the data in the case of follow-up studies, when multiple measurements were reported across time a single estimate of effect size was included.2
Analyses of the effect sizes were conducted utilizing the D-STAT software (Johnson, 1989). The sample sizes used for the overall effect size analysis corresponded to the actual number used to estimate the effects of interest, which was often less than the total sample of the study. Occasionally the actual sample sizes were not provided by the authors of primary research, but could be estimated from the degrees of freedom reported for the statistical tests.
The effect sizes were calculated from means and pooled standard deviations, t-tests, χ2 statistics, significance levels or proportions, depending on the nature of the information reported by the authors of primary research. As recommended by Rosenthal, if results were reported simply as being 'non-significant', a conservative estimate was included, assuming P = 0.50, which corresponds to an effect size of zero (Rosenthal, 1991). The overall measure of effect size reported was the corrected d statistic (Hedges and Olkin, 1985). These authors recommend this measure because it does not overestimate the population effect size, especially when sample sizes are small.
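For concreteness, here is a sketch of the corrected d: the usual standardized mean difference multiplied by Hedges and Olkin's small-sample bias correction J(m) = 1 - 3/(4m - 1), with m = n1 + n2 - 2. The numbers are invented for illustration.

```python
# Sketch of the corrected d statistic (Hedges & Olkin, 1985):
# bias correction J(m) applied to the raw standardized mean difference.
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def corrected_d(m1, s1, n1, m2, s2, n2):
    g = (m1 - m2) / pooled_sd(s1, n1, s2, n2)  # uncorrected d
    j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)        # small-sample correction J(m)
    return j * g

# Invented example: with n1 = n2 = 10 the correction is noticeable
print(round(corrected_d(0.55, 1.0, 10, 0.0, 1.0, 10), 3))  # 0.527, not 0.550
```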
The homogeneity of effect sizes was examined to determine whether the studies shared a common effect size. Testing for homogeneity required the calculation of a homogeneity statistic, Q. If all studies share the same population effect size, Q follows an asymptotic χ2 distribution with k – 1 degrees of freedom, where k is the number of effect sizes. For the purposes of this review the probability level chosen for significance testing was 0.10, because the relatively small number of effect sizes available for the analysis limits the power to detect actual departures from homogeneity. Rejection of the hypothesis of homogeneity signals that the group of effect sizes is more variable than one would expect based on sampling variation alone and that one or more moderator variables may be present (Hall et al., 1994).
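The pooled estimate and the Q test can be sketched as follows, using the usual large-sample variance of d (Hedges and Olkin, 1985). The effect sizes and sample sizes below are hypothetical, not those of the 12 reviewed studies.

```python
# Fixed-effect pooling and homogeneity test: Q ~ chi2(k - 1) under H0.
# All data below are hypothetical.
import numpy as np
from scipy.stats import chi2

def var_d(d, n1, n2):
    """Large-sample variance of d (Hedges & Olkin, 1985)."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))

d  = np.array([0.01, 0.24, -0.05, 0.31])   # hypothetical effect sizes
n1 = np.array([250, 60, 400, 45])          # intervention group sizes
n2 = np.array([240, 55, 390, 50])          # control group sizes

w = 1 / var_d(d, n1, n2)                   # inverse-variance weights
d_plus = np.sum(w * d) / np.sum(w)         # weighted mean effect size d+
se = np.sqrt(1 / np.sum(w))                # standard error of d+
Q = np.sum(w * (d - d_plus) ** 2)          # homogeneity statistic
p = chi2.sf(Q, df=len(d) - 1)

print(f"d+ = {d_plus:.2f}, 95% CI ({d_plus - 1.96 * se:.2f}, {d_plus + 1.96 * se:.2f})")
print(f"Q({len(d) - 1}) = {Q:.2f}, P = {p:.3f}")
```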
To examine the relationship between the study characteristics included as potential moderators and the magnitude of effect sizes, both categorical and continuous univariate tests were run. Categorical tests assess differences in effect sizes between subgroups established by dividing studies into classes based on study characteristics. Hedges and Olkin presented an extension of the Q statistic to test for homogeneity of effect sizes between classes (QB) and within classes (QW) (Hedges and Olkin, 1985). The relationship between the effect sizes and continuous predictors was assessed using a procedure described by Rosenthal and Rubin which tests for linearity between effect sizes and predictors (Rosenthal and Rubin, 1982).
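The between/within partition works by computing Q inside each class and taking QB as the remainder of the total Q. A sketch with an invented binary moderator (say, parental participation) follows; all numbers are illustrative.

```python
# Partition of the homogeneity statistic into QB (between classes)
# and QW (within classes), after Hedges & Olkin (1985). Invented data.
import numpy as np
from scipy.stats import chi2

def weighted_q(d, w):
    """Weighted mean effect size and homogeneity statistic for one class."""
    d_bar = np.sum(w * d) / np.sum(w)
    return d_bar, np.sum(w * (d - d_bar) ** 2)

# hypothetical effects and inverse-variance weights by moderator level
d_yes, w_yes = np.array([0.20, 0.30]), np.array([40.0, 35.0])
d_no,  w_no  = np.array([0.02, 0.05, 0.00]), np.array([120.0, 90.0, 150.0])

_, q_total = weighted_q(np.concatenate([d_yes, d_no]),
                        np.concatenate([w_yes, w_no]))
_, qw_yes = weighted_q(d_yes, w_yes)
_, qw_no  = weighted_q(d_no, w_no)

q_within = qw_yes + qw_no
q_between = q_total - q_within             # QB = Q - QW
p_b = chi2.sf(q_between, df=1)             # (number of classes - 1) df
print(f"QB(1) = {q_between:.2f}, P = {p_b:.3f}")
```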
A weighted least-squares regression analysis was conducted to test the joint effect of the significant moderators on the effect sizes. The results of the univariate analyses were used to select the predictors to be included in the model. Categorical predictors were included as dummy variables. All predictors were entered simultaneously. Significance of each regression coefficient was tested using a z-test where the standard errors in the output of SPSS were adjusted by a factor of the square root of the mean square error for the regression model (Hedges and Olkin, 1985). Model specification was tested using the QE goodness-of-fit statistic.3
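A sketch of this step is given below: a weighted least-squares fit with inverse-variance weights, standard errors taken directly from the diagonal of (X'WX)^-1 (equivalent to dividing a standard package's printed standard errors by the square root of the regression mean square error, as described above for the SPSS output) and the QE specification test from footnote 3. The predictors, weights and effect sizes are all invented.

```python
# Weighted least-squares regression of effect sizes on moderators,
# with Hedges & Olkin (1985) standard errors and the QE test.
# All values are invented for illustration.
import numpy as np
from scipy.stats import chi2, norm

d = np.array([0.01, 0.24, -0.05, 0.31, 0.08, 0.02])   # effect sizes
X = np.column_stack([
    np.ones(6),                      # intercept
    [0, 1, 0, 1, 0, 0],              # parental participation (dummy)
    [48, 55, 50, 60, 52, 47],        # percentage of females
])
w = np.array([90.0, 35.0, 120.0, 30.0, 60.0, 100.0])  # 1 / var(d_i)

W = np.diag(w)
XtWX_inv = np.linalg.inv(X.T @ W @ X)
beta = XtWX_inv @ X.T @ W @ d
resid = d - X @ beta
QE = resid @ W @ resid               # weighted residual sum of squares

se = np.sqrt(np.diag(XtWX_inv))      # correct SEs, no MSE factor
z = beta / se                        # z-test for each coefficient
p_z = 2 * norm.sf(np.abs(z))

k, p_reg = len(d), X.shape[1] - 1    # QE ~ chi2(k - p - 1) if well specified
print("beta:", beta.round(3), " z:", z.round(2), " P:", p_z.round(3))
print(f"QE({k - p_reg - 1}) = {QE:.2f}, P = {chi2.sf(QE, k - p_reg - 1):.3f}")
```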

Results

The search for school-based sex education interventions resulted in 12 research studies that complied with the criteria to be included in the review and for which effect sizes could be estimated.
The overall effect size (d+) estimated from these studies was 0.05, with a 95% confidence interval ranging from a lower bound of 0.01 to an upper bound of 0.09, indicating a very small overall effect. Table II presents the effect size of each study (di) along with its 95% confidence interval and the overall estimate of the effect size. Homogeneity testing indicated the presence of variability among effect sizes (Q(11) = 35.56; P < 0.001).
Table II.
Effect sizes of studies
Among the set of categorical predictors studied, parental participation in the program, virginity status of the sample and scope of the implementation were statistically significant.4
Parental participation appeared to moderate the effects of sex education on abstinence, as indicated by the significant Q test between groups (QB(1) = 5.06; P = 0.025), shown in Table III. Although small in magnitude (d = 0.24), the point estimate for the mean weighted effect size associated with programs with parental participation appears substantially larger than the mean associated with those where parents did not participate (d = 0.04). The confidence interval for parental participation does not include zero, indicating a small but positive effect. Controlling for parental participation appears to yield a homogeneous class of effect sizes for programs that include parents, but not for those where parents did not participate (QW(9) = 28.94; P = 0.001), meaning that the effect sizes were not homogeneous within this class.
Table III.
Tests of categorical moderators for abstinence
Virginity status of the sample was also a significant predictor of the variability among effect sizes (QB(1) = 3.47; P = 0.06). The average effect size calculated for virgins-only samples was larger than the one calculated for mixed samples of virgins and non-virgins (d = 0.09 and d = 0.01, respectively). Controlling for virginity status yielded a homogeneous class for mixed samples, although not for the virgins-only class (QW(5) = 27.09; P < 0.001).
The scope of the implementation also appeared to moderate the effects of the interventions on abstinent behavior. The average effect size calculated for small-scale interventions was significantly higher than that for large-scale interventions (d = 0.26 and d = 0.01, respectively). The effects corresponding to the large-scale category were homogeneous, but this was not the case for the small-scale class, where heterogeneity was detected (QW(4) = 14.71; P = 0.01).
For all three significant categorical predictors, deletion of one outlier (Howard and McCabe, 1990) resulted in homogeneity among the effect sizes within classes.
Univariate tests of continuous predictors showed significant results in the case of percentage of females in the sample (z = 2.11; P = 0.04), age of participants (z = –1.67; P = 0.09), grade (z = –1.80; P = 0.07) and year of publication (z = –2.76; P = 0.006).
All significant predictors in the univariate analyses—with the exception of grade, which had a very high correlation with age (r = 0.97; P < 0.001)—were entered into a weighted least-squares regression analysis. In general, the remaining set of predictors had a moderate degree of intercorrelation, although none of the correlations were statistically significant.
In the weighted least-squares regression analysis, only parental participation and the percentage of females in the study were significant. The two-predictor model explained 28% of the variance in effect sizes. The test of model specification yielded a significant QE statistic suggesting that the two-predictor model cannot be regarded as correctly specified (see Table IV).
Table IV.
Weighted least-squares regression and test of model specification

Discussion

This review synthesized the findings from controlled sex education interventions reporting on abstinent behavior. The overall mean effect size for abstinent behavior was very small, close to zero. No significant effect was associated with the type of intervention: whether the program was abstinence-oriented or comprehensive—the source of a major controversy in sex education—was not found to be associated with abstinent behavior. Only two moderators—parental participation and percentage of females—appeared to be significant in both the univariate tests and the multivariable model.
Although parental participation in interventions appeared to be associated with higher effect sizes in abstinent behavior, the link should be explored further, since it is based on a very small number of studies. To date, too few studies have reported success in involving parents in sex education programs. Furthermore, the primary articles reported very limited information about the characteristics of the parents who took part in the programs. Parents who were willing to participate might differ in important demographic or lifestyle characteristics from those who did not participate. For instance, it is possible that the studies that reported success in achieving parental involvement were dealing with a larger percentage of intact families or with parents who espoused conservative sexual values. Therefore, at this point it is not possible to affirm that parental participation per se exerts a direct influence on the outcomes of sex education programs, although clearly this is a variable that merits further study.
Interventions appeared to be more effective when geared to groups composed of younger students, predominantly females and those who had not yet initiated sexual activity. The association between gender and effect sizes—which appeared significant both in the univariate and multivariable analyses—should be explored to understand why females seem to be more receptive to the abstinence messages of sex education interventions.
Smaller-scale interventions appeared to be more effective than large-scale programs. The larger effects associated with small-scale trials seem worth exploring. It may be that in large-scale studies it becomes harder to control for confounding variables that can have an adverse impact on the outcomes. For example, large-scale studies often require external agencies or contractors to deliver the program, and the quality of the delivery of the contents may turn out to be less than optimal (Cagampang et al., 1997).
Interestingly, there was a significant change in effect sizes across time, with effect sizes appearing to wane over the years. It is not likely that this represents a decline in the quality of sex education interventions. A possible explanation for this trend may be the expansion of mandatory sex education in the US, which makes it increasingly difficult to find comparison groups that are relatively unexposed to sex education. Another possible line of explanation refers to changes in cultural mores regarding sexuality that may have occurred in the past decades—characterized by an increasing acceptance of premarital sexual intercourse, a proliferation of sexualized messages in the media and increasing opportunities for sexual contact in adolescence—which may be eroding the attainment of the goal of abstinence sought by educational interventions.
In terms of the design and implementation of sex education interventions, it is worth noting that the length of the programs was unrelated to the magnitude of effect sizes over the range of 4.5–30 h represented in these studies. Program length—which has been singled out as a potential explanation for the absence of significant behavioral effects in a large-scale evaluation of a sex education program (Kirby et al., 1997a)—does not appear to be consistently associated with abstinent behavior. The impact of lengthening currently existing programs should be evaluated in future studies.
As stated above, the exploration of moderator variables could be performed only partially, owing to the lack of information in the primary research literature. This has also been a problem for other reviewers in the field (Franklin et al., 1997). The authors of primary research did not appear to control for, or report on, the potentially confounding influence of numerous variables that the literature indicates affect sexual decision making or are associated with the initiation of sexual activity in adolescence, such as academic performance, career orientation, religious affiliation, romantic involvement, number of friends who are currently having sex, peer norms about sexual activity and drinking habits, among others (Herold and Goodwin, 1981; Christopher and Cate, 1984; Billy and Udry, 1985; Roche, 1986; Coker et al., 1994; Kinsman et al., 1998; Holder et al., 2000; Thomas et al., 2000). Even though randomization should take care of differences in these and other potentially confounding variables, studies can rarely assign individual students to conditions and instead assign classrooms or schools; it is therefore advisable that more baseline information be used to establish and substantiate the equivalence of the intervention and control groups in relevant demographic and lifestyle characteristics.
In terms of the communication of research findings, the richness of a meta-analytic approach will always be limited by the quality of the primary research. Unfortunately, most of the research in the area of sex education does not employ experimental or quasi-experimental designs and thus falls short of providing conclusive evidence of program effects. The limitations in the quality of research in sex education have been highlighted by several authors over the past two decades (Kirby and Baxter, 1981; Card and Reagan, 1989; Kirby, 1989; Peersman et al., 1996). Because of these deficits in the quality of research—which reduced the number of studies that met the criteria for inclusion and thereby limited the analysis of moderators—the findings of the present synthesis have to be considered merely tentative. Substantial variability in effect sizes remained unexplained by the present synthesis, indicating the need for primary studies to report more information on the variety of potential moderating conditions that might affect the outcomes of sex education interventions.
Finally, although it is rarely the case that a meta-analysis constitutes an endpoint or final step in the investigation of a research topic, by indicating the weaknesses as well as the strengths of the existing research a meta-analysis can be a helpful aid for channeling future primary research in a direction that might improve the quality of empirical evidence and expand the theoretical understanding in a given field (Eagly and Wood, 1994). Research in sex education could be greatly improved if more efforts were directed at testing interventions through randomized controlled trials, measuring intervening variables and reporting results more carefully and in greater detail. Unless efforts are made to improve the quality of the research being conducted, decisions about future interventions will continue to be based on a common-sense, intuitive approach as to 'what might work' rather than on solid empirical evidence.

Footnotes

  • 1 Five pairs of publications were detected which may have used the same database (or two databases likely to contain non-independent cases): Levy et al., 1995/Weeks et al., 1995; Barth et al., 1992/Kirby et al., 1991; Christopher and Roosa, 1990/Roosa and Christopher, 1990; and Jorgensen, 1991/Jorgensen et al., 1993. Only one effect size from each pair of articles was included to avoid the possibility of data dependence.
  • 2 Alternative methods to deal with non-independent effect sizes were not employed since these are more complex and require estimates of the covariance structure among the correlated effect sizes. According to Matt and Cook such estimates may be difficult—if not impossible—to obtain due to missing information in primary studies (Matt and Cook, 1994).
  • 3 QE provides the test for model specification when the number of studies is larger than the number of predictors. Under those conditions, QE follows an approximate χ2 distribution with k – p – 1 degrees of freedom, where k is the number of effect sizes and p is the number of regressors (Hedges and Olkin, 1985).
  • 4 An assessment of interaction effects among significant moderators could not be explored since it would have required partitioning of the studies according to a first variable and testing of the second within the partitioned categories. The limited number of effect sizes precluded such analysis.
