The effectiveness of school-based sex education programs in the promotion of abstinent behavior: a meta-analysis
Abstract
This review presents the findings from controlled school-based sex education interventions published in the last 15 years in the US. The effects of the interventions in promoting abstinent behavior reported in 12 controlled studies were included in the meta-analysis. The results of the analysis indicated a very small overall effect of the interventions on abstinent behavior. Moderator analysis could only be pursued partially because of limited information in the primary research studies. Parental participation in the program, age of the participants, virginity status of the sample, grade level, percentage of females, scope of the implementation and year of publication of the study were associated with variations in effect sizes for abstinent behavior in univariate tests. However, only parental participation and percentage of females were significant in the weighted least-squares regression analysis. The richness of a meta-analytic approach appears limited by the quality of the primary research. Unfortunately, most of the research does not employ designs that provide conclusive evidence of program effects. Suggestions to address this limitation are provided.
Introduction
Sexually active teenagers are a matter of serious concern. In the past decades, many school-based programs have been designed for the sole purpose of delaying the initiation of sexual activity. There seems to be a growing consensus that schools can play an important role in providing youth with a knowledge base which may allow them to make informed decisions and help them shape a healthy lifestyle (St Leger, 1999). The school is the only institution in regular contact with a sizable proportion of the teenage population (Zabin and Hirsch, 1988), with virtually all youth attending it before they initiate sexual risk-taking behavior (Kirby and Coyle, 1997).
Programs that promote abstinence have become particularly popular with school systems in the US (Gilbert and Sawyer, 1994) and even with the federal government (Sexual abstinence program has a $250 million price tag, 1997). These are referred to in the literature as abstinence-only or value-based programs (Repucci and Herman, 1991).
Other programs—designated in the literature as safer-sex, comprehensive, secular or abstinence-plus programs—additionally espouse the goal of increasing usage of effective contraception. Although abstinence-only and safer-sex programs differ in their underlying values and assumptions regarding the aims of sex education, both types of programs strive to foster decision-making and problem-solving skills in the belief that through adequate instruction adolescents will be better equipped to act responsibly in the heat of the moment (Repucci and Herman, 1991).
Nowadays most safer-sex programs encourage abstinence as a healthy lifestyle, and many abstinence-only programs have evolved into 'abstinence-oriented' curricula that also include some information on contraception. For most programs currently implemented in the US, a delay in the initiation of sexual activity constitutes a positive and desirable outcome, since the likelihood of responsible sexual behavior increases with age (Howard and Mitchell, 1993).
Even though abstinence is a valued outcome of school-based sex education programs, the effectiveness of such interventions in promoting abstinent behavior is still far from settled. Most of the articles published on the effectiveness of sex education programs follow the literary format of traditional narrative reviews (Quinn, 1986; Kirby, 1989, 1992; Visser and van Bilsen, 1994; Jacobs and Wolf, 1995; Kirby and Coyle, 1997). Two exceptions are the quantitative overviews by Frost and Forrest (Frost and Forrest, 1995) and Franklin et al. (Franklin et al., 1997).
In the first review (Frost and Forrest, 1995), the authors selected only five rigorously evaluated sex education programs and estimated their impact on delaying sexual initiation. They used non-standardized measures of effect sizes, calculated descriptive statistics to represent the overall effect of these programs and concluded that the selected programs delayed the initiation of sexual activity. In the second review, Franklin et al. conducted a meta-analysis of the published research on community-based and school-based adolescent pregnancy prevention programs and, contrary to the conclusions forwarded by Frost and Forrest, reported a non-significant effect of the programs on sexual activity (Franklin et al., 1997).
The discrepancy between these two quantitative reviews may result from the decision by Franklin et al. to include weak designs, which do not allow for reasonable causal inferences. However, given that recent evidence indicates that weaker designs yield higher estimates of intervention effects (Guyatt et al., 2000), the inclusion of weak designs should have translated into higher, not smaller, effects in the Franklin et al. review. Given the discrepant results forwarded in these two recent quantitative reviews, there is a need to clarify the extent of the impact of school-based sex education on abstinent behavior and to explore the specific features of the interventions that are associated with variability in effect sizes.
Purpose of the study
The present study consisted of a meta-analytic review of the research literature on the effectiveness of school-based sex education programs, implemented in the US in the past 15 years in the wake of the AIDS epidemic, in promoting abstinent behavior. The goals were to: (1) synthesize the effects of controlled school-based sex education interventions on abstinent behavior, (2) examine the variability in effects among studies and (3) explain the variability in effects between studies in terms of selected moderator variables.
Literature search and selection criteria
The first step was to locate as many studies conducted in the US as possible that dealt with the evaluation of sex education programs and which measured abstinent behavior subsequent to an intervention.
The primary sources for locating studies were four reference database systems: ERIC, PsycLIT, MEDLINE and the Social Science Citation Index. Branching from the bibliographies and reference lists of articles located through the original search provided another source for locating studies.
The process for the selection of studies was guided by four criteria, some of which have been employed by other authors as a way to orient and confine the search to the relevant literature (Kirby et al., 1994). The criteria to define eligibility of studies were the following.
- Interventions had to be geared to normal adolescent populations attending public or private schools in the US and report on some measure of abstinent behavior: delay in the onset of intercourse, reduction in the frequency of intercourse or reduction in the number of sexual partners. Studies that reported on interventions designed for cognitively handicapped, delinquent, school-dropout, emotionally disturbed or institutionalized adolescents were excluded from the present review since they address a different population with different needs and characteristics. Community interventions which recruited participants from clinical or out-of-school populations were also eliminated for the same reasons.
- Studies had to be either experimental or quasi-experimental in nature, excluding three designs that do not permit strong tests of causal hypotheses: the one-group post-test-only design, the post-test-only design with non-equivalent groups and the one-group pre-test–post-test design (Cook and Campbell, 1979). The presence of an independent 'no intervention' control group—comparable in demographic variables and in baseline measures of sexual activity—was required for a study to be included in this review.
- Studies had to be published between January 1985 and July 2000. A time period restriction was imposed because of cultural changes that occur in society—such as the AIDS epidemic—which might significantly impact the adolescent cohort and alter patterns of behavior and consequently the effects of sex education interventions.
- Studies had to be published in a peer-reviewed journal. The reasons for this criterion are 3-fold. First, there have been many reports published in newspapers or advocacy newsletters claiming that specific sex education programs have a dramatic impact on one or more outcome variables, yet when these reports have been investigated, they were often found lacking in valid empirical evidence (Kirby et al., 1994; Frost and Forrest, 1995). Second, unpublished studies are hard to locate, and the typical quality of unpublished research makes it doubtful that the cost of undertaking retrieval procedures would be a worthwhile investment. This is not to say that all conference papers are defective or that all journal articles are free of weaknesses. However, regardless of varying standards of review rigor and publication criteria between journals, published articles have at least survived some form of refereeing and editing process (Dunkin, 1996). Finally, an added advantage of including only published articles is that it helps reduce the risk of data dependence. The probability of duplication of studies is likely to be increased when including dissertations and papers presented at conferences, which often constitute earlier drafts of published studies. Even considering only published studies, it may be difficult to detect duplication. The same data set, or a subset of it, may be repeatedly used in several studies, published in different journals, with different main authors, and without any reference to the original data source. Published studies which were known or suspected to have employed the same database were only included once.1
Coding of the studies for exploration of moderators
The exploration of study characteristics or features that may be related to variations in the magnitude of effect sizes across studies is referred to as moderator analysis. A moderator variable is one that informs about the circumstances under which the magnitude of effect sizes varies (Miller and Pollock, 1994).
The information retrieved from the articles for potential inclusion as moderators in the data analysis was categorized into two domains: demographic characteristics of the participants in the sex education interventions and characteristics of the program.
Demographic characteristics included the following variables: the percentage of females, the percentage of whites, the virginity status of participants, mean (or median) age, and a categorization of the predominant socioeconomic status of participating subjects (low or middle class) as reported by the authors of the primary study.
In terms of the characteristics of the programs, the features coded were: the type of program (whether the intervention was comprehensive/safer-sex or abstinence-oriented), the type of monitor who delivered the intervention (teacher/adult monitor or peer), the length of the program in hours, the scope of the implementation (large-scale versus small-scale trial), the time elapsed between the intervention and the post-intervention outcome measure (expressed as number of days), and whether parental participation (beyond consent) was a component of the intervention.
The type of sex education intervention was defined as abstinence-oriented if the explicit aim was to encourage abstinence as the primary method of protection against sexually transmitted diseases and pregnancy, either totally excluding units on contraceptive methods or, if including contraception, portraying it as a less effective method than abstinence. An intervention was defined as comprehensive or safer-sex if it included a strong component on the benefits of contraceptive use as a legitimate alternative to abstinence for avoiding pregnancy and sexually transmitted diseases.
A study was considered to be a large-scale trial if the intervention group consisted of more than 500 students.
Finally, year of publication was also analyzed to assess whether changes in the effectiveness of programs across time had occurred.
The decision to record information on all the above-mentioned variables for their potential role as moderators of effect sizes was based in part on theoretical considerations and in part on empirical evidence of the relevance of such variables in explaining the effectiveness of educational interventions. A limitation to the coding of these and other potentially relevant and interesting moderator variables was the scantiness of information provided by the authors of primary research. Not all studies described the features of interest for this meta-analysis. For parental participation, no missing values were present because a decision was made to code all interventions which did not specifically report that parents had participated—either through parent–youth sessions or homework assignments—as non-participation. However, for the rest of the variables, no similar assumptions seemed appropriate, and therefore if no pertinent data were reported for a given variable, it was coded as missing (see Table I).
Decisions related to the computation of effect sizes
Once the pool of studies which met the inclusion criteria was located, studies were examined in an attempt to retrieve the size of the effect associated with each intervention. Since most of the studies did not report any effect size, it had to be estimated from the significance level and inferential statistics with formulae provided by Rosenthal (Rosenthal, 1991) and Holmes (Holmes, 1984). When provided, the exact value of the test statistic or the exact probability was used in the calculation of the effect size.
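To illustrate, conversion formulae of this kind typically take the following forms (shown here as a sketch of the standard conversions; the precise formula applied to each study depends on the statistics it reported):

$$ d = \frac{2t}{\sqrt{df}} \quad \text{(from an independent-samples } t\text{)}, \qquad r = \sqrt{\frac{\chi^2_{(1)}}{N}}, \quad d = \frac{2r}{\sqrt{1 - r^2}} \quad \text{(from a 1-df } \chi^2\text{)} $$

When only a probability level is available, the usual route is to convert it to a standard normal deviate $Z$ and take $r = Z/\sqrt{N}$ before converting to $d$.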
In order to avoid data dependence, a conservative strategy of including only one finding per study was employed in this review. When multiple variations of interventions were tested, the effect size was calculated for the most successful of the treatment groups. This decision rests on the assumption that, should the program be implemented in the future, the most effective mode of intervention would be chosen. Similarly, to ensure the independence of the data in the case of follow-up studies, when multiple measurements were reported across time, a single estimate of effect size was included.2
Analyses of the effect sizes were conducted utilizing the D-STAT software (Johnson, 1989). The sample sizes used for the overall effect size analysis corresponded to the actual number used to estimate the effects of interest, which was often less than the total sample of the study. Occasionally the actual sample sizes were not provided by the authors of primary research, but could be estimated from the degrees of freedom reported for the statistical tests. The effect sizes were calculated from means and pooled standard deviations, t-tests, χ2, significance levels or proportions, depending on the nature of the information reported by the authors of primary research.
As recommended by Rosenthal, if results were reported simply as being 'non-significant', a conservative estimate of the effect size was included, assuming P = 0.50, which corresponds to an effect size of zero (Rosenthal, 1991). The overall measure of effect size reported was the corrected d statistic (Hedges and Olkin, 1985). These authors recommend this measure since it does not overestimate the population effect size, especially when sample sizes are small.
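For reference, the corrected d rescales the raw standardized mean difference g by a small-sample factor, usually written in the approximate form:

$$ d = \left(1 - \frac{3}{4(n_1 + n_2) - 9}\right) g, \qquad g = \frac{\bar{X}_E - \bar{X}_C}{s_{\text{pooled}}} $$

Since the factor is slightly below 1, the correction shrinks g toward zero, and the shrinkage becomes negligible as the combined sample size grows.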
The homogeneity of effect sizes was examined to determine whether the studies shared a common effect size. Testing for homogeneity required the calculation of a homogeneity statistic, Q. If all studies share the same population effect size, Q follows an asymptotic χ2 distribution with k – 1 degrees of freedom, where k is the number of effect sizes. For the purposes of this review, the probability level chosen for significance testing was 0.10, because the relatively small number of effect sizes available for the analysis limits the power to detect actual departures from homogeneity. Rejection of the hypothesis of homogeneity signals that the group of effect sizes is more variable than one would expect based on sampling variation and that one or more moderator variables may be present (Hall et al., 1994).
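As a minimal sketch of these computations (not the D-STAT implementation; the function name and interface are illustrative), the fixed-effects weighted mean effect size, its confidence interval and the Q statistic can be obtained as follows:

```python
import numpy as np
from scipy import stats

def fixed_effects_summary(d, n1, n2, alpha=0.05):
    """Weighted mean effect size d+, its confidence interval and the
    homogeneity statistic Q for standardized mean differences
    (fixed-effects model, Hedges and Olkin, 1985)."""
    d, n1, n2 = map(np.asarray, (d, n1, n2))
    # Large-sample variance of each effect size d
    v = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    w = 1.0 / v                               # inverse-variance weights
    d_plus = np.sum(w * d) / np.sum(w)        # overall weighted mean
    se = np.sqrt(1.0 / np.sum(w))             # standard error of d+
    z = stats.norm.ppf(1 - alpha / 2)
    ci = (d_plus - z * se, d_plus + z * se)
    # Q follows a chi-square with k - 1 df if all studies share one effect
    Q = np.sum(w * (d - d_plus) ** 2)
    p_Q = stats.chi2.sf(Q, df=len(d) - 1)
    return d_plus, ci, Q, p_Q
```

Under the null hypothesis of a common effect size, Q is referred to the χ2 distribution with k − 1 degrees of freedom, matching the test described above.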
To examine the relationship between the study characteristics included as potential moderators and the magnitude of effect sizes, both categorical and continuous univariate tests were run. Categorical tests assess differences in effect sizes between subgroups established by dividing studies into classes based on study characteristics. Hedges and Olkin presented an extension of the Q statistic to test for homogeneity of effect sizes between classes (QB) and within classes (QW) (Hedges and Olkin, 1985). The relationship between the effect sizes and continuous predictors was assessed using a procedure described by Rosenthal and Rubin which tests for linearity between effect sizes and predictors (Rosenthal and Rubin, 1982).
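In this framework, the overall homogeneity statistic partitions additively over the p classes of a categorical moderator (sketched here from the standard Hedges and Olkin formulation):

$$ Q = Q_B + Q_W, \qquad Q_B = \sum_{j=1}^{p} w_{+j} \, (d_{+j} - d_{++})^2, \qquad Q_W = \sum_{j=1}^{p} \sum_{i=1}^{k_j} w_{ij} \, (d_{ij} - d_{+j})^2 $$

where $d_{+j}$ and $w_{+j}$ are the weighted mean effect size and total weight of class j, and $d_{++}$ is the overall weighted mean; $Q_B$ is referred to a χ2 distribution with p − 1 degrees of freedom and each within-class component to a χ2 with $k_j - 1$.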
A weighted least-squares regression analysis was conducted to test the joint effect of the significant moderators on the effect sizes. The results of the univariate analyses were used to select the predictors to be included in the model. Categorical predictors were included as dummy variables. All predictors were entered simultaneously. Significance of each regression coefficient was tested using a z-test, where the standard errors in the SPSS output were adjusted by a factor of the square root of the mean square error for the regression model (Hedges and Olkin, 1985). Model specification was tested using the QE goodness-of-fit statistic.3
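A minimal sketch of this regression step under the same fixed-effects assumptions (the helper name and interface are illustrative; this mirrors the Hedges and Olkin procedure rather than the original SPSS run):

```python
import numpy as np
from scipy import stats

def wls_moderator_model(d, v, X):
    """Weighted least-squares regression of effect sizes d on a
    predictor matrix X (without intercept column), weights 1/v.
    Returns coefficients, corrected z-tests and the QE statistic."""
    d, v = np.asarray(d), np.asarray(v)
    X = np.column_stack([np.ones(len(d)), X])  # add intercept
    W = np.diag(1.0 / v)
    XtWX_inv = np.linalg.inv(X.T @ W @ X)
    beta = XtWX_inv @ X.T @ W @ d              # WLS coefficients
    # The meta-analytic model fixes the residual variance at 1, so the
    # standard errors come directly from the inverse weighted
    # cross-product matrix (equivalent to dividing package-reported
    # standard errors by the square root of the mean square error).
    se = np.sqrt(np.diag(XtWX_inv))
    z = beta / se
    p = 2 * stats.norm.sf(np.abs(z))
    resid = d - X @ beta
    QE = np.sum(resid ** 2 / v)                # model-specification test
    p_QE = stats.chi2.sf(QE, df=len(d) - X.shape[1])  # k - p - 1 df
    return beta, se, z, p, QE, p_QE
```

The degrees of freedom for QE, k − p − 1, correspond to the number of effect sizes minus the number of regressors and the intercept, as described in footnote 3.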
Results
The search for school-based sex education interventions resulted in 12 research studies that complied with the criteria to be included in the review and for which effect sizes could be estimated. The overall effect size (d+) estimated from these studies was 0.05, with a 95% confidence interval about the mean ranging from a lower bound of 0.01 to an upper bound of 0.09, indicating a very small overall effect. Table II presents the effect size of each study (di) along with its 95% confidence interval and the overall estimate of the effect size. Homogeneity testing indicated the presence of variability among effect sizes (Q(11) = 35.56; P < 0.001).
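As a quick arithmetic check on the reported interval (the implied standard error is inferred here from the published bounds, not taken from the paper):

$$ d_+ \pm 1.96\,\widehat{SE} \approx 0.05 \pm 1.96 \times 0.02 = (0.01,\ 0.09) $$

so the reported 95% confidence interval is consistent with a standard error of roughly 0.02 for the overall effect size.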
Among the set of categorical predictors studied, parental participation in the program, virginity status of the sample and scope of the implementation were statistically significant.4 Parental participation appeared to moderate the effects of sex education on abstinence, as indicated by the significant Q test between groups (QB(1) = 5.06; P = 0.025), shown in Table III. Although small in magnitude (d = 0.24), the point estimate for the mean weighted effect size associated with programs with parental participation appears substantially larger than the mean associated with those where parents did not participate (d = 0.04). The confidence interval for parental participation does not include zero, thus indicating a small but positive effect. Controlling for parental participation appears to translate into a homogeneous class of effect sizes for programs that included parents, but not for those where parents did not participate (QW(9) = 28.94; P = 0.001), meaning that the effect sizes were not homogeneous within the latter class.
Virginity status of the sample was also a significant predictor of the variability among effect sizes (QB(1) = 3.47; P = 0.06). The average effect size calculated for virgins-only samples was larger than the one calculated for mixed samples of virgins and non-virgins (d = 0.09 and d = 0.01, respectively). Controlling for virginity status translated into a homogeneous class for the mixed samples, although not for the virgins-only class (QW(5) = 27.09; P < 0.001).
The scope of the implementation also appeared to moderate the effects of the interventions on abstinent behavior. The average effect size calculated for small-scale interventions was significantly higher than that for large-scale interventions (d = 0.26 and d = 0.01, respectively). The effects corresponding to the large-scale category were homogeneous, but this was not the case for the small-scale class, where heterogeneity was detected (QW(4) = 14.71; P = 0.01).
For all three significant categorical predictors, deletion of one outlier (Howard and McCabe, 1990) resulted in homogeneity among the effect sizes within classes.
Univariate tests of continuous predictors showed significant results in the case of percentage of females in the sample (z = 2.11; P = 0.04), age of participants (z = –1.67; P = 0.09), grade (z = –1.80; P = 0.07) and year of publication (z = –2.76; P = 0.006).
All significant predictors in the univariate analysis—with the exception of grade, which had a very high correlation with age (r = 0.97; P < 0.001)—were entered into a weighted least-squares regression analysis. In general, the remaining set of predictors had a moderate degree of intercorrelation, although none of the coefficients were statistically significant.
In the weighted least-squares regression analysis, only parental participation and the percentage of females in the study were significant. The two-predictor model explained 28% of the variance in effect sizes. The test of model specification yielded a significant QE statistic, suggesting that the two-predictor model cannot be regarded as correctly specified (see Table IV).
Discussion
This review synthesized the findings from controlled sex education interventions reporting on abstinent behavior. The overall mean effect size for abstinent behavior was very small, close to zero. No significant effect was associated with the type of intervention: whether the program was abstinence-oriented or comprehensive—the source of a major controversy in sex education—was not associated with abstinent behavior. Only two moderators—parental participation and percentage of females—appeared to be significant in both univariate tests and the multivariable model.
Although parental participation in interventions appeared to be associated with higher effect sizes in abstinent behavior, the link should be explored further since it is based on a very small number of studies. To date, too few studies have reported success in involving parents in sex education programs. Furthermore, the primary articles reported very limited information about the characteristics of the parents who took part in the programs. Parents who were willing to participate might differ in important demographic or lifestyle characteristics from those who did not participate. For instance, it is possible that the studies that reported success in achieving parental involvement may have been dealing with a larger percentage of intact families or with parents who espoused conservative sexual values. Therefore, at this point it is not possible to affirm that parental participation per se exerts a direct influence on the outcomes of sex education programs, although clearly this is a variable that merits further study.
Interventions appeared to be more effective when geared to groups composed of younger students, predominantly females and those who had not yet initiated sexual activity. The association between gender and effect sizes—which appeared significant both in the univariate and multivariable analyses—should be explored to understand why females seem to be more receptive to the abstinence messages of sex education interventions.
Smaller-scale interventions appeared to be more effective than large-scale programs. The larger effects associated with small-scale trials seem worth exploring. It may be the case that in large-scale studies it becomes harder to control for confounding variables that may have an adverse impact on the outcomes. For example, large-scale studies often require external agencies or contractors to deliver the program, and the quality of the delivery of the contents may turn out to be less than optimal (Cagampang et al., 1997).
Interestingly, there was a significant change in effect sizes across time, with effect sizes appearing to wane across the years. It is not likely that this represents a decline in the quality of sex education interventions. A possible explanation for this trend may be the expansion of mandatory sex education in the US, which makes it increasingly difficult to find comparison groups that are relatively unexposed to sex education. Another possible line of explanation refers to changes in cultural mores regarding sexuality that may have occurred in the past decades—characterized by an increasing acceptance of premarital sexual intercourse, a proliferation of sexualized messages from the media and increasing opportunities for sexual contact in adolescence—which may be eroding the attainment of the goal of abstinence sought by educational interventions.
In terms of the design and implementation of sex education interventions, it is worth noting that the length of the programs was unrelated to the magnitude of effect sizes for the range of 4.5–30 h represented in these studies. Program length—which has been singled out as a potential explanation for the absence of significant behavioral effects in a large-scale evaluation of a sex education program (Kirby et al., 1997a)—does not appear to be consistently associated with abstinent behavior. The impact of lengthening currently existing programs should be evaluated in future studies.
As has been stated, the exploration of moderator variables could be performed only partially due to lack of information in the primary research literature. This has also been a problem for other reviewers in the field (Franklin et al., 1997). The authors of primary research did not appear to control for or report on the potentially confounding influence of numerous variables that have been indicated in the literature as influencing sexual decision making or as being associated with the initiation of sexual activity in adolescence, such as academic performance, career orientation, religious affiliation, romantic involvement, number of friends who are currently having sex, peer norms about sexual activity and drinking habits, among others (Herold and Goodwin, 1981; Christopher and Cate, 1984; Billy and Udry, 1985; Roche, 1986; Coker et al., 1994; Kinsman et al., 1998; Holder et al., 2000; Thomas et al., 2000).
Even though randomization should take care of differences in these and other potentially confounding variables, given that studies can rarely assign individual students to conditions and instead assign classrooms or schools to conditions, it is advisable that more information on baseline characteristics of the sample be utilized to establish and substantiate the equivalence between the intervention and control groups in relevant demographic and lifestyle characteristics.
In terms of the communication of research findings, the richness of a meta-analytic approach will always be limited by the quality of the primary research. Unfortunately, most of the research in the area of sex education does not employ experimental or quasi-experimental designs and thus falls short of providing conclusive evidence of program effects. The limitations in the quality of research in sex education have been highlighted by several authors in the past two decades (Kirby and Baxter, 1981; Card and Reagan, 1989; Kirby, 1989; Peersman et al., 1996). Due to these deficits in the quality of research—which resulted in a reduced number of studies that met the criteria for inclusion and the ensuing limitations for conducting a thorough analysis of moderators—the findings of the present synthesis have to be considered merely tentative. Substantial variability in effect sizes remained unexplained by the present synthesis, indicating the need for more information on a variety of potential moderating conditions that might affect the outcomes of sex education interventions.
Finally, although it is rarely the case that a meta-analysis will constitute an endpoint or final step in the investigation of a research topic, by indicating the weaknesses as well as the strengths of the existing research a meta-analysis can be a helpful aid for channeling future primary research in a direction that might improve the quality of empirical evidence and expand the theoretical understanding in a given field (Eagly and Wood, 1994). Research in sex education could be greatly improved if more efforts were directed toward testing interventions with randomized controlled trials, measuring intervening variables and reporting results in a more careful and detailed manner. Unless efforts are made to improve the quality of the research that is being conducted, decisions about future interventions will continue to be based on a common-sense and intuitive approach as to 'what might work' rather than on solid empirical evidence.
Footnotes
- 1 Five pairs of publications were detected which may have used the same database (or two databases which were likely to contain non-independent cases): Levy et al., 1995/Weeks et al., 1995; Barth et al., 1992/Kirby et al., 1991; Christopher and Roosa, 1990/Roosa and Christopher, 1990; and Jorgensen, 1991/Jorgensen et al., 1993. Only one effect size from each pair of articles was included to avoid the possibility of data dependence.
- 2 Alternative methods to deal with non-independent effect sizes were not employed since these are more complex and require estimates of the covariance structure among the correlated effect sizes. According to Matt and Cook, such estimates may be difficult—if not impossible—to obtain due to missing information in primary studies (Matt and Cook, 1994).
- 3 QE provides the test for model specification when the number of studies is larger than the number of predictors. Under those conditions, QE follows an approximate χ2 distribution with k – p – 1 degrees of freedom, where k is the number of effect sizes and p is the number of regressors (Hedges and Olkin, 1985).
- 4 Interaction effects among significant moderators could not be explored, since doing so would have required partitioning the studies according to a first variable and testing the second within the partitioned categories. The limited number of effect sizes precluded such an analysis.