MEERA Glossary

Analysis of Covariance

Analysis of Covariance is a statistical test used to assess whether the means of two or more groups are equal after accounting for the effects of one or more other predictive factors.  ANCOVA accomplishes a similar test to ANOVA, while also accounting for and removing the effects of other factors such as gender, age or pretest score that may influence the outcome you want to know about.

Analysis of Variance

Analysis of Variance is a statistical test for assessing whether the means of two or more groups are equal.  ANOVA is an extension of the 2-independent samples t-test, allowing for comparison of more than two groups.
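
As a rough illustration of the idea, the sketch below computes the F statistic for a one-way ANOVA by hand.  The function name and the three sets of posttest scores are hypothetical, chosen only for this example.  A large F value suggests that the group means differ by more than the spread of scores within each group would lead you to expect.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)
    n_total = len(all_scores)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical posttest scores for three program formats
classroom = [70, 72, 68, 74, 71]
field_trip = [78, 80, 76, 82, 79]
online = [69, 73, 70, 72, 71]

f_stat, df_b, df_w = one_way_anova_f([classroom, field_trip, online])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

Converting the F statistic into a p-value requires the F distribution, which a statistical software package would normally supply.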

Attitude

An attitude is a positive or negative evaluation of something.  Environmental educators are interested in many different attitudes that people may have.  A few of these include attitudes toward nature, animals, built environments, environmental problems, environmental policies, and environmental behaviors.

Marcinkowski, T. (1993). Assessment in Environmental Education. In R. J. Wilke (Ed.), Environmental Education Teacher Resource Handbook:  A Practical Guide for K-12 Environmental Education (pp. 143-198). Millwood, NY: Kraus International Publications.

Beginner / Intermediate / Advanced

Throughout the MEERA website, resources are frequently designated as beginner, intermediate, and/or advanced.  These designations indicate the approximate evaluation skill level that each resource requires and should help you in selecting the most appropriate resource.

Behavior

Behavior simply refers to actions.  Environmental educators are often interested in human behaviors that have direct or indirect impacts on the environment.  For example, some behaviors that environmental educators are interested in include lifestyle behaviors (e.g., riding the bus), volunteer behaviors (e.g., participating in a local park restoration project), civic behaviors (e.g., voting in a local election), consumer behaviors (e.g., buying organic produce), and advocacy behaviors (e.g., boycotting an environmentally unfriendly business).

Benchmarks

Benchmarks are standards of excellence or achievement against which things can be measured.  In education, benchmarks are often represented as statements of what students should know or be able to do by a certain grade level.  The term benchmark is sometimes used interchangeably with the term standard.  Some examples of benchmarks or standards relevant to environmental education include NAAEE’s Excellence in Environmental Education Guidelines for Learning (Pre K-12) and the National Research Council’s National Science Education Standards.

Bias

In statistics, bias describes the extent to which a measurement or sample underestimates or overestimates a true value.  Statistical bias can be used to describe several different concepts including:

Categorization of responses

Categorization of responses is a process used in content analysis that involves organizing qualitative data (e.g., interview transcripts, written statements, etc.) into different categories or themes that reflect similarities in the information.  Categories may reflect any concept a researcher is interested in.  For instance, categories may relate to subject matter, goals, values, actors, behaviors, etc.  After categories are developed, they can be used to code the data.  Coding of qualitative data, in turn, is an analytic process through which data may be examined, interpreted and categor

Census

A census is a process of obtaining information from an entire population.  In contrast, the process of sampling involves obtaining information from a subset of a population, often with the goal of making inferences about the whole population.  A well-known example of a census is the national census of population and housing which is implemented every ten years in the United States.
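
The contrast between a census and a sample can be sketched in a few lines.  The population of 1,000 test scores below is simulated and purely hypothetical.  The census uses every score, while the sample uses 50 scores drawn at random to estimate the same mean.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the example is repeatable

# Hypothetical population: test scores for 1,000 students
population = [rng.randint(50, 100) for _ in range(1000)]

# Census: information from every member of the population
census_mean = mean(population)

# Sample: information from a randomly chosen subset
sample = rng.sample(population, 50)
sample_mean = mean(sample)

print(f"Census mean: {census_mean:.1f}")
print(f"Sample mean: {sample_mean:.1f}")
```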

Chi-square Test

A chi-square test is a statistical technique used to test for associations between variables when you have nominal data.  Nominal data are data that fit into discrete classes such as gender [male, female] or type of car [sedan, SUV, pickup truck].  Thus, for example, a chi-square test could be used to determine if, for a given population, there is an association between gender and type of car driven.
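
A minimal sketch of the calculation follows, using made-up counts for the gender and car type example.  The function name and the data are assumptions for illustration.  The statistic compares the observed counts with the counts you would expect if the two variables were unrelated.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if there were no association
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Hypothetical counts: rows are male and female, columns are sedan, SUV, pickup truck
observed_counts = [
    [20, 30, 50],
    [40, 40, 20],
]

chi2, dof = chi_square_statistic(observed_counts)
print(f"chi-square = {chi2:.2f} with {dof} degrees of freedom")
```

The statistic is then compared with a chi-square distribution with the given degrees of freedom, usually by a statistical package, to judge whether the association is statistically significant.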

Coding

Coding is the process of turning initial evidence (raw data) into standardized symbols.  When coding quantitative data, the standardized symbols are usually numeric. For example, the gender of a student may be coded as “1” for a female and “2” for a male.

When coding qualitative data, the standardized symbols usually take the form of categories or themes.  For example, in an interview with a student, a researcher could code for the types of environmental behaviors that the student mentions.
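
Both kinds of coding can be sketched briefly.  The student statements and theme labels below are hypothetical, and in practice a researcher assigns qualitative codes by reading and interpreting each statement rather than by any automatic rule.

```python
from collections import Counter

# Quantitative coding: raw answers become numeric symbols
GENDER_CODES = {"female": 1, "male": 2}
raw_genders = ["female", "male", "female", "female"]
coded_genders = [GENDER_CODES[g] for g in raw_genders]

# Qualitative coding: each statement is paired with a theme assigned by the researcher
coded_statements = [
    ("I ride the bus to school.", "lifestyle"),
    ("I helped restore the park trail.", "volunteer"),
    ("We buy organic produce.", "consumer"),
    ("I bike to practice.", "lifestyle"),
]
theme_counts = Counter(theme for _, theme in coded_statements)

print(coded_genders)
print(theme_counts.most_common())
```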

Content analysis

Content analysis is a research analysis technique where data content (e.g., speech, written text, interviews, images, etc.) is categorized and classified.  Content analysis can take a qualitative approach (e.g., focusing on meanings or implications of categories) or a quantitative approach (e.g., focusing on the relative frequency with which different categories are mentioned).

Related terms: Qualitative data, categorization of responses

Control group

In evaluation research, the control group is the group of people who do not receive the service or intervention, but who are similar to those who are receiving the service or intervention (i.e., the treatment group).  For example, to study the impact of an environmental education program, an evaluator may designate one fifth grade classroom that participates in the program as the treatment group, and another fifth grade classroom in the same school that does not participate in the program as the control group.  The control and treatment groups may be compared to determine what impacts, if

Convenience sample

A convenience sample is a sample selected not because it is representative of the population, but because it is convenient for the researcher to use – such as when college professors conduct a study with their own students.

Vogt, W. P. (1993). Dictionary of Statistics and Methodology. London: Sage Publications.

Criterion sample

A criterion sample is a purposive sample in which every case selected meets a predetermined criterion of importance.  For example, an evaluator might choose to study only those participants who completed every session of a program.

Critical case sample

A critical case sample is a purposive sample that selects particularly important cases that logic or prior experience suggests will allow for generalization to a larger population.  A common example is the selection of key voting precincts for predicting the outcome of an election.  The assumption is that what is true for these critical cases is likely to be true for most others, thus allowing such cases to be generalized.

Dependent variable

A dependent variable is a factor that is predicted to change in response to another factor, called an independent variable.  For example, if you are examining the effect of a professional development workshop on teachers’ intentions to teach about the environment, the workshop will be the independent variable and the teachers' intentions will be the dependent variable.

Related terms: Independent variable

Descriptive statistics

Descriptive statistics are statistical procedures that are used to summarize, organize, graph, or otherwise represent information directly from data.  Examples include measures such as frequency, mean, mode, and median.  One example of a descriptive statistic would be the mean grade point average for students in a study.  Descriptive statistics differ from inferential statistics, which try to make conclusions that extend beyond what the data immediately show.
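
As an illustration, the measures named above can be computed with Python's built-in statistics module.  The grade point averages are hypothetical.

```python
from collections import Counter
from statistics import mean, median, mode, stdev

# Hypothetical grade point averages for seven students in a study
gpas = [3.0, 3.5, 3.5, 2.5, 4.0, 3.5, 3.0]

summary = {
    "frequency": Counter(gpas),  # how often each value occurs
    "mean": mean(gpas),
    "median": median(gpas),
    "mode": mode(gpas),
    "standard deviation": stdev(gpas),
}

for name, value in summary.items():
    print(name, value)
```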

Related terms: mean, standard deviation

Educational impacts

Educational impacts are outcomes of education that can contribute to meeting long-term goals.  Educational impacts include long-term effects on learners, teachers, or the learning environment.
Examples include:

Effect size

An effect size is a measure of the strength of a relationship between two variables.  Thus, in program evaluation, effect size can be used to estimate the magnitude of a treatment’s effect on an outcome.  When undertaking research, it is often helpful to know not only whether a result is statistically significant, but also what the size of the result is.  In other words, effect size can help determine whether the outcome of a treatment was practically meaningful.  For example, a weight loss program could either report that it leads to statistically significant weight loss or that it leads to an ave
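
One widely used effect size for comparing two group means is Cohen's d, the difference between the means divided by the pooled standard deviation.  This glossary does not name a particular measure, so the choice of Cohen's d, the function name, and the scores below are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(treatment, control):
    """Standardized mean difference between two groups (pooled standard deviation)."""
    n1, n2 = len(treatment), len(control)
    pooled_variance = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_variance)

# Hypothetical posttest scores
treatment_scores = [78, 80, 76, 82, 79]
control_scores = [70, 72, 68, 74, 71]

d = cohens_d(treatment_scores, control_scores)
print(f"Cohen's d = {d:.2f}")
```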

Environmental impacts

Environmental impacts are positive or negative effects that human actions have on the environment.  Assessing environmental impacts is an important environmental policy process.  For example, the Environmental Protection Agency requires an environmental impact assessment to be completed before approving certain types of projects (e.g., building a new highway).

Evaluation design

An evaluation design consists of the methods used by the evaluation.  This typically involves deciding whether the evaluation will be experimental, quasi-experimental, or non-experimental, and whether it will involve:

  • Pretest, posttest, both, or multiples of both
  • A control group
  • Random assignment of individuals to groups

 

An evaluation design will also clarify if data will be collected through surveys, interviews, document collection, observation, focus groups, case studies or other means.

Evaluation questions

Evaluation questions are the questions that will be answered by evaluators through data collection, analysis, and interpretation. Evaluation questions should be developed in conjunction with program stakeholders and should grow from a program’s goals and objectives. The questions should focus on aspects and outcomes of the program that are important to the stakeholders.  See Step 3 for more information.

Evaluation report

There are two common types of evaluation reports, progress reports and final reports. Progress reports provide information and recommendations based on the successes and challenges of a project while it is being implemented. Final reports present findings and conclusions about the project's accomplishment of its intended goals when the project is complete.  Components commonly included in an evaluation report include:

Experimental design research

Experimental design research is a research approach where the researcher controls the selection of participants in the study and the participants are randomly assigned to treatment and control groups.  By designating treatment and control groups, the researcher manipulates a variable.  For example, say a program evaluator wants to know the effect of a zoo field trip on students’ attitudes toward endangered species.  To examine the effect of the zoo trip, the experimental researcher can compare attitudes of students who have and have not taken the field trip.  Experimental design research c

Extreme case sample

An extreme case sample is one in which the researcher chooses participants who are highly unusual cases – these participants may be particularly competent or exceptionally troublesome.  A few examples could include studying an outstanding student who organized a successful state-wide environmental campaign, a student who became an eco-terrorist, and a student who dropped out of a year-long program after a week.

Formal Education

Although there are no absolute definitions for formal education, this term generally refers to the structured educational system provided by the state for children and young adults.  In most countries, the formal education system is state-supported and state-operated. In some countries, the state allows and certifies private systems which provide a comparable education.  Common formal education settings include classrooms, school science laboratories, and college lecture halls.  Institutions that provide formal education include preK-12 schools, colleges and universities.

Formative evaluation

Evaluators use formative evaluations to improve the performance or implementation of a program through periodic or ongoing monitoring of the program, its processes, and its participants. Evaluators use this type of evaluation most often during the implementation phase of a program.

Formative evaluation

“Formative evaluation is typically conducted during the development or improvement of a program … and it is conducted, often more than once, for in-house staff of the program with the intent to improve. The reports normally remain in-house; but serious formative evaluation may be done by an internal or an external evaluator or preferably, a combination; of course, many program staff are, in an informal sense, constantly doing formative evaluation.”

Generalizability

In program evaluation, generalizability may be defined as the extent to which you can come to conclusions about a larger population based on information from a sample of that population.  For example, when a national curriculum developer such as Project Learning Tree develops a new curriculum, it first tests the curriculum in a small number of schools that are assumed to be representative of schools around the country.  If this assumption is correct, the outcomes of using the curriculum in the small number of schools can be generalized to estimate the outcomes of using the curriculum in ma

Goal

Programs should have clearly specified goals and objectives before proceeding with an evaluation. A program goal is a broad statement of what the program hopes to accomplish or what changes it expects to produce.  A sample goal could be to increase students’ reducing, reusing and recycling behaviors. Objectives are different from goals in that objectives are specific and measurable, while goals are broad statements of intended outcomes that do not have to be measurable.

Related term: Objectives

Health Impacts

Health impacts are positive or negative outcomes that affect human health in a population. Some environmental education efforts focus on educating people about:

Hierarchical linear modeling

Hierarchical linear modeling is an advanced form of statistical regression that allows for analysis of data at multiple levels.  For example, in educational program evaluation, the evaluator is often interested in the impact of a program on students.  However, the students are nested within classrooms nested within schools – and both classrooms and schools can have significant impacts on students.  Hierarchical linear modeling allows researchers to account for and remove the effects of factors that are not of primary interest (e.g., the effect of being in a certain classroom) so that the i

Homogeneous sample

A homogeneous sample is one in which the researcher chooses participants who are alike – for example, participants who belong to the same subculture or have similar characteristics.  A homogeneous sample may be chosen with respect to only a certain variable – for instance, the researcher may be interested in studying participants who work in a certain occupation, or are in a certain age group.  Homogeneous sampling can be of particular use for conducting focus groups because individuals are generally more comfortable sharing their thoughts and ideas with other individuals who they perceive

Impact evaluation

Impact evaluation is evaluation that focuses on the benefits or payoff of a program rather than on examining program processes, delivery or implementation. 

Scriven, M. (1991).  Evaluation Thesaurus.  Newbury Park, CA: Sage Publications.

Impacts

In evaluation, impacts are the broad, ultimate changes that occur within a community, organization, society, or environment as a result of a program or activity.  Impacts are typically long-term changes, sometimes affecting individuals other than the direct program participants.  Desired types of environmental education program impacts can include environmental impacts (e.g., improved water quality), educational impacts (e.g., improved achievement in school), and health impacts (e.g., lower asthma rates).

Implementation evaluation

Implementation evaluations focus on telling “decision makers what is going on in a program, how the program has developed, and how and why programs deviate from initial plans and expectations.”  Rather than focusing on outcomes, implementation evaluation pays attention to inputs, activities, processes and structures within programs. 

Independent variable

The independent variable is the presumed cause in a study that can predict or explain the values of another variable in the study.  Thus, the independent variable is the hypothesized cause of an outcome (dependent variable).  For example, if you are examining the effect of a professional development workshop on teachers’ intentions to teach about the environment, participation in the workshop would be the independent variable, and teachers’ intentions would be the dependent variable.

Vogt, W. P. (2005). Dictionary of Statistics and Methodology. London: Sage Publications.

Indicator

An indicator is an observable measure of achievement, performance, or change.  It provides evidence of a condition or result.   Examples of indicators include test scores, number of participants, program completion rates, types of environmentally responsible behaviors observed, or participants’ confidence in their environmental problem solving skills.

Inferential statistics

“With inferential statistics, you are trying to reach conclusions that extend beyond the immediate data alone. For instance, we use inferential statistics to try to infer from the sample data what the population might think. Or, we use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one rather than one that might have happened by chance in this study.”

Informal Education

Formal, nonformal and informal education are not completely distinct.  However, notwithstanding some aspects of overlap, informal education often refers to voluntary learning experiences where the learner chooses what, when, and how to learn.  In informal learning settings, an educator may make opportunities and information available to the learner to use however he or she chooses.  Examples of informal learning experiences could include watching a nature program on television or reading interpretive signs in a park.

Informed consent

In research practice, informed consent is a legal condition entered into by a person who agrees or “gives consent” to participate in a study based on learning about and understanding the facts and implications of participation.  Before being asked to agree to participate, participants should be provided with information concerning the purpose of the program evaluation, what they will be asked to do as participants, and how the information they provide will be used.

Institutional Review Board

Institutional review boards, or IRBs, (also known as independent ethics committees, or human ethics committees) are committees mandated by federal regulation that review and monitor plans for research and evaluations involving human subjects.

Instrument

In program evaluation, an instrument is a tool that is used to gather, organize, and analyze information needed to answer the evaluation questions.  Instruments can include observation forms, interview protocols, questionnaires, and performance tests.

Online Evaluation Resource Library.  (Retrieved August 26, 2007).  Glossary of Plan Components. http://oerl.sri.com/plans/plansgloss.html

Intermediate Testing

Intermediate testing is sometimes included in program evaluation designs.  For example, in addition to conducting a pretest and posttest, an evaluator may decide to include an intermediate test by collecting some data (e.g., a survey or test of knowledge) while the participants are experiencing the program.  Including an intermediate test is particularly useful for longer programs, and allows the evaluator to track participants’ progress as they experience the program over time.  Intermediate testing can be added to a variety of program evaluation designs.  In some cases, evaluators may de

Key incident analysis

In key incident analysis, multiple examples of major events are recorded and reflected upon to draw conclusions.  Key incidents can be gathered in different ways, but one common approach is to ask a participant to tell a story about an experience they have had that is relevant to the phenomenon of interest.  This technique is helpful for learning from non-routine events that may challenge common understandings and beliefs.

Knowledge

Environmental educators are typically interested in four domains of environmental knowledge including:

Likert scale

The Likert Scale (pronounced “lick-ert”) is a type of rating scale often used in surveys that asks respondents to indicate their level of agreement with a statement.  A typical Likert scale item consists of a statement and five levels of agreement that the respondent can choose from.

Example:
Despite our special abilities humans are subject to the laws of nature.
__ Strongly agree
__ Agree
__ Neither agree nor disagree
__ Disagree
__ Strongly disagree
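
Responses to an item like this are often converted to numbers for analysis.  The sketch below assumes a common convention of scoring from 5 (strongly agree) to 1 (strongly disagree).  The responses are hypothetical, and averaging the codes treats the scale as if the steps between levels were equal, which is a simplifying assumption.

```python
from collections import Counter
from statistics import mean

# Assumed scoring convention: higher numbers mean stronger agreement
LIKERT_CODES = {
    "Strongly agree": 5,
    "Agree": 4,
    "Neither agree nor disagree": 3,
    "Disagree": 2,
    "Strongly disagree": 1,
}

# Hypothetical responses from five participants
responses = ["Agree", "Strongly agree", "Neither agree nor disagree", "Agree", "Disagree"]

scores = [LIKERT_CODES[r] for r in responses]
distribution = Counter(responses)
average_score = mean(scores)

print(scores)
print(distribution)
print(average_score)
```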

Logic Model

A logic model is a graphic representation of the linkages between program goals, resources, activities, and expected outcomes.  Logic models illustrate the ways in which program inputs and activities are thought to lead to outputs and outcomes in both the short and long term.  Logic models often include diagrams or pictures that illustrate these relationships.  In program evaluation, logic models provide a basis for developing evaluation strategies.  For more information, see Step 2.

Maximum variation sample

A maximum variation sample is a purposefully selected sample of persons or settings that represent a wide range of experience related to the phenomenon of interest.  With a maximum variation sample, the goal is not to build a random and generalizable sample, but rather to try to represent a range of experiences related to what one is studying.  Maximum variation sampling is an emergent or sequential approach – what one learns from initial participants can inform the subsequent direction of the study.

Mean

The mean is a descriptive statistic that measures the central location of the data.  It is calculated as an average – by finding the sum of all individual data points and dividing by the number of points.  The mean is sensitive to outliers or extreme cases.  In other words, having a few extreme measurements in a data set can lead to a mean score being very different from a median score.  For this reason, it can be helpful to report an additional descriptive statistic such as a standard deviation with the mean.  In environmental education the mean might be calculated for test scores, such a
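
The sensitivity of the mean to extreme values can be seen in a small example.  The test scores are hypothetical.  Adding one very low score pulls the mean down sharply while the median barely moves.

```python
from statistics import mean, median, stdev

# Hypothetical test scores
scores = [70, 72, 74, 76, 78]
scores_with_outlier = scores + [10]  # one extreme case added

print(mean(scores), median(scores))
print(mean(scores_with_outlier), median(scores_with_outlier))

# The standard deviation also grows, signalling that the data are more spread out
spread_before = stdev(scores)
spread_after = stdev(scores_with_outlier)
print(spread_before, spread_after)
```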

Mixed Methods Research

Mixed methods research is an approach for conducting social science research that combines different types of research.  Generally, quantitative and qualitative approaches are combined in a mixed methods study.  The goal of using a mixed methods approach is to achieve a more robust view of what is being studied.  An example of a mixed methods program evaluation could include the use of surveys, interviews, and observations for data collection.

Multiple treatment groups

When evaluators are interested in the effects of different components of a complex program, or in the effects that a program has on different groups of individuals, they can include multiple treatment groups.  For example, if a nature center provides three different education programs to the public, an evaluator can consider the participants in the different programs to represent different treatment groups.  Or, if the nature center is interested in the effectiveness of one program when administered to teenagers versus senior citizens, an evaluator can compare results for these two treatme

Needs assessment

A needs assessment is an analysis approach typically conducted in the initial stages of planning to determine the need for a project or program by considering aspects like resources available, extent of the problem and need to address it, and audience interest and knowledge. A needs assessment can also be conducted for writing a grant proposal in order to provide evidence to the potential funder of the need for the program.

Nonformal Education

Formal, nonformal and informal education are not completely distinct entities.  However, notwithstanding some aspects of overlap, nonformal education often refers to voluntary, structured learning activities that take place outside of a formal learning setting.  Workshops, seminars, service groups, zoos, tours, and nature centers are typical settings for nonformal learning in environmental education. 

Heimlich, J. (1993).  Nonformal Environmental Education: Toward a Working Definition. ERIC Bulletin. May 1993.

Objective

An objective is a specific and measurable result that can be reached to accomplish a particular goal. Several examples of objectives for an education program about waste reduction could include:

  • Students will be able to identify three ways in which an item can be reduced, reused, and recycled.
  • Students will report having reused paper in the last week.

The related goal that these objectives could contribute to accomplishing could be to increase student reducing, reusing, and recycling behaviors.

Outcome Evaluation

Outcome evaluation is used to examine a program’s direct effects on specifically defined target outcomes, and to provide direction for program improvement.  For example, outcome evaluation may show that an environmental education program was successful in achieving its target outcome of 90% of participants being able to explain the greenhouse effect. 

Outcomes

Outcomes are the likely or achieved short-term and medium-term effects of a program or intervention. In other words, outcomes are what happen as a result of the program or activities.  Environmental education outcomes that are commonly measured include changes in knowledge, skills and attitudes.

Outputs

Outputs are the products and services that are produced by a program.  Output measures can be used to indicate the degree to which products and services were produced as planned.  Example outputs of an environmental education program could include a teachers’ manual, a workshop series, an Earth Day event, or an environmental writing competition.

Participatory evaluation

Participatory evaluation is a bottom-up approach to evaluation that is controlled either partially or fully by interested program participants, staff, board members, and community members.  These participants ask the questions, plan the evaluation design, gather and analyze data, and determine actions to take based on the results.

For more information, see Participatory Evaluation.

Pilot test

A pilot test is an initial trial of a program, instrument, or other activity intended to test out procedures and discover and correct potential problems before proceeding to full scale implementation.  Pilot tests can be conducted either for a program (i.e., testing out the program with a small group of participants) or for an evaluation (i.e., testing out instruments and data collection procedures with a small group of people similar to program participants).  When possible, a pilot test, or trial run, is conducted with a sample group that is representative of the target population.

Posttest

In program evaluation, a posttest is a test or measurement administered after services or activities have ended. Posttest results are often compared with pretest results to examine the effects of the program being evaluated.

Power analysis

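In statistics, power is defined as the probability of rejecting the null hypothesis (i.e., there is no effect) when the alternative hypothesis (i.e., there is a real effect) is true.  Power is equal to 1 − β, where β (beta) is the probability of failing to detect a real effect (a Type II error).  Researchers want power to be as large as possible because greater power provides greater ability to detect an effect.  A power analysis uses this relationship to estimate, for example, how large a sample is needed to detect an effect of a given size.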
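
As a rough sketch of how power depends on effect size and sample size, the function below uses a normal approximation for a two-sided comparison of two equally sized groups.  The function name, the default significance level of 0.05, and the example values are assumptions for illustration.  Statistical packages typically use more exact methods.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with equal group sizes.

    effect_size is the difference between group means in standard deviation units.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability that the test statistic falls beyond either critical value
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

power_small_sample = z_test_power(effect_size=0.5, n_per_group=20)
power_large_sample = z_test_power(effect_size=0.5, n_per_group=64)

print(f"Power with 20 per group: {power_small_sample:.2f}")
print(f"Power with 64 per group: {power_large_sample:.2f}")
```

With the effect size held constant, the larger sample gives the test a better chance of detecting the effect.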

Pretest

In program evaluation, a pretest is a test or measurement administered before the program or activities begin. The results of a pretest can later be compared with the results of a posttest to show evidence of the effects of the program being evaluated.

Program

In the context of evaluation, a program is the activity (or set of activities) that is being evaluated.  In an environmental education evaluation, the program of interest may be one or more of:

  • A classroom curriculum
  • A field trip
  • A workshop for teachers
  • Or another type of activity

 

Purposeful sample

Purposeful sampling is a non-random method of sampling where the researcher selects “information-rich cases for study in depth.”

Qualitative data

Qualitative data consists of any information that is collected that is not numerical.  Types of sources of qualitative data include interviews, observations, field notes, written documents, photographs and videotapes. 

Research Methods Knowledge Base. (Retrieved August 26, 2007).  Qualitative Data. http://www.socialresearchmethods.net/kb/qualdata.php

Quantitative data

Quantitative data is data that is measured, analyzed, summarized, or presented in numerical form.  For example, quantitative evaluation methods can be used to measure instances, participants, size, degree, extent or frequencies.  In turn, these data can be used to provide a quantitative picture of what is happening through reporting statistics either verbally or graphically (i.e., using tables, charts, histograms or graphs).

Quasi-experimental design

In evaluation research, quasi-experimental design studies are different from experimental design studies in that participants are not randomly assigned to treatment and control groups.   While a quasi-experimental design typically uses comparison groups, random assignment is not used in this design because it is often impractical or impossible to do so.  To minimize threats to validity in quasi-experimental design studies, researchers can try to account for pre-existing factors (e.g., pretest knowledge or attitudes) that could affect the outcome differences between treatment and comparison

R squared

In statistics, R² tells you what proportion of the variability in the dependent variable (the outcome) is explained by an independent variable (often, the treatment) in your statistical model.  In other words, R² helps you determine how well your statistical model fits.  If you have a low R² value, this suggests that your treatment is not having a big effect on the outcome you are interested in.  If you have a high R² value, the administration of your treatment or program has done a good job explaining the outcomes that you have found.
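
A minimal sketch of the calculation for a model with one independent variable follows.  The function names and the data (hours of program participation and test scores) are hypothetical.  R squared is computed as one minus the share of variability in the outcome that the fitted line leaves unexplained.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept for one independent variable."""
    x_bar, y_bar = mean(x), mean(y)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept

def r_squared(x, y):
    """Proportion of the variability in y explained by the fitted line."""
    slope, intercept = fit_line(x, y)
    y_bar = mean(y)
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_residual / ss_total

# Hypothetical data: hours of program participation and test scores
hours = [1, 2, 3, 4, 5]
test_scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, test_scores)
r2 = r_squared(hours, test_scores)
print(f"score = {intercept:.1f} + {slope:.1f} * hours, R squared = {r2:.3f}")
```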

Random assignment

Random assignment is an experimental design technique used to assign participants to different treatment and control groups such that each individual in each group is assigned entirely by chance.  The intention underlying random assignment is that the characteristics for the different groups will be roughly equivalent and therefore any effect observed between groups can be linked to a treatment effect instead of to a characteristic of individuals in a group.  Researchers often use computer programs to randomly assign participants to different groups.  Other techniques as simple as pulling
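
A computer-based random assignment can be as simple as shuffling the list of participants and splitting it in half.  The function name, the participant labels, and the seed below are hypothetical.  The seed is used only so that the example can be repeated.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participant labels
participants = [f"Student {i}" for i in range(1, 21)]
groups = randomly_assign(participants, seed=42)

print(groups["treatment"])
print(groups["control"])
```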

Regression analysis

Regression analysis is a statistical tool used to investigate relationships between one dependent variable and one or more independent variables. In evaluation, regression analysis can be used to explain how the variation in the outcome depends on the program treatment – or in other words, to explain the effect of the treatment on the outcome.  Regression analysis can also be used to predict future effects.
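A minimal sketch of the simplest case, one independent variable fitted by ordinary least squares, might look like the following.  The variable names and data (hours of program participation and a knowledge score) are invented for illustration, and real evaluations would normally use a statistics package rather than hand-written code.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours spent in a program and a knowledge test score.
hours = [0, 1, 2, 3, 4]
scores = [50, 55, 61, 64, 70]

slope, intercept = fit_line(hours, scores)
print(round(slope, 2), round(intercept, 2))

# Using the fitted line to predict the score after 5 hours.
print(round(intercept + slope * 5, 2))
```

The slope estimates how much the outcome changes for each additional unit of the independent variable, and the fitted line can be extended to predict outcomes for values that were not observed.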

Related terms: R squared

Reliability

Reliability is the consistency or repeatability of a measurement.  Thus, a test measurement would be considered reliable if the test was given twice and a person’s score on the test was the same or similar both times. For example, if Emma took a test about knowledge of climate change twice and her scores were very close both times, this would provide evidence for the reliability of the measures used to test her knowledge.
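One common way to put a number on this kind of consistency is to correlate the scores from the two administrations of the test.  The sketch below assumes that approach (a Pearson correlation between first and second attempts); the function name and the scores are hypothetical.

```python
import math


def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical scores for five people who took the same test twice.
first_attempt = [70, 82, 65, 90, 77]
second_attempt = [72, 80, 66, 91, 75]

print(round(pearson_r(first_attempt, second_attempt), 2))
```

A correlation close to 1 indicates that people who scored high the first time also scored high the second time, which is evidence of reliability.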

Retention test

A retention test is a test that is administered not immediately, but some time period after a program or activity has ended, to determine how well an outcome was retained.   For example, a year after a composting workshop, participants could be given a retention test to determine how much they are still composting compared with how much they were composting immediately after the workshop ended.

Retrospective pretest

A retrospective pretest is given after the program ends, rather than before the program.  Typically in conjunction with the posttest, the participant is asked to think back and answer questions about their level of understanding, skill, attitudes or behavior before the program.  For example, after completing a program on energy conservation, students could be asked how frequently they turned off lights that were not in use before participating in the program.

Sample

A sample is a group of subjects or cases selected from a larger group in the hope that this smaller group (the sample) will reveal important things about the larger group (the population).

Vogt, W. P. (2005). Dictionary of Statistics and Methodology. London: Sage Publications.

Self-report

Self-report is a method used to evaluate participant characteristics by asking the participants to provide ratings about themselves.  For example, an evaluator might ask a participant to rate their level of science knowledge, their attitude toward a politician, their skill in addressing local environmental problems or their recent volunteer participation.

Skills

Two important domains of skills relevant to environmental education include cognitive skills and affective skills.

“In environmental education, two sets of cognitive skills are of particular relevance: (1) skills for investigating environmental problems and issues, including identification, analysis, and evaluation; and (2) skills for dealing with action strategies, including their appropriate selection and the planning, implementation, and evaluation of discrete actions.”

Snowball sample

A snowball sample is a purposive sample selected by relying on previously identified group members to identify other members of the same population.  As new names are identified, the sample gets bigger much like a rolling snowball.  Any names mentioned repeatedly are likely to be particularly valuable.  Snowball samples are useful when a desired characteristic of a population is rare, or when a list of population members is unavailable to the researcher.

Henry, G. (1990).  Practical Sampling. Newbury Park, CA: Sage Publications.

Stakeholder

In the context of program evaluation, a stakeholder is an individual who has an interest in, affects, or may be affected by a program, evaluation, or evaluation outcome.  For example, stakeholders can include program funders, steering committee members, program staff, program participants, and others.

Standard deviation

A standard deviation is a statistical measure of the spread or variability in a distribution of scores.  “It is a measure of the average amount the scores in a distribution deviate from the mean. The more widely the scores are spread out, the larger the standard deviation.”

Vogt, W. P. (2005). Dictionary of Statistics and Methodology. London: Sage Publications.
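To see how spread affects the standard deviation, the short sketch below compares two made-up sets of scores with the same mean.  It uses the sample standard deviation from Python's standard library; the score lists are hypothetical.

```python
import statistics

# Two hypothetical sets of test scores, both with a mean of 80.
narrow = [78, 79, 80, 81, 82]   # scores clustered near the mean
wide = [60, 70, 80, 90, 100]    # scores spread far from the mean

narrow_sd = statistics.stdev(narrow)
wide_sd = statistics.stdev(wide)

print(round(narrow_sd, 2))
print(round(wide_sd, 2))
```

The widely spread scores produce a standard deviation ten times larger than the tightly clustered ones, even though both groups have the same mean.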

Statistical significance

A result is called "statistically significant" if it is unlikely to have occurred by chance.  This term is often used to describe differences, for example whether or not the difference in scores for two groups is statistically significant.  Note, though, that even if a difference between groups is statistically significant, that only means the difference is unlikely to be due to chance alone, and does not necessarily mean that the difference is large or important.
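One way to make "unlikely to have occurred by chance" concrete is a permutation test, sketched below.  This is only one of several possible approaches and is chosen here for illustration: it reshuffles the group labels in every possible way and asks how often a difference at least as large as the observed one appears.  The function name and the scores are hypothetical.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Share of all regroupings whose mean difference is at least as large
    (in absolute value) as the difference actually observed."""
    pooled = group_a + group_b
    n_a = len(group_a)
    n_b = len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    extreme = 0
    count = 0
    # Try every way of picking which pooled scores are relabeled as group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for rounding error
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical attitude scores for a treatment and a control group.
treatment = [8, 9, 7, 9, 8]
control = [5, 6, 5, 4, 6]

p = permutation_p_value(treatment, control)
print(round(p, 4))
```

A small value (conventionally below 0.05) suggests the difference would rarely arise from chance regrouping alone.  It says nothing about whether the difference is large enough to matter in practice.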

Summative evaluation

Summative evaluation is evaluation designed to assess the effectiveness of a program or activity based on the original objectives or for a variety of other purposes. Stakeholders including supervisors and grant makers are particularly interested in summative evaluation. Issues such as short-term and long-term program outcomes, the cost of program development, and ongoing costs in relation to the efficiency and effectiveness of a program are often examined in summative evaluations.

T-Test

There are several kinds of t-tests, but the most common is the two-sample t-test, also known as the independent samples t-test.  The two-sample t-test tests whether or not two independent populations have the same mean values for a measure.  For example, a researcher could use a two-sample t-test to test whether a treatment and a control group have the same mean posttest score for attitude toward raising gasoline taxes.
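The sketch below shows how the test statistic for this kind of comparison can be computed by hand.  It assumes the classic pooled-variance form of the test, which treats both groups as having equal variances; the function name and the scores are hypothetical.  Turning the statistic into a p-value requires a t distribution table or a statistics package, which is left out here.

```python
import math
import statistics


def two_sample_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    var_a = statistics.variance(group_a)  # sample variance
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2
    # Combine the two variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df


# Hypothetical posttest attitude scores on a 1-5 scale.
treatment = [4, 5, 3, 4, 4]
control = [3, 2, 3, 4, 3]

t_stat, df = two_sample_t(treatment, control)
print(round(t_stat, 3), df)
```

The further the t statistic is from zero, the less plausible it is that the two groups share the same population mean.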

Treatment group

The treatment group consists of the individuals who are participating in the program or intervention being studied.  The treatment group can be compared with the control group (those who are not participating in the program or receiving the service) to determine what changes, if any, the program caused.

Related terms: Control group

Typical case sample

Typical case sampling is a type of purposeful sampling in which “subjects are selected who are likely to behave as most of their counterparts would.”  For example, an evaluator might select students with a socio-demographic profile like that of the larger population of interest.

Bamberger, M., Rugh, J., & Mabry, L. (2006).  Real World Evaluation: Working Under Budget, Time, Data, and Political Constraints. Thousand Oaks, CA: Sage Publications.

Unadjusted score

An unadjusted score is an original score (for example a score on a test) that has not been transformed.  An example of an unadjusted score would be the number of questions that a student answered correctly on a test.  An unadjusted score can be contrasted with a standard score, which indicates how many standard deviations an observation is above or below the mean. 
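The contrast can be illustrated with a short sketch that converts unadjusted scores into standard scores.  It assumes the scores in hand are treated as the whole group of interest, so it uses the population standard deviation; the function name and the scores are hypothetical.

```python
import statistics


def standard_scores(raw_scores):
    """Convert unadjusted scores to standard scores (z-scores)."""
    mean = statistics.mean(raw_scores)
    sd = statistics.pstdev(raw_scores)  # population standard deviation
    return [(score - mean) / sd for score in raw_scores]


# Hypothetical unadjusted scores: number of questions answered correctly.
raw = [12, 15, 18, 21, 24]

z = standard_scores(raw)
print([round(value, 2) for value in z])
```

A student with 18 correct answers sits exactly at the mean and receives a standard score of 0, while scores above and below the mean become positive and negative standard scores.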

Unit of Analysis

“The unit of analysis is the major entity that you are analyzing in your study. For instance, any of the following could be a unit of analysis in a study:

  • individuals
  • groups
  • artifacts (books, photos, newspapers)
  • geographical units (town, census tract, state)
  • social interactions (dyadic relations, divorces, arrests)”

In environmental education program evaluation, your unit of analysis might, for example, be students, curriculum materials, or visits to your nature center or project website.

Validity

Validity is used to “describe a measurement instrument or test that accurately measures what it is supposed to measure; the extent to which a measure is free of systematic error.  Validity also refers to designs that help researchers gather data appropriate for answering their questions.  Validity requires reliability, but the reverse is not true.”