The joy of the Lord is my strength.
Understanding Standardized Tests Transcript

Hello. My name is Susan C. Anthony and this is a transcript of my workshop "Understanding Standardized Tests". The handouts for this workshop are available in portable document format on my web site: www.SusanCAnthony.com. The purpose of this workshop is to make you more aware of the value and the limitations of standardized tests. Love them or hate them, standardized tests are an important part of life in America.

My background

Before we begin, I should tell you a little about me. I grew up in the Colorado Rocky Mountains 35 miles west of Denver. I went to elementary school in a tiny, two-room mountain school, graduated from Nederland High School in a class of 25, and went on to the University of Northern Colorado where I earned a degree in Elementary Education.

I was one of those kids who loved taking tests and did well on them. If offered a chance to take a test, I would take it. Once the home economics teacher came around and asked if any girls wanted to take the Betty Crocker Homemaker of Tomorrow test. Class wasn't too exciting that day so I did, and I won an award! It was embarrassing because cooking is anything but a strength for me. My husband teases me about that to this day.

Tests paid off well for me. On the basis of standardized test scores, I was awarded a scholarship that enabled me to attend college, something my family could not have afforded otherwise. After college, I moved to Alaska and taught intermediate grades for 10 years with the Anchorage School District. Now my mission is to support home schooling families in any way I can so they have the greatest chance of success in their vitally important endeavor.

History of Testing

The whole idea of standardized testing grew out of Darwin's "Origin of Species," which was published in 1859. Darwin believed the human race was evolving toward higher intelligence, actually toward perfection.
He also believed that acquired intelligence was passed on to the next generation; in other words, if you study hard and learn a language or skill, it can be passed on to your children. Darwin's ideas were at the roots of the Holocaust. The Nazi policy of "eugenics" was begun to encourage the "fittest" to procreate and to eliminate those not deemed "fit."

Most people in the late 1800s and early 1900s believed that they were living in the "Age of Progress". Science and technology could and would solve all of mankind's problems, possibly during their lifetimes. New ideas were good, old traditional ideas were not.

By 1908, people had accepted the idea of "mental level." It has always been obvious that some people find it easier to learn than others. In an effort to increase the progress of mankind, sterilization laws were enacted in 16 states with the aim of eliminating mental retardation.

In World War I, the U.S. Army administered "mental tests" with the aim of selecting officer candidates from the large pool of recruits. In the 1920s, the U.S. limited immigration to people from Northern Europe because they did better on IQ tests. The interesting thing is that the average scores of all immigrant groups in the U.S. go up within a few generations. In 1923, Jews had the lowest group IQ scores. Now they have the highest.

Colleges began to use intelligence tests as a basis for selecting scholarship recipients. The first Scholastic Aptitude Test was administered in 1926, but until a scoring machine was invented, it wasn't practical to administer tests to huge groups of people.

The first scoring machine was invented in 1931 by a high school teacher named Reynold Johnson. As a Minnesota farm boy, he'd entertained himself by scratching pencil marks on the outsides of spark plugs in the Model T Fords of his older sisters' dates and then hiding in the bushes to watch and giggle. Graphite conducts electricity, so the plugs would misfire and the cars wouldn't start.
He applied this principle to the scoring machine. He was eventually hired by IBM and invented lots of other things.

The popularity of tests in America is rooted in a couple of American ideas. Remember that in Europe, rulers inherited their status and position. Our founding fathers did not want that to be the case in this new country. Thomas Jefferson promoted the idea of a "natural aristocracy" based on virtue and talent rather than an "artificial aristocracy" based on wealth and birth. Americans have always valued the ideals of universal individual opportunity and social mobility. Tests were one way to try to find the young people most likely to succeed.

One early testing advocate wanted to base students' progress and rewards on what they knew rather than how many hours they spent in class. Many people still like that idea, but no one has figured out a really fair way to do that.

In 1943, the Navy administered a test to exempt Navy candidates from the draft in order to allow the most promising to go to college so they could later perform higher military tasks.

For over a century, it was thought that heredity played a much bigger part in determining intelligence than did environment. That assumption changed in the 1960s and '70s. Poor scores on tests began to be blamed on environment, and there was a new assumption that everyone could be above average given proper schooling.

In the 1970s, the U.S. Supreme Court outlawed the use of intelligence tests in hiring because they are "discriminatory by nature." That's true. They attempt to discriminate between those with higher and lower potential for learning. Some major school districts, including Chicago, Los Angeles and New York, banned the use of IQ tests because minority students tended to score low as a group. But remember, all minorities and immigrants score low for a few generations. Historically, scores have improved over time and some groups that once scored low now score very high.
As a result of these changed assumptions, achievement tests became more popular than intelligence tests, and that is the case today. Achievement tests try to measure what someone has already learned, while intelligence tests try to measure the ability to learn.

Most scientists now believe that intelligence is between 50% and 70% inherited. The remaining 30% to 50% depends on the environment. These figures are based on studies of identical twins separated at birth into different environments.

Limitations of Tests

One of the main problems Americans have with tests is that we put too much emphasis on them. They are valuable. They are not useless. But they are not the most important thing in the world! Someone once wrote, "Not everything precious can be measured, and not everything measurable is worth teaching."

Here are some things standardized tests can't measure:
As important as I thought tests were when I was in school, no one has ever asked for my scores in real life. To me that is disappointing, because I had good scores. But it's something to keep in mind when your kids don't do as well as you'd like.

Tests are not bad in and of themselves. In life there are many kinds of tests. Even God tests us, sometimes without warning. Don't you wish that you'd get a little warning, maybe an angel saying to you, "This is a test. This is only a test. This test will last for five days"? Then, when it's over, you'd get some feedback or a grade, maybe, "Well done, my child." The Bible says again and again that God tests us for our benefit.

Steps in Teaching

Testing or evaluation is one of the steps of teaching:
Measurement v. Evaluation

Measurement is the process of taking a test. If you get on the bathroom scale and it says 135 pounds, you have measured your weight.

Evaluation is judging what a test score means and deciding what action to take next. The measurement must be placed in context in order to decide what it means. For example, 135 pounds means something entirely different if you are 3 feet tall than if you are 6 feet tall. The meaning of the measurement would also be different if you were wearing heavy clothing. With tests, you have to decide what the score means. Maybe the child had a bad day. Maybe he hasn't been taught the material yet. Tests are just one piece of the puzzle. Important decisions should never be made based on a single test score!

Informal v. Formal Evaluation

Informal evaluation is the teacher noticing what the child can and cannot do. It is a continual part of any good teaching. There is no need for a grade, but it gives the teacher feedback as to what might need to be clarified or explained in another way.

Formal testing is structured and is often used as a basis for grades or determining mastery. One thing I learned later in my career is that it's good to let kids know your objectives in advance, then teach. I used to ask them if they thought they were ready for the test.

Reliability v. Validity

Reliability means consistency. If the bathroom scale says 122 pounds when you get up, 115 pounds after breakfast, and 150 pounds later the same day, it is not reliable. A physician's scale is more reliable than a bathroom scale, but for our purposes, the less expensive scale is usually good enough.

A test is not reliable if the questions are ambiguous or too easy to guess. It is not reliable if the student is ill, is not listening, is in emotional turmoil, or has a bad attitude and doesn't do his best. It isn't reliable if a child is paralyzed with fear. Minor test anxiety is normal and actually tends to help in most cases.
A test is not reliable if the room is noisy or chaotic, or the temperature isn't comfortable. It's better that the temperature be cool than warm.

Keep in mind that reliability is higher for longer tests. Total scores are more reliable than sub-scores for each subject area, and group scores are more reliable than individual scores. Errors tend to cancel each other out, so the larger the sample, the more you can depend on it being close to the truth.

Validity means the test actually measures what it sets out to measure. A test is only valid for what it sets out to measure. Your bathroom scale, for example, is not useful for determining your height. A reliable test is not necessarily valid, but an unreliable test cannot be valid.

Standardized tests are not valid if they are not given under standard conditions. The directions must be read word for word, as a flight attendant would read them. If a child is on the wrong page, puts more than one mark in a row, or loses his place, the test is not valid.

Standardized tests are only valid if they relate to what has been taught. They assume that all kids at a certain age have been taught the same things.

Objective Tests v. Subjective Tests

Objective test questions have only one right answer. Multiple-choice, true-false, matching, and fill-in-the-blank tests are objective. In sports, racing is evaluated objectively.

Subjective tests may be scored differently by different people. Essay tests are subjective. In sports, ice skating and gymnastics are subjectively judged.

Objective tests are always more reliable, but not everything can be measured objectively.

Aptitude Tests v. Achievement Tests

Aptitude tests (intelligence tests) are predictive; they set out to predict future performance. At this time, only about 8% of standardized tests are aptitude tests, for reasons mentioned when we talked about history.

Here are a few facts you might want to keep in mind about IQ. It is not accurate before age 10, but tends to be stable after that. The average difference between siblings is 11-12 points.
When kids are adopted from bad to good environments, the average gain is 6 points. This again shows that heredity is a stronger influence than environment.

Achievement tests are reflective. They are designed to measure the skills and knowledge a person has attained. Achievement gaps tend to widen with time. ITBS, CAT, and Stanford tests are designed to measure achievement. About 88% of standardized tests are achievement tests. The last few percent are readiness, personality, and vocational tests.

Teacher-Made Tests v. Standardized Tests

Teacher-made tests are specific to the students and the objectives taught. They yield information on an individual's progress as well as the effectiveness of the teacher's instruction. These are far more useful to teachers than standardized tests because they are more valid. They are focused, and often there are many items to test each objective so that a single mistake doesn't unduly influence the score.

Standardized tests are given to large groups of people under standard conditions so that no one has any advantage. They are great for yielding information about large groups of people. They are less valuable for yielding information about individuals because there are so few items to test each objective. One mistake will greatly influence an individual score.

How Standardized Tests Are Made
Criterion-Referenced v. Norm-Referenced Tests

Criterion-referenced tests are mastery tests. Scores are often reported as percentages. Everyone can get a high score if they know the material. Achievement is measured against a set of objectives, rather than against the scores of others.

Norm-referenced tests yield a normal curve, otherwise known as a bell curve. Scores are reported as percentiles. By definition, not everyone can score high. These tests are designed to discriminate. They will always contain some very difficult questions in order to discriminate between the 98th and 99th percentile, so do not be alarmed if your child has not been exposed to everything on the test.

In a norm-referenced test, it is possible to get a high score with little knowledge if no one else knows the material either, because your score is compared with the scores of others. How well you do depends on which set of others you're using for comparison. SAT scores would be quite different depending on whether the norm group were all high school seniors or only Ivy League applicants.

It's good that kids have some pressure to study and achieve, some accountability for what they've learned. If they do well, it's great. The test results are encouraging and validating. If they don't do well, try to figure out the cause, and take the results with a big grain of salt if your impression of your kids' achievement is vastly different from what the scores indicate.

As long as kids are doing their personal best, we all need to learn to be content with our achievement and avoid the pitfalls of pride or despair. Galatians 6:4 says, "Each one should test his own actions. Then he can take pride in himself, without comparing himself to somebody else."

Did you know that all 50 states claim to have above average test scores? That's called the "Lake Wobegon Effect."

Once after this workshop a mom came up to talk with me and broke into tears. Her daughter had taken a standardized test and done very poorly.
She felt so inadequate as a teacher, even though the child had not done well in school either. After all, don't you always read that homeschoolers are all above average? She was fighting despair. After we talked awhile, she dried her tears and said, "Isn't it ironic? When I was pregnant, I prayed my daughter would be a healthy baby with ten fingers and ten toes. As soon as she was born, I immediately wanted her to be an exceptional child."

Some norm-referenced tests provide special readouts to give more details on criteria, that is, the specific things your child knew or didn't know on the test.

Bell Curves

Bell curves, or normal curves, are a fact of nature. If you measured and charted the leaves from a tree in your yard, you'd get a bell curve. Some curves are tall and narrow, others short and squat, but they are statistically the same. The bell curve for heights of NBA basketball players would be a lot taller and narrower than the one for heights of all Americans.

It's an interesting fact that there is more variability in the test scores of boys than girls. There are more boys at either end of the bell curve, very high and very low.

Standardized tests may report one or more types of scores.

The Raw Score is the number correct. It is used to calculate other scores and means nothing on its own. The same raw score on two different tests can yield very different percentile scores.

The Stanine is a one-digit score, 1-9, with 9 being the highest.

For a Normal Curve Equivalent, the bell curve is divided into equal units numbered 1 to 99. This score can be averaged.

For the National Percentile score, the bell curve is divided into unequal units, each containing approximately the same number of people. A percentile score of 66 means that you scored higher than 66% of the people in the norm group. Because these are unequal units, percentile scores cannot be meaningfully averaged, and differences between them cannot be compared directly. I didn't realize this when I was teaching and I sometimes did average them.
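The difference between the equal-unit and unequal-unit scores described above can be sketched in a few lines of Python. This is only an illustration, not a description of how any particular test publisher computes its scores. It assumes scores follow a normal curve, uses the conventional scale factor of 21.06 that places Normal Curve Equivalents of 1, 50 and 99 at percentiles 1, 50 and 99, and uses a commonly published table of stanine cutoffs. All of the function names are made up for this example.

```python
from bisect import bisect_right
from statistics import NormalDist, mean

# Standard bell curve: mean 0, standard deviation 1.
STANDARD_NORMAL = NormalDist()

# Conventional NCE scale factor (assumption: the usual 21.06 value).
NCE_SCALE = 21.06

# First percentile rank belonging to stanines 2 through 9
# (assumption: the commonly published stanine table).
STANINE_CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]


def percentile_to_nce(percentile):
    """Convert a percentile rank (1-99) to a Normal Curve Equivalent."""
    z = STANDARD_NORMAL.inv_cdf(percentile / 100)
    return 50 + NCE_SCALE * z


def nce_to_percentile(nce):
    """Convert a Normal Curve Equivalent back to a percentile rank."""
    z = (nce - 50) / NCE_SCALE
    return 100 * STANDARD_NORMAL.cdf(z)


def percentile_to_stanine(percentile):
    """Look up the stanine (1-9) for a percentile rank."""
    return bisect_right(STANINE_CUTOFFS, percentile) + 1


def average_percentile_via_nce(percentiles):
    """Average percentile ranks by converting to equal-unit NCEs first."""
    average_nce = mean(percentile_to_nce(p) for p in percentiles)
    return nce_to_percentile(average_nce)


if __name__ == "__main__":
    scores = [50, 99]
    print("Naive average of percentiles:", mean(scores))
    print("Average taken on the NCE scale:", round(average_percentile_via_nce(scores)))
    print("Stanine for the 66th percentile:", percentile_to_stanine(66))
```

Under these assumptions, simply averaging the 50th and 99th percentiles gives 74.5, while averaging on the equal-unit scale and converting back lands at roughly the 88th percentile. That gap is why averaging percentile scores directly is misleading.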
I learned more about tests preparing for this workshop than I did in four years of college and 12 years of teaching!

Because the percentile units are unequal, small differences near the middle of the scale are insignificant. An increase or decrease of less than 18 percentile points between the 25th and 75th percentile is statistically insignificant, so don't worry if your children score somewhat lower than in a previous year and don't crow if they score somewhat higher. It doesn't mean much. Between the 1st and 12th percentile, and the 88th and 99th, a 6-point increase or decrease is significant.

Grade equivalent scores can be misleading. If a third grader gets a G.E. of 5.4, it means that the average fifth grader in the fourth month of school would score as well as that third grader did on the third grade test. The child is above average, but would not likely score as average on the fifth grade test because he wouldn't have been exposed to the material. Don't use these scores as a basis to skip children ahead in the curriculum. On subtests, the grade equivalent can vary by as much as a full year if just one question is answered differently.

Why Standardized Tests?

Just because tests aren't perfect doesn't mean they're worthless. Again, it's like your bathroom scale. You wouldn't disregard all measurements because they're not perfect, because they don't measure everything, because you sometimes get invalid results or because you just don't like the results. But you have to guard against the hysteria that often surrounds these tests and avoid misusing the results or being intimidated by those who do.

Schools use standardized test results to choose goals for schools or districts, to select kids for special programs, to select scholarship recipients and to evaluate the effectiveness of curriculums. Schools misuse the results when they compare teachers or schools on the basis of these scores.
Homeschoolers can use standardized achievement tests to gauge the relative effectiveness of instruction, to see where kids might need extra help, to pinpoint relative strengths and weaknesses, to choose goals, and to see progress through the years. As much as homeschoolers may resist tests, they are one reason homeschooling is now legal in all 50 states. If there were no way to measure academic achievement independent of schools, it would have been much harder to prove the effectiveness of homeschooling.

Most uses by the press of test scores are misuses. The press tends to exaggerate statistically insignificant differences, and most readers are uneducated as to the meaning of the scores.

The Dangers of Testing

If tests are overemphasized, there will be a negative impact on the quality of instruction. Remember that quote: not everything precious can be measured, and not everything measurable is worth teaching.

Don't weight tests too heavily compared with other measures or judge yourself or others harshly based on scores. Remember that the scores are not absolute. There is what's called the "standard error of measurement," which means that your score is a point within a range. Some score readouts show the range rather than the point.

When people teach to the test, its usefulness is largely destroyed. The pressure to raise test scores affects their accuracy.

Someone once said that God limited our intelligence but not our stupidity. We have to try to limit our own stupidity by keeping things in context. Remember that tests aren't unfair. Life is unfair. Tests measure the results.

It's somewhat dangerous to even pretend to know what intelligence is. Robert Hale, an expert in psychological testing at Pennsylvania State University, was once asked to define intelligence. His response:

Well, that is, I guess, the most difficult question to answer. I don't know if I can give you an answer.
Intelligence tests are useful at predicting how well children will do in school left to their own devices, and all other things being equal, but since neither of those two conditions ever applies, by itself an IQ score tells you very little.

What to Look For

Compare your child's scores with scores in previous years. Are there any patterns? Are the scores consistent? Are there any sharp rises or drops? Can you account for them? Are any of the scores unexpectedly high or low? If so, investigate the cause and use the information to plan future instruction.

A test score only tells you how a child performed on a particular set of test items at a particular time. It may or may not reflect the child's true ability. Alaskan skier Tommy Moe won a gold medal for the U.S. in Lillehammer, Norway in the '90s, unexpectedly. The next week, he flubbed in another major race. The question was, which of the two performances represented his true ability? Some people thought he was a flash in the pan. Time alone could tell. He went on to win most of the major downhill races in the next few years. Now there is no doubt about his true capability.

The Little Prince

A popular little book when I was in college was The Little Prince, by Antoine de Saint-Exupéry. The book was written in 1943. It's a fantasy account of a little man who had traveled to earth from his home on an asteroid that was not much larger than a house. The little boy who meets him in the Sahara Desert has this to say about scores and numbers.

Grownups love figures. When you tell them that you have made a new friend, they never ask you any questions about essential matters. They never say to you, "What does his voice sound like? What games does he love best? Does he collect butterflies?" Instead, they demand, "How old is he? How many brothers has he? How much does he weigh? How much money does his father make?" Only from these figures do they think they have learned anything about him.
If you were to say to the grownups: "I saw a beautiful house made of rosy brick, with geraniums in the windows and doves on the roof," they would not be able to get any idea of that house at all. You would have to say to them: "I saw a house that cost $20,000." Then they would exclaim: "Oh, what a pretty house that is!"

Just so, you might say to them: "The proof that the little prince existed is that he was charming, that he laughed, and that he was looking for a sheep. If anybody wants a sheep, that is proof that he exists." And what good would it do to tell them that? They would shrug their shoulders and treat you like a child. But if you said to them: "The planet that he came from is Asteroid B-612," then they would be convinced, and leave you in peace from their questions.

They are like that. One must not hold it against them. Children should always show great forbearance toward grownup people. But certainly, for us who understand life, figures are a matter of indifference. What is essential is invisible to the eye.

I Samuel 16:7 validates this: "Man looks at the outward appearance, but the Lord looks at the heart."

Remember, not everything precious can be measured, and not everything measurable is worth teaching. God had reasons for creating us with differing abilities, even though we may not understand. Love your kids for who they are, celebrate every achievement, and don't let any test scores keep you or them from achieving your goals.
May he give you the desire of your heart, and make all your plans succeed. Psalm 20:4
www.SusanCAnthony.com
Instructional Resources Co., P.O. Box 111704, Anchorage, AK 99511-1704