
Pendulum Model Fit


Finding the Best Relationship - Goodness of Fit

Regression Modelling

Finding a line of best fit

Consider a set of points A, B, C, D … that have been graphed and look as if they might obey some sort of rule, just like those on the graph below. Suppose we want to draw a diagonal line among these points, not necessarily touching any of them, but drawn in such a way that the errors are minimised.

Imagine we drew a vertical line from each point until it touched our diagonal line. Such lines have been labelled CQ and DR. If we change the height or slope of our diagonal line, all of these small vertical lines will vary in length. Some will get longer and others will get shorter.

Now imagine that we draw a square on each of these lines and find the total area of all the little squares. If we change the height or slope of our diagonal line, as shown in the second graph, the areas will also change. Imagine each little square is made of rubber bands trying to pull the line toward its point. The line of best fit is the line that minimises the total area of all the little squares.

Calculating a line of best fit

If you find the hypotenuse of a right-angled triangle, the Theorem attributed to Pythagoras specifically talks about squares on the sides of the triangle, but we don’t usually draw those squares when we do calculations. In the same way, no one would bother to draw the squares that determine the line of best fit. We just do the calculations.

These calculations are called a Regression Analysis and they usually take much longer than finding the hypotenuse of a right-angled triangle. Few people have ever worked out a Regression Analysis using just pen and paper. Even with an ordinary calculator it is a big job. These days, computers and ClassPads make life much easier.
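For readers who would like to see the arithmetic behind the "little squares", here is a minimal Python sketch. The five points are made up for illustration (they are not the pendulum data), and the function names `total_square_area` and `line_of_best_fit` are hypothetical. It uses the standard closed-form least-squares formulas; it is not a description of how the ClassPad works internally.

```python
def total_square_area(xs, ys, slope, intercept):
    """Sum of the areas of the squares drawn on each vertical gap."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def line_of_best_fit(xs, ys):
    """Slope and intercept that minimise total_square_area (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative points only (an assumption, not the pendulum data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = line_of_best_fit(xs, ys)
best_area = total_square_area(xs, ys, slope, intercept)
print(f"best fit: y = {slope:.3f} x + {intercept:.3f}, total area = {best_area:.4f}")

# Changing the slope or the height of the line makes the total area larger.
for d_slope, d_height in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.2), (0.0, -0.2)]:
    area = total_square_area(xs, ys, slope + d_slope, intercept + d_height)
    print(f"slope {d_slope:+.1f}, height {d_height:+.1f}: total area = {area:.4f}")
```

Running the loop at the end is the numerical version of sliding the diagonal line up, down or around: every nudge away from the best-fit line increases the total area.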
Copyright 2007, Hartley Hyde

Revision

Switch on your ClassPad and tap on the Spreadsheet icon. If your spreadsheet has data from a previous investigation, check whether you need to save it before you clear it from the screen. Then open the file “Pendulum”. You should have data in columns A and B. Column A contains the lengths of the pendulums and column B contains the corresponding Period, the time taken for a pendulum of that length to oscillate once.

Drawing a Graph

Select all of the cells into which you typed values. To do this, tap on cell A2 and drag down and across until you reach the last value that you typed in column B. Tap the down arrow to the right of the graphing icon in the tool bar. The graphing icon is always the one after the A icon. It may show any of the thirteen different icons in the drop-down menu, depending on which type of graph you used last. Tap on the scatter graph icon. Your screen will split and a scatter graph will appear. Length is on the X-axis and Period is on the Y-axis. The data points should follow a smooth curve like this.

Finding a rule

Tap on the graphing screen and you will see the spreadsheet menu bar and tool bar change to the graphing screen menu bar and tool bar. Tap on Series in the menu bar, then tap on Trend in the drop-down menu. This offers a further menu, from which you tap on Linear. This asks the ClassPad to draw the line of best fit through your data points.

If you tap on the line, the equation of the line of best fit appears in the formula bar at the bottom of the screen. Given the accuracy of our data, we should round these values to two significant figures. Thus our line of best fit is y = 0.15 x + 2.9

However, at the end of the previous pendulum lesson we found a curve that seemed to fit this data better than this “line of best fit” does. How do we know which line or curve fits best?
To do this properly, we have to export our data to the Statistics Application.

Exporting My Data

I resize my screen. When I examine my Length data in column A, I have data in all the cells between A2 and A33. Your data set may have more or fewer values than this, so you will need to adapt accordingly. When I tap the column heading “A”, the whole column is highlighted. From the File Menu I choose Export and the dialogue box asks me if I want to export the Range A1:A999. I change this to A2:A33. I give the Variable a name, in this case “l” for Length, and I select the Type of data as “List”. Then I tap OK. I then repeat the process for column B as shown.

Importing the Data

Close your spreadsheet and open the Statistics Application. You will need to press the Keyboard button. Tap on the column header called List1, type “l” and then tap or press EXE. Tap on the column header called List2, type “p” and then tap or press EXE. You should now have two lists of data that match the data in your spreadsheet.

From the SetGraph Menu, uncheck all boxes except the StatGraph1 box as shown below. Then tap on the top rectangle called Setting … and you should see a dialogue box. Enter each item as shown on the second screen and then tap “Set”. This sets up the Length list to be graphed on the X-axis and the Period list to be graphed on the Y-axis.

Tap on the graphing icon at the far left of the Tool Bar and you should see the familiar pattern of squares from your pendulum experiment. You are now ready to investigate how well the various curves fit your pendulum data.

General Instructions for Drawing Regression Graphs

We are going to go through this procedure ten times. After drawing each graph, tap on the spreadsheet to get back to the top screen. Tap on the SetGraph Menu and un-check the checkbox for Previous Regression.
From the Calc Menu select the next type of Regression Curve. In this case tap on Linear Regression. Set the Calculation Dialogue Box as shown on the first screen. For each graph, increment the Copy Formula by one. Thus the formula for this Regression will appear as y1. The next regression formula needs to be set to appear as y2. When you tap OK, you are presented with the Calculation results. The statistics from this dialogue box need to be copied onto your results sheet (next page).

The numbers a, b and r^2 are called statistics. They describe your data. The number a is the coefficient of the highest power of x in the formula. For a Linear Regression, a is the coefficient of x, the gradient of the Regression Line. The number b is the y-intercept of the Regression Line. If a = 0.14 and b = 2.9, it follows that the formula y = a ⋅ x + b becomes y1 = 0.14 x + 2.9. Enter this equation in the table on the next page. Then copy the first four significant figures of r^2 into the same table.

Tap OK and you are shown the graph with the Regression Line or curve. Use a pencil to copy the regression line or curve onto the sample graph.

The Polynomial Regressions

Here is an answer sheet for the first four regression models.

Linear Regression
y = a ⋅ x + b
y1(x) = _____ x ______
r^2 = _________

Quadratic Regression
y = a ⋅ x^2 + b ⋅ x + c
y2(x) = _____ x^2 _____ x ______
r^2 = __________

To start your next graph, choose Quadratic Regression and store the formula as y2. Copy two significant figures of the statistics a, b and c into the spaces in the formula below the graph. Copy r^2 into the table. Repeat this procedure with the models shown here and on the next page.
Cubic Regression
y = a ⋅ x^3 + b ⋅ x^2 + c ⋅ x + d
y3(x) = _____ x^3 _____ x^2 _____ x _____
r^2 = _________

Quartic Regression
y = a ⋅ x^4 + b ⋅ x^3 + c ⋅ x^2 + d ⋅ x + e
y4(x) = __________ x^4 __________ x^3 ________ x^2 _______ x ______
r^2 = __________

Comments

Checkpoint

Answer Sheet for the Rest of the Regression Models

Logarithmic Regression
y = a + b ⋅ ln(x)
y5(x) = ______ + ______ ln(x)
r^2 =

Exponential Regression
y = a ⋅ e^(b ⋅ x)
y6(x) = ______ e^(______ x)
r^2 =

abExponential Regression
y = a ⋅ b^x
y7(x) = ______ ⋅ ______^x
r^2 =

Power Regression
y = a ⋅ x^b
y8(x) = ______ x^(_____)
r^2 =

Sinusoidal Regression
y = a ⋅ sin(b ⋅ x + c) + d
y9(x) = ___ sin(____ x + ____ ) ____

Logistic Regression
y = c / (1 + a ⋅ e^(−b ⋅ x))
y10(x) =

Some Other Factors to Consider

You will probably have noticed that the value of r^2, the Coefficient of Determination, provides a reasonably good indicator of goodness of fit. Two of the polynomials gave a remarkably good showing compared with the Power model. Although the Quartic fitted very well, it was clearly going in the wrong direction. The cubic would provide a reasonable model if you only used values in the experimental domain (i.e. interpolation). However, if you tried to use the cubic for even larger pendulums, outside the experimental domain (i.e. extrapolation), it would be seriously in error.

Polynomials rarely occur as a solution to Physics problems. The only one that springs to mind is Newton’s second equation s(t) = v(0) t + (1/2) a t^2. As we always knew the correct answer is Period = 2√Length, this investigation is fairly artificial compared with patiently probing the unknown like real scientists.

Physicists often take a short cut by examining dimensions, e.g. Length = [L]. Like other aspects of an equation, the dimensions have to balance.
In the formula Period = 2π √(Length / g), g is the acceleration due to gravity and has dimensions [L][T]^-2. Thus the right-hand side of the formula has dimensions √([L] / ([L][T]^-2)) = [T], which is the time dimension expected for Period. Newton’s equation works too: each term has dimension [L].

More importantly, if you study physics you will eventually see the page of physics theory that shows why a pendulum moves with simple harmonic motion and must have the formula shown above. It even explains why the formula is not accurate if the amplitude is too large.

Spreadsheet and statistics modelling will never replace human reason and models built on long-standing theories of the nature of the universe, but they do give us a bit of a leg up when we have no other idea. Above all, we hope you saw some of the interesting functions built into your ClassPad.

And so, finally, you can write in this space the best formula to describe the Period of a Pendulum.

Checkpoint

Checkpoints

This lesson has been divided into only two checkpoints, but each is likely to be busy. The answers given here are based on a data set collected by my class; however, the data collected by your students should not be very different.

Checkpoint 1

The students are first asked to investigate goodness of fit for the four polynomials from linear to quartic. Their outcomes should be similar to these.
Linear Regression
y = a ⋅ x + b
y1(x) = 0.15 x + 2.9
r^2 = 0.9218

Quadratic Regression
y = a ⋅ x^2 + b ⋅ x + c
y2(x) = 0.00074 x^2 + 0.27 x + 2.0
r^2 = 0.9761

Cubic Regression
y = a ⋅ x^3 + b ⋅ x^2 + c ⋅ x + d
y3(x) = 0.000009 x^3 – 0.003 x^2 + 0.42 x + 1.5
r^2 = 0.9920

Quartic Regression
y = a ⋅ x^4 + b ⋅ x^3 + c ⋅ x^2 + d ⋅ x + e
y4(x) = 0.00000022 x^4 + 0.000083 x^3 – 0.01 x^2 + 0.59 x + 1.2
r^2 = 0.9978

Students should be able to see that consecutive polynomials fit the points better and that r^2 is increasing. However, it would be encouraging to see some doubt about the likelihood of the cubic and quartic models.

Checkpoint 2

The students are then asked to investigate goodness of fit for the other six models. Their outcomes should be similar to these.

Logarithmic Regression
y = a + b ⋅ ln(x)
y5(x) = 3.1 + 3.1 ln(x)
r^2 = 0.8064

Exponential Regression
y = a ⋅ e^(b ⋅ x)
y6(x) = 2.3 e^(0.016 x)
r^2 = 0.5700

abExponential Regression
y = a ⋅ b^x
y7(x) = 2.3 ⋅ 1.0^x
r^2 = 0.5700

Power Regression
y = a ⋅ x^b
y8(x) = 2.0 x^0.50 = 2√x
r^2 = 0.9997

Sinusoidal Regression
y = a ⋅ sin(b ⋅ x + c) + d
y9(x) = 95 sin(0.0041 x + 0.83) – 68

Logistic Regression
y = c / (1 + a ⋅ e^(−b ⋅ x))
y10(x) = 25 / (1 + 7.9 e^(−0.050 x))

Further Notes for the Teacher

This investigation may take longer than one lesson. For slow classes I would use two lessons rather than cross any of the models off the list: I think we satisfy some curiosity if we work through all of the models.

There is of course one other model that has not been mentioned. The spreadsheet features a Quintic model in the Trend set, but this has no counterpart in the Statistics package at this stage. Although you don’t get to see r^2 for the Quintic, the graph clearly shows that it is not in the hunt. However, this may provide a few more minutes of investigation for those students who finish early.
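Teachers who want to check a Power Regression away from the ClassPad could try something like the following Python sketch. It assumes NumPy is available and uses one common approach: take logarithms so that y = a ⋅ x^b becomes the straight line ln(y) = ln(a) + b ⋅ ln(x), then fit that line. This handout does not say how the ClassPad computes its Power Regression, so treat the method as an assumption. The data are synthetic, generated from Period = 2√Length with a small made-up wobble, and the function names are hypothetical.

```python
import numpy as np


def fit_power_model(x, y):
    """Fit y = a * x**b by fitting a straight line to (ln x, ln y)."""
    b, ln_a = np.polyfit(np.log(x), np.log(y), 1)
    return np.exp(ln_a), b


def r_squared(y, y_fit):
    """Coefficient of Determination: 1 - (residual SS / total SS)."""
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic data (an assumption, not the class data set).
length = np.array([5.0, 10.0, 20.0, 40.0, 80.0, 160.0])
wobble = np.array([1.02, 0.98, 1.01, 0.99, 1.02, 0.98])  # made-up measurement error
period = 2.0 * np.sqrt(length) * wobble

a, b = fit_power_model(length, period)
print(f"y8(x) = {a:.2f} x^{b:.2f}")
print(f"r^2 = {r_squared(period, a * length ** b):.4f}")
```

Note that r^2 here is computed on the original Period values rather than on their logarithms; a calculator may report it differently.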
You may be accosted by a professional statistician who demands to know why we are using the Coefficient of Determination for goodness of fit instead of the Standard Error. For a longer answer and a worked example, see the CACTUS pages in the August 2007 edition of The Mathematics Teacher. The simple answer is that the Coefficient of Determination is conserved under logarithmic and other transformations because it is a ratio. The Standard Error is not.

The CACTUS article also gives you a copy of the data set collected by my class. However, I think it is much better if your class collects its own data. Whatever data set you use, you do need to include data for really long pendulums, such as those entered on the data sheet for the previous investigation. Without this important data, the power model is defeated by the polynomials.
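To explore how the range of pendulum lengths affects the polynomial models, and what happens when they are used for extrapolation, a class could experiment with a sketch like this one. It assumes NumPy, uses synthetic noise-free data generated from Period = 2√Length rather than any class data, and the cut-off lengths (150, 50 and 400) are arbitrary choices for illustration.

```python
import numpy as np


def r_squared(y, y_fit):
    """Coefficient of Determination: 1 - (residual SS / total SS)."""
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def polynomial_fit(x, y, degree):
    """Least-squares polynomial of the given degree, with its r^2."""
    coeffs = np.polyfit(x, y, degree)
    return coeffs, r_squared(y, np.polyval(coeffs, x))


# Synthetic, noise-free data from Period = 2 * sqrt(Length); not class data.
length = np.linspace(5.0, 150.0, 30)
period = 2.0 * np.sqrt(length)

far_length = 400.0  # hypothetical pendulum well outside the data
print(f"power rule at length {far_length:.0f}: {2.0 * np.sqrt(far_length):.1f}")

for label, longest in [("all lengths", 150.0), ("short pendulums only", 50.0)]:
    keep = length <= longest
    print(label)
    for degree in (1, 2, 3, 4):
        coeffs, r2 = polynomial_fit(length[keep], period[keep], degree)
        guess = np.polyval(coeffs, far_length)
        print(f"  degree {degree}: r^2 = {r2:.4f}, predicts {guess:.1f} at {far_length:.0f}")
```

Comparing each polynomial's prediction at the far length with the power rule's value gives students a concrete way to discuss interpolation versus extrapolation. Replacing the synthetic arrays with real measurements is the obvious next step.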