# Full text of "hp :: 9000 200 :: 98820-13111 StatisticalLibrary Jul82"


HP Computer Systems

Statistical Library for the HP 9826 and 9836 Computers

Hewlett-Packard

## Warranty Statement

Hewlett-Packard makes no expressed or implied warranty of any kind, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose, with regard to the program material contained herein. Hewlett-Packard shall not be liable for incidental or consequential damages in connection with, or arising out of, the furnishing, performance or use of this program material.

HP warrants that its software and firmware designated by HP for use with a CPU will execute its programming instructions when properly installed on that CPU. HP does not warrant that the operation of the CPU, software, or firmware will be uninterrupted or error free.

Use of this manual and flexible disc(s) supplied for this pack is restricted to this product only. Additional copies of the programs can be made for security and back-up purposes only. Resale of the programs, in their present form or with alterations, is expressly prohibited.

## Restricted Rights Legend

Use, duplication, or disclosure by the Government is subject to restrictions as set forth in paragraph (b)(3)(B) of the Rights in Technical Data and Software clause in DAR 7-104.9(a).

Manual Part No. 98820-13111

| Disc | Part Number |
|---|---|
| Basic Statistics | 98820-13114 |
| General Statistics | 98820-13115 |
| Statistical Graphics I | 98820-13116 |
| Statistical Graphics II | 98820-13117 |
| Regression Analysis | 98820-13118 |
| Analysis of Variance I | 98820-13124 |
| Analysis of Variance II | 98820-13125 |
| Principal Components and Factor Analysis | 98820-13126 |
| Monte Carlo Routines | 98820-13127 |
| Monte Carlo Tests | 98820-13128 |

## Important

The flexible disc containing the programs is very reliable but, being a mechanical device, is subject to wear over a period of time.
To avoid having to purchase a replacement medium, we recommend that you immediately duplicate the contents of the disc onto a permanent backup disc. You should also keep backup copies of your important programs and data on a separate medium to minimize the risk of permanent loss.

Hewlett-Packard Desktop Computer Division, 3404 East Harmony Road, Fort Collins, Colorado 80525

Copyright by Hewlett-Packard Company 1982

## Printing History

New editions of this manual will incorporate all material updated since the previous edition. Update packages may be issued between editions and contain replacement and additional pages to be merged into the manual by the user. Each updated page will be indicated by a revision date at the bottom of the page. A vertical bar in the margin indicates the changes on each page. Note that pages which are rearranged due to changes on a previous page are not considered revised.

The manual printing date and part number indicate its current edition. The printing date changes when a new edition is printed. (Minor corrections and updates which are incorporated at reprint do not cause the date to change.) The manual part number changes when extensive technical changes are incorporated.

July 1982 ...
First Edition

## Table of Contents

- Commentary
- Summary of available routines
- Basic Statistics and Data Manipulation
  - General Information
  - Start
  - Edit
  - Transform
  - Missing Value
  - Recode
  - Sort
  - Subfiles
  - Change Names
  - Store Data
  - Join
  - Printer Is
  - Select and Scan
  - Basic Statistics
  - Missing Value
  - Go To Advanced Stat
  - Return to BSDM
  - Backup
  - Examples
- Regression Analysis
  - General Information
  - Multiple Linear Regression
  - Stepwise Regression (Variable Selection Procedures)
  - Polynomial Regression
  - Nonlinear Regression
  - Standard Nonlinear Regressions
  - Residual Analysis
  - Examples
- Statistical Graphics
  - General Information
  - Common Plotting Characteristics
  - Time Plot
  - Histogram
  - Normal Probability Plot
  - Weibull Probability Plot
  - Scattergram
  - Semi-Log Plot
  - Log-Log Plot
  - 3D Plot
  - Andrew's Plot
  - Examples
- General Statistics
  - General Information
  - One Sample Tests
  - Paired Sample Tests
  - Two Independent Sample Tests
  - Multiple-Sample (≥3 Samples) Tests
  - Statistical Distributions (see Table 1)
  - Examples
- Analysis of Variance
  - General Information
  - Discussion
  - Data Structures
  - Factorial Design
  - Nested or Partially Nested Design
  - Split Plot Designs
  - One-Way Classification
  - Two-Way Unbalanced Design
  - One-Way Analysis of Covariance
  - F-Prob
  - Orthogonal Polynomials
  - Contrasts
  - Interaction Plots
  - Multiple Comparisons
  - Examples
- Principal Components and Factor Analysis
  - General Information
  - Principal Components
  - Factor Analysis
  - Discussion
  - Methods and Formulae
  - Examples
- Monte Carlo Simulations
  - General Information
  - 9826/36 Uniform Random Number Generator
  - Random Number Generators
    - Beta
    - Binomial
    - Chi-Square
    - Exponential
    - F
    - Gamma (Alpha)
    - Gamma (A,B)
    - Geometric
    - Lognormal
    - Negative Binomial
    - Standard Normal
    - Normal
    - Bivariate Normal
    - Pareto of the First Kind
    - Pareto of the Second Kind
    - Poisson
    - Random Points on M-dimensional Unit Sphere
    - Super Uniform
    - t
    - Type I Extreme Value
    - Type II Extreme Value
    - Uniform
    - Weibull
  - Tests for Randomness
    - Chi-Square
    - Kolmogorov-Smirnov
    - Maximum-of-T
    - Modified Poker
    - Runs
    - Serial
    - Spectral
  - Elementary Sampling Techniques
    - Selection Sampling
    - Shuffling
- Appendix
  - Changes Necessary For Larger Data Sets
  - Statistics Library Data Formats
  - Statistical Tables

## Table 1. Statistical Distributions

Table Values and Right-Tail Probabilities

Continuous

1. Normal
2. Two-parameter gamma
3. Central F
4. Beta
5. Student's T
6. Weibull
7. Chi-square
8. Laplace
9. Logistic

Discrete

1. Binomial
2. Negative Binomial
3. Poisson
4. Hypergeometric
5. Gamma Function
6. Beta Function
7. Single Term Binomial
8. Single Term Negative Binomial
9. Single Term Poisson
10. Single Term Hypergeometric

## Commentary

The Stat Library, which we have developed for Hewlett-Packard, is an integrated package developed specifically for the HP desktop computers. Our objective in preparing this library was to develop an integrated system which provides the user with a flexible collection of routines for data manipulation, exploration, and analysis. The package uses a common front end, which provides for considerable flexibility in data handling. The Basic Statistics and Data Manipulation (BSDM) front end has been updated and enhanced for inclusion with this library.

The programs are interactive in operation, using the CRT display to list a "menu" of options at appropriate times. The special function keys are used only with the BSDM routines, to connect the user directly with a specific operation. The statistical analyses range from very elementary summary statistics to complicated routines for principal components and factor analysis.
The figure on the next page is a diagram showing the essential organizational structure of the Stat Library. Notice that there are six major segments in the Stat Library which operate on the data: Input Routines, Manipulation Routines, Data File Management Routines, Selection Routines, Data Exploration Routines, and Statistical Analysis Procedures.

This library has evolved out of our ten years' experience in developing software for desktop computers. We are currently using these routines in our Statistical Laboratory. We hope you will find them useful.

Thomas J. Boardman, Ph.D., Professor-In-Charge, Statistical Laboratory, Colorado State University, Fort Collins, CO 80523

[Figure: HP Stat Library, Integrated Statistical Routines. The DATA block is connected to six segments: Input Routines, Manipulation Routines, Data File Management, Selection Routines, Data Exploration Routines, and Statistical Analysis Procedures.]

| Operation | Subprogram (Key Words) | Description | Package Containing Routine |
|---|---|---|---|
| Input Routines | Keyboard | Direct numeric input by the user. | BSDM |
| | Mass Storage | Of data previously stored on one of several mass storage devices. | |
| | Graphics Input | Using the Graphics Tablet. | |
| | Other | User supplied routines. | |
| Manipulation Routines | Sort | Sorting data on one or two variables. | BSDM |
| | Join | Joining two data sets either by adding variables or observations to existing set. | |
| | Rename | Change variable label, subfile name, or project title. | |
| | Subfile | Several methods to specify or create subfiles (groups within your data set). | |
| | Recode | Method to recode variable values into another variable. | |
| | Edit | To correct, add, or delete observations or variables. | |
| | Transformation | By algebraic routines including user supplied function. To assign missing values. To create new variables by using ranks, subfile codes, sequence numbers, standardized scores, or lagged variables. | |
| | Data Recovery | A backup data file may be accessed if necessary. | |
(Continued)

| Operation | Subprogram (Key Words) | Description | Package Containing Routine |
|---|---|---|---|
| Data File Management Routines | Store | Save data set on user file. | BSDM |
| | Store Subfile(s) | Save particular subfile on a user file. | |
| | Store Variables | Save particular variables on a user file. | |
| | Direct | Obtaining a catalog or directory of data file(s). | |
| | Purge | Eliminate selected data files. | |
| Selection Routines | By Subfiles | To choose a portion of the data for further analyses. | BSDM |
| | Exclude Missing Values | Always excluded from analyses and data exploration routines. | |
| | Select | To choose a portion of the data set for further processing on the basis of values from one or two variables. The values selected are shown on the CRT and the data set is reduced down to the selected data set size. | |
| Data Exploration Routines | Selected Listing | Several ways are available to list all or a portion of the data set. | BSDM |
| | Scan | Same as Select (above) except that data set is not reduced. | BSDM |
| | Summary Statistics | Many basic statistics such as mean, median, standard deviation, etc., on all or a portion of the data set. | BSDM |
| | Graphics Displays | Eight common statistical graphics for studying data sets, such as normal probability plots and semi-log plots. | Stat Graphics |
| | Frequencies | Under development for future addition to library. | |
| | Cross Tabulation | Under development for future addition to library. | |

(Continued)

| Operation | Subprogram (Key Words) | Description | Package Containing Routine |
|---|---|---|---|
| Statistical Analysis Procedures | General Parametric Methods | Common one, two-independent, and two-paired sample inferential procedures. Also one way analysis of variance. | General Statistics |
| | General Nonparametric Method | Common one, two-independent, and two-paired sample nonparametric inferential procedures. Also the Kruskal Wallis test for 3 or more independent samples. | General Statistics |
| Regression Analyses | Polynomial | | Regression Analysis |
| | Multiple Linear Regression | | |
| | Stepwise | Selection procedures including the stepwise, forward, backward, and manual routines. | |
| | Nonlinear | From user supplied functions using the Marquardt Compromise algorithm. | |
| | Standard Nonlinear | Several common nonlinear models are available for use on your data set. | |
| Analysis of Variance (AOV) | One Way | One way AOV procedure. | Analysis of Variance |
| | One Way Covariance | One way analysis of covariance procedure. | |
| | Two Way Unbalanced | The AOV procedure for two way factorials which are unbalanced. | |
| | Factorial | AOV procedure for up to 5 factors with balanced data. | |
| | Split Plot | AOV methods for several types of split plot designs with up to 4 factors. | |
| | Nested | AOV methods for completely or partially balanced nested designs. | |
| Principal Components and Factor Analysis | | Common multivariable dimension reduction procedures. Extensive use of graphics. | Principal Components and Factor Analysis |
| (Others) | | In the future. | |

## Basic Statistics and Data Manipulation

### General Information

#### Description

This set of programs allows you to create a statistical data base which can be accessed by other Hewlett-Packard statistical routines. It alleviates the need to key in data each time a new statistical procedure is used. The capabilities of this set of programs include data entry and several manipulative data operations. A wide variety of summary statistics may be obtained. In addition, the programs have many ease-of-use features; the human interface was a major concern in designing the programs. Specific capabilities follow.

Data Entry:

- Keyboard
- Magnetic media (flexible discs)
- Graphics tablet
- Other input devices (paper tape, etc.)

Data Manipulation:

- Edit incorrect/incomplete data sets
- Transform, both algebraic and non-algebraic
- Assign codes to intervals of data
- Sort
- Divide data set into subfiles
- Join two data sets
- Select portions of the data

Summary Statistics:

- Basic statistics (mean, standard deviation, etc.)
- Correlation matrix
- Order statistics (max, min, median, etc.)

Other Features:
- Error detection
- Easy error correction
- Variables can be named
- Data can be stored for future reference
- Data can be listed
- Data can be scanned for specified qualities
- A backup file of the data can be recalled
- Printer unit can be changed
- Missing data values can be assigned

#### Typical Program Flow

1. Get data into memory from keyboard or disc (RESTART)
2. List the entered data (LIST)
3. Correct mistakes (EDIT)
4. Break the data into subfiles (SUBFILE)
5. Transform the original data by normalizing, etc. (TRANSFORM)
6. List the edited and transformed data set (LIST)
7. Obtain basic statistics such as means, standard deviations, etc. (STATS)
8. Go to an advanced statistical routine such as Regression, AOV, etc. (ADV. STAT)

#### Special Considerations

##### Data Matrix Configuration

The data matrix incorporated in this program should be thought of as a p-by-n array whose columns correspond to observations and whose rows correspond to variables, as shown below.

$$
\begin{array}{c|cccc}
 & O_1 & O_2 & \cdots & O_n \\ \hline
V_1 & & & & \\
V_2 & & & & \\
\vdots & & & & \\
V_p & & & &
\end{array}
$$

Subfiles may be created, in which case the structure becomes only slightly more complex. The rows are still the variables $V_1, \ldots, V_p$, and the observation columns are grouped into subfiles as shown below.

$$
\underbrace{O_1, O_2, \ldots, O_{n_1}}_{\text{Subfile 1}},\;
\underbrace{O_{n_1+1}, \ldots, O_{n_1+n_2}}_{\text{Subfile 2}},\; \ldots,\;
\underbrace{O_{n_1+\cdots+n_{s-1}+1}, \ldots, O_{n_1+\cdots+n_s}}_{\text{Subfile } s}
$$

##### Scratch Data Sets

There are two data files which are used by the statistical data base. They are "DATA" and "BACKUP". DATA is the file which contains the most current form of your data matrix. It is updated upon completion of any procedure which modifies the data matrix or any variable names. Thus, DATA contains the data that will be used for any statistical calculations.

BACKUP, on the other hand, is not updated automatically. After the data has first been entered, a copy of the DATA file is automatically put into BACKUP. From then on, BACKUP can only be modified manually via the BACKUP PROCEDURE. This procedure will also let you retrieve the BACKUP file and copy it to the DATA file. So, if you erroneously alter your data matrix, the original data set is still retrievable.
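The relationship between the two scratch files can be pictured with a short Python sketch. This is only an illustrative in-memory model, not the library's own BASIC code: the class name `ScratchFiles` and its method names are hypothetical, and the real files reside on the program medium.

```python
import copy


class ScratchFiles:
    """Toy model of the DATA and BACKUP scratch files (hypothetical names)."""

    def __init__(self, matrix):
        # matrix[variable][observation], as in the p-by-n layout described above.
        self.data = copy.deepcopy(matrix)
        # On first entry, a copy of DATA is placed in BACKUP automatically.
        self.backup = copy.deepcopy(matrix)

    def modify(self, variable, observation, value):
        # Any procedure that changes the data matrix updates DATA only.
        self.data[variable][observation] = value

    def update_backup(self):
        # BACKUP changes only when requested (the BACKUP procedure).
        self.backup = copy.deepcopy(self.data)

    def restore_from_backup(self):
        # Copy BACKUP over DATA to undo an erroneous change.
        self.data = copy.deepcopy(self.backup)


files = ScratchFiles([[10, 11, 9, 9], [2, 2, 3, 2]])
files.modify(0, 0, 999)          # erroneous edit reaches DATA only
files.restore_from_backup()      # original values are recovered
```

The point of the design is that an editing mistake reaches DATA but never BACKUP unless you explicitly ask for BACKUP to be updated.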
##### Data File Configuration

The scratch file on the program medium, "DATA", and any files created to hold stored data and related information are configured as follows. The data file is broken into logical records of 1280 bytes each. (If you are unfamiliar with logical records, refer to your desktop's Programming Techniques Manual.) The first logical record is a "header file", which contains information pertinent to the data set which is stored in the remaining logical records. The header file contains the following information (variables):

| Information (variable) | Limitations |
|---|---|
| Data set title (`T$`) | 80 characters |
| Number of observations (`No`) | `No*Nv <= 1500` |
| Number of variables (`Nv`) | 50 |
| Variable names (`Vn$(*)`) | 10 characters each |
| Number of subfiles (`Ns`) | 20 |
| Subfile names (`Sn$(*)`) | 10 characters each |
| Subfile characterizations (`Sc(*)`) | N/A |

The remaining logical records contain `D(*,*)`, the data matrix. For a detailed explanation of the data file, see the appendix.

##### Parser

BSDM is equipped with an elementary parser. This means that wherever an answer could require multiple responses, the parser will separate your response into its individual parts. For example, when asked "What variables are desired?", you may respond in three ways:

1. ALL: enter ALL if you want the entire set of variables to be used
2. 1,2,3,...: enter the specific variables you want
3. 4-7: enter a dash (-) if you want all variables from 4 to 7

So, a sample response for the question might be:

1,3,5-8,10,15,21-25

The response would be interpreted to mean that you requested variables 1, 3, 5, 6, 7, 8, 10, 15, 21, 22, 23, 24 and 25. Thus, anywhere multiple values may be input, you may enter the responses in this manner.

In several cases the words "NONE" or "NO" are also possible responses. When they are allowed, it is mentioned in the prompt. These words may be used interchangeably.

Note: Entering negative numbers is no different than entering positive ones. For example, the input

-10- -3,1-4

would mean all numbers between -10 and -3 and all between 1 and 4.
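The response format described above can be mimicked with a small Python sketch. It is an illustration only, not the BSDM parser itself: the function name `parse_selection` and its `count` parameter (the number of variables that ALL should expand to) are assumptions, only whole numbers are handled, and the NONE/NO responses are left out.

```python
import re

# One comma-separated part: a number, optionally followed by "-" and a second
# number. Either number may carry its own minus sign, e.g. "-10- -3".
_PART = re.compile(r"^\s*(-?\d+)\s*(?:-\s*(-?\d+))?\s*$")


def parse_selection(response, count):
    """Expand a response such as '1,3,5-8' into a list of integers."""
    text = response.strip()
    if text.upper() == "ALL":
        return list(range(1, count + 1))
    selected = []
    for part in text.split(","):
        match = _PART.match(part)
        if match is None:
            raise ValueError("cannot interpret %r" % part)
        low = int(match.group(1))
        high = int(match.group(2)) if match.group(2) is not None else low
        if low > high:
            raise ValueError("range %r runs backwards" % part)
        selected.extend(range(low, high + 1))
    return selected


print(parse_selection("1,3,5-8,10,15,21-25", 25))
print(parse_selection("-10- -3,1-4", 25))
```

The first call expands to the thirteen variable numbers listed in the text; the second yields the integers from -10 through -3 followed by 1 through 4.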
##### Incorrect Responses

If a response outside the range of plausible responses is input from the keyboard, an appropriate message is displayed on the CRT. Program execution is resumed by asking the question, or in some cases a previous question, again.

If a plausible response is given, but it is not correct, a couple of possibilities exist. First, if an incorrect value has been entered for a data point, it may be corrected using the EDIT program. Second, in many cases, responses to several questions are printed on the CRT. Then a question such as "Is the above information correct?" is asked. This allows any of the printed information to be changed.

#### Hardware Requirements

- 9826 or 9836 computer with 240K bytes of available user memory (required).
- External printer (required). The CRT may be used as the printer, but results will be difficult to read and understand.
- External plotter (optional).
- External mass storage (optional).

Note: Both the user-defined transformation option and non-linear regression require that you specify the form of the functions before you begin BSDM. See page 69 for an explanation.

#### Getting Started

1. If your 9826 or 9836 computer is ROM-based, go to Step 2. Otherwise, if your system is RAM-based, or if you do not wish to turn the computer OFF and the complete system is ready:
   a. Make sure that BASIC is ready and all peripherals are properly connected and turned on. (Make sure P1 and P2 are set properly if a hardcopy plotter is being used.)
   b. Insert the Basic Statistics disc into the internal flexible disc drive.
   c. Type: SCRATCH A (EXECUTE)
   d. Type: LOAD "AUTOST",1 (EXECUTE)
   e. Go to Step 5.
2. If the 9872C (or any peripheral) is being used, make sure it is properly connected and turned on. Make sure P1 and P2 are set properly if a hardcopy plotter is being used.
3. Insert the Basic Statistics disc into the internal disc drive.
4. Turn the computer on.
5. You will be asked a series of questions which should be self-explanatory.
If you have any questions, turn to the Special Considerations section of the manual covering the procedure in question. You will find some general comments on how that section of the program works.

### Start

#### Object of Program

This program allows you to enter a data matrix into memory. The data may be entered from the keyboard, or from some other input device such as a graphics tablet. Alternatively, the data may have been entered previously and stored in the program scratch file ("DATA") or in a user-created file on a flexible disc or hard disc. In this case, the function of this program is to retrieve the previously stored data and place it into memory so that further operations can be performed. After the data is in memory, a listing option is available to obtain a complete or partial copy of the data.

#### Typical Program Flow

1. Specify printer.
2. Specify data type, e.g., raw data.
3. Specify data entry mode:
   - Magnetic or disc: the data is retrieved and a description of the data set is printed.
   - Keyboard: enter the data set title, # variables, # observations, and variable names, then enter the data manually.
4. Data is stored on the DATA and BACKUP files.

#### Special Considerations

##### Terminology

The displayed prompts concerning the scratch file ("DATA"), whether the data was stored by this program, and whether the data is in the proper configuration are explained here and in the Special Considerations section of General Information for BSDM.

The prompts concerning the data medium and program medium may cause confusion. The word "medium" is used since the set of programs making up this software package may be on floppy disc. Thus, the "program medium" refers to the disc on which the programs making up this package are stored. Conversely, the "data medium" refers to the disc on which the file containing the data matrix resides. In some cases, the program medium and the data medium are the same.
However, this is not determined by the program and hence, the prompts are displayed to make sure the correct medium is in the correct device.

##### Data on Mass Storage

If the data is on a mass storage device, it may have been stored in one of four ways. The following discussion explains the prompts that apply to each situation.

1. If the data was entered using this statistics package (and was the last data set used on this package), it will be on the disc in the scratch file called "DATA". Thus, an affirmative answer to the prompt "Is data stored on the program medium's scratch file (DATA)?" will retrieve the data and related information.
2. The data may have been entered using the Basic Statistics and Data Manipulation routines and then stored using the STORE routine of BSDM. After specifying the file name and the storage unit in which the data resides, you should answer Yes to the prompt "Was data stored by this program?". Then, the data and related information will be retrieved.
3. The data may be stored as: all observations of variable one followed by all observations of variable two, etc. This is in the same configuration as data stored by the BSDM routines, i.e., variables = rows and observations = columns. To retrieve the data, a Yes response to the prompt "Is the data in proper configuration...?" should be given.
4. The data may be stored as: all variables of observation one followed by all variables of observation two, etc. This is the transpose of what is expected by the BSDM routines, i.e., observations = rows, variables = columns. To retrieve this type of data, a Yes response should be given to the prompt "Data stored as contiguous array with observations = rows...?".

Notice that in cases 3 and 4, the data was stored by a program other than a statistics routine. Thus, no variable names or other auxiliary information will be stored along with the data.
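Cases 3 and 4 differ only in the order in which the same numbers sit on the storage medium. The Python sketch below, which uses the sensor readings from the example that follows, rebuilds the variables-by-observations matrix from either ordering. It is an illustration under stated assumptions: the function names are hypothetical, and the values are held in a plain list rather than read from a mass storage file.

```python
def from_variable_major(values, n_variables, n_observations):
    """Case 3: all observations of variable one, then variable two, etc."""
    if len(values) != n_variables * n_observations:
        raise ValueError("value count does not match the stated dimensions")
    return [values[v * n_observations:(v + 1) * n_observations]
            for v in range(n_variables)]


def from_observation_major(values, n_variables, n_observations):
    """Case 4: all variables of observation one, then observation two, etc."""
    if len(values) != n_variables * n_observations:
        raise ValueError("value count does not match the stated dimensions")
    return [[values[o * n_variables + v] for o in range(n_observations)]
            for v in range(n_variables)]


# Three sensors (variables), five readings (observations).
proper = [7.2, 7.4, 7.1, 7.2, 7.3,
          8.0, 7.9, 8.1, 7.8, 8.0,
          7.8, 7.5, 7.5, 7.6, 7.9]
transposed = [7.2, 8.0, 7.8,
              7.4, 7.9, 7.5,
              7.1, 8.1, 7.5,
              7.2, 7.8, 7.6,
              7.3, 8.0, 7.9]

matrix_a = from_variable_major(proper, 3, 5)
matrix_b = from_observation_major(transposed, 3, 5)
```

Both calls produce the same matrix, with one row per sensor, which is why the program only needs to know which of the two orderings the file uses.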
As an example, suppose you have run your own program where you have created a file by storing data acquired from three sensors as it came in from the devices. A picture of five readings (observations) from the sensors would look like this:

| Reading | Sensor 1 | Sensor 2 | Sensor 3 |
|---|---|---|---|
| 1 | 7.2 | 8.0 | 7.8 |
| 2 | 7.4 | 7.9 | 7.5 |
| 3 | 7.1 | 8.1 | 7.5 |
| 4 | 7.2 | 7.8 | 7.6 |
| 5 | 7.3 | 8.0 | 7.9 |

If the data were stored in this order: 7.2, 7.4, 7.1, 7.2, 7.3, 8.0, ..., 7.5, 7.6, 7.9, then it is in what we call the proper configuration, and the situation is that described in note 3 above. Conversely, if the data were stored as: 7.2, 8.0, 7.8, 7.4, 7.9, 7.5, ..., 7.3, 8.0, 7.9, then it is the transpose of what is expected and the situation is that described in note 4 above.

##### Keyboard Entry

When entering data from the keyboard, an option to enter data one case at a time is offered. The following example will serve to explain this feature. Suppose an investigator has collected four observations on each of three variables. He has the following data matrix:

| Observation | Variable 1 | Variable 2 | Variable 3 |
|---|---|---|---|
| 1 | 10 | 2 | 5 |
| 2 | 11 | 2 | 6 |
| 3 | 9 | 3 | 7 |
| 4 | 9 | 2 | 6 |

He elects to enter the data one case at a time. Then, when the prompt "Observation #, all variables (separated by commas) = ?" is displayed, he enters 10, 2, 5 and presses CONT, etc. This allows for quick entry of the data.

The other form of keyboard entry will prompt you at each observation for the required variable.

##### Missing Values

If you have missing values, use an otherwise unused number as a temporary code for a missing value. Subsequently you can change these values to the program's value of -9999999.99999 by using the TRANSFORM operation.

##### Graphics Input

Data may be input by digitizing from a graphics tablet. You may find this form of input very useful. The following diagram briefly describes the types of information requested by the program.
1. Specify printer
2. Specify raw data
3. Choose graphics input mode of data entry
4. Specify graphics input device
5. Input select code and bus code of device
6. Input project title
7. Input form of input, e.g., (x,y) pairs
8. Specify digitizing mode
9. Specify sample size requirements
10. Digitize chart limits
11. Input numeric values of limits
12. Digitize data

##### "Other" Input

Because of the wide variety of formats that could be used when entering data from "other" devices, no attempt was made to program in the necessary statements. It will be necessary for you to provide the statements before using the program. Refer to the Operating Manual of the appropriate device for detailed instructions. In general, though:

1. Type: LOAD "FILE1"
2. Press: (EXECUTE)
3. Type: EDIT Other_input (EXECUTE)
4. Change the 0 to a 1 in line 1731: Other_input: Implemented=0
5. Press: (ENTER)
6. Press: (PAUSE)
7. Type: EDIT Otherin (EXECUTE)
8. Type in and enter the appropriate statements for "other" input, referring to the Operating Manual for the input device.

### Edit

#### Object of Program

This program is designed to allow you to perform a variety of editing procedures on your data set. The editing capabilities include:

- Correct a data value
- Correct an entire observation
- Delete a variable
- Delete an observation
- Add a variable
- Add an observation
- Insert an observation (in ordered data)
- Delete a subfile

All of these operations may be performed repeatedly. For example, three variables may be added in succession. After the data matrix has been edited, you are given the option of listing the data.

#### Special Considerations

##### Order of Corrections

As stated in the program note printed on the screen, the data is renumbered after deletions or insertions are performed. For this reason, if more than one deletion (insertion) is to be performed, it is recommended that the highest-numbered observation (or variable) be deleted, then the next highest-numbered, etc.
For example, if observations three and eight are to be deleted, then it is recommended to delete observation eight first, then observation three. Notice that if observation three were deleted first, the subsequent renumbering would move observation eight to position seven. The recommendation is meant to alleviate confusion which may occur due to the renumbering.

If you delete several observations at once using the answering technique described under "Parser" in the Special Considerations section of BSDM General Information, you do not need to worry about the renumbering problem. Your responses will be sorted from highest to lowest automatically. So to delete observations five through eight, just enter 5-8 and you will have no problems.

##### Subfiles

Insertions or deletions of observations will affect the content of subfiles which exist at the time of editing. For example, if subfile one consists of the first 10 observations while subfile two consists of the last 20, and observation five is deleted, then observation ten (formerly numbered 11) will have jumped from subfile two to subfile one. Thus, it may be necessary to change the subfile structure after editing. It is recommended that subfiles be created only after all editing has been performed.

##### Correcting Data Value(s)

When correcting a data value, you must specify the variable number and observation number of the value to be corrected. Then, the old value is displayed prior to your correction so you can be sure you are altering the correct value.

##### Correcting Observation(s)

When correcting an entire observation, you specify the observation to be corrected. The old values are then listed on the screen and you may then enter the new values one at a time.

##### Adding Observation(s)

In adding observations you will be asked to enter the number of observations that are to be placed at the end of the data matrix. Observations should be entered one at a time with the data values separated by commas.
If an observation is to be inserted, the position of the insertion must be specified by entering the number of the existing observation which the insertion will precede. For example, if an observation is to be inserted between observations 8 and 9, you must enter 9 when the prompt "Insertion to precede observation #?" is displayed. You will then be asked to enter the number of observations that are to be inserted at this point.

##### Deleting Observation(s)

You will be asked to enter the numbers corresponding to the observations to be deleted. They will be sorted and the observations will be deleted from highest-numbered to lowest-numbered to avoid renumbering confusion.

##### Deleting Subfile(s)

This option works the same as deleting observations. All you need to specify is the subfile number and all observations within the subfile will be deleted. All observations after the ones deleted will be renumbered.

##### Deleting Variable(s)

You will be asked to enter the numbers corresponding to variables to be deleted. They will be sorted and the variables will be deleted from highest-numbered to lowest-numbered.

##### Exceeding Program Limitations

If the addition of an observation or of a variable will exceed program limitations, these options will not be executed.

#### Methods and Formulae

The data matrix is redimensioned into a row vector to facilitate the shuffling of elements necessitated by the editing operations. The vector contains all the observations of variable one, followed by the observations of variable two, etc. When an observation is inserted, for example, the elements of the data vector are shuffled one at a time to make room for the incoming observation. Similarly, when an observation is deleted, the remaining observations are "packed" together so that the resultant data vector has no "holes" between observations.

### Transform

#### Object of Program

This procedure is designed to allow you to transform your data. The transformations available fall into three categories.
Algebraic transformations allow you to perform the standard algebraic operations on one or two variables in the data set. There is also the capability for you to define your own transformation.

The second category of transformations is the assigning of missing values. With this section you may assign any value in the data set to correspond to missing data.

The final category is new variables. Here, you may perform such operations as generating uniform random numbers, standardizing variables, lagging variables, and creating rank variables, sequence variables, and variables corresponding to subfiles.

In all the sections the transformed results will be placed in a variable you specify, either old or newly created. Hence, transformations on more than two variables may be performed iteratively or via a transformation defined by you.

#### Special Considerations

##### Missing Values (Algebraic Transformations)

None of the pre-specified algebraic transformations are applied to missing values. Thus, missing values are unaffected by these transformations. However, this is not necessarily the case with the user-defined transformation. If you define a transformation and there are missing values, you must make provisions to ensure that the transformation is not applied to the missing values (unless, of course, this is desired). This may be accomplished as explained below.

##### User-Defined (Algebraic Transformations)

Before you start to run the Basic Statistics and Data Manipulation program, you should prepare your own transformation function and store it on the data storage medium. Consider the following example. Suppose your data set consists of four variables. There are missing values. You desire to form variable five as the sum of the exponentials of variables one and three. If there is a missing value in either of these variables, you wish to assign a missing value to the transformed variable.
Recall that the data is of the form D(J,I) where J is the variable number and I is the observation number. In the transformation routine the variable Z is used to denote the variable where the transformed data is to be stored. Thus, to accomplish the above-described transformation, follow the instructions below:

1. Insert a flexible disc into the internal disc drive.
2. Type: SCRATCH A (EXECUTE)
3. Press: EDIT (EXECUTE)
4. Now you should be able to see line number "10" on the upper-left corner of the CRT. Start to type in your function as a subroutine. Press (ENTER) after each line. For example:

    10 ! A comment to identify perhaps your file name.
    20 SUB Function(D(*),Z,I)
       (Note: This line must be exactly the same as above.)
    30 IF D(1,I)<>-9999999.99999 AND D(3,I)<>-9999999.99999 THEN 60
    40 D(Z,I)=-9999999.99999
    50 GOTO 80
    60 D(Z,I)=EXP(D(1,I))+EXP(D(3,I))
    70 ! Note: The value of Z will be asked by the program. You must specify the variable numbers for the right hand side of the equation (i.e., 1 and 3)
    80 SUBEND
       (Note: This line must be the last line of the subroutine)

5. Press: (CLR SCR)
6. Type: STORE "your filename:mass storage identifier" (EXECUTE)

Now you can proceed with data entry through BSDM.

Declaring Missing Values

This section allows you to assign missing values to any or all of the variables in the data set. It may be used successively so that you can assign different missing values to each variable or different sets of variables.

The program asks you to enter the variables to which a missing value is to be assigned. You are then asked what numbers are to be considered missing values for that group of variables. Then, these variables are scanned and all missing values are transformed to -9999999.99999, which is the standard missing value code.

Create Rank Variables

This operation will take a variable, rank its values in ascending order, and place the resulting ranks in the variable specified by you.
As an example, consider the following variable which has four observations.

    Variable 1
        23
        25
        29
        20

You could create a second variable which contains the ranks corresponding to the observations in the first variable. You would obtain the following:

    Variable 1    Variable 2
        23            2
        25            3
        29            4
        20            1

Creating Variables by Subfile

This option may only be used when a subfile structure is present. If used, this option will assign the subfile number associated with each observation to the specified variable.

For a simple example, suppose you have a data set with one variable containing five observations. Subfile one consists of the first two observations, while subfile two has the last three observations. In this case, you could create a second variable whose observations correspond to the subfile numbers associated with the original variable. This variable would look like the following.

    Variable 2
        1
        1
        2
        2
        2

Creating Variables by Sequence Number

By selecting this option, you can place the observation numbers in a specified variable. For example, in a data set with five observations, you could create a second variable which would look like the following:

    Variable 2
        1
        2
        3
        4
        5

Creating Standardized Score Variables

In this option, a chosen variable is standardized by the following formula:

    New Variable = (Specified Variable - Mean of Specified Variable) / (Standard Deviation of Specified Variable)

The new variable can be placed in any variable you specify. Notice that standardized variables have a mean of zero and a standard deviation of one.

Creating Lag Variables

The lag variable operation will take the value of a chosen variable n-lags before and use it as the current observation of the lagged variable being created. As an example, consider the following data set:

               Var.1   Var.2
           1     2       3
           2     1       4
    Obs.#  3     4       6
           4     1       2
           5     2       4

We can create variable 3 by lagging variable two by one lag. We can also create variable four by lagging variable one by two lags.
We would obtain the following:

               Var.1   Var.2   Var.3   Var.4
           1     2       3      MV      MV
           2     1       4       3      MV
    Obs.#  3     4       6       4       2
           4     1       2       6       1
           5     2       4       2       4

Notice that missing values are placed in the first n observations of an n-lag variable since lagged values cannot be assigned.

Creating Uniform Random Number Variables

This option allows you to generate uniform random numbers between zero and one and have them placed in a variable of your choice.

As an example of the use of this option, you could select a random sample of the observations in your data set to be used in a subsequent analysis. To do this, you could first use the uniform random number option to assign a uniform random number to each observation. Then, you could use the select procedure (described later in this manual) to choose a portion of the data set based on the uniform random numbers. For example, if you selected observations that had a corresponding random number value between zero and one-half, you would expect to have selected about one-half of your data set.

Recode

Object of Program

This program allows you to assign codes to various categories or classes of data. The categories are intervals along the real number line and 20 of these may be specified. The recoding is done on one variable at a time. The same coding scheme may be used iteratively on successive variables. A summary of the coding intervals, codes, and number of observations assigned to each code is printed as hard copy.

Special Considerations

Coding Schemes

Four coding schemes are available for the sole purpose of eliminating unnecessary entries from the keyboard. If the coding intervals are all of the same length and are contiguous, that is, together they form a connected interval, then the interval construction can be accomplished internally knowing only the interval length and lower limit for the first interval.
Similarly, if the intervals are of equal length but noncontiguous, for example,

    [10,20), [25,35), [35,45), [50,60)

then the lower limit of each interval needs to be specified but the upper limit may be computed internally. Hence, the coding schemes are meant only to minimize the amount of information which needs to be entered from the keyboard. Clearly, the coding intervals could all be constructed by requiring you to enter the lower and upper limits for each and every interval (which is necessary, and what is done, if the intervals are unequal and non-contiguous).

Coding is carried out one observation at a time. If you wish to recode more than one variable you must use the procedure successively, once for each variable to be recoded.

Listed below are the available recoding options.

1. Contiguous intervals of equal length
2. Contiguous intervals of unequal length
3. Non-contiguous intervals of equal length
4. Non-contiguous intervals of unequal length

Option 1 will recode a variable into equally spaced intervals that are side by side. The second option will recode based on intervals of unequal length that are side by side. Options 3 and 4 will recode into intervals that need not be side by side. For equally spaced intervals, use option 3 and for unequally spaced intervals use option 4.

Brackets

The brackets used to denote the coding intervals are meant to follow their usual mathematical interpretation, that is, the intervals are closed on the left and open on the right. Hence, if you want a value to fall into a certain interval, make sure it is strictly less than the upper limit for the interval.

Observations Which Do Not Fall in an Interval

If an observation does not fall into any of the coding intervals, a table will appear giving you three options on how to handle these values. You may either 1) leave them unrecoded, 2) assign them a special code, or 3) assign them the missing value code.
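
The interval conventions described in the Recode section can be illustrated with a short Python sketch. This is not the library's code: the function names, the use of None for an observation that falls in no interval, and the codes 1 through 4 are assumptions made for illustration. It shows how the equal-length schemes let the upper limits be computed rather than entered, and how the closed-left, open-right convention treats a value equal to an upper limit.

```python
from typing import List, Optional, Sequence, Tuple

# An interval is closed on the left and open on the right: [low, high).
Interval = Tuple[float, float]


def contiguous_equal_intervals(first_lower: float, length: float, count: int) -> List[Interval]:
    """Contiguous, equal length: only the first lower limit and the length are entered."""
    return [(first_lower + k * length, first_lower + (k + 1) * length) for k in range(count)]


def equal_length_intervals(lower_limits: Sequence[float], length: float) -> List[Interval]:
    """Non-contiguous, equal length: each lower limit is entered, upper limits are computed."""
    return [(low, low + length) for low in lower_limits]


def recode(value: float, intervals: Sequence[Interval], codes: Sequence[int]) -> Optional[int]:
    """Return the code of the interval containing value, or None if no interval contains it."""
    for (low, high), code in zip(intervals, codes):
        if low <= value < high:  # strictly less than the upper limit
            return code
    return None  # hypothetical marker; the caller decides how to handle such values


# The manual's example intervals: [10,20), [25,35), [35,45), [50,60)
intervals = equal_length_intervals([10, 25, 35, 50], 10)
codes = [1, 2, 3, 4]  # hypothetical codes
recoded = [recode(v, intervals, codes) for v in [10, 19.9, 20, 35, 59, 60]]
print(recoded)  # [1, 1, None, 3, 4, None]
```

The value 20 is not recoded because it equals the upper limit of [10,20) and lies below the lower limit of [25,35), while 35 falls in [35,45) and not in [25,35).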
Sort

Object of Program

This program allows the data matrix, or individual subfiles of the data matrix, to be sorted according to the values of one variable. For example, suppose you have five observations of three variables, say height, weight and age, and want to arrange the observations in ascending order according to age. This is accomplished by sorting the data matrix according to variable three.

The data may be sorted in ascending or descending order. A sort in ascending order will place the observations in order from lowest to highest based on the variable sorted. A descending-order sort will put the observations in order from highest to lowest.

If you want to perform a hierarchical sort, the sort procedure must be used successively. For example, suppose you wish to sort a data set on weight and within weight by age. To do this, you should first sort on age and then use the sort procedure again and sort on weight.

Special Considerations

Subfile Structure Options

If subfiles are ignored, the entire data set will be sorted and, in the process, the composition of the subfiles is subject to change. The option of sorting certain subfiles may be used to sort a single subfile or a set of successive subfiles according to one variable. The option of sorting all subfiles may be used to sort each and every subfile. The options of sorting certain subfiles and sorting all subfiles treat each subfile as if it were a separate data set. Thus, the sort is done with respect to one subfile at a time.

What Happens

It is important to note that entire observations are moved when the sort is carried out. Thus, referring to the example given in the Object of Program section above, a person's height and weight remain with the person's age as shown below.
Original Data Set

                          Variable
                    Height   Weight   Age
                 1    72      170      21
                 2    70      165      25
    Observation  3    69      150      20
                 4    70      165      25
                 5    73      160      19

Data Set Sorted by Age

                          Variable
                    Height   Weight   Age
                 1    73      160      19
                 2    69      150      20
    Observation  3    72      170      21
                 4    70      165      25
                 5    70      165      25

Subfiles

Object of Program

This program allows you to specify subfiles or logical groupings of the observations. This may be accomplished by entering the number of observations in each subfile or by entering the observation number of the first observation in each subfile. A third option is to create subfiles for each level of a specified variable. Names for the subfiles are entered in all cases. A fourth option allows you to destroy the existing subfile structure.

Special Considerations

Use of Subfiles

Subfiles may be created in order to specify logical groupings of observations. A subfile structure allows you to consider each subfile as a separate data set or to lump all the subfiles together and analyze the overall data set. For example, suppose you want to determine the output generated each day by each of three shifts. You would like to analyze the data separately for each of the three shifts as well as for the work force as a whole. You could form three separate data sets and do the individual analyses, then later join the three sets together for the overall analysis. However, since the same variables were measured for each of the shifts, the situation is well handled by specifying a subfile for each shift. The subfile structure options make it possible to do the analysis by subfile as well as for the overall data set.

Change Names

Object of Program

This program allows you to rename the data set, to rename variables and/or to rename subfiles. These names are then stored, along with the data, on the program medium's scratch file ("DATA"). You may change a single variable or subfile name, or you may change a set of names.
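
The hierarchical sort described in the Sort section (sort on age first, then sort again on weight) can be sketched in Python. This is an illustration only, not the library's routine. It uses hypothetical observations chosen so that some weights tie, and it relies on Python's sorted being stable, meaning that rows with equal keys keep their existing order. The manual's two-pass recipe implies the same property for the Sort program, but that is an inference, not something the manual states.

```python
from operator import itemgetter

# Hypothetical observations, each a whole row: (height, weight, age).
observations = [
    (72, 170, 21),
    (70, 165, 25),
    (69, 150, 20),
    (71, 165, 22),
    (73, 150, 19),
]

# Pass 1: sort on the minor key (age, the third variable).
by_age = sorted(observations, key=itemgetter(2))

# Pass 2: sort on the major key (weight, the second variable).
# Because the sort is stable, rows with equal weight stay in age order.
by_weight_then_age = sorted(by_age, key=itemgetter(1))

for row in by_weight_then_age:
    print(row)
```

Entire rows move together in both passes, so each person's height stays attached to that person's weight and age.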
Store Data

Object of Program

This program allows you to store the entire data matrix and related information in a file so that it may be retrieved at a later date for further analysis. Alternatively, a subset of the data matrix may be stored by specifying which variables and/or subfiles are to be saved.

Special Considerations

Use of Program

The store feature will be useful in two different situations. First, if an investigator has a data set which he may want to analyze further at a later date, he may store it and retrieve it later via the Basic Statistics and Data Manipulation Start routine. Secondly, if several people have access to the data input programs, it becomes mandatory that each be able to store his data set in a unique place. Note that if only one person uses the routine on one data set it is unnecessary to use the store feature since the data and related information are kept in "DATA" - the scratch file on the program medium.

Protecting Existing Data

The existence of a file is checked in the program in an attempt to avoid the accidental loss of existing data. Thus, when a file is specified to receive the data, an attempt is made to ensure that you are not accidentally storing the new data in a file which you did not know existed.

List

Object of Program

This program allows you to obtain a listing of the data matrix. The listing will appear on the device that has been specified for hard-copy in the Start routine or in the Output Unit routine. You can list all the data, or a specified subset of the data. You may also specify how you want the data listed, i.e., by observation, by variable, etc.

Join

Object of Program

This program allows you to join or combine two data sets into a single unit. One data set must be in memory and the other data set must have been previously stored by the Basic Statistics and Data Manipulation program. Two options are available. First, observations may be added together (if both sets have the same number of variables).
Second, variables may be added together (if both sets have the same number of observations). A check is made in the program to make sure the two sets can be joined. Also, summary information on both data sets is printed before the joining operation is performed. Thus, the joining can be aborted if the resultant set will not be as expected.

Special Considerations

Adding Observations

Suppose data on six variables was gathered in each of the 52 weeks in 1975, analyzed, and stored on an auxiliary data disc. Suppose the same variables were measured in 1976, analyzed, and stored. If you are interested in lumping the two sets of data together for an overall analysis, you may use the Add Observations option of the joining routine. One set of data must be retrieved via the Start routine. Then, after entering the Join routine, the second set may be retrieved and the joining carried out. Notice that the variables must be in the same order in the two data sets.

Adding Variables

Suppose you measured five variables on each of 50 subjects in an experiment. These were analyzed and stored on disc. Later, you realize that three more variables are of interest. You measure these variables on the subjects in the same order as before and analyze them. All eight variables measured on each subject could be combined into a single data set via the joining routine.

Subfiles

If variables are added, the subfile structure assigned to the resultant data set is the subfile structure of data set #1, that is, the data set that is in machine memory prior to the joining operation. If observations are added, the following procedures are employed:

1) If no subfiles exist in either data set, the resultant data set has no subfiles.
2) If data set #1 has no subfiles, but data set #2 does, then a subfile named "SET #1" is created which consists of data set #1 and the subfiles of data set #2 remain unchanged.
3) If data set #1 contains subfiles, but data set #2 does not, then a subfile named "SET #2" is created which consists of data set #2 and the subfiles of data set #1 remain unchanged.
4) If both data sets contain subfiles, all of the subfiles of data set #1 are retained and as many subfiles of data set #2 are retained as possible - the upper limit of total subfiles for the resultant set being determined by the program limitations (see Special Considerations of Basic Statistics and Data Manipulation).

Printer Is

Object of Program

This program allows you to specify the device on which the hard-copy output will be printed, or conversely, to specify that no hard-copy is desired, i.e., that output be directed to the CRT.

Special Considerations

The hard copy option can be changed in two ways:

1. Select "PRINTER" key when you are asked to "SELECT ANY KEY".
2. This option can only be used when the program is not expecting an answer. For example, when Notes are displayed on the CRT and you are asked to press (CONTINUE) when ready. The printer may be changed as follows:

   For Non-HP-IB Printer:
   1. Type: Hc = (the select code of the desired printer) (EXECUTE)
   2. Type: Hcbus = 993 (EXECUTE)

   For HP-IB Printer:
   1. Type: Hc = (the select code of the desired printer) (EXECUTE)
   2. Type: Hcbus = (the bus address of the HP-IB device) (EXECUTE)

Select and Scan

Object of Program

This program allows you to look at a portion of your data set that satisfies a conditional statement. If you are scanning the data set, your output will include the observation numbers satisfying the scanning criterion and their distribution throughout the subfile structure. The data set which you are scanning will remain unaltered. When using the select option, your output will be the same as scanning, but the data set will be reduced to just those observations satisfying the selection criterion.
Remember, the BACKUP file (explained in Special Considerations of Basic Statistics and Data Manipulation) will contain the original data set. The selection and scanning procedure may be performed over all subfiles or over a user-specified subset of the data.

Special Considerations

There are four different scanning or selection criteria offered in this routine. Explanations of each conditional statement follow.

One Variable

This option will allow you to "edit" your data set based on specified values for one chosen variable. For example, you may scan (or select from) your data set based on variable number two and have the routine report the observations where variable two has any of the following values: 1, 2.6, 4-8.

Variable A OR Variable B

This option will allow you to "edit" your data set based on specified values of two chosen variables. An OR operation links the two variables. For example, if two of your variables are temperature and humidity, you may want to select (or scan) all observations that have a temperature of 70-80 degrees, OR have a humidity level of 50-80.

Variable A AND Variable B

This option performs much like the OR option except it uses an AND operator. For example, you may want to select (or scan) all observations that have a temperature of 72 degrees AND a humidity level of 50-80.

Variable A = Variable B

In this case the observations that would be selected (or scanned) are the observations where Variable A has the same value as Variable B. For example, you might want to know which observations have equal temperature and humidity level.

Basic Statistics

Object of Program

This program computes a variety of summary statistics for data which was entered via the Start routine of Basic Statistics and Data Manipulation. The statistics may be computed by subfile or for the entire data set (ignoring subfiles).
Basic statistics which are computed include: number of observations, number of missing values, sum, mean, variance, standard deviation, coefficient of skewness, coefficient of kurtosis, coefficient of variation, standard error of the mean, and a confidence interval on the mean. An option is available to compute a correlation matrix for data sets having more than one variable. Order statistics computed include: the maximum, the minimum, range, and midrange. Additional order statistics which may be computed include: the median, 25th percentile, 75th percentile, Tukey's middlemeans, and user-specified percentiles. These statistics are divided into three groups. You may specify any or all of the groups for output.

Special Considerations

Parser on Statistics Options

Three options for statistics will be offered. They are 1) the common summary statistics, 2) the correlation matrix, and 3) the order statistics such as maximum, minimum, median, etc. You may respond "ALL" to the prompt asking you for your choice of options. Or, you may choose a portion of the options by responding as documented in the General Information section of Basic Statistics and Data Manipulation, e.g., 1-2.

Data Type

If the data input type is not "RAW DATA", the Basic Statistics may not be computed. For example, Basic Statistics cannot be computed if the covariance matrix was entered as data.

Hard-Copy Output

If a hard copy of the statistics is not being made, the program halts occasionally so that you may study the results on the CRT. In this case, simply press CONTINUE to continue program execution.

Additional Order Statistics

If the option to obtain additional order statistics (Tukey's middlemeans and percentiles) is exercised, the data matrix is sorted and the observations of each variable are arranged in ascending order. At the end of the program the original data matrix is re-loaded into memory.
Thus, if the program is aborted, that is, if the program is stopped before the reloading can occur, the data matrix will be in the sorted state. So, if the portion of the program used to calculate additional order statistics is accessed, aborting the program is discouraged.

Methods and Formulae

Variance: The best unbiased estimator is calculated by these programs, i.e., the denominator in the formula is N-1, where N is the number of observations used in the calculation.

Correlations: Suppose you have the following data matrix:

                      OBSERVATION
                  1    2    3    4    5
              1   5    M    3    4    5
    VARIABLE  2   6    7    M    6    4
              3   1    3    2    1    1

Here, an M denotes a missing value. When computing the correlation between variables 1 and 2, we discard observations 2 and 3 since variable 1 is missing a data value for observation 2 and variable 2 is missing the data value for observation 3. However, when computing the correlation between variables 1 and 3, we need only discard observation 2. Similarly, the correlation between variables 2 and 3 is computed by discarding observation 3. Hence, the correlations may be based on different numbers of observations. An observation is thrown out if a data value from that observation is missing from one of the two variables for which the correlation is being computed.

Tukey's Middlemeans

Midmean: The midmean is the sum of all observations between (and including, if applicable) the 25th and 75th percentiles divided by the number of observations between those two percentiles. That is, it is the mean of all observations between the 25th and 75th percentiles.

Trimean: The trimean is a weighted average of the median and the 25th and 75th percentiles: (1/4) (25th percentile + 2(median) + 75th percentile).

Midspread: The midspread is the difference between the 75th and 25th percentiles: 75th percentile - 25th percentile.
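
The treatment of missing values in the Correlations paragraph above (an observation is dropped only for the pair of variables in which it is missing) can be sketched in Python. This is an illustration under stated assumptions, not the library's code: None stands in for the missing value code, the function name is hypothetical, and the data are the three variables of the example matrix as read from this copy of the manual.

```python
import math
from typing import List, Optional, Tuple

MISSING = None  # stand-in for the library's missing value code (assumption)


def pairwise_correlation(x: List[Optional[float]],
                         y: List[Optional[float]]) -> Tuple[Optional[float], int]:
    """Pearson correlation of x and y over observations present in both.

    Returns (correlation, number of observations used). The correlation is
    None when fewer than two usable observations remain or a variance is zero.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not MISSING and b is not MISSING]
    n = len(pairs)
    if n < 2:
        return None, n
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None, n
    return sxy / math.sqrt(sxx * syy), n


var1 = [5, MISSING, 3, 4, 5]
var2 = [6, 7, MISSING, 6, 4]
var3 = [1, 3, 2, 1, 1]

r12, n12 = pairwise_correlation(var1, var2)  # observations 2 and 3 discarded
r13, n13 = pairwise_correlation(var1, var3)  # observation 2 discarded
r23, n23 = pairwise_correlation(var2, var3)  # observation 3 discarded
print(n12, n13, n23)  # 3 4 4
```

The three correlations are based on 3, 4 and 4 observations respectively, which is why entries of the same correlation matrix may rest on different sample sizes.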
Go To Advanced Stat

Objective

This procedure loads a file which prompts you to remove the BSDM program medium and insert the desired advanced statistics program medium into the mass storage device. You press CONTINUE after you have made this change. The new routines are then prepared to carry on further analyses on the data set in memory.

Return To BSDM

Objective

This procedure operates in the reverse of "Go To Advanced Stat" and should be used when you wish to return to the BSDM routines from an advanced statistics routine.

Backup

Objective

This routine allows you to transfer the original data which is stored in the file called "BACKUP" to the program scratch file called "DATA". You might find this useful in a case where the data currently in the "DATA" file is not the data you wish to be analyzing. This could occur, for example, if you inadvertently stored a transformed variable in place of one of your original variables. Note that no operations, including editing, are performed on the data stored on the "BACKUP" file.

This routine also allows you to transfer the data set in the opposite direction. That is, you may transfer the data stored in "DATA" to the "BACKUP" file. You might choose to do this after you have edited the original data set but before you perform any other operations. Then, the "BACKUP" file would contain the corrected original data without any further manipulations or modifications.

Examples

Example 1

This is a hypothetical set of data from a non-existent factory. The purpose of this example is to show the use, in part, of the LIST, EDIT, TRANSFORM, SORT, SUBFILE, and STATS routines.

BASIC STATISTICS AND DATA MANIPULATION
[Answer all yes/no questions with Y/N]
Are you going to user defined transformation or do Non-linear regression ? (Y/N)
N
Are you using an HP-IB Printer?
YES
Enter select code, bus address (if 7,1 press CONT)      We input these values separated by a comma or press CONTINUE if default (7,1) is correct

*************************
*   DATA MANIPULATION   *
*************************

Enter DATA TYPE:
1                                                       Raw data
Mode number = ?
                                                        The data will be entered by typing it in on the keyboard.
Project title for this data set (<= 80 characters)
HYPOTHETICAL FACTORY DATA                               Title
Number of variables = ?
5                                                       Nv
Number of observations/variable = ?
17                                                      No
Variable # 1 name (<= 10 characters) = ?
TEMP(C)                                                 Label for variable 1
Variable # 2 name (<= 10 characters) = ?
PRODUCTION                                              Label for variable 2
Variable # 3 name (<= 10 characters) = ?
DAYS                                                    Label for variable 3
Variable # 4 name (<= 10 characters) = ?
PAYROLL                                                 Label for variable 4
Variable # 5 name (<= 10 characters) = ?
WATER USE                                               Label for variable 5
Is above information correct?
YES                                                     Approve information on CRT (shown below).

HYPOTHETICAL FACTORY DATA
Data file name:
Data type is: Raw data
Number of observations: 17
Number of variables: 5
Variable names:
  1. TEMP(C)
  2. PRODUCTION
  3. DAYS
  4. PAYROLL
  5. WATER USE

Do you want to enter data one case at a time, i.e., by observation?
YES                                                     All variables will be entered separately by commas.
Observation # 1 , all variables (separated by commas) = ?
14.9,6396,21,134,3373
Observation # 2 , all variables (separated by commas) = ?
18.4,5736,22,146,3110
Observation # 3 , all variables (separated by commas) = ?
21.6,6116,22,158,3180
Observation # 4 , all variables (separated by commas) = ?
25.2,8287,20,171,3293
Observation # 5 , all variables (separated by commas) = ?
26.3,13313,25,198,3390
Observation # 6 , all variables (separated by commas) = ?
27.2,13108,23,194,4287
Observation # 7 , all variables (separated by commas) = ?
22.2,10768,20,180,3852
Observation # 8 , all variables (separated by commas) = ?
17.1,12173,23,191,3366
Observation # 9 , all variables (separated by commas) = ?
12.5,11390,20,195,3532
Observation # 10 , all variables (separated by commas) = ?
6.9,12707,20,192,3614
Observation # 11 , all variables (separated by commas) = ?
6.4,15022,22,200,3896
Observation # 12 , all variables (separated by commas) = ?
13.3,13114,19,211,3437
Observation # 13 , all variables (separated by commas) = ?
18.2,12257,22,203,3324
Observation # 14 , all variables (separated by commas) = ?
22.8,13118,22,197,3214
Observation # 15 , all variables (separated by commas) = ?
26.1,13100,21,196,4345
Observation # 16 , all variables (separated by commas) = ?
26.3,16716,21,205,4936
Observation # 17 , all variables (separated by commas) = ?
4.2,14056,22,205,3624
PROGRAM NOW STORING DATA ON SCRATCH DATA FILE AND BACKUP FILE
SELECT ANY KEY                                          Select Special Function Key-LIST
LIST
Option number = ?
1                                                       List all the data
Enter method for listing data:
3                                                       In tabular form

HYPOTHETICAL FACTORY DATA
Data type is: Raw data

        Variable # 1   Variable # 2   Variable # 3   Variable # 4   Variable # 5
        (TEMP(C)   )   (PRODUCTION)   (DAYS      )   (PAYROLL   )   (WATER USE )
OBS#
   1      14.90000      6396.00000      21.00000      134.00000     3373.00000
   2      18.40000      5736.00000      22.00000      146.00000     3110.00000
   3      21.60000      6116.00000      22.00000      158.00000     3180.00000
   4      25.20000      8287.00000      20.00000      171.00000     3293.00000
   5      26.30000     13313.00000      25.00000      198.00000     3390.00000
   6      27.20000     13108.00000      23.00000      194.00000     4287.00000
   7      22.20000     10768.00000      20.00000      180.00000     3852.00000
   8      17.10000     12173.00000      23.00000      191.00000     3366.00000
   9      12.50000     11390.00000      20.00000      195.00000     3532.00000
  10       6.90000     12707.00000      20.00000      192.00000     3614.00000
  11       6.40000     15022.00000      22.00000      200.00000     3896.00000
  12      13.30000     13114.00000      19.00000      211.00000     3437.00000
  13      18.20000     12257.00000      22.00000      203.00000     3324.00000
  14      22.80000     13118.00000      22.00000      197.00000     3214.00000
  15      26.10000     13100.00000      21.00000      196.00000     4345.00000
  16      26.30000     16716.00000      21.00000      205.00000     4936.00000
  17       4.20000     14056.00000      22.00000      205.00000     3624.00000

Option number = ?
                                                        Exit List routine
SELECT ANY KEY                                          Select Special Function Key-EDIT

EDIT ROUTINES
Select option desired
1                                                       Choose to correct a data value.
Observation number (enter 'NONE' when done) = ?
11                                                      At observation #11
Variable number = ?
2                                                       For variable 2
Old value = 15022 -- Correct value = ?
15024                                                   Should be 15024

OBS   VAR       OLD            NEW
 #     #       VALUE          VALUE
 11    2    15022.00000    15024.00000

Observation number (enter 'NONE' when done) = ?
NONE
Select option desired :
                                                        Delete an observation
Which observations are to be deleted ?
10
Observation # 10 deleted, 16 observations remain.
Select option desired :
                                                        Add an observation
Are observations ordered, i.e., should additions be inserted ?
NO                                                      Add at the end
How many observations are to be added ?
1
Enter observation # 17 (variables separated by commas) = ?
4.2,12707,20,192,3614                                   New observation #17
Observation # 17 Variable # 1 = 4.2
Observation # 17 Variable # 2 = 12707
Observation # 17 Variable # 3 = 20
Observation # 17 Variable # 4 = 192
Observation # 17 Variable # 5 = 3614
Total number of observations now = 17
Select option desired :
                                                        Exit Edit routines
PROGRAM NOW UPDATING SCRATCH DATA FILE
SELECT ANY KEY                                          Select Special Function Key-LIST
LIST
Option number = ?
1                                                       List all the data
Enter method for listing data:
3                                                       In tabular form

HYPOTHETICAL FACTORY DATA
Data type is: Raw data

        Variable # 1   Variable # 2   Variable # 3   Variable # 4   Variable # 5
        (TEMP(C)   )   (PRODUCTION)   (DAYS      )   (PAYROLL   )   (WATER USE )
OBS#
   1      14.90000      6396.00000      21.00000      134.00000     3373.00000
   2      18.40000      5736.00000      22.00000      146.00000     3110.00000
   3      21.60000      6116.00000      22.00000      158.00000     3180.00000
   4      25.20000      8287.00000      20.00000      171.00000     3293.00000
   5      26.30000     13313.00000      25.00000      198.00000     3390.00000
   6      27.20000     13108.00000      23.00000      194.00000     4287.00000
   7      22.20000     10768.00000      20.00000      180.00000     3852.00000
   8      17.10000     12173.00000      23.00000      191.00000     3366.00000
   9      12.50000     11390.00000      20.00000      195.00000     3532.00000
  10       6.40000     15024.00000      22.00000      200.00000     3896.00000
  11      13.30000     13114.00000      19.00000      211.00000     3437.00000
  12      18.20000     12257.00000      22.00000      203.00000     3324.00000
  13      22.80000     13118.00000      22.00000      197.00000     3214.00000
  14      26.10000     13100.00000      21.00000      196.00000     4345.00000
  15      26.30000     16716.00000      21.00000      205.00000     4936.00000
  16       4.20000     14056.00000      22.00000      205.00000     3624.00000
  17       4.20000     12707.00000      20.00000      192.00000     3614.00000

Option number = ?
                                                        Exit List routine
SELECT ANY KEY                                          Select Special Function Key labeled-STORE
Enter option number desired
                                                        Store all the data
Name of data file = ?
HYPO:INTERNAL                                           On this file on our floppy
Is data medium placed in device INTERNAL ?
YES
PROGRAM NOW STORING DATA ON HYPO:INTERNAL
Is program medium replaced in device?
YES
Enter option number desired :
                                                        Exit Store routine
SELECT ANY KEY                                          Select Special Function Key labeled-TRANSFORM

TRANSFORMATION ROUTINES
Select option desired :
                                                        Algebraic transformations
Transformation number = ?
                                                        a*(X^b)+c
Variable number corresponding to X = ?
5
Parameter a = ?
.2642
Parameter b = ?
1
Parameter c = ?
0                                                       To convert liters to gallons X6 = .2642 X5
Store transformed data in Variable # ( <= 6 ) ?
6
Variable name (<= 10 characters) = ?
GALLONS                                                 X6 now called GALLONS.
Is above information correct?
YES
press 'CONTINUE' when ready

The following transformation was performed: a*(X^b)+c
  where a = .2642
        b = 1
        c = 0
  X is Variable # 5
  Transformed data is stored in Variable # 6 (GALLONS)

Select option desired :
                                                        Exit transformation routine
PROGRAM NOW UPDATING SCRATCH DATA FILE
SELECT ANY KEY                                          Select Special Function Key labeled-SORT

SORT ROUTINES
ENTER OPTION NUMBER DESIRED
                                                        Sort in ascending order
Number of the Variable on which to sort
3                                                       On variable 3 (Days in month)
Data set: HYPOTHETICAL FACTORY DATA
has been arranged in ascending order according to Variable # 3
ENTER OPTION NUMBER DESIRED
                                                        Exit sort routine
PROGRAM NOW UPDATING SCRATCH DATA FILE
SELECT ANY KEY                                          Select Special Function Key labeled-LIST
LIST
Option number = ?
1                                                       List all the data
Enter method for listing data:
3                                                       In tabular form

HYPOTHETICAL FACTORY DATA
Data type is: Raw data

        Variable # 1   Variable # 2   Variable # 3   Variable # 4   Variable # 5
        (TEMP(C)   )   (PRODUCTION)   (DAYS      )   (PAYROLL   )   (WATER USE )
OBS#
   1      13.30000     13114.00000      19.00000      211.00000     3437.00000
   2      22.20000     10768.00000      20.00000      180.00000     3852.00000
   3      25.20000      8287.00000      20.00000      171.00000     3293.00000
   4       4.20000     12707.00000      20.00000      192.00000     3614.00000
   5      12.50000     11390.00000      20.00000      195.00000     3532.00000
   6      26.30000     16716.00000      21.00000      205.00000     4936.00000
   7      26.10000     13100.00000      21.00000      196.00000     4345.00000
   8      14.90000      6396.00000      21.00000      134.00000     3373.00000
   9       6.40000     15024.00000      22.00000      200.00000     3896.00000
  10      21.60000      6116.00000      22.00000      158.00000     3180.00000
  11      18.20000     12257.00000      22.00000      203.00000     3324.00000
  12      22.80000     13118.00000      22.00000      197.00000     3214.00000
  13      18.40000      5736.00000      22.00000      146.00000     3110.00000
  14       4.20000     14056.00000      22.00000      205.00000     3624.00000
  15      27.20000     13108.00000      23.00000      194.00000     4287.00000
  16      17.10000     12173.00000      23.00000      191.00000     3366.00000
  17      26.30000     13313.00000      25.00000      198.00000     3390.00000

        Variable # 6
        (GALLONS   )
OBS#
   1     908.05540
   2    1017.69840
   3     870.01060
   4     954.81880
   5     933.15440
   6    1304.09120
   7    1147.94900
   8     891.14660
   9    1029.32320
  10     840.15600
  11     878.20080
  12     849.13880
  13     821.66200
  14     957.46080
  15    1132.62540
  16     889.29720
  17     895.63800

Option number = ?
                                                        Exit list routine
SELECT ANY KEY                                          Select Special Function Key labeled-SUBFILES
SUBFILE
Option number = ?
                                                        Select method of subfile specifications which ask you to enter the first observation in each subfile.
Number of subfiles ( <= 20 ) = ?
2
Name of Subfile # 1 : ( <= 10 characters ) ?
FY '76
Name of Subfile # 2 : ( <= 10 characters ) ?
FY '77
Subfile # 2 : number of first observation ?
13
Is the above information correct ?
YES

Subfile name:    beginning observation    number of observations      Summary
1. FY '76                 1                         12
2. FY '77                13                          5

Option number = ?
                                                        Exit subfile routine
PROGRAM NOW STORING DATA
SELECT ANY KEY                                          Select Special Function Key labeled-STATS

BASIC STATISTICS ROUTINES
What statistic options are desired ?
1                                                       Mean, CI, Variance, Standard Deviation, Skewness, Kurtosis
VARIABLES = ?
ALL                                                     Compute statistics for all variables
Confidence coefficient for confidence interval on the mean (e.g., 90,95,99%) = ?
95
Option number = ?
                                                        Compute statistics for selected subfiles.
What subfiles are desired ?
1                                                       For FY '76

*******************************
*     SUMMARY STATISTICS      *
*        ON DATA SET:         *
*  HYPOTHETICAL FACTORY DATA  *
*******************************

Subfile: FY '76

BASIC STATISTICS

VARIABLE      # OF   # OF
NAME          OBS.   MISS        SUM           MEAN          VARIANCE      STD. DEV.
TEMP(C)        12      0       213.7000       17.8083         56.9572        7.5470
PRODUCTION     12      0    138993.000     11582.750     10478676.7500    3237.0784
DAYS           12      0       250.0000       20.8333          1.0606        1.0299
PAYROLL        12      0      2242.0000      186.8333        504.5152       22.4614
WATER USE      12      0     43996.0000     3666.3333     274270.7879      523.7087
GALLONS        12      0     11623.7432      968.6453      19144.5508      138.3638

VARIABLE      COEFFICIENT     STD. ERROR      95 % CONFIDENCE INTERVAL
NAME          OF VARIATION    OF MEAN         LOWER LIMIT    UPPER LIMIT
TEMP(C)         42.37903        2.17863         13.01195       22.60471
PRODUCTION      27.94741      934.46405       9525.47409    13640.02591
DAYS             4.94332         .29729         20.17882       21.48784
PAYROLL         12.02217        6.48405        172.55832      201.10834
WATER USE       14.28426      151.18168       3333.49825     3999.16841
GALLONS         14.28426       39.94220        880.71024     1056.58030

VARIABLE      SKEWNESS      KURTOSIS
TEMP(C)        -.53473       -.96332
PRODUCTION     -.42217       -.66250
DAYS           -.18352      -1.18041
PAYROLL       -1.22848        .55306
WATER USE      1.34739        .89749
GALLONS        1.34739        .89749

What statistic options are desired ?
1                                                       Mean, CI, Variance, Standard Deviation, Skewness, Kurtosis
VARIABLES = ?
ALL                                                     Compute statistics for all variables
Confidence coefficient for confidence interval on the mean (e.g., 90,95,99%) = ?
95
Option number = ?
                                                        Compute statistics for selected subfiles.
What subfiles are desired ?
2                                                       For subfile FY '77

*******************************
*     SUMMARY STATISTICS      *
*        ON DATA SET:         *
*  HYPOTHETICAL FACTORY DATA  *
*******************************

Subfile: FY '77

BASIC STATISTICS

VARIABLE      # OF   # OF
NAME          OBS.   MISS        SUM           MEAN          VARIANCE      STD. DEV.
TEMP(C)         5      0        93.2000       18.6400         85.7230        9.2587
PRODUCTION      5      0     58386.000     11677.200     11481348.7000    3388.4139
DAYS            5      0       115.0000       23.0000          1.5000        1.2247
PAYROLL         5      0       934.0000      186.8000        547.7000       23.4030
WATER USE       5      0     17777.0000     3555.4000     200388.8000      447.6481
GALLONS         5      0      4696.6834      939.3367      13987.4669      118.2686

VARIABLE      COEFFICIENT
NAME          OF VARIATION
TEMP(C)         49.67099
PRODUCTION      29.01735
DAYS             5.32498
PAYROLL         12.52837
WATER USE       12.59065
GALLONS         12.59065

STD. ERROR OF MEAN   95 %
CONFIDENCE INTERVAL LOWER LIMIT UPPER LIMIT 4 . 14060 7 .14334 30 . 13666 15 .34476 7469 .74622 15884 . 65378 . 54772 21. .47921 24 .52079 10 .46614 157 .740 09 215 .85991 .19431 2999 .54742 4111 . 25258 52 .89134 792 .480 43 1086 .19293 VARIABLE TEMP(C> PRODUCTION DAYS PAYROLL WATER USE GALLONS NESS KURTOSIS - .68247 -.77608 1 . 35662 .05662 .91287 -.50000 1 .30917 .02054 .91055 - . 44827 .91055 -- . 44827 What statistic opt. tons ums cles.irad '•> VARIABLES" ? ALL Option n u fiber ::: v Correlation matrix Compute statistics for all variables What s i) b f i 1 e ;?: a r e. cl t> sir- e cl f i ,2 Compute statistics on selected subfiles. 34 ,1s u. W/ vju s^ J/ \i/ ^ \ii* si/ "J/ ■*!/ si/ si/ sV \1/ si/ sJ/ si/sl/sii'si/sj/^sL'st ^ ^ sk si/ si/ si/ *f/ yi/ 4* ^ si/ si/ si/ s^ W ^ si/ ^ ^1/ i" >fr \V "^ si/ si/ si/ J/ ^ ^ ^ >t ^ ^ 4 >t ^ ^\t ^ ^ W ^t \V 'J/ 4>k^4/ st' sV sV \l/ "A* \iy ?fi /fi Jfi sp. /f. /J-. Sf- 'p-^'p'r- TN'iV'p.^.i'n^'p-'r^^T^'r. ^ ^s /|s fl\ iy. ^ /p. i^^^^Jls^^^^^^^^^^^^.x^^^^^^^^^^^^^^^^^^^^^^^^^^^^*r , 'Ts'i>'r' * SUMMARY STATISTICS * * ON DATA SET ■■ * * HYPOTHETICAL FACTORY DATA * \i/ \t/ sL- si/ si/ U/ si/ \i/ si/ ^ si/ si/ si/ ^/ si/ \L- J/ \1/ \1/ ^ \1/ *!/ vt' si/ sVsi/sl/si/st/sl/sl/^/%Lr \L> ^/ si/ \i/ \l/ si/ si/ si/ s^ slf *si/ W ^/ ^ sL" \^ s^ *!/ \^ si/ ^J/ sir ^ s^ si/ s^ s^ \t st st \i/ st \V "J/ si/ "s^ s^ si/ si/ si/ ^ si/ sir -si/ si/ \^ \U Subfile: FY "76 CORRELATION MATRIX TEMP < C ) PRODUCTION DAYS PAYROLL WATER USE PRODUCTION -. ill. 3482 DAYS 1627763 081945 PAYROLL WATER USE 1007200 .8872541 .1113502 2511888 6589095 368011 3820119 GALLONS .2511888 .6589095 0368011 .3820119 .0000000 > u b f i 1 e •Y'77 CORRELATION MATRIX TEMP(C) PRODUCTION DAYS PAYROLL WATER USE PRODUCTION ■0709995 DAYS .6614042 ,4116924 PAYROLL WATER USE .1292917 .9974909 . 3924963 2656162 5754985 209757 5259584 GALLONS .2656162 .5754985 .0209757 .5259584 .0000000 What statistic options are desired ? 
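The transformation and summary statistics printed above for subfile FY '76 can be checked by hand. The following is a minimal Python sketch, not the library's own code: the function names are hypothetical, the Student t critical value is taken from a standard table rather than computed, and the program evidently uses its own approximation for t, so the confidence limits agree only to about two decimal places.

```python
import math

# First 12 observations of the listing sorted on DAYS (subfile FY '76).
TEMP_C = [13.3, 22.2, 25.2, 4.2, 12.5, 26.3, 26.1, 14.9, 6.4, 21.6, 18.2, 22.8]
WATER_USE = [3437, 3852, 3293, 3614, 3532, 4936, 4345, 3373, 3896, 3180, 3324, 3214]

# Two-sided 95% Student t critical value for 11 degrees of freedom,
# copied from a standard table (an assumption, not the program's value).
T_95_DF11 = 2.201


def algebraic_transform(values, a, b, c):
    """Apply a*(X^b)+c to every value (hypothetical helper name)."""
    return [a * (x ** b) + c for x in values]


def summary(values, t_crit):
    """Return the quantities shown in the BASIC STATISTICS tables."""
    n = len(values)
    total = sum(values)
    mean = total / n
    # Sample variance with n-1 in the denominator.
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    std_dev = math.sqrt(variance)
    std_err = std_dev / math.sqrt(n)
    return {
        "n": n,
        "sum": total,
        "mean": mean,
        "variance": variance,
        "std_dev": std_dev,
        "coeff_var": 100.0 * std_dev / mean,
        "std_err": std_err,
        "ci_lower": mean - t_crit * std_err,
        "ci_upper": mean + t_crit * std_err,
    }


temp_stats = summary(TEMP_C, T_95_DF11)
gallons = algebraic_transform(WATER_USE, 0.2642, 1, 0)  # liters to gallons
gallon_stats = summary(gallons, T_95_DF11)

for key in ("sum", "mean", "variance", "std_dev", "coeff_var", "std_err"):
    print(f"{key:10s} {temp_stats[key]:12.5f} {gallon_stats[key]:14.5f}")
```

Because the coefficient of variation, skewness, and kurtosis are unchanged by a pure rescaling, WATER USE and GALLONS share those values in the printout.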
3                                           Median, Mode, Percentiles, Min, Max, Range
VARIABLES = ?
ALL                                         Compute statistics for all variables
Option number = ?                           Compute statistics for selected subfiles
What subfiles are desired ?
1,2                                         Both subfiles

************************************
*        SUMMARY STATISTICS        *
*          ON DATA SET:            *
*    HYPOTHETICAL FACTORY DATA     *
************************************

Subfile: FY '76

                              ORDER STATISTICS

VARIABLE          MAXIMUM        MINIMUM          RANGE       MIDRANGE
TEMP(C)          26.30000        4.20000       22.10000       15.25000
PRODUCTION    16716.00000     6116.00000    10600.00000    11416.00000
DAYS             22.00000       19.00000        3.00000       20.50000
PAYROLL         211.00000      134.00000       77.00000      172.50000
WATER USE      4936.00000     3180.00000     1756.00000     4058.00000
GALLONS        1304.09120      840.15600      463.93520     1072.12360

                              TUKEY'S HINGES

VARIABLE           MEDIAN    25-th %-ile    75-th %-ile
TEMP(C)          19.90000       12.90000       24.00000
PRODUCTION    12482.00000     9527.50000    13116.00000
DAYS             21.00000       20.00000       22.00000
PAYROLL         195.50000      175.50000      201.50000
WATER USE      3484.50000     3308.50000     3874.00000
GALLONS         920.60490      874.10570     1023.51080

                              TUKEY'S MIDDLEMEANS

VARIABLE          MIDMEAN        TRIMEAN      MIDSPREAD
TEMP(C)          18.83333       19.17500       11.10000
PRODUCTION    12222.66667    11901.87500     3588.50000
DAYS             20.83333       21.00000        2.00000
PAYROLL         193.33333      192.00000       26.00000
WATER USE      3522.00000     3537.87500      565.50000
GALLONS         930.51240      934.70658      149.40510

Other percentiles (Y/N)?
NO

Subfile: FY '77

                              ORDER STATISTICS

VARIABLE          MAXIMUM        MINIMUM          RANGE       MIDRANGE
TEMP(C)          27.20000        4.20000       23.00000       15.70000
PRODUCTION    14056.00000     5736.00000     8320.00000     9896.00000
DAYS             25.00000       22.00000        3.00000       23.50000
PAYROLL         205.00000      146.00000       59.00000      175.50000
WATER USE      4287.00000     3110.00000     1177.00000     3698.50000
GALLONS        1132.62540      821.66200      310.96340      977.14370

                              TUKEY'S HINGES

VARIABLE           MEDIAN    25-th %-ile    75-th %-ile
TEMP(C)          18.40000       17.10000       18.40000
PRODUCTION    13108.00000    12173.00000    13108.00000
DAYS             23.00000       22.00000       23.00000
PAYROLL         194.00000      191.00000      194.00000
WATER USE      3390.00000     3366.00000     3390.00000
GALLONS         895.63800      889.29720      895.63800

                              TUKEY'S MIDDLEMEANS

VARIABLE          MIDMEAN        TRIMEAN      MIDSPREAD
TEMP(C)          20.60000       18.07500        1.30000
PRODUCTION    12864.66667    12874.25000      935.00000
DAYS             22.66667       22.75000        1.00000
PAYROLL         194.33333      193.25000        3.00000
WATER USE      3460.00000     3384.00000       24.00000
GALLONS         914.13200      894.05280        6.34080

Other percentiles (Y/N)?
NO
What statistic options are desired ?
SELECT ANY KEY                              Exit basic statistics routine

Note: All Basic Statistics for these subfiles could have been obtained more efficiently than we demonstrated in this example by responding "ALL" to the above question.

Example 2

The data set is from the MINITAB STUDENT HANDBOOK by T. Ryan and B. Joiner, published by the Duxbury Press (1976). The data appear on page 279. The operations performed on the two data sets, SAMPLE A and SAMPLE B, demonstrate the following routines: JOIN, LIST, RECODE, SUBFILE (by variable), STORE, SELECT, and STATS.

BASIC STATISTICS AND DATA MANIPULATION
[Answer all yes/no questions with Y/N]

Are you going to use user defined transformation or non-linear regression ? (Y/N)
NO
Are you using an HPIB Printer ?
YES
Enter select code, bus address (if 7,1 press CONT) ?

************************************
*        DATA MANIPULATION         *
************************************

Enter DATA TYPE:
1                                           Raw data
Mode number = ?                             Data is from mass storage
Is data stored on the program's scratch file (DATA) ?
NO                                          The data was stored under the name GRADEB in a
Data file name = ?                          different place, so the program must retrieve it.
GRADEB:INTERNAL
Was data stored by the BS&DM system ?
YES
Is data medium placed in device INTERNAL ?
YES
Is program medium placed in correct device ?
YES
PROGRAM NOW STORING DATA ON SCRATCH DATA FILE AND BACKUP FILE

SAMPLE B
Data file name : GRADEB:INTERNAL            This data is the second set of 50 student grades (GPA)
Data type is: Raw data                      and scores on the ACT tests (Verb and Math). The data
Number of observations: 50                  are taken from page 279 of the Minitab Student Handbook.
Number of variables: 3
Variable names :
   1. VERB
   2. MATH
   3. GPA
Subfiles: NONE

SELECT ANY KEY                              Select Special Function Key labeled LIST
Option number = ?
1                                           List all the data.
Enter method for listing data:
3                                           In tabular form.

SAMPLE B
Data type is: Raw data

      Variable # 1   Variable # 2   Variable # 3
      (VERB)         (MATH)         (GPA)
OBS#
  1    500.00000    661.00000    2.30000
  2    460.00000    692.00000    1.40000
  3    717.00000    672.00000    2.80000
  4    592.00000    441.00000    2.40000
  5    752.00000    729.00000    3.40000
  6    695.00000    681.00000    2.50000
  7    610.00000    777.00000    3.60000
  8    620.00000    638.00000    2.60000
  9    682.00000    701.00000    3.60000
 10    524.00000    700.00000    2.90000
 11    552.00000    692.00000    2.60000
 12    703.00000    710.00000    3.80000
 13    584.00000    738.00000    3.00000
 14    550.00000    638.00000    2.50000
 15    659.00000    672.00000    3.50000
 16    585.00000    605.00000    2.00000
 17    578.00000    614.00000    3.00000
 18    533.00000    630.00000    2.00000
 19    532.00000    586.00000    1.80000
 20    708.00000    701.00000    2.30000
 21    537.00000    681.00000    2.10000
 22    635.00000    647.00000    3.00000
 23    591.00000    614.00000    3.30000
 24    552.00000    669.00000    3.00000
 25    557.00000    674.00000    3.20000
 26    599.00000    664.00000    2.30000
 27    540.00000    658.00000    3.30000
 28    752.00000    737.00000    3.30000
 29    726.00000    800.00000    3.90000
 30    630.00000    668.00000    2.10000
 31    558.00000    567.00000    2.60000
 32    646.00000    771.00000    2.40000
 33    643.00000    719.00000    3.30000
 34    606.00000    755.00000    3.10000
 35    682.00000    652.00000    3.60000
 36    565.00000    672.00000    2.90000
 37    578.00000    629.00000    2.40000
 38    488.00000    611.00000    1.80000
 39    361.00000    602.00000    2.40000
 40    560.00000    639.00000    2.90000
 41    630.00000    647.00000    3.50000
 42    666.00000    705.00000    3.40000
 43    719.00000    668.00000    2.30000
 44    669.00000    701.00000    2.90000
 45    571.00000    647.00000    1.80000
 46    520.00000    583.00000    2.80000
 47    571.00000    593.00000    2.30000
 48    539.00000    601.00000    2.50000
 49    580.00000    630.00000    2.40000
 50    629.00000    695.00000    2.90000

Option number = ?
SELECT ANY KEY                              Exit List routine.

JOIN ROUTINE                                Select Special Function Key labeled JOIN
Option number = ?
2                                           Choose to add observations.
Do you wish to continue with the JOIN procedure ?
                                            To continue you must have:
                                            1. Data Set #1 currently in memory.
                                            2. Data Set #2 previously stored by this program.
                                            3. Total observations times variables < 1500.
                                            4. Each data set must contain the same number of
                                               variables arranged in the same order.
Title for combined data set (<= 80 characters) = ?
TOTAL ACT SCORE/GPA COMPARISON DATA
File name of data set #2 = ?
GRADEA:INTERNAL                             This data set (the first set, A, in the Minitab
Is data set #2 medium placed in device INTERNAL ?     manual) was previously stored.
YES
Press 'CONTINUE' when ready to continue
Press 'CONTINUE' when ready to continue
Is program medium placed in device ?
YES

TOTAL ACT SCORE/GPA COMPARISON DATA
Number of variables: 3
Number of observations: 100
Variable names:
   1. VERB
   2. MATH
   3. GPA
Subfiles: NONE

PROGRAM NOW UPDATING SCRATCH DATA FILE
Option number = ?
SELECT ANY KEY

LIST ROUTINE
Option number = ?
1
Enter method for listing data:
3

The two data sets are combined. That is, the second 50 observations are 'attached' to the bottom of the original 50 observations.
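The JOIN step and its four prerequisites can be pictured as a row-wise append with two checks. This is only an illustrative Python sketch: the function name and the tuple representation of an observation are assumptions, and the 1500-cell limit is simply the figure quoted in the prerequisites above.

```python
MAX_CELLS = 1500  # "Total observations times variables < 1500"


def join_observations(first, second, max_cells=MAX_CELLS):
    """Append the rows of `second` below the rows of `first`.

    Both arguments are lists of equal-length tuples, one tuple per
    observation, with the variables in the same order.
    """
    if not first or not second:
        raise ValueError("both data sets must contain observations")
    n_vars = len(first[0])
    if any(len(row) != n_vars for row in first + second):
        raise ValueError("data sets must have the same number of variables")
    combined = first + second
    if len(combined) * n_vars >= max_cells:
        raise ValueError("combined data set is too large")
    return combined


# First two observations of each sample: (VERB, MATH, GPA).
sample_b = [(500, 661, 2.3), (460, 692, 1.4)]
sample_a = [(623, 509, 2.6), (454, 471, 2.3)]

total = join_observations(sample_b, sample_a)
print(len(total), total[2])
```

The check that variables appear in the same order cannot be automated from the values alone, so it remains the user's responsibility, as in the manual's list.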
                                            Exit Join routine
                                            Select Special Function Key labeled LIST
                                            List all the data
                                            In tabular form

TOTAL ACT SCORE/GPA COMPARISON DATA
Data type is: Raw data

      Variable # 1   Variable # 2   Variable # 3
      (VERB)         (MATH)         (GPA)
OBS#
  1    500.00000    661.00000    2.30000
  2    460.00000    692.00000    1.40000
  3    717.00000    672.00000    2.80000
  4    592.00000    441.00000    2.40000
  5    752.00000    729.00000    3.40000
  6    695.00000    681.00000    2.50000
  7    610.00000    777.00000    3.60000
  8    620.00000    638.00000    2.60000
  9    682.00000    701.00000    3.60000
 10    524.00000    700.00000    2.90000
 11    552.00000    692.00000    2.60000
 12    703.00000    710.00000    3.80000
 13    584.00000    738.00000    3.00000
 14    550.00000    638.00000    2.50000
 15    659.00000    672.00000    3.50000
 16    585.00000    605.00000    2.00000
 17    578.00000    614.00000    3.00000
 18    533.00000    630.00000    2.00000
 19    532.00000    586.00000    1.80000
 20    708.00000    701.00000    2.30000
 21    537.00000    681.00000    2.10000
 22    635.00000    647.00000    3.00000
 23    591.00000    614.00000    3.30000
 24    552.00000    669.00000    3.00000
 25    557.00000    674.00000    3.20000
 26    599.00000    664.00000    2.30000
 27    540.00000    658.00000    3.30000
 28    752.00000    737.00000    3.30000
 29    726.00000    800.00000    3.90000
 30    630.00000    668.00000    2.10000
 31    558.00000    567.00000    2.60000
 32    646.00000    771.00000    2.40000
 33    643.00000    719.00000    3.30000
 34    606.00000    755.00000    3.10000
 35    682.00000    652.00000    3.60000
 36    565.00000    672.00000    2.90000
 37    578.00000    629.00000    2.40000
 38    488.00000    611.00000    1.80000
 39    361.00000    602.00000    2.40000
 40    560.00000    639.00000    2.90000
 41    630.00000    647.00000    3.50000
 42    666.00000    705.00000    3.40000
 43    719.00000    668.00000    2.30000
 44    669.00000    701.00000    2.90000
 45    571.00000    647.00000    1.80000
 46    520.00000    583.00000    2.80000
 47    571.00000    593.00000    2.30000
 48    539.00000    601.00000    2.50000
 49    580.00000    630.00000    2.40000
 50    629.00000    695.00000    2.90000
 51    623.00000    509.00000    2.60000
 52    454.00000    471.00000    2.30000
 53    643.00000    700.00000    2.40000
 54    585.00000    719.00000    3.00000
 55    719.00000    710.00000    3.10000
 56    693.00000    643.00000    2.90000
 57    571.00000    665.00000    3.10000
 58    646.00000    719.00000    3.30000
 59    613.00000    693.00000    2.30000
 60    655.00000    701.00000    3.30000
 61    662.00000    614.00000    2.60000
 62    585.00000    557.00000    3.30000
 63    580.00000    611.00000    2.00000
 64    648.00000    701.00000    3.00000
 65    405.00000    611.00000    1.90000
 66    506.00000    681.00000    2.70000
 67    669.00000    653.00000    2.00000
 68    558.00000    500.00000    3.30000
 69    577.00000    635.00000    2.00000
 70    487.00000    584.00000    2.30000
 71    682.00000    629.00000    3.30000
 72    565.00000    624.00000    2.80000
 73    552.00000    665.00000    1.70000
 74    567.00000    724.00000    2.40000
 75    745.00000    746.00000    3.40000
 76    610.00000    653.00000    2.80000
 77    493.00000    605.00000    2.40000
 78    571.00000    566.00000    1.90000
 79    682.00000    724.00000    2.50000
 80    600.00000    677.00000    2.30000
 81    740.00000    729.00000    3.40000
 82    593.00000    611.00000    2.80000
 83    488.00000    683.00000    1.90000
 84    526.00000    777.00000    3.00000
 85    630.00000    605.00000    3.70000
 86    586.00000    653.00000    2.30000
 87    610.00000    674.00000    2.90000
 88    695.00000    634.00000    3.30000
 89    539.00000    601.00000    2.10000
 90    490.00000    701.00000    1.20000
 91    509.00000    547.00000    3.30000
 92    667.00000    753.00000    2.00000
 93    597.00000    652.00000    3.10000
 94    662.00000    664.00000    2.60000
 95    566.00000    664.00000    2.40000
 96    597.00000    602.00000    2.40000
 97    604.00000    557.00000    2.30000
 98    519.00000    529.00000    3.00000
 99    643.00000    715.00000    2.90000
100    606.00000    593.00000    3.40000

Option number = ?
SELECT ANY KEY                              Exit List routine

RECODE ROUTINE                              Select Special Function Key labeled RECODE
Option number = ?                           Recoding using contiguous unequal intervals is chosen.
Store recoded data in Variable # (<= 4) ?
4                                           Recoded data stored in variable 4.
Variable name (<= 10 characters) = ?
RANKS                                       Variable name or label.
Number of the variable to be recoded = ?
3                                           Recode based on variable 3 (GPA).
Number of recoding intervals to be specified (<= 20) = ?
4                                           Four intervals
Lower limit of first interval = ?
1                                           See table below for summary of recode specifications.
Upper limit of interval # 1 = ?
2
For data falling in interval 1 = [1, 2), code = ?
1
Upper limit of interval # 2 = ?
3
For data falling in interval 2 = [2, 3), code = ?
2
Upper limit of interval # 3 = ?
3.5
For data falling in interval 3 = [3, 3.5), code = ?
3
Upper limit of interval # 4 = ?
4
For data falling in interval 4 = [3.5, 4), code = ?
4
Is above information correct ?
YES

Variable # 3 is recoded into 4 categories, and the recoded values are stored in Variable # 4, where:

   CATEGORY BOUNDS        # OBS
   LOWER      UPPER       CODED      CODE
   1.000      2.000          9       1.000
   2.000      3.000         54       2.000
   3.000      3.500         29       3.000
   3.500      4.000          8       4.000

Summary: Note that the upper limit of each interval is open, not closed. That is, a value of 3.5 would be recoded as a 4.

Option number = ?
PROGRAM NOW UPDATING SCRATCH DATA FILE
SELECT ANY KEY                              Exit Recode routine.

LIST ROUTINE                                Select Special Function Key labeled LIST
Option number = ?
1                                           List all the data.
Enter method for listing data:
3                                           In tabular form.

TOTAL ACT SCORE/GPA COMPARISON DATA
Data type is: Raw data

      Variable # 1   Variable # 2   Variable # 3   Variable # 4
      (VERB)         (MATH)         (GPA)          (RANKS)
OBS#
  1    500.00000    661.00000    2.30000    2.00000
  2    460.00000    692.00000    1.40000    1.00000
  3    717.00000    672.00000    2.80000    2.00000
  4    592.00000    441.00000    2.40000    2.00000
  5    752.00000    729.00000    3.40000    3.00000
  6    695.00000    681.00000    2.50000    2.00000
  7    610.00000    777.00000    3.60000    4.00000
  8    620.00000    638.00000    2.60000    2.00000
  9    682.00000    701.00000    3.60000    4.00000
 10    524.00000    700.00000    2.90000    2.00000
 11    552.00000    692.00000    2.60000    2.00000
 12    703.00000    710.00000    3.80000    4.00000
 13    584.00000    738.00000    3.00000    3.00000
 14    550.00000    638.00000    2.50000    2.00000
 15    659.00000    672.00000    3.50000    4.00000
 16    585.00000    605.00000    2.00000    2.00000
 17    578.00000    614.00000    3.00000    3.00000
 18    533.00000    630.00000    2.00000    2.00000
 19    532.00000    586.00000    1.80000    1.00000
 20    708.00000    701.00000    2.30000    2.00000
 21    537.00000    681.00000    2.10000    2.00000
 22    635.00000    647.00000    3.00000    3.00000
 23    591.00000    614.00000    3.30000    3.00000
 24    552.00000    669.00000    3.00000    3.00000
 25    557.00000    674.00000    3.20000    3.00000
 26    599.00000    664.00000    2.30000    2.00000
 27    540.00000    658.00000    3.30000    3.00000
 28    752.00000    737.00000    3.30000    3.00000
 29    726.00000    800.00000    3.90000    4.00000
 30    630.00000    668.00000    2.10000    2.00000
 31    558.00000    567.00000    2.60000    2.00000
 32    646.00000    771.00000    2.40000    2.00000
 33    643.00000    719.00000    3.30000    3.00000
 34    606.00000    755.00000    3.10000    3.00000
 35    682.00000    652.00000    3.60000    4.00000
 36    565.00000    672.00000    2.90000    2.00000
 37    578.00000    629.00000    2.40000    2.00000
 38    488.00000    611.00000    1.80000    1.00000
 39    361.00000    602.00000    2.40000    2.00000
 40    560.00000    639.00000    2.90000    2.00000
 41    630.00000    647.00000    3.50000    4.00000
 42    666.00000    705.00000    3.40000    3.00000
 43    719.00000    668.00000    2.30000    2.00000
 44    669.00000    701.00000    2.90000    2.00000
 45    571.00000    647.00000    1.80000    1.00000
 46    520.00000    583.00000    2.80000    2.00000
 47    571.00000    593.00000    2.30000    2.00000
 48    539.00000    601.00000    2.50000    2.00000
 49    580.00000    630.00000    2.40000    2.00000
 50    629.00000    695.00000    2.90000    2.00000
 51    623.00000    509.00000    2.60000    2.00000
 52    454.00000    471.00000    2.30000    2.00000
 53    643.00000    700.00000    2.40000    2.00000
 54    585.00000    719.00000    3.00000    3.00000
 55    719.00000    710.00000    3.10000    3.00000
 56    693.00000    643.00000    2.90000    2.00000
 57    571.00000    665.00000    3.10000    3.00000
 58    646.00000    719.00000    3.30000    3.00000
 59    613.00000    693.00000    2.30000    2.00000
 60    655.00000    701.00000    3.30000    3.00000
 61    662.00000    614.00000    2.60000    2.00000
 62    585.00000    557.00000    3.30000    3.00000
 63    580.00000    611.00000    2.00000    2.00000
 64    648.00000    701.00000    3.00000    3.00000
 65    405.00000    611.00000    1.90000    1.00000
 66    506.00000    681.00000    2.70000    2.00000
 67    669.00000    653.00000    2.00000    2.00000
 68    558.00000    500.00000    3.30000    3.00000
 69    577.00000    635.00000    2.00000    2.00000
 70    487.00000    584.00000    2.30000    2.00000
 71    682.00000    629.00000    3.30000    3.00000
 72    565.00000    624.00000    2.80000    2.00000
 73    552.00000    665.00000    1.70000    1.00000
 74    567.00000    724.00000    2.40000    2.00000
 75    745.00000    746.00000    3.40000    3.00000
 76    610.00000    653.00000    2.80000    2.00000
 77    493.00000    605.00000    2.40000    2.00000
 78    571.00000    566.00000    1.90000    1.00000
 79    682.00000    724.00000    2.50000    2.00000
 80    600.00000    677.00000    2.30000    2.00000
 81    740.00000    729.00000    3.40000    3.00000
 82    593.00000    611.00000    2.80000    2.00000
 83    488.00000    683.00000    1.90000    1.00000
 84    526.00000    777.00000    3.00000    3.00000
 85    630.00000    605.00000    3.70000    4.00000
 86    586.00000    653.00000    2.30000    2.00000
 87    610.00000    674.00000    2.90000    2.00000
 88    695.00000    634.00000    3.30000    3.00000
 89    539.00000    601.00000    2.10000    2.00000
 90    490.00000    701.00000    1.20000    1.00000
 91    509.00000    547.00000    3.30000    3.00000
 92    667.00000    753.00000    2.00000    2.00000
 93    597.00000    652.00000    3.10000    3.00000
 94    662.00000    664.00000    2.60000    2.00000
 95    566.00000    664.00000    2.40000    2.00000
 96    597.00000    602.00000    2.40000    2.00000
 97    604.00000    557.00000    2.30000    2.00000
 98    519.00000    529.00000    3.00000    3.00000
 99    643.00000    715.00000    2.90000    2.00000
100    606.00000    593.00000    3.40000    3.00000

Option number = ?
SELECT ANY KEY                              Exit List routine

SUBFILE ROUTINES                            Select Special Function Key labeled SUBFILES
Option number = ?
3                                           Choose to create subfiles by values of a variable.
Which variable should be used to create the subfiles ?
4                                           Enter variable no. to be used in creating subfiles.
Criterion value = 1
Enter name for subfile 1 (<= 10 characters) ?
POOR
Criterion value = 2
Enter name for subfile 2 (<= 10 characters) ?
AVERAGE
Criterion value = 3
Enter name for subfile 3 (<= 10 characters) ?
GOOD
Criterion value = 4
Enter name for subfile 4 (<= 10 characters) ?
EXCELLENT
Is the above information correct ?
YES

Subfile name    beginning observation    number of observations
1. POOR                   1                         9
2. AVERAGE               10                        54
3. GOOD                  64                        29
4. EXCELLENT             93                         8

Option number = ?
PROGRAM NOW STORING DATA
SELECT ANY KEY                              Exit Subfile routine

LIST ROUTINE                                Select Special Function Key labeled LIST
Option number = ?
1                                           List all the data
Enter method for listing data
3                                           In tabular form

TOTAL ACT SCORE/GPA COMPARISON DATA         Data is again listed, arranged on the basis ...
Data type is: Raw data

      Variable # 1   Variable # 2   Variable # 3   Variable # 4
      (VERB)         (MATH)         (GPA)          (RANKS)
OBS#
  1    460.00000    692.00000    1.40000    1.00000
  2    532.00000    586.00000    1.80000    1.00000
  3    488.00000    611.00000    1.80000    1.00000
  4    571.00000    647.00000    1.80000    1.00000
  5    405.00000    611.00000    1.90000    1.00000
  6    552.00000    665.00000    1.70000    1.00000
  7    571.00000    566.00000    1.90000    1.00000
  8    488.00000    683.00000    1.90000    1.00000
  9    490.00000    701.00000    1.20000    1.00000
 10    500.00000    661.00000    2.30000    2.00000
 11    717.00000    672.00000    2.80000    2.00000
 12    592.00000    441.00000    2.40000    2.00000
 13    695.00000    681.00000    2.50000    2.00000
 14    620.00000    638.00000    2.60000    2.00000
 15    524.00000    700.00000    2.90000    2.00000
 16    552.00000    692.00000    2.60000    2.00000
 17    550.00000    638.00000    2.50000    2.00000
 18    585.00000    605.00000    2.00000    2.00000
 19    533.00000    630.00000    2.00000    2.00000
 20    708.00000    701.00000    2.30000    2.00000
 21    537.00000    681.00000    2.10000    2.00000
 22    599.00000    664.00000    2.30000    2.00000
 23    630.00000    668.00000    2.10000    2.00000
 24    558.00000    567.00000    2.60000    2.00000
 25    646.00000    771.00000    2.40000    2.00000
 26    565.00000    672.00000    2.90000    2.00000
 27    578.00000    629.00000    2.40000    2.00000
 28    361.00000    602.00000    2.40000    2.00000
 29    560.00000    639.00000    2.90000    2.00000
 30    719.00000    668.00000    2.30000    2.00000
 31    669.00000    701.00000    2.90000    2.00000
 32    520.00000    583.00000    2.80000    2.00000
 33    571.00000    593.00000    2.30000    2.00000
 34    539.00000    601.00000    2.50000    2.00000
 35    580.00000    630.00000    2.40000    2.00000
 36    629.00000    695.00000    2.90000    2.00000
 37    623.00000    509.00000    2.60000    2.00000
 38    454.00000    471.00000    2.30000    2.00000
 39    643.00000    700.00000    2.40000    2.00000
 40    693.00000    643.00000    2.90000    2.00000
 41    613.00000    693.00000    2.30000    2.00000
 42    662.00000    614.00000    2.60000    2.00000
 43    580.00000    611.00000    2.00000    2.00000
 44    506.00000    681.00000    2.70000    2.00000
 45    669.00000    653.00000    2.00000    2.00000
 46    577.00000    635.00000    2.00000    2.00000
 47    487.00000    584.00000    2.30000    2.00000
 48    565.00000    624.00000    2.80000    2.00000
 49    567.00000    724.00000    2.40000    2.00000
 50    610.00000    653.00000    2.80000    2.00000
 51    493.00000    605.00000    2.40000    2.00000
 52    682.00000    724.00000    2.50000    2.00000
 53    600.00000    677.00000    2.30000    2.00000
 54    593.00000    611.00000    2.80000    2.00000
 55    586.00000    653.00000    2.30000    2.00000
 56    610.00000    674.00000    2.90000    2.00000
 57    539.00000    601.00000    2.10000    2.00000
 58    667.00000    753.00000    2.00000    2.00000
 59    662.00000    664.00000    2.60000    2.00000
 60    566.00000    664.00000    2.40000    2.00000
 61    597.00000    602.00000    2.40000    2.00000
 62    604.00000    557.00000    2.30000    2.00000
 63    643.00000    715.00000    2.90000    2.00000
 64    752.00000    729.00000    3.40000    3.00000
 65    584.00000    738.00000    3.00000    3.00000
 66    578.00000    614.00000    3.00000    3.00000
 67    635.00000    647.00000    3.00000    3.00000
 68    591.00000    614.00000    3.30000    3.00000
 69    552.00000    669.00000    3.00000    3.00000
 70    557.00000    674.00000    3.20000    3.00000
 71    540.00000    658.00000    3.30000    3.00000
 72    752.00000    737.00000    3.30000    3.00000
 73    643.00000    719.00000    3.30000    3.00000
 74    606.00000    755.00000    3.10000    3.00000
 75    666.00000    705.00000    3.40000    3.00000
 76    585.00000    719.00000    3.00000    3.00000
 77    719.00000    710.00000    3.10000    3.00000
 78    571.00000    665.00000    3.10000    3.00000
 79    646.00000    719.00000    3.30000    3.00000
 80    655.00000    701.00000    3.30000    3.00000
 81    585.00000    557.00000    3.30000    3.00000
 82    648.00000    701.00000    3.00000    3.00000
 83    558.00000    500.00000    3.30000    3.00000
 84    682.00000    629.00000    3.30000    3.00000
 85    745.00000    746.00000    3.40000    3.00000
 86    740.00000    729.00000    3.40000    3.00000
 87    526.00000    777.00000    3.00000    3.00000
 88    695.00000    634.00000    3.30000    3.00000
 89    509.00000    547.00000    3.30000    3.00000
 90    597.00000    652.00000    3.10000    3.00000
 91    519.00000    529.00000    3.00000    3.00000
 92    606.00000    593.00000    3.40000    3.00000
 93    610.00000    777.00000    3.60000    4.00000
 94    682.00000    701.00000    3.60000    4.00000
 95    703.00000    710.00000    3.80000    4.00000
 96    659.00000    672.00000    3.50000    4.00000
 97    726.00000    800.00000    3.90000    4.00000
 98    682.00000    652.00000    3.60000    4.00000
 99    630.00000    647.00000    3.50000    4.00000
100    630.00000    605.00000    3.70000    4.00000

Option number = ?
SELECT ANY KEY                              Exit List routine

STORE ROUTINE                               Select Special Function Key labeled STORE
Enter option number desired :
1                                           Store the complete set of data.
Name of data file = ?
TGRADE:INTERNAL                             On this file.
Is data medium placed in device ?
YES
PROGRAM NOW STORING DATA ON TGRADE:INTERNAL

* * * * * The data and related information are stored in TGRADE:INTERNAL * * * * *

Is program medium placed in device ?
YES
Enter option number desired :
SELECT ANY KEY                              Exit Store routine.

SELECTION ROUTINES                          Choose Special Function Key labeled SELECT
Choose option desired :                     Select chosen instead of Scan.
Choose option desired :                     Choose to Select on the basis of the value of just one variable.

SELECTION BASED ON ONE VARIABLE
Which variable should be used ?
Criterion variable = 1 (VERB)               Variable 1 = Verb
What values can the criterion variable take ?
550-800                                     Select those cases for which Verb is between 550 and 800.
Allowable values : 550-800
Which subfiles do you want to be selected ?
ALL                                         For all subfiles.
SUBFILES TO BE SELECTED = ALL

OBSERVATIONS SATISFYING SELECTION CRITERION =     These observations meet the criteria.
    3    5    9   10   12   14   15   16   17   19   20   22   23
   24   25   26   28   29   30   31   32   33   35   36   37   38
   40   42   43   44   45   47   49   50   51   53   54   55   56
   57   58   59   61   62   63   64   65   67   68   69   70   71
   72   73   74   75   76   78   79   80   81   82   84   85   86
   87   88   89   90   91   93   94   95   96   97   98   99  100

                BEFORE SELECTION    AFTER SELECTION
SUBFILE         NUM OF OBS          NUM OF OBS
POOR                  9                   3
AVERAGE              54                  42
GOOD                 29                  25
EXCELLENT             8                   8

The Selection routine saves only those observations whose verbal score was between 550 and 800. The rest of the observations are discarded from the program memory.

PROGRAM NOW UPDATING SCRATCH DATA FILE
Choose option desired :
SELECT ANY KEY                              Exit Select routine.

STATS ROUTINE                               Select Special Function Key labeled STATS
What statistic options are desired ?
1                                           Mean, CI, Variance, Standard Deviation, Skewness, Kurtosis.
VARIABLES = ?
ALL                                         Statistics will be computed for all variables.
Confidence coefficient for confidence interval on the mean (e.g. 90, 95, 99%) = ?
95
Option number = ?
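The RECODE, SUBFILE (by variable), and SELECT steps shown above can be sketched together in Python. This is an illustration under stated assumptions rather than the program's algorithm: the function names are hypothetical, only the first ten observations of the combined data set are used, and the selection bounds are treated as inclusive, which is consistent with the 42 AVERAGE observations retained in the printed listing.

```python
from bisect import bisect_right

LOWER_LIMIT = 1.0
UPPER_LIMITS = [2.0, 3.0, 3.5, 4.0]  # each interval is [lower, upper)
CODES = [1, 2, 3, 4]


def recode(value):
    """Return the code of the half-open interval containing `value`."""
    if not (LOWER_LIMIT <= value < UPPER_LIMITS[-1]):
        return None  # outside every interval
    return CODES[bisect_right(UPPER_LIMITS, value)]


def select_range(rows, column, low, high):
    """Keep the rows whose `column` value lies in [low, high]."""
    return [row for row in rows if low <= row[column] <= high]


# First ten observations of the combined data set: (VERB, MATH, GPA).
observations = [
    (500, 661, 2.3), (460, 692, 1.4), (717, 672, 2.8), (592, 441, 2.4),
    (752, 729, 3.4), (695, 681, 2.5), (610, 777, 3.6), (620, 638, 2.6),
    (682, 701, 3.6), (524, 700, 2.9),
]

# RECODE: append RANKS as a fourth variable.
ranked = [row + (recode(row[2]),) for row in observations]

# SUBFILE by variable: a stable sort keeps the original order within a rank.
by_rank = sorted(ranked, key=lambda row: row[3])

# SELECT: keep observations with 550 <= VERB <= 800.
selected = select_range(by_rank, 0, 550, 800)

counts = {code: sum(1 for row in selected if row[3] == code) for code in CODES}
print([row[3] for row in ranked])
print(counts)
```

Using `bisect_right` makes a value equal to an upper limit fall into the next interval, which matches the note that 3.5 is recoded as 4.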
                                            With a 95% coefficient.
                                            Compute statistics for specified subfiles.
What subfiles are desired ?
1-4                                         All subfiles

*******************************************
*           SUMMARY STATISTICS            *
*             ON DATA SET:                *
*  TOTAL ACT SCORE/GPA COMPARISON DATA    *
*******************************************

Subfile: POOR

                              BASIC STATISTICS

VARIABLE      # OF   # OF
NAME          OBS.   MISS           SUM          MEAN       VARIANCE     STD. DEV.
VERB             3      0     1694.0000      564.6667       120.3333       10.9697
MATH             3      0     1878.0000      626.0000      2781.0000       52.7352
GPA              3      0        5.4000        1.8000          .0100         .1000
RANKS            3      0        3.0000        1.0000         0.0000        0.0000

VARIABLE      COEFFICIENT     STD. ERROR      95 % CONFIDENCE INTERVAL
NAME          OF VARIATION    OF MEAN         LOWER LIMIT    UPPER LIMIT
VERB             1.94268        6.33333         537.60540      591.72793
MATH             8.42415       30.44667         495.90649      756.09351
GPA              5.55556         .05774           1.55331        2.04669
RANKS            0.00000        0.00000           1.00000        1.00000

VARIABLE       SKEWNESS     KURTOSIS
VERB            -.70711     -1.50000
MATH            -.61556     -1.50000
GPA             0.00000     -1.50000
RANKS

Subfile: AVERAGE

                              BASIC STATISTICS

VARIABLE      # OF   # OF
NAME          OBS.   MISS           SUM          MEAN       VARIANCE     STD. DEV.
VERB            42      0    25935.0000      617.5000      2382.4024       48.8099
MATH            42      0    27318.0000      650.4286      3694.4460       60.7820
GPA             42      0      104.3000        2.4833          .0814         .2853
RANKS           42      0       84.0000        2.0000         0.0000        0.0000

VARIABLE      COEFFICIENT     STD. ERROR      95 % CONFIDENCE INTERVAL
NAME          OF VARIATION    OF MEAN         LOWER LIMIT    UPPER LIMIT
VERB             7.90443        7.53152         602.28627      632.71373
MATH             9.34491        9.37886         631.48322      669.37393
GPA             11.49047         .04403           2.39439        2.57227
RANKS            0.00000        0.00000           2.00000        2.00000

VARIABLE       SKEWNESS     KURTOSIS
VERB             .54518      -.82101
MATH           -1.03447      2.32038
GPA             -.03388      -.90383
RANKS

Subfile: GOOD

                              BASIC STATISTICS

VARIABLE      # OF   # OF
NAME          OBS.   MISS           SUM          MEAN       VARIANCE     STD. DEV.
VERB            25      0    15948.0000      637.9200      4324.1600       65.7583
MATH            25      0    16856.0000      674.2400      4096.6067       64.0047
GPA             25      0       80.3000        3.2120          .0236         .1536
RANKS           25      0       75.0000        3.0000         0.0000        0.0000

VARIABLE      COEFFICIENT     STD. ERROR      95 % CONFIDENCE INTERVAL
NAME          OF VARIATION    OF MEAN         LOWER LIMIT    UPPER LIMIT
VERB            10.30824       13.15167         610.76982      665.07018
MATH             9.49287       12.80095         647.81385      700.66615
GPA              4.78278         .03072           3.14857        3.27543
RANKS            0.00000        0.00000           3.00000        3.00000

VARIABLE       SKEWNESS     KURTOSIS
VERB             .48079     -1.04529
MATH            -.96523       .42114
GPA             -.27487     -1.47768
RANKS

Subfile: EXCELLENT

                              BASIC STATISTICS

VARIABLE      # OF   # OF
NAME          OBS.   MISS           SUM          MEAN       VARIANCE     STD. DEV.
VERB             8      0     5322.0000      665.2500      1607.6429       40.0954
MATH             8      0     5564.0000      695.5000      4398.5714       66.3217
GPA              8      0       29.2000        3.6500          .0200         .1414
RANKS            8      0       32.0000        4.0000         0.0000        0.0000

VARIABLE      COEFFICIENT     STD. ERROR      95 % CONFIDENCE INTERVAL
NAME          OF VARIATION    OF MEAN         LOWER LIMIT    UPPER LIMIT
VERB             6.02712       14.17587         631.72037      698.77963
MATH             9.53583       23.44827         640.03874      750.96126
GPA              3.87456         .05000           3.53174        3.76826
RANKS            0.00000        0.00000           4.00000        4.00000

VARIABLE       SKEWNESS     KURTOSIS
VERB             .07320     -1.21757
MATH             .38485      -.97545
GPA              .64794      -.77551
RANKS

What statistic options are desired ?
2                                           Correlation matrix
VARIABLES = ?
ALL                                         Statistics computed for all variables
Option number = ?
2                                           Compute statistics for specified subfiles
What subfiles are desired ?
1-4                                         All subfiles

*******************************************
*           SUMMARY STATISTICS            *
*             ON DATA SET:                *
*  TOTAL ACT SCORE/GPA COMPARISON DATA    *
*******************************************

Subfile: POOR
                    CORRELATION MATRIX
                MATH          GPA        RANKS
VERB       -.6404640     .8660254
MATH                    -.9386522
GPA

Subfile: AVERAGE
                    CORRELATION MATRIX
                MATH          GPA        RANKS
VERB        .3530502     .0440427
MATH                     .0482350
GPA

Subfile: GOOD
                    CORRELATION MATRIX
                MATH          GPA        RANKS
VERB        .4981619     .5173239
MATH                    -.0706494
GPA

Subfile: EXCELLENT
                    CORRELATION MATRIX
                MATH          GPA        RANKS
VERB        .3654701     .6651140
MATH                     .4934875
GPA

What statistic options are desired ?
3                                           Median, mode, percentiles, Min., Max., Range
VARIABLES = ?
ALL                                         Statistics computed for all variables
Option number = ?                           Compute statistics for specified subfiles
What subfiles are desired ?
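The entries of the correlation matrices above are ordinary product-moment (Pearson) correlations. As a hedged illustration, the Python sketch below recomputes the POOR subfile, using the three observations whose sums match the printed BASIC STATISTICS table; the function name is hypothetical, and returning `None` for a constant variable is an assumption that mirrors the blank RANKS column.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns None when either variable has zero variance, because the
    coefficient is then undefined.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# POOR subfile after selection: three observations remain.
verb = [571, 552, 571]
math_score = [647, 665, 566]
gpa = [1.8, 1.7, 1.9]
ranks = [1, 1, 1]

r_verb_math = pearson(verb, math_score)
r_verb_gpa = pearson(verb, gpa)
r_math_gpa = pearson(math_score, gpa)
r_verb_ranks = pearson(verb, ranks)

print(r_verb_math, r_verb_gpa, r_math_gpa, r_verb_ranks)
```

With only three observations these coefficients are very unstable, so the POOR matrix is best read as a check on the arithmetic rather than as evidence of a relationship.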
All subfiles 1-4 52 ************************************************** * SUMMARY STATISTICS * * ON DATA SET: * * TOTAL ACT SCORE/GPA COMPARISON DATA * ******************************************************************************** Subfile: POOR ORDER STATISTICS VARIABLE VERB MATH GPA RANKS MAXIMUM 571.00000 665.00000 1.90000 1.00000 MINIMUM 552.00000 566.00000 1.70000 1.00000 RANGE 19.00000 99.00000 .20000 0.00000 MIDRANGE 561.50000 615.50000 1.80000 1 .00000 TUKEY'S HINGES VARIABLE VERB MATH GPA RANKS MEDIAN 571.00000 647.00000 1.80000 1.00000 25-th %-ile 552.00000 566.00000 1.70000 1.00000 75-th %-ile 571.00000 647.00000 1.80000 1.00000 TUKEY'S MIDDLEMEANS VARIABLE VERB MATH GPA RANKS MIDMEAN 564.66667 626.00000 1.80000 1.00000 TRIMEAN 566.25000 626.75000 1.77500 1 .00000 MIDSPREAD 19.00000 81.00000 .10000 0.00000 Other percentiles NO (Y/N>? Subfile: AVERAGE ORDER STATISTICS VARIABLE MAXIMUM MINIMUM RANGE MIDRANGE VERB 719.00000 550.00000 169.00000 634.50000 MATH 771.00000 441.00000 330.00000 606.00000 GPA 2.90000 2.00000 .90000 2.45000 RANKS 2.00000 2.00000 0.00000 2.00000 TUKEY 'S HINGES VARIABLE MEDIAN 25-th %-ile 75-th %-ile VERB 607.00000 578.00000 646 .00000 MATH 658.50000 624.00000 681 .00000 GPA 2.40000 2.30000 2 .60000 RANKS 2.00000 2.00000 2 .00000 53 TUKEY'S MIDDLEMEANS VARIABLE MIDMEAN TRIMEAN MIDSPREAD VERB 610.13636 609.50000 68.00000 MATH 655.95455 655.50000 57.00000 GPA 2.46818 2.42500 .30000 RANKS 2.00000 2.00000 0.00000 Other percentiles(Y/N)? 
    NO

Subfile: GOOD

ORDER STATISTICS

    VARIABLE      MAXIMUM      MINIMUM        RANGE     MIDRANGE
    VERB        752.00000    552.00000    200.00000    652.00000
    MATH        755.00000    500.00000    255.00000    627.50000
    GPA           3.40000      3.00000       .40000      3.20000
    RANKS         3.00000      3.00000      0.00000      3.00000

TUKEY'S HINGES

    VARIABLE       MEDIAN   25-th %-ile   75-th %-ile
    VERB        635.00000     585.00000     666.00000
    MATH        701.00000     634.00000     719.00000
    GPA           3.30000       3.10000       3.30000
    RANKS         3.00000       3.00000       3.00000

TUKEY'S MIDDLEMEANS

    VARIABLE      MIDMEAN      TRIMEAN    MIDSPREAD
    VERB        626.53846    630.25000     81.00000
    MATH        685.76923    688.75000     85.00000
    GPA           3.23077      3.25000       .20000
    RANKS         3.00000      3.00000      0.00000

    Other percentiles (Y/N)? NO

Subfile: EXCELLENT

ORDER STATISTICS

    VARIABLE      MAXIMUM      MINIMUM        RANGE     MIDRANGE
    VERB        726.00000    610.00000    116.00000    668.00000
    MATH        800.00000    605.00000    195.00000    702.50000
    GPA           3.90000      3.50000       .40000      3.70000
    RANKS         4.00000      4.00000      0.00000      4.00000

TUKEY'S HINGES

    VARIABLE       MEDIAN   25-th %-ile   75-th %-ile
    VERB        670.50000     630.00000     692.50000
    MATH        686.50000     649.50000     743.50000
    GPA           3.60000       3.55000       3.75000
    RANKS         4.00000       4.00000       4.00000

TUKEY'S MIDDLEMEANS

    VARIABLE      MIDMEAN      TRIMEAN    MIDSPREAD
    VERB        663.25000    665.87500     62.50000
    MATH        683.75000    691.50000     94.00000
    GPA           3.62500      3.62500       .20000
    RANKS         4.00000      4.00000      0.00000

    Other percentiles (Y/N)? NO

    What statistic options are desired ?    Exit Basic Statistics routine
    SELECT ANY KEY

Regression Analysis

General Information

Description

The Regression Analysis software provides you with five routines to perform various types of linear and non-linear regressions. The regression routines include:

• Multiple Linear Regression
• Polynomial Regression
• Variable Selection Procedures (Stepwise algorithm, etc.)
• Non-linear Regression
• Standard Non-linear Regression Models

In addition, a residual analysis module is included which will be helpful in judging the quality of the chosen regression model. Brief descriptions of each regression routine follow.
The multiple linear regression routine performs a least-squares regression on a set of predetermined variables.

The variable selection procedures perform least-squares regressions iteratively on sets of variables which are determined by one of four selection procedures: stepwise, forward selection, backward elimination, or manual. These selection procedures are helpful in determining which of the independent variables are "important" in predicting the behavior of the dependent variable.

The polynomial regression routine is a special case of the multiple linear regression procedure where the independent variables are actually powers of a single variable. In other words, the form of the regression model is:

    Y = B0 + B1*(X) + B2*(X^2) + ... + Bp*(X^p)

where Y is the dependent variable, X is the independent variable, and B0, B1, ..., Bp are the regression coefficients. A routine is also provided so you can plot the X-Y data along with the regression curve.

The non-linear regression routine allows you to determine the coefficients of virtually any model you wish to specify. It is more difficult to use than the multiple linear regression routines; however, its use is mandatory when the model is non-linear in the regression coefficients. An example of this is the model:

    Y = B1*Exp(B2*X1 + B3*X2)

where Exp is the exponential function. A plotting routine is provided so you can plot any variable versus the dependent variable. If the model has only one independent variable, the regression curve can also be plotted.

The routines referred to as "standard" non-linear regressions determine the regression coefficients for the following four types of common non-linear regression models:

• Y = A*X^B + C
• Y = A*Exp(BX) + C
• Y = A*Exp(BX) + C*Exp(DX) + E
• Y = A*Sin(BX) + C*Cos(DX) + E

Also provided is a routine to plot the data along with the computed regression curve.
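The four standard model forms listed above can be written out as plain functions. This is only an illustrative sketch in Python, not the library's own code: the function names are hypothetical, the third and fourth forms are read as C*Exp(DX) and C*Cos(DX), the intercept is treated as optional (default 0), and the trigonometric model assumes X is in radians.

```python
import math

# Hypothetical helper names; each returns the model value Y for one X.

def power_model(x, a, b, c=0.0):
    """Y = A*X^B + C"""
    return a * x ** b + c

def exponential_model(x, a, b, c=0.0):
    """Y = A*Exp(BX) + C"""
    return a * math.exp(b * x) + c

def double_exponential_model(x, a, b, c, d, e=0.0):
    """Y = A*Exp(BX) + C*Exp(DX) + E"""
    return a * math.exp(b * x) + c * math.exp(d * x) + e

def trigonometric_model(x, a, b, c, d, e=0.0):
    """Y = A*Sin(BX) + C*Cos(DX) + E, with X assumed to be in radians."""
    return a * math.sin(b * x) + c * math.cos(d * x) + e
```

Each function is linear in some parameters (A, C, E) and non-linear in others (B, D), which is why these models need an iterative fit rather than ordinary least squares.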
All of the regression programs provide an analysis of variance table, correlations, and the regression coefficients, as well as their standard errors.

The residual analysis routine provides a list of the residuals as well as a plot of the standardized residuals versus observation number or any variable.

Typical Program Flow

• Enter data via BSDM
• Select Advanced Statistics option
• Choose type of regression routine
• Specify model
• Obtain regression output
• Obtain table of residuals
• Plot residuals

Special Considerations

Terminology

By an independent variable we mean a variable that can be set to a desired value (for example, input temperature or catalyst feed rate in a chemical reaction), or one whose values can be observed but not controlled (for example, the outdoor humidity). As a result of changes in one or more independent variables, the dependent variable will be affected. For example, the purity of a chemical product may be affected by temperature and the catalyst feed rate. In a simple linear regression:

    Y = B0 + B1*X

Y is the dependent variable, and X is the independent variable, while B0 and B1 are the regression coefficients.

Data Structure

Data is input via the Basic Statistics and Data Manipulation (BSDM) routines. You need to tell the regression routine the number of the BSDM variable which you want to be your dependent variable. In general, you tell the routine how many independent variables are in your regression model. Then, you specify the BSDM variable numbers which you want to be your independent variables.

For example, suppose you input 10 variables in the BSDM procedure. You might specify that variable #4 is your dependent variable and that you want to have five independent variables. You then might specify the independent variables as BSDM variables #2, #3, #5, #7, and #9.

If you specify subfiles with the BSDM procedure, you may perform regressions on individual subfiles.
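The variable-numbering example above (variable #4 as the dependent variable, variables #2, #3, #5, #7, and #9 as independent variables) amounts to picking columns by number. A minimal Python sketch follows; the `data` dictionary and the function name are hypothetical stand-ins for a BSDM data set, not part of the library.

```python
# Hypothetical stand-in for a BSDM data set: variables numbered 1..10,
# each holding three made-up observations.
data = {number: [float(number * 10 + obs) for obs in range(3)]
        for number in range(1, 11)}

def select_regression_variables(data, dependent, independents):
    """Return (y, x_rows) for the chosen BSDM-style variable numbers."""
    if dependent in independents:
        raise ValueError("the dependent variable cannot also be independent")
    y = data[dependent]
    # One row per observation, one column per independent variable.
    x_rows = [[data[v][i] for v in independents] for i in range(len(y))]
    return y, x_rows

y, x_rows = select_regression_variables(
    data, dependent=4, independents=[2, 3, 5, 7, 9])
```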
Note

Non-Linear Regression: You will have to create a file which contains the function and partial derivatives before you get into the program. The steps involved are shown under "Before You Run Non-linear Regression" in the Nonlinear Regression section.

Multiple Linear Regression

Object of Program

This routine is designed to calculate a least-squares multiple linear regression on a predetermined set of variables. The general form of the regression model is:

    Y = B0 + B1*X1 + B2*X2 + ... + Bp*Xp + Error

where Y is the dependent variable, X1, X2, ..., Xp are the independent variables and B0, B1, ..., Bp are the regression coefficients.

Several basic statistics, as well as the correlation matrix, are output. An analysis of variance table is printed. The regression coefficients and their standard errors are output and confidence intervals are constructed about them. Output along with each regression coefficient is an associated t-value. This statistic is used to test if the regression coefficient is significantly different from zero, i.e., if the term is useful in the model. In addition, the regression equation may be used for predictions and a residual analysis may be performed.

Typical Program Flow

• Input data via BSDM
• Edit, transform, and list data. Obtain basic statistics
• Select Advanced Statistics option
• Select MLR routine
• Specify subfile and variables to analyze
• Calculate correlation matrix, R-squared, and standard error of estimate
• Obtain AOV table
• Obtain confidence intervals on parameter estimates
• Obtain residual analysis

Special Considerations

Method of Computing Sums of Squares and Cross Products Matrix

If a data value is missing for one or more variables, the entire observation is deleted, i.e., not used in computing the sums of squares and cross products matrix (or correlations). Consider the following matrix where missing values are denoted by an M.
                        Variable
                       1    2    3
    Observation   1    M    3    2
                  2    1    3    4
                  3    2    2    3
                  4    M    4    M
                  5    1    3    3

Observation 1 is deleted since the data value is missing for variable 1, and observation 4 is deleted since the data value is missing for variables 1 and 3. Hence, only observations 2, 3, and 5 will be used to compute the sums of squares and cross products matrix, as well as the correlations.

Constant Term

In the output of the regression coefficients, the term labeled "Constant" refers to the intercept, or initial value when all the independent variables are zero. This constant term corresponds to the B0 term in the general form of the model shown in the Object of Program section.

Transforming Variables

After you input your data via Basic Statistics and Data Manipulation, you can use the transformation routine to create new variables. The transformation routine has several predefined functions which will allow you to create transgenerated regression variables. Refer to the Basic Statistics and Data Manipulation section for further details on transforming variables.

Additional Sum of Squares in AOV Table

In the analysis of variance table, you will see that the degrees of freedom and the sum of squares of regression are divided into several parts, each with one degree of freedom. For example, suppose a regression problem has three independent variables, say X1, X2, and X3. You will notice that these three variables are listed below the "regression" term in the AOV table, and that each has one degree of freedom. See the sample problem on page 25.

The meaning of the X1 line is as follows. We first consider only X1 in the regression model, and from the sum of squares we can tell how much of the variation of the dependent variable is explained by introducing X1 into the model. The meaning of the X2 line is as follows. Given that X1 is in the model, if we introduce X2 into the model we can see how much additional variation is explained by X2.
Then, in the X3 line, we suppose X1 and X2 are already in the model. The sum of squares shows how much additional variation is explained by adding X3 to the regression model. The degrees of freedom of the independent variables add up to the regression degrees of freedom. The sums of squares of the independent variables likewise add up to the sum of squares for regression.

Methods and Formulae

The Cholesky square-root method is used to factor the sum of squares and cross products matrix. It is felt that this method produces less round-off error than other inversion techniques. This method, as well as all other methods and formulae used, may be found in F.A. Graybill's Theory and Application of the Linear Model, Chapters 7 and 10.

Stepwise Regression (Variable Selection Procedures)

Object of Program

This program allows a regression model to be built iteratively using one of four variable selection procedures. The procedures are stepwise, forward, backward, and manual.

A correlation matrix is calculated and output. An analysis of variance table, as well as partial correlations, F values for deletion and inclusion, and the regression coefficients are output at each step of the regression. In addition, a residual analysis may be performed.

The four selection procedures operate as follows:

Stepwise

You specify an F-to-enter and an F-to-delete, and the program begins with no variables in the regression model. If any of the variables have an F value larger than the F-to-enter, then the variable with the largest F value is entered into the model. This process is repeated with the remaining variables. At this point, the F values of the variables in the model are compared with the F-to-delete. If a variable has a smaller F value than the F-to-delete, it is removed from the model.
This process of adding and deleting variables continues until all the variables in the model have F values larger than the F-to-delete and all the variables not in the model have F values smaller than the F-to-enter, or until the tolerance value becomes too small. A small tolerance value signals that the matrix has become unstable.

Forward Selection

You input an F-to-enter. The program operates in the same manner as the stepwise selection procedure, except that variables are not deleted. The process continues until all variables not in the model have F values smaller than the F-to-enter, or until the tolerance value becomes too small.

Backward Elimination

You input an F-to-delete and the program begins with all the variables in the model. If any variable has an F value smaller than the F-to-delete, then the variable with the smallest F value is deleted from the model. This process continues until all the variables in the model have F values larger than the F-to-delete or until the tolerance value becomes too small.

Manual Selection

As the name implies, variables are added or deleted manually until you are satisfied with the model.

Typical Program Flow

• Input data via BSDM
• Select Advanced Statistics option
• Select Stepwise routine
• Specify variables to be in regression
• Choose selection method
• Specify control parameters such as F-to-enter
• Variable selection is performed
• Residual analysis

Special Considerations

F Values Insufficient for Further Computation

If one of the stepwise, forward, or backward procedures is used in the selection of variables, the program will proceed automatically by entering and/or removing variables from the model until no remaining variable satisfies the F-to-enter or F-to-delete criterion, or until the tolerance value is not met. At this point the program reverts to the manual mode. This allows you, for example, to enter a variable whose F value is just slightly less than the specified F-to-enter.
Methods of Computing Correlations

Two methods of computing correlations are available. The first method will use an observation only if data values are present for each variable. The second method uses all possible data values to compute each correlation. If no missing values are present, method two should be used to speed computation.

A simple example will show the difference between the two methods. Suppose we have the following data set:

                        Variable
                       1    2    3
    Observation   1    2    3    M
                  2    3    2    4
                  3    1    3    5
                  4    M    1    4

If method one is used to compute the correlations, only observations 2 and 3 will be used. Observation 1 will be deleted entirely since the data value is missing for variable 3. Similarly, observation 4 will be deleted entirely since the data value is missing for variable 1.

Conversely, suppose method two is chosen. The correlation between variables 1 and 2 will be computed using the data values of observations 1, 2, and 3. The correlation between variables 1 and 3 will use the data values associated with observations 2 and 3. Similarly, the correlation between variables 2 and 3 will use the data values associated with observations 2, 3, and 4. Hence, data values from a given observation are used if the data points are present for the two variables under consideration.

The observations used to compute the AOV table are the same as those used to get the correlations.

F-to-enter, F-to-delete

A variable must have an F value which is greater than the value of F-to-enter for entry into the regression model via the stepwise or forward selection procedures. A typical value is 4. A variable may be deleted from the regression via the stepwise or backward selection procedures only if its F value is less than the value of F-to-delete. When using the stepwise procedure, you must have F-to-enter >= F-to-delete.
The F-to-enter should be selected from tabled values for your desired significance level with 1 and n-v degrees of freedom, where n is the number of observations and v is the number of variables in the regression. Since you don't know a priori how many variables will be in the regression, you might guess the number of variables which will end up in the regression for your initial analysis.

Tolerance Value

You will be asked to enter a tolerance value. Your input must be between 0 and 1. The tolerance value is a scaled function of the determinant of the X'X matrix, and is a measure of the stability of the correlation matrix. If a variable not in the equation is linearly dependent on one or more of the variables already in the model, then the correlation matrix will have a determinant of zero. So, if the computed tolerance value gets too small, this might suggest a singular matrix. A suggested value for the tolerance is .01.

Reading the Output

In the algorithm, one variable will be entered or deleted per step. The variables currently included in the regression model are printed on the left side of the table. The variables which are not currently included in the model are printed on the right side of the table.

Partial Correlation

The partial correlations of the variables not currently in the regression equation are output. After a variable, say X1, has been entered into the regression model, the program calculates the partial correlation of the other independent variables with the dependent variable, given that X1 is in the regression model.

Adding One Variable to the Model

If any of the variables has an F value larger than the F-to-enter, then the variable with the largest F will be entered into the model, provided that its tolerance value is greater than the user-specified tolerance value.
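The entry rule can be illustrated with a small forward-selection sketch in Python. It is not the library's code: the function names are hypothetical, the F value is taken to be the usual partial F (the drop in residual sum of squares divided by the new residual mean square), which is an assumption about what the program computes, and the tolerance check is left out for brevity.

```python
def rss(columns, y):
    """Residual sum of squares from regressing y on an intercept plus columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    a = [[sum(design[r][i] * design[r][j] for r in range(n)) for j in range(k)]
         + [sum(design[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    b = [a[i][k] / a[i][i] for i in range(k)]
    return sum((y[r] - sum(bj * xj for bj, xj in zip(b, design[r]))) ** 2
               for r in range(n))

def forward_selection(candidates, y, f_to_enter=4.0):
    """candidates maps a variable name to its values; returns names in entry order."""
    n = len(y)
    chosen = []
    while True:
        current = rss([candidates[v] for v in chosen], y)
        best_name, best_f = None, f_to_enter
        for name in candidates:
            if name in chosen:
                continue
            new = rss([candidates[v] for v in chosen + [name]], y)
            df = n - len(chosen) - 2      # residual d.f. of the enlarged model
            if df <= 0:
                continue
            f = float("inf") if new == 0 else (current - new) / (new / df)
            if f > best_f:                # must exceed F-to-enter and be the largest
                best_name, best_f = name, f
        if best_name is None:
            return chosen
        chosen.append(best_name)

# Made-up data: y depends on x1 only; x2 is an unrelated alternating pattern.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0]
noise = [0.1, 0.1, -0.1, -0.1, 0.1, 0.1, -0.1, -0.1]
y = [3.0 * a + e for a, e in zip(x1, noise)]
entered = forward_selection({"x1": x1, "x2": x2}, y)
```

With these data only x1 clears the typical F-to-enter of 4, so the selection stops after one step.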
Deleting One Variable from the Model

If any variable currently in the regression equation has an F value smaller than F-to-delete, then the one with the smallest F value will be deleted from the model at that step.

Manual Selection

After you have completed a portion of the program, you will see the prompt "Input 'K', delete '-K' ?". At this point the program is operating in a manual mode. That is, you may add a variable to the regression equation by entering its number, or delete a variable from the equation by entering its number preceded by a minus sign.

Methods and Formulae

All methods and formulae used in this routine may be found in Statistical Methods for Digital Computers by K. Enslein, et al.

Polynomial Regression

Object of Program

This program is designed to fit a polynomial regression model of the form:

    Y = B0 + B1*(X) + B2*(X^2) + B3*(X^3) + ... + Bp*(X^p)

where p <= 10. The regression coefficients, B0, B1, ..., Bp, are computed by the method of least squares.

The degree of the regression, p, is chosen by you with the aid of a preliminary analysis of variance table and, if desired, an X-Y scatter plot. The preliminary analysis of variance table shows the additional sum of squares explained by models of successive degrees, as well as the associated F values and R-squared values. After the degree of the regression is selected, an analysis of variance table for the model is printed and confidence intervals are constructed about the coefficients. In addition, a residual analysis may be performed.
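The "additional sum of squares explained by models of successive degrees" can be illustrated with a short Python sketch. It orthogonalizes the columns 1, X, X^2, ... by Gram-Schmidt, which is chosen here only because it is compact; it is not the Cholesky method the manual says the program uses, and the function name is hypothetical.

```python
import math

def sequential_sums_of_squares(x, y, max_degree):
    """Return (extra, residual_ss): extra[d-1] is the additional sum of squares
    explained when the degree-d term is added to the degree-(d-1) model."""
    basis = []              # orthonormal columns built from 1, x, x^2, ...
    residual = list(y)      # part of y not yet explained
    extra = []
    for degree in range(max_degree + 1):
        col = [xi ** degree for xi in x]
        for q in basis:     # remove what earlier columns already span
            proj = sum(c * qi for c, qi in zip(col, q))
            col = [c - proj * qi for c, qi in zip(col, q)]
        norm = math.sqrt(sum(c * c for c in col))
        q = [c / norm for c in col]
        basis.append(q)
        coef = sum(r * qi for r, qi in zip(residual, q))
        residual = [r - coef * qi for r, qi in zip(residual, q)]
        if degree >= 1:     # degree 0 only removes the mean
            extra.append(coef ** 2)
    return extra, sum(r * r for r in residual)

# Made-up data lying exactly on Y = X^2.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [4.0, 1.0, 0.0, 1.0, 4.0]
extra, residual_ss = sequential_sums_of_squares(x, y, max_degree=2)
```

For these data the linear term explains essentially nothing and the quadratic term explains all of the corrected total sum of squares, which is the pattern that would point to a degree-2 model.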
Typical Program Flow

• Input data via BSDM
• Select Advanced Statistics option
• Select Polynomial Regression
• Specify variables and subfile for analysis
• Plot the X-Y pairs
• Input maximum degree of regression to consider
• Decide degree of regression based on preliminary AOV table
• Obtain final AOV, parameter estimates and confidence intervals
• Plot regression line
• Perform residual analysis

Special Considerations

Degree of Model

The maximum degree of the model has been set (somewhat arbitrarily) at 10. Models of degree ten involve arithmetic operations using the X variable raised to the 20th power, where X is the independent variable. Hence, substantial round-off errors may occur with models of high degree. In general, a model of degree p will involve X values raised to the 2*p power. It is therefore suggested that you use extreme caution in choosing models of high degree.

Method of Computing Sums of Squares and Cross Products Matrix

If a data value is missing for one of the two variables, the entire observation is deleted, i.e., not used in the computation of the sums of squares and cross products matrix. See Special Considerations of the Multiple Linear Regression section for an example.

Preliminary AOV Table

After plotting the X-Y data pairs, you will be asked to specify the maximum degree of the regression. A preliminary AOV table will be displayed which will show the additional sum of squares and R-squared for the linear, quadratic, cubic, ... regression models. This table can be used as an aid in determining the appropriate degree for your polynomial model.

Plotting Considerations

When plotting the data and regression, every tic mark on the axes will be labeled. So, you should specify no more than 10 tic marks to obtain an uncluttered plot. One tic mark will coincide with the point where the X-axis crosses the Y-axis. Another tic mark will coincide with the point where the Y-axis crosses the X-axis.
Plotting the data is highly recommended since a plot may suggest the degree of the polynomial model.

Methods and Formulae

The Cholesky square-root method is used to factor the sum of squares and cross products matrix. It is felt that this inversion method produces less round-off error than other procedures. This method, as well as all other methods and formulae, may be found in F.A. Graybill's Theory and Application of the Linear Model.

Nonlinear Regression

Object of Program

Given a model

$$Y = f(X_1, X_2, \ldots, X_m; \beta_1, \beta_2, \ldots, \beta_p) + \epsilon$$

where the model $f$ contains $m$ independent variables $X_i$ and $p$ parameters $\beta_j$, and given $n$ observations

$$(Y_i, X_{i1}, X_{i2}, \ldots, X_{im}), \quad i = 1, 2, \ldots, n$$

this program computes the least squares estimates $\hat{\beta}_j$; that is, the program adjusts the $\beta_j$ to minimize

$$Q = \sum_{i=1}^{n} \{Y_i - f(X_{i1}, X_{i2}, \ldots, X_{im}; \beta_1, \beta_2, \ldots, \beta_p)\}^2$$

You supply the functional form of $f$. For example, one possible form would be

$$Y = \beta_1 \exp(\beta_2 X_1 + \beta_3 X_2) + \beta_4$$

The program also provides X-Y scatter plots (the non-linear regression curve can be added to the plot if the model contains only one independent variable). After each iteration the following information is output: the iteration number, estimated parameter values, and sum of squared residuals (Q). Confidence intervals (regions) on the parameters are also constructed. In addition, a residual analysis may be performed.

Before beginning the program, you will need to create a file which contains the function and partial derivatives. The necessary steps are shown in the Special Considerations section.

Typical Program Flow

• Input data via BSDM
• Select advanced statistics
• Insert program medium
• Choose Nonlinear Regression
• Specify variables and subfiles
• Plot X versus Y
• Load subroutine with function and partial derivatives
• Enter initial values for every parameter
• Estimation of parameters
• Plot regression line
• Confidence intervals on parameters
• Residual analysis

Special Considerations

Limitations

The maximum number of parameters in the model is 20. Also, the number of observations times the number of parameters must be less than or equal to 5000.

Convergence Criteria

From a user viewpoint there are three modes of program termination during the iterative stage of estimation of the parameters.

The first mode is the satisfactory completion of the convergence criteria; that is, the iteration is terminated whenever

$$\frac{|\delta_j|}{0.001 + |\beta_j|} < \text{delta} \quad \text{for all } j$$

where delta is a small number that you input, and $\delta_j$ is the change in $\beta_j$ resulting from the last iteration. This is the normal termination which should occur when a proper function has been specified for $f$, the derivatives are specified correctly, and the initial estimates for the parameters are reasonable.

A second mode of termination can occur when the program determines that the process is not converging in a satisfactory manner. (For the procedure used in determining whether the process is converging properly, see Reference 5.) If the program does terminate the iterative process, you are able to respecify the convergence coefficient (Delta), the function and/or derivatives, and the initial parameter estimates.

The third mode of termination of the iterative process is for you to "force off" the computational process by pressing the "No" key.

Quick Plot

A quick plot is essentially a default plot with plotting parameters:

1. X-min = actual X-min, X-max = actual X-max.
2. Y-min = actual Y-min, Y-max = actual Y-max.
3. Y-axis crosses X-axis at X-min.
4. X-axis crosses Y-axis at Y-min.
5. Distance between X-tics = (Xmax-Xmin)/5.
6. Distance between Y-tics = (Ymax-Ymin)/5.
7. Number of decimals for labeling X-axis and Y-axis = 2.
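The seven quick-plot defaults listed above can be derived directly from the data. The following Python sketch shows one way to compute them; the function name and dictionary keys are hypothetical and are not part of the library.

```python
def quick_plot_defaults(xs, ys):
    """Derive the quick-plot parameters (items 1 to 7) from the data ranges."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    return {
        "x_range": (x_min, x_max),              # item 1
        "y_range": (y_min, y_max),              # item 2
        "y_axis_crosses_x_axis_at": x_min,      # item 3
        "x_axis_crosses_y_axis_at": y_min,      # item 4
        "x_tic_spacing": (x_max - x_min) / 5,   # item 5
        "y_tic_spacing": (y_max - y_min) / 5,   # item 6
        "label_decimals": 2,                    # item 7
    }

# Made-up data for illustration.
defaults = quick_plot_defaults([0.0, 4.0, 10.0], [1.0, 6.0, 3.0])
```

Dividing each range by 5 puts six tic marks on each axis, counting the one at the axis crossing.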
You may wish to have the quick plot drawn in order to "see" what the relationship between Y and the X you have chosen looks like.

The actual limits of the confidence intervals are very data dependent. Caution should be exercised in using these limits if many iterations were required to determine the regression coefficients.

Before You Run Non-linear Regression

To run non-linear regression, you must first create a file which contains the function and partial derivatives you wish to use. You can create as many of these files as you wish. The procedure to create these files is as follows:

• Insert your floppy in the built-in disc drive.
• Type SCRATCH A; press EXECUTE.
• Press the EDIT key; press EXECUTE. You should now see the line number ten on the screen.
• Now type in each line of the file, pressing ENTER after every line that has been entered. The file should resemble the one below.

Note

Remember that partial derivatives should be taken with respect to P(*).

    10 SUB Function(P(*),X(*),F)
    20 F=P(1)+P(2)*X(1)^P(3)
    30 SUBEND
    40 SUB Partial(P(*),X(*),Der(*))
    50 Der(1)=1
    60 Der(2)=X(1)^P(3)
    70 Der(3)=P(2)*LOG(X(1))*X(1)^P(3)
    80 SUBEND

• The two SUB statements in your file must be exactly the same as in the example.
• When you have finished typing the two subroutines, press the CLR SCR key. Type STORE "name of file". You may name your file whatever you like as long as the name is not greater than ten characters long and has nothing but letters and numbers in it.
• You may now begin running the Statistics Library by typing LOAD "AUTOST",1 with the Basic Statistics and Data Manipulation disc in the internal disc drive.

Methods and Formulae

Marquardt's procedure (see Reference 5) is used to obtain the estimated parameters in each iteration.
Define

$$Z = (Z_{ij}) = \left[ \frac{\partial f(X_{i1}, X_{i2}, \ldots, X_{im}; \beta_1, \ldots, \beta_p)}{\partial \beta_j} \right], \quad i = 1, \ldots, n; \; j = 1, \ldots, p$$

then each iteration can be written as

$$\beta^{(k+1)} = \beta^{(k)} + \delta^{(k)}$$

where $\delta^{(k)}$ is the solution of the set of linear equations

$$(A + \lambda I)\,\delta = Z'(Y - f(X, \beta)) = g$$

where $A = Z'Z$ and $g$ are evaluated at $\beta^{(k)}$ (both $A$ and $g$ are normalized in the program), and where $\lambda$ is an adjustable parameter which is used to control the iteration. The motivation of Marquardt's method is to choose $\lambda$ so as to follow the Gauss-Newton method to as large an extent as possible, while retaining a bias towards the steepest descent direction to prevent divergence.

The square root method is used to solve the system of linear equations in each iteration and to obtain $C = (C_{ij}) = A^{-1}$.

For the confidence intervals (regions) on parameters, the $1 - \alpha$ one-at-a-time confidence interval on $\beta_j$ is

$$\hat{\beta}_j - t(\alpha/2; n-p)\,(S_e^2 C_{jj})^{1/2} \le \beta_j \le \hat{\beta}_j + t(\alpha/2; n-p)\,(S_e^2 C_{jj})^{1/2}$$

and the approximate $1 - \alpha$ simultaneous confidence intervals on the $\beta_j$'s are

$$\hat{\beta}_j - (p\,F(\alpha; p, n-p)\,S_e^2 C_{jj})^{1/2} \le \beta_j \le \hat{\beta}_j + (p\,F(\alpha; p, n-p)\,S_e^2 C_{jj})^{1/2}$$

where $p$ is the number of parameters in the model, $n$ is the number of observations (excluding the missing values), $t(\alpha/2; n-p)$ is the upper $\alpha/2$ point of the t-distribution with $n-p$ degrees of freedom, $F(\alpha; p, n-p)$ is the upper $\alpha$ point of the F-distribution with $p$ and $n-p$ degrees of freedom, and $S_e$ is the standard error of the residuals.

References

1. Draper, N., and Smith, H. (1980). Applied Regression Analysis, 2nd Edition. John Wiley and Sons, Inc., New York.
2. Fletcher, R. (1971). "A Modified Marquardt Subroutine for Nonlinear Least Squares". United Kingdom Atomic Energy Authority Research Group Report.
3. Graybill, F. (1976). Theory and Application of the Linear Model. Wadsworth Publishing Co., Inc., California.
4. Kopitzke, R., and (Boardman, T.J., Editor). Unpublished Notes for 9830A Statistical Distribution Pac. Hewlett-Packard, September 1976. Part No. 09830-70854.
5. Marquardt, D. (1963). "An Algorithm for Least Squares Estimation of Nonlinear Parameters". J.
Soc. Indust. and Appl. Math., 11, No. 2.

Standard Nonlinear Regressions

Object of Program

This program determines the regression coefficients for the following four types of standard non-linear regression models:

1. Y = A*(X^B) + C
2. Y = A*Exp(BX) + C
3. Y = A*Exp(BX) + C*Exp(DX) + E
4. Y = A*Sin(BX) + C*Cos(DX) + E

where the intercept term, C or E above, is optional. The intercept is determined by using an approximate minimum Y value in the observed data as the initial value.

Typical Program Flow

• Input data via BSDM
• Select Advanced Statistics option
• Choose Standard Non-linear Regression routine
• Specify variables and subfile for the analysis
• Choose model and, if desired, intercept
• Plot the data
• Use initial parameter values provided or supply your own
• Non-linear regression performed to estimate parameters
• Plot regression curve
• Obtain confidence intervals
• Perform residual analysis

Special Considerations

Initial Parameter Estimates

In models 1, 2, and 3, initial estimates for parameters are obtained by linearizing the model. This is accomplished by taking the logarithm of both sides of the equation for model 1, and by taking the logarithm of Y in models 2 and 3. In model 3, C is taken as .1*A and D = .5*B. In model 4:

    A = (Ymax - E) * Sin(a) * Cos(B * Xmax)
    B = 360 / (length in units of X of a typical cycle)
    C = (Ymax - E) * Cos(a) * Sin(B * Xmax)
    D = B
    E = sample mean of Y

where a = 90 - B * X1, for data in degrees, and X1 is the X value at Ymax. For angular units in radians, the estimates of B and C will change accordingly.

Convergence Criteria

There are three ways by which the program may terminate its iterative procedure of estimating the model parameters.

a. The iteration is terminated when $|\Delta_j| / (.001 + |\beta_j|) < \text{Delta}$ for all regression coefficients $\beta_j$, where Delta is a small number that you input, and $\Delta_j$ is the change in $\beta_j$ resulting from the last iteration.
This is the normal termination which should occur when the proper model has been selected for a given data set and the initial estimates are chosen properly.

b. When the program determines that the process is not converging in a satisfactory manner, it will terminate. For the procedure used in determining whether the process is converging properly, see Reference 5 in the Non-linear Regression section. If the program does terminate the iterative process, you can re-specify the convergence coefficient (Delta) and/or the initial estimates of the parameters and try the regression again.

c. You may force the iterative procedure to terminate by pressing the "Stop" key.

Angular Units for Model 4

When model 4, the trigonometric model, is chosen, you need to specify two additional items for the program. You must declare whether your X values are in degrees or radians. In addition, during the routine which supplies the initial estimates for the parameters, you need to specify the length of a typical cycle of data.

Residual Analysis

Object of Program

This program allows you to analyze the residuals from a regression problem in order to check the adequacy of the regression model. It may be used upon completion of any of the regression routines. The residuals may be printed and/or plotted.

The residual printout includes the observed values, predicted values, residuals, and standardized residuals. A final column shows which residuals are significantly large.

The residual plot allows you to plot the standardized residuals versus observation number or versus any of the variables in the model.

Residuals may be generated for subfiles which were not used in determining the regression equation. This may be useful as a method of confirming the adequacy of the derived model.
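The residual printout can be sketched in a few lines of Python, using the definitions this section gives later under Significance of Residuals and Methods and Formulae: the residual is observed minus predicted, the standardized residual divides that by the square root of the residual mean square, and asterisks flag values two or more standard deviations from zero. The function name and the data are hypothetical.

```python
import math

def residual_table(observed, predicted, residual_mean_square):
    """Return rows of (observed, predicted, residual, standardized, flag)."""
    ser = math.sqrt(residual_mean_square)    # standard error of residuals
    rows = []
    for obs, pred in zip(observed, predicted):
        r = obs - pred
        sr = r / ser
        size = abs(sr)
        if size >= 4:
            flag = "****"
        elif size >= 3:
            flag = "***"
        elif size >= 2:
            flag = "**"
        else:
            flag = ""
        rows.append((obs, pred, r, sr, flag))
    return rows

# Made-up values; the residual mean square would come from the regression.
rows = residual_table([10.0, 12.0, 20.0], [10.5, 11.0, 11.0], 4.0)
```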
Typical Program Flow

■ Request a residual analysis upon completion of regression
■ Print out residuals
■ Plot residuals

Special Considerations

Range of Standardized Residuals

The standardized residuals are plotted in a range from -5 to 5. If any standardized residuals are outside this range they will not be plotted, but a note showing the number of residuals off scale will be added to the plot.

Significance of Residuals

The last column in the residual table output shows which residuals are significantly large. In this column, two asterisks are printed for standardized residuals between two and three standard deviations away from zero. Similarly, three asterisks are printed for standardized residuals between three and four standard deviations away from zero, and four asterisks are printed for standardized residuals four or more standard deviations away from zero.

Distance Between X Tic Marks When Plotting

The first tic mark will coincide with the minimum X value. Every tic mark will be labeled. Hence, an uncluttered plot would contain no more than 10 tic marks.

Methods and Formulae

Suppose you wish to fit a regression model of the form:

Y = B0 + B1*X1 + B2*X2

where B0, B1, and B2 are the regression coefficients. We will call the nth predicted value for Y y(n), the nth residual r(n), and the Jth observation of the Ith variable D(I,J). We would then calculate the following:

1. Predicted Y: y(n) = b0 + b1*D(X1,n) + b2*D(X2,n), where b0, b1, and b2 are the estimated regression coefficients.
2. Residual: r(n) = D(Y,n) - y(n)
3. Standard error of residuals: Ser = (residual mean square)^0.5, where the residual mean square is calculated in the regression routine.
4. Standardized residual: SR(n) = r(n)/Ser

The residuals for a nonlinear regression are derived in a similar manner except that the nonlinear regression model is used to predict Y.

Example 1: Multiple Linear Regression

The data below will illustrate Multiple Linear Regression.
The data consists of three variables: X1, X2, and the dependent variable Y.

Are you going to use user defined transformation or do Non-linear regression? (Y/N)
NO
Are you using an HP-IB Printer?
YES
Enter select code, bus address (if 7,1 press CONT)?

* DATA MANIPULATION *

Enter DATA TYPE:
1    Raw data
Mode number = ?    Stored on mass storage
Is data stored on the program's scratch file (DATA)?
YES    Previously stored on scratch data file.

EXAMPLE OF MULTIPLE LINEAR REGRESSION
Data file name: DATA
Data type is: Raw data
Number of observations: 9
Number of variables: 6
Variable names:
1. X1
2. X2
3. Y
4. X1^2
5. X2^2
6. X1*X2
Subfiles: NONE

Note: X4, X5, and X6 are derived from X1 and X2 by transformations.

SELECT ANY KEY    Select special function key labeled-LIST
Option number = ?    List all the data
Enter method for listing data:    In tabular form

MULTIPLE LINEAR REGRESSION EXAMPLE
Data type is: Raw data

        Variable # 1   Variable # 2   Variable # 3   Variable # 4   Variable # 5
        (X1)           (X2)           (Y)            (X1^2)         (X2^2)
OBS#
  1       7.80000        4.00000        0.00000        60.84000       16.00000
  2       7.80000        8.00000         .03100        60.84000       64.00000
  3       7.80000       12.00000         .47500        60.84000      144.00000
  4      39.00000        4.00000         .01600      1521.00000       16.00000
  5      39.00000        8.00000    8.000000E-03     1521.00000       64.00000
  6      39.00000       12.00000         .19000      1521.00000      144.00000
  7      78.00000        4.00000        0.00000      6084.00000       16.00000
  8      78.00000        8.00000         .03900      6084.00000       64.00000
  9      78.00000       12.00000        0.00000      6084.00000      144.00000

        Variable # 6
        (X1*X2)
OBS#
  1      31.20000
  2      62.40000
  3      93.60000
  4     156.00000
  5     312.00000
  6     468.00000
  7     312.00000
  8     624.00000
  9     936.00000

For this data set only X1, X2, and Y need be typed in. When this is done, select the transformation key on the template. To get X1^2, choose option 1 allowing a=1, b=2, and c=0. This creates a new variable X1^2. The same is done to obtain X2^2. To obtain X1*X2, choose option 10 allowing a=1, b=1, and c=1. Once you have all these variables, store them by using the Store key on the template.

Option number = ?    Exit from the List routine.
SELECT ANY KEY    Select special function key labeled-STATS
What statistic options are desired ?
1    Select just the mean, ci, variance, standard deviation, skewness, and kurtosis of all the data variables.
VARIABLES = ?
ALL
Confidence coefficient for confidence interval on the mean (e.g. 90, 95, 99%) =
95    95% ci for means requested.

* SUMMARY STATISTICS *
* ON DATA SET: *
* MULTIPLE LINEAR REGRESSION EXAMPLE *

BASIC STATISTICS

VARIABLE   # OF   # OF
NAME       OBS.   MISS          SUM           MEAN         VARIANCE      STD. DEV.
X1           9      0       374.40000       41.60000       927.81000      30.45997
X2           9      0        72.00000        8.00000        12.00000       3.46410
Y            9      0         .75900          .08433          .02506        .15832
X1^2         9      0     22997.52000     2555.28000   7403936.57637    2721.01756
X2^2         9      0       672.00000       74.66667      3136.00000      56.00000
X1*X2        9      0      2995.20000      332.80000     90043.20000     300.07199

VARIABLE   COEFFICIENT     STD. ERROR     95 % CONFIDENCE INTERVAL
NAME       OF VARIATION    OF MEAN        LOWER LIMIT    UPPER LIMIT
X1            73.22109       10.15332       18.18009       65.01991
X2            43.30127        1.15470        5.33654       10.66346
Y            187.72946         .05277        -.03739         .20606
X1^2         106.48608      907.00585      463.15784     4647.40216
X2^2          75.00000       18.66667       31.60967      117.72366
X1*X2         90.16586      100.02400      102.08217      563.51783

VARIABLE   SKEWNESS    KURTOSIS
X1           .13506    -1.50000
X2          0.00000    -1.50000
Y           1.93769     2.29099
X1^2         .53922    -1.50000
X2^2         .29480    -1.50000
X1*X2        .88424      .26334

What statistic options are desired ?
2    Request the correlation matrix of all the data variables.
VARIABLES = ?
ALL

* SUMMARY STATISTICS *
* ON DATA SET: *
* MULTIPLE LINEAR REGRESSION EXAMPLE *

CORRELATION MATRIX

           X1          X2          Y           X1^2        X2^2
X2      0.0000000
Y       -.4209438    .5916875
X1^2     .9747877   0.0000000   -.3905355
X2^2    0.0000000    .9897433    .6250961   0.0000000
X1*X2    .8120711    .4802402   -.2314209    .7915969    .4753145

What statistic options are desired ?    Gives median, mode, percentiles, min, max, and range of all the data.
VARIABLES = ?
ALL

* SUMMARY STATISTICS *
* ON DATA SET: *
* MULTIPLE LINEAR REGRESSION EXAMPLE *

ORDER STATISTICS

VARIABLE     MAXIMUM       MINIMUM        RANGE        MIDRANGE
X1           78.00000       7.80000      70.20000      42.90000
X2           12.00000       4.00000       8.00000       8.00000
Y              .47500       0.00000        .47500        .23750
X1^2       6084.00000      60.84000    6023.16000    3072.42000
X2^2        144.00000      16.00000     128.00000      80.00000
X1*X2       936.00000      31.20000     904.80000     483.60000

TUKEY'S HINGES

VARIABLE      MEDIAN      25-th %-ile   75-th %-ile
X1           39.00000       7.80000      39.00000
X2            8.00000       4.00000       8.00000
Y              .01600        .00000        .03100
X1^2       1521.00000      60.84000    1521.00000
X2^2         64.00000      16.00000      64.00000
X1*X2       312.00000      93.60000     312.00000

TUKEY'S MIDDLEMEANS

VARIABLE     MIDMEAN       TRIMEAN      MIDSPREAD
X1           40.56000      31.20000      31.20000
X2            8.00000       7.00000       4.00000
Y              .01880        .01575        .03100
X1^2       2141.56800    1155.96000    1460.16000
X2^2         70.40000      52.00000      48.00000
X1*X2       268.32000     257.40000     218.40000

Other percentiles?
NO
What statistic options are desired ?    Exit Basic Statistics routine.

Note: All three sets of statistics could have been selected originally by answering ALL to the option question.

SELECT ANY KEY    Select special function key labeled-ADV STATS. Remove BSDM medium. Insert regression medium.
Option number = ?
5    Multiple linear regression.
Number of the dependent variable = ?
3    Y = variable 'Y'
Which of the remaining variables should be included in the regression ?
ALL    X1, X2, X1^2, X2^2, and X1*X2
Is above information correct?
YES    Displayed on CRT

MULTIPLE LINEAR REGRESSION ON DATA SET:
MULTIPLE LINEAR REGRESSION EXAMPLE
--where: Independent variable(s) = (1)X1 (2)X2 (4)X1^2 (5)X2^2 (6)X1*X2

                                               STANDARD       COEFF. OF
VARIABLE    N       MEAN         VARIANCE      DEVIATION      VARIATION
X1          9      41.60000       927.81000      30.45997      73.22109
X2          9       8.00000        12.00000       3.46410      43.30127
X1^2        9    2555.28000   7403936.57637    2721.01756     106.48608
X2^2        9      74.66667      3136.00000      56.00000      75.00000
X1*X2       9     332.80000     90043.20000     300.07199      90.16586
Y           9       .08433          .02506         .15832     187.72946

CORRELATION MATRIX

           X1          X2          X1^2        X2^2        X1*X2
X2      0.0000000
X1^2     .9747877   0.0000000
X2^2    0.0000000    .9897433   0.0000000
X1*X2    .8120711    .4802402    .7915969    .4753145
Y       -.4209438    .5916875   -.3905355    .6250961   -.2314209

ANALYSIS OF VARIANCE TABLE

SOURCE        DF   SUM OF SQUARES   MEAN SQUARE   F-VALUE
TOTAL          8       .20052
REGRESSION     5       .17769          .03554        4.67
  X1           1       .03553          .03553        4.67
  X2           1       .07020          .07020        9.23
  X1^2         1       .00158          .00158         .21
  X2^2         1       .01531          .01531        2.01
  X1*X2        1       .05507          .05507        7.24
RESIDUAL       3       .02283          .00761

R-SQUARED = .88615
STANDARD ERROR OF ESTIMATE = .0872327012721

From the AOV table we see that the additional sum of squares for each variable produces a 'reasonable' F except X4 and X5.

              REGRESSION COEFFICIENTS            STANDARD ERROR
VARIABLE      STD. FORMAT   E-FORMAT             REG. COEFFICIENT   T-VALUE
'CONSTANT'      -.00218     -.218154219795E-02       .25209           -.01
X1               .00247      .246964177292E-02       .00517            .48
X2              -.02576     -.257643442623E-01       .06364           -.40
X1^2             .00002      .231329291158E-04       .00005            .46
X2^2             .00547      .546875000000E-02       .00386           1.42
X1*X2           -.00083     -.833990121900E-03       .00031          -2.69

Note: All but the last T values are very small. Not a very good model.

Confidence coefficient (e.g., 90,95,99) = ?
95

                              95 % CONFIDENCE INTERVAL
VARIABLE      COEFFICIENT     LOWER LIMIT    UPPER LIMIT
'CONSTANT'      -.00218         -.72581         .72145
X1               .00247         -.01237         .01731
X2              -.02576         -.20845         .15692
X1^2             .00002         -.00012         .00017
X2^2             .00547         -.00560         .01653
X1*X2           -.00083         -.00172         .00006

Residual analysis and/or prediction ?
YES
Print out residuals ?
YES

TABLE OF RESIDUALS

                                                   STANDARDIZED
OBS#   OBSERVED Y   PREDICTED Y    RESIDUAL        RESIDUAL       SIGNIF.
  1       .00000      -.02309        .02309          .26468
  2       .03100       .11033       -.07933         -.90944
  3       .47500       .41876        .05624          .64476
  4       .01600      -.01634        .03234          .37073
  5       .00800       .01300       -.00500         -.05732
  6       .19000       .21734       -.02734         -.31342
  7       .00000       .05543       -.05543         -.63541
  8       .03900      -.04533        .08433          .96676
  9       .00000       .02890       -.02890         -.33135

Durbin-Watson Statistic: 2.8245975174    For test for autocorrelation of residuals.

Residual plots ?
YES    Residual Plots
Would you like to Plot on CRT ?
NO
Plotter identifier string (press CONT if 'HPGL') ?    Press CONTINUE
Plotter select code, Bus # = (defaults are 7,5) ?    Press CONTINUE
Residual plot option no. = ?
1    Plot residuals vs time sequence.
For plotting, X-min = ?
1
For plotting, X-max = ?
9
Distance between X-ticks = ?
1
# of decimals for labelling X-axis (<=7) = ?
Number of pen color to be used ?
1
Is above information correct ?
YES

[Plot: EXAMPLE OF MULTIPLE LINEAR REGRESSION. Standardized residual (scale -5 to 5) versus SEQUENCE #.]

Residual plots ?    Exit from residual plots.
Option number = ?    Return to BSDM.

Example 2: Stepwise Regression

The data shown below is the same as used in Multiple Linear Regression. Following the data are the results from the stepwise and backward selection procedures.

Are you going to use user defined transformation or do Non-linear regression? (Y/N)
NO
Are you using an HP-IB Printer?
YES
Printer select code, bus address = ?
Enter select code, bus address (if 7,1 press CONT)?

* DATA MANIPULATION *

Enter DATA TYPE:
1    Raw data
Mode number = ?
2    Stored on mass storage
Is data stored on the program's scratch file (DATA)?
YES    Previously stored. Same as MLR example.
EXAMPLE OF STEPWISE LINEAR REGRESSION
Data file name: DATA
Data type is: Raw data
Number of observations: 9
Number of variables: 6
Variable names:
1. X1
2. X2
3. Y
4. X1^2
5. X2^2
6. X1*X2
Subfiles: NONE

SELECT ANY KEY    Select special function key labeled-LIST
Option number = ?
1    List all the data.
Enter method for listing data:
3    In tabular form.

EXAMPLE OF STEPWISE LINEAR REGRESSION
Data type is: Raw data

        Variable # 1   Variable # 2   Variable # 3   Variable # 4   Variable # 5
        (X1)           (X2)           (Y)            (X1^2)         (X2^2)
OBS#
  1       7.80000        4.00000        0.00000        60.84000       16.00000
  2       7.80000        8.00000         .03100        60.84000       64.00000
  3       7.80000       12.00000         .47500        60.84000      144.00000
  4      39.00000        4.00000         .01600      1521.00000       16.00000
  5      39.00000        8.00000    8.000000E-03     1521.00000       64.00000
  6      39.00000       12.00000         .19000      1521.00000      144.00000
  7      78.00000        4.00000        0.00000      6084.00000       16.00000
  8      78.00000        8.00000         .03900      6084.00000       64.00000
  9      78.00000       12.00000        0.00000      6084.00000      144.00000

        Variable # 6
        (X1*X2)
OBS#
  1      31.20000
  2      62.40000
  3      93.60000
  4     156.00000
  5     312.00000
  6     468.00000
  7     312.00000
  8     624.00000
  9     936.00000

This is the same data set that was used for multiple linear regression. Refer to that example for instructions on how to form X1^2, X2^2, and X1*X2.

Option number = ?    Exit the List routine.
SELECT ANY KEY    Select special function key labeled-ADV STATS. Remove BSDM disc. Insert Regression Medium.
Option number = ?
2    Stepwise regression
Procedure number = ?
1    Choose the stepwise algorithm.
Tolerance value (i.e. .01, .001) = ?
.1    Input tolerance value.
F-value for inclusion = ?    F-to-enter: an F-value with 1 and n-k degrees of freedom, where k = expected number of coefficients in model.
F-value for deletion = ?    F-to-delete
Is above information correct?
YES
Number of dependent variable = ?    Variable 3 = Y, with all others used as X's.
Which remaining variables desired in regression?
ALL
Is above information correct?
YES    Information on CRT

Note: We used F enter = F delete, a common practice. Also, for n = 9 we probably should have used a much larger F. We definitely do not recommend small sample sizes except as examples.

STEPWISE REGRESSION on DATA SET:
EXAMPLE OF STEPWISE LINEAR REGRESSION

Dependent variable: (3)Y
Independent variable(s): (1)X1 (2)X2 (4)X1^2 (5)X2^2 (6)X1*X2
Tolerance = .1
F-value for inclusion =
F-value for deletion =
Method number = ?

The stepwise algorithm can enter or delete variables at a step. This example does not show any variables which are deleted.

CORRELATION MATRIX

           X1          X2          X1^2        X2^2        X1*X2       Y
X1      1.0000000
X2      0.0000000   1.0000000
X1^2     .9747877   0.0000000   1.0000000
X2^2    0.0000000    .9897433   0.0000000   1.0000000
X1*X2    .8120711    .4802402    .7915969    .4753145   1.0000000
Y       -.4209438    .5916875   -.3905355    .6250961   -.2314209   1.0000000

STEP NUMBER 0

               F TO    PART            F TO     REGRESSION COEFFICIENTS            STD
#--VARIABLE    ENTER   CORR    TOL     DELETE   STD. FORMAT   E-FORMAT             ERROR
1.X1            1.51   .421   1.000
2.X2            3.77   .592   1.000
4.X1^2          1.26   .391   1.000
5.X2^2          4.49   .625   1.000
6.X1*X2          .40   .231   1.000

Var. 5 has the largest F-value and correlation, so it is the variable to enter the model.

STEP NUMBER 1    VARIABLE 'X2^2' ADDED
R-SQUARED = .39075

Analysis of Variance Table
SOURCE        DF   SUM OF SQUARES   MEAN SQUARE   F-VALUE
TOTAL          8       .20052
REGRESSION     1       .07835          .07835        4.49
RESIDUAL       7       .12217          .01745

STANDARD ERROR = .132107402855

               F TO    PART            F TO     REGRESSION COEFFICIENTS            STD
#--VARIABLE    ENTER   CORR    TOL     DELETE   STD. FORMAT   E-FORMAT             ERROR
1.X1            2.46   .539   1.000
2.X2             .37   .242    .020
4.X1^2          2.00   .500   1.000
5.X2^2                                  4.49      .00177      .176721938776E-02
6.X1*X2         8.72   .770    .774
Constant = -.047619047619

Var 6 has the largest F-value and correlation, so it is the variable to enter the model.

STEP NUMBER 2    VARIABLE 'X1*X2' ADDED
R-SQUARED = .75163

Analysis of Variance Table
SOURCE        DF   SUM OF SQUARES   MEAN SQUARE   F-VALUE
TOTAL          8       .20052
REGRESSION     2       .15072          .07536        9.08
RESIDUAL       6       .04980          .00830

STANDARD ERROR = .0911067112552

               F TO    PART            F TO     REGRESSION COEFFICIENTS            STD
#--VARIABLE    ENTER   CORR    TOL     DELETE   STD. FORMAT   E-FORMAT             ERROR
1.X1            4.71   .696    .148
2.X2             .45   .286    .020
4.X1^2          4.53   .689    .190
5.X2^2                                 16.86      .00268      .268474330203E-02    .0007
6.X1*X2                                 8.72     -.00036     -.360245767615E-03    .0001
Constant = .0037622915776

Var 1 has the largest F-value and correlation, so it is the variable to enter the model.

STEP NUMBER 3    VARIABLE 'X1' ADDED
R-SQUARED = .87206

Analysis of Variance Table
SOURCE        DF   SUM OF SQUARES   MEAN SQUARE   F-VALUE
TOTAL          8       .20052
REGRESSION     3       .17486          .05829       11.36
RESIDUAL       5       .02565          .00513

STANDARD ERROR = .0716294324428

               F TO    PART            F TO     REGRESSION COEFFICIENTS            STD
#--VARIABLE    ENTER   CORR    TOL     DELETE   STD. FORMAT   E-FORMAT             ERROR
1.X1                                    4.71      .00469      .468749152939E-02    .0022
2.X2             .20   .220    .020
4.X1^2           .26   .248    .050
5.X2^2                                 25.76      .00396      .395611766121E-02    .0008
6.X1*X2                                11.89     -.00086     -.859423200316E-03    .0002
Constant = -.120040391928

None of the remaining variables have an F-value greater than F-To-Enter and none of the variables in the model have an F-value less than F-To-Delete, so the model is complete with X1, X2^2, and X1*X2.

Tolerance value too small and/or F-values insufficient to proceed
Input 'K', delete '-K', or, enter to end regression    No other terms added or removed.

Procedure number = ?
2    Choose the forward (stepwise) algorithm.
Tolerance value (i.e. .01, .001) = ?
.01    Tolerance
F-value for inclusion = ?
4    F-To-Enter (perhaps too small)
Is above information correct?
YES
Number of dependent variable = ?
3    Y = X3
Which of the remaining variables should be used in the regression ?
ALL    All others potential.
Is above information correct ?
YES

Note: No F to remove in FORWARD.

FORWARD REGRESSION on DATA SET:
EXAMPLE OF STEPWISE LINEAR REGRESSION

Dependent variable: (3)Y
Independent variable(s): (1)X1 (2)X2 (4)X1^2 (5)X2^2 (6)X1*X2
Tolerance = .01
F-value for inclusion =
Method number = ?

The forward procedure will only add variables to the model and will stop when no variable has an F to enter larger than 4 (or whatever value you specify).

CORRELATION MATRIX

           X1          X2          X1^2        X2^2        X1*X2       Y
X1      1.0000000
X2      0.0000000   1.0000000
X1^2     .9747877   0.0000000   1.0000000
X2^2    0.0000000    .9897433   0.0000000   1.0000000
X1*X2    .8120711    .4802402    .7915969    .4753145   1.0000000
Y       -.4209438    .5916875   -.3905355    .6250961   -.2314209   1.0000000

STEP NUMBER 0

               F TO    PART            F TO     REGRESSION COEFFICIENTS            STD
#--VARIABLE    ENTER   CORR    TOL     DELETE   STD. FORMAT   E-FORMAT             ERROR
1.X1            1.51   .421   1.000
2.X2            3.77   .592   1.000
4.X1^2          1.26   .391   1.000
5.X2^2          4.49   .625   1.000
6.X1*X2          .40   .231   1.000

The results for this portion of the example will be the same as the stepwise algorithm above.

STEP NUMBER 1    VARIABLE 'X2^2' ADDED
R-SQUARED = .39075

Analysis of Variance Table
SOURCE        DF   SUM OF SQUARES   MEAN SQUARE   F-VALUE
TOTAL          8       .20052
REGRESSION     1       .07835          .07835        4.49
RESIDUAL       7       .12217          .01745

STANDARD ERROR = .132107402855

               F TO    PART            F TO     REGRESSION COEFFICIENTS            STD
#--VARIABLE    ENTER   CORR    TOL     DELETE   STD. FORMAT   E-FORMAT             ERROR
1.X1            2.46   .539   1.000
2.X2             .37   .242    .020
4.X1^2          2.00   .500   1.000
5.X2^2                                  4.49      .00177      .176721938776E-02
6.X1*X2         8.72   .770    .774
Constant = -.047619047619

STEP NUMBER 2    VARIABLE 'X1*X2' ADDED
R-SQUARED = .75163

Analysis of Variance Table
SOURCE        DF   SUM OF SQUARES   MEAN SQUARE   F-VALUE
TOTAL          8       .20052
REGRESSION     2       .15072          .07536        9.08
RESIDUAL       6       .04980          .00830

STANDARD ERROR = .0911067112552

               F TO    PART            F TO     REGRESSION COEFFICIENTS            STD
#--VARIABLE    ENTER   CORR    TOL     DELETE   STD. FORMAT   E-FORMAT             ERROR
1.X1            4.71   .696    .148
2.X2             .45   .286    .020
4.X1^2          4.53   .689    .190
5.X2^2                                 16.86      .00268      .268474330198E-02    .0007
6.X1*X2                                 8.72     -.00036     -.360245767805E-03    .0001
Constant = .0037622915776

STEP NUMBER 3    VARIABLE 'X1' ADDED
R-SQUARED = .87206

Analysis of Variance Table
SOURCE        DF   SUM OF SQUARES   MEAN SQUARE   F-VALUE
TOTAL          8       .20052
REGRESSION     3       .17486          .05829       11.36
RESIDUAL       5       .02565          .00513

STANDARD ERROR = .0716294324428

               F TO    PART            F TO     REGRESSION COEFFICIENTS            STD
#--VARIABLE    ENTER   CORR    TOL     DELETE   STD. FORMAT   E-FORMAT             ERROR
1.X1                                    4.71      .00469      .468749153034E-02    .0022
2.X2             .20   .220    .020
4.X1^2           .26   .248    .050
5.X2^2                                 25.76      .00396      .395611766121E-02    .0008
6.X1*X2                                11.89     -.00086     -.859423200316E-03    .0002
Constant = -.120040391928

Tolerance value too small and/or F-values insufficient to proceed.    The results are the same as in stepwise regression.
Input 'K', delete '-K', or, enter to end regression

Procedure number = ?    Backward (stepwise) algorithm.
Tolerance value (i.e. .01, .001) = ?
.05
F-value for deletion = ?
4    Only an F-To-Delete is required. (Perhaps it should be bigger than 4 with n = 9.)
Is above information correct?
YES
Number of dependent variable = ?
3
Which remaining variables desired in regression
ALL
Is above information correct?
YES

BACKWARD REGRESSION on DATA SET:
EXAMPLE OF STEPWISE LINEAR REGRESSION

Dependent variable: (3)Y
Independent variable(s): (1)X1 (2)X2 (4)X1^2 (5)X2^2 (6)X1*X2
Tolerance = .01
F-value for deletion =
Method number = ?

The backward algorithm puts all the terms in the model and then deletes one at a time until no F to remove is less than the F we specify (F delete = 4).

CORRELATION MATRIX

           X1          X2          X1^2        X2^2        X1*X2       Y
X1      1.0000000
X2      0.0000000   1.0000000
X1^2     .9747877   0.0000000   1.0000000
X2^2    0.0000000    .9897433   0.0000000   1.0000000
X1*X2    .8120711    .4802402    .7915969    .4753145   1.0000000
Y       -.4209438    .5916875   -.3905355    .6250961   -.2314209   1.0000000

STEP NUMBER 0
R-SQUARED = .88615

Analysis of Variance Table
SOURCE        DF   SUM OF SQUARES   MEAN SQUARE   F-VALUE
TOTAL          8       .20052
REGRESSION     5       .17769          .03554        4.67
RESIDUAL       3       .02283          .00761

STANDARD ERROR = .0872327012721

               F TO    PART            F TO     REGRESSION COEFFICIENTS            STD
#--VARIABLE    ENTER   CORR    TOL     DELETE   STD. FORMAT   E-FORMAT             ERROR
1.X1                                     .23      .00247      .246964177292E-02    .0052
2.X2                                     .16     -.02576     -.257643442623E-01    .0636
4.X1^2                                   .21      .00002      .231329291158E-04    .0001
5.X2^2                                  2.01      .00547      .546875000000E-02    .0039
6.X1*X2                                 7.24     -.00083     -.833990121900E-03    .0003
Constant = -.00218154219795

Removes the variable with the smallest F to delete (X2).

STEP NUMBER 1    VARIABLE 'X2' DELETED
R-SQUARED = .87993

Analysis of Variance Table
SOURCE        DF   SUM OF SQUARES   MEAN SQUARE   F-VALUE
TOTAL          8       .20052
REGRESSION     4       .17644          .04411        7.33
RESIDUAL       4       .02408          .00602

STANDARD ERROR = .0775917889132

               F TO    PART            F TO     REGRESSION COEFFICIENTS            STD
#--VARIABLE    ENTER   CORR    TOL     DELETE   STD. FORMAT   E-FORMAT             ERROR
1.X1                                     .34      .00267      .267310640025E-02    .0046
2.X2             .16   .228    .020
4.X1^2                                   .26      .00002      .231329291158E-04   0.0000
5.X2^2                                 21.96      .00396      .395611766121E-02    .0008
6.X1*X2                                10.13     -.00086     -.859423200316E-03    .0003
Constant = -.0953530816668

Removes X4 = X1^2 next.

STEP NUMBER 2    VARIABLE 'X1^2' DELETED
R-SQUARED = .87206

Analysis of Variance Table
SOURCE        DF   SUM OF SQUARES   MEAN SQUARE   F-VALUE
TOTAL          8       .20052
REGRESSION     3       .17486          .05829       11.36
RESIDUAL       5       .02565          .00513

STANDARD ERROR = .0716294324428

               F TO    PART            F TO     REGRESSION COEFFICIENTS            STD
#--VARIABLE    ENTER   CORR    TOL     DELETE   STD. FORMAT   E-FORMAT             ERROR
1.X1                                    4.71      .00469      .468749152939E-02    .0022
2.X2             .20   .220    .020
4.X1^2           .26   .248    .050
5.X2^2                                 25.76      .00396      .395611766121E-02    .0008
6.X1*X2                                11.89     -.00086     -.859423200316E-03    .0002
Constant = -.120040391928

Results are the same as in stepwise regression. But this may not be the case for some data sets.

Tolerance value too small and/or F-values insufficient to proceed.
Input 'K', delete '-K', or, enter to end regression
Procedure number = ?    Exit Stepwise Regression.
Residual analysis and/or prediction?
NO
Option number = ?
7    Return to BSDM.

Example 3: Polynomial Regression

Bus Passenger Service Time

The time required to service boarding passengers at a bus stop was measured together with the actual number of passengers boarding. The service time was recorded from the moment that the bus stopped and the door opened until the last passenger boarded the bus. The objective is to determine a model for predicting passenger service time, given knowledge of the number boarding at a particular stop.
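To make the objective concrete, the following sketch fits a straight line (a degree 1 polynomial) of service time on number boarding by ordinary least squares. It is an illustration in Python, not the library's code; the function name `fit_line` is hypothetical, and the 31 observations are transcribed from the data listing in this example.

```python
# Observations transcribed from the data listing in this example:
# number of passengers boarding and service time in seconds.
NUMBER = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 5, 6, 6, 6, 7, 7,
          8, 8, 8, 9, 10, 11, 11, 13, 17, 19, 25]
TIME = [1.4, 2.8, 3.0, 1.8, 2.0, 4.7, 8.0, 3.0, 2.5, 5.2, 6.2, 9.4,
        11.7, 7.5, 11.9, 13.6, 12.4, 11.6, 14.7, 13.5, 12.0, 14.1, 26.0,
        19.0, 21.2, 22.9, 22.6, 25.2, 33.5, 33.7, 54.2]


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    b0, b1 = fit_line(NUMBER, TIME)
    print("TIME = %.5f + %.5f * NUMBER" % (b0, b1))
```

The fitted coefficients agree with the constant (.58633) and first-degree coefficient (1.99577) that the program prints in the session for this example.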
Let Variable 1 = number boarding and Variable 2- passenger service time. The following data was gathered during the month of May 1968 at twelve downtown locations in Louisville, Kentucky. Are you So in 9 to use user defined transformation or do No n -linear regression ? (Y/N) NO Are you usinq an HP IB Printer 1 ? YES Enter select cadet bus address (if 7)1 press CONT)' * DATA MANIPULATION * ********************************************************#******#**************** Enter- DATA TYPE: 1 Raw data Mode nuMber = :: ? 2 Mass storage Is data stored on the progran's scratch file ''DATA ''7 YES Previously stored on Data File' BUS PASSENGER SERVICE TIME (EXAMPLE OF POLYNOMIAL REGRESSION) Data file nane : DATA Data type is: Raw data NuMber of observations: 31 Nunber of variables: 2 Variable nanes: i . NUMBER X1 = number of passengers boarding a bus. 2, TIME X2 = Y = passenger service time in seconds. Subfiles; NONE SELECT ANY KEY Select special function key labeled-LIST 91 Op t i. on nurtber ~ ? i Enter Method for list. in a data; 3 List all the data. In tabular form. BUS PASSENGER SERVICE TIME (EXAMPLE OF POLYNOMIAL REGRESSION) Data type is: Raw data Variable # 1 Variable # 2 (NUMBER ) (TIME ) OBS# i 1 .00000 1 .40000 2 1,00000 2.80000 3 1 .00000 3.00000 4 i .00000 1 .80000 5 1 ,00000 2,00000 6 2,00000 4,70000 •-? 2.00000 8,00000 8 P.. 00000 3, 00000 9 2.00000 2,50000 10 3.00000 5,2000 15. 3.00000 6,20000 12 3, 00000 9,40000 13 4, 00000 11 ,70000 14 5. 00000 7,50000 IS 5, 00000 11 ,90000 16 6.00000 13.60000 17 6.00000 12,40000 18 6. 00000 1.1 .60 000 19 7.00000 14,70000 20 7. 0-0000 13.50000 21 8,00000 12.00000 22 8,00000 14. 10000 23 8,00000 26.00000 24 9,00000 19. 00000 25 10. 00000 21,20000 26 11 . 00000 22,90 00 27 11,00000 22.60000 28 13.00000 25,20000 29 .17.00000 33.50000 30 19. 00000 33,7000 31 25, 00000 54,20000 Option nunber = ? SELECT ANY KEY Exit List routine. Select special function key labeled-STATS What statistic options are desired ? 
1 VARIABLES = > ALL Confidence coefficient for confidence interval on the ciean ( e . ci . 30 .95 t99'Jf,) = ? Gives the mean, ci, variance, standard, de- viation, skewness, and kurtosis of all the data. 9 5 95%C.l.on means will be developed. 92 ****************************************************** * SUMMARY STATISTICS * * ON DATA SET : * * BUS PASSENGER SERVICE TIME (EXAMPLE OF POLYNOMIAL REGRESSION) * ******************************************************************************** BASIC STATISTICS VARIABLE NAME NUMBER TIME * OF * OF OBS, MISS 31 3i SUM 207. 00000 431 .30000 MEAN 6.67742 i. 3 . 9 1 29 VARIANCE 33.22581 139,39983 STD . DEV . S.7641S! 11 .80677 VARIABLE NAME NUMBER TIME COEFFICIENT OF VARIATION 86.32351 84.86202 STD. ERROR OF MEAN 1 , 03528 2.12056 95 % CONFIDENCE INTERVAL LOWER LIMIT UPPER LIMIT 4.56260 8.79223 9,58113 18.24468 VARIABLE SKEWNESS KURTOSIS NUMBER TIME 1 ,43125 1 , 48977 1 -9079 2 , 55645 What statistic options are desired "> 2 Gives the correlation matrix of all the data. VARIABLES = •) ALL ******************************************************************************** * SUMMARY STATISTICS * * ON DATA SET : * * BUS PASSENGER SERVICE TIME (EXAMPLE OF POLYNOMIAL REGRESSION) * ***********************'********************************************************* CORRELATION MATRIX NUMBER TIME ,9743533 Highly correlated in a linear fashion. What statistic option*; are desired 7 VARIABLES = Gives median, mode, percentiles, min, max, and range of all the data. ALL **************#**************************************************'» ; ************* : * * SUMMARY STATISTICS * * ON DATA SET ; * BUS PASSENGER SERVICE TIME (EXAMPLE Of POLYNOM I A 1 . 
REGRESSION) *
********************************************************************************

ORDER STATISTIC     NUMBER       TIME
MAXIMUM           25.00000   54.20000
MINIMUM            1.00000    1.40000
RANGE             24.00000   52.80000
MIDRANGE          13.00000   27.80000

TUKEY'S HINGES
VARIABLE     MEDIAN   25-th %-ile   75-th %-ile
NUMBER      6.00000       2.00000       8.00000
TIME       11.90000       4.70000      19.00000

TUKEY'S MIDDLEMEANS
VARIABLE    MIDMEAN    TRIMEAN   MIDSPREAD
NUMBER      5.41176    5.50000     6.00000
TIME       11.87059   11.87500    14.30000

Other percentiles ? NO
What statistic options are desired ?    Exit Basic Statistics.
SELECT ANY KEY    Select special function key labeled-ADV STATS. Remove BSDM disc. Insert regression medium.
Option number = ? 3    Polynomial regression selected.
Number of the dependent variable = ? 2
Number of the independent variable = ? 1

POLYNOMIAL REGRESSION ON DATA SET:
BUS PASSENGER SERVICE TIME (EXAMPLE OF POLYNOMIAL REGRESSION)
--where: Dependent variable = (2)TIME
         Independent variable = (1)NUMBER

Is a plot of the regression desired ? YES
Plot on CRT? NO    Plot on an external plotter.
Plotter identifier string (press CONT if 'HPGL') ?
Plotter select code, Bus # = (defaults are 7,5) ?
X-min = ?
X-max = ? 25
Y-min = ?
Y-max = ? 60    Plotting limits specified.
Y-axis crosses X-axis at X = ?
X-axis crosses Y-axis at Y = ?
Distance between X-ticks = ? 5
Distance between Y-ticks = ? 5
# of decimals for labelling X-axis (<=7) = ?
# of decimals for labelling Y-axis = ? 0
Number of pen color to be used ? 1
Is above information correct? YES
Beep will sound when plot is done, then press CONTINUE

[Plot: BUS PASSENGER SERVICE TIME; TIME (vertical axis, 0 to 60) versus NUMBER (horizontal axis, 0 to 25).]

Maximum degree of regression (<=10) ? 1    We specified maximum degree at 1 although we could have chosen a value slightly higher than desired level.

VARIABLE    N       MEAN    VARIANCE   STANDARD DEVIATION   COEFF. OF VARIATION
NUMBER     31    6.67742    33.22581              5.76418              86.32351
TIME       31   13.91290   139.39983             11.80677              84.86202

CORRELATION = .97435

Degree of regression = ? 1    Specify the actual degree of interest.
SELECTED DEGREE OF REGRESSION = 1
R-SQUARED = .94936
STANDARD ERROR OF ESTIMATE = 2.70221890497

ANALYSIS OF VARIANCE TABLE
SOURCE        DF   SUM OF SQUARES   MEAN SQUARE   F-VALUE
TOTAL         30       4181.99484
REGRESSION     1       3970.23722    3970.23722    543.72
  X^1          1       3970.23722    3970.23722    543.72
RESIDUAL      29        211.75762       7.30199

REGRESSION COEFFICIENTS
VARIABLE      STD. FORMAT   E-FORMAT
'CONSTANT'         .58633   .586330096900E+00
X^1               1.99577   .199576699031E+01

Confidence coefficient (e.g., 90,95,99) = ? 95

VARIABLE      STANDARD ERROR REG. COEFFICIENT   T-VALUE
'CONSTANT'                             .74979       .78
X^1                                    .08559     23.32

y = .586 + 2.00X: about two seconds per passenger to board a bus.

                              95 % CONFIDENCE INTERVAL
VARIABLE      COEFFICIENT   LOWER LIMIT   UPPER LIMIT
'CONSTANT'         .58633       -.94752       2.12018
X^1               1.99577       1.82068       2.17086

May not need an intercept term.

Plot regression curve on present graph ? YES
Plot confidence interval of regression line also ? YES
Confidence coefficient (e.g., 90, 95, 99) = ? 95
Same pen color ? YES
Change degree of regression ? NO
Residual analysis and/or prediction ? YES
Print out residuals ? YES

TABLE OF RESIDUALS
                                               STANDARDIZED
OBS#   OBSERVED Y   PREDICTED Y    RESIDUAL       RESIDUAL   SIGNIF
  1       1.40000       2.58210    -1.18210        -.43745
  2       2.80000       2.58210      .21790         .08064
  3       3.00000       2.58210      .41790         .15465
  4       1.80000       2.58210     -.78210        -.28943
  5       2.00000       2.58210     -.58210        -.21541
  6       4.70000       4.57786      .12214         .04520
  7       8.00000       4.57786     3.42214        1.26642
  8       3.00000       4.57786    -1.57786        -.58391
  9       2.50000       4.57786    -2.07786        -.76895
 10       5.20000       6.57363    -1.37363        -.50833
 11       6.20000       6.57363     -.37363        -.13827
 12       9.40000       6.57363     2.82637        1.04594
 13      11.70000       8.56940     3.13060        1.15853
 14       7.50000      10.56517    -3.06517       -1.13431
 15      11.90000      10.56517     1.33483         .49398
 16      13.60000      12.56093     1.03907         .38452
 17      12.40000      12.56093     -.16093        -.05956
 18      11.60000      12.56093     -.96093        -.35561
 19      14.70000      14.55670      .14330         .05303
 20      13.50000      14.55670    -1.05670        -.39105
 21      12.00000      16.55247    -4.55247       -1.68471
 22      14.10000      16.55247    -2.45247        -.90757
 23      26.00000      16.55247     9.44753        3.49621   ***
 24      19.00000      18.54823      .45177         .16718
 25      21.20000      20.54400      .65600         .24276
 26      22.90000      22.53977      .36023         .13331
 27      22.60000      22.53977      .06023         .02229
 28      25.20000      26.53130    -1.33130        -.49267
 29      33.50000      34.51437    -1.01437        -.37538
 30      33.70000      38.50590    -4.80590       -1.77850
 31      54.20000      50.48050     3.71950        1.37646

Durbin-Watson Statistic: 2.09200089648

Note that one observation (#23) seems to have a very large standardized residual.

Residual plots ? YES    Residual plots.
Plot on CRT? NO    An external plotter is used.
Plotter identifier string (press CONT if 'HPGL') ?
Plotter select code, Bus # = (defaults are 7,5)
Residual plot option no. = ? 1    Plot residuals vs time sequence.
For plotting, X-min = ?
For plotting, X-max = ? 35
Distance between X-ticks = ? 5
# of decimals for labelling X-axis (<=7) = ?
Number of pen color to be used ? 1
Is above information correct? YES

[Plot: PASSENGER SERVICE TIME (EXAMPLE OF POLYNOMIAL REG.); standardized residuals versus sequence number.]

Residual plots ? YES
Plotter identifier string (press CONT if 'HPGL')
Plotter select code, bus # (defaults are 7,5) ?
Residual plot option no. = ? 2    Plot residuals vs predicted Y values.
For plotting, X-min = ?
For plotting, X-max = ? 55
Distance between X-ticks = ? 5
# of decimals for labelling X-axis (<=7) = ?
Number of pen color to be used ? 1
Is above information correct? YES

PASSENGER SERVICE TIME (EXAMPLE OF POLYNOMIAL REG.)
[Plot: standardized residuals versus predicted Y.]

Residual plots ? NO
Option number = ?    Return to BSDM

Example 4: Nonlinear Regression

Twenty-five samples of human urine were obtained to determine if a nonlinear model could be developed relating Y = blood concentration of urine (micrograms/1000 cc) to X = time in hours. The data were entered from the keyboard. A "three-exponential" model was tried:

Yhat = B0*exp(-B1*X) + B2*exp(-B3*X) + B4*exp(-B5*X)

and 0.00001 was used as the convergence coefficient.

Notes:
1. The initial estimates were chosen after some experimentation, although their only effect is on the speed of convergence.
2. Every iteration was printed. It is not necessary to have this done.
3. The residuals at the smallest times are larger than those for X (time) near 60 or above. Of course, the largest Y values are associated with the smallest X values.

Are you going to use user defined transformation or do Non-linear regression ? (Y/N)
NO    We have already prepared the file with the function and derivative.
Are you using an HPIB Printer? YES
Enter select code, bus address (if 7,1 press CONT) ?

********************************************************************************
*                              DATA MANIPULATION                               *
********************************************************************************

Enter DATA TYPE: 1    Raw data (data type required)
Mode number = ? 2    From mass storage
Is data stored on the program's scratch file (DATA) ? YES    Data stored in program's storage medium from previous run.

EXAMPLE 1-URINE/BLOOD CONCENTRATION
Data file name: DATA
Data type is: Raw data
Number of observations: 25
Number of variables: 2
Variable names: 1. TIME(HR)   2. BLD.CONT
Subfiles: NONE

SELECT ANY KEY    Select special function key labeled-LIST
Option number = ? 1    List all the data.
Enter method for listing data:    In tabular form.
EXAMPLE 1-URINE/BLOOD CONCENTRATION
Data type is: Raw data

        Variable # 1   Variable # 2
        (TIME(HR))     (BLD.CONT)
OBS#
  1        4.25000     1165.70000
  2        7.50000      851.00000
  3       10.80000      523.00000
  4       12.00000      365.00000
  5       16.00000      294.00000
  6       23.80000      170.00000
  7       27.80000       60.00000
  8       35.30000       81.00000
  9       38.30000       20.00000
 10       45.30000       45.00000
 11       51.30000       27.00000
 12       54.20000       37.00000
 13       59.80000       31.00000
 14       64.25000       26.00000
 15       69.50000       36.00000
 16       78.20000       18.00000
 17       90.20000       10.00000
 18      100.00000        8.20000
 19      105.00000       13.40000
 20      108.00000       17.40000
 21      114.00000        8.00000
 22      120.00000        4.00000
 23      130.00000        6.70000
 24      142.00000        6.70000
 25      154.00000        5.80000

Option number = ?    Exit the List routine.
SELECT ANY KEY    Select special function key labeled-ADV STAT. Remove BSDM medium. Insert the regression medium.
Option number = ?    Select non-linear regression.
Number of the dependent variable = ?    Specify blood content as Y.
How many independent variables will be in the model? 1    One independent variable.
Independent variable numbers (separated by commas) = ?    Specify time in hours as X.
Is above information correct? YES

********************************************************************************
NON-LINEAR REGRESSION ON DATA SET:
URINE/BLOOD CONCENTRATION (EXAMPLE 1 OF NON-LINEAR REGRESSION)
********************************************************************************
--where: Dependent variable = (2)BLD.CONT
         Independent variable(s) = (1)TIME(HR)

# of parameters in the model (<=20) ? 6
Is a plot of the non-linear regression desired YES    Request plot.
Plot on CRT NO    But not on CRT.
Plotter identifier string (press CONT if 'HPGL') ?
Plotter select code, Bus # = (defaults are 7,5) ?
On plotter with select code = 7 and bus code = 5.
Is a quick plot desired ? NO    No quick plot. We will specify our limits.
X-min = ? 4
X-max = ? 160
Y-min = ? 3
Y-max = ? 1170
Y-axis crosses X-axis at X = ? 4
X-axis crosses Y-axis at Y = ? 3
Distance between X-ticks = ? 16
Distance between Y-ticks = ? 120
Xmin = 4, Xmax = 160, Ymin = 3, Ymax = 1170, Xtic interval = 16, Ytic interval = 120. With no decimal points for labelling.
# of decimals for labelling X-axis (<=7) = ?
# of decimals for labelling Y-axis = ?
Number of pen color to be used ? 1
Is above information correct ?
Beep will sound when plot done, then press CONTINUE
File name where subroutines are stored ? FONDER:INTERNAL
Is function medium placed in device? YES
Is program medium placed in device? YES
Enter convergence coefficient (e.g. .005, .001) ? .00001    Convergence criteria on changes in all coefficients. Note .00001 is pretty restrictive.
Initial estimates input at this point.
Initial estimate for parameter # 1 ? 1202.336
Initial estimate for parameter # 2 ? .1083
Initial estimate for parameter # 3 ? 400.3367
Initial estimate for parameter # 4 ? .1083
Initial estimate for parameter # 5 ? 31.4619
Initial estimate for parameter # 6 ? .006716
Is the above information correct? YES

********************************************************************************
Delta (Convergence criteria) = .00001
THE INITIAL VALUES OF PARAMETERS ARE :
PARAMETER 1 = 1202.336
PARAMETER 2 = .1083
PARAMETER 3 = 400.3367
PARAMETER 4 = .1083
PARAMETER 5 = 31.4619
PARAMETER 6 = .006716

Would you like to print out every iteration on hard copy option printer ? YES    Not a good idea if many iterations are expected.
Calcs. may be lengthy. A beep will sound when done. Press key to ABORT!
Calculations may be quite time consuming. A beep will sound when completed.

Note: Estimated values for six coefficients followed by sum of squared residuals.

ITERATION   ESTIMATED PARAMETER VALUES (1 to 6)                                 S.S. RESIDUALS
  0    1202.33600   .10830   400.33670   .10830   31.46190   .00672   42560.6966977
  1    1379.0339    .12722   577.00409   .16513   76.16355   .02198   19113.9722052
  2    1392.99446   .13353   600.25867   .13849   71.83127   .01538   17230.290258?
  3    1395.63956   .14371   603.73447   .12102   76.36979   .01725   17131.1484877
  4    1397.91748   .14022   603.92050   .13013   76.09567   .01722   17001.3543193
  5    1398.50753   .13844   604.30809   .13435   75.57048   .01714   16990.4904512
  6    1398.59945   .13768   604.39161   .13606   75.28321   .01709   16989.3523974
  7    1398.59229   .13736   604.38569   .13672   75.15983   .01707   16989.1984944
  8    1398.58144   .13724   604.37522   .13698   75.10969   .01706   16989.1746607
  9    1398.57589   .13719   604.36975   .13708   75.08959   .01706   16989.1708856
 10    1398.57350   .13717   604.36737   .13713   75.08157   .01706   16989.1702861
 11    1398.57252   .13716   604.36639   .13714   75.07838   .01706   16989.1701889
 12    1398.57212   .13716   604.3659?   .13715   75.07711   .01706   16989.170172?
 13    1398.57196   .13715   604.36583   .13715   75.07660   .01706   16989.1701669
DONE !!!

********************************************************************************
THE ESTIMATED PARAMETER VALUES AFTER 13 ITERATIONS ARE :
PARAMETER 1 = 1398.5719009   ( 1.3985719009E+03)
PARAMETER 2 =     .1371535   ( 1.3715347965E-01)
PARAMETER 3 =  604.3657684   ( 6.0436576836E+02)
PARAMETER 4 =     .1371525   ( 1.3715246328E-01)
PARAMETER 5 =   75.0763988   ( 7.5076398794E+01)
PARAMETER 6 =     .0170560   ( 1.7055987670E-02)

THE INITIAL VALUE OF SUM OF SQUARED RESIDUALS = 42560.6966977
AFTER 13 ITERATIONS THE SUM OF SQUARED RESIDUALS = 16989.1701669
APPROXIMATE STANDARD ERROR FROM SQUARED RESIDUALS = 29.9026228091

Plot regression curve on present GRAPH ? YES    Plot curve to see how good the fit is.
Same pen color? YES
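The summary figures printed for Example 4 can be cross-checked with a few lines of code. The following is a minimal sketch in Python (an assumption; the library itself runs in HP BASIC). It evaluates the fitted three-exponential model at the first observation and recomputes the approximate standard error as sqrt(SSR / (n - p)), with n = 25 observations and p = 6 parameters. That degrees-of-freedom formula is inferred from the printed numbers, not stated by the manual, and the function and variable names are illustrative only.

```python
import math

# Final estimates printed after 13 iterations (Example 4).
B = [1398.5719009, 0.1371535, 604.3657684, 0.1371525, 75.0763988, 0.0170560]


def three_exponential(x, b):
    """Yhat = B0*exp(-B1*X) + B2*exp(-B3*X) + B4*exp(-B5*X)."""
    return (b[0] * math.exp(-b[1] * x)
            + b[2] * math.exp(-b[3] * x)
            + b[4] * math.exp(-b[5] * x))


n_obs, n_params = 25, 6
ssr = 16989.1701669  # sum of squared residuals after 13 iterations

# Assumed formula: standard error = sqrt(SSR / (n - p)).
std_error = math.sqrt(ssr / (n_obs - n_params))

# First observation of the data set: TIME = 4.25 hr, BLD.CONT = 1165.7.
predicted = three_exponential(4.25, B)
residual = 1165.7 - predicted
standardized = residual / std_error  # residual scaled by the standard error

print("standard error       :", round(std_error, 4))
print("predicted Y (obs 1)  :", round(predicted, 3))
print("standardized residual:", round(standardized, 4))
```

Under these assumptions the standard error agrees with the printed 29.9026228091, and the prediction and standardized residual for observation 1 agree closely with the 1188.01983 and -.74642 listed in the example's table of residuals. The same check, with p = 8 or p = 4, reproduces the standard errors printed in Examples 5 and 6.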
[Plot: BLOOD CONCENTRATION versus TIME(HR); data with the fitted regression curve.]

Like to change initial estimates and/or function ? NO    We are satisfied.
Are confidence intervals on parameters desired ? YES    Request confidence intervals.
Confidence coefficient for confidence interval on parameters (e.g. 90,95,99) = ? 95

********************************************************************************
APPROXIMATE 95 % CONFIDENCE INTERVALS ON PARAMETERS

              ONE-AT-A-TIME C.I.           SIMULTANEOUS C.I.
PARAMETER   LOWER LIMIT   UPPER LIMIT   LOWER LIMIT   UPPER LIMIT
    1          790.3196     2006.8242      244.5233     2552.6205
    2             .0762         .1981         .0215         .2528
    3           -3.8858     1212.6173     -549.6815     1758.4130
    4            -.0039         .2782        -.1304         .4047
    5          -33.5338      183.6866     -130.9917      281.1445
    6            -.0073         .0414        -.0292         .0633
********************************************************************************

Residual analysis and/or prediction ? YES
Print out residuals ? YES    Study size and form of residuals.

TABLE OF RESIDUALS
                                               STANDARDIZED
OBS#   OBSERVED Y   PREDICTED Y    RESIDUAL       RESIDUAL   SIGNIF
  1    1165.70000    1188.01983   -22.31983        -.74642
  2     851.00000     782.09103    68.90897        2.30445   **
  3     523.00000     517.81851     5.18149         .17328
  4     365.00000     447.44910   -82.44910       -2.75725   **
  5     294.00000     280.31284    13.68716         .45772
  6     170.00000     126.59139    43.40861        1.45167
  7      60.00000      90.96313   -30.96313       -1.03547
  8      81.00000      56.93086    24.06914         .80492
  9      20.00000      49.54572   -29.54572        -.98806
 10      45.00000      38.68203     6.31797         .21128
 11      27.00000      33.05932    -6.05932        -.20264
 12      37.00000      30.97073     6.02927         .20163
 13      31.00000      27.62273     3.37727         .11294
 14      26.00000      25.39305      .60695         .02030
 15      36.00000      23.09053    12.90947         .43172
 16      18.00000      19.82514    -1.82514        -.06104
 17      10.00000      16.12840    -6.12840        -.20495
 18       8.20000      13.64085    -5.44085        -.18195
 19      13.40000      12.52486      .87514         .02927
 20      17.40000      11.89978     5.50022         .18394
 21       8.00000      10.74190    -2.74190        -.09169
 22       4.00000       9.69684    -5.69684        -.19051
 23       6.70000       8.17622    -1.47622        -.04937
 24       6.70000       6.66290      .03710         .00124
 25       5.80000       5.42969      .37031         .01238

Observations 2 and 4 have fairly large residuals.

Durbin-Watson Statistic: 2.57626883803

Residual plots ? YES    Residual plots: yes.
Plot on CRT? NO    On external plotter.
Plotter identifier string (CONT if 'HPGL') ?
Plotter select code, Bus # = (defaults are 7,5)
Residual plot option no. = ?    Plot residuals vs time/sequence number.
For plotting, X-min = ?
For plotting, X-max = ?
Distance between X-ticks = ?
# of decimals for labelling X-axis (<=7) = ?
Number of pen color to be used ? 1
Is above information correct ? YES

[Plot: BLOOD CONCENTRATION (EXAMPLE 1 OF NON-LINEAR REG.); standardized residuals versus sequence number.]

Residual plots ? NO    Exit residual routine.
Option number = ?    Return to BSDM

Example 5: Nonlinear Regression

An experiment was conducted to determine the relationship between Y = elevation (in centimeters) and X = distance from the summit of a hill. Thirty-four observations were entered from a mass storage device. After viewing the X-Y scatter plot, it appeared that it would be necessary to piece the model together. Hence, the following model suggested itself:

Yhat = polynomial model of degree 2   if X<=65
     = simple linear model            if 65<X<=125
     = polynomial model of degree 2   if X>125

i.e., the model can be written as

Yhat = A0 + A1*X + A2*X^2   if X<=65
     = B0 + B1*X            if 65<X<=125
     = C0 + C1*X + C2*X^2   if X>125

or for the program's purpose:

F = (P(1) + P(2)*X(1) + P(3)*X(1)^2)*(X(1)<=65)
  + (P(4) + P(5)*X(1))*((X(1)>65) AND (X(1)<=125))
  + (P(6) + P(7)*X(1) + P(8)*X(1)^2)*(X(1)>125)

Therefore, we have eight unknown parameters in the model to be estimated.
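The program form of the model works because each comparison acts as a 1/0 multiplier, so exactly one of the three segments contributes for any given X. A minimal sketch of the same idea in Python follows (Python is an assumption here; the manual's expression is HP BASIC). The function name is illustrative, and the parameter list holds the estimates this example's printout reports after five iterations.

```python
def elevation_model(x, p):
    """Piecewise model F of Example 5; p[0]..p[7] stand for P(1)..P(8)."""
    # Each comparison evaluates to True/False, which Python treats as 1/0
    # when multiplied, mirroring the indicator terms in the manual's formula.
    first = (p[0] + p[1] * x + p[2] * x ** 2) * (x <= 65)
    middle = (p[3] + p[4] * x) * (65 < x <= 125)
    last = (p[5] + p[6] * x + p[7] * x ** 2) * (x > 125)
    return first + middle + last


# Estimates printed for this example after five iterations.
P = [997.5110714, -1.0112610, -0.0280357, 1184.0893939,
     -5.7627972, 1826.8938829, -16.6126168, 0.0460476]

# Evaluate on both sides of the two break points (X = 65 and X = 125).
for distance in (0, 65, 70, 125, 130):
    print(distance, round(elevation_model(distance, P), 3))
```

With these estimates the sketch returns values that agree closely with the predicted Y column of the example's table of residuals (for instance 780.69359 at X = 70 and 463.73974 at X = 125). Note that nothing in the model forces the three segments to meet at the break points.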
0.00001 was used as the convergence coefficient. The initial values were obtained by interpolating values on the scatter plot. The chosen values are:

Initial Values:
A0 = 1000    B0 = 1200    C0 = 1826
A1 = -1.0    B1 = -5.8    C1 = -16.0
A2 = -.2                  C2 = .046

After five iterations, the estimated coefficients give a residual sum of squares of about 295 and a very good fit, as can be seen from the plot of the data and the estimated equation. The residual analysis also suggests that the fit is quite good.

Are you going to use user defined transformation or do Non-linear regression ? (Y/N)
NO
Are you using an HPIB Printer? YES    Other printer selected.
Enter select code, bus address (if 7,1 press CONT) ?

********************************************************************************
*                              DATA MANIPULATION                               *
********************************************************************************

Enter DATA TYPE: 1    Raw data (data type required)
Mode number = ? 2    From mass storage
Is data stored on the program's scratch file (DATA)? NO    Data stored on a different medium so it must be retrieved.
Data file name = ? LANDSCAPE:INTERNAL
Was data stored by the BS&DM system ? YES
Is data medium placed in device INTERNAL ? YES
Is program medium placed in device ? YES
PROGRAM NOW STORING DATA ON SCRATCH DATA FILE AND BACKUP FILE

LANDSCAPE SEGMENTS DELINEATION
Data file name: L.ND120 ■ E8 , 1
Data type is: Raw data
Number of observations: 34
Number of variables: 2
Variable names: 1. DISTANCE   2. ELEVATION
Subfile name   beginning observation   number of observations
1. TOP                   1                      15
2. BOTTOM               16                      19

SELECT ANY KEY    Select special function key labeled-LIST
Option number = ? 1    List all the data.
Enter method for listing data:    In tabular form.

LANDSCAPE SEGMENTS DELINEATION
Data type is: Raw data

        Variable # 1   Variable # 2
        (DISTANCE)     (ELEVATION)
OBS#
  1         .00000     1000.00000
  2        5.00000      992.40000
  3       10.00000      985.40000
  4       15.00000      973.30000
  5       20.00000      963.10000
  6       25.00000      952.90000
  7       30.00000      939.60000
  8       35.00000      929.40000
  9       40.00000      912.90000
 10       45.00000      894.50000
 11       50.00000      881.80000
 12       55.00000      864.00000
 13       60.00000      832.90000
 14       65.00000      808.80000
 15       70.00000      779.00000
 16       75.00000      757.40000
 17       80.00000      727.60000
 18       85.00000      691.40000
 19       90.00000      664.10000
 20       95.00000      633.00000
 21      100.00000      605.70000
 22      105.00000      577.10000
 23      110.00000      549.80000
 24      115.00000      518.00000
 25      120.00000      495.10000
 26      125.00000      468.40000
 27      130.00000      446.20000
 28      135.00000      421.40000
 29      140.00000      403.00000
 30      145.00000      390.90000
 31      150.00000      369.30000
 32      155.00000      356.60000
 33      160.00000      347.70000
 34      165.00000      340.10000

Option number = ?    Exit List routine.
SELECT ANY KEY    Remove Basic Statistics. Go to Regression program medium.
Option number = ? 4    Non-linear regression
Subfile (enter 0 to ignore subfiles) = ?
Number of the dependent variable = ?
How many independent variables will be in the model? 1
Independent variable numbers (separated by commas) = ? 1
Is above information correct ? YES

********************************************************************************
NON-LINEAR REGRESSION ON DATA SET:
LANDSCAPE SEGMENTS DELINEATION
********************************************************************************
--where: Dependent variable = (2)ELEVATION
         Independent variable(s) = (1)DISTANCE

# of parameters in the model (<=20) ? 8
Is a plot of the non-linear regression desired YES
Plot on CRT NO    Plot on EXTERNAL plotter.
Plotter identifier string (CONT if 'HPGL') ?
Plotter select code, Bus # = (defaults are 7,5) ?
Is a quick plot desired NO    No quick plot. We specify our limits.
X-min = ?
X-max = ? 165
Y-min = ? 340
Y-max = ?
Y-axis crosses X-axis at X = ?
X-axis crosses Y-axis at Y = ? 340
Distance between X-ticks = ?
Distance between Y-ticks = ? 100
# of decimals for labelling X-axis (<=7) = ? 0
# of decimals for labelling Y-axis = ?
Number of pen color to be used ? 1
Is above information correct ? YES    Plot shown below with overlaid curve.
Beep will sound when plot done, then press CONTINUE
File name where subroutines are stored ? LANDER:INTERNAL
Is data medium placed in device INTERNAL ? YES
Is program medium placed in device ? YES
Enter convergence coefficient (e.g. .005, .001) .0001
Supply initial estimates.
Initial estimate for parameter # 1 ? 1000
Initial estimate for parameter # 2 ? -1
Initial estimate for parameter # 3 ? -.2
Initial estimate for parameter # 4 ? 1200
Initial estimate for parameter # 5 ? -5.8
Initial estimate for parameter # 6 ? 1826
Initial estimate for parameter # 7 ? -16
Initial estimate for parameter # 8 ? .046
Is the above information correct ? YES

Delta (Convergence criteria) = .0001
THE INITIAL VALUES OF PARAMETERS ARE :
PARAMETER 1 = 1000
PARAMETER 2 = -1
PARAMETER 3 = -.2
PARAMETER 4 = 1200
PARAMETER 5 = -5.8
PARAMETER 6 = 1826
PARAMETER 7 = -16
PARAMETER 8 = .046

Would you like to see every iteration ? YES
Calcs. may be lengthy. A beep will sound when done. Press key to ABORT!
Calculations may be quite time consuming. A beep will sound when completed.

First eight values per line are the estimated coefficients. Last is sum of squared residuals.

ITERATION   ESTIMATED PARAMETER VALUES (1 to 8)                                                        S.S. RESIDUALS
  0    1000.00000   -1.00000   -.20000   1200.00000   -5.80000   1826.00000   -16.00000   .0460   1693553.53
  1     994.39986    -.69611   -.03291   1184.69329   -5.76885   1796.78585   -16.20243   .0446       339.21
  2     997.48082   -1.0858    -.02807   1184.08944   -5.76284   1798.11334   -16.22046   .0447       295.84
  3     997.51105   -1.01126   -.02804   1184.08939   -5.76280   1807.15342   -16.34364   .0451       295.74
  4     997.51107   -1.01126   -.02804   1184.08939   -5.76280   1823.35608   -16.56441   .0458       295.65
  5     997.51107   -1.01126   -.02804   1184.08939   -5.76280   1826.81849   -16.6115?   .0460       295.65
DONE !!!

********************************************************************************
THE ESTIMATED PARAMETER VALUES AFTER 5 ITERATIONS ARE :
PARAMETER 1 =  997.5110714   ( 9.9751107143E+02)
PARAMETER 2 =   -1.0112610   (-1.0112609889E+00)
PARAMETER 3 =    -.0280357   (-2.8035714287E-02)
PARAMETER 4 = 1184.0893939   ( 1.1840893939E+03)
PARAMETER 5 =   -5.7627972   (-5.7627972028E+00)
PARAMETER 6 = 1826.8938829   ( 1.8268938829E+03)
PARAMETER 7 =  -16.6126168   (-1.6612616824E+01)
PARAMETER 8 =     .0460476   ( 4.6047611520E-02)
********************************************************************************
THE INITIAL VALUE OF SUM OF SQUARED RESIDUALS = 1693553.53
AFTER 5 ITERATIONS THE SUM OF SQUARED RESIDUALS = 295.649036151
APPROXIMATE STANDARD ERROR FROM SQUARED RESIDUALS = 3.37210865409

Plot regression curve on present graph ? YES    Plot curve on graph.
Same pen color YES

[Plot: LANDSCAPE SEGMENTS DELINEATION; ELEVATION (vertical axis, ticks 340 to 940) versus DISTANCE; data with the fitted regression curve.]

Like to change initial estimates and/or function ? NO
Are confidence intervals on parameters desired ? NO
Residual analysis and/or prediction? YES
Print out residuals ? YES

TABLE OF RESIDUALS
                                               STANDARDIZED
OBS#   OBSERVED Y   PREDICTED Y    RESIDUAL       RESIDUAL   SIGNIF
  1    1000.00000     997.51107     2.48893         .73809
  2     992.40000     991.75387      .64613         .19161
  3     985.40000     984.59489      .80511         .23876
  4     973.30000     976.03412    -2.73412        -.81080
  5     963.10000     966.07157    -2.97157        -.88122
  6     952.90000     954.70723    -1.80723        -.53593
  7     939.60000     941.94110    -2.34110        -.69425
  8     929.40000     927.77319     1.62681         .48243
  9     912.90000     912.20349      .69651         .20655
 10     894.50000     895.23201     -.73201        -.21708
 11     881.80000     876.85874     4.94126        1.46533
 12     864.00000     857.08368     6.91632        2.05104   **
 13     832.90000     835.90684    -3.00684        -.89168
 14     808.80000     813.32821    -4.52821       -1.34284
 15     779.00000     780.69359    -1.69359        -.50223
 16     757.40000     751.87960     5.52040        1.63708
 17     727.60000     723.06562     4.53438        1.34467
 18     691.40000     694.25163    -2.85163        -.84565
 19     664.10000     665.43765    -1.33765        -.39668
 20     633.00000     636.62366    -3.62366       -1.07460
 21     605.70000     607.80967    -2.10967        -.62562
 22     577.10000     578.99569    -1.89569        -.56217
 23     549.80000     550.18170     -.38170        -.11319
 24     518.00000     521.36772    -3.36772        -.99870
 25     495.10000     492.55373     2.54627         .75510
 26     468.40000     463.73974     4.66026        1.38200
 27     446.20000     445.45833      .74167         .21994
 28     421.40000     423.40833    -2.00833        -.59557
 29     403.00000     403.66071     -.66071        -.19594
 30     390.90000     386.21548     4.68452        1.38920
 31     369.30000     371.07262    -1.77262        -.52567
 32     356.60000     358.23214    -1.63214        -.48401
 33     347.70000     347.69405      .00595         .00177
 34     340.10000     339.45833      .64167         .19029

Durbin-Watson Statistic: 1.51322482175    Test statistic for autocorrelation of residuals. Special tables are necessary.

Residual plots ? YES    Residual plots.
Plot on CRT? NO    Plot on external plotter.
Plotter identifier string (CONT if 'HPGL') ?
Plotter select code, Bus # = (defaults are 7,5)
Residual plot option no. = ? 1    Plot residuals vs time sequence.
For plotting, X-min = ?
For plotting, X-max = ? 35
Distance between X-ticks = ?
# of decimals for labelling X-axis (<=7) = ?
Number of pen color to be used ? 1
Is above information correct ? YES

[Plot: LANDSCAPE SEGMENTS DELINEATION; standardized residuals versus sequence number.]

Residual plots ? YES
Would you like to plot on CRT ? NO
Plotter identifier string (CONT if 'HPGL') ?
Plotter select code, bus # (defaults are 7,5) ?
Residual plot option no. = ?    Plot residuals vs predicted Y.
For plotting, X-min = ? 300
For plotting, X-max = ? 1000
Distance between X-ticks = ? 100
# of decimals for labelling X-axis (<=7) = ?
Number of pen color to be used ? 1
Is above information correct ?

[Plot: LANDSCAPE SEGMENTS DELINEATION; standardized residuals versus predicted Y.]

Residual plots ? NO    Exit residual routine.
Option number = ?    Return to BSDM

Example 6: Standard Nonlinear Regression

In this example, standard nonlinear models are fit to the data from Example 4.

Are you going to use user defined transformation or do Non-linear regression ? (Y/N)
NO
Are you using an HPIB Printer? YES
Enter select code, bus address (if 7,1 press CONT) ?

********************************************************************************
*                              DATA MANIPULATION                               *
********************************************************************************

Enter DATA TYPE: 1    Raw data (data type required)
Mode number = ? 2    From mass storage
Is data stored on the program's scratch file (DATA)? YES    Previously stored on program's scratch file called DATA.

URINE/BLOOD CONCENTRATION
Data file name: DATA
Data type is: Raw data
Number of observations: 25
Number of variables: 2
Variable names: 1. TIME(HR)   2. BLD.CONT
Subfiles: NONE

Same data set which we used for nonlinear regression.

SELECT ANY KEY    Select special function key labeled-LIST
Option number = ? 1    List all the data.
Enter method for listing data: 3    In tabular form.

URINE/BLOOD CONCENTRATION
Data type is: Raw data

        Variable # 1   Variable # 2
        (TIME(HR))     (BLD.CONT)
OBS#
  1        4.25000     1165.70000
  2        7.50000      851.00000
  3       10.80000      523.00000
  4       12.00000      365.00000
  5       16.00000      294.00000
  6       23.80000      170.00000
  7       27.80000       60.00000
  8       35.30000       81.00000
  9       38.30000       20.00000
 10       45.30000       45.00000
 11       51.30000       27.00000
 12       54.20000       37.00000
 13       59.80000       31.00000
 14       64.25000       26.00000
 15       69.50000       36.00000
 16       78.20000       18.00000
 17       90.20000       10.00000
 18      100.00000        8.20000
 19      105.00000       13.40000
 20      108.00000       17.40000
 21      114.00000        8.00000
 22      120.00000        4.00000
 23      130.00000        6.70000
 24      142.00000        6.70000
 25      154.00000        5.80000

Option number = ?    Exit List routine.
SELECT ANY KEY    Select special function key labeled-ADV STAT. Remove BSDM medium. Insert regression medium.
Option number = ?    Select standard non-linear regression models.
Number of the regression model = ?    Mixed exponential of form: Y = A*Exp(B*X) + C*Exp(D*X). Note: In the non-linear regression example we specified 3 exponential terms.
Should fitted model include intercept term ? NO
Number of the dependent variable = ?    Y = blood count
Number of the independent variable = ?    X = time in hours
Is above information correct? YES    Displayed on CRT. It is correct.

********************************************************************************
REGRESSION MODELING ON DATA SET:
URINE/BLOOD CONCENTRATION
********************************************************************************
--where: Dependent variable = (2)BLD.CONT
         Independent variable = (1)TIME(HR)

THE STANDARD NON-LINEAR REGRESSION MODEL SELECTED = Y=A*EXP(B*X)+C*EXP(D*X)

Is a plot of the regression desired? YES    Like to see a plot.
Plot on CRT NO    But not on CRT.
Plotter identifier string (CONT if 'HPGL') ?
Plotter select code, Bus # = (defaults are 7,5) ?    On an external plotter at 7,5.
X-min = ?    Specify plotting limits.
X-max = ? 160
Y-min = ?
Y-max = ? 1200
Y-axis crosses X-axis at X = ?
X-axis crosses Y-axis at Y = ?
Distance between X-ticks = ? 16
Distance between Y-ticks = ? 120
# of decimals for labelling X-axis (<=7) = ?
# of decimals for labelling Y-axis = ?
Number of pen color to be used ? 1
Is above information correct ? YES
Is plotter ready ? YES
Are the values of the initial estimates proper?    As shown on CRT and printed out below.
YES

********************************************************************************
Delta (Convergence criteria) = .05
THE INITIAL VALUES OF PARAMETERS ARE :
PARAMETER 1 = 334.489319026
PARAMETER 2 = -3.26684362156E-02
PARAMETER 3 = 33.4489319026
PARAMETER 4 = -1.63342181078E-02

Calcs. may be lengthy. A beep will sound when done. Press 'NO' key to INTERRUPT !
CALCULATIONS STARTED ON 0/0 AT 0:0

              ESTIMATED PARAMETER VALUES
ITERATION        A            B           C           D      S.S. RESIDUALS
    0        334.48932    -.03267    33.44893    -.01633     1137486.3294
    1        767.19521    -.09251   211.31532    -.03542      330889.6618
    2       1593.74645    -.18542   335.76599    -.03753       82374.840
    3       1854.77884    -.13293   214.4524     -.03737       39473.4982
    4       1974.36951    -.14275    12.85302    -.02902       17809.6312
    5       2008.32686    -.13849    78.90472    -.01868       17060.8039
    6       2003.21510    -.1369?    73.85445    -.01677       16989.6350
DONE !!!

********************************************************************************
THE ESTIMATED PARAMETER VALUES AFTER 6 ITERATIONS ARE :
PARAMETER 1 = 2002.9416350   ( 2.0029416350E+03)
PARAMETER 2 =    -.1371748   (-1.3717475809E-01)
PARAMETER 3 =   75.2098521   ( 7.5209852103E+01)
PARAMETER 4 =    -.0170887   (-1.7088737873E-02)
********************************************************************************
THE INITIAL VALUE OF SUM OF SQUARED RESIDUALS = 1137486.32942
AFTER 6 ITERATIONS THE SUM OF SQUARED RESIDUALS = 16989.1765347
APPROXIMATE STANDARD ERROR FROM SQUARED RESIDUALS = 28.4430730832
********************************************************************************

Note: These results, in terms of the sum of squared residuals, are very close to those of the non-linear regression example, which used two more parameters.

Should regression line be plotted on same graph ? YES
Same pen color ? YES

[Plot: URINE/BLOOD CONCENTRATION; BLD.CONT (vertical axis, ticks 120 to 1200) versus TIME(HR); data with the fitted regression curve.]

New initial estimates and/or convergence criteria ? NO    Satisfied with results.
Are confidence intervals on parameters desired ?
YES    Why not get confidence intervals?
Confidence coefficient for confidence interval on parameters (e.g. 90,95,99) = 95

95 % CONFIDENCE INTERVALS ON PARAMETERS

              ONE-AT-A-TIME C.I.           SIMULTANEOUS C.I.
PARAMETER   LOWER LIMIT   UPPER LIMIT   LOWER LIMIT   UPPER LIMIT
    1         1818.4543     2187.4289     1703.9352     2301.9481
    2            -.1593        -.1150        -.1731        -.1013
    3          -42.6712      193.0909     -115.8451      266.2648
    4            -.0433         .0092        -.0596         .0255

Residual analysis and/or prediction ? YES    Residual analysis
Print out residuals? YES

TABLE OF RESIDUALS
                                               STANDARDIZED
OBS#   OBSERVED Y   PREDICTED Y    RESIDUAL       RESIDUAL   SIGNIF
  1    1165.70000    1188.03381   -22.33381        -.78521
  2     851.00000     782.07774    68.92226        2.42317   **
  3     523.00000     517.80218     5.19782         .18274
  4     365.00000     447.43453   -82.43453       -2.89823   **
  5     294.00000     280.30783    13.69217         .48139
  6     170.00000     126.60208    43.39792        1.52578
  7      60.00000      90.97711   -30.97711       -1.08909
  8      81.00000      56.94431    24.05569         .84575
  9      20.00000      49.55743   -29.55743       -1.03918
 10      45.00000      38.68823     6.31177         .22191
 11      27.00000      33.06036    -6.06036        -.21307
 12      37.00000      30.96936     6.03064         .21202
 13      31.00000      27.61708     3.38292         .11894
 14      26.00000      25.38439      .61561         .02164
 15      36.00000      23.07884    12.92116         .45428
 16      18.00000      19.80955    -1.80955        -.06362
 17      10.00000      16.10942    -6.10942        -.21479
 18       8.20000      13.62043    -5.42043        -.19057
 19      13.40000      12.50406      .89594         .03150
 20      17.40000      11.87886     5.52114         .19411
 21       8.00000      10.72091    -2.72091        -.09566
 22       4.00000       9.67600    -5.67600        -.19956
 23       6.70000       8.15597    -1.45597        -.05119
 24       6.70000       6.64379      .05621         .00198
 25       5.80000       5.41199      .38801         .01364

Standardized residual = Residual / Sy.X.
Note: Two large residuals (observations 2 and 4).

Durbin-Watson Statistic: 2.57642711573

Residual plots? YES
Plot on CRT? NO
Plotter identifier string (press CONT if 'HPGL') ?
Plotter select code, Bus # = (defaults are 7,5)
Residual plot option no. = ? 1
For plotting, X-min = ?    Specify limits for residual plot versus sequence #.
For plotting, X-max = ?
Distance between X-ticks =
# of decimals for labelling X-axis (<=7) = ?
Number of pen color to be used 1 Is above information correct? YES URINE/BLOOD CONCENTRATION 123 3. 3 2 a M 0) 1 u a: a N M a a: -1 (C a z -2 ^- <n -3 -4 -5 <S -H g X I » X I v X-X— « x * x x x x * X x in 69 in (VJ in Residual plots ? NO Option number = ? v SEQUENCE *' Exit residual routine. Number of the regression Model -• ? Should fitted model include intercept tern ? YES Number Df the dependent variable = ? Standard non-linear regression models. This time with intercept term. Mixed exponentials This time with an "intercept" term. Y = A*EXP(B*X) + C*EXP(D*X) + E Number of the independent variable = ? Is above information correct? YES 124 «W *X* W ^ J** ^ >£■ ^ ^ »V W ^ ^ \t ^ W ■A' 4 4f ^ ^ J/ ^ ^ si 1 ^ ^ "-if *&/ 4r & >^^\^\if\^^ii/^\i/ -j/ ^/ ^jj- u." ^ ^ *Ji" ^ ^V ^ '4/ '4/ *4f '4t °4/ *4/ "Jf" W W ^V *4f '4/ *4f *ds "J/ ■Ji' ^X r "A - ^ ^ "A - *& v Jf ^ st sir , if ^A* ^ REGRESSION MODELING ON DATA SET: URINE/BLOOD CONCENTRATION ******************************************************************************** --where: Dependent variable =■ (2)BLD.C0NT Independent variable = (i)TIME(HR) THE STANDARD NON-LINEAR REGRESSION MODEL SELECTED =■ Y=A*EXP ( B*X )+C*EXP ( D*X ) + E IN RADIANS Is a plot of the regression desire d '• NO Are the values of the initial est incites proper? No plot this time. YES **************************************************** Delta (Convergence criteria)- OS THE INITIAL VALUES OF PARAMETERS ARE : PARAMETER i ~ 429.35223400? PARAMETER 2 =-.0428279809939 PARAMETER 3 = 42.9352234007 PARAMETER 4 =-.021413990497 PARAMETER 5 = 3.92 Would you liKe a hard copy of e u e r y iteration ? YES Calcs, way be lengthy. A beep will sound when done. Press 'NO' key to INTERRUPT ! CALCULATIONS STARTED ON 0/0 AT 0:0 ITERATION ESTIMATED PARAMETER VALUES S.S. RESIDUALS 429.35223 3,92000 885, 18106 36.47985 1286.23842 20,86657 1243,85321 18,44537 -.04283 9863 -.12132 10655 42 , 93522 211 , 11731 604,50758 824,55159 02141 909776. 
40446 ,0 9760 278063.40271 ,17961 50454.00143 .18667 18850.77414
DONE
********************************************************************************
THE ESTIMATED PARAMETER VALUES AFTER 3 ITERATIONS ARE :
PARAMETER 1= 1218.8442529  ( 1.2188442529E+03 )
PARAMETER 2=    -.1063836  ( -1.0638358805E-01 )
PARAMETER 3=  860.7805920  ( 8.6078059213E+02 )
PARAMETER 4=    -.1848468  ( -1.8484679940E-01 )
PARAMETER 5=   17.7594408  ( 1.7759440852E+01 )

THE INITIAL VALUE OF SUM OF SQUARED RESIDUALS = 909776.404501
AFTER 3 ITERATIONS THE SUM OF SQUARED RESIDUALS= 18803.5777771    (Not as good as before.)
APPROXIMATE STANDARD ERROR FROM SQUARED RESIDUALS= 30.6623366503
********************************************************************************
New initial estimates and/or convergence criteria ? NO
Are confidence intervals on parameters desired ? YES
Confidence coefficient for confidence interval on parameters (e.g. 90,95,99) = 95

95 % CONFIDENCE INTERVALS ON PARAMETERS

             ONE-AT-A-TIME C.I.           SIMULTANEOUS C.I.
PARAMETER    LOWER LIMIT   UPPER LIMIT    LOWER LIMIT   UPPER LIMIT
    1          631.4045     1806.2840       182.0381     2255.6504
    2           -.1322        -.0806         -.1519        -.0609
    3          228.0449     1493.5163      -255.9710     1977.5322
    4           -.3155        -.0542         -.4154         .0457
    5           1.8285       33.6904       -10.3579       45.8768

Residual analysis and/or prediction ? YES
Print out residuals?
YES

TABLE OF RESIDUALS

                                                   STANDARDIZED
OBS#   OBSERVED Y    PREDICTED Y     RESIDUAL      RESIDUAL      SIGNIF
  1    1165.70000    1185.65897    -19.95897        -.65093
  2     851.00000     781.76841     69.23159        2.25787       **
  3     523.00000     521.01910      1.98090         .06460
  4     365.00000     451.45738    -86.45738       -2.81966       **
  5     294.00000     284.66099      9.33901         .30458
  6     170.00000     125.23916     44.76084        1.45980
  7      60.00000      86.12756    -26.12756        -.85211
  8      81.00000      47.53330     33.46670        1.09146
  9      20.00000      39.20569    -19.20569        -.62636
 10      45.00000      27.79845     17.20155         .56100
 11      27.00000      23.02251      3.97749         .12972
 12      37.00000      21.61557     15.38443         .50174
 13      31.00000      19.87723     11.12277         .36275
 14      26.00000      19.07606      6.92394         .22581
 15      36.00000      18.51147     17.48853         .57036
 16      18.00000      18.05704      -.05704        -.00186
 17      10.00000      17.84239     -7.84239        -.25577
 18       8.20000      17.78867     -9.58867        -.31272
 19      13.40000      17.77661     -4.37661        -.14274
 20      17.40000      17.77192      -.37192        -.01213
 21       8.00000      17.76603     -9.76603        -.31850
 22       4.00000      17.76292    -13.76292        -.44885
 23       6.70000      17.76064    -11.06064        -.36072
 24       6.70000      17.75978    -11.05978        -.36070
 25       5.80000      17.75953    -11.95953        -.39004

Durbin-Watson Statistic: 2.36053763341

Residual plots? NO
Option number = ? 7    (Return to BSDM)

Statistical Graphics

General Information

Object of Program

This group of nine programs is designed to give you a graphical representation of your data quickly, on the CRT screen or on an HP-IB peripheral plotter, with a minimum number of questions. Because of the length of the programs, two discs are used to hold the Statistical Graphics routines.

On entry, every program requires only that you specify the variables to be used and how subfiles, if any exist, are to be treated. From then on, all plotting parameters are determined by the program, and a plot may be constructed immediately by selecting the plot option from the plotting characteristics menu.
Once the data has been specified, you have the option of changing nearly all the plotting parameters in order to construct a more personalized plot. This is done by selecting the appropriate option from the plotting characteristics menu. Any time new variables are defined by selecting the "RESTART" option from the menu, all previously defined parameters are reset to default values for the new data set.

To save the plotting characteristics you have specified, select the store option available in the menu. This stores the plotting characteristics in another file of your choice. Then, after you select the restart option, you can retrieve these characteristics by selecting the load option.

Special Considerations

1. Every time you select a graph type, the CRT is declared as the standard plotting device. The device may be changed by selecting the "Select Plotter" option from the plotting characteristics menu.

2. Every program begins its execution by reloading the data contained in the "DATA" file. In the case of the NORMAL PROBABILITY PLOT and WEIBULL PROBABILITY PLOT, the file "DATA" is reloaded every time the "RESTART" option is selected.

3. The "RESTART" option always initializes the plotting parameters to default values as follows:
• The axes labels default to the name of the variable being plotted.
• The graph title contains the first 33 characters of the data set title. If a subfile is declared, the graph title is preceded by the 10-character name given to the subfile.
• The plotting symbol is a plus sign.
• Pen numbers are set to 1.
• The axis range is wide enough to contain the data set and has 10 equally spaced tic marks, with every second tic mark labeled.
• In the special case of a log axis, only complete cycles are plotted; the axis is scaled so that the data fit within entire cycles.
• The graphics device used for plotting is reset to CRT.
Note
After selecting the "OVERLAY" option, new data may be plotted on the previously constructed graph. However, the default values will be in effect for pen number and symbol. These may be changed by selecting the "SELECT PLOTTER" and "SELECT PEN NUMBER" options from the Plotting Characteristics menu.

4. Whenever the program identifies an incorrect response, the question is asked again until a correct response is given.

5. Most plotting symbols are centered on the point they designate. Some special characters, like the period and comma, are plotted in a lower position.

6. The graphics programs allow at most six decimal places for labeling the axis tic marks. Data that would need more should be transformed first.

7. Each program handles missing values in a different way. See the individual programs for details.

8. When a label is requested, an error 18 occurs if the label is too long. To recover, shorten the label and re-enter it.

9. Do not press the "RUN" or "SHIFT-PAUSE" (RESET) keys unless it is necessary. The "RUN" key erases all variables, and RESET may erase memory.

10. To prevent a graph segment from being plotted, assign a pen number of -1 for the CRT or for an external plotter to that segment, using the select pen numbers option.

Note
Statistical Graphics may be entered from any of the other Statistics packages by selecting the Advanced Statistics option. Once in the Statistical Graphics package, select the type of plot you wish to do from the menu provided.

Common Plotting Characteristics

The following options are available for all nine of the plots, so their description and operation are explained in this section. Some of these options work slightly differently for the different plots. These differences are explained in the sections that describe each plot.
It is recommended that you read through the section for the particular plot you wish to do before using the program. Not all plotting characteristics can be changed in each program.

RESTART
When this option is selected, all plotting parameters for the data set are reset to the default values which the program has determined for the data set. At this time a new variable to be plotted may be selected.

PLOT
This option plots the variable(s) being considered. The plot is done on the CRT if no other device has been specified. If you have not specified any plotting parameters, the ones determined by the program are used; otherwise, the plotting characteristics you specified are used. You may choose whether or not to connect the points on most of the graphs, and whether or not to put grid lines on the graph.

X-AXIS
This option allows you to designate the scale for the x-axis. You determine the minimum x value, the maximum x value, the distance between the tic marks on the axis, and how many places after the decimal point you want printed. Since complete cycles on the x-axis are required by the semi-log, log-log, normal and Weibull plots, this option may not be used in those routines.

Y-AXIS
This option allows you to designate the scale for the y-axis. You determine the minimum y value, the maximum y value, the distance between the tic marks on the axis, and how many places after the decimal point you want printed. Since full cycles are required on the y-axis by the log-log and Weibull plots, this option may not be used in those routines.

LABELS
This option allows you to change the labels of your graph. You have an opportunity to change the x-axis label, the y-axis label, and the title of the graph.

SYMBOLS
This option allows you to change the symbol used to designate the points on the graph. If you do not want any symbol, use a blank, which is designated by " ".

DUMP GRAPHICS ON CRT
This option prints the most recent CRT graph on the printer.
This option may be used only if your printer has graphics capabilities (e.g. 2671G, 2631G).

SELECT PLOTTER
This option allows you to select the plotting device on which the plot is drawn: the CRT or an external plotter. You will need to input the select code and bus address, and a plotter identification string.

SELECT PEN COLOR
This option allows you to select the pen number you wish to use for plotting your graph. The pen number may be set separately for axes and numeric labels, grid lines, labels, and points.

OVERLAY
This option, when available, allows you to add another plot of the same type, with new variables, to the previously constructed plot. The plotting limits remain as you have specified.

STORE
This option allows you to store the plotting characteristics that you have specified so that they may be retrieved at a later time. To do this you need to specify a file name and where you wish to store the information.

LOAD
This option allows you to retrieve the plotting characteristics that were stored previously for this type of plot. You need to specify the name of the file and where it was stored. The program then lists the stored plotting characteristics.

RETURN
This option returns the program to the main STATISTICAL GRAPHICS MENU.

Time Plot

Object of Program

This program plots any variable in increasing units of time or sequence number. This plot is useful in determining the effect that time or sequence may have on a variable.

The initial time at which plotting begins and the time period between points can be set by selecting the "X-AXIS" option. If the plot option is selected first, the program defaults to a starting time of 1 and a time increment of 1.

Special Considerations

1. Missing values are not plotted. The value at this time period is left blank.

2.
When doing an overlay of the data, the initial time and time increment are 1 unless changed by selecting the x-axis option. Once the values have been changed, they retain the new values until they are changed again.

Special Plotting Characteristics

X-AXIS
This option allows you to determine the scale for the time axis. You need to specify the minimum and maximum time values, and the distance between tic marks. In addition, you need to specify the initial time for the beginning of the series (the point in time at which plotting begins) and the time increment between points (how much time passes between each plotted point).

OVERLAY
This option allows you to plot another variable over an already constructed graph.

References

1. Exploratory Data Analysis, John W. Tukey; 1977; Addison Wesley.
2. "A Review of Some Smoothing and Forecasting Techniques", T. J. Boardman and M. C. Bryson, Journal of Quality Technology, Volume 10, Number 1, January 1978.

Histogram

Object of Program

This program creates a histogram with up to forty cells. For every data set, the sample mean, the sample variance, the number of cases used to calculate them, and the cell statistics are printed.

Different histograms may be created by specifying the number of cells and the cell locations, or by specifying the number of cells, the location of the first cell, and the cell width. These specifications may be given by selecting the "CELL LIMIT" option from the Plotting Characteristics menu.

A normal curve overlay and the corresponding chi-squared goodness-of-fit statistic may be obtained by selecting the "NORMAL CURVE OVERLAY" option from the Plotting Characteristics menu.

Special Considerations

1. Missing values are not considered in any calculation, and are not considered in constructing any cell.
2. At most forty cells may be obtained.
3. At least four cells are needed to perform a chi-squared goodness-of-fit test.
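The two ways of specifying cells described above can be sketched in Python. This is a minimal illustration, not the program's own code: the function names are hypothetical, and the treatment of a value that falls exactly on a cell boundary (counted in the cell above it, except at the top edge) is an assumption the manual does not state.

```python
def cell_edges_from_limits(n_cells, minimum, maximum):
    """Edges of n_cells equal-width cells spanning minimum..maximum."""
    if not 1 < n_cells <= 40:
        raise ValueError("number of cells must be greater than 1 and at most 40")
    width = (maximum - minimum) / n_cells
    return [minimum + k * width for k in range(n_cells)] + [maximum]


def cell_edges_from_width(n_cells, first_cell_minimum, width):
    """Edges of n_cells cells of the given width, starting at first_cell_minimum."""
    if not 1 < n_cells <= 40:
        raise ValueError("number of cells must be greater than 1 and at most 40")
    return [first_cell_minimum + k * width for k in range(n_cells + 1)]


def cell_counts(values, edges):
    """Observed frequency per cell; None marks a missing value and is skipped."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v is None or v < edges[0] or v > edges[-1]:
            continue
        for k in range(len(counts)):
            is_last = k == len(counts) - 1
            # Assumption: cells are closed on the left, open on the right,
            # except the last cell, which also includes the top edge.
            if edges[k] <= v < edges[k + 1] or (is_last and v == edges[-1]):
                counts[k] += 1
                break
    return counts
```

With seven cells between 9.5 and 56.5, `cell_edges_from_limits` gives a cell width of about 6.714, which agrees with the cell bounds printed in the histogram example later in this manual.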
Special Plotting Characteristics

CELL LIMITS
This option allows you to specify the cell size for the histogram. There are two ways of doing this:

1. Enter the number of cells (greater than 1 but not more than 40), then enter the minimum cell value and the maximum cell value that should be used.
2. Enter the number of cells (greater than 1 but not more than 40), then enter the minimum cell value and the width of the cell.

The program then gives you a list of the number of cells, their minimum and maximum bounds, and the number of observations in each cell.

NORMAL CURVE OVERLAY
This option does a chi-squared goodness-of-fit test of the data. At least four cells must be specified; if they have not been, an error is printed. The descriptive statistics for each cell are printed. The contributions to the chi-squared statistic are added together to get the final value. The cells on the tails are collapsed together until an expected frequency of at least three and less than seven is found, and then the contribution is calculated. If, after collapsing the end cells to get high enough frequencies, the number of terms contributing to the chi-squared value falls below four, another error is printed. Once this is done, the normal curve is plotted over the histogram.

Methods and Formulae

X_i = ith observation of the selected variable that is not a missing value
N = number of valid observations

$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$$

$$\text{Variance} = \frac{\sum_{i=1}^{N} X_i^2 - \left(\sum_{i=1}^{N} X_i\right)^2 / N}{N-1}$$

$$\text{Normal curve overlay} = \frac{100 \cdot (\text{Cell width}) \cdot \exp\left(-\dfrac{(X-\bar{X})^2}{2 \cdot \text{Variance}}\right)}{\sqrt{2\pi \cdot \text{Variance}}}$$

$$\chi^2 = \sum_{i=1}^{\#\,\text{cells}} \frac{(\text{Observed frequency in cell } i - \text{Expected frequency in cell } i)^2}{\text{Expected frequency in cell } i}$$

df = (# of cells) - 3, because one degree of freedom is lost for the number of cells, one for the estimated mean, and one for the estimated variance.
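The mean, variance, and chi-squared formulas above can be checked with a short Python sketch. The function names are hypothetical and the code is an illustration rather than the program's implementation; the observed and expected frequencies in the usage note are the ones printed in the histogram example later in this manual.

```python
def mean_and_variance(values):
    """Sample mean and variance over non-missing values (None = missing)."""
    xs = [v for v in values if v is not None]
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    mean = total / n
    variance = (total_sq - total * total / n) / (n - 1)
    return n, mean, variance


def chi_squared(observed, expected):
    """Per-cell contributions, their sum, and df = (# of cells) - 3."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if len(observed) < 4:
        raise ValueError("at least four cells are needed")
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return contributions, sum(contributions), len(observed) - 3


observed = [4, 18, 22, 10, 11, 9, 9]
expected = [7.658, 10.901, 16.583, 18.448, 15.011, 8.932, 5.466]
contributions, statistic, df = chi_squared(observed, expected)
```

For these seven cells the contributions agree with the printed CONTRIBUTION CHI-SQUARE column to within rounding of the expected frequencies, and the statistic has 4 degrees of freedom.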
The expected frequency of cell i is the area under the normal curve overlay that falls in cell i. It is calculated by determining the left side of cell i (A) and the right side of cell i (B) and finding

$$\Phi\left(\frac{B-\bar{X}}{\text{standard deviation}}\right) - \Phi\left(\frac{A-\bar{X}}{\text{standard deviation}}\right)$$

The following approximation is then used for the area between A and B in a standard normal:

$$\Phi(X) = 1 - Z(X)\left(b_1 t + b_2 t^2 + b_3 t^3 + b_4 t^4 + b_5 t^5\right) + E(X)$$

where

$$|E(X)| < 7.5 \times 10^{-8}$$
$$t = (1 + .2316419\,X)^{-1}$$
$$b_1 = .31938153 \quad b_2 = -.35656378 \quad b_3 = 1.781477937 \quad b_4 = -1.821255978 \quad b_5 = 1.330274429$$

for X > 0, and 1 - Φ(|X|) for X ≤ 0, with

$$Z(X) = \frac{\exp(-X^2/2)}{\sqrt{2\pi}}$$

To calculate the right-tailed probability value associated with the chi-squared value we use

$$P(\chi^2_v > \text{calculated value}) = 1 - \frac{(\chi^2/2)^{v/2} \exp(-\chi^2/2)}{\Gamma\left((v+2)/2\right)} \left\{ 1 + \sum_{r=1}^{R} \frac{\chi^{2r}}{(v+2)(v+4)\cdots(v+2r)} \right\}$$

where
χ² is the calculated value
v is the degree(s) of freedom
Γ(.) is the standard gamma function, with Γ(1.5) = .886226925

The sum is calculated until the percentage of change between two consecutive sums is less than .000001 or R = 40.

The number of cells defaults to the closest integer to the value of the function

$$1 + 3.3 \log_{10}(\text{Number of valid observations})$$

References

1. An Introduction to Statistical Methods and Data Analysis, Lyman Ott; 1977; Wadsworth.
2. Statistics for Modern Business Decisions, Second Edition, Lawrence Lapin; 1978; Harcourt, Brace, Jovanovich.
3. Statistical Analysis for Decision Making, Second Edition, Morris Hamburg; 1977; Harcourt, Brace, Jovanovich.
4. Fundamental Statistics for Business and Economics, Fourth Edition; Neter, Wasserman, and Whitmore; 1973; Allyn and Bacon.
5. Handbook of Mathematical Functions, Abramowitz, Stegun; Fifth Printing; 1965; Dover Publications.

Normal Probability Plot

Object of Program

This program creates normal probability paper, orders the data, and then plots the data on the paper. This plot may be used to indicate whether the data set may have come from a normal distribution.
If a straight line can be made to fit the plotted points, then the data may come from a normal distribution.

Special Considerations

1. Missing values are eliminated from the data, which effectively makes the data set one smaller for each missing value.
2. When plotting more than a hundred points, it is suggested that the period be used as the plotting symbol. This allows for a more even line. Note that the period is plotted lower than the actual value of the point.
3. A maximum of 999 points may be plotted on the graph with the empirical distribution used by the program.

Special Plotting Characteristics

LABELS
This option allows you to change the labels for the y-axis and the title, but not the x-axis.

OVERLAY
This option allows you to plot the normal probability of another variable over the already existing graph.

Methods and Formulae

Empirical Distribution Function (EDF)
X_i is the ith sorted value in the data set; i can go from 1 to N.
N is the number of non-missing values in the data set.

$$\text{EDF}(X_i) = \frac{i}{N+1}$$

For plotting and scaling the X axis, the cumulative distribution function (CDF) is handled by determining EDF(X_i) and then determining X_p:

$$X_p = t - \frac{c_0 + c_1 t + c_2 t^2}{1 + d_1 t + d_2 t^2 + d_3 t^3}$$

where

$$t = \sqrt{\log_e\left(\frac{1}{(\text{EDF}(X))^2}\right)}$$
$$c_0 = 2.515517 \quad c_1 = .802853 \quad c_2 = .010328$$
$$d_1 = 1.432788 \quad d_2 = .189269 \quad d_3 = .001308$$

References

1. Probability Plots for Decision Making, James R. King; 1971; Industrial Press.
2. "Weibull Probability Papers", Wayne Nelson and Vernon C. Thompson, Journal of Quality Technology, Volume 3, Number 2, April 1971.

Weibull Probability Plot

Object of Program

This program creates Weibull probability paper, orders the data, and then plots the data. The number of cycles used to plot the data is determined by the data.

If the plotted data appears to lie on a straight line, the data may come from a Weibull distribution. No attempt is made in the program or on the paper to estimate the parameters of the Weibull distribution.

Special Considerations

1.
Missing values are eliminated from the data, which effectively makes the data set one smaller for each missing value.
2. When more than a hundred points are plotted, it is suggested that the period be used as the plotting symbol. This allows for a more even, narrower line. Note that the period is plotted lower than the actual value of the point.
3. A maximum of 999 points may be plotted on the graph with the empirical distribution used by the program.
4. All data used by this program must be positive. The data is checked and a message is printed if any zero or negative data is found.

Methods and Formulae

Empirical Distribution Function (EDF)
X_i is the ith sorted value in the data set; i can go from 1 to N, where N is the number of non-missing values in the data set.

$$\text{EDF}(X_i) = \frac{i}{N+1}$$

$$\text{Percent Failure} = \log\left(\log\left(\frac{1}{1-\text{EDF}(X_i)}\right)\right)$$

Scattergram

Object of Program

This program plots points on a graph according to the two variables you specify. The plot is useful in determining whether there is any relationship between two variables.

Special Considerations

For any point where either the X or Y coordinate is missing, the point is not plotted.

Semi-Log Plot

Object of Program

This program plots points on a graph where each X value is plotted on a log scale and each Y value is plotted on a linear scale. The number of cycles used on the X axis is determined by the program.

This plot is useful in determining whether any relationship exists between an untransformed Y variable and a log-transformed X variable.

Special Considerations

1. For any point where either the X or Y coordinate is missing, the point is not plotted.
2. All data used for the X variable must be greater than zero.

Log-Log Plot

Object of Program

This program plots points on a graph where both the X and Y axes take on log values. The number of cycles used by each axis is determined by the program.

The plot of the points is useful in determining whether any relationship exists between log-transformed X and Y variables.
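The manual does not say how the programs choose the number of log cycles. One plausible reading of "only complete cycles are plotted" is sketched below in Python, purely as an assumption: the axis runs from the power of ten at or below the smallest value to the power of ten at or above the largest. The function name `log_cycles` is hypothetical.

```python
import math


def log_cycles(values):
    """Return (axis minimum, axis maximum, number of complete cycles).

    Assumption: the axis is widened to whole powers of ten that contain
    the data. Missing values (None) are ignored; the rest must be positive.
    """
    data = [v for v in values if v is not None]
    if not data:
        raise ValueError("no data to plot")
    if min(data) <= 0:
        raise ValueError("all data on a log axis must be positive")
    lower = math.floor(math.log10(min(data)))
    upper = math.ceil(math.log10(max(data)))
    if upper == lower:
        # All data equal to one exact power of ten: show one cycle above it.
        upper = lower + 1
    return 10.0 ** lower, 10.0 ** upper, upper - lower
```

Under this assumption, data ranging from 9.5 to 56.5 would be drawn on a two-cycle axis running from 1 to 100.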
Special Considerations

1. For any point where either the X or Y coordinate is missing, the point is not plotted.
2. All data specified for this program must be positive.

References

1. Exploratory Data Analysis, John W. Tukey; 1977; Addison Wesley.
2. The Statistical Analysis of Experimental Data, John Mandel; Interscience.

3D Plot

Object of Program

This program constructs and draws points in a simulated three-dimensional graph. The axes may be rotated and tilted to show relationships in the data more clearly. An effective XY scatterplot may be obtained by tilting the axes 90 degrees.

The plot looks best when rotation and tilt are between 20 and 70 degrees. At more extreme angles, labeling problems may occur. You may correct some of these problems by adjusting the axis so that fewer tic marks are labeled, and by shortening the axes labels.

Special Considerations

1. For any point where the X, Y, or Z value is missing, the point is not plotted.
2. For long axes titles and various rotation and tilt combinations, the axes titles may overlap or not be entirely plotted.

Special Plotting Characteristics

PLOT
This option plots the three variables that were specified. You need to input the angle of rotation about the z-axis, between zero and ninety degrees, and the angle of elevation, also between zero and ninety degrees, which is the angle between the line drawn from the origin of the axes and the XY plane.

Z-AXIS
This option allows you to designate the scale for the z-axis. It works the same as the options for the X and Y axes.

Methods and Formulae

Mapping from the third dimension to the two dimensions of the plotting device uses the following method.
Given any point (X,Y,Z) we map to the point (A,B) by letting

$$A = \frac{X - X_{min}}{X_{max} - X_{min}} \cos(\text{Rotation}) + \frac{Y - Y_{min}}{Y_{max} - Y_{min}} \left(-\sin(\text{Rotation})\right)$$

and

$$B = \frac{X - X_{min}}{X_{max} - X_{min}} \left[(\cos^2(\text{Rotation}) - 1)\tan(\text{Tilt}/2)\right] + \frac{Y - Y_{min}}{Y_{max} - Y_{min}} \left[(\sin^2(\text{Rotation}) - 1)\tan(\text{Tilt}/2)\right] + \frac{Z - Z_{min}}{Z_{max} - Z_{min}} \cos(\text{Tilt})$$

where Xmin, Xmax, Ymin, Ymax, Zmin, and Zmax are the minimum and maximum values of the axes. Rotation and Tilt are the angles specified for the rotation and tilt of the axes.

Andrew's Plot

Object of Program

This plot takes multidimensional data and plots it on a two-dimensional plotting device in a meaningful way. It does this by mapping the vector X = (X_1, X_2, X_3, ..., X_k) into a function of the form

$$F_X(t) = \frac{X_1}{\sqrt{2}} + X_2 \sin(t) + X_3 \cos(t) + X_4 \sin(2t) + X_5 \cos(2t) + \dots$$

where t is between -π and +π. For further information, see Reference 1.

Special Considerations

1. Up to twenty variables may be used for plotting.
2. Each observation causes one line to be plotted.
3. The order of the variables determines the outcome of the plot.
4. Neither axis may be labeled.
5. The program makes a rough guess at the extremes of the functions being plotted; the guess may be modified by pressing the "YAXIS" special function key.
6. The duration of the plot increases with the number of variables being used.
7. Any time a missing value is encountered for any variable used, the entire observation is deleted. For labeling the lines, the observation number that would have been used to label the line is incremented and used for the next observation.
8. Each line being plotted is broken up into 100 straight-line increments.

Special Plotting Characteristics

PLOT
This option creates the Andrew's Plot. You may choose whether or not to have the first twenty observations labeled. Because this plot constructs one curve for every observation, it takes quite a while to complete the plot for large data sets.
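The function F_X(t) given under "Object of Program" for the Andrew's Plot, together with the 100 straight-line increments mentioned in the Special Considerations, can be sketched in Python as follows. The function names are hypothetical and this is an illustration of the formula, not the program's code; in particular, spacing the 100 increments evenly over -π to +π is an assumption.

```python
import math


def andrews_value(x, t):
    """F_X(t) = X1/sqrt(2) + X2 sin(t) + X3 cos(t) + X4 sin(2t) + X5 cos(2t) + ..."""
    total = x[0] / math.sqrt(2.0)
    for j, xj in enumerate(x[1:], start=2):
        harmonic = j // 2  # X2, X3 use t; X4, X5 use 2t; and so on
        if j % 2 == 0:
            total += xj * math.sin(harmonic * t)
        else:
            total += xj * math.cos(harmonic * t)
    return total


def andrews_curve(x, segments=100):
    """Points (t, F_X(t)) joining `segments` straight-line increments on [-pi, pi]."""
    step = 2.0 * math.pi / segments
    points = []
    for k in range(segments + 1):
        t = -math.pi + k * step
        points.append((t, andrews_value(x, t)))
    return points
```

Each observation (one vector of variable values) yields one such curve, which is why the order of the variables changes the picture and why large data sets take a long time to plot.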
X-AXIS In changing the parameters for the x-axis, the minimum value of x must be between ( — PI) and ( + PI). The maximum value must be between the minimum value and ( + PI). Labels The only label that may be changed is the title. References 1. D. F. Andrews, "Plots of High-Dimensional Data", Biometrics, 28, pp. 125-136, March, 1972. 139 Examples STATISTICAL GRAPHICS EXAMPLES ************************************************ * DATA MANIPULATION * ************************)K*********************************** ) x*)| ( !(c)(c 3 (()|t ! t(**)((*)K)K***!((*)|< Enter DATA TYPE (Press CONTINUE for RAW RATA): Mode nuMber = ? it Is data stored on program's scratch file (DATA)V YES Raw data to be Input From mass storage EGG FUTURE CONTRACTS Data file nawe: DATA Data type is: Raw data NuMber of observations: NuMber of variables: 83 5 Variable nacies: i . ALBUMEN 2. FROZ. ALBU 3. FROZ. EGGS 4 . SHELLEGGS 5. EGG. FUTURE Subfile nawe i. SUBFILE i 2. SUBFILE 2 3. SUBFILE 3 4. SUBFILE 4 SELECT ANY KEY Five variables and names or labels beginning observation nunber of observations i 30 31 12 43 24 67 17 Option nu fiber = ? 1 Enter Method for listing data; 3 Press special function key labeled-LIST All data listed EGG FUTURE CONTRACTS Data type is- Raw data Variable # 1 Variable # 2 Variable * 3 Variable # 4 Variable * 5 (ALBUMEN ) (FROZ. ALBU) (FROZ. EGGS) (SHELLEGGS ) (EGG. FUTURE) DBS* 1 1 .67000 21.20000 2103. 00000 .20000 43.580 2 1.80000 19.60000 2025.00000 .20000 47.90 3 i .990 24.80000 2834.0 00 .30000 47.40 000 4 1.92000 36.60000 4697.00000 .5000 45. 100 00 5 1.92000 49.80000 6842.00000 1.20 000 43. 000 6 2.12000 54.40000 7793.00000 2. 10000 42.850 7 2.34000 53.60000 7920.0 0000 2.30000 42. 15000 8 2.38000 46.60000 6979. 000 2.20 00 40.850 9 2.260 37.30000 5740.00000 1.70000 41.75000 140 10 2. 08000 30.30000 4627.00 000 1 . 1 43 100 00 il 2 06000 23.30000 3392.0 00 .80000 43. 00000 12 2 02000 17.40000 2429.00000 .30000 46.90000 13 1 . 
96000 10.70000 1912.00000 .10 46.450 14 1. 81000 9.50000 1681 .00000 .30000 45. 15000 15 1. 83000 15.50000 2179.00000 .30000 44.70000 16 1 . 61000 25.10000 3425.0 00 .30000 44.50000 17 1 . 53000 38.80000 5294. 0000 .60000 45.40000 18 1 . 55000 50.30000 6464.0 00 1 .20 000 42.80000 19 1 . 42000 51.80000 6431.00000 1 .50000 41 . 00000 £0 1 . 36000 49.6000 5955.000 00 1 30 000 37.00000 21 1. 25000 45.30000 5186.00000 1.00000 37 .00000 22 1. .230 39.80000 4478.0 0000 .70 00 39.50000 23 1 19000 33.80000 3734.00000 .60000 39.75000 24 1 .18000 27.90000 2930.00000 .50000 40.60000 25 1 . .15000 26.40000 2599.00000 .30 000 39.9000 26 1 .16000 23.90000 2527.00000 .30000 40.20 000 27 1 .20000 24.60000 3304.00000 .500 37.55000 28 1 .280 33.10000 4388.00000 .90000 36.60000 29 1 .450 42.80000 5907.00000 i. 20000 36.500 00 30 i .550 53.10000 6836.00000 1 .70 00 34.05000 31 i .3300 56.50000 6769.00000 1 .80000 35.70 32 1 .20000 52.50000 6074.00000 i:SOO0O 35.00000 33 1 .17000 46.50000 5148.00000 1.200 00 34.58000 34 1 .22000 39.50000 4101.00000 .90000 41.25000 35 1 .16000 32.50000 3174.00000 .60000 43.30000 36 1 .05000 25.80000 2329.00000 .30000 43.10000 37 1 .03000 24.20000 1921.00000 .20000 41.65000 38 i .00000 23.00000 1749.00000 .20000 41.70000 39 1 .06000 21.10000 1535.00000 .10000 42.50000 40 1 .07000 25.30000 2176.00000 .10000 43.10000 41 1 .10000 35.20000 3437.00000 .30000 41.05000 42 1 .09000 45.40000 4448.00000 .70000 39.95000 43 .96000 47.50000 4459.00000 .90000 40.15000 44 .91000 44.60000 4103.00000 .70000 37.650 45 .87000 39.7000 3423.00000 .50000 41.75000 46 .80000 32.30 00 2711.00000 .30000 37.80000 47 .80000 26.70000 2112.00000 .20000 36.80000 48 .840 22.20000 1631.00000 .10000 36.00000 49 .88000 19.20000 1249.00000 .10000 36.50000 50 .84000 18.20000 1209.00000 .10000 36.70000 Si .83000 19.70000 1500.00000 .10000 35.70000 52 .83000 26.50000 2687.00000 .10000 32.70000 53 .81000 33.20000 4024.00000 .50000 31.50000 54 .81000 39.90000 4831.00000 1.00000 32.40 000 55 
.81000 38.60000 4739.00000 i .10000 31.250 56 .81000 36.30000 4513.00000 .90000 28.30000 57 .70000 33.20000 3966.00000 .70000 29.00000 58 .74000 28.70 00 3489.0 00 00 .60000 35.35000 59 .84000 24.40000 2732.00000 .50000 34.95000 60 .75000 21.30000 2180.00000 .30000 36.60000 61 .73000 22.70000 2210.00000 .20000 35.80000 62 .67000 22.80000 2322.00000 .30000 34.10000 63 .68000 24.60000 2243.00000 .30000 36.00000 64 .850 26.70000 2580.0 000 .20000 37.850 65 .85000 38.300 3836.00000 .30000 38.60000 66 .88000 48.50000 5086.00000 .80000 35.70000 67 .88000 51.00000 5241.00000 1 .10000 34.950 68 .81000 48.10000 4748.0 0000 1.00000 34.65000 69 .750 42.90 00 4022.00000 .70 000 35.45000 70 .69000 35.10000 3149.00000 .50000 38.50000 71 .68000 28.100 00 2307.0000 .30000 37.00000 72 .71000 22.40000 1700.00000 . 10000 36.35000 73 .75000 19.10000 1456.00000 . 10000 38.15000 74 .76000 16.90000 1282.00000 .10000 38.70000 75 .85000 17.40000 1417.00000 .20000 36.35000 76 .84000 20.00000 1772.00000 .20000 37.00000 77 .95000 25.80000 2578.00000 10 37.150 78 .98000 28.90000 3215.00000 20 000 37.75000 79 .98000 29.10000 3165.00000 4000 38.30000 80 1 .05000 27.30000 3025.00000 30000 38.450 81 i. 00000 22.60000 2746.0 00 30000 36.350 82 .90000 19.80000 2311.000 00 20000 35.00 00 83 .920 15.60000 1853.00000 10000 33.70000 141 Option nunber = ? SELECT ANY KEY Enter nuciber of desired function: 1 Y axis variable nunber? 2 Enter subfile to be used (0 if subfiles ignored) Enter nunber of desired function: 8 Enter option nunber of the graphics device? 2 Plotter identifier string (press CONT if 'HPGL')? Enter select code, bus address (default is 7,5) ? Is the above infornation correct? YES Enter nunber of desired function: 1 Are the points to be connected? YES Are grid lines to be plotted? NO Beep will sound when plot is done then press CONT. To interrupt plotting press STOP key. 
Exit listing options Press special function key labeled-ADV STATS Remove BSDM media Insert Statistical Graphics 1 A media Time Plot Select plotter option Choose external plotter Press CONTINUE Press CONTINUE Plot 142 EGG FUTURE CONTRRCTS TIME Enter noMber of desired function : 4 Change y-axis Y plotting MiniMUM? Y plotting MaxiwuM? 60 Y tic ? 10 Label every Kth tic nark? 1 Nuciber of decimal places to label the Y axis? Enter nunber of desired function: 5 Change labels Enter the Tine axis title <33 characters or less) TIME BY INCREMENTS OF i Enter the Y axis title <33 characters or less) FROZEN ALBUMEN Enter the Graph Title (33 characters or less) FUTURE EGG CONTRACTS Enter nunber of desired function: i Plot Are the points to be connected? YES Are grid lines to be plotted? NO 143 Beep will sound when plot is done then press CONT. To interrupt plotting press STOP key. Enter number of desired function: 10 Y axis variable number? 5 Enter subfile to be used <0 if subfiles ignored) Enter number of desired function: 6 Put double quotes around the blank. A Enter number of desired function: i Are the points to be connected? YES E'eep will sound when plot is done then press CONT. To interrupt plotting press STOP key. Overlay plot Change plotting character Plot FUTURE EGG CONTRRCTS TIME BY INCREMENTS OF 1 Enter number of desired function: 11 Enter file name to store plot characteristics ? CHARS .INTERNAL Store plotting characteristics 144 Is data nediun placed in device INTERNAL ? YES Is PROGRAM MEDIUM replaced in device ? ■> YES Enter nunber of desired function; 13 Enter nuMber of desired function: Return to main graphics menu. Select histogram example HISTOGRAM Variable nunber for creating histogram? 
Variable 2 will be used
Enter subfile to be used (0 if subfiles ignored)
Number of valid cases = 83
The Mean is calculated to be = 31.9313253012
The variance is calculated to be = 140.299006759

CELL   MINIMUM   MAXIMUM   OBSERVED FREQUENCY
1       9.500    16.214     4
2      16.214    22.929    18
3      22.929    29.643    22
4      29.643    36.357    10
5      36.357    43.071    11
6      43.071    49.786     9
7      49.786    56.500     9

Enter number of desired function = 8        Select plotter
Enter option number of the graphics device?
Plotter identifier string (press CONT if 'HPGL')?
Enter select code, bus address (default is 7,5)
Is the above information correct? YES
Enter number of desired function: 1        Plot
Are horizontal grid lines to be plotted? NO
BEEP will sound when plot done then Press CONT.
To interrupt plotting, press STOP key.
Enter number of desired function: 10        Overlay normal curve

CELL   MINIMUM     MAXIMUM    OBSERVED FREQUENCY   EXPECTED FREQUENCY   CONTRIBUTION CHI-SQUARE
1      -Infinity   16.214      4                    7.658               1.748
2      16.214      22.929     18                   10.901               4.623
3      22.929      29.643     22                   16.583               1.770
4      29.643      36.357     10                   18.448               3.869
5      36.357      43.071     11                   15.011               1.072
6      43.071      49.786      9                    8.932                .001
7      49.786      Infinity    9                    5.466               2.284

Press CONT to plot the normal curve overlay
BEEP will sound when plot done then PRESS CONT.

[Plot: EGG FUTURE CONTRACTS; histogram with normal curve overlay; x-axis: FROZ. ALBU]

Enter number of desired function: 13
Enter number of desired function: 3
Variable number? 2
Enter subfile to be used (0 if subfiles ignored)
SORTING THE DATA
Enter number of desired function: 3

Return to main graphics menu
Select normal probability plot
Change y-axis

Y plotting minimum? 5
Y plotting maximum? 60
Y tic? 5
Label every Kth tic mark? 1
Number of decimal places for labeling the Y axis?
Enter number of desired function: 4
Enter the Y axis title (33 characters or less)
FROZEN ALBUMEN
Enter the Graph Title (33 characters or less)
EGG FUTURE CONTRACTS
Enter number of desired function: 7
Enter option number of the graphics device?
2
Plotter identifier string (press CONT if 'HPGL')?
Enter select code, bus address (default is 7,5)
Is the above information correct? YES
Enter number of desired function: 5
Put double quotes around the blank? *
Enter number of desired function: 1
Are grid lines to be plotted? NO
Beep will sound when the plot done then press CONT
To interrupt plotting, press STOP key.

Specify y lower limit
Specify y upper limit
Label every tic mark
Change labels and titles
Select plotter
Change plotting symbol
Plot

[Plot: EGG FUTURE CONTRACTS; NORMAL PROBABILITY PLOT; x-axis: PERCENT UNDER; y-axis: FROZEN ALBUMEN]

Enter number of desired function: 12
Enter number of desired function: 5
X axis variable number? 1
Y axis variable number? 5
Enter subfile to be used (0 if subfiles ignored)
Enter number of desired function: 8
Enter option number of the graphics device? 2
Plotter identifier string (press CONT if 'HPGL')?
Enter select code, bus address (default is 7,5)?
Is the above information correct? YES
Enter number of desired function: 1
Are the points to be connected? NO

Return to main graphics menu
Select scattergram
Select plotter option
Plot

Are grid lines to be plotted? NO
Beep will sound when plot done then press CONT
To interrupt plotting press 'STOP' key.

[Plot: EGG FUTURE CONTRACTS; scattergram; x-axis: ALBUMEN]

Enter number of desired function: 4        Change y-axis for another scattergram
Y plotting minimum? 30
Y plotting maximum? 50
Y tic?
Label every Kth tic mark? 1
Number of decimal places for labeling the Y axis?
Enter number of desired function: 3        Change x-axis
X plotting minimum? .6
X plotting maximum? 2.4
X tic?
Label every Kth tic mark? 1
Number of decimal places for labeling the X axis?
Enter number of desired function: 6
Put double quotes around the blank?
i
Enter number of desired function: 5
Enter the X axis title (33 characters or less)
ALBUMEN
Enter the Y axis title (33 characters or less)
EGG FUTURE
Enter the Graph Title (33 characters or less)
FIRST EGG FUTURE CONTRACTS
Enter number of desired function: 1
Are the points to be connected? NO
Are grid lines to be plotted? NO
Beep will sound when plot done then press CONT.
To interrupt plotting press 'STOP' key.

Change plotting symbol
Change labels
Plot

[Plot: FIRST EGG FUTURE CONTRACTS; scattergram; x-axis: ALBUMEN; y-axis: EGG FUTURE]

Enter number of desired function: 13
Enter number of desired function:

Return to main graphics menu
Select another ADV STAT pac
Remove Statistical Graphics 1A
Insert Statistical Graphics 1B

Enter number of desired function: 3
X axis variable number? 1
Y axis variable number? 3
Z axis variable number? 5
Enter subfile to be used (0 if subfiles ignored) 3
Enter number of desired function: 9
Enter option number of the graphics device? 2
Plotter identifier string (press CONT if 'HPGL')?
Enter select code, bus address (defaults are 7,5)?
IS THE ABOVE INFORMATION CORRECT? YES
Enter number of desired function: 1
Enter angle of rotation in degrees [0<=Angle<=90] 30
Enter angle of elevation in degrees [0<=Angle<=90] 30
Beep will sound when plot done then PRESS CONT.
To interrupt plotting press 'STOP' key.

Select 3-D plot
Plot only for data in subfile 3.
Select plotter
Plot
Rotate plot for easier viewing
Raise angle of elevation

[Plot: SUBFILE 3, EGG FUTURE CONTRACTS; 3-D plot; axes include EGG FUTURE and ALBUMEN]

Enter number of desired function: 14
Enter number of desired function: 4
Number of variables to be used? 5
Enter variable number 1 ? 1
Enter variable number 2 ?
Return to main graphics menu
Select Andrews Plot

Enter variable number 3 ? 3
Enter variable number 4 ? 4
Enter variable number 5 ? 5
Is the above information correct? YES
Enter subfile to be used (0 if subfiles ignored)
Enter number of desired function: 7
Enter option number of the graphics device? 2
Plotter identifier string (press CONT if 'HPGL')?
Enter select code, bus address (default is 7,5)?
Is the above information correct? YES

Plot only data in subfile 2
Select plotter

Enter number of desired function: 1        Plot
Are up to the first twenty lines to be labelled? YES
Beep will sound when plot done then PRESS CONT.
To interrupt the plot press the STOP key

[Plot: SUBFILE 2, EGG FUTURE CONTRACTS; ANDREWS PLOT]

Enter number of desired function: 12        Return to main graphics menu
Enter number of desired function: 1        Select semi-log plot
X axis (LOG AXIS) variable number?
Y axis variable number? 4
Enter subfile to be used (0 if subfiles ignored)
Enter number of desired function: 3        Change y-axis
Y plotting minimum?
Y plotting maximum? 2.4
Y tic? .4
Label every Kth tic mark? 1
Number of decimal places for labeling the Y axis? 1
Enter number of desired function: 4        Change labels
Enter the X axis title (33 characters or less)
FROZEN ALBUMEN
Enter the Y axis title (33 characters or less)
SHELL EGGS
Enter the Graph Title (33 characters or less)
SEMI-LOG PLOT EGG FUTURE DATA
Enter number of desired function: 7
Enter option number of the graphics device? 2
Plotter identifier string (press CONT if 'HPGL')?
Enter select code, bus address (default is 7,5)?
Is the above information correct? YES
Enter number of desired function: 1
Are grid lines to be plotted?
NO
Beep will sound when plot is done then press CONT
To interrupt plotting, press 'STOP' key

Select plotter
Plot

[Plot: SEMI-LOG PLOT EGG FUTURE DATA; x-axis: FROZEN ALBUMEN; y-axis: SHELL EGGS]

Enter number of desired function: 12        Return to main graphics menu
Enter number of desired function: 2        Select log-log plot
X axis variable number?
Y axis variable number? 4
Enter subfile to be used (0 if subfiles ignored)?
Enter number of desired function: [illegible]        Select plotter
Enter option number of the graphics device? 2
Plotter identifier string (press CONT if 'HPGL')?
Enter select code, bus address (default is 7,5)?
Is the above information correct? YES
Enter number of desired function: 1        Plot
Are grid lines to be plotted? NO
Beep will sound when plot done then press CONT.
To interrupt plotting, press 'STOP' key.

[Plot: EGG FUTURE CONTRACTS; log-log plot; x-axis: FROZ. ALBU]

Enter number of desired function: [illegible]        Return to main graphics menu
Enter number of desired function: 6        Return to statistical graphics 1A
Enter number of desired function: 4        Select Weibull Plot
Variable number? 2
Enter subfile to be used (0 if subfiles ignored)
SORTING THE DATA
Enter number of desired function: [illegible]        Select plotter
Enter option number of the graphics device? 2
Plotter identifier string (press CONT if 'HPGL')?
Enter select code, bus address (default is 7,5)?
Is the above information correct? YES
Enter number of desired function: 1        Plot
Are grid lines to be plotted? NO
Beep will sound when plot done then press CONT.
To interrupt plotting, press 'STOP' key.
[Plot: EGG FUTURE CONTRACTS; Weibull plot; x-axis: FROZ. ALBU]

Enter number of desired function: 11
Enter number of desired function: 6

Return to main graphics menu
Return to Basic Statistics and Data Manipulation (BSDM)

General Statistics

General Information

Description
The General Statistics module includes 5 major parts:
1. One Sample Tests allow you to run a series of tests and plots on one-variable problems. You can test whether the observations are mutually independent, whether the mean of the data is significantly different from a hypothesized mean, compare your data with normal, exponential, or uniform distributions, and test the randomness of your data.
2. Paired-Sample Tests allow you to compare the means of two samples, test if the paired samples are similar, fit the data with a regression equation, test whether the two populations have the same median, and test the independence of two random variables.
3. Two-Independent-Sample Tests allow you to test whether the means of two samples are equal, whether the medians of two samples are equal, and whether the two populations have the same distribution.
4. Multiple-Sample (≥ 3 Samples) Tests allow you to test whether the means of several populations are equal, and whether there are significant differences between pairs of means.
5. Statistical Distributions allow you to study a series of continuous and discrete statistical distributions. Both tabled values and right-tailed probabilities are available for the continuous distributions. The discrete distributions calculate right-tail probabilities, single-term probabilities, and an approximate value for a specified right-tailed probability. This program will also calculate n factorial, the complete gamma function, the complete beta function, and binomial coefficients.
Methods and formulae, references, and related material for each of these five parts are given in the sections that follow.

Special Considerations
If you specify one type of test (for example, Paired-Sample Tests), you will not be able to perform a different type of test (say, Multiple-Sample Tests) without returning to the Start-up procedure for the new test. You must access the Start-up procedure to define the segment of the data matrix which is to be tested.

One Sample Tests

Object of Programs
This section allows you to run a series of tests and plots on one variable (or one subfile of one variable) from the data matrix defined by the Basic Statistics and Data Manipulation program. Each test will automatically sort or restore the data to its original form as needed. You can perform several kinds of tests on your data:
Serial Correlation: tests if the observations are mutually independent.
t-Test: tests if the mean of the data is significantly different from a hypothesized mean which you specify.
Kolmogorov-Smirnov Goodness-of-Fit Test or Chi-Square Goodness-of-Fit Test: test if your data follow a normal, exponential, or uniform distribution.
Runs Test: tests the randomness of your data.
Shapiro-Wilk Test: tests for normality.
The above tests are described in Methods and Formulae.

Typical Program Flow
Input data via BSDM
Select Advanced Statistics option
Insert program medium
Select "One sample test"
Specify variable and subfile
Select certain test (7 options)
Execute your choice

Data Structure
Since we have only one variable, the data is entered as in the following example, which shows a sample of size 12:
Variable #1
I   OBS(I)   OBS(I + 1)   OBS(I + 2)   OBS(I + 3)   OBS(I + 4)
1 6 11 2 6 3 5 4 4 8 5 7 9 3 7
Alternatively, you may input a data set containing several variables, then specify a single variable for the analysis. Several variables may be analyzed in succession.
Methods and Formulae

Basic Statistics
For the calculation of the sample mean, variance, standard deviation, standard error of the mean, coefficient of variation, skewness, kurtosis, and confidence intervals on the mean and variance, please refer to Snedecor and Cochran's Statistical Methods.

Kolmogorov-Smirnov Goodness-of-Fit Test
• Assumptions
1. The sample is a random sample.
2. If the hypothesized distribution function G(X), in H0 below, is continuous, the test is exact. Otherwise, the test is conservative.
• Hypotheses
Let G(X) be a completely specified, hypothesized distribution function. F(X) is the distribution function for the random variable X.
1. Two-Sided Test
H0: F(X) = G(X) for all X.
H1: F(X) ≠ G(X) for at least one value of X.
2. One-Sided Test
H0: F(X) ≥ G(X) for all X.
H1: F(X) < G(X) for at least one value of X.
3. One-Sided Test
H0: F(X) ≤ G(X) for all X.
H1: F(X) > G(X) for at least one value of X.
• Test Statistics
Let S(X) be the empirical distribution function based on the random sample X1, X2, ..., Xn.
1. Two-Sided Test
Let the test statistic T be the greatest (denoted by "sup" for supremum) vertical distance between S(X) and G(X).
T = sup |G(X) - S(X)|
2. One-Sided Test
T1 = sup [G(X) - S(X)]
3. One-Sided Test
T2 = sup [S(X) - G(X)]
• Decision Rule
Reject H0 at the level of significance α if the appropriate test statistic, T, T1, or T2, exceeds the 1 - α quantile W(1 - α) from the Table of Quantiles of the Kolmogorov Test Statistic.

Chi-Square Goodness-of-Fit Test
• Assumptions
1. The sample is a random sample.
2. The measurement scale is at least nominal.
• Hypothesis
Let F(X) be the true but unknown distribution function and let G(X) be a completely specified, hypothesized distribution function.
H0: F(X) = G(X) for all X.
H1: F(X) ≠ G(X) for at least one X.
• Test Statistic
Suppose the data is divided into c classes, and the number of observations falling in each class is denoted by Oj, for j = 1, 2, ..., c.
Let Pj be the probability of a random observation being in class j under the assumption that G(X) is the distribution function of X. Then define Ej as Ej = Pj*n, where n is the sample size. The test statistic is then:
$T = \sum_{j=1}^{c} (O_j - E_j)^2 / E_j$
• Decision Rule
The exact distribution of T is difficult to use, so the large-sample approximation is used. The approximate distribution of T is the chi-square distribution with (c - 1) degrees of freedom. Therefore, the critical region of approximate size α corresponds to values of T greater than χ²(1 - α), the (1 - α) quantile of a χ² random variable with (c - 1) degrees of freedom. Reject H0 if T exceeds χ²(1 - α); otherwise, accept H0.

t-Test
Let X1, ..., Xn be a random sample from a population with mean μ. Let M be the sample mean and S the sample standard deviation.
• Hypotheses
1. Two-Sided
H0: μ = a, the hypothesized value for the population mean.
H1: μ ≠ a
2. One-Sided
H0: μ = a
H1: μ < a
3. One-Sided
H0: μ = a
H1: μ > a
• Test Statistic
$t = \sqrt{n}\,(M - a) / S$
• Decision Rule
The statistic t has a t-distribution with (n - 1) degrees of freedom. T(p, n - 1) is the p quantile of the t-distribution with (n - 1) degrees of freedom.
1. Two-Sided: if |t| ≤ T(1 - α/2, n - 1), accept H0; otherwise, reject H0.
2. One-Sided: if t ≥ T(α, n - 1), accept H0; otherwise, reject H0.
3. One-Sided: if t ≤ T(1 - α, n - 1), accept H0; otherwise, reject H0.
In this program the corresponding one- or two-tailed probability of the computed t-value will be printed.

Runs Test
Any sequence of like observations bounded by observations of a different type is called a run. The number of observations in the run is called the length of the run. Suppose a coin is tossed twenty times and the resulting heads (H) or tails (T) are recorded in the order in which they occur:
T HHHHHH T H T H TT HHH T H T H
Each segment is called a run. The total number of runs in the example is 12.
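The run count in the coin-toss example can be checked with a short sketch. It is written in Python purely for illustration and is not part of the library; the function name count_runs is hypothetical.

```python
from itertools import groupby


def count_runs(sequence):
    """Return (number of runs, list of run lengths) for a sequence of symbols.

    A run is a maximal block of identical consecutive symbols.
    """
    lengths = [len(list(group)) for _, group in groupby(sequence)]
    return len(lengths), lengths


# The twenty coin tosses from the example, written without the spaces
# that separate the runs in the printed sequence.
tosses = "THHHHHHTHTHTTHHHTHTH"
runs, lengths = count_runs(tosses)
print(runs)     # total number of runs
print(lengths)  # length of each run, in order of occurrence
```

The same function applies to the program's own use of the test if each observation is first recoded as "above" or "below" the median.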
The total number of runs may be used as a measure of the randomness of the sequence; too many runs may indicate that each observation tends to follow and be followed by an observation of the other type, while too few runs might indicate a tendency for like observations to follow like observations. In either case the sequence would indicate that the process generating the sequence was not random.
• Hypothesis
H0: The process which generates the sequence is a random process.
H1: The random variables in the sequence are either dependent on other random variables in the sequence or are distributed differently from one another.
• Test Statistic
In this program we use the median as an indicator of two types of observations, i.e., a value below the median is one kind, a value above the median is another kind. Count the runs below and above the median, say D. Then
$W(p) = N + 1 + Z_p \left[ N^2 / (2N - 1) \right]^{0.5}$
where $Z_p$ is the pth quantile of a standard normal random variable.
• Decision Rule
Reject H0 at the level α if D > W(1 - α/2) or D < W(α/2); otherwise accept H0.

Serial Correlation
This routine checks for randomness in the sample.
• Formula
Serial correlation with lag k:
$r_k = \left[ \sum_{i=1}^{N-k} (X_i - \bar{X})(X_{i+k} - \bar{X}) \right] / \left[ \sum_{i=1}^{N} X_i^2 - N \bar{X}^2 \right]$
A small correlation suggests that the observations are mutually independent.

Shapiro-Wilk Test
This routine performs a test for normality for a sample of size 3 to 50, inclusive.
Note
A tie means two or more observations have the same value. Ties must be given special treatment when every observation is assigned a rank.
If the sample size is less than 3 or greater than 50, a message will be printed stating that this program will not work and suggesting a chi-square goodness-of-fit test for N > 50. You will then have a chance to choose the test you want again.
• Hypothesis
The data comes from a normal distribution.
• Test Statistic
A test statistic W is printed, followed by the tabled values of Wα (% POINTS) for alpha = .01, .02, .05, .1, and .5.
• Decision Rule
The observed test statistic W indicates that the sample did not come from a normal distribution at the corresponding alpha level of significance if the value of W is less than the corresponding percentage point. Hence, small values of W are significant.

References
1. Abramowitz, Milton and Stegun, Irene A. (1970). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. U.S. Government Printing Office, Washington D.C., p. 949.
2. Box, G.E.P. and Cox, D.R. (1964). "An Analysis of Transformations". Journal of the Royal Statistical Society 26:2, pp. 211-252.
3. Conover, W.J. (1971). Practical Nonparametric Statistics. John Wiley & Sons, Inc., New York, p. 414.
4. Conte, S.D. (1965). Elementary Numerical Analysis. McGraw-Hill Book Company, New York, p. 135.
5. Gibbons, Jean Dickinson (1971). Nonparametric Statistical Inference. McGraw-Hill Book Company, New York, pp. 75-83.
6. Hahn, G. and Shapiro, S.S. (1967). Statistical Models in Engineering. John Wiley & Sons, Inc., New York, pp. 330-332.
7. Kopitzke, Robert W. (1973). Unpublished Notes.
8. Mood, Graybill, Boes (1974). Introduction to the Theory of Statistics, 3rd Edition. McGraw-Hill Book Company, New York, Chapter 7.
9. Shapiro, S.S. and Wilk, M.B. (1965). "An Analysis of Variance Test for Normality". Biometrika 52, 3 and 4, p. 591.
10. Snedecor, George W. and Cochran, William G. (1967). Statistical Methods. The Iowa State University Press, Ames, Iowa.
11. Ullman, Neil R. (1972). Statistics: An Applied Approach. Xerox College Publishing, Lexington, Mass., pp. 354-357.

Paired-Sample Tests

Description
This program allows you to perform the following paired-sample tests:
Paired t-test: compare the means of two samples.
Cross Correlation: test if the paired samples are similar.
Family Regression: fit the data with one of several regression equations.
Sign Test or Wilcoxon Signed Rank Test: test whether two populations have the same median.
Spearman's Rho or Kendall's Tau: test the independence of two random variables.

Typical Program Flow
Input data via BSDM
Select Advanced Statistics option
Insert program medium
Select "paired sample test"
Specify variables and subfiles
Select a certain test (9 options)
Execute your choice

Data Structure
For paired-sample tests, two variables or the same subfile of two variables must be used. The data are entered as in the following example:
Obs. #        1    2    3
Variable #1   54   44   46
Variable #2   46   42   44

Methods and Formulae

Paired t-Test
This is a one-sample t-test performed on the differences between paired samples. See the Methods and Formulae section in the One-Sample Tests chapter for further details.

Cross Correlation
Provides a correlation between paired samples with a lag between them. Large values show the paired samples are quite similar, i.e., no significant difference. The cross correlation with lag k between the two samples X1, X2, ..., XN and Y1, Y2, ..., YN is:
$\left[ \sum_{i=1}^{N-k} (X_i - \bar{X})(Y_{i+k} - \bar{Y}) \right] / \left[ \sum_{i=1}^{N} (X_i - \bar{X})^2 \sum_{i=1}^{N} (Y_i - \bar{Y})^2 \right]^{0.5}$

Family Regression
Provides four different regression models. All of the models except the quadratic are solved by "linearizing" the model to the form:
f(Y) = "b" + "a" g(X)
and solving by ordinary linear least squares. The AOV table which is printed out for each model is in units of the transformed Y's. R², the squared multiple correlation coefficient, is expressed in units of the transformed Y's. The following models are provided:
Linear: $Y = aX + b$
Quadratic: $Y = aX^2 + bX + c$
Exponential: $Y = a \exp(bX)$
Power: $Y = aX^b$

Sign Test
• Object
The sign test is designed for testing whether two populations have the same medians.
• Data
The data consist of observations on a bivariate random sample (X1, Y1), ..., (Xn, Yn).
Within each pair, (Xi, Yi), a comparison is made and the pair is a "+" if Xi > Yi, and a "-" if Xi < Yi. If Xi = Yi, the pair is excluded from the analysis.
• Hypotheses
1. H0: P(Xi < Yi) = P(Xi > Yi) for all i
H1: Either P(Xi > Yi) < P(Xi < Yi) for all i or P(Xi > Yi) > P(Xi < Yi) for all i
2. H0: P(Xi > Yi) ≤ P(Xi < Yi) for all i
H1: P(Xi > Yi) > P(Xi < Yi) for all i
3. H0: P(Xi > Yi) ≥ P(Xi < Yi) for all i
H1: P(Xi > Yi) < P(Xi < Yi) for all i
• Test Statistic
T = total number of pluses (+).
• Decision Rule
In this program a standardized T value, Zt, is printed so you can compare it to the cumulative distribution for a standardized normal random variable, Z.
1. Reject H0 if 1 - P[-Zt < Z < Zt] < α
Accept H0 if 1 - P[-Zt < Z < Zt] > α
2. Reject H0 if 1 - P[Z ≤ Zt] < α
Accept H0 if 1 - P[Z ≤ Zt] > α
3. Reject H0 if 1 - P[Z ≤ Zt] > 1 - α
Accept H0 if 1 - P[Z ≤ Zt] < 1 - α

Wilcoxon Signed Ranks Test
• Object
This test is designed to test whether a particular sample came from a population with a specified median. It may also be used for paired samples to see if two samples have the same median.
• Data
The data consist of N observations (X1, Y1), (X2, Y2), ..., (XN, YN). The absolute differences |Di| = |Xi - Yi|, for i = 1, ..., N, are computed for each pair. Ranks from 1 to N are assigned to these N pairs according to the relative size of the absolute differences. Pairs for which Xi = Yi are excluded from the analysis.
• Hypotheses
1. H0: E(X) = E(Y)
H1: E(X) > E(Y)
2. H0: E(X) = E(Y)
H1: E(X) < E(Y)
3. H0: E(X) = E(Y)
H1: E(X) ≠ E(Y)
• Test Statistic
Define
Ri = 0 if Yi > Xi (Di is negative)
Ri = the rank assigned to (Xi, Yi) if Xi > Yi
Then the test statistic is T = ΣRi, for i = 1, ..., N.
• Decision Rule
Look up the quantiles, W(*), of the Wilcoxon signed ranks test statistic in the table included in this manual.
1. Reject H0 if T > W(1 - α)
Accept H0 if T ≤ W(1 - α)
2. Reject H0 if T < W(α)
Accept H0 if T ≥ W(α)
3.
Reject H0 if T > W(1 - α/2) or T < W(α/2)
Accept H0 if W(α/2) ≤ T ≤ W(1 - α/2)

Higher Power Signed Rank
Ranks the N differences, Xi - Yi, from smallest to greatest. T, the test statistic, is given by the sum of the ranks of the positive differences raised to the specified power (2, 3, 4, or 5). Note that if the power specified were 1, this test would be the Wilcoxon Signed Rank test, and if the power were 0, it would be the Sign test. Using higher powers of the ranks can lead to a more powerful test when it is desired to weight larger values more heavily, as in highly skewed distributions.

Spearman's Rho
• Object
This routine will test the independence of two random variables.
• Data
The data consist of a bivariate random sample of size N, (X1, Y1), ..., (XN, YN). Let R(Xi) be the rank of Xi as compared with the other X values, for i = 1, 2, ..., N. That is, R(Xi) = 1 if Xi is the smallest of X1, X2, ..., XN; R(Xi) = 2 if Xi is the second smallest, etc. Similarly, let R(Yi) equal 1, 2, ..., N depending on the relative magnitude of Yi.
• Measure of Correlation
$d = \sum_{i=1}^{N} \left( R(X_i) - R(Y_i) \right)^2$
$R = 1 - 6d / [N(N^2 - 1)]$
• Hypothesis Testing
The Spearman rank correlation coefficient is used as a test statistic to test for independence between two random variables.
1. Two-Tailed Test
H0: The Xi and Yi are mutually independent.
H1: Either a) there is a tendency for the larger values of X to be paired with the larger values of Y, or b) there is a tendency for the smaller values of X to be paired with the larger values of Y.
2. One-Tailed Test For Positive Correlation
H0: The Xi and Yi are mutually independent.
H1: There is a tendency for the larger values of X to be paired with the larger values of Y.
3. One-Tailed Test For Negative Correlation
H0: The Xi and Yi are mutually independent.
H1: There is a tendency for the smaller values of X to be paired with the larger values of Y, and vice versa.
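The measure of correlation defined for Spearman's Rho can be sketched in a few lines. This is an illustration in Python, not the library's own code; the helper names ranks and spearman_rho are hypothetical, the sample data are made up, and the sketch assumes there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) to N (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Compute R = 1 - 6d / [N(N^2 - 1)], where d is the sum of squared rank differences."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d / (n * (n ** 2 - 1))


# Made-up paired observations, for illustration only.
x = [1.3, 1.6, 2.1, 0.9, 1.8]
y = [2.2, 2.7, 2.0, 1.1, 3.0]
rho = spearman_rho(x, y)
print(rho)
```

With identical orderings of X and Y, d is 0 and R is 1; with exactly reversed orderings R is -1. Tied observations would need average ranks, which this sketch does not handle.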
• Decision Rule
From the table of quantiles of the Spearman test statistic in this manual, we can find the quantile value.
1. Two-tailed test: Reject H0 if R exceeds the (1 - α/2) quantile or if R is less than the α/2 quantile.
2. One-tailed test for positive correlation: Reject H0 if R exceeds the 1 - α quantile.
3. One-tailed test for negative correlation: Reject H0 if R is less than the α quantile.

Kendall's Tau
• Object
This routine allows you to test the independence of two random variables.
• Data
The data consist of a bivariate random sample of size N, (Xi, Yi) for i = 1, 2, ..., N. Two observations, for example (1.3, 2.2) and (1.6, 2.7), are called concordant if both members of one observation are larger than the respective members of the other observation. Pc denotes the number of concordant pairs of observations. A pair of observations like (1.3, 2.2) and (1.6, 1.1) is called discordant if the two numbers in one observation differ in opposite directions (one negative and one positive) from the respective members in the other observation. Let Pd denote the number of discordant pairs of observations. If Xi = Xj or Yi = Yj (i ≠ j), the pair is disregarded.
• Measure of Correlation
$\tau = (P_c - P_d) / [N(N - 1)/2]$
• Hypotheses
Same as in Spearman's Rho.
• Decision Rule
From the table of quantiles of the Kendall rank correlation coefficient in this manual, find the quantile value to compare against the computed statistic Q.
1. Two-tailed test: Reject H0 if Q exceeds the (1 - α/2) quantile or if Q is less than the α/2 quantile.
2. One-tailed test for positive correlation: Reject H0 if Q exceeds the 1 - α quantile.
3. One-tailed test for negative correlation: Reject H0 if Q is less than the α quantile.

Two Independent Sample Tests

Object of Program
The following routines are provided:
Two-sample t-test: tests whether the means of two samples are equal.
Median test: tests whether the medians of two samples are equal.
Mann-Whitney, Taha's Squared R, Cramer-von Mises, and Kolmogorov-Smirnov tests: all test whether the two populations have the same distribution.

Typical Program Flow
Input data via BSDM
Select Advanced Statistics option
Insert program medium
Choose "two independent tests"
Specify variables and subfiles
Choose the test you desire (7 options)
Execute the test you choose

Data Structure
For all of the two-independent-sample tests, data must be entered into one variable in the data base created by Basic Statistics and Data Manipulation. Then, the Subfile routine of BSDM must be used to create two subfiles. Each subfile corresponds to one sample.
For example, suppose you have one sample of size six and another sample of size eight. Suppose the data is:
Sample 1: 2, 3, 4, 2, 3, 6
Sample 2: 4, 5, 4, 2, 2, 6, 3, 7
The data should be entered via BSDM as one variable with 14 observations. Then, the Subfile routine would be used to specify two subfiles, the first with six observations, and the second with eight observations.

Methods and Formulae

Two-Sample t Test
• Object
The two-sample t-test is used to test whether the means of two samples drawn from normal populations having the same variance are equal.
• Data
Let X1, ..., Xn be a random sample from the first population and Y1, ..., Ym be a random sample from the second. Let M(X) and M(Y) be the respective sample means and let S(X) and S(Y) be the sample variances.
• Hypotheses
Let μ(X) and μ(Y) be the two population means.
1. Two-Sided Test
H0: μ(X) = μ(Y)
H1: μ(X) ≠ μ(Y)
2. One-Sided Test
H0: μ(X) = μ(Y)
H1: μ(X) < μ(Y)
3. One-Sided Test
H0: μ(X) = μ(Y)
H1: μ(X) > μ(Y)
• Test Statistic
$t = [M(X) - M(Y)] \Big/ \left[ \left( \frac{1}{n} + \frac{1}{m} \right) \left( \sum X_i^2 - n M(X)^2 + \sum Y_i^2 - m M(Y)^2 \right) / (n + m - 2) \right]^{1/2}$
• Decision Rule
1. Two-Sided Test
Reject H0 if P[-t < T < t] > 1 - α
2. One-Sided Tests
Reject H0 if P[T < t] > 1 - α
3.
One-Sided Tests
Reject H0 if P[T < t] < α

Median Test
• Object
The median test is designed to determine whether two samples came from populations having the same median.
• Data
From each of two populations a random sample of size Ni is obtained. Let N = N1 + N2. We obtain the sample median of the combined samples, which is called the grand median. Let O1i be the number of observations in the ith sample that exceed the grand median, and let O2i be the number of observations in the ith sample that are less than or equal to the grand median. Arrange the frequency counts into a 2-by-2 contingency table as follows:
Sample      1     2     Totals
> median    O11   O12
≤ median    O21   O22
Totals      N1    N2    N
• Hypothesis
H0: The two populations have the same median.
H1: The medians of the two populations are different.
• Test Statistic
In the first sample count the number of X's greater than the grand median, say O11, and the number of X's smaller than the grand median, say O21; then let T = O11 - O21. Any data value equal to the grand median is omitted.
From the contingency table, a χ² value can be calculated by using:
$\chi^2 = \sum_{i=1}^{2} (O_{1i} - O_{2i})^2 / N_i$
• Decision Rule
A standardized z-value is printed, so we can look in the cumulative normal frequency distribution table to find the probability corresponding to the standardized z value, Zt, for Z = √χ².
Accept H0 if 1 - P[-Zt < Z < Zt] > α
Reject H0 if 1 - P[-Zt < Z < Zt] < α
If you wish to use the χ² value calculated from the contingency table, look in the chi-square table and find the W(1 - α) value with one degree of freedom, where α is the significance level.
Accept H0 if calculated χ² < W(1 - α)
Reject H0 if calculated χ² > W(1 - α)
If N1 + N2 < 30, Fisher's exact probability, P, is given. If α/2 < P < 1 - α/2, accept H0; otherwise, reject H0.

Mann-Whitney Test
• Object
The Mann-Whitney test is designed to test if two populations are identical.
• Data
The data consist of two random samples. Let X1, X2,
..., XN denote the random sample of size N from population one, and let Yl, Y2, ..., YM denote the random sample of size M from population two. Assign the ranks 1 through N + M to the combined samples. Let R(Xi) and R(Yj) denote the ranks assigned to X and Y respectively, for all i and j. • Hypotheses Let F{X) and G(X) be the distribution functions, corresponding to populations one and two respectively (or of X and Y respectively). 1. Two-Sided Test HO: F(X) = G(X) for all X HI: F(X) * G(X) for at least one X 2. One-Sided Test HO: P(X < Y) =s .5 HI: P(X< Y) > .5 3. One-Sided Test HO: P(X<Y) 5=. 5 HI: P(X<Y) <.5 • Test Statistic LetT = SR(Xi)fori = 1, ..., N. In our output T is standardized to z by using: z = (T-(jl)/ct where ix = N(N + M + l)/2 and a 2 = MN(M + N + l)/12 • Decision Rule Look in the normal probability function table to find the probability corresponding to the standardized z, Zt. 1. Two-Sided Test Accept HO if P[-Zt ss Z « Zt] < 1 -a Reject HO if P[ -Zt ss Z ^ Zt] > 1 - a 2. One-Sided Test Accept HO if P[Z =s Zt] > a Reject HO if P[Z ^ Zt] < a 3. One-Sided Test Accept HO if P[Z =s Zt] < 1 - a Reject HO if P[Z s£ Zt] > 1 - a 173 Taha's Squared R This test is similar to the Mann- Whitney test, because it ranks the pooled sample of X's and Y's and defines T by T = 2R(Xj) f 2. Again, the null hypothesis is that the two populations have the same distribution. Z is normalized by z = (T - |x)/cr where (jl = N(N + M + 1)(2(N + M) + l)/6 and a is very complicated, but can be found in Mielke. (See References) Cramer- Von Mises Test • Object The Cramer-Von Mises test is designed to test if two populations are identical. Data The data consist of two independent random samples, XI, ..., XN and Yl, ..., YM, with unknown distributions functions F(*) and G(*) respectively. • Hypothesis HO: F(X) = G(X) for all X HI: F(X) * G(X) for at least one X • Test Statistic Let Fl(Xi) and Gl(Yj) be the empirical cumulative distribution functions. 
Then T = 2[Fl(Xi) - Gl(Yj)] where the sum is over consecutive i and j, that is, over the "pooled" cumulative distribution function. • Decision Rule In the program output, T and the .10, .05, and .01 significance levels are printed. Choose your desired significance level and: Reject HO if T > corresponding critical point Accept HO is T < corresponding critical point Kolmogorov-Smirnov Test • Object This test is designed to test whether two populations have the same distribution. • Data The data consist of two independent random samples XI, ..., XN and Yl, ..., YM. Let F(*) and G(*) represent their respective, unknown, distribution functions. 174 • Hypotheses 1. Two-Sided Test HO: F(X) = G(X) for all X HI: F(X) * G(X) for at least one value of X 2. One-Sided Test HO: F(X) = G(X)forallX HI: F(X) > G(X) for at least one value of X 3. One-Sided Test HO: F(X) = G(X) for all X HI: F(X) < G(X) for at least one value of X • Test Statistic Let S1(X) be the empirical distribution function based on the random sample XI, ..., XN, and let S2(Y) be the empirical distribution function based on the other random sample Yl, ..., YM. Define the test statistic, T, as the greatest vertical distance between the two empirical distribu- tion functions: T = sup|Sl(X) - S2(Y)| • Decision Rule The output consists of T and the .10, .05, and .01 significance levels. Choose your desired significance level. Reject HO if T > corresponding critical point Accept HO otherwise 175 Multiple-Sample (^ 3 Samples) Tests Description The following routines are available: One-Way Analysis of Variance — tests whether the means of several populations are equal. Multiple Comparisons — test whether there are significant differences between pairs of means via Least Significant Differences, Duncan's test, Student-Newman-KeuPs test, Tukey's HSD, or Scheffe's test. Kruskal-Wallis Test — tests if several populations have identical medians. 
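Looking back at the two-independent-sample statistics defined in the preceding section, a minimal Python sketch may help make the formulas concrete. It computes the pooled two-sample t statistic, the standardized Mann-Whitney rank sum, and the Kolmogorov-Smirnov distance T for the two example samples from the Data Structure discussion. The function names are illustrative only and are not part of the Statistical Library. Giving tied observations their average rank is an assumption borrowed from the Kruskal-Wallis description, and no tie correction is applied to the variance.

```python
import math

# Example samples from the Data Structure discussion (one variable, two subfiles).
SAMPLE_1 = [2, 3, 4, 2, 3, 6]
SAMPLE_2 = [4, 5, 4, 2, 2, 6, 3, 7]


def pooled_t(x, y):
    """Two-sample t statistic using the pooled variance estimate."""
    n, m = len(x), len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / m
    ss_x = sum(v * v for v in x) - n * mean_x ** 2
    ss_y = sum(v * v for v in y) - m * mean_y ** 2
    pooled_var = (ss_x + ss_y) / (n + m - 2)
    return (mean_x - mean_y) / math.sqrt((1 / n + 1 / m) * pooled_var)


def midranks(values):
    """Ranks 1..len(values); tied values share their average rank (assumption)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Return (T, z): rank sum of the first sample and its standardized value."""
    n, m = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    t = sum(ranks[:n])
    mu = n * (n + m + 1) / 2
    sigma = math.sqrt(m * n * (m + n + 1) / 12)
    return t, (t - mu) / sigma


def ks_distance(x, y):
    """Greatest vertical distance between the two empirical distribution functions."""
    def ecdf(sample, point):
        return sum(1 for v in sample if v <= point) / len(sample)
    return max(abs(ecdf(x, p) - ecdf(y, p)) for p in set(x) | set(y))


if __name__ == "__main__":
    print("t =", pooled_t(SAMPLE_1, SAMPLE_2))
    print("T, z =", mann_whitney(SAMPLE_1, SAMPLE_2))
    print("KS T =", ks_distance(SAMPLE_1, SAMPLE_2))
```

For the example data this gives t of about -0.868 with 12 degrees of freedom, a Mann-Whitney rank sum T = 38.5 (z of about -0.839), and a Kolmogorov-Smirnov distance of 7/24, about 0.292. Turning these into decisions still requires the t, normal, or critical-point tables referred to in the decision rules.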
Typical Program Flow

1. Input data via BSDM
2. Select Advanced Statistics option
3. Insert program medium
4. Select "multiple sample tests"
5. Specify variables and subfiles
6. Choose the desired test (4 options)
7. Execute the chosen test

Data Structure

For ≥ 3 sample tests, three or more different subfiles of the same variable must be used. The data are entered as in the following example. Suppose you have three samples:

Sample 1: 2, 5, 8, 7, 6, 4
Sample 2: 3, 2, 9, 11
Sample 3: 7, 3, 5, 8, 6

You would enter the data via Basic Statistics and Data Manipulation as one variable with 15 observations like this:

Variable #1
  I    OBS(I)   OBS(I+1)   OBS(I+2)   OBS(I+3)   OBS(I+4)
  1      2         5          8          7          6
  6      4         3          2          9         11
 11      7         3          5          8          6

Then, the Subfile option would be used to specify three subfiles, the first with six observations, the second with four observations, and the third with five observations.

Methods and Formulae

1. One-way Analysis of Variance is used to test the hypothesis that the means of several populations are equal. The assumption is that all the populations are normal and have equal variances, although the sample sizes may be unequal.

Suppose k is the number of populations and $n_i$ is the number of observations in the sample from the ith population. The total variation of the data is
$$SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X})^2$$
where $\bar{X}$ is the overall mean. The variation due to error, or variation within samples, is
$$SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2$$
where $\bar{X}_i$ is the mean of the ith sample. The variation between samples is
$$SSB = \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{X})^2$$
The error mean square is defined as MSE = SSE/(N - k), where
$$N = \sum_{i=1}^{k} n_i$$
and the between samples mean square is defined as MSB = SSB/(k - 1). The F-ratio, MSB/MSE, has the F distribution with k - 1 and N - k degrees of freedom. The null hypothesis that the population means are equal may be rejected if the F ratio is greater than or equal to $F_{\alpha,\,k-1,\,N-k}$, where $\alpha$ is the significance level of the experiment.
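The sums of squares above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the library's own routine: the function name `one_way_anova` and the dictionary keys are invented for the example, and the three samples are the ones from the Data Structure discussion.

```python
def one_way_anova(samples):
    """Sums of squares, mean squares and F ratio for k independent samples."""
    k = len(samples)
    sizes = [len(s) for s in samples]
    n_total = sum(sizes)
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]

    # Total, within-sample (error) and between-sample variation.
    sst = sum((v - grand_mean) ** 2 for s in samples for v in s)
    sse = sum((v - mean) ** 2 for s, mean in zip(samples, means) for v in s)
    ssb = sum(n * (mean - grand_mean) ** 2 for n, mean in zip(sizes, means))

    msb = ssb / (k - 1)
    mse = sse / (n_total - k)
    return {
        "SST": sst, "SSE": sse, "SSB": ssb,
        "MSB": msb, "MSE": mse, "F": msb / mse,
        "df": (k - 1, n_total - k),
    }


if __name__ == "__main__":
    example = [[2, 5, 8, 7, 6, 4], [3, 2, 9, 11], [7, 3, 5, 8, 6]]
    for name, value in one_way_anova(example).items():
        print(name, value)
```

For the three example samples the sketch gives SSB = 2.05 and SSE of about 96.88, so F is about 0.127 with 2 and 12 degrees of freedom. Comparing F with the tabled F value is left to the reader, since the critical value depends on the chosen significance level.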
The analysis of variance may be summarized in a table:

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square          F
Between samples       k - 1                SSB              MSB = SSB/(k - 1)    MSB/MSE
Error                 N - k                SSE              MSE = SSE/(N - k)
Total                 N - 1                SST

Multiple Comparisons

Multiple comparisons provide you with several tests to determine whether the various samples have significantly different means. The procedures are used upon completion of an analysis of variance. The notation used in these tests is defined below.

EMS = error mean square used in testing for significance in the analysis of variance
n = harmonic average of observations per mean
S(M) = $\sqrt{EMS/n}$
k = number of groups
a = degrees of freedom for EMS = N - k
Mi = mean of the ith sample, i = 1, ..., k
Oi = ith ordered (from largest to smallest) group mean, i = 1, ..., k
msd = minimum significant difference

Group means are sorted and then all possible comparisons are made. Only one table value is necessary for Least Significant Differences, Tukey's HSD, or Scheffe's test. On the other hand, k - 1 table values are needed for the Student-Newman-Keuls test and Duncan's multiple range test.

The minimum significant difference is the smallest difference there can be between two means for them to be considered significantly different from one another. In all of the procedures, comparisons are made starting with the largest difference between means and progressing to the smallest difference. The process should be terminated when there is no significant difference found at a given step.
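The notation and the stepwise procedure just described can be sketched as follows. The sketch assumes a single minimum significant difference, which is the situation for Least Significant Differences, Tukey's HSD, and Scheffe's test; the msd itself must be built by the caller from the appropriate table value, so the number used in the demonstration is purely hypothetical. All function names are illustrative and not taken from the library.

```python
import math
from statistics import harmonic_mean


def standard_error_of_mean(ems, sizes):
    """S(M) = sqrt(EMS / n), with n the harmonic average of the sample sizes."""
    return math.sqrt(ems / harmonic_mean(sizes))


def ordered_differences(means):
    """All pairwise differences Oi - Oj (i < j), largest difference first.

    Oi is the ith group mean after sorting from largest to smallest;
    indices in the returned (i, j, difference) tuples are 1-based.
    """
    ordered = sorted(means, reverse=True)
    k = len(ordered)
    pairs = [(i + 1, j + 1, ordered[i] - ordered[j])
             for i in range(k) for j in range(i + 1, k)]
    pairs.sort(key=lambda pair: pair[2], reverse=True)
    return pairs


def significant_pairs(means, msd):
    """Step through the differences, stopping at the first non-significant one."""
    found = []
    for i, j, difference in ordered_differences(means):
        if difference < msd:  # "Accept H0 if Mi - Mj < msd": terminate here
            break
        found.append((i, j, difference))
    return found


if __name__ == "__main__":
    group_means = [16 / 3, 6.25, 5.8]  # means of the three example samples
    hypothetical_msd = 0.46            # assumed value, for illustration only
    print(significant_pairs(group_means, hypothetical_msd))
```

For the Student-Newman-Keuls and Duncan procedures the msd changes with the distance p between the ordered means, so the comparison inside the loop would use a value looked up for each p rather than a single constant.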
In all cases the hypothesis is:
H0: $\mu_i = \mu_j$, where $\mu_i$ is the mean of the ith population, $i \neq j$
H1: $\mu_i \neq \mu_j$

Least Significant Differences (Multiple Comparisons)

• Test Statistic
msd = $t(a,b)\,S(M)\sqrt{2}$, where t(a,b) is the upper b point of the t-distribution with a degrees of freedom.

• Decision Rule
Accept H0 if Mi - Mj < msd
Reject H0 otherwise

Duncan's Multiple Range Test (Multiple Comparisons)

• Test Statistic
First, the sample means are ordered from largest to smallest: O1, O2, ..., Ok. Define p = difference in ranks of the means being compared plus one. For example, if you are comparing O2 and O5, then p = (5 - 2) + 1 = 4. Then:
msd = R(a,p,b) S(M), where R(a,p,b) is the upper b point from the new multiple range table with a degrees of freedom and distance p.

• Decision Rule
Accept H0 if Oi - Oj < msd, where i < j
Reject H0 otherwise

Scheffe's Test (Multiple Comparisons)

After you have collected the data and tested those contrasts that catch your eye during the analysis, you should use Scheffe's Test.

• Test Statistic
msd = $\sqrt{(k-1)\,F(b,k-1,a)}\;S(M)$, where F(b,k-1,a) is the upper b point of the F distribution with k - 1 and a degrees of freedom.

• Decision Rule
Accept H0 if Mi - Mj < msd
Reject H0 otherwise

Tukey's HSD (Multiple Comparisons)

• Test Statistic
msd = R(k,a,b) S(M), where R(k,a,b) is the upper b point of the Studentized range table with a degrees of freedom and total sample number k.

• Decision Rule
Accept H0 if Mi - Mj < msd
Reject H0 otherwise

Student-Newman-Keuls Test (Multiple Comparisons)

First, the means of the samples are ordered from largest to smallest, O1, O2, ..., Ok. Then p is defined the same as in Duncan's Test.

• Test Statistic
msd = R(p,a,b) S(M), where R(p,a,b) is the upper b point from the Studentized range table with a degrees of freedom and distance p.
• Decision Rule
Accept H0 if msd > Oi - Oj, i < j
Reject H0 otherwise

Kruskal-Wallis Test

• Object
The Kruskal-Wallis test is designed to test whether k independent samples, $k \geq 2$, have the same mean. The test does not assume normality of the k populations.

• Data
The data consist of k independent samples, each of size $N_i$, i = 1, ..., k. Let $N = N_1 + N_2 + \cdots + N_k$. Rank the combined samples. Then, for each sample compute the sum of the ranks of the observations in the sample. Call these sums $R_i$, for i = 1, ..., k. If more than one observation has the same value, assign the average rank to each of the tied observations.

• Hypothesis
H0: All of the k populations have equal means
H1: At least one of the populations has a different mean

• Test Statistic
$$T = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{N_i} - 3(N+1)$$

• Decision Rule
The output prints out a chi-square statistic along with the probability that a chi-square random variable is greater than the statistic. If the probability printed is smaller than the significance level you chose, reject H0. Otherwise, accept H0.

References

1. Bancroft, T.A., Topics in Intermediate Statistical Methods, Volume 1. Iowa State University Press, Ames, Iowa, 1968.
2. Boardman, T.J., and Moffitt, D.R., "Graphical Monte Carlo Type I Error Rates for Multiple Comparisons Procedures", Biometrics, 27: September 1971.
3. Conover, W.J. (1971), Practical Nonparametric Statistics. John Wiley and Sons, Inc., New York.
4. Conover, W.J. (1974), "Some Reasons For Not Using the Yates Contingency Correction on 2x2 Contingency Tables". JASA, June 1974, 69:374.
5. Dixon, Wilfred and Massey, Frank, Introduction to Statistical Analysis, McGraw-Hill, New York, 1969, pp. 119-123.
6. Draper, N.R. and Smith, H., Applied Regression Analysis, John Wiley & Sons, New York, 1966, pp. 7-20.
7. Mielke, P.W. (1967), "Note on Some Squared Rank Tests with Existing Ties". Technometrics, 9:312.
8. Mielke, P.W. (1972), "Asymptotic Behavior of Two-Sample Tests Based on Powers of Ranks for Detecting Scales and Location Alternatives".
9. Mosteller, F. and Robert E.K. Rourke (1973), Sturdy Statistics. Addison-Wesley Publishing Co., Reading, Mass.
10. Siegel, S. (1956), Nonparametric Statistics. McGraw-Hill, New York.
11. Snedecor, George and Cochran, William, Statistical Methods, Iowa State University Press, Ames, Iowa, 1971, pp. 91-119.

Statistical Distributions

Object of Program

This program allows you to run a series of continuous and discrete statistical distributions. Both tabled values and right-tailed probabilities are available for the continuous distributions. The discrete distributions calculate right-tailed probabilities, single term probabilities and an approximate value for a specified right-tailed probability.

Additionally, this program will calculate n factorial, the complete gamma function, the complete beta function and binomial coefficients.

Methods and Formulae

Continuous

The continuous distributions included in this program are:

1. Normal (Gaussian)
2. Two-parameter gamma
3. Central F
4. Beta
5. Student's T
6. Weibull
7. Chi-square
8. Laplace (double exponential, bilateral exponential, extreme distribution, or Poisson's first law of error)
9. Logistic (autocatalytic function, growth curve)

For the central F, beta, T, chi-square and gamma distributions, the algorithms generally converge most rapidly for small or large right tail probabilities. For moderate tails, the time increases as the right tail approaches .5. For the beta distribution, both parameters should be greater than $10^{-3}$. If the parameters are smaller than this, the time required for convergence is excessive. For the chi-square, it is recommended that the degrees of freedom be less than 500.
For the logistic, Laplace and Weibull it is necessary that the right-tailed probabilities, p, satisfy $1 - 10^{-95} > p > 10^{-95}$. For the incomplete gamma, it is recommended that the ratio A/B be less than 250.

Some special terms are:

1. Right-tailed probability. Given that X is a random variable and "a" is an observable value of X, then the right-tailed probability associated with "a" is PR(X > a).
2. Tabled values. Given that X is a random variable and P is a right-tailed probability, then the tabled value associated with P is that value "a" such that PR(X > a) = P.

To specify the distributions, the respective density functions that are evaluated are shown below. Let f(x) be a density, and $\Gamma(\cdot)$ be the gamma function.

1. Normal (standard)
$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty$$

2. Two parameter gamma, parameters A, B
$$f(x) = \frac{1}{\Gamma(A)\,B^{A}}\, x^{A-1} e^{-x/B}, \qquad x > 0,\; A > 0,\; B > 0$$

3. Central F with N degrees of freedom in the numerator and D in the denominator
$$f(x) = \frac{\Gamma((N+D)/2)\,(N/D)^{N/2}\, x^{N/2-1}}{\Gamma(N/2)\,\Gamma(D/2)\,\left(1 + \dfrac{Nx}{D}\right)^{(N+D)/2}}, \qquad N \text{ and } D \text{ positive integers}$$

4. Beta with parameters A and B
$$f(x) = \frac{\Gamma(A+B)}{\Gamma(A)\,\Gamma(B)}\,(1-x)^{B-1} x^{A-1}, \qquad 0 \leq x \leq 1,\; A, B > 0$$

5. Student's t with N degrees of freedom
$$f(x) = \frac{\Gamma((N+1)/2)}{\sqrt{N\pi}\;\Gamma(N/2)} \cdot \frac{1}{(1 + x^2/N)^{(N+1)/2}}, \qquad -\infty < x < \infty,\; N \text{ a positive integer}$$

6. Weibull with parameters A, B
$$f(x) = B A^{B} x^{B-1} \exp[-(Ax)^{B}], \qquad x > 0,\; A, B > 0$$

7. Chi-square with N degrees of freedom
$$f(x) = \frac{1}{\Gamma(N/2)\,2^{N/2}}\, x^{N/2-1} e^{-x/2}, \qquad x > 0,\; N \text{ a positive integer}$$

8. Logistic with parameters A, B
$$f(x) = \frac{B \exp(-(A + Bx))}{[1 + \exp(-(A + Bx))]^2}, \qquad B > 0,\; -\infty < x < \infty$$

9. Laplace with parameters A and B
$$f(x) = \frac{1}{2B} \exp\{-|x - A|/B\}, \qquad B > 0,\; -\infty < x < \infty$$

Discrete

The discrete distributions included in this program are:

1. Binomial
2. Negative Binomial
3. Poisson
4. Hypergeometric
5. Gamma Function
6. Beta Function
7. Single Term Binomial
8. Single Term Negative Binomial
9. Single Term Poisson
10. Single Term Hypergeometric

Other routines of this program are N factorial and Binomial Coefficients.

Some special terms used are:

1. Tabled value. Let X be a binomial, hypergeometric or Poisson random variable. Given all appropriate parameters and p, a desired right-tailed probability, then the tabled value is defined to be x such that P(X > x) = p.
2. Single term probability. Given that X is one of the three distributions and x is in the counter domain of X, then the single term probability is defined to be P(X = x).

All tabled values are normal approximations. It should be noted that if a right-tailed probability p is desired, it is an unlikely coincidence that there will exist an element x in the counter domain such that P(X > x) = p, where X is one of the distributions in (2) above. Thus, after getting the normal approximation to the tabled value, values in the counter domain near the approximation should be checked to see which value is best for the particular application.

The distributions are defined as follows:

1. Hypergeometric
Let
N = number of items in a lot
M = sample size, $M \leq N$
K = number of defective items in the lot, $K \leq N$
X = number of defective items in the sample, $X \leq K$, $X \leq M$
then P(exactly x defectives are in the sample) is
$$P(X = x) = \frac{\dbinom{K}{x}\dbinom{N-K}{M-x}}{\dbinom{N}{M}}, \qquad x = 0, 1, \ldots, M$$
and
$$P = P(X \geq x) = \sum_{i=x}^{\min(M,K)} P(X = i)$$

2. Binomial
Let
N = number of trials
p = probability of success at each trial
X = number of successes
$$P(X = R) = \binom{N}{R} p^{R} (1-p)^{N-R}, \qquad R = 0, 1, \ldots, N,\; 0 < p < 1$$
and
$$P = P(X \geq R) = \sum_{i=R}^{N} \binom{N}{i} p^{i} (1-p)^{N-i}$$

3. Poisson
Let
m = rate parameter or mean = lambda > 0
X = number of occurrences = 0, 1, 2, ...
$$P = P(X \geq N) = e^{-m} \sum_{i=N}^{\infty} \frac{m^{i}}{i!}$$

4. Negative Binomial
For a sequence of Bernoulli trials with probability p of success, let R = number of failures before the Nth success; then
$$P(X = R) = \binom{N+R-1}{R} p^{N} (1-p)^{R}, \qquad R = 0, 1, 2, \ldots,\; 0 < p < 1$$
and if A = number of failures before the Nth success, then
$$P(X \geq A) = \sum_{i=A}^{\infty} \binom{N+i-1}{i} p^{N} (1-p)^{i}, \qquad A = 0, 1, 2, \ldots$$

5. N!, $\Gamma(x)$, and complete beta function.
N must be a non-negative integer.
An asymptotic Stirling's approximation is used to calculate N!, $\Gamma(x)$, and the complete beta function.

Special Considerations

Loading the Program Directly

This program may be entered via Basic Statistics and Data Manipulation, any One Sample test, or any Multiple Sample test. You may also load the program directly by following these instructions:

1. Insert the General Statistics program medium.
2. Enter: LOAD "START_DIST",10
3. Press: EXECUTE

Before you load the program directly, you must specify the mass storage device which contains the program medium using the MASS STORAGE IS command.

Continuity Correction

For right-tailed probabilities, the exact probabilities are calculated. Thus, there is no need to use a continuity correction. There is no restriction that the parameters be integers, so if for some reason a continuity correction is desired, one may be used.

References

1. Abramowitz, M. and Stegun, I.A., Handbook of Mathematical Functions, National Bureau of Standards, 1964.
2. Abramowitz, M. and Stegun, I. (1964), N.B.S. Handbook Series 55, Government Printing Office.
3. Erdelyi, A., editor (1953), Higher Transcendental Functions, Vol. 1, McGraw-Hill, New York.
4. Johnson, N., and Kotz, S. (1970), Continuous Univariate Distributions, Vol. 1 and 2, Houghton-Mifflin, New York.
5. Khovanskii, A.N. (1956), The Applications of Continued Fractions and Their Generation to Problems in Approximation Theory, P. Noordhoff, Groningen.
6. Kopitzke, R., Ph.D. Dissertation, 1974.
7. Kopitzke, Robert W., Unpublished research notes.
8. Lieberman, G.J. and Owen, D.B., Tables of the Hypergeometric Probability Distribution, Stanford University Press, 1961.
9. Wall, H.S. (1948), Analytic Theory of Continued Fractions, D. Van Nostrand, New York.
10. Whittaker, E.T., and Watson, G.N. (1940), Modern Analysis, Cambridge University Press.
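As a rough illustration of the terms "right-tailed probability", "tabled value", and "single term probability" used in the Statistical Distributions section, the following Python sketch handles the cases that have closed forms: the logistic and Laplace densities given above and the binomial sums. It is not the library's algorithm. The function names are invented for the example, and the Stirling series shown is one common asymptotic form, assumed here only because the manual does not spell out which variant it uses.

```python
import math


def logistic_right_tail(x, a, b):
    """P(X > x) for the logistic density B*exp(-(A+Bx)) / (1+exp(-(A+Bx)))**2."""
    return 1.0 / (1.0 + math.exp(a + b * x))


def logistic_tabled_value(p, a, b):
    """Value x with P(X > x) = p, found by inverting the tail formula."""
    return (math.log((1.0 - p) / p) - a) / b


def laplace_right_tail(x, a, b):
    """P(X > x) for the Laplace density exp(-|x - A|/B) / (2B)."""
    if x >= a:
        return 0.5 * math.exp(-(x - a) / b)
    return 1.0 - 0.5 * math.exp((x - a) / b)


def laplace_tabled_value(p, a, b):
    """Value x with P(X > x) = p for the Laplace distribution."""
    if p <= 0.5:
        return a - b * math.log(2.0 * p)
    return a + b * math.log(2.0 * (1.0 - p))


def binomial_single_term(r, n, p):
    """P(X = R) for N trials with success probability p."""
    return math.comb(n, r) * p ** r * (1.0 - p) ** (n - r)


def binomial_right_tail(r, n, p):
    """P(X >= R), summed exactly from the single term probabilities."""
    return sum(binomial_single_term(i, n, p) for i in range(r, n + 1))


def stirling_log_factorial(n):
    """Asymptotic Stirling series for ln(n!), n >= 1 (assumed variant)."""
    return (n * math.log(n) - n + 0.5 * math.log(2.0 * math.pi * n)
            + 1.0 / (12.0 * n) - 1.0 / (360.0 * n ** 3))


if __name__ == "__main__":
    tail = logistic_right_tail(0.3, 1.0, 2.0)
    print("logistic tail:", tail, "tabled value:", logistic_tabled_value(tail, 1.0, 2.0))
    print("binomial P(X >= 1), N = 2, p = .5:", binomial_right_tail(1, 2, 0.5))
    print("ln(10!) by Stirling:", stirling_log_factorial(10))
```

Because the binomial right tail is summed exactly, no continuity correction is involved, which matches the remark under Continuity Correction. The distributions without closed-form tails (normal, gamma, F, beta, t, chi-square) need the iterative methods the manual alludes to and are deliberately left out of the sketch.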
Examples

Examples On One Sample Data Sets

One Hundred Failure-Time Data

One hundred observations of the time until failure of an electronic circuit were obtained from a life testing experiment. The coded data values are shown below. The serial correlations with lag 1 and lag 2 were quite small, indicating apparent "independence" of the observations. Also, a serial plot of the data shows no particular patterns. The runs test further confirms the randomness of the data.

This type of data is assumed to come from an exponential random variable with mean = 1. The histogram of the data indicates that this assumption might be valid. If the data really are exponential with mean = 1, then the sample mean and standard deviation also should be about 1. From the output we see that $\bar{x}$ = 1.0856 and s = .9301, which do not differ from 1 by a great deal. This is confirmed by the one-sample t-test.

Both the Chi-square goodness of fit test and the Kolmogorov-Smirnov goodness of fit test indicate that we cannot reject the hypothesis that the data came from an exponentially distributed population with mean = 1. The $\chi^2$ test yields a test statistic of 9.248 with 8 degrees of freedom, which is not significant even at the $\alpha$ = .10 level. The K-S test statistic, DN = .09907, is not significant at the $\alpha$ = .20 level. However, both tests ($\chi^2$ and K-S) indicate that the data are not normally distributed. Since the sample size for this example was too large to perform a Shapiro-Wilk normality test, half of the observations were selected to give you an idea of the output.

* DATA MANIPULATION *

Enter DATA TYPE (Press CONTINUE for RAW DATA):
1                                                     Raw data
Mode number = ?
2                                                     On mass storage
Is data stored on program's scratch file (DATA)?
NO
Data file name = ?
TIME:INTERNAL
Was data stored by the BS&DM system ?
YES
Is data medium placed in device INTERNAL ?
YES
Is program medium placed in correct device ?
YES

Data file name: TIME:INTERNAL
Data type is: Raw data
Number of observations: 100
Number of variables: 1
Variable names:
  1. X1
Subfiles: NONE

SELECT ANY KEY
Option number = ?
1                                                     Press special function key labeled-LIST

Data type is: Raw data
VARIABLE # 1 (X1)
  I     OBS(I)    OBS(I+1)   OBS(I+2)   OBS(I+3)   OBS(I+4)
  1    2.00790    2.45450    2.55760     .50250    1.71430
  6    1.71430    2.52480     .84390    2.89900     .32220
 11     .18180    3.38780    1.71490     .16020     .10360
 16     .53530    1.18870     .01480     .03510     .21580
 21     .84770    1.85770    1.08500    3.25370    1.73570
 26    1.03880    1.72300    1.72300    1.85580     .89840
 31     .14220     .12790    1.49950     .11010    3.37350
 36     .60190    1.90800     .52140     .29580     .49730
 41    1.63010     .05740    1.08360     .57650    2.25210
 46    2.72780     .83400    1.14640     .02070     .23900
 51    3.84480    1.29530     .81290     .85020     .97390
 56     .43280     .83970    1.08490     .95980     .51170
 61     .89530    2.51070     .32380    1.06270    3.21960
 66    1.20550     .39400     .29730    1.27110     .98670
 71    2.31500     .48060    1.34410     .78670    2.28790
 76     .12190     .54020    3.11250     .17480     .06320
 81     .65310     .54450     .01050     .18050     .46430
 86     .55340     .99490     .28950    1.36600     .15090
 91    1.51270    1.53900     .77450     .14300     .44900
 96     .43340     .16540    1.76060     .40100     .43230

Option number = ?                                     Exit LIST procedure
SELECT ANY KEY                                        Select special function key labeled ADV. STAT
                                                      Remove BSDM media
                                                      Insert General Statistics media
Enter number of desired function                      Choose 1 sample tests

********************************************************************************
ONE SAMPLE TESTS
VARIABLE -- X1
********************************************************************************
Enter desired function: 1                             Choose serial correlation

SERIAL CORRELATION
SAMPLE SIZE IS 100
CORRELATION LAG = ?
1                                                     Choose lag = 1
SERIAL CORRELATION WITH LAG = 1 IS .01605             Not very serially correlated
ENTER ANOTHER LAG?
YES
CORRELATION LAG = ?
2                                                     Try lag = 2
SERIAL CORRELATION WITH LAG = 2 IS .01235             Not very correlated
ENTER ANOTHER LAG?
NO

Enter desired function: 2                             Obtain ranks

RANKED DATA
     DISTINCT               DISTINCT               DISTINCT
( RANK  DATA POINT)    ( RANK  DATA POINT)    ( RANK  DATA POINT)
(   1.00    .0105)     (   2.00    .0148)     (   3.00    .0207)
(   4.00    .0351)     (   5.00    .0574)     (   6.00    .0632)
(   7.00    .1036)     (   8.00    .1101)     (   9.00    .1219)
(  10.00    .1279)     (  11.00    .1422)     (  12.00    .1430)
(  13.00    .1509)     (  14.00    .1602)     (  15.00    .1654)
(  16.00    .1748)     (  17.00    .1805)     (  18.00    .1818)
(  19.00    .2158)     (  20.00    .2390)     (  21.00    .2895)
(  22.00    .2958)     (  23.00    .2973)     (  24.00    .3222)
(  25.00    .3238)     (  26.00    .3940)     (  27.00    .4010)
(  28.00    .4323)     (  29.00    .4328)     (  30.00    .4334)
(  31.00    .4490)     (  32.00    .4643)     (  33.00    .4806)
(  34.00    .4973)     (  35.00    .5025)     (  36.00    .5117)
(  37.00    .5214)     (  38.00    .5353)     (  39.00    .5402)
(  40.00    .5445)     (  41.00    .5534)     (  42.00    .5765)
(  43.00    .6019)     (  44.00    .6531)     (  45.00    .7745)
(  46.00    .7867)     (  47.00    .8129)     (  48.00    .8340)
(  49.00    .8397)     (  50.00    .8439)     (  51.00    .8477)
(  52.00    .8502)     (  53.00    .8953)     (  54.00    .8984)
(  55.00    .9598)     (  56.00    .9739)     (  57.00    .9867)
(  58.00    .9949)     (  59.00   1.0388)     (  60.00   1.0627)
(  61.00   1.0836)     (  62.00   1.0849)     (  63.00   1.0850)
(  64.00   1.1464)     (  65.00   1.1887)     (  66.00   1.2055)
(  67.00   1.2711)     (  68.00   1.2953)     (  69.00   1.3441)
(  70.00   1.3660)     (  71.00   1.4995)     (  72.00   1.5127)
(  73.00   1.5390)     (  74.00   1.6301)     (  75.50   1.7143)
(  77.00   1.7149)     (  78.50   1.7230)     (  80.00   1.7357)
(  81.00   1.7606)     (  82.00   1.8558)     (  83.00   1.8577)
(  84.00   1.9080)     (  85.00   2.0079)     (  86.00   2.2521)
(  87.00   2.2879)     (  88.00   2.3150)     (  89.00   2.4545)
(  90.00   2.5107)     (  91.00   2.5248)     (  92.00   2.5576)
(  93.00   2.7278)     (  94.00   2.8990)     (  95.00   3.1125)
(  96.00   3.2196)     (  97.00   3.2537)     (  98.00   3.3735)
(  99.00   3.3878)     ( 100.00   3.8448)

Enter desired function: 3                             Choose t-test

ONE-SAMPLE t-TEST
SAMPLE SIZE IS 100
1 OR 2 TAIL TEST
2                                                     2 tail test
2 TAIL TEST
HO: MU= 1.085611 OR = ?
1.0000                                                Specify hypothesis mean
HO: MU= 1
N= 100   MEAN= 1.0856   STD DEV = .9301   STD ERROR OF MEAN= .0930
t = .9204   DF= 99                                    Cannot reject hypothesis
P(-.9204 < t < .9204)   .3596

Enter desired function: 4                             Choose Kolmogorov-Smirnov G.O.F. test

KOLMOGOROV-SMIRNOV GOODNESS-OF-FIT TEST
SAMPLE SIZE IS 100
Please enter G.O.F. code:                             Choose exponential form of the hypothesized distribution.
Testing for EXPONENTIAL goodness of fit.
MEAN= 1.085611 OR = ?
MEAN = 1
N= 100, KOLMOGOROV-SMIRNOV STATISTICS:
     DN       SQR(N)*DN
   .09907       .99
ANOTHER G.O.F. CODE?
NO

Enter desired function: 5                             Choose Chi-square G.O.F. test

CHI-SQUARE GOODNESS-OF-FIT TEST
SAMPLE SIZE IS 100
Please enter G.O.F. code:                             Select exponential distribution again
Testing for EXPONENTIAL goodness of fit.
OFFSET =                                              Minimum value for histogram
OFFSET =
# OF CELLS (max is 50) = ?
10                                                    10 intervals or windows
# OF CELLS = 10
OPTIMUM CELL WIDTH = .3845
CELL WIDTH = .3844838448 OR = ?
.4
YOUR CELL WIDTH = .4000

CELL #   LOWER LIMIT   OBSERVED # OF OBS.   EXPECTED # OF OBS.
   1        .0000             26                  30.82
   2        .4000             20                  21.32
   3        .8000             19                  14.75
   4       1.2000              8                  10.20
   5       1.6000             11                   7.06
   6       2.0000              4                   4.88
   7       2.4000              5                   3.38
   8       2.8000              2                   2.34
   9       3.2000              4                   1.62
  10       3.6000              1                   1.12

CHI-SQUARE GOODNESS-OF-FIT FOR EXPONENTIAL DISTRIBUTION
CHI-SQUARE VALUE = 9.248, DEGREES OF FREEDOM = 8      Not very big. See Chi-square table in appendix with 8 degrees of freedom.
ANOTHER GOF CODE?
NO

Enter desired function: 7                             Choose runs test

RUNS TEST
SAMPLE SIZE IS 100
Select a significance level by entering 1, 2 or 3     Choose α = .05
TEST FOR TOO FEW RUNS?
YES                                                   See if data is too non-random
# OF RUNS IS NOT SIGNIFICANT AT THE .05 SIGNIFICANCE LEVEL FOR TOO FEW RUNS
TEST FOR TOO MANY RUNS?
NO
Another significance level?
NO

Enter desired function: 9                             Exit one-sample tests
Enter number of desired function: 6                   Return to BSDM to split data set in half for Shapiro-Wilk test.

SELECT ANY KEY                                        Select special function key labeled-SUBFILES
Option number = ?
1
Number of subfiles ( <=20 ) = ?                       Split data set by specifying number of observations in each subfile
2
Name of Subfile # 1 ( <=10 characters ) = ?
FIRST HALF
Subfile # 1 ; number of observations = ?
50
Name of Subfile # 2 ( <=10 characters ) = ?
SECONDHALF
Is the above information correct?
YES

Subfile name:    beginning observation    number of observations
1 FIRST HALF              1                        50
2 SECONDHALF             51                        50

Option number = ?                                     Exit subfiles procedure
PROGRAM NOW UPDATING SCRATCH DATA FILE
SELECT ANY KEY                                        Return to General Statistics by pressing ADV. STAT key
Enter number of desired function: 1                   Choose one-sample tests
SUBFILE NUMBER? (0=IGNORE SUBFILES)
1

********************************************************************************
ONE SAMPLE TESTS
VARIABLE -- X1
SUBFILE -- FIRST HALF
********************************************************************************
Enter desired function: 6                             Select Shapiro-Wilk test for subfile 1

SHAPIRO-WILK NORMALITY TEST
SAMPLE SIZE IS 50
W STATISTIC FOR NORMALITY = .904821834706
% POINTS FOR W (SMALL VALUE SIGNIFICANT)    .01    .02    .05    .1     .5
CORRESPONDING W VALUES:                     .93    .938   .947   .955   .974

Enter desired function: 8
SUBFILE NUMBER? (0=IGNORE SUBFILES)
2

ONE SAMPLE TESTS
VARIABLE -- X1
SUBFILE -- SECONDHALF
Enter desired function: 6                             Select Shapiro-Wilk test for subfile 2

SHAPIRO-WILK NORMALITY TEST
SAMPLE SIZE IS 50
W STATISTIC FOR NORMALITY = .831574211967
% POINTS FOR W (SMALL VALUE SIGNIFICANT)    .01    .02    .05    .1     .5
CORRESPONDING W VALUES:                     .93    .938   .947   .955   .974

Enter desired function: 9                             Return to main menu
Enter number of desired function: 6                   Return to BSDM
SELECT ANY KEY

Examples On Two Paired Samples Data Sets

Pig Weight Changes

176 pigs were paired on the basis of sex, age, and initial weight. They were fed daily one of two iron compounds to supplement that which they lacked due to confinement in pens. It was desired to determine if there was any difference in pig weight due to the two different compounds as applied over a one month period.
From the paired-t test and the correlation coeffi- cient, we see the difference is not significant. ******************************************************************************** * DATA MANIPULATION * *************************************************** Enter DATA TYPE (Press CONTINUE for RAW DATA): i Mode nuMber = ? Raw data On mass storage Is data stored on prograei's scratch file (DATA)? NO Data file nacte = ? PIGS: INTERNAL Was data stored by the BS4.DM systeM ? YES Is data nediuM placed in device INTERNAL ? YES Is prograM nediun placed in correct device ? YES PIG WEIGHT CHANGES Data file nacie: PIGS: INTERNAL Data type is: Raw data Nunber of observations: 88 Nuwber of variables: 2 Variable naeies: i. VARIABLES 2. VARIABLE#2 Subfiles: NONE Clever names for variables SELECT ANY KEY Option nuwber = ? 1 Enter Method for listing data 3 List all the data 193 PIG WEIGHT CHANGES Data type is: Raw data Variable # i Variable # 2 <VARIABLE#i> (VARIABLE#2> OBS* 1 54.00000 46.00000 2 44.00000 42.00000 3 46.00000 44.00000 4 54.00000 44.00000 5 45.00000 45.00000 6 46.00000 52.0000 7 50.00000 51.00000 8 43.0000 55.00000 9 47.00000 60. 00000 10 40.00000 43.00000 ii 40.00000 20.00000 12 46.00000 48.00000 13 52.00000 54.00000 14 50.00000 55.00000 15 54.00000 62.00000 16 49.00000 41.00000 17 30.00000 48.00000 18 50.00000 45.00000 19 48.00000 46.00000 20 38.00000 31.00000 21 27.00000 35.00000 22 50.00000 59.00000 23 107.00000 135.00000 24 77.00000 90.00000 25 91.00000 98.00000 26 88.00000 98.0 000 27 93.00000 96.00000 28 89.00000 74.0 00 29 95.00000 98.00000 30 105.00000 133.00000 31 107.00000 126.00000 32 95.00000 91.00000 33 114.00000 52.00000 34 128.00000 98.00000 35 110.00000 119.00000 36 104.00000 105.00000 37 94.00000 110. 00000 38 87.0000 81.00000 39 66. 00000 83.00000 40 96.00000 112.00000 41 120.00000 104.00000 42 90.00000 101.00000 43 95.00000 88.00000 44 86 . 
86.00000 45 158.00000 221.00000 46 125.00000 176.00000 47 149.00000 150.00000 48 175.00000 176.00000 49 196.00000 209.00000 50 121.00000 118.00000 51 181.00000 180.00000 52 201.00000 238.00000 53 175.00000 196.00000 54 147.00000 138.00000 55 209.00000 133.00000 56 194.00000 159.00000 57 203.00000 209.00000 58 179.00000 205.00000 194 59 170 .00000 201 .00000 60 148 .00000 149 .00000 6i 138 .00000 159 .00000 62 232 .00000 230 .00000 63 223 .00000 198 .00000 64 151 .00000 161 .00000 65 142 .00000 147 .00000 66 167 .00000 176 .00000 67 210 .00000 320 .00000 68 240 .00000 267 .OUOOO 69 245 .00000 221 .00000 70 263 .00000 247 .00000 71 263 .00000 293 .00000 72 182. 00000 211. 00000 73 261. 00000 178. 00000 74 280. 00000 320. 00000 75 264. 00000 266. 00000 76 187. 00000 178. 00000 77 280. 00000 199. 00000 78 287. 00000 230. 00000 79 230. 00000 256. 00000 80 234. 00000 272 . 00000 81 238. 00000 245 . 00000 82 202. 00000 222. 00000 83 202. 245. 00000 84 317. 00000 243. 00000 85 293. 00000 264. 00000 86 215. 00000 215. 00000 87 171. 00000 172. 00000 88 242. 00000 233. 00000 Option SELECT nuciber = ? ANY KE> ' Exit list procedure Select special function key labeled-ADV. STAT Remove BSDM media Enter nuciber of desired function: Insert General Statistics 3 Choose two paired sample analyses VARIABLE NUMBER FOR X =? 1 VARIABLE NUMBER FOR Y =? ************************************************* PAIRED SAMPLE TESTS VARIABLE FOR X — VARIABLE FOR Y — VARIABLE*! VARIABLE*2 *******************************************#********************************#*#* Enter desired function: 1 Choose paired t-test PAIRED -t TEST SAMPLE SIZE IS 88 1 OR 2 TAILED? 1 HO : MU<X)-MU<Y> = Specify zero difference 1 TAILED TEST HO : MU(X)-MU(Y) = Hi : MU<X)~MU<Y> < LEVEL OF SIGNIFICANCE .05 T VALUE ~ -.736 DF == 87 Specify x = .05 T< 0.9500, 87 ) = 1.663 DO NOT REJECT HO AT .05 LEVEL OF SIGNFICANCE ANOTHER PAIRED-t TEST ON THIS DATA? 
NO
Enter desired function:                  Choose cross correlation

CROSS CORRELATION
SAMPLE SIZE IS 88
LAG ON X OR Y? Y
LAG ON Y= ? 1                            Lag of 1 on y
LAG ON Y = 1    COEFF. = .85126
ANOTHER CROSS CORRELATION? YES
LAG ON X OR Y? Y
LAG ON Y= ? 2                            Try lag of 2
LAG ON Y = 2    COEFF. = .82534
ANOTHER CROSS CORRELATION? YES
LAG ON X OR Y? Y
LAG ON Y= ? 3                            Try lag of 3
LAG ON Y = 3    COEFF. = .88230
ANOTHER CROSS CORRELATION? YES
LAG ON X OR Y? Y
LAG ON Y= ? 22                           Try lag of 22
LAG ON Y = 22   COEFF. = .89051
ANOTHER CROSS CORRELATION? NO

Enter desired function: 3                Choose family regression

FAMILY REGRESSION / AOV
SAMPLE SIZE IS 88
REGRESSION CODE =? 1                     Choose linear regression Y=A+BX+E

AOV OF LINEAR REGRESSION   Y = A + BX
SOURCE        SS            DF    MS            F RATIO
REG           481475.711     1    481475.711    581.18
RES            71246.789    86       828.451
TOTAL COR     552722.500    87
R SQUARED = .8711
YHAT = ( 10.129409002 ) + ( .943467866544 )X

EVALUATE Y AT X ? YES
AT ALL X(I)'S ? YES

Y EVALUATED AT X                         Table of predicted values and residuals
  I     X(I)       YHAT        Y(I)        RES(I)
  1     54.000     61.0767     46.00000    15.07667
  2     44.000     51.6420     42.00000     9.64200
  3     46.000     53.5289     44.00000     9.52893
  4     54.000     61.0767     44.00000    17.07667
  5     45.000     52.5855     45.00000     7.58546
  6     46.000     53.5289     52.00000     1.52893
  7     50.000     57.3028     51.00000     6.30280
  8     43.000     50.6985     55.00000     4.30147
  9     47.000     54.4724     60.00000     5.52760
 10     40.000     47.8681     43.00000     4.86812
 11     40.000     47.8681     20.00000    27.86812
 12     46.000     53.5289     48.00000     5.52893
 13     52.000     59.1897     54.00000     5.18974
 14     50.000     57.3028     55.00000     2.30280
 15     54.000     61.0767     62.00000      .92333
 16     49.000     56.3593     41.00000    15.35933
 17     30.000     38.4334     48.00000     9.56656
 18     50.000     57.3028     45.00000    12.30280
 19     48.000     55.4159     46.00000     9.41587
 20     38.000     45.9812     31.00000    14.98119
 21     27.000     35.6030     35.00000      .60304
 22     50.000     57.3028     59.00000     1.69720
 23    107.000    111.0805    135.00000    23.91953
 24     77.000     82.7764     90.00000     7.22357
 25     91.000     95.9850     98.00000     2.01502
 26     88.000     93.1546     98.00000     4.84542
 27     93.000     97.8719     96.00000     1.87192
 28     89.000     94.0980     74.00000    20.09805
 29     95.000     99.7589     98.00000     1.75886
 30    105.000    109.1935    133.00000    23.80647
 31    107.000    111.0805    126.00000    14.91953
 32     95.000     99.7589     91.00000     8.75886
 33    114.000    117.6847     52.00000    65.68475
 34    128.000    130.8933     98.00000    32.89330
 35    110.000    113.9109    119.00000     5.08913
 36    104.000    108.2501    105.00000     3.25007
 37     94.000     98.8154    110.00000    11.18461
 38     87.000     92.2111     81.00000    11.21111
 39     66.000     72.3983     83.00000    10.60171
 40     96.000    100.7023    112.00000    11.29768
 41    120.000    123.3456    104.00000    19.34555
 42     90.000     95.0415    101.00000     5.95848
 43     95.000     99.7589     88.00000    11.75886
 44     86.000     91.2676     86.00000     5.26765
 45    158.000    159.1973    221.00000    61.80267
 46    125.000    128.0629    176.00000    47.93711
 47    149.000    150.7061    150.00000      .70612
 48    175.000    175.2363    176.00000      .76371
 49    196.000    195.0491    209.00000    13.95089
 50    121.000    124.2890    118.00000     6.28902
 51    181.000    180.8971    180.00000      .89709
 52    201.000    199.7665    238.00000    38.23355
 53    175.000    175.2363    196.00000    20.76371
 54    147.000    148.8192    138.00000    10.81919
 55    209.000    207.3142    133.00000    74.31419
 56    194.000    193.1622    159.00000    34.16218
 57    203.000    201.6534    209.00000     7.34661
 58    179.000    179.0102    205.00000    25.98984
 59    170.000    170.5189    201.00000    30.48105
 60    148.000    149.7627    149.00000      .76265
 61    138.000    140.3280    159.00000    18.67203
 62    232.000    229.0140    230.00000      .98605
 63    223.000    220.5227    198.00000    22.52274
 64    151.000    152.5931    161.00000     8.40694
 65    142.000    144.1018    147.00000     2.89815
 66    167.000    167.6885    176.00000     8.31146
 67    210.000    208.2577    320.00000   111.74234
 68    240.000    236.5617    267.00000    30.43830
 69    245.000    241.2790    221.00000    20.27904
 70    263.000    258.2615    247.00000    11.26146
 71    263.000    258.2615    293.00000    34.73854
 72    182.000    181.8406    211.00000    29.15944
 73    261.000    256.3745    178.00000    78.37452
 74    280.000    274.3004    320.00000    45.69959
 75    264.000    259.2049    266.00000     6.79507
 76    187.000    186.5579    178.00000     8.55790
 77    280.000    274.3004    199.00000    75.30041
 78    287.000    280.9047    230.00000    50.90469
 79    230.000    227.1270    256.00000    28.87298
 80    234.000    230.9009    272.00000    41.09911
 81    238.000    234.6748    245.00000    10.32524
 82    202.000    200.7099    222.00000    21.29008
 83    202.000    200.7099    245.00000    44.29008
 84    317.000    309.2087    243.00000    66.20872
 85    293.000    286.5655    264.00000    22.56549
 86    215.000    212.9750    215.00000     2.02500
 87    171.000    171.4624    172.00000      .53759
 88    242.000    238.4486    233.00000     5.44863

REGRESSION CODE =?                       Exit family regression
Enter desired function: 10               Exit two-paired sample test
Enter number of desired function: 6      Return to BSDM

Bus Passenger Service Time

The time required to service passengers boarding at a bus stop was measured together with the actual number of passengers boarding. The service time was recorded from the moment that the bus stopped and the door opened until the last passenger boarded the bus. The objective is to determine a model for predicting passenger service time, given knowledge of the number boarding at a particular stop. Let X = number boarding and Y = passenger service time. The following data was gathered during the month of May, [...], at twelve downtown locations in Louisville, Kentucky.

********************************************************************************
*                              DATA MANIPULATION                               *
********************************************************************************

Enter DATA TYPE (Press CONTINUE for RAW DATA): 1       Raw data
Mode number = ?                                        From mass storage
Is data stored on program's scratch file (DATA)? NO
Data file name = ? BUSTIME:INTERNAL
Was data stored by the BS&DM system ? YES
Is data medium placed in device INTERNAL ? YES
Is program medium placed in correct device ? YES

BUS PASSENGER SERVICE TIME
Data file name: BUSTIME:INTERNAL
Data type is: Raw data
Number of observations: 31
Number of variables: 2
Variable names:  1. NUMBER
                 2. TIME
Subfiles: NONE

SELECT ANY KEY
Option number = ? 1                      Choose special function key labeled LIST
Enter method for listing data: 3         List all data

BUS PASSENGER SERVICE TIME
Data type is: Raw data

        Variable # 1    Variable # 2
OBS#    (NUMBER)        (TIME)
  1      1.00000         1.40000
  2      1.00000         2.80000
  3      1.00000         3.00000
  4      1.00000         1.80000
  5      1.00000         2.00000
  6      2.00000         4.70000
  7      2.00000         8.00000
  8      2.00000         3.00000
  9      2.00000         2.50000
 10      3.00000         5.20000
 11      3.00000         6.20000
 12      3.00000         9.40000
 13      4.00000        11.70000
 14      5.00000         7.50000
 15      5.00000        11.90000
 16      6.00000        13.60000
 17      6.00000        12.40000
 18      6.00000        11.60000
 19      7.00000        14.70000
 20      7.00000        13.50000
 21      8.00000        12.00000
 22      8.00000        14.10000
 23      8.00000        26.00000
 24      9.00000        19.00000
 25     10.00000        21.20000
 26     11.00000        22.90000
 27     11.00000        22.60000
 28     13.00000        25.20000
 29     17.00000        33.50000
 30     19.00000        33.70000
 31     25.00000        54.20000

Option number = ?                        Exit list procedure
SELECT ANY KEY                           Choose special function key labeled ADV. STAT
                                         Remove BSDM media
                                         Insert General Statistics media
Enter number of desired function: 3      Choose two paired sample test
VARIABLE NUMBER FOR X =? 1
VARIABLE NUMBER FOR Y =? 2

********************************************************************************
PAIRED SAMPLE TESTS
VARIABLE FOR X -- NUMBER
VARIABLE FOR Y -- TIME
********************************************************************************

Enter desired function: 3                Choose family regression

FAMILY REGRESSION / AOV
SAMPLE SIZE IS 31
REGRESSION CODE =?                       Linear regression Y=A+BX+E

AOV OF LINEAR REGRESSION   Y = A + BX
SOURCE        SS          DF    MS          F RATIO
REG           3970.237     1    3970.237    543.72
RES            211.758    29       7.302
TOTAL COR     4181.995    30
R SQUARED = .9494                        Not bad!
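
The AOV quantities printed for the bus data can be reproduced by hand from the 31 listed observations. The following is a minimal sketch in Python, not the package's own code; the function name `linear_regression_aov` and the dictionary keys are hypothetical, chosen only for this illustration. It uses the ordinary least-squares formulas for Y = A + BX and splits the corrected total sum of squares into regression and residual parts.

```python
# Bus passenger data as listed in this example (31 stops).
NUMBER = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 5, 6, 6, 6,
          7, 7, 8, 8, 8, 9, 10, 11, 11, 13, 17, 19, 25]
TIME = [1.4, 2.8, 3.0, 1.8, 2.0, 4.7, 8.0, 3.0, 2.5, 5.2, 6.2, 9.4,
        11.7, 7.5, 11.9, 13.6, 12.4, 11.6, 14.7, 13.5, 12.0, 14.1,
        26.0, 19.0, 21.2, 22.9, 22.6, 25.2, 33.5, 33.7, 54.2]


def linear_regression_aov(x, y):
    """Least-squares fit of y = a + b*x plus the AOV quantities for the fit.

    Hypothetical helper for illustration; not part of the HP package.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)   # TOTAL COR, n - 1 DF
    b = sxy / sxx                                   # slope
    a = y_bar - b * x_bar                           # intercept
    ss_reg = b * sxy                                # REG, 1 DF
    ss_res = ss_total - ss_reg                      # RES, n - 2 DF
    ms_res = ss_res / (n - 2)
    return {
        "a": a,
        "b": b,
        "ss_reg": ss_reg,
        "ss_res": ss_res,
        "ss_total": ss_total,
        "f_ratio": ss_reg / ms_res,
        "r_squared": ss_reg / ss_total,
    }


if __name__ == "__main__":
    fit = linear_regression_aov(NUMBER, TIME)
    for name, value in fit.items():
        print(f"{name:>10s} = {value:.4f}")
```

Run on the listed data, the sketch should agree with the printed table to the digits shown (SS, F ratio, R squared), which is a quick way to confirm that a data file was keyed in correctly.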
YHAT = ( .586330097087 ) + ( 1.99576699029 )X

EVALUATE Y AT X ? YES
AT ALL X(I)'S ? YES

Y EVALUATED AT X
  I     X(I)      YHAT       Y(I)        RES(I)
  1     1.000     2.5821     1.40000     1.18210
  2     1.000     2.5821     2.80000      .21790
  3     1.000     2.5821     3.00000      .41790
  4     1.000     2.5821     1.80000      .78210
  5     1.000     2.5821     2.00000      .58210
  6     2.000     4.5779     4.70000      .12214
  7     2.000     4.5779     8.00000     3.42214
  8     2.000     4.5779     3.00000     1.57786
  9     2.000     4.5779     2.50000     2.07786
 10     3.000     6.5736     5.20000     1.37363
 11     3.000     6.5736     6.20000      .37363
 12     3.000     6.5736     9.40000     2.82637
 13     4.000     8.5694    11.70000     3.13060
 14     5.000    10.5652     7.50000     3.06517
 15     5.000    10.5652    11.90000     1.33483
 16     6.000    12.5609    13.60000     1.03907
 17     6.000    12.5609    12.40000      .16093
 18     6.000    12.5609    11.60000      .96093
 19     7.000    14.5567    14.70000      .14330
 20     7.000    14.5567    13.50000     1.05670
 21     8.000    16.5525    12.00000     4.55247
 22     8.000    16.5525    14.10000     2.45247
 23     8.000    16.5525    26.00000     9.44753
 24     9.000    18.5482    19.00000      .45177
 25    10.000    20.5440    21.20000      .65600
 26    11.000    22.5398    22.90000      .36023
 27    11.000    22.5398    22.60000      .06023
 28    13.000    26.5313    25.20000     1.33130
 29    17.000    34.5144    33.50000     1.01437
 30    19.000    38.5059    33.70000     4.80590
 31    25.000    50.4805    54.20000     3.71950

REGRESSION CODE =?                       Exit family regression
Enter desired function: 10               Exit two paired sample tests
Enter number of desired function: 6      Return to BSDM

Example #3

This example is included for your convenience as a sample problem so that you may check your operation of the routines involved.

********************************************************************************
*                              DATA MANIPULATION                               *
********************************************************************************

Enter DATA TYPE (Press CONTINUE for RAW DATA): 1       Raw data
Mode number = ?                                        On mass storage
Is data stored on program's scratch file (DATA)? NO
Data file name = ? TWONP:INTERNAL
Was data stored by the BS&DM system ? YES
Is data medium placed in device INTERNAL ? YES
Is program medium placed in correct device ? YES

TWO SAMPLE NONPARAMETRIC STATISTICS
Data file name: TWONP:INTERNAL
Data type is: Raw data
Number of observations: 12
Number of variables: 2
Variable names:  1. X(I)
                 2. Y(I)
Subfiles: NONE

SELECT ANY KEY
Option number = ? 1                      Select special function key labeled LIST
Enter method for listing data: 3         List all data

TWO SAMPLE NONPARAMETRIC STATISTICS
Data type is: Raw data

        Variable # 1    Variable # 2
OBS#    (X(I))          (Y(I))
  1     86.00000        88.00000
  2     71.00000        77.00000
  3     77.00000        76.00000
  4     68.00000        64.00000
  5     91.00000        96.00000
  6     72.00000        72.00000
  7     77.00000        65.00000
  8     91.00000        90.00000
  9     70.00000        65.00000
 10     71.00000        80.00000
 11     88.00000        81.00000
 12     87.00000        72.00000

Option number = ?                        Exit list procedure
SELECT ANY KEY                           Select special function key labeled ADV. STAT
                                         Remove BSDM media
                                         Insert General Statistics media
Enter number of desired function: 3      Select two paired sample test
VARIABLE NUMBER FOR X =? 1
VARIABLE NUMBER FOR Y =? 2

********************************************************************************
PAIRED SAMPLE TESTS
VARIABLE FOR X -- X(I)
VARIABLE FOR Y -- Y(I)
********************************************************************************

Enter desired function: 4                Select sign test

SIGN TEST
SAMPLE SIZE IS 12
NUMBER OF POSITIVE DIFFERENCES = 7
(THE 1 POINTS WHERE X(I)=Y(I) ARE EXCLUDED FROM THE TEST)
NUMBER OF OBSERVATIONS USED = 11
YIELDS AN APPROX. STD. NOR. DEV. = .90453          No real differences

Enter desired function: 5                Select Wilcoxon Signed Rank test

WILCOXON SIGNED RANK
SAMPLE SIZE IS 12
SUM OF POSITIVE RANKS = 41.5
(USING RANKS OF X(I)-Y(I) AND EXCLUDING THE 1 POINTS WHERE X(I)=Y(I))
NUMBER OF OBSERVATIONS USED = 11
YIELDS APPROXIMATE STANDARD NORMAL DEVIATES
1) WITHOUT CORRECTION FOR CONTINUITY :
   A) NOT COMPENSATING FOR TIED DIFFERENCES :          .75574
   B) CONDITIONAL ON THE EXISTING TIED DIFFERENCES :   .75649
2) WITH CORRECTION FOR CONTINUITY :
   A) NOT COMPENSATING FOR TIED DIFFERENCES :          .71129
   B) CONDITIONAL ON THE EXISTING TIED DIFFERENCES :   .71199
                                         Confirms no differences

Enter desired function: 6                Select Taha's higher power signed rank test

HIGHER POWERED SIGNED RANKS
SAMPLE SIZE IS 12
POWER OF THE RANK (MUST BE 2, 3, 4, OR 5) 2
POWER OF THE RANK IS 2
SUM OF POSITIVE RANKS SQUARED = 335.75
(USING RANKS OF X(I)-Y(I) AND EXCLUDING THE 1 POINTS WHERE X(I)=Y(I))
NUMBER OF OBSERVATIONS USED = 11
YIELDS AN APPROX. STD. NOR. DEV. OF .8284
CONDITIONAL ON THE EXISTING TIES AND WITHOUT A CORRECTION FOR CONTINUITY
                                         Again no difference

Enter desired function: 7                Select Spearman Rank Correlation

SPEARMAN'S RHO
SAMPLE SIZE IS 12
SUM OF SQUARED RANK DIFFERENCES = 75
RHO = .73776                             Seems to indicate that X & Y are related

Enter desired function: 8                Select Kendall's Tau test

KENDALL'S TAU
SAMPLE SIZE IS 12
NUMBER OF CONCORDANT PAIRS = 49
NUMBER OF DISCORDANT PAIRS = 12
TAU = .56061                             Also indicates X & Y are related

Enter desired function: 10               Exit two paired sample tests
Enter number of desired function: 6      Return to BSDM

Examples on Two Independent Samples

Example 1

The following is an example of a two-sample t-test.

********************************************************************************
*                              DATA MANIPULATION                               *
********************************************************************************

Enter DATA TYPE (Press CONTINUE for RAW DATA): 1       Raw data
Mode number = ?                                        On mass storage
Is data stored on program's scratch file (DATA)? NO
Data file name = ? ANEXMP2:INTERNAL
Was data stored by the BS&DM system ? YES
Is data medium placed in device INTERNAL ? YES
Is program medium placed in correct device ? YES

ANOTHER EXPAMLE
Data file name: ANEXMP2:INTERNAL
Data type is: Raw data
Number of observations: 13
Number of variables: 1
Variable names:  1. MEANS

Subfile name      beginning observation    number of observations
1. FIRST PART              1                         6
2. SEC. PART               7                         7

SELECT ANY KEY
Option number = ? 1                      Select special function key labeled LIST
                                         List data

ANOTHER EXPAMLE
Data type is: Raw data

VARIABLE # 1 (MEANS)
  I    OBS(I)     OBS(I+1)   OBS(I+2)   OBS(I+3)   OBS(I+4)
  1    2.00000    3.00000    4.00000    2.00000    3.00000
  6    4.00000    5.00000    4.00000    2.00000    2.00000
 11    6.00000    3.00000    7.00000

Option number = ?                        Exit list procedure
SELECT ANY KEY                           Select special function key labeled ADV. STAT
                                         Remove BSDM media
                                         Insert General Statistics media
Enter number of desired function:        Select two independent sample test
VARIABLE NUMBER =? 1

********************************************************************************
TWO INDEPENDENT SAMPLE TESTS
VARIABLE -- MEANS
SUBFILE NUMBER FOR THE 'X' DATA? 1
X SUBFILE -- FIRST PART
SUBFILE NUMBER FOR THE 'Y' DATA? 2
Y SUBFILE -- SEC. PART
********************************************************************************

Enter desired function:                  Select two sample t-test

TWO SAMPLE t TEST
SAMPLE 1
  N = 6
  MEAN = 3.000000
  VARIANCE = .800000
  COEFF. OF VARIANCE = 29.814240
  STD. DEV. = .894427
SAMPLE 2
  N = 7
  MEAN = 4.142857
  VARIANCE = 3.809524
  COEFF. OF VARIANCE = 47.112417
  STD. DEV. = 1.951800
t = 1.3147 WITH DF = 11
PROB (t > 1.3147) = .10769

Enter desired function: 8                Exit two sample tests
Enter number of desired function: 6      Return to BSDM

Example 2

A cloud seeding experiment was performed using 16 nonseeded and 10 seeded days. The amount of rainfall, in inches, was recorded for the seeded (X) and nonseeded (Y) cases. Three tests to see if the median rainfall was identical were performed, none of which indicates that the two medians differ significantly. Taha's squared rank test was performed, since it was assumed that greater precipitation amounts are more important, and should therefore be weighted more heavily in this type of experiment.
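
The rank-based tests in this example (median test, Mann-Whitney, Taha's squared rank) all start from the combined ranks of the two samples, with tied values sharing the average of the ranks they occupy. The following is a minimal sketch of that ranking step in Python, not the package's own code; the function name `midranks` is hypothetical. The data are the 10 seeded and 16 nonseeded rainfall values listed for this example. Only the rank sums are computed here; the normal approximations reported by the package are not reproduced.

```python
# Rainfall (inches) as listed for this example.
SEEDED = [0.05, 0.72, 0.69, 0.09, 0.04, 0.62, 0.37, 0.23, 1.18, 0.26]
NONSEEDED = [0.18, 0.88, 0.12, 0.74, 0.43, 0.10, 0.65, 0.06,
             0.09, 0.41, 0.12, 0.41, 0.05, 0.03, 0.32, 0.05]


def midranks(values):
    """Rank values from smallest to largest (1-based).

    Tied values all receive the mean of the ranks they span.
    Hypothetical helper for illustration; not part of the HP package.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


combined = midranks(SEEDED + NONSEEDED)
x_ranks = combined[:len(SEEDED)]       # ranks of the seeded (X) days
y_ranks = combined[len(SEEDED):]       # ranks of the nonseeded (Y) days

rank_sum_x = sum(x_ranks)                           # Mann-Whitney rank sum
squared_rank_sum_x = sum(r * r for r in x_ranks)    # Taha's squared rank sum

if __name__ == "__main__":
    print("X ranks:", x_ranks)
    print("Y ranks:", y_ranks)
    print("Sum of the ranks of X:", rank_sum_x)
    print("Sum of X ranks squared:", squared_rank_sum_x)
```

The two sums correspond to the SUM OF THE RANKS OF X and SUM OF X RANKS SQUARED lines in the Mann-Whitney and Taha output for this example.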
********************************************************************************
*                              DATA MANIPULATION                               *
********************************************************************************

Enter DATA TYPE (Press CONTINUE for RAW DATA): 1       Raw data
Mode number = ?                                        On mass storage
Is data stored on program's scratch file (DATA)? NO
Data file name = ? CLOUD:INTERNAL
Was data stored by the BS&DM system ? YES
Is data medium placed in device INTERNAL ? YES
Is program medium placed in correct device ? YES

CLOUD
Data file name: CLOUD:INTERNAL
Data type is: Raw data
Number of observations: 26
Number of variables: 1
Variable names:  1. DAYS

Subfile name      beginning observation    number of observations
1. SEEDED                  1                        10
2. NONSEEDED              11                        16

SELECT ANY KEY                           Select special function key labeled LIST
Option number = ? 1                      List all data

CLOUD
Data type is: Raw data

VARIABLE # 1 (DAYS)
  I    OBS(I)     OBS(I+1)   OBS(I+2)   OBS(I+3)   OBS(I+4)
  1     .05000     .72000     .69000     .09000     .04000
  6     .62000     .37000     .23000    1.18000     .26000
 11     .18000     .88000     .12000     .74000     .43000
 16     .10000     .65000     .06000     .09000     .41000
 21     .12000     .41000     .05000     .03000     .32000
 26     .05000

Option number = ?                        Select special function key labeled ADV. STAT
SELECT ANY KEY                           Remove BSDM media
                                         Insert General Statistics
Enter number of desired function:        Select 2 independent sample test
VARIABLE NUMBER =? 1

TWO INDEPENDENT SAMPLE TESTS
VARIABLE -- DAYS
SUBFILE NUMBER FOR THE 'X' DATA? 1
X SUBFILE -- SEEDED
SUBFILE NUMBER FOR THE 'Y' DATA? 2
Y SUBFILE -- NONSEEDED

Enter desired function:                  Select median test

MEDIAN TESTS
DO YOU WANT THE COMBINED RANKS PRINTED? YES

COMBINED RANKS
  I    FOR X(I)    FOR Y(I)
  1     4.0000     12.0000
  2    23.0000     25.0000
  3    22.0000     10.5000
  4     7.5000     24.0000
  5     2.0000     19.0000
  6    20.0000      9.0000
  7    16.0000     21.0000
  8    13.0000      6.0000
  9    26.0000      7.5000
 10    14.0000     17.5000
 11                10.5000
 12                17.5000
 13                 4.0000
 14                 1.0000
 15                15.0000
 16                 4.0000

Both data sets are combined and then ranked from smallest to largest. Tied ranks are assigned to identical data values.

I) TEST STATISTIC, T = 2
   YIELDS A STD. NOR. DEV. OF .2894
   CONDITIONAL ON THE 5 EXISTING TIES

Useful for large samples. Since the values are small, do not reject the hypothesis of no differences between X and Y.

II) CONTINGENCY TABLE ANALYSIS

                        X       Y     TOTAL
   # OF OBS. >          6       7       13
   GRAND MEDIAN
   # OF OBS. <=         4       9       13
   GRAND MEDIAN
   TOTAL               10      16       26

   1) YIELDS AN APPROXIMATE CHI-SQUARE VALUE WITH 1 DF OF
      A) USING YATES' CORRECTION FOR CONTINUITY :   .16250
      B) WITHOUT CORRECTION FOR CONTINUITY :        .650
   2) FISHER'S EXACT PROBABILITY OF THE EXISTING
      CELL FREQUENCIES OR WORSE                     .34408

All three values for the two by two table conclude no difference between X's and Y's for middle value.

Enter desired function: 3                Select Mann-Whitney test

MANN-WHITNEY TEST
DO YOU WANT THE COMBINED RANKS PRINTED? NO
SUM OF THE RANKS OF X = 147.5
YIELDS AN APPROX. STD. NOR. DEV. OF : .6583
CONDITIONAL ON THE 5 EXISTING TIES

Designed to see if X's differ from Y's. Conclude they do not. For large sample sizes.

Enter desired function: 4                Select Taha's squared rank

TAHA'S SQUARED RANK
DO YOU WANT THE COMBINED RANKS PRINTED? NO
SUM OF X RANKS SQUARED = 2786.25
YIELDS AN APPROX. STD. NOR. DEV. OF : .7605
CONDITIONAL ON THE 5 EXISTING TIES

Useful to see if X's differ from Y's in spread of data sets. Conclude they do not.

Enter desired function: 8                Exit from two independent sample tests
Enter number of desired function: 6      Return to BSDM

Example 3

An investigator is interested in whether there is a significant difference in the time required to pace himself for one mile between a near sea level location and a high altitude location. Forty-five low altitude observations (Y) and forty high altitude observations (X) were collected. It was decided to test whether the two populations from which the investigator sampled have the same distribution.
Both the Cramer-Von Mises and Kolmogorov-Smirnov tests were performed, neither of which indicates that there is a significant difference between low altitude and high altitude pacing.

********************************************************************************
*                              DATA MANIPULATION                               *
********************************************************************************

Enter DATA TYPE (Press CONTINUE for RAW DATA): 1       Raw data
Mode number = ?                                        On mass storage
Is data stored on program's scratch file (DATA)? NO
Data file name = ? ALTITUDE:INTERNAL
Was data stored by the BS&DM system ? YES
Is data medium placed in device INTERNAL ? YES
Is program medium placed in correct device ? YES

ALTITUDE
Data file name: ALTITUDE:INTERNAL
Data type is: Raw data
Number of observations: 85
Number of variables: 1
Variable names:  1. ALTITUDE

Subfile name      beginning observation    number of observations
1. HIGH                    1                        40
2. LOW                    41                        45

SELECT ANY KEY                           Select special function key labeled LIST
Option number = ? 1                      List all data

ALTITUDE
Data type is: Raw data

VARIABLE # 1 (ALTITUDE)
  I    OBS(I)       OBS(I+1)     OBS(I+2)     OBS(I+3)     OBS(I+4)
  1    405.00000    387.00000    400.00000    392.00000    343.00000
  6    394.00000    366.00000    389.00000    356.00000    380.00000
 11    394.00000    379.00000    359.00000    357.00000    342.00000
 16    367.00000    380.00000    395.00000    442.00000    368.00000
 21    361.00000    361.00000    360.00000    353.00000    361.00000
 26    387.00000    352.00000    385.00000    349.00000    384.00000
 31    351.00000    367.00000    364.00000    363.00000    345.00000
 36    348.00000    360.00000    353.00000    355.00000    353.00000
 41    361.00000    362.00000    359.00000    382.00000    350.00000
 46    392.00000    371.00000    398.00000    400.00000    367.00000
 51    379.00000    370.00000    365.00000    362.00000    355.00000
 56    376.00000    371.00000    369.00000    375.00000    366.00000
 61    373.00000    360.00000    374.00000    412.00000    397.00000
 66    360.00000    364.00000    377.00000    360.00000    450.00000
 71    438.00000    408.00000    380.00000    414.00000    383.00000
 76    386.00000    362.00000    380.00000    377.00000    360.00000
 81    357.00000    393.00000    357.00000    369.00000    373.00000

Option number = ?                        Exit list procedure
SELECT ANY KEY                           Select special function key labeled ADV. STAT
                                         Remove BSDM media
                                         Insert General Statistics
Enter number of desired function:        Select two independent sample test
VARIABLE NUMBER =? 1

TWO INDEPENDENT SAMPLE TESTS
VARIABLE -- ALTITUDE
SUBFILE NUMBER FOR THE 'X' DATA? 1
X SUBFILE -- HIGH
SUBFILE NUMBER FOR THE 'Y' DATA? 2
Y SUBFILE -- LOW
********************************************************************************

Enter desired function: 5                Select Cramer-Von Mises

CRAMER-VON MISES                         Hypothesis is that x distribution is the same as y
SUM OF THE SQUARED DIFFERENCES = .9471
YIELDS A TEST STATISTIC, T = .2359
CRITICAL REGION OF SIZE
   0.10 IS FOR T > 0.347
   0.05 IS FOR T > 0.461
   0.01 IS FOR T > 0.743
                                         Accept hypothesis

Enter desired function: 6                Select Kolmogorov-Smirnov test

KOLMOGOROV-SMIRNOV                       Same hypothesis
MAXIMUM DIFFERENCE, T (IN ABS. VALUE) = .2556
LARGE SAMPLE CRITICAL REGION OF SIZE
   0.10 IS FOR T > .2651
   0.05 IS FOR T > .2955
   0.01 IS FOR T > .3542
                                         Same conclusion

Enter desired function: 8                Exit
Enter number of desired function: 6      Return to BSDM

Example On Multiple Sample Data Sets

1. The following example was run to determine the effect of the addition of different sugars on length (in ocular units) of pea sections grown in tissue culture with auxin present. The first sample contains the control results, while the other samples contain:
   a. 2% glucose added
   b. 2% fructose added
   c. 1% glucose and 1% fructose added, and
   d. 2% sucrose added.

After running the one way AOV, a large F value was calculated, indicating there was some difference. To determine which samples were different, two multiple comparison tests were run. In both the Least Significant Difference test and Duncan's test, all samples differed significantly from the control sample. The Kruskal-Wallis test further supports this conclusion.

********************************************************************************
*                              DATA MANIPULATION                               *
********************************************************************************

Enter DATA TYPE (Press CONTINUE for RAW DATA): 1       Raw data
Mode number = ?                                        On mass storage
Is data stored on program's scratch file (DATA)? NO
Data file name = ? TISSUE:INTERNAL
Was data stored by the BS&DM system ? YES
Is data medium placed in device INTERNAL ? YES
Is program medium placed in correct device ? YES

TISSUE CULTURE GROWTH
Data file name: TISSUE:INTERNAL
Data type is: Raw data
Number of observations: 50
Number of variables: 1
Variable names:  1. GROWTH

Subfile name      beginning observation    number of observations
1. CONTROL                 1                        10
2. 2% GLUCOSE             11                        10
3. 2% FRUCT.              21                        10
4. 1%GLU+1FRU             31                        10
5. 2%SUCROSE              41                        10

SELECT ANY KEY                           Select special function key labeled LIST
Option number = ? 1                      List all data

TISSUE CULTURE GROWTH
Data type is: Raw data

VARIABLE # 1 (GROWTH)
  I    OBS(I)      OBS(I+1)    OBS(I+2)    OBS(I+3)    OBS(I+4)
  1    75.00000    67.00000    70.00000    75.00000    65.00000
  6    71.00000    67.00000    67.00000    76.00000    68.00000
 11    57.00000    58.00000    60.00000    59.00000    62.00000
 16    60.00000    60.00000    57.00000    59.00000    61.00000
 21    58.00000    61.00000    56.00000    58.00000    57.00000
 26    56.00000    61.00000    60.00000    57.00000    58.00000
 31    58.00000    59.00000    58.00000    61.00000    57.00000
 36    56.00000    58.00000    57.00000    57.00000    59.00000
 41    62.00000    66.00000    65.00000    63.00000    64.00000
 46    62.00000    65.00000    65.00000    62.00000    67.00000

Option number = ?                        Exit list procedure
SELECT ANY KEY                           Select special function key labeled ADV. STAT
                                         Remove BSDM media
                                         Insert General Statistics
Enter number of desired function: 4      Select three or more samples
NUMBER OF TREATMENTS =? 5

MULTIPLE SAMPLE TESTS
VARIABLE -- GROWTH
SUBFILE NUMBER FOR TREATMENT # 1 = ? 1   Specify treatments by subfiles
TREATMENT # 1 SUBFILE -- CONTROL
SUBFILE NUMBER FOR TREATMENT # 2 = ? 2
TREATMENT # 2 SUBFILE -- 2% GLUCOSE
SUBFILE NUMBER FOR TREATMENT # 3 = ? 3
TREATMENT # 3 SUBFILE -- 2% FRUCT.
SUBFILE NUMBER FOR TREATMENT # 4 = ? 4
TREATMENT # 4 SUBFILE -- 1%GLU+1FRU
SUBFILE NUMBER FOR TREATMENT # 5 = ? 5
TREATMENT # 5 SUBFILE -- 2%SUCROSE

Enter desired function: 1                Select one-way AOV

ONE WAY AOV
TRT # 1    75.00000  65.00000  76.00000  67.00000  71.00000
           68.00000  70.00000  67.00000  75.00000  67.00000
TRT # 2    57.00000  62.00000  59.00000  58.00000  60.00000
           61.00000  60.00000  60.00000  59.00000  57.00000
TRT # 3    58.00000  57.00000  57.00000  61.00000  56.00000
           58.00000  56.00000  61.00000  58.00000  60.00000
TRT # 4    58.00000  57.00000  57.00000  59.00000  56.00000
           59.00000  58.00000  58.00000  61.00000  57.00000
TRT # 5    62.00000  64.00000  62.00000  66.00000  62.00000
           67.00000  65.00000  65.00000  63.00000  65.00000

  #     N     MEAN       VARIANCE    STD DEV    STD ERRORS
  1    10     70.1000    15.8778     3.9847     1.2601
  2    10     59.3000     2.6778     1.6364      .5175
  3    10     58.2000     3.5111     1.8738      .5925
  4    10     58.0000     2.0000     1.4142      .4472
  5    10     64.1000     3.2111     1.7920      .5667

ANALYSIS OF VARIANCE
SOURCE    DF    SS           MS          F
TOTAL     49    1322.8200
TRTS       4    1077.3200    269.3300    49.3680
ERROR     45     245.5000      5.4556
PROB (F > 49.3680) = 0.0000              Treatments differ significantly

BARTLETT'S TEST
DF = 4 , CHI-SQUARE = 13.9386
PROB (CHI-SQUARE > 13.9386) = .0075      Variances within treatments also differ.
                                         Probably just first treatment differs from the others.

Enter desired function:                  Select multiple comparisons

MULTIPLE COMPARISONS
CHOOSE A NUMBER AND PRESS CONTINUE 1
WHAT CONFIDENCE LEVEL ? (.99, .95, etc.) .95           LSD procedure at 95% confidence.
TABLE VALUE FROM STUDENT'S t 2.02
DO YOU WISH TO PLOT ON THE CRT? YES
Beep signify the end of plot, then press CONTINUE.
DO YOU WANT A HARD COPY (IF THIS IS FEASIBLE)? NO

LSD
ERROR MEAN SQUARE = 5.4556
DEGREES OF FREEDOM = 45
CONFIDENCE LEVEL = .95
TABLE VALUE FROM STUDENT'S t = 2.0200, LSD = 2.1100
SAMPLES RANKED    4   3   2   5   1
                  A ------    B   C
MEANS   1 -C   2 -A   3 -A   4 -A   5 -B

Treatments 2-4 are not different from one another. Treatment 1 differs from the others. Treatment 5 differs from the others.

CHOOSE A NUMBER AND PRESS CONTINUE 1
WHAT CONFIDENCE LEVEL ? (.99, .95, etc.) .95
TABLE VALUE FROM STUDENT'S t 2.02
DO YOU WISH TO PLOT ON THE CRT? NO
Plotter identifier string (press CONT if 'HPGL')?
Plotter select code, bus # (defaults are 7,5)?
WHICH PEN COLOR SHOULD BE USED? 1
Beep signify the end of plot, then press CONTINUE.

LSD
ERROR MEAN SQUARE = 5.4556
DEGREES OF FREEDOM = 45
CONFIDENCE LEVEL = .95
TABLE VALUE FROM STUDENT'S t = 2.0200, LSD = 2.1100
SAMPLES RANKED    4   3   2   5   1
                  A ------    B   C
MEANS   1 -C   2 -A   3 -A   4 -A   5 -B

[Plot: LSD comparison of the five sample means; horizontal axis SAMPLE NUMBER, vertical axis scaled from 56.00 to 72.00]

CHOOSE A NUMBER AND PRESS CONTINUE 5     Choose Duncan's multiple comparison procedure
ERROR MEAN SQUARE =? 5
DEGREES OF FREEDOM =? 2
WHAT CONFIDENCE LEVEL ? (.99, .95, etc.) .95
TABLE VAL FROM NEW MULT RANGE TEST FOR 5 MEANS ? 3.17  Tables available in appendix
TABLE VAL FROM NEW MULT RANGE TEST FOR 4 MEANS ? 3.1
TABLE VAL FROM NEW MULT RANGE TEST FOR 3 MEANS ? 3.01
TABLE VAL FROM NEW MULT RANGE TEST FOR 2 MEANS ? 2.86

DUNCAN'S TEST
ERROR MEAN SQUARE = 5.0000
DEGREES OF FREEDOM = 2
LEVEL OF CONFIDENCE = .95
NUMBER OF MEANS = 5, TABLE VALUE = 3.170 , DIFFERENCE = 2.242
NUMBER OF MEANS = 4, TABLE VALUE = 3.100 , DIFFERENCE = 2.192
NUMBER OF MEANS = 3, TABLE VALUE = 3.010 , DIFFERENCE = 2.128
NUMBER OF MEANS = 2, TABLE VALUE = 2.860 , DIFFERENCE = 2.022
SAMPLES RANKED    4   3   2   5   1
                  A ------    B   C
MEANS   1 -C   2 -A   3 -A   4 -A   5 -B
                                         Same conclusion as in LSD

CHOOSE A NUMBER AND PRESS CONTINUE 6     Exit multiple comparisons

Enter desired function: 3                Choose Kruskal-Wallis test

KRUSKAL-WALLIS TEST
CHI-SQUARE = 38.1101
DF = 4
P(CHI-SQUARE > 38.1101) = 0.0000         Conclude treatments differ.

Enter desired function: 5                Exit 3 or more samples
Enter number of desired function: 6      Return to BSDM

Analysis of Variance

General Information

Description

The Analysis of Variance package is made up of six analysis routines as well as a number of auxiliary routines that can be used after the analysis of variance (ANOVA or AOV) is completed.
The following analyses are available for balanced data sets:

• Factorial design - multiway classification with or without major blocks.
• Nested design - includes completely nested, mixed nested and crossed classifications.
• Split-plot design - several types in which one or more factors can be in the whole plot.

These three analyses can be used for balanced or unbalanced designs:

• One-way ANOVA - completely randomized one-way classification.
• Two-way ANOVA (unbalanced) - one or more of the cells can be empty or be unequal in sample size.
• One-way Analysis of Covariance - for the completely randomized one-way classification.

For each of the designs in this package, the objective of the routine is to sort out the sources of variability and assign, if possible, responsibility for a portion of the total variability in the data to certain factors in the design.

Input

The first step is to input your data via the Basic Statistics and Data Manipulation routines. Because the data for the AOV programs must be in a very structured format, please read the Basic Statistics and Data Manipulation section of this manual and the portion of this section entitled Data Structures before entering your data.

After entering your data, one of the six types of designs is selected and questions will be asked in order to determine the exact design you are using.

Auxiliary Routines

The following routines can be used to complement the analyses performed by the six design routines:

• Orthogonal Polynomials - performs a decomposition of the specified sum of squares into linear, quadratic, ..., portions. This routine should be used only for factors with quantitative levels.
• Treatment Contrasts - performs a comparison on a specified factor. Output includes sum of squares and F ratio.
• Multiple Comparison Procedures - can be used to perform one or more of five routines to determine which factor levels represent different population levels.
  For a more detailed description, please see the portion of this manual entitled Multiple Sample Tests in the General Statistics section.
• Interaction Plot - allows you to study the relationship between two or three factors. (Not available from One-way or Covariance routines.)
• FPROB - generates right-tailed probability values for the F distribution.

Special Routines

New Response

This allows you to specify a new response variable for the last design chosen. So, even after you have done multiple comparisons (or any other analysis) you may go back to the same design and specify a new response variable without having to answer all of the design questions. After this is done, a title and description of the last design will be displayed on the CRT.

Special Considerations

Limitations

This program is capable of handling 50 variables with a total of 1500 data values. In addition, there are certain limitations imposed for each program as follows:

• Factorial - the product of (levels of A)*(levels of B)*(levels of C)*(levels of D) = size ≤ 500. Also, (number of blocks)*size*(number of observations per cell) ≤ 1500.
• Nested - size (as described above) ≤ 500. No blocks are permitted.
• Split Plot - Blocks are necessary. Only factors A, B and C are permitted in addition to blocks, and (levels of A)*(levels of B)*(levels of C)*(number of blocks) ≤ 500.
• One Way - There can be up to 50 treatments.
• Two Way (unbalanced) - At least one cell must have more than one observation. The number of rows (A factor) ≤ 20. The number of columns (B factor) ≤ 20. (number of rows)*(number of columns) ≤ 200.
• One-way Covariance - There can be up to 25 treatments.
• Orthogonal Polynomial - The polynomial can be up to the tenth degree.
• Treatment Contrast - There can be up to 20 levels of one-way means and up to 200 levels of two-way means.
• Multiple Comparison - same as for Treatment Contrast.
• Interaction Plot - there can be no more than 20 levels of the factor plotted on the X axis, otherwise the plot becomes "messy".

Balanced vs. Unbalanced Designs

To convert from a balanced design to an unbalanced design, you need to use the data manipulation section of the package to create variable(s) with the factor levels for the two factors in the unbalanced design. On the other hand, if you have finished a factorial analysis and now want to use a one-way design on the same data set, the program allows you to do this by selecting the Advanced Statistics option on the menu.

Discussion

General

The analysis of variance (AOV) technique can be used in many data analysis situations where it is desired to characterize the sources of variation in a "planned" experiment. The essential feature of AOV is that the total variation of the numbers (data) is uniquely decomposed into separate parts. For example, suppose we have run an experiment in which we used four varieties of corn and three row spacings. We repeated this experimental set-up five times (on five fields). We can then break the total variation down into five components as indicated below:

AOV
Source                DF               SS      MS      F
Total                 5*4*3-1 = 59     SSt
Fields (or Blocks)    5-1 = 4          SSb     MSb     F1 = MSb/MSe
Varieties             4-1 = 3          SSv     MSv     F2 = MSv/MSe
Row Spacings          3-1 = 2          SSr     MSr     F3 = MSr/MSe
Var. x Row            3*2 = 6          SSvr    MSvr    F4 = MSvr/MSe
Error                 44               SSe     MSe

In order to more fully develop our understanding of the usefulness of AOV, let us discuss how one might use such a table. Starting with the first column, we see the decomposition of the total variation into its five components. The next column shows the allocation of the so-called degrees of freedom (see references). Notice that the degrees of freedom components add up to the degrees of freedom associated with the total sum of squares. For the total source of variation, the degrees of freedom will be the total number of observations in the experiment minus one.
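
The degrees-of-freedom column of the corn example can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the package; the function name `blocked_two_factor_df` is hypothetical. It assumes one observation per field, variety and row-spacing combination, as in the table above, and obtains the error degrees of freedom by subtraction from the total.

```python
def blocked_two_factor_df(blocks, a_levels, b_levels):
    """Degrees of freedom for a two-factor factorial run in blocks,
    with one observation per block-by-treatment combination.

    Hypothetical helper for illustration; not part of the HP package.
    """
    total = blocks * a_levels * b_levels - 1
    df = {
        "Fields (or Blocks)": blocks - 1,
        "Varieties": a_levels - 1,
        "Row Spacings": b_levels - 1,
        "Var. x Row": (a_levels - 1) * (b_levels - 1),
    }
    # Whatever is left over after the named sources belongs to error.
    df["Error"] = total - sum(df.values())
    df["Total"] = total
    return df


if __name__ == "__main__":
    # Five fields, four corn varieties, three row spacings.
    for source, value in blocked_two_factor_df(5, 4, 3).items():
        print(f"{source:<20s} {value:>3d}")
```

For five fields, four varieties and three row spacings this gives 4, 3, 2, 6 and 44, which add up to the total of 59.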
The SS (sum of squares) column shows the breakdown of the total sum of squares for the experiment into the various components. One could prove algebraically that SSt = SSb + SSv + SSr + SSvr + SSe, and likewise for the degrees of freedom. The MS (mean square) column is obtained by taking SS/DF. This reflects an "average" variation due to each of the sources.

The last column is the F-ratio or testing column. Generally, we are testing the hypothesis that there is "nothing" happening in the experiment versus the expected hypothesis that something "worthwhile" is occurring. If nothing is happening, then all mean sources of variation should be of the same magnitude as the error mean square. The F-ratio is a statistical test to see if the mean square for the source of variation in question is significantly bigger than the error mean square. If it is, we can conclude that there is a "real" effect. For example, suppose that F2 is quite large. We would then be able to conclude that the population variety means are not all the same. That is, at least one of the variety means differs significantly from the others.

How big do the F values have to be? That depends on the degrees of freedom associated with the numerator MS and the degrees of freedom associated with the denominator (error) MS. The computed F values may be compared with tabled values to find out if they are significant at the .10, .05, .01, or .005 level, or, with this program, you can actually compute the level of significance. The program will automatically calculate the Prob[F > F calculated] for a factorial AOV. For nested or partially nested AOV, the user may elect to use the F probability option to find the probability levels.

Factorial Versus Nested Models

Many researchers have difficulty differentiating between a factorial model and a nested model for AOV. A brief example may be of some help. In a three-way factorial model, for example, the levels of factor B are the same over all levels of factors A and C.
Suppose factor A is three temperature settings, factor B is two pressure settings and factor C is four different laboratories. In a factorial model, we would assume that each of the six (three temperature * two pressure) combinations had been studied at each of the four laboratories. In a nested AOV with factor C nested in A and B, we might assume that the same six combinations were run; however, for each of the six combinations, four different laboratories (greenhouses, plants, fields, classrooms, etc.) were used. Hence, a total of 24 laboratories were used instead of just four. Assuming just one observation per laboratory and experimental combination, the AOV table for the factorial would be:

Factorial AOV Example

Source               DF               SS        MS
Total                3*2*4-1 = 23     SSTotal
Temperature          3-1 = 2          SSt       MSt
Pressure             2-1 = 1          SSp       MSp
Temp x Pres          2*1 = 2          SStp      MStp
Laboratories         4-1 = 3          SSl       MSl
Temp x Lab           2*3 = 6          SStl      MStl
Pres x Lab           1*3 = 3          SSpl      MSpl
Temp x Pres x Lab    2*1*3 = 6        SStpl     MStpl

However, for the nested model described above, the AOV table would be:

Nested AOV Example

Source               DF               SS        MS
Total                23               SSTotal
Temperature          3-1 = 2          SSt       MSt
Pressure             2-1 = 1          SSp       MSp
Temp x Pres          2*1 = 2          SStp      MStp
Lab (Temp x Pres)    (4-1)*3*2 = 18   SSl(tp)   MSl(tp)

Notice that the AOV tables are somewhat different. Actually, the SSl(tp) can be obtained (and is in the program) from the first AOV table by noting that SSl(tp) = SSl + SStl + SSpl + SStpl. Generally, in nested or partially nested AOV's, the nested factor is considered to be a random effect.

Partially Nested vs. Nested Models

Consider a laboratory experiment involving mice in which three levels of some drug (factor A) are to be investigated. Seven mice (factor B) are used for each drug level and the response variable is determined on four days (factor C). One model which might be used for the analysis would be three levels of factor A; seven levels of factor B nested in factor A; and four levels of factor C.
The AOV table would be:

AOV

Source               DF    SS         MS         Tested against
Total                83    SSTotal
Drug                 2     SSd        MSd        MSm(d)
Mice(Drug)           18    SSm(d)     MSm(d)
Days                 3     SSt        MSt        MStm(d)
Drug x Days          6     SSdt       MSdt       MStm(d)
Time x Mice(Drug)    54    SStm(d)    MStm(d)

This type of design is sometimes called a repeated measurements design. It is also a partially nested design because factor C is crossed both with factor A and the nested factor B. As the last column of the AOV table indicates, at least two different "error" terms are used for studying the significance in this model. It should be noted that it is necessary to have exactly the same number of subjects within each level of factor A in order to use the analysis in this package.

Two-Factor AOV Structure

The analysis of variance is a method of decomposing the sum of squared deviations of the observations about the overall mean [Σ (y_ijk - ȳ...)^2] into various sources. For a two-factor design, we may show sources of variation due to the row effect (A), the column effect (B), the row-by-column interaction effect (AB) and the within error effect (ERROR). For example, consider an experiment in which we have four levels of temperature (100, 150, 175, 200°C) and three levels of pressure (5, 10, 15 psi) with several determinations of the chemical yield (y) for each combination of temperature (ROWS) and pressure (COLUMNS). One possible arrangement of the data might be as shown below:

                                    Pressure
Temperature      5 (Column 1)           10 (Column 2)          15 (Column 3)
100 (Row 1)      y_111, ..., y_11n11    y_121, ..., y_12n12    y_131, ..., y_13n13
150 (Row 2)      ...                    ...                    ...
175 (Row 3)      ...                    ...                    ...
200 (Row 4)      y_411, ..., y_41n41    y_421, ..., y_42n42    y_431, ..., y_43n43

Each y_ijk stands for the numerical value of the chemical yield in percent. The subscript i refers to the row designator, j to the column designator, and k to the observation number in the (i,j)th cell. Notice that the n_ij are not necessarily all equal, nor is it necessary that n_ij be >= 1.
If the n_ij are all equal, the analysis of variance involves the usual summing and summing of squares, a task which could be performed by hand calculators. When the n_ij are not all equal, the exact analysis is quite complicated.

Note that the table which we have described above does not show how the experiment was actually run. According to good statistical practice the order of running the experiment should be in a random fashion. That is, conceptually, all of the possible sequences should be equally likely and the experimenter should choose one sequence at random.

Reasons for Unbalanced Designs

Unbalanced two-factor designs might arise in at least three ways. First, the design could have been planned as a balanced design (all n_ij equal). However, several observations may be lost due to death of a subject, etc. This often happens in research even though experimenters use good experimental techniques. Second, because of the nature of the variability of one response (or some other reason), the experimenter may have set up the design with an unequal number of observations in the cells. For example, suppose that one of the row levels is really a control or standard dose. It may be a common practice to use fewer observations on the control than the other drugs (other "levels" of the row factor). A third possibility is that certain combinations of the row and column levels might yield results which are impossible to monitor in an experiment. This might happen if, in the experiment described above, the highest temperature level (200°C) and the highest pressure level (15 psi) proved to be "too much" for the chemical process. In general, of course, it is not a good procedure to design two-factor experiments in which certain levels of the factors cannot be included in the experiment.
Approximate Analyses for Two-Factor Experiments

If each cell (row-column combination) has at least one observation and the number of observations in each cell is approximately the same, the method of unweighted means is sometimes used. Essentially, in this analysis, the cell means are subjected to the usual two-way AOV with one observation per cell, and the within error term is added to the table after adjustment. (See Bancroft, reference 1, p. 35.) This approximate analysis will probably allow you to draw accurate conclusions for most sets of data. One reason why we might use this type of analysis is that the "exact" analysis is quite complicated. The complexity of the analysis is related to the fact that the calculations which must be performed do not just involve the usual summing and summing of squared values. In short, the exact analysis is a "messy" problem.

Unbalanced Two-Way AOV - "Exact" Solutions

As described more completely in reference 1, Chapter 1, the solution involves rather messy notation. We shall avoid the notational problems by describing, in words, the procedures that you should use in interpreting the AOV tables, rather than describing the computing procedures which were used.

Once again, the idea of the AOV is to separate out the various sources of variation from an observable set of data. In the balanced two-factor design, the analysis of variance table might be written as follows:

AOV

Source             df            Sum of Squares    Mean Squares
Total              N-1           TSS
Rows               R-1           RSS               RSS / (R-1)
Columns            C-1           CSS               CSS / (C-1)
RxC Interaction    (R-1)(C-1)    ISS               ISS / ((R-1)(C-1))
Residual           N-RC          ESS               ESS / (N-RC)

In this table, R equals the number of rows, C equals the number of columns, and N equals the number of observed y's. The computations which are involved in obtaining the Sum of Squares column will not be described. Suffice it to say that in each case the individual observations or the means are compared to the overall mean.
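For readers who want to see the balanced-case arithmetic behind the table above, the following is a minimal sketch in Python. It is not the program's own code; the function name and the nested-list layout cells[i][j] are assumptions made only for illustration. It uses the textbook decomposition in which row, column and cell means are compared with the overall mean.

```python
def balanced_two_way_aov(cells):
    """Sums of squares and df for a balanced two-factor layout.

    cells[i][j] is the list of observations in row i, column j.
    This layout and the function name are hypothetical, for illustration.
    """
    R, C, n = len(cells), len(cells[0]), len(cells[0][0])
    if any(len(row) != C or any(len(cell) != n for cell in row) for row in cells):
        raise ValueError("design is not balanced")
    N = R * C * n
    obs = [y for row in cells for cell in row for y in cell]
    grand = sum(obs) / N
    row_means = [sum(sum(cell) for cell in row) / (C * n) for row in cells]
    col_means = [sum(sum(cells[i][j]) for i in range(R)) / (R * n) for j in range(C)]
    cell_means = [[sum(cell) / n for cell in row] for row in cells]

    # Each term compares observations or means with the overall mean.
    tss = sum((y - grand) ** 2 for y in obs)
    rss = C * n * sum((m - grand) ** 2 for m in row_means)
    css = R * n * sum((m - grand) ** 2 for m in col_means)
    iss = n * sum(
        (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(R) for j in range(C)
    )
    ess = sum(
        (y - cell_means[i][j]) ** 2
        for i in range(R) for j in range(C) for y in cells[i][j]
    )
    ss = {"Total": tss, "Rows": rss, "Columns": css, "Interaction": iss, "Residual": ess}
    df = {"Total": N - 1, "Rows": R - 1, "Columns": C - 1,
          "Interaction": (R - 1) * (C - 1), "Residual": N - R * C}
    return ss, df
```

In a balanced design the four component sums of squares add up to the total, which is the orthogonality property discussed in this section; a mean square is then the SS divided by its df whenever that df is greater than zero.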
As a brief review, let us examine that AOV procedure. According to the AOV procedure, we are trying to determine if the source of variation for rows, columns, and/or the interaction is significantly bigger than the error source of variation. This is done by calculating certain ratios of mean squares, the so-called F-ratios. Under the assumption of no differences among the row population means (i.e., levels of temperature), the mean square (MS) for rows should be of the same magnitude as the MS for the error. In a similar fashion, the source of variation for columns and interaction can also be tested.

For balanced sets of data, that is, where the subclass frequencies are all the same, the decomposition of the sources of variation for a two-factor design is orthogonal. This means that every SS and MS in the table represents the source of variation as indicated in that row. When we have an unbalanced design, the table is not as easy to interpret. In order to understand the output provided by this program, we will use the hypothetical experiment described earlier. Suppose that the table of n_ij, the frequency counts for the twelve row-column cells, is as follows:

                   Pressure
Temperature     5     10     15
100             5     4      5
150             5     5      5
175             5     5      4
200             4     3      4

N = 54

Ordinarily we would ask the investigator to use equal n_ij; however, there might be perfectly good reasons why this was not possible.

Preliminary AOV Tables

The next output from this program is the Preliminary AOV tables. The first table has the general form:

Preliminary AOV

Source       DF             SS      MS      F-ratio
Total        N-1 = 53       SSt
Subclass*    RC-1 = 11      SSs     MSs     MSs/MSe
Error        N-RC = 42      SSe     MSe

* Rows + Columns + Interaction

The decomposition in this table looks as if we have twelve individual treatments rather than four temperature and three pressure combinations. If the F-ratio is large (and the F-Prob is small), say less than about .05, we can conclude that not all twelve population means are the same.
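The first preliminary table treats each row-column cell as a separate treatment, so it reduces to a one-way analysis over the cells. The sketch below, in Python, illustrates that idea; it is not the program's own routine, and the function names and the dictionary layout are assumptions. It also assumes every cell holds at least one observation, and it does not compute the F probability.

```python
def preliminary_df(counts):
    """Degrees of freedom (Total, Subclass, Error) from a table of n_ij."""
    R, C = len(counts), len(counts[0])
    N = sum(sum(row) for row in counts)
    return N - 1, R * C - 1, N - R * C


def preliminary_aov(cells):
    """Subclass-versus-error breakdown.

    cells maps (row, column) -> list of observations (hypothetical layout);
    every cell is assumed to be non-empty.
    """
    if any(len(ys) == 0 for ys in cells.values()):
        raise ValueError("this sketch assumes no empty cells")
    obs = [y for ys in cells.values() for y in ys]
    N, k = len(obs), len(cells)
    grand = sum(obs) / N
    sst = sum((y - grand) ** 2 for y in obs)
    # Between-cell (subclass) variation: cell means about the overall mean.
    sss = sum(len(ys) * (sum(ys) / len(ys) - grand) ** 2 for ys in cells.values())
    # Within-cell (error) variation: observations about their own cell mean.
    sse = sum((y - sum(ys) / len(ys)) ** 2 for ys in cells.values() for y in ys)
    mss, mse = sss / (k - 1), sse / (N - k)
    return {"SSt": sst, "SSs": sss, "SSe": sse, "MSs": mss, "MSe": mse, "F": mss / mse}


# Cell counts from the temperature-by-pressure example in the text.
counts = [[5, 4, 5], [5, 5, 5], [5, 5, 4], [4, 3, 4]]
print(preliminary_df(counts))  # (53, 11, 42)
```

The printed degrees of freedom match the N-1 = 53, RC-1 = 11 and N-RC = 42 shown in the table.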
The second table has a further decomposition of the subclass source into main effect differences and interaction differences.

Interaction Preliminary AOV

Source            DF                SS      MS      F-Ratio
Total             N-1 = 53          SSt
Main Effects*     R+C-2 = 5         SSm     MSm     MSm/MSe
Interaction**     (R-1)(C-1) = 6    SSi     MSi     MSi/MSe
Error             N-RC = 42         SSe     MSe

* Row + Column
** RxC

This table helps us determine if there is interaction in our two-way design. This is important because it may help us decide which analysis to use next, that is, which of the FINAL AOV's we should choose (see Bancroft). If one or more cells are empty, the method of fitting constants must be used for the final analysis. For the method of fitting constants, we assume no interaction is present in the model. Hence, if one or more n_ij = 0 and/or interactions are assumed to be absent in the population, we should use the METHOD OF FITTING CONSTANTS FINAL AOV. If interaction between the row and column factors is expected to be present in the population and all n_ij >= 1, the METHOD OF SQUARED MEANS should be used.

If you are uncertain whether or not interactions are present, your interpretation of the output of the PRELIMINARY AOV table for interactions may help you decide. If the F-PROB for the interaction F-ratio is small enough, we might conclude that interaction is present. (Bancroft, reference 1, suggests that if F-PROB < .25, one should use the method of squared means.)

Interpreting the Method of Fitting Constants AOV

Since this method assumes that the model is of the form

Y = A + B * (ROW LEVELS) + C * (COLUMN LEVELS) + ERROR,

what remains to be tested by this method is whether the row levels (means) differ significantly from each other and whether the column levels (means) differ significantly from each other. The calculations involve (see page 16, Bancroft) finding the solution to a set of least-squares equations. As we discussed above, when all n_ij are equal, the sum of squares due to rows is orthogonal to the sum of squares for columns.
However, when the n_ij are not all equal, by using the method of fitting constants, the program will construct the following table:

Source                  DF                SS       MS       F-Ratio
Total                   N-1 = 53          SSt
Rows (unadjusted)       R-1 = 3           SSr      MSr
Columns (adjusted)      C-1 = 2           SSc-a    MSc-a    F1 = MSc-a/MSe
Columns (unadjusted)    C-1 = 2           SSc      MSc
Rows (adjusted)         R-1 = 3           SSr-a    MSr-a    F2 = MSr-a/MSe
Interaction             (R-1)(C-1) = 6    SSi      MSi      F3 = MSi/MSe
Error                   N-RC = 42         SSe      MSe

The first two F-ratios can be used to test the following hypotheses:

Ho: The "C" terms in the model are not needed (tested by F1);
Ho: The "B" terms in the model are not needed (tested by F2).

The third F-ratio is the same test for the interaction obtained in the preliminary AOV table. Notice that the SS for columns is obtained after correction for rows. That is, SSc-a (columns adjusted for rows) = SSm (main effects in preliminary AOV table) - SSr (rows ignoring the column effects). Hence, some of the calculations for the final AOV by the method of fitting constants are derived from the preliminary AOV table.

In conclusion, the method of fitting constants allows us to make "good" tests for main effects if the interaction term is absent. Also, if one or more n_ij = 0, we must use this method, since the interpretation of a significant interaction is questionable anyway. After determining that the row and/or column means differ significantly, one might wish to do some type of multiple comparison procedure to determine where the significant differences lie.

Interpreting the Method of Squared Means AOV

When interaction is assumed present in our model or suspected to be present in the model after studying the preliminary AOV table, the method of squared means can be used to find "good" estimates of the main effects if all n_ij > 0. This analysis operates on the cell means weighted by W_i = c^2 / (Σ_j 1/n_ij) for the ith row and W_j = r^2 / (Σ_i 1/n_ij) for the jth column, where r is the number of rows and c is the number of columns.
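As a small illustration of the row and column weights just defined, the following Python sketch evaluates them for a table of cell counts. The function name is an assumption and this is not the program's own code; it simply evaluates c^2 divided by the sum of 1/n_ij across a row, and r^2 divided by the sum of 1/n_ij down a column.

```python
def squared_means_weights(counts):
    """Row and column weights for the method of squared means.

    counts[i][j] is n_ij. The method requires every n_ij > 0.
    (Hypothetical helper, for illustration only.)
    """
    r, c = len(counts), len(counts[0])
    if any(n <= 0 for row in counts for n in row):
        raise ValueError("all cell counts must be greater than zero")
    row_w = [c ** 2 / sum(1.0 / n for n in row) for row in counts]
    col_w = [r ** 2 / sum(1.0 / counts[i][j] for i in range(r)) for j in range(c)]
    return row_w, col_w


# Cell counts from the temperature-by-pressure example given earlier.
counts = [[5, 4, 5], [5, 5, 5], [5, 5, 4], [4, 3, 4]]
row_w, col_w = squared_means_weights(counts)
```

For the second row of that example (counts 5, 5, 5) the weight is 9 / 0.6 = 15, which equals the row total; rows with unequal counts receive a weight somewhat below their total.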
The model for this situation would be:

Y = A + B * (ROW LEVEL) + C * (COLUMN LEVEL) + D * (ROW, COLUMN LEVELS) + ERROR

where A represents the average value and D represents the coefficient for the interaction term. The method, which is described on pages 24-29 of Bancroft, would yield an AOV table as follows:

Source                DF                SS       MS       F-Ratio
Total                 N-1 = 53
Rows (weighted)       R-1 = 3           SSr-w    MSr-w    MSr-w/MSe
Columns (weighted)    C-1 = 2           SSc-w    MSc-w    MSc-w/MSe
Interaction           (R-1)(C-1) = 6    SSi      MSi      MSi/MSe
Error                 N-RC = 42         SSe      MSe

The F-ratios for rows and columns using the weighted cell means will indicate if the main effects are significant. Of course, if the interaction term is already determined to be significant, the interpretation of the main effects must be given careful consideration. Quite frequently experimenters find it useful to plot the subclass means in order to study the "pattern" for the interaction.

Orthogonal Polynomial Breakdown

If the levels of the row and/or column factors are quantitative, it might be of interest to decompose the sum of squares for these terms into single-degree-of-freedom terms for a polynomial model. For example, suppose that the row levels are quantitative, such as the temperature levels which we described above (100, 150, 175, 200°C). Since there are four levels, it is possible to fit up to a third degree polynomial to the row levels. Hence, the SS for rows could be decomposed into orthogonal components for linear, quadratic and cubic terms, each with one degree of freedom. The program will perform the elaborate calculations even if the row or column levels are unequally spaced. (For example, the column levels were given as 5, 10, 15 psi. Instead, they could have been 5, 10, 20 psi with unequal spacings between the levels.) For further information about these procedures, see references 1 and 2.

References

1. Bancroft, T.A. (1968). Topics in Intermediate Statistical Methods. The Iowa State University Press, Ames, Iowa.
2. Searle, S.R.
(1971). Linear Models. John Wiley and Sons.

Data Structures

In order to provide for the analysis of six different types of designs, the arrangement of the data must be 'presumed' by the program. The material that follows describes the various arrangements within the Basic Statistics and Data Manipulation (BSDM) routines which are possible for each design. Please read the section dealing with the design which you are considering before attempting to enter your data. Further information about the designs considered in this package can be found in the Discussion section and in the references.

Factorial Designs

All data to be analyzed with the Analysis of Variance package is entered into memory via the Basic Statistics and Data Manipulation routines. The order in which the data is entered is very important. In general, sampling replications are entered in order, then factors are varied, then blocks are varied. That is, assuming a four-factor design and no sampling replications, the levels of factor D must vary the most rapidly, followed by the levels of C, B, A, and finally the levels of the blocks.

Consider an example in which there are two blocks (major replications), two levels of A and three levels of B. Assume for the moment that we do not have any sampling replication and only one response variable. The structure within the Basic Statistics and Data Manipulation (BSDM) program would use only one variable since it is not necessary to store the levels of the factors and blocks when using the (balanced) Factorial program. The structure for this two-way factorial in two blocks would be:

         Response
OBS#     Variable 1    Factor B    Factor A    Blocks
1        Y111          B1          A1          Block 1
2        Y112          B2
3        Y113          B3
4        Y121          B1          A2
5        Y122          B2
6        Y123          B3
7        Y211          B1          A1          Block 2
8        Y212          B2
9        Y213          B3
10       Y221          B1          A2
11       Y222          B2
12       Y223          B3

Note: The levels of Factor B vary most rapidly while the blocks vary the slowest. The Ys represent numerical data, which is the only information stored in BSDM.
The first subscript indicates the block, the second indicates the level of factor A and the third designates the level of factor B.

You should remember that it is absolutely essential that you arrange your data in this form prior to entering the BSDM program. Of course, if you are careful, there are ways around the apparent limitation suggested above. Consider the following data set which has already been entered via the BSDM program:

OBS#     Variable (i)    Factor V    Factor U    Blocks
1        Y111            V1          U1          Block 1
2        Y121            V2
3        Y112            V1          U2
4        Y122            V2
5        Y113            V1          U3
6        Y123            V2
7        Y211            V1          U1          Block 2
8        Y221            V2
9        Y212            V1          U2
10       Y222            V2
11       Y213            V1          U3
12       Y223            V2

First of all, note that blocks (major replications) must vary the slowest. We can use this data structure in the Factorial program by telling the program that factor A, the factor which varies slowly, is factor U and has three levels, while factor B is our factor V and has two levels. Hence, independent of the implied subscripts, levels and ordering, we have considerable flexibility in specifying the factors. We must only make sure that Factor A is the factor which varies most slowly while Factor B is the factor which varies most rapidly.

So far we have described how the data must be structured for the major replications and factors. We will now describe the two modes of data arrangement which are permissible for the minor replications (samples). If you have only one sample per treatment combination, there will be no difference between the two modes.

The first mode assumes that the response variable resides in only one of the variables specified in BSDM. Hence any minor replications/samples will have to be entered as subsequent observations in BSDM. For example, suppose we have a factorial with two blocks, two levels of factor A, and three levels of factor B, with two replications (samples) per factorial combination.
The data structure with three different response variables might appear as follows:

                    Variables
OBS#    1 = %Ca    2 = %Cu    3 = %Fe    Sample    Factor B    Factor A    Block
1       X1,1       X2,1       X3,1       1         B1          A1          Block 1
2       X1,2       X2,2       X3,2       2
3       X1,3       X2,3       X3,3       1         B2
4       X1,4       X2,4       X3,4       2
5       X1,5       X2,5       X3,5       1         B3
6       X1,6       X2,6       X3,6       2
7       X1,7       X2,7       X3,7       1         B1          A2
8       X1,8       X2,8       X3,8       2
9       X1,9       X2,9       X3,9       1         B2
10      X1,10      X2,10      X3,10      2
11      X1,11      X2,11      X3,11      1         B3
12      X1,12      X2,12      X3,12      2
...                                                                        Block 2
24      X1,24      X2,24      X3,24      2         B3          A2

The first mode of replicate/sample storage conserves on the use of variables (see Special Considerations for program limitations); however, it does use more observations. If you have only one response variable in your experiment it may be more efficient to use the second mode for specifying the sampling replications. This mode assumes that each observation in the BSDM program contains all replication values stored one per variable. Hence, the same design described above would appear as follows (here, the subscripts indicate the levels of factor A and factor B, respectively):

             Variables
OBS#    1 = Rep 1    2 = Rep 2    Factor B    Factor A    Block
1       X11          X21          B1          A1          Block 1
2       X12          X22          B2
3       X13          X23          B3
4       X14          X24          B1          A2
5       X15          X25          B2
6       X16          X26          B3

One other example is included without comment. Keep in mind that in our examples we have named the factors A, B, C, and D. As long as your data is arranged in some order with one factor varying the most rapidly within another factor, etc., you can call these factors A, B, C, and D, where your factor called A will vary the slowest, etc.
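The entry order described in this section (blocks slowest, then factor A, then factor B, with samples fastest) is the same order in which nested loops run. The Python sketch below lists that order so it can be checked against a data sheet before keying values into BSDM. The function name and argument names are assumptions, not part of the package.

```python
from itertools import product


def entry_order(n_blocks, factor_levels, n_samples=1):
    """List (block, A level, B level, ..., sample) in the required entry order.

    factor_levels gives the number of levels of factors A, B, ... in that
    order; A varies the slowest of the factors. Hypothetical helper only.
    """
    ranges = [range(1, n_blocks + 1)]
    ranges += [range(1, k + 1) for k in factor_levels]
    ranges.append(range(1, n_samples + 1))
    # itertools.product varies its last argument fastest, matching the rule
    # that samples vary fastest and blocks vary slowest.
    return list(product(*ranges))


# Two blocks, two levels of A, three levels of B, two samples per cell
# (first mode: each sample is a separate observation).
for obs, (block, a, b, sample) in enumerate(entry_order(2, [2, 3], 2), start=1):
    print(obs, "Block", block, "A", a, "B", b, "Sample", sample)
```

For the second mode, where the samples are stored across variables, calling entry_order(2, [2, 3]) gives the twelve observation rows, and each row then holds one value per sample.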
Example (Factorial): two blocks, two levels of factor A, three levels of factor B, two sampling replications.

DATA ENTRY OPTIONS

FORM 1

OBS#    Block    Factor A    Factor B    Variable #1
1       Blk1     A1          B1          Rep1
2                                        Rep2
3                            B2          Rep1
4                                        Rep2
5                            B3          Rep1
6                                        Rep2
7                A2          B1          Rep1
8                                        Rep2
9                            B2          Rep1
10                                       Rep2
11                           B3          Rep1
12                                       Rep2
13      Blk2     A1          B1          Rep1

FORM 2

OBS#    Block    Factor A    Factor B    Variable #1    Variable #2
1       Blk1     A1          B1          Rep1           Rep2
2                            B2          Rep1           Rep2
3                            B3          Rep1           Rep2
4                A2          B1          Rep1           Rep2
5                            B2          Rep1           Rep2
6                            B3          Rep1           Rep2
7       Blk2     A1          B1          Rep1           Rep2
8                            B2          Rep1           Rep2
9                            B3          Rep1           Rep2
10               A2          B1          Rep1           Rep2
11                           B2          Rep1           Rep2
12                           B3          Rep1           Rep2

The order of the observations must be as shown above to get the correct results. In general, the levels of blocks will vary slower than the levels of factors A, B, C, D, and replicates within cells vary the fastest.

Nested Design

The form of the data structure for the nested or mixed design is quite similar to that previously described for the Factorial Designs. As far as the program is concerned, the nested design is considered to be in a factorial arrangement. The program will calculate the sum of squares, etc., as if the design were a factorial design and then pool the appropriate terms to form the nested or mixed design which you specified.

As you may have already noted, the design must be balanced. This means that if factor C is nested within factor A and is denoted as C(A), then there must be exactly the same number of levels of factor C within each level of factor A. You may wish to refer to the Discussion section to familiarize yourself with the design arrangements for a nested design as compared to a factorial design.

Perhaps an example of a completely nested design structure would be helpful at this time. Suppose that within each of five sections of land we select two lakes at random. From each lake assume that three random positions in the lake are chosen at which we select two samples. Suppose further that the samples are each divided into two beakers and are analyzed separately.
Assume that three responses are measured: Y1 = Var. 1 = ppm lead, Y2 = Var. 2 = ppm zinc, and Y3 = Var. 3 = ppm copper.

In this experiment, we will designate the five land sections as the levels of factor A, the various lakes as levels of factor B, and the positions as levels of factor C. Notice that factor B is nested in factor A, and that factor C is nested within factor B. These relationships are commonly denoted by B(A) and C(B) respectively. For the first form of data arrangement, the two samples per position in the lake will be shown as stored in subsequent observations (down) rather than in an additional variable (across). A dash ( - ) indicates a numerical value which would be entered in BSDM.

Form 1

Obs#    Var1 = Y1    Var2 = Y2    Var3 = Y3    Sample    Position     Lake          Section
1       -            -            -            1         P1           L1            Sec 1
2       -            -            -            2
3       -            -            -            1         P2
4       -            -            -            2
5       -            -            -            1         P3
6       -            -            -            2
7       -            -            -            1         P1 = P4*     L2
8       -            -            -            2
9       -            -            -            1         P2 = P5
10      -            -            -            2
11      -            -            -            1         P3 = P6
12      -            -            -            2
...
60      -            -            -            2         P3 = P30     L2 = L10**    Sec 5

* Within each lake the "first" position P1 has no relationship with the "first" position in another lake; hence we have a total of thirty different lake positions.
** Since each section has two lakes selected from it, there are a total of ten lakes studied in this project.

The other form of data entry for this nested design would use twice as many variables since each sample would be included as another variable rather than another observation. Hence the last row would look like:

        Sample 1     Sample 2     Sample 1     Sample 2     Sample 1     Sample 2
Obs#    Var1 = Y1    Var2 = Y1    Var3 = Y2    Var4 = Y2    Var5 = Y3    Var6 = Y3
30      -            -            -            -            -            -

With a little practice you will find that it is quite easy to structure your data so that the Nested Analysis will correctly recognize your data set. Mixed designs must be entered via the BSDM routines in a similar manner.
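The footnotes to the Form 1 table relabel lakes and positions so that each one has a unique number across the whole study (P1 in the second lake becomes P4, and so on). The Python sketch below generates the 60 observation rows of the lake example with both the within-group and the study-wide labels. The function name and tuple layout are assumptions for illustration only.

```python
def nested_rows(n_sections=5, lakes_per_section=2, positions_per_lake=3, samples=2):
    """Rows for the completely nested lake example, in Form 1 entry order.

    Each row is (obs, sample, position, overall_position, lake,
    overall_lake, section). Hypothetical helper, not the package's code.
    """
    rows = []
    obs = 0
    for section in range(1, n_sections + 1):              # factor A, slowest
        for lake in range(1, lakes_per_section + 1):      # factor B(A)
            overall_lake = (section - 1) * lakes_per_section + lake
            for position in range(1, positions_per_lake + 1):  # factor C(B)
                overall_position = (overall_lake - 1) * positions_per_lake + position
                for sample in range(1, samples + 1):      # samples, fastest
                    obs += 1
                    rows.append((obs, sample, position, overall_position,
                                 lake, overall_lake, section))
    return rows


rows = nested_rows()
print(rows[6])    # observation 7: position P1 of lake L2 is overall position P4
print(rows[-1])   # observation 60: P3 = P30, L2 = L10, section 5
```

With the default arguments this yields 60 rows, ten distinct lakes and thirty distinct positions, in agreement with the footnotes.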
Keep in mind that whichever factor you call D must have its levels varying more rapidly than factor C, which in turn varies faster than factor B. The levels of factor A will change only after each level of factor B has appeared once.

Note: BLOCKS as described in the Factorial Design are not considered for the Nested Design. That is, you will not be asked any questions concerning blocks (major replications) of this design.

Split-Plot Design

In terms of the data structure in the BSDM routine, it is immaterial whether one is using a Split-Plot Design or a Factorial Design. Both designs are the same in terms of the data arrangement in BSDM. Examples representing the two modes of data arrangement for the minor replications (samples) will be shown below.

Consider a split-plot experiment in which the pull-off force necessary to remove boxes from a tape is to be studied (see Hicks pp 219-222, 226). Two complete replications (blocks) of the following experiment were performed. Three long strips of tape with boxes attached were chosen to represent three different methods of attaching the boxes to the strips. A chamber was used to study the effects of three humidity levels (50, 70, and 90%) on the pulling force of three boxes. The experimental procedure called for randomly choosing one of the three humidity levels and adjusting the chamber to maintain that level. Two portions of each of the three strips were placed in the chamber for a specified period of time. The pull-force was then measured for each of the six portions of strip. Subsequently, one of the two remaining levels of humidity was randomly chosen and the process was repeated. Finally, the last level of humidity was maintained in the chamber. Upon completion of the first three humidities times three strips times two samples = 18 measurements, the entire process was repeated again in a random fashion.
The reason that this is a split-plot design and not a factorial is because of the ordering of the measurements of pull force. Since it was not deemed possible to randomly investigate the effects of humidity and strip type on the pull force response, we have a restricted randomization of the split-plot type.

The two forms for specifying the sample replications are shown below. Note how the factor names A and B have been assigned to the factors in this experiment and how that corresponds to the data arrangement as shown. Only one response variable is necessary for this design.

FORM 1    Y = pull force

                                 B           A
OBS#    Variable 1    Sample    Humidity    Strip    Block
1       -             1         50%         S1       B1
2       -             2
3       -             1         70%
4       -             2
5       -             1         90%
6       -             2
7       -             1         50%         S2
8       -             2
9       -             1         70%
10      -             2
11      -             1         90%
12      -             2
13      -             1         50%         S3
14      -             2
15      -             1         70%
16      -             2
17      -             1         90%
18      -             2
19      -             1         50%         S1       B2
...
36      -             2

In this experiment we would specify two blocks (major replications). Factor A (strips) has three levels, factor B (humidity) has three levels, and there are two samples for mode 1 (all samples within the same variable). Later, in the Split-Plot Design program, we would specify that factor B (humidity) is the whole plot while factor A (strips) is the subplot. As the experiment is described above, the humidity factor (B) would be in the whole plot even though it does not vary as fast as the strip factor (A). We could have entered our data in a manner which would have had the levels of humidity varying the slowest. Then we would identify humidity as factor A.

The second mode of sample specification for this example would require two variables, say variable one and variable two.
FORM 2    Y = pull force

                                                 B           A
OBS#    Var 1 = Sample 1    Var 2 = Sample 2    Humidity    Strip    Block
1       -                   -                   50%         S1       B1
2       -                   -                   70%
3       -                   -                   90%
4       -                   -                   50%         S2
5       -                   -                   70%
6       -                   -                   90%
7       -                   -                   50%         S3
8       -                   -                   70%
9       -                   -                   90%
10      -                   -                   50%         S1       B2
11      -                   -                   70%
12      -                   -                   90%
13      -                   -                   50%         S2
14      -                   -                   70%
15      -                   -                   90%
16      -                   -                   50%         S3
17      -                   -                   70%
18      -                   -                   90%

The one-way design, or one-way classification as it is sometimes called, has three possible forms of data organization or structures in BSDM. These three forms are identical to the forms for the ONE-WAY ANALYSIS OF COVARIANCE except that the covariance analysis will expect both a response variable, Y, and a covariate, X, to be specified while the ONE-WAY DESIGN expects only the response variable Y.

The first mode of data organization for the one-way classification uses t variables in BSDM to specify the t treatments in this design. Consider an experiment in which four types of "mums" were investigated in a greenhouse experiment. Suppose two responses were measured: diameter (Y1) and plant height (Y2). The data was collected in two separate years (subfiles) with approximately five pots per variety. One possible organization of this data is as follows:

Mode 1 Example

Variable              1     2     3     4     5     6     7     8
Response              Y1    Y2    Y1    Y2    Y1    Y2    Y1    Y2
Treatment/Variety     Type 1      Type 2      Type 3      Type 4

OBS#
1                     -     -     -     -     -     -     -     -      Subfile 1 (1975)
2                     -     -     -     -     -     -     -     -
3                     -     -     -     -     -     -     -     -
4                     -     -     MV    MV    -     -     -     -
5                     -     -     MV    MV    MV    -     MV    MV
6                     -     -     -     -     -     -     -     -      Subfile 2 (1976)
7                     -     -     -     -     MV    -     -     -
8                     MV    MV    -     -     -     -     -     -
9                     MV    MV    -     -     -     MV    -     -
10                    MV    MV    -     -     -     -     -     -
11                    MV    MV    MV    MV    -     -     -     -

Here, a dash ( - ) indicates a numerical value is present, and MV indicates that a missing value is assigned to this position.

Note: The arrangement shown above has provisions for missing values to accommodate the various number of pots per treatment (variety). The two subfiles do not have the same number of pots per treatment. The MV operation must be used to 'square-off' the sample sizes for each variable.
You would tell the program that variables one, three, five, and seven represent the four treatments for the first response (diameter). You would then specify the subfile number. The program would then assume that the sample size is five if subfile one is specified and six if subfile two is specified. If subfiles are to be ignored, then a sample size of 11 would be assumed. Of course all calculations within the program would check for missing values (MV) and delete those values from the calculations. Subsequent to the analysis on the first response, Y1, you may remain within this subfile and specify another response, say Y2. Finally, you may select another subfile and/or variables for further analysis.

The second mode for possible data organization within the BSDM structure uses only one variable for each response. Within this response variable, the treatment observations are assumed to be contiguous. You specify the number of observations in each treatment including any missing values. The program assumes that the first observation in the first treatment is observation number one if the first subfile is chosen or subfiles are ignored, or the first observation within the specified subfile. Thereafter, the subfile is partitioned into t nonoverlapping but connected intervals, one corresponding to each treatment. Hence, for the example with four treatments and two response variables, one possible arrangement might be:

Mode 2 Example

         Variable
         1     2     Treatments
OBS#     Y1    Y2    (Variety #)
1        -     -     1              SUBFILE 1 (1975)
2        -     -
3        -     -
4        -     -
5        -     -
6        -     -     2
7        -     -
8        -     -
9        -     -     3
10       -     -
11       -     -
12       -     -
13       MV    -
14       -     -     4
15       -     -
16       -     -
17       -     -
18       -     -     1              SUBFILE 2 (1976)
19       -     -
20       -     -     2
21-24    -     -
25-36    (varieties 3 and 4, including the MV entries)

Note: The sample sizes for the first subfile of each variable would be five, three, four, and four, respectively. For subfile two, the sample sizes would be two, five, five, and six.
Of interest is the comparison between the number of data storage positions needed for the two modes of arrangement. For mode 1, the number of positions required would be 11 observations times 8 variables = 88. For the second mode, the number required is 36 observations times 2 variables = 72. In many cases, if there are several missing values you may conserve available memory locations by using the second mode of arrangement.

The third mode of data entry allows for treatments which are not necessarily connected within one variable. Each treatment is composed of a contiguous set of observations. Since this mode of data arrangement may choose treatment groups throughout the data set, it is not possible or necessary to specify subfiles. The arrangement of the data is similar to the arrangement described for Mode 2; however, it is possible to have "gaps" or "holes" in the data set.

Consider the example described above. Suppose it is desired to compare 1975 variety #2 with the 1976 variety #2 for both responses (Y1 and Y2). Please refer to the Mode 2 Example and note that we would need to compare observations 6, 7, and 8 with observations 20, 21, 22, 23, and 24. The first three specified observations are from variety #2 in subfile one, which is the 1975 data set, and the other five values are from variety #2 in subfile two, which is the 1976 data set. Note that although this mode of data arrangement is quite similar to Mode 2, it does provide for more freedom on the part of the data analyst in terms of which treatments are to be used.

Two-Way (Unbalanced) Design

The unbalanced nature of this design makes it more complicated in terms of the data arrangement. It will not be possible to assume that the order of input is completely specified by factor names such as factor A and factor B.
This is because it is possible to have not only different numbers of minor replications (samples) within each treatment combination (levels of factor A and factor B), but also to have one or more cells completely missing. Of course, the absence of certain cells is not a desirable characteristic of any factorial experiment; however, there are certain situations in which missing cells naturally occur. Therefore it is necessary for the BSDM data structure to provide for proper identification of the row and column levels (factors A and B) as well as the particular sample number within that cell. Two methods of specification are permitted for this type of design.

The first "data storage type" assumes that you will use three BSDM variables to specify the response variable and factor levels. One variable will be used to store the particular response to be analyzed at this time. One variable will be used for each of the two factors A and B. It is not necessary to use a variable to specify the sample or observation number; however, you may wish to do so in order to completely identify each observation.

Please note that the levels of factors A and B must be the integers 1, 2, ... up to the number of levels of each factor. Hence, if factor A has three levels 70, 80, and 120, you would store these three levels in a variable as 1, 2, and 3 rather than 70, 80, and 120. The purpose of this restriction is to conserve data storage allocation. Within the program you will be able to specify the actual levels of the variables when this is necessary for the computation.

As an example of the first data storage type, suppose you have factors of time and temperature involved in an experiment which is designed to study the effects of these two factors on the yield (Y) of a chemical process. Suppose you had used three time settings of 4, 5, and 7.5 hours and three temperature settings of 110, 115, and 120° F.
Assume that, for one reason or another, from two to five samples were run at each treatment combination (temperature and time condition). Further, let us assume that at the highest temperature and time condition, it was impossible to finish the experimental process. Thus, we can consider this "cell" as missing. Assume two responses, Y1 and Y2, were measured on almost all samples. One way to enter this data set in the BSDM program is as follows:

Mode 1 Example

              BSDM Variable Number
Obs     1      2        3           4                     A        B
 #      Y1     Y2    B Levels    A Levels    Sample     Temp     Time
 1      MV     -        1           1           1       110°     4 hrs.
 2      -      -        1           1           2
 3      -      -        1           2           1       115°
 4      -      -        1           2           2
 5      -      -        1           2           3
 6      -      -        1           3           1       120°
 7      -      -        1           3           2
 8      -      -        1           3           3
 9      -      -        1           3           4
10      -      -        2           1           1       110°     5 hrs.
11      -      -        2           1           2
12      -      -        2           1           3
13      -      -        2           2           1       115°
14      -      -        2           2           2
15      -      -        2           3           1       120°
16      -      -        2           3           2
17      -      -        2           3           3
18      -      -        2           3           4
19      -      MV       2           3           5
20      -      -        3           1           1       110°     7.5 hrs.
21      -      -        3           1           2
22      -      -        3           1           3
23      -      -        3           2           1       115°
24      -      -        3           2           2
25      -      -        3           2           3
26      MV     MV       3           3           1       120°

Notes:
1. Observation number 26 is included to let the program know that the cell with temp = 120, time = 7.5 is missing in both responses.
2. Both observation #1 and observation #19 have one and only one missing response.
3. Although we have shown the 26 observations in a systematic arrangement, this is not necessary except for your own information.
4. The specification of variable numbers in the analysis will identify which factor the program should consider as rows (factor A) and which it should consider as columns (factor B).

The second data storage mode allows you to conserve on variables by using only one variable to identify both row and column levels. The levels are "packed" into four digits as xxyy, where xx identifies the row level and yy identifies the column level. Consider the example described above. Using the packed form of storage, we will need to allocate at least three variables in the BSDM routine. One variable is needed for each response and one for the 'packed' row/column identification.
You may wish to use another variable to identify the sample numbers, or you might wish to use the 'space' after the row/column specification. For example, suppose for the third row and second column you wish to identify the observation by the index 74. The packed version would be 0302.74. The program will use only the first four digits, 0302, to identify the row and column numbers. Up to 6 digits may be input after the decimal point for identification purposes. The example described above may be entered via the BSDM routine as follows (for the first ten and the last three observations):

Mode 2 Example

          BSDM Variable Number
Obs     1      2        3                    A        B
 #      Y1     Y2    ID xxyy     Obs#      Temp     Time
 1      MV     -      0101         1       110°     4 hrs.
 2      -      -      0101         2
 3      -      -      0102         1       115°
 4      -      -      0102         2
 5      -      -      0102         3
 6      -      -      0103         1       120°
 7      -      -      0103         2
 8      -      -      0103         3
 9      -      -      0103         4
10      -      -      0201         1       110°
 .
 .
24      -      -      0302         2
25      -      -      0302         3
26      MV     MV     0303         1       120°     7.5 hrs.

One-Way Covariance

The three forms of data arrangement for the one-way analysis of covariance are the same as for the one-way design, except that both a response variable (Y) and a covariate (X) must be specified. Hence, for the example previously described for mode 1 of the one-way design, you would need to specify 12 variables of the BSDM data set and specify a covariate for each treatment set. If different covariates are to be used with the two response variables, then you would need 16 variables. One possible ordering of these variables and treatments for the ith observation is as follows:

                Type 1         Type 2         Type 3         Type 4
Variable #      1   2   3      4   5   6      7   8   9      10  11  12
                X   Y1  Y2     X   Y1  Y2     X   Y1  Y2     X   Y1  Y2

For both modes 2 and 3, you would need to specify one additional variable number as the covariate for each dependent variable. Of course, the response variables may use the same covariate in the analysis.

Factorial Design

Object of Program

This program will calculate the complete analysis of variance table for a two-, three-, or four-factor, completely balanced experiment.
There may be multiple observations per cell, and the entire experiment may be replicated in blocks. The program will automatically print out all main effect and two-way interaction means. If three- or four-way interactions exist, these interaction means may be printed. If there is more than one observation per cell, then tests for homogeneity of variance may be computed. If the experiment has not been replicated, or only one observation per mean is present, there will be no F values computed. All F tests assume that the factors are fixed. A label of up to ten characters may be assigned to each factor.

Typical Program Flow

1. Input data via BSDM.
2. Select Advanced Statistics.
3. Insert program medium.
4. Choose factorial design.
5. Specify variables and subfiles.
6. Main effect means and two-way interaction means are printed.
7. AOV table.
8. Test for homogeneity of variance.
9. Perform auxiliary routines, e.g., interaction plot.

Special Considerations

See the General Information portion of this AOV manual for program limitations. Also, carefully read the Data Structures section before entering your data through Basic Statistics and Data Manipulation.

References
1. Cochran, W.G. and Cox, G.M., Experimental Designs, John Wiley and Sons, Inc., 1957.
2. Snedecor, G.W. and Cochran, W.G., Statistical Methods, Iowa State University Press, 1967.

Nested or Partially Nested Design

Object of Program

This program will calculate and print the AOV for any valid nested design. The program does this by computing a general factorial and then combining sums of squares to get the desired results. There can be up to five nested factors if samples are entered. This program does not allow the experiment to be replicated in blocks. The program will not compute any F ratios unless the design is a completely nested design. All non-nested main effects, main effect means, and two-way interactions will be printed. If there are any non-nested three-way interaction means, they may be printed.
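The Object of Program paragraph above says that the nested analysis is obtained by computing a general factorial and then combining sums of squares. A minimal Python sketch of that idea for the simplest case (factor Q nested within factor P, balanced data) is given here. It relies on the standard identity SS[Q(P)] = SS[Q] + SS[PQ]; the function name, the data layout, and the numbers are assumptions made for illustration and are not taken from the HP program.

```python
from statistics import mean

def nested_sums_of_squares(cells):
    """cells[i][j] holds the samples for level j of Q within level i of P.

    Assumes a balanced layout: every P level has the same number of Q
    levels and every cell has the same number of samples.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean(y for p in cells for q in p for y in q)
    p_means = [mean(y for q in p for y in q) for p in cells]
    q_means = [mean(y for p in cells for y in p[j]) for j in range(b)]
    cell_means = [[mean(q) for q in p] for p in cells]

    # Step 1: treat the data as an ordinary crossed P x Q factorial.
    ss_p = b * n * sum((m - grand) ** 2 for m in p_means)
    ss_q = a * n * sum((m - grand) ** 2 for m in q_means)
    ss_pq = n * sum(
        (cell_means[i][j] - p_means[i] - q_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )

    # Step 2: pool the factorial terms that make up the nested term Q(P).
    ss_q_in_p = ss_q + ss_pq

    # Cross-check: Q(P) computed directly as cell means about their P mean.
    direct = n * sum(
        (cell_means[i][j] - p_means[i]) ** 2
        for i in range(a) for j in range(b)
    )
    return ss_p, ss_q_in_p, direct

# Hypothetical data: 2 levels of P, 2 levels of Q within each, 2 samples.
cells = [[[1.0, 2.0], [3.0, 5.0]], [[2.0, 4.0], [6.0, 7.0]]]
ss_p, ss_q_in_p, direct = nested_sums_of_squares(cells)
print(ss_p, ss_q_in_p, direct)
```

The pooled value and the directly computed value agree, which is why a general factorial routine can serve as the first step of a nested analysis. Designs with more factors pool more terms in the same way.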
Possible Designs

All possible designs are displayed with arbitrary factors P, Q, R, and S. In the program you will be asked to match your factors (A, B, etc.) with these arbitrary labels to obtain the design you desire. The notation Q(P) means that factor Q is nested within factor P. The following options are available.

Number of factors = 2
    P
    Q(P)

Number of factors = 3
    Design 1      Design 2      Design 3
    P             P             P
    Q(P)          Q             Q(P)
    R(Q(P))       PQ            R
                  R(PQ)         PR
                                QR(P)

Number of factors = 4
    Design 1      Design 2      Design 3      Design 4
    P             P             P             P
    Q(P)          Q             Q             Q(P)
    R(Q(P))       R             PQ            R
    S(R(Q(P)))    PQ            R(PQ)         PR
                  PR            S             QR(P)
                  QR            PS            S
                  PQR           QS            PS
                  S(PQR)        PQS           QS(P)
                                RS(PQ)        RS
                                              PRS
                                              QRS(P)

Typical Program Flow

1. Input data via BSDM.
2. Choose Advanced Statistics option.
3. Insert program medium.
4. Choose nested and partially nested design.
5. Specify variables and subfiles.
6. Main effect means and two- (or three-) way interaction means are printed.
7. AOV table.
8. Perform auxiliary routines, e.g., interaction plot.

Special Considerations

See the General Information portion of this AOV manual for program limitations. Also, carefully read the Data Structures section before entering your data through Basic Statistics and Data Manipulation.

References
1. C.R. Hicks, "Fundamental Concepts in the Design of Experiments", Second Edition. Holt, Rinehart and Winston, 1973.
2. D.C. Montgomery, "Design and Analysis of Experiments". Wiley, 1976.

Split Plot Designs

Object of Program

This program will calculate a general factorial and then combine sums of squares to form specific error terms for the split plot or split-split plot design. Blocks must be present and at least two factors are necessary. Up to three factors may be specified, and minor replications (samples) may also be declared. All main effect and interaction means will be printed. All computed F tests assume the factors are fixed.
Typical Program Flow

1. Input data via BSDM.
2. Select Advanced Statistics option.
3. Insert program medium.
4. Select split plot design.
5. Specify variables and subfiles.
6. Block and main effect means and two-way interaction means are printed.
7. AOV table.
8. Perform auxiliary routines, e.g., interaction plot.

Special Considerations

See the General Information portion of this AOV manual for program limitations. Also, carefully read the Data Structures section before entering your data through Basic Statistics and Data Manipulation.

References
1. C.R. Hicks, "Fundamental Concepts in the Design of Experiments", Second Edition. Holt, Rinehart and Winston, 1973.
2. D.C. Montgomery, "Design and Analysis of Experiments". Wiley, 1976.

One-Way Classification

Object of Program

This program will perform a one-way analysis of variance for treatments of equal or unequal size. You may give a ten-character name to each treatment. For each treatment, the name, sample size, total, mean, and standard deviation will be printed. The analysis of variance table will include all sums of squares and mean squares, as well as the calculated F and the probability associated with getting that F value or one larger. You also have control over how many decimal places are printed on the output.

Typical Program Flow

1. Input data via BSDM.
2. Select Advanced Statistics option.
3. Insert program medium.
4. Select one-way classification.
5. Specify variables and subfiles.
6. Summary statistics.
7. AOV table.
8. Perform auxiliary routines, e.g., multiple comparisons.

Special Considerations

See the General Information portion of this AOV manual for program limitations. Also, carefully read the Data Structures section before entering your data through Basic Statistics and Data Manipulation.

References
1. W.J. Dixon, F.J. Massey, "Introduction to Statistical Analysis", Third Edition. McGraw-Hill, 1969.
2. G.W. Snedecor, W.G. Cochran, "Statistical Methods", Sixth Edition. Iowa State University Press, 1967.
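The quantities listed for the One-Way Classification program (per-treatment sample size, total, mean, and standard deviation, then the sums of squares, mean squares, and F) can be sketched in a few lines of Python. This is an illustrative sketch using the textbook one-way formulas, not the HP program itself; the function name, the dictionary layout, and the data are hypothetical. The probability attached to F is left out because it needs an F distribution routine that the Python standard library does not provide.

```python
from statistics import mean, stdev

def one_way_aov(groups):
    """groups maps a treatment name to its list of observations.

    Treatments may have unequal sizes, but each needs at least two values.
    """
    all_obs = [y for obs in groups.values() for y in obs]
    grand = mean(all_obs)

    # Per-treatment summary: sample size, total, mean, standard deviation.
    summary = {
        name: (len(obs), sum(obs), mean(obs), stdev(obs))
        for name, obs in groups.items()
    }

    # Between-treatment and within-treatment (error) sums of squares.
    ss_treat = sum(len(obs) * (mean(obs) - grand) ** 2 for obs in groups.values())
    ss_error = sum((y - mean(obs)) ** 2 for obs in groups.values() for y in obs)
    df_treat = len(groups) - 1
    df_error = len(all_obs) - len(groups)
    ms_treat = ss_treat / df_treat
    ms_error = ss_error / df_error

    table = {
        "ss_treat": ss_treat, "df_treat": df_treat, "ms_treat": ms_treat,
        "ss_error": ss_error, "df_error": df_error, "ms_error": ms_error,
        "ss_total": ss_treat + ss_error, "df_total": len(all_obs) - 1,
        "F": ms_treat / ms_error,
    }
    return summary, table

# Hypothetical data with unequal sample sizes.
data = {"T1": [4.0, 5.0, 6.0], "T2": [7.0, 8.0, 9.0, 10.0], "T3": [1.0, 2.0]}
summary, table = one_way_aov(data)
for name, (n, total, avg, sd) in summary.items():
    print(name, n, total, round(avg, 4), round(sd, 4))
print(table)
```

Missing values are assumed to have been removed before the call, which mirrors the manual's statement that missing values are deleted from the calculations.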
Two-Way Unbalanced Design

Object of Program

The purpose of this program is to perform an analysis of variance on a two-way classification with unequal subclass frequencies. The analysis may be performed in two ways. If interactions are known to be present in the population, and all subclasses have at least one observation, then the method of weighted squares of means should be used to test the main effects. If interactions are known to be absent in the population, or if at least one subclass has no observations, then the method of fitting constants should be used. In any case, if at least one subclass has no observations, the method of fitting constants must be used.

If it is not known whether or not interactions are present in the population, then a preliminary analysis of variance should be studied in order to test for interaction. If this test is significant, then the method of weighted squares of means should be used. A significance level of 0.25 may be used when testing for the presence of interaction.

Typical Program Flow

1. Input data via BSDM.
2. Select Advanced Statistics.
3. Insert program medium.
4. Select two-way classification.
5. Specify variables and subfiles.
6. Summary statistics.
7. AOV table.
8. Perform auxiliary routines, e.g., multiple comparisons.

Special Considerations

See the General Information portion of this AOV manual for program limitations. Also, carefully read the Data Structures section before entering your data through Basic Statistics and Data Manipulation.

References
1. Bancroft, T.A. (1968). Topics in Intermediate Statistical Methods. The Iowa State University Press, Ames, Iowa.
2. Searle, S.R. (1971). Linear Models. John Wiley and Sons.

One-Way Analysis of Covariance

Object of Program

This program will perform a one-way analysis of covariance for equal or unequal sample sizes. You may give a ten-character label to each treatment. For each treatment, a covariate (X) and a response variable (Y) must be specified.
For each treatment, the program prints the number of observations, the means and standard deviations of the covariate (X) and the response (Y), the correlation between the two, and the equation of the least squares line. The same quantities are computed and printed for the overall data. The corrected sums of squares tables are printed, followed by the analysis of covariance table with the calculated F and the probability associated with getting that F value or one larger. The one-way analyses of variance for both X and Y, the test for equal slopes within treatments, and the test for significant pooled regression are calculated and printed. The adjusted means and the standard errors of the adjusted means are printed. These adjusted means are saved for further analysis when doing multiple comparisons or treatment contrasts. Any time an observation is found with either the covariate (X) or the response (Y) missing, the point is deleted from the calculations. You also have control over how many decimal places are printed on the output.

Typical Program Flow

1. Input data via BSDM.
2. Select Advanced Statistics option.
3. Insert program medium.
4. Select one-way analysis of covariance.
5. Specify variables and subfiles.
6. Summary statistics are printed.
7. Within-treatment regression is performed.
8. ANOVA table:
   One-way analysis of X variable
   One-way analysis of Y variable
   Test of homogeneity of regression coefficients
   Test of homogeneity of pooled regression coefficients
   One-way analysis of covariance table
9. Perform auxiliary routines, e.g., multiple comparisons.

Special Considerations

See the General Information portion of this AOV manual for program limitations. Also, carefully read the Data Structures section before entering your data through Basic Statistics and Data Manipulation.

References
1. W.J. Dixon, F.J. Massey, "Introduction to Statistical Analysis", Third Edition. McGraw-Hill, 1969.
2. G.W. Snedecor, W.G.
Cochran, "Statistical Methods", Sixth Edition. Iowa State University Press, 1967.

F-Prob

Object of Program

Given the numerator degrees of freedom, the denominator degrees of freedom, and an F value > 1, this program will calculate the probability that an F random variable has a value greater than or equal to the given F value.

References
1. Boardman, T.J. (editor), 9830A Statistical Distribution Pac, Hewlett-Packard (PN 09830-70854), September 1974.
2. Boardman, T.J. (editor), 9845A General Statistics Package.
3. Boardman, T.J. and R.W. Kopitzke, "Probability and Table Values for Statistical Distributions", 1975, Proceedings of the Statistical Computing Section of the American Statistical Association, pp 81-86.

Orthogonal Polynomials

Object of Program

This program generates orthogonal polynomials. This allows you to determine whether quantitative factor levels, with equal or unequal spacing between the levels, are linear, quadratic, etc., in their relationship to the response variable. The output includes the sum of squares, the F-ratio, and the P(F > comp F) for each degree of polynomial.

Typical Program Flow

1. Perform some type of AOV.
2. Select further analysis.
3. Select orthogonal polynomial.
4. Define the maximum degree of orthogonal polynomial.
5. Orthogonal polynomial decomposition on rows and columns.

Special Considerations

Maximum Degree of Orthogonal Polynomial

For a one-way classification design, it must be less than the number of treatments. For a two-way (unbalanced) design, it must be less than the number of levels of factor A. For other designs, it must be less than the number of levels of the factor. Enter zero if that factor is not a quantitative variable or if it is not desired to do orthogonal polynomial comparisons on the factor.
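As an illustration of how orthogonal polynomial coefficients can be obtained for equally or unequally spaced levels, the Python sketch below applies Gram-Schmidt orthogonalization to the powers of the level values and then computes the one-degree-of-freedom sum of squares for each degree. This is a standard construction offered under stated assumptions (equal replication at every level); it is not claimed to be the algorithm the HP program uses, and the function names are hypothetical.

```python
def orthogonal_poly_coeffs(levels, max_degree):
    """Contrast coefficients for degrees 1..max_degree at the given levels.

    Gram-Schmidt on the powers of the levels; assumes equal replication
    at every level. max_degree must be less than the number of levels.
    """
    k = len(levels)
    if not 0 < max_degree < k:
        raise ValueError("max_degree must be between 1 and len(levels) - 1")
    basis = [[1.0] * k]  # degree 0: the constant term
    for degree in range(1, max_degree + 1):
        v = [float(x) ** degree for x in levels]
        for b in basis:  # remove the part explained by lower degrees
            proj = sum(vi * bi for vi, bi in zip(v, b)) / sum(bi * bi for bi in b)
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        basis.append(v)
    return basis[1:]

def contrast_ss(coeffs, means, n):
    """Sum of squares (1 df) for a contrast on means that each average n values."""
    estimate = sum(c * m for c, m in zip(coeffs, means))
    return n * estimate ** 2 / sum(c * c for c in coeffs)

# Unequally spaced levels: the time settings of 4, 5, and 7.5 hours used earlier.
linear, quadratic = orthogonal_poly_coeffs([4, 5, 7.5], 2)

# Equally spaced levels: the TIME factor of the Factorial Design Example
# (levels 1, 5, 9 hours; each mean is based on 8 observations).
time_linear, time_quadratic = orthogonal_poly_coeffs([1, 5, 9], 2)
time_means = [5.6250, 8.6150, 10.4613]
ss_linear = contrast_ss(time_linear, time_means, 8)
ss_quadratic = contrast_ss(time_quadratic, time_means, 8)
print(ss_linear, ss_quadratic)
```

With the TIME means printed in the Factorial Design Example, the linear and quadratic sums of squares add up to about 95.30, in line with the TIME sum of squares in that example's AOV table. Dividing each by the error mean square gives the F-ratio for that degree.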
Level Associated with Treatment (row, factor) # "i"

When this question is asked, you should enter the quantity corresponding to this treatment (for a one-way design), or this row (two-way design), or the level "i" of factor k (for other designs).

Contrasts

Object of Program

This program performs treatment contrasts on main effect means or on two-way means with one of the factors held constant. This allows you to make any desired linear contrast of a set of treatment means by entering an appropriate set of coefficients. The output includes the user-entered coefficients, the contrasts, and the sum of squares, F-ratio, and P(F > comp F) associated with the contrasts.

Typical Program Flow

1. Perform some type of AOV.
2. Choose further analysis.
3. Choose contrasts.
4. Choose: 1. Main effect? 2. Two-way means?
   Choice 1: Enter the contrast coefficients.
   Choice 2: For a contrast on rows, enter the level # of the column to be held constant.

Special Considerations

How to Make a "Contrast"

If the coefficients for the contrasts you enter are denoted by c(i), then one condition for choosing the c(i) is that they must satisfy

    Σ c(i) = 0

where i is summed over all levels of the factor of interest. Obviously, this implies that some of the c(i) must be negative. Of course, one or more of the c(i) may be equal to zero.

Let's look at an example which demonstrates the procedure. Suppose you have a one-way classification with four treatments. You find in the AOV table that you have a significant F value. So, you reject the hypothesis that all the treatment effects are equal, i.e., you reject

    H0: T1 = T2 = T3 = T4.

You still don't know exactly which treatments are significantly different from one another. This is where you use a contrast.
Suppose you want to know if treatment one is significantly different from treatment three, i.e., you want to test the hypothesis

    H0: T1 = T3,  or  H0: T1 - T3 = 0,

or, written in still another way,

    H0: 1*T1 + 0*T2 - 1*T3 + 0*T4 = 0.

If the numbers of observations in each treatment are equal, then to specify the above contrast all you need to do is supply the coefficients of the treatments. That is, coefficient one is 1, coefficient two is 0, three is -1, and four is 0. You must tell the program what the coefficients (of the T's above) are.

Suppose the numbers of observations for the four respective treatments are 6, 8, 7, and 6. Suppose further that you want to test whether treatment two is significantly different from treatment four. Write the hypothesis as:

    H0: 0*T1 + 1*T2 + 0*T3 - 1*T4 = 0.

Then try the following procedure to determine your contrast coefficients, c(i). Form a table using the number of observations for the ith treatment, n(i), as one column. Use the coefficients of the T's in the above hypothesis as the last column. Call these coefficients c(i)n(i). Remember, one condition for a valid contrast is that Σ c(i)n(i) = 0. So, check to make sure that condition is satisfied. Then, make a column for your as yet unknown contrast coefficients, c(i). You should have the following table.

    n(i)    c(i)    n(i)c(i)
     6                  0
     8                  1
     7                  0
     6                 -1

Now, just fill in the c(i) column. To do that, notice that c(i) = n(i)c(i)/n(i). So you obtain the following.

    n(i)    c(i)    n(i)c(i)
     6       0          0
     8       1/8        1
     7       0          0
     6      -1/6       -1

So, contrast coefficient one is 0, two is 1/8, etc. Notice that the contrast coefficients for a given contrast are not unique. For example, the above contrast would be performed if contrast coefficients of 0, 1/4, 0, -1/3 were given. Also, a similar contrast would be obtained using 0, -1/8, 0, 1/6 as the coefficients.

Interaction Plots

Object of Program

This program will plot two-way or three-way interaction means. The two-way interaction plot will be on one graph.
You may decide which factor will be put on the X axis, as well as the spacing of the levels, and then the other factor will be plotted. Each interaction line will be labeled to indicate the level of the factor. For instance, the three levels of a factor B will be labeled B1, B2, B3.

The three-way interaction plot will be plotted on several graphs. That is, a two-way interaction will be plotted for each level of the third factor. The program will give you a prompt when it is necessary to do the next page of the plot.

You may also have a legend drawn showing the length of the Least Significant Difference (LSD) and/or the length of Tukey's Honestly Significant Difference (HSD). To do these, it is necessary to enter the critical value, the error mean square, and its corresponding degrees of freedom.

Special Considerations

Which interaction is to be plotted?

When this question is asked, enter the two letters corresponding to the two factors. The input must be one of AB, AC, BC, AD, BD, or CD, and the one selected must be possible for your data set.

What 3-way interaction is to be plotted?

When this question is asked, enter the three letters corresponding to the three factors. The input must be one of ABC, ABD, ACD, or BCD.

The label of the X-axis for an interaction plot.

The factor levels must be given in increasing order. Factors whose levels are not in increasing order must be given arbitrary level codes if they are to be used on the X-axis of an interaction plot.

References
1. C.R. Hicks, "Fundamental Concepts in the Design of Experiments", Second Edition. Holt, Rinehart, and Winston, 1972.
2. B.J. Winer, "Statistical Principles in Experimental Design", Second Edition. McGraw-Hill, 1971.

Multiple Comparisons

Object of Program

This program allows you to select any one of five multiple comparison procedures to use on either main effect means or two-way table means. You must input the appropriate tabled values for the procedure selected.
In addition, for the separation procedures for the two-way means, you will need to specify the appropriate standard deviation to be used. A separation table will be printed which should help you determine which treatment or factor levels are significantly different from one another. For example, the following table shows output for a set of treatments:

    Factor A               Sample
    Level        Mean      Size      Separation
      1          10.7       10          ab
      2           9.8        9          a
      3          11.7       10          b
      4          15.8        8          c

We would interpret this table as showing that factor level 4 is significantly different from the other levels of A, since no other level has a "c" listed beside it. Also, we see that level 1 cannot be distinguished from level 2, and level 1 cannot be distinguished from level 3. Level 2, however, can be shown to differ significantly from level 3, since they have no letters in common. Of course, the conclusion one draws from the separation procedure may depend on which procedure is used and the level of significance you choose.

Typical Program Flow

1. Perform some type of AOV.
2. Choose further analysis.
3. Choose multiple comparisons.
4. Main effect means and two-way table means are printed.
5. Choose the procedure to be used (5 options).
6. Multiple comparison is printed and/or plotted.

Special Considerations

Which factor/main effect should be used?

When this question is asked, you should input A, B, C, or D as the response.

What level of alpha are you going to use?

The value you input in response to this question is used for printout purposes only and not for any calculations.

What table value should you use?
The following chart shows required inputs for tabled values:

    Procedure*   Table               Notation         Parameter                                  Reference
        1        Student's t         t_α/2(df)        df = error degrees of freedom              (1,4)
        2        Studentized range   q_α/2(p,df)      p = # of means                             (6,4)
                                                      df = error degrees of freedom
        3        Duncan's            q*_α(p,df)       p is as above but reduces by 1 to p = 2    (3)
        4        Studentized range   q_α/2(p,df)      p same as 3                                (1,3,4)
        5        Snedecor's F        F_α(p-1,df)      p = # of means                             (4,5)

* See references (1) and (2) for more information on all procedures.

Unequal sample sizes

In this case, the harmonic mean sample size, n0, will be used, where

    n0 = p/(1/n1 + 1/n2 + ... + 1/np).

For the methods used in Multiple Comparisons, please refer to the Multiple Sample Tests portion of the General Statistics section of this manual.

References
1. Boardman, T.J. and D.R. Moffitt (1971). "Graphical Monte Carlo Type I Error for Multiple Comparison Procedures". Biometrics 27:3, 738-744.
2. Carmer, S.G. and M.R. Swanson (1973). "Evaluation of Ten Pairwise Multiple Comparison Procedures by Monte Carlo Methods". Journal of the American Statistical Association 68:341, pp 66-74.
3. Duncan, D.B. (1955). Multiple range and multiple F tests. Biometrics 11, 1-42.
4. Pearson, E.S. and Hartley, H.O. (1958). Biometrika Tables for Statisticians, Vol. I. Cambridge University Press, London.
5. Scheffe, H. (1953). A method for judging all contrasts in the analysis of variance. Biometrika 40, 87-104.
6. Tukey, J.W. (1953). The problem of multiple comparisons. Unpublished notes, Princeton University.

Factorial Design

Example

Twenty-four laboratory rats were deprived of food, except for one hour per day, for several weeks. At the end of that time, each rat was inoculated with one of four doses of a certain drug and, after one of three amounts of time, was fed. The weight (in grams) of the food ingested by each rat was measured. The purpose of the experiment is to determine the effect of the drug on the motivation of the rats.
    A                               B
    Time before feeding             Dosage (mg/kg)
    (hours)                 .1       .3       .5       .7
       1                   9.07     5.63     4.42     1.38
                           8.77     8.76     3.01     3.96
       5                   9.16    11.57     5.22     5.72
                          11.82    11.53     9.21     4.69
       9                  16.08    10.37     7.27     5.48
                          14.65    14.46     6.10     9.28

The design for this experiment is a two-way factorial with three levels of time and four dosage levels of the drug. Two rats (observations) per experimental combination were used.

The data can be subjected to an analysis of variance in order to determine if there are significant differences between the three times before feeding or the four dosages of the drug. In addition, we can determine if there is a significant interaction between time and dosage.

The F ratios indicate no significant interaction effect (F = .915), and significant differences in time levels (F = 14.819) and dosage levels (F = 19.533). The orthogonal polynomial decomposition for the time factor (A) shows a significant linear effect. The decomposition for the dosage factor (B) shows a highly significant linear effect and a cubic effect.

Even though the AB interaction (time by dosage) is not significant, a plot of the two-way means was included to show the results of the INTERACTION PLOT routine. A reference LSD value is shown on the interaction plot.

* DATA MANIPULATION *

Enter DATA TYPE (Press CONTINUE for RAW DATA): 1             Raw data
Mode number = ?                                              On mass storage
Is data stored on program's scratch file (DATA)? NO
Data file name = ? DEPOFRATS:INTERNAL
Was data stored by the BS&DM system ? YES
Is data medium placed in device INTERNAL ? YES
Is program medium placed in correct device ? YES

FOOD DEPRIVATION OF RATS
Data file name: DEPOFRATS:INTERNAL
Data type is: Raw data
Number of observations: 12
Number of variables: 2
Variable names:
   1. OBS 1 WT
   2. OBS 2 WT
Subfiles: NONE

SELECT ANY KEY
Option number = ?
1                                                            Select special function key labeled-LIST
Enter method for listing data: 3                             List all data

FOOD DEPRIVATION OF RATS
Data type is: Raw data

            Variable # 1     Variable # 2
            (OBS 1 WT)       (OBS 2 WT)
OBS#
  1           9.07000          8.77000
  2           5.63000          8.76000
  3           4.42000          3.01000
  4           1.38000          3.96000
  5           9.16000         11.82000
  6          11.57000         11.53000
  7           5.22000          9.21000
  8           5.72000          4.69000
  9          16.08000         14.65000
 10          10.37000         14.46000
 11           7.27000          6.10000
 12           5.48000          9.28000

Option number = ?                                            Exit list procedure
SELECT ANY KEY                                               Select special function key labeled-ADV STAT
                                                             Remove BSDM media
                                                             Insert AOV2

Enter number of desired funtion: 1                           Select factorial design
Number of factors in design ? (2, 3, or 4) 2
Number of levels of factor A ? 3                             1, 5, and 9 hours
Number of levels of factor B ? 4                             .1, .3, .5, and .7 mg/kg
Number of blocks in this design ? 1                          Only 1 major replication
No. obs per trt combination in each block (sample) ? 2       2 rats per experimental combination
Is the above information correct ? YES
Do YOU want to assign names to the factors ? YES
Enter the name for factor A (<11 characters) ? TIME
Enter the name for factor B (<11 characters) ? DOSAGE
Data entry option ? 2
Variable # for minor replication (sample) 1 ? 1
Variable # for minor replication (sample) 2 ? 2
No. of decimals for printing calc. values (<=7) ? 4

* FACTORIAL ANALYSIS OF VARIANCE *

FOOD DEPRIVATION OF RATS
Minor replications are stored in different variables

DESIGN
Number of factors = 2
No. of levels of factor A = 3
No. of levels of factor B = 4
No. of major replications (blocks) = 1
No. of minor replications (samples) = 2
Subfiles will be ignored
Response variable(s) are:
   Variable no. 1 OBS 1 WT
   Variable no.
2 OBS 2 WT

MEANS

* Overall mean = 8.2338

* Main Effect Means:

Factor A - TIME
Levels (1 - 3)      5.6250     8.6150    10.4613

Factor B - DOSAGE
Levels (1 - 4)     11.5917    10.3867     5.8717     5.0850

* Two Way Interaction Means

Factor A - TIME down and Factor B - DOSAGE across

              1          2          3          4
   1       8.9200     7.1950     3.7150     2.6700
   2      10.4900    11.5500     7.2150     5.2050
   3      15.3650    12.4150     6.6850     7.3800

ANOVA TABLE
Factorial Analysis of Variance

Source (Name)        df    Sums of Squares    Mean Square    F Ratio    F-Prob
Total                23       339.9634          14.7810
A  TIME               2        95.3015          47.6507       14.819     .0006
B  DOSAGE             3       188.4283          62.8094       19.533     .0001
AB                    6        17.6478           2.9413         .915     .5168
Sampling Error       12        38.5858           3.2155

NOTE: F tests assume that all factors are fixed

From the AOV table it can be seen that the effects of Factor A and of Factor B are significant, but the interaction between Factor A and Factor B is not significant.

Should tests for homogeneity of variance be made? YES

CELL STATISTICS

FACTOR LEVELS
Blk    A    B       Mean      Std Dev    Variance    Coef Var %
 1     1    1      8.9200      .2121       .0450        2.38
 1     1    2      7.1950     2.2132      4.8984       30.76
 1     1    3      3.7150      .9970       .9941       26.84
 1     1    4      2.6700     1.8243      3.3282       68.33
 1     2    1     10.4900     1.8809      3.5378       17.93
 1     2    2     11.5500      .0283       .0008         .24
 1     2    3      7.2150     2.8214      7.9601       39.10
 1     2    4      5.2050      .7283       .5305       13.99
 1     3    1     15.3650     1.0112      1.0224        6.58
 1     3    2     12.4150     2.8921      8.3640       23.29
 1     3    3      6.6850      .8273       .6844       12.38
 1     3    4      7.3800     2.6870      7.2200       36.41

Bartlett's test: Chi squared = 11.0311 with 11 degrees of freedom
Prob( Chi squared > 11.0311 ) = .4410

Specify a new variable for this design ? NO
Enter desired number: 4                                      Request interaction plot

* INTERACTION PLOT *

Is this correct ? YES                                        Confirm design on CRT
Plot which factor on the X axis: A,B ? B
Enter 4 levels of factor B (separate by commas): ? .1,.3,.5,.7
Name of the response ? (<11 characters) WEIGHT
Enter Y minimum value. (Less than 2.67 ) ?
Enter Y maximum value. (Greater than 15.365 ) ?
16
Enter Y tic 1
# of decimal places for labelling Y axis (<= 6) = ? 2
Should length of the LSD and/or HSD be plotted ? YES
Error Mean Square to calculate the LSD and/or HSD. 3.21548   From AOV table
Error Mean Square to be used is 3.21548
t value for the LSD, or not to plot the LSD. 2.179           t-tabled value
Q value for the HSD, or not to plot the HSD.
t = 2.179   LSD = 3.90733040255
Plot on CRT ? NO
Plotter identifier string (press CONT if 'HPGL')?
Enter the select code, bus # (defaults are 7,5)?
Which PEN color should be used? 1
Beep will sound when plot done, then press CONT.
To interrupt plotting press 'STOP' key
Press CONTINUE when the plotter is ready.

[Plot: FOOD DEPRIVATION OF RATS, AB Interaction; X axis: Factor B, DOSAGE]

Are there any more plots to be made ? NO
Enter number of desired funtion: 9                           Return to BSDM

Nested or Partially Nested Design

Example

In order to compare two methods of display, a group of six new Thanksgiving greeting cards was selected. Eight stores were selected for the "promotional" display method and another eight stores were used for the "integrated" display method. For each of the two methods and each of the eight stores per method, the same six card styles were compared using a response (Y) which measured dollar sales adjusted for store size.
The data for each type of display, store, and greeting card style are shown below:

Display Method 1 - "Promotional" (A)

                                  Stores (C)
Card Style (B)      1      2      3      4      5      6      7      8
      1         $1.21   1.49   1.76   1.52   0.65   1.96   1.21   1.57
      2          1.72   2.09   2.21   2.36   2.83   3.99   2.01   2.62
      3          1.72   1.44   1.84   0.91   1.30   7.61   2.01   3.27
      4          0.29   0.92   0.37   0.72   0.43   3.99   2.35   4.71
      5          1.44   2.09   1.84   2.36   1.96   3.26   2.01   1.70
      6          4.43   3.66   0.51   1.78   2.13   5.58   1.41   2.75

Display Method 2 - "Integrated" (A)

                                  Stores (C)
Card Style (B)      9     10     11     12     13     14     15     16
      1         $2.60   2.21   1.44   1.20   1.21   3.03   2.79   1.18
      2          1.67   1.16   1.73   1.92   4.84   2.88   4.10   1.48
      3          3.67   0.78   1.46   1.65   3.23   1.92   4.51   1.48
      4          1.33   0.39   1.33   1.37   2.02   1.68   4.51   2.34
      5          3.33   1.16   1.86   1.92   3.23   2.64   3.96   2.22
      6          4.67   1.90   2.61   3.27   2.26   2.36   2.30   1.55

The mixed nested AOV for this model, with factor A (display), factor C (stores) nested in factor A, and factor B (card style) crossed with A and C, is shown below. The proper mean square for testing differences between the two methods of display is C(A). Notice that the F ratio is well below one (.42135/4.85529, or about .09), indicating no significant difference between the methods as well as a considerable amount of store-to-store variation in the adjusted sales value. There do, however, appear to be significant differences between the population means for card styles: F = 2.57257/.92726 = 2.77, which is significant at the .024 level.

A fairly standard procedure for the response variable Y considered here is to transform it by Y* = ln(Y + 1) in order to achieve a more homogeneous and consistent response. The next analysis of variance is performed on this new response. The net result is that the F ratio for differences in card style means is even more highly significant (3.93 versus 2.77).

An LSD multiple comparison procedure was done on the six card styles. The results of this comparison show significant differences between style four and all others except style one, with certain other differences existing as well.
However, if one were looking for the highest adjusted daily sales, one should probably choose one of styles five, two, or six since they were not significantly different from one another but were different from the other styles (although three is questionably different). ************************************************* * DATA MANIPULATION * ******************************************************************************** Enter DATA TYPE (Press CONTINUE for RAW DATA) : i Raw data Mode nuMber = ? 2 From mass storage Is data stored on progract's scratch file (DATA)? NO Data file rtane = ? GRETINGCDS: INTERNAL Uas data stored by the BS&DM systeM ? YES Is data MediuM placed in device INTERNAL ? YES Is program ciediuM placed in correct device ? YES THANKSGIVING GREETING CARD EVALUATION Data file name- GRETINGCDS . INTERNAL Data type is: Raw data Nunber of observations: 96 NuMber of variables: 1 Variable nanes: i. DESIGN Subfiles: NONE SELECT ANY KEY Select special function key labeled-LIST Option nuMber - ? i List all the data THANKSGIVING GREETING CARD EVALUATION Data type is: Raw data VARIABLE # i (DESIGN) OBS(I+i> 0BS(I+2) 0BS(I+3) 0BS(I+4) i. 49000 1.76000 i. 52000 .65000 i. 21000 1.57000 i. 72000 2.09000 2.36000 2.83000 3.99000 2.01000 I OBS(I) i i. 21000 6 1.96000 il 2.21000 265 16 2.62000 21 1 .30000 26 .92000 31 2.35000 36 2.36000 41 4.43000 46 5.58000 51 1.44000 56 1.18000 61 4.84000 66 .78000 7i 4.51000 76 i. 37000 81 3.33000 86 2.64000 91 2.61000 96 1.55000 Option number = ? SELECT ANY KEY 1.72000 7.61000 .37000 4.71000 .96000 .66000 .41000 .20000 .67000 2.88000 1. 46000 1.48000 2.02000 1.16000 3.96000 3.27000 1 .44000 2 .01000 .72000 1 .44000 3 .26000 .51000 2 .75000 1 ,21000 1 .16000 4 10000 1 .65000 1 .33000 1 68000 1 .86000 2 .22000 2 ,26000 Enter number of desired funtion: 2 Number of factors in design ? (2, 3, or 4) 3 Number of levels of factor A ? 2 Number of levels of factor B ? 6 Number of levels of factor C ? 8 Number of samples ? 
1 Is the above information correct ? YES Which design (by number) is to be used ? 3 Which factor is P: A,B,C ? A Which factor is Q: B,C ? C Do YOU want to assign names to the factors ? YES Enter the name for factor A <<ii characters) ? DISPLAY Enter the name for factor B <<ii characters) ? CARD STYLE Enter the name for factor C <<ii characters) ? STORES No. of decimal places to print calculated values. 4 1.84000 3.27000 .43000 2.09000 2.01000 1.78000 2.60000 3.03000 73000 48000 23000 39000 51000 92000 67000 36000 .91000 .29000 3.99000 1.84000 i. 1. 3. 4. 1. 4. 70000 13000 21000 79000 92000 67000 920 33000 2.34000 3.23000 1.90000 2.30000 2. i. 3. 1. 1. Exit list procedure Select special function key labeled-ADV STAT Remove BSDM media Insert AOV2 media Choose nested design Shown on CRT, specify design type. 266 NESTED ANALYSIS OF VARIANCE THANKSGIVING GREETING CARD EVALUATION DESIGN Number of factors = 3 No. of levels of factor A = 2 No. of levels of factor B = 6 No. of levels of factor C = 8 No. of Minor replications (sanples) Response variable(s) are ■■ Variable no. 1 DESIGN MEANS * Overall Mean = 2.2327 He Main Effect Mean-s = Factor A - DISPLAY Levels < 1 - 2 ) = 2.1665 2.2990 Factor B - CARD STYLE Levels < i - 6 ) = 1.6894 2.4756 2.4250 2.6981 Factor C - STORES Levels < i - 8 ) = 2.3400 1.6075 i.5800 3.4083 2.7642 2.2392 1.7969 i.7483 2.3112 1742 * Two Way Interaction Means Factor A DISPLAY down and Factor B 1 2 5 1 1.4213 2.0825 2 1 . 9575 2.5400 CARD STYLE across 2 3 6 2.4788 2.5125 2.7812 2.4725 2.3375 2.6150 Factor A - DISPLAY down and Factor C - STORES across 1 2 3 5 6 7 1 1.8017 1.9483 1.4217 1.5500 4.3983 1.8333 2 2.8783 1.2667 1.7383 2.7983 2.4183 3.6950 4 1.7225 1.8712 4 8 1.6083 2.7700 1 . 
8883 1.7083 Factor B - - CARD STYLE down an d Factor C - STORES acr oss 1 2 3 5 6 7 i 1.90SO 1.8500 1.6000 .9300 2.4950 2.0000 2 1.6950 1.6250 1.9700 3.8350 3.4350 3.0550 3 2.6950 1.1100 1.6500 2.2650 4.7650 3.2600 4 .8100 .6550 .8500 1.2250 2.8350 3.4300 S 2.3850 1.6250 1.8500 3600 3750 1400 0500 2800 3750 0450 5250 1400 267 2.5950 2.9S0O 6 4.5500 2.7800 2.1950 3.9700 Should the 3-way Means be printed ? NO 2.9850 1.5600 1.8550 1.9600 2.5250 2.1500 ANOVA TABLE Nested Analysis of Variance Source (Name) df Sums of Squares Mean Square Total 95 A DISPLAY 1 C(A) 14 B CARD STYLE 5 AB 5 CB(A) 70 148.0541 .4213 67.9740 12.8628 1 .8879 64.9080 .5585 .4213 .8553 . 5726" .3776 .9273" F = 2.77 significant at a = .02. Enter desired nuMber : 7 Enter nuMber of desired funtion; 4 SELECT ANY KEY SELECT ANY KEY Select option desired ■■ 1 Transf or«a tion nuMber = ? 1 Variable nuMber corresponding to X = ? 1 ParaMeter a = ? 1 ParaMeter b = ? 1 ParaMeter c = ? 1 Store transforMed data in Variable # < <= 2 ) ? 2 Variable nane <<= 10 characters) = ? LN(Y+i) Is above inforMation correct? YES Press 'CONTINUE' when ready. There is a significant difference between the population means for card types but not be- tween the types of displays. Exit nested design Return to BSDM Select Transform key Algebraic transformation The following transforation was perforMed: a*<X A b)+c where a = 1 b = 1 c - 1 X is Variable * 1 TransforMed data is stored in Variable # 2 <LN<Y+i)) 268 Select option desired 1 Another algebraic transformation Transf ormation number = ? 3 Variable number corresponding to X = ? 2 Parameter a = ? 1 Parameter b = ? i Par a Meter c = ? Store transformed data in Variable * ( <= 3 > V 2 Is above information correct? YES Press 'CONTINUE' when ready. The following transformation was performed: a#ln(bX)+c where a = i b = i c = X is Variable # 2 Transformed data is stored in Variable * 2 <LN<Y+i>). 
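The two transformations above compose to ln(Y + 1): the algebraic step a*(X^b)+c with a = b = c = 1 stores Y + 1 in Variable 2, and the logarithmic step a*ln(bX)+c then overwrites Variable 2 with its natural log. The short Python sketch below is illustrative only (the function names are hypothetical, not BS&DM routines), and it assumes c = 0 in the second step, since the printed dialog leaves that entry blank:

```python
import math


def algebraic(x, a=1.0, b=1.0, c=1.0):
    """Transformation 1 in the dialog: a*(x**b) + c."""
    return a * (x ** b) + c


def logarithmic(x, a=1.0, b=1.0, c=0.0):
    """Transformation 3 in the dialog: a*ln(b*x) + c (c = 0 is assumed here)."""
    return a * math.log(b * x) + c


def ln_y_plus_1(y):
    # Variable 2 first holds Y + 1, then is overwritten with ln(Y + 1).
    return logarithmic(algebraic(y))


# First, second, and largest observations from the greeting card data.
for y in (1.21, 1.49, 7.61):
    print(f"{y:.5f} -> {ln_y_plus_1(y):.5f}")
```

The transformed values can be compared with the Variable 2 column in the data listing for this example.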
Select option desired ■■ Exit transformation routine PROGRAM NOW UPDATING SCRATCH DATA FILE SELECT ANY KEY Select LIST key Option number = ? i Enter method for listing data: 3 THANKSGIVING GREETING CARD EVALUATION Data type is: Raw data Variable * i Variable * 2 (DESIGN ) <LN(Y+i) ) OBS# i i. 21000 .79299 2 1.49000 .91228 3 i. 76000 1.01523 4 1.52000 . 92426 5 .65000 .50078 6 1.96000 1.08519 7 1.21000 .79299 8 1.57000 .94391 9 1.72000 1.00063 10 2.09000 1.12817 ii 2.21000 1.16627 12 2.36000 1.21194 13 2.83000 1.34286 14 3.99000 1.60744 269 IS 2.01000 1.10194 16 2.62000 1.28647 17 1.720 1.00063 18 1.44000 .89200 19 1.84000 1.04380 20 .91000 .64710 21 1.30000 .83291 22 7.61000 2.15292 23 2.01000 1.10194 24 3.27000 1.45161 25 .290 .25464 26 .92000 .65233 27 .3700 .31481 28 .72000 .54232 29 .43000 . 35767 30 3.99000 1.60744 31 2.35000 1.20896 32 4.71000 1.74222 33 1.44000 .89200 34 2.09000 1.12817 35 1 .84000 1.04380 36 2.36000 1.21194 37 1.96000 1 . 08519 38 3.26000 1.44927 39 2.01000 1.10194 40 1.70000 .99325 41 4.43000 1.69194 42 3.66000 1.53902 43 .51000 .41211 44 1.78000 1.02245 45 2.13000 i. 14103 46 5.58000 1.88403 47 1.41000 .87963 48 2.75000 1.32176 49 2.60000 1.28093 50 2.21000 1.16627 51 1.44000 .8920 52 1.20000 . 
78846 53 1.21000 .79299 54 3.03000 1.39377 55 2.79000 1.33237 56 1.18000 .77932 57 1.67000 .98208 58 1.16000 .77011 S9 1.73000 1.00430 60 1.92000 1.071S8 61 4.84000 1.76473 62 2.880 1.35584 63 4.10000 1.62924 64 1 .48000 .90826 65 3.67000 1.54116 66 .78000 .57661 67 1.46000 .90016 68 1.65000 .97456 69 3.23000 1.44220 70 1.92000 1.07158 71 4.51000 1.70656 72 1.48000 .90826 73 1.33000 .84587 74 .39000 .32930 75 1.33000 .84587 76 1.37000 .86289 77 2.02000 1.10526 78 1.68000 .98582 79 4.51000 1.70656 80 2.34000 1.20597 81 3.33000 1.46557 270 82 1.16000 .77011 83 1.86000 1.05082 84 1.920 1.07158 85 3.23000 1.44220 86 2.640 1.29198 87 3.96000 1.60141 88 2.22000 1.16938 89 4.67000 1.73519 90 1.90000 1.06471 91 2.61000 1.28371 92 3.27000 1.45161 93 2.26000 1.18173 94 2.36000 1.21194 95 2.30000 1.19392 96 1.55000 . 93609 Option number = ? SELECT ANY KEY Exit list procedure Return to AOV2 Enter number of desired funtion: 2 NuMber of factors in design ? (2, 3, or 4) 3 Number of levels of factor A ? Select nested design Number of levels of factor B ? 6 Nunber of levels of factor C ? 8 NuMber of samples ? 1 Is the above information correct ? YES Which design (by number) is to be used ? 3 Which factor is P: A,B,C ? A Which factor is Q: B,C ? C Do YOU want to assign names to the factors ? YES Enter the name for factor A (<ii characters) ? DISPLAY Enter the name for factor B (<ii characters) ? CARD STYLE Enter the name for factor C <<ii characters) ? STORES Which variable number contains the response ? 2 No. of decimal places to print calculated values. 4 271 NESTED ANALYSIS OF VARIANCE THANKSGIVING GREETING CARD EVALUATION DESIGN Nonber of factors = 3 No. of levels of factor A = 2 No. of levels of factor B = 6 No. of levels of factor C = 8 No. of Minor replications (sanples) Response variable(s) are ■■ Variable no. 
2 LN(Y+i) = 1 MEANS * Overall Mean 1.1068 * Main Effect Means : Factor A - DISPLAY Levels ( 1 - 2 ) :" 1.0711 1.1426 Factor B - CARD STYLE Levels ( 1 - 6 > .9621 1.2082 1.2469 Factor C - STORES Levels ( 1 - 8 ) : 1.1236 .9108 1.4248 1.2798 1.1403 .9144 1.1372 .9105 .9817 1.1730 1.0825 * Two Way Interaction Means Factor A - - DISPLAY down and Factor B - CARD STYLE across 1 2 3 4 5 6 1 .8710 1.1132 1.2307 1.2365 1.1404 .8350 2 1.0533 1.2329 1 . 18S8 1.2574 1.1401 .9859 Factor A - - DISPLAY down and Factor C - STORES acr oss 1 2 3 4 5 6 7 8 1 .9388 .8767 1.0420 1.6310 .8327 1.0312 .9267 1.2899 2 1.3085 1.2882 .7795 1.2185 .9961 1.5283 1.0368 .9845 Factor B - - CARD STYLE down and Factor C - STORES across 1 2 3 4 5 6 7 8 1 1.0370 .6469 1.0393 1.2395 .9536 1.0627 .8564 .8616 2 .9914 1 . 5538 .9491 1.4816 1.0853 1.3656 1.1418 1.0974 3 1.2709 1.1376 .7343 1.6123 .9720 1.4043 .8108 1.1799 4 .5503 .7315 .4908 1.2966 .5803 1 . 4578 .7026 1.4741 272 Should the YES i.1788 .9491 i.2637 i.3706 1.7136 1.3019 1.1614 1.5480 3~way weans be printed ? 1 . 0473 1.3517 .8479 1 .0368 1 .1418 1 .0813 1.2370 1.1289 # Three Way Interaction Means Factor A - DISPLAY, Level 1 Factor B - - CARD STYLE down and Factor C - STORES across 1 2 3 4 5 6 7 8 1 .7930 .9123 1.0152 .9243 .5008 1.0852 .7930 .9439 2 1.0 006 1.1282 1.1663 1.2119 1.3429 1.6074 1.1019 1.2865 3 1.0006 .8920 1.0438 .6471 .8329 2.1529 1.1019 1.4516 4 .2546 .6523 .3148 .5423 .3577 1.6074 1.2090 1.7422 5 .8920 1.1282 1.0438 1 .2119 1.0852 1.4493 1.1019 .9933 6 1.6919 1.5390 .4121 1.0225 1.1410 1.8840 .8796 1.3218 Factor A - - DISPLAY, Level 2 Factor B - - CARD STYLE down and Factor C - STORES across 1 2 3 4 5 6 7 8 1 1.2809 1.1663 .8920 .7885 .7930 1 . 3938 1.3324 .7793 2 .9821 .7701 1.0043 1.0716 1.7647 1.3558 1.6292 .9083 3 1.5412 .5766 .9002 .9746 1.4422 1.0716 1.7066 .9083 4 .8459 .3293 .8459 .8629 1. 
1053 .9858 1.7066 1 .2060 5 1.4656 .7701 1.0508 1.0716 1.4422 1.2920 1.6014 1.1694 6 1.7352 1.0647 1 .2837 1.4516 1.1817 1.2119 1.1939 .9361 ANOUA TABLE Source (Nane) Nested Analysis of Variance df Sums of Squares Mean Square Note: Below AOV table does not show F ratios because the appropriate error mean square depends on the design. Total 95 A DISPLAY 1 CCA) 14 B CARD STYLE 5 AB 5 CB(A) 70 12.5531 .1225 5.3373 1.5185 .1687 5.4062 .1321 .1225 .3812 .3037" .0337 .0772 F = 3.93 This table shows the differences among card styles are even more significant. Specify a new variable for this design ? NO Enter desired nuMber: 3 Is the design displayed on the CRT the latest one? YES Multiple comparisons 273 Multiple Conparisons ^ ^ ^ ^ ^ ^ ^ '^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ fli ^ ^ *^ ^ * ™ *p ^ * ^ *p ^ * * * ^ Enter i or 2 to specify type of Means i Least significant difference Which Factor/Main Effect(A,B, or Oshould be used? B Error Mean Square, associated Degrees of FreedoM .07723,70 Which procedure would you like to use ? i What level of Alpha are you going to use ? .05 Enter table value froM Student's t with d.f.= 70 ? 1.99 Is a plot of LSD desired ? YES Plot on CRT ? NO Plotter indentifier string (press CONT if 'HPGL')? Enter select code, bus t (defaults are 7,5)? Which PEN color should be used? i Enter nane for labelling Y axis (< ii characters) LN(Adj.*> Beep will sound when plot done, then press CONT To interrupt plotting press 'STOP' key. 
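The LSD value that the program prints for this run can be checked by hand from the entries above. The sketch below is in Python for illustration only (it is not library code, and the function name is hypothetical). It assumes the usual formula LSD = t * sqrt(2 * MSE / n), where n is the number of observations behind each card style mean, here 2 displays x 8 stores = 16:

```python
import math


def lsd(t_value, mse, n):
    """Least significant difference between two means of n observations each."""
    return t_value * math.sqrt(2.0 * mse / n)


# t, error mean square, and sample size taken from the dialog above.
print(round(lsd(1.99, 0.07723, 16), 4))
```

Two card style means are declared different when they differ by more than this amount.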
MULTIPLE COMPARISON PLOT : LSD
THANKSGIVING GREETING CARD EVALUATION

[Plot: card style means with LSD bar; Y axis 0.0 to 2.0; X axis: CARD STYLE LEVEL NUMBER]

Least Significant Difference
Error Mean square = .07723
Degrees of freedom = 70
Harmonic average sample size = 16.0000
Alpha level = .05
Table value from Student's t = 1.99
LSD value = .1955

Multiple Comparisons on Factor CARD STYLE
Level      Mean     Sample Size   Separation
  4        .9105        16           a
  1        .9621        16           ab
  3       1.1403        16           bc
  5       1.1730        16           c
  2       1.2082        16           c
  6       1.2469        16           c

Note: Where the levels do not contain the same letters, the factor levels are significantly different using the LSD procedure.

Another Separation Procedure on Factor 2 ? NO
Another Factor to be used ? NO
Multiple Comparison Procedures on Two-Way Means ? NO
Enter number of desired function: 9          Return to BSDM

Split Plot Example

Hicks (1973, ex. 13.1) describes a split-plot experiment in which four oven temperatures and three baking times were investigated with regard to the life, Y, of an electrical component. The oven temperatures and the replications (blocks) are in the whole plot while the baking times are in the subplots. Only one electrical component was subjected to the stress conditions within each block-baking time-temperature combination. The data table is shown below:

                                      Oven Temp. (B)
Replication   Baking Time (A)     580    600    620    640
     1               5            217    158    229    223
                    10            233    138    186    227
                    15            175    152    155    156
     2               5            188    126    160    201
                    10            201    130    170    181
                    15            195    147    161    172
     3               5            162    122    167    182
                    10            170    185    181    201
                    15            213    180    182    199

Since this is a balanced design with three replications, we need only use one variable for data entry. The data are entered across each row in the table above. Hence, three groups of replications are available with factor A as baking time and factor B as oven temperature.

Within the split-plot program, we answer that there are two factors and three major replications. The design is specified with factor B in the whole plot and factor A in the subplot. The F ratio shows only significant temperature effects (B). The HSD multiple comparison procedure suggests that oven temperature two is significantly lower in life time readings than are the other three temperatures. This conclusion is supported, as should be expected, by the more 'liberal' LSD procedure shown on the next multiple comparison output.

If one runs this data set through the Factorial Analysis in order to separate the replication interaction terms as suggested by Hicks, one finds a highly questionable interaction between replications and baking time. To do this, you specify factor A as replication, factor B as baking time, and factor C as oven temperature in the FACTORIAL program.

Note that in Hicks the printed AOV table shows the mean square for AB (replication by baking time) is 1755.32, which is substantially larger than any of the other replication interactions. After looking at the data set, we believe that Hicks may have rearranged the original data, since you would ordinarily not expect the replication interaction terms to differ by that much in a split plot. See if you agree.

*************************************************
*               DATA MANIPULATION               *
*************************************************
Enter DATA TYPE (Press CONTINUE for RAW DATA): 1          Raw data
Mode number = ? 2          On mass storage
Is data stored on program's scratch file (DATA)? NO
Data file name = ? HICKS:INTERNAL
Was data stored by the BS&DM system ?
YES Is data MediuM placed in device INTERNAL ? YES Is prograM nediuw placed in correct device ? YES HICKS SPLIT PLOT ON COMPONENT LIFE TIME Data file nane: HICKS ■■ INTERNAL Data type is: Raw data NuMber of observations: 36 NuMber of variables: i Variable nanes: i. LIFETIME Subfiles: NONE SELECT ANY KEY Option nuMber = ? i Select special function key labeled-LIST HICKS SPLIT PLOT ON COMPONENT LIFE TIME Data type is: Raw data I OBS(I> i 217.00000 6 138.00000 ii 155. 00000 i6 201.00000 2i 195.00000 26 122.00000 VARIABLE # 1 (LIFETIME) OBS(I + i) 0BS(I+2) OBSU+3) 0BS(I+4) 158.00000 229.00000 223.00000 233.00000 186.00000 227.00000 175.00000 152.00000 156.00000 188.00000 126.00000 160.00000 201.00000 130.00000 170.00000 181.00000 147.00000 161.00000 172.00000 162.00000 167.00000 182.00000 170.00000 185.00000 278 31 181.00000 201.00000 213.00000 180.00000 182.00000 36 199.00000 Option nuciber = ? SELECT ANY KEY Select special function key labeled-ADV STAT Remove BSDM media Insert AOV2 media Enter nunber of desired funtion: 3 Split plot designs NuMber of factors in design ? (2 or 3) o NuMber of levels of factor A 7 3 NuMber of levels of factor B ? 4 NuMber of blocks in this design ? 3 No. obs per trt conbina tion in each block <sawple) ? 1 Do YOU want to assign nacies to the factors ? YES Enter the nane for factor A <<11 characters) 7 BAKINGTIME Enter the narie for factor B <<ii characters) 7 OVEN TEMP. Which factor(s) are in the whole plots ? B Which factor(s) are in the split plots ? A Is the above infor«ation correct ? YES No. of decimal places to print calculated values. 4 SPLIT PLOT ANALYSIS OF VARIANCE HICKS SPLIT PLOT ON COMPONENT LIFE TIME DESIGN NuMber of factors = 2 No. of levels of factor A = 3 No. of levels of factor B = 4 No. of Major replications (blocks) = 3 No. of Minor replications (saMples) = 1 Subfiles will be ignored Whole plot factor(s) are : Factor B Split-plot factor(s) are ■■ Factor A Response variable(s) are Variable no. 
1 LIFETIME

MEANS
* Overall Mean = 178.4722
* Block and Main Effect Means :
Factor Blocks -        Levels ( 1 - 3 ) = 187.4167  169.3333  178.6667
Factor A - BAKINGTIME  Levels ( 1 - 3 ) = 177.9167  183.5833  173.9167
Factor B - OVEN TEMP.  Levels ( 1 - 4 ) = 194.8889  148.6667  176.7778  193.5556
* Two Way Interaction Means
Factor A - BAKINGTIME down and Factor B - OVEN TEMP. across
            1           2           3           4
  1     189.0000    135.3333    185.3333    202.0000
  2     201.3333    151.0000    179.0000    203.0000
  3     194.3333    159.6667    166.0000    175.6667

ANOVA TABLE
Split Plot Analysis of Variance
Source (Name)      df   Sums of Squares   Mean Square   F Ratio   F-Prob
Total              35     29330.9722        838.0278
Blocks              2      1962.7222        981.3611      3.319    .1070
B  OVEN TEMP.       3     12494.3056       4164.7685     14.086    .0040
Error (a)           6      1773.9444        295.6574
A  BAKINGTIME       2       566.2222        283.1111       .456    .6418
BA                  6      2600.4444        433.4074       .698    .6551
Error (b)          16      9933.3333        620.8333
NOTE: F tests assume that all factors are fixed

Only factor B has a significant difference among effects.

Enter desired number: 1          Orthogonal polynomial comparisons
Is the design displayed on the CRT the latest one? YES

Orthogonal Polynomial Comparisons
Orthogonal polynomial comparisons on FACTOR 1 ? YES
Enter the max degree of orthogonal poly 2
Value associated with level # 1 of FACTOR 1 ? 5
Value associated with level # 2 of FACTOR 1 ? 10
Value associated with level # 3 of FACTOR 1 ? 15
Is the above information correct ? YES
Enter Error mean square, degrees of freedom 620.83,16          From AOV table

Orthogonal Polynomial Decomposition on BAKINGTIME
Degree        SS        F-Ratio     F-Prob
  1         96.0000      .1546      .69934
  2        470.2222      .7574      .39701
Level of Treatments : 5 10 15

Orthogonal poly comparisons on another FACTOR? YES
Orthogonal polynomial comparisons on FACTOR 1 ? NO
Orthogonal polynomial comparisons on FACTOR 2 ? YES
Enter the max degree of orthogonal poly 3
Value associated with level # 1 of FACTOR 2 ? 580
Value associated with level # 2 of FACTOR 2 ? 600
Value associated with level # 3 of FACTOR 2 ?
620 Value associated with level # 4 of FACTOR 2 ? 640 Is the above information correct ? YES Enter Error mean square, degrees of freedom 295.66,6 From AOV table Orthogonal Polynomial Decomposition on OVEN TEMP. Degree SS F-Ratio F-Prob i 261.6056 .8848 .38320 2 8930.2500 30.2045 .00152 3 3302.4500 11.1698 .01557 Level of Treatments = 580 600 620 640 Orthogonal poly comparisons on another FACTOR? NO Eiinter number of desired funtion: 6 Multiple comparisons Is the design displayed on the CRT the latest one? YES 281 ************************************************** Multiple CoMparisons ************************************* )K )|( # **** 1 |( ##)K * ) |< ## ^ # ) ( ( iK ^ ###)K ^^^ # ^ # ^^^ # ^^^ # ^^^^^ Enter i or 2 to specify type of Means i Which Factor/Main Effect(A or EDshould be used 1 B Error Mean Square, associated Degrees of Freedow 295.66,6 Which procedure would you like to use ? What level of Alpha are you going to use ? .05 for 4 Means, d.f.= 6 ? 4.9 Is a plot of HSD desired ? YES Plot on CRT ? NO Plotter indentifier string (press CONT if 'HPGL')? Enter select code, bus # (defaults are 7,5>? Which PEN color should be used? i Enter naMe for labelling Y axis << ii characters) LIFE TIME Beep will sound when plot done, then press CONT To interrupt plotting press 'STOP' key. Tukey's HSD 282 MULTIPLE COMPRRISON PLOT : TUKEY'S HSD HICKS SPLIT PLOT ON COMPONENT LIFE TIME u Ju 209.0 201.5 194.0 186.5 179.0 171.5 164.0 156.5 149.0 141.5 134.0 OVEN TEMP. LEVEL NUMBER Tukey's HSD Error Mean square = 295.66 Degrees of freedom = 6 HarMonic average sawple size = 9.0000 Alpha level = .05 Table value fron Studentized range = 4.9 HSD value = 28.0848 Multiple Cowparisons on Factor OVEN TEMP. Level Mean Sanple Size Separation 2 i48.6667 9 a 3 176.7778 9 b 4 193.5556 9 b i 194.8889 9 b 283 Another Separation Procedure on Factor 2 ? NO Another Factor to be used ? NO Multiple CoMparison Procedures on Two-Way Means ? 
NO Enter nurtber of desired funtion: & Multiple comparisons Is the design displayed on the CRT the latest one? YES Least significant difference **************************************************** Multiple Comparisons ******************************************************************************** Enter i or 2 to specify type of Means i Which Factor/Main Effect(A or EOshould be used ? B Error Mean Square, associated Degrees of Freedon 295.66,6 Which procedure would you like to use ? i What level of Alpha are you going to use ? .05 Enter table value fro« Student's t with d.f.= 6 ? 2.447 Is a plot of LSD desired ? YES Plot on CRT ? NO Plotter indentifier string <press CONT if 'HPGL')? Enter select code, bus # (defaults are 7,5)? Which PEN color should be used? i Enter nane for labelling Y axis << ii characters) LIFE TIMES Beep will sound when plot done, then press CONT To interrupt plotting press 'STOP' key. 284 MULTIPLE COMPARISON PLOT : LSD HICKS SPLIT PLOT ON COMPONENT LIFE TIME (n u r 205.8 198.3 191.6 184.9 178.2 171.5 164.8 158.1 151.4 144.7 138.0 OVEN TEMP. LEVEL NUMBER Least Significant Difference Error Mean square = 295.66 Degrees of freedow = 6 Harrtonic average sanple size = 9.0000 Alpha level = .05 Table value fron Student's t = 2.447 LSD value = 19.8346 Multiple Conparisons on Factor OVEN TEMP. vel 2 Mean 148.6667 Sanple Size 9 Separation a 3 176.7778 9 b 4 i 193.5556 194.8889 9 9 b b 285 Another Separation Procedure on Factor 2 V NO Another Factor to be used ? NO Multiple Comparison Procedures on Two~Way Means V NO Enter number of desired funtion* i Factorial design Number of factors in design ? (2, 3, or 4) 3 Number of levels of factor A ? 3 Number of levels of factor B ? 3 Number of levels of factor C ? 4 Number of blocks in this design ? i No. obs per trt combination in each block (sample) ? i Is the above information correct ? YES Do YOU want to assign nanes to the factors ? YES Enter the name for factor A (<ii characters) ? 
REP Enter the name for factor B (<li characters) ? BAKE TIME Enter the name for factor C (<ii characters) ? OVEN TEMP. No. of decimals for printing calc. values(<=7). 4 * FACTORIAL ANALYSIS OF VARIANCE * 4 4 4 ^' A' 4r W ^ 4 ^ At 4 W ^ ^ 4 ^ ^ W ^ ^ 4 4 4 4 4 ^ ^ 4 4 ^ ^lf W W 4 4 4 W W it W Jf W 4 W ifr ^t ^ W iL* W iV 4 W 4 W ilf it W 4 4 ^ ^ ^ W ^ 4f *At ^ 4 4r 4 W W ^ \fr W ^ 4 ^ HICKS SPLIT PLOT ON COMPONENT LIFE TIME DESIGN Number of factors = 3 No. of levels of factor A = 3 No. of levels of factor B = 3 No. of levels of factor C = 4 No. of major replications (blocks) => i No. of minor replications (samples) = i Subfiles will be ignored Response variable(s) sre ■■ Variable no. i LIFETIME MEANS * Overall mean = 178.4722 286 * Main Effect Means Factor A - REP Levels ( i - 3 ) : 187.4167 169.3333 178.6667 Factor B - BAKE TIME Levels ( i - 3 ) = 177.9167 183.5833 173.9167 Factor C - OVEN TEMP. Levels ( 1 - 4 ) ■ 194.8889 148.6667 176.7778 193.5556 # Two Way Interaction Means Factor A - REP 1 2 3 REP Factor A 1 2 3 Factor B 1 •> down and Factor B - BAKE TIME 1 2 206.7500 196.0000 168.7500 170.5000 158.2500 184.2500 OVEN TEMP. down and Factor C 1 2 208.3333 149.3333 194.6667 134.3333 181.6667 162.3333 acr oss 3 159.5000 168.7500 193.50 across 3 190 .0000 163.6667 176.6667 BAKE TIME down and Factor C - OVEN TEMP. acre 1 2 3 189.0000 135.3333 185.3333 201.3333 151.0000 179.0000 194.3333 159.6667 166.0000 Should the 3-way means be printed ? YES * Three Way Interaction Means 202.0000 184.6667 194.0 00 4 202.0000 203.0000 175.6667 Factor A - - REP, Level 1 Factor B - - BAKE TIME down 1 and Factor C - 2 OVEN TEMP. 3 acr oss 4 i 217.0000 158.0000 229. 0000 223. 0000 2 233.0000 138.0000 186. 0000 227. 0000 3 175.0000 152.0000 155. 0000 156. 0000 Factor A - REP, Level 2 Factor B - - BAKE TIME down 1 and Factor C - 2 OVEN TEMP. 3 across 4 1 188.0000 126.0000 160. 0000 201 . 0000 2 201.0000 130.0000 170. 0000 181. 0000 3 195.0000 147.0000 161. 0000 172. 
0000

Factor A - REP, Level 3
Factor B - BAKE TIME down and Factor C - OVEN TEMP. across
            1           2           3           4
  1     162.0000    122.0000    167.0000    182.0000
  2     170.0000    185.0000    181.0000    201.0000
  3     213.0000    180.0000    182.0000    199.0000

ANOVA TABLE
Factorial Analysis of Variance
Source (Name)      df   Sums of Squares   Mean Square
Total              35     29330.9722        838.0278
A  REP              2      1962.7222        981.3611
B  BAKE TIME        2       566.2222        283.1111
C  OVEN TEMP.       3     12494.3056       4164.7685
AB                  4      7021.2778       1755.3194
AC                  6      1773.9444        295.6574
BC                  6      2600.4444        433.4074
ABC                12      2912.0556        242.6713

We can see that the interaction between baking time and replication is significant.

Enter desired number: 7          Exit factorial design.
Enter number of desired function: 4          Return to BSDM.

One Way AOV Example

Tissue culture growth was studied after exposure to five 'sugar' treatments: control, 2% glucose, 2% fructose, 1% glucose plus 1% fructose, and 2% sucrose. The response, Y, is the length (in ocular units) of pea sections grown in tissue culture with auxin present.

The data were entered using One-Way AOV mode 2, in which all treatments are stored in one variable. Each treatment has ten observations (samples). Hence, observations 1 to 10 are in the first treatment, observations 11 to 20 are in the second treatment, etc.

The F ratio for treatments gives a very strong indication that the population treatment means are significantly different. Both the LSD and Duncan multiple comparison procedures separate the treatments into three non-overlapping groups: treatments 4, 3, and 2; treatment 5; and treatment 1 (control). Hence, if you add either glucose (2) or fructose (3) or both (4), you get shorter lengths than if you use just sucrose, which in turn gives shorter lengths than the control treatment.

* DATA MANIPULATION *
Enter DATA TYPE (Press CONTINUE for RAW DATA): 1          Raw data
Mode number = ? 2          On mass storage
Is data stored on program's scratch file (DATA)? NO
Data file name = ? TISSUE:INTERNAL
Was data stored by the BS&DM system ?
YES
Is data medium placed in device INTERNAL ? YES
Is program medium placed in correct device ? YES

TISSUE CULTURE GROWTH

Data file name: TISSUE:INTERNAL
Data type is: Raw data
Number of observations: 50
Number of variables: 1

Variable names:
  1. GROWTH

Subfile name      beginning observation   number of observations
  1. CONTROL               1                       10
  2. 2% GLUCOSE           11                       10
  3. 2% FRUCT.            21                       10
  4. 1%GLU+1FRU           31                       10
  5. 2%SUCROSE            41                       10

SELECT ANY KEY                    [Select special function key labeled LIST]
Option number = ? 1               [List all data]

TISSUE CULTURE GROWTH
Data type is: Raw data

VARIABLE # 1 (GROWTH)
   I    OBS(I)     OBS(I+1)   OBS(I+2)   OBS(I+3)   OBS(I+4)
   1   75.00000   67.00000   70.00000   75.00000   65.00000
   6   71.00000   67.00000   67.00000   76.00000   68.00000
  11   57.00000   58.00000   60.00000   59.00000   62.00000
  16   60.00000   60.00000   57.00000   59.00000   61.00000
  21   58.00000   61.00000   56.00000   58.00000   57.00000
  26   56.00000   61.00000   60.00000   57.00000   58.00000
  31   58.00000   59.00000   58.00000   61.00000   57.00000
  36   56.00000   58.00000   57.00000   57.00000   59.00000
  41   62.00000   66.00000   65.00000   63.00000   64.00000
  46   62.00000   65.00000   65.00000   62.00000   67.00000

Option number = ?                 [Exit list procedure]
SELECT ANY KEY                    [Select ADV STAT]
                                  [Remove BSDM media]
                                  [Insert AOV1 media]
Enter number of desired function: 1               [Select one way classification]
How many treatments in this analysis ? 5
Enter name for treatment/factor (<11 characters) TISSUE
Do you want to assign names to the treatments ? YES
Enter the name for treatment 1 (<11 characters) ? CONTROL
Enter the name for treatment 2 (<11 characters) ? 2% GLUCOSE
Enter the name for treatment 3 (<11 characters) ? 2% FRUCT.
Enter the name for treatment 4 (<11 characters) ? 1%GLU+FRU
Enter the name for treatment 5 (<11 characters) ? 2%SUCROSE
Are the names displayed on the CRT correct ? YES
Treatment definition mode = ? 2
Enter the number of observations in treatment 1 ? 10
Enter the number of observations in treatment 2 ? 10
Enter the number of observations in treatment 3 ? 10
Enter the number of observations in treatment 4 ? 10
Enter the number of observations in treatment 5 ? 10
Subfile # (enter to ignore subfile) = ?
Is the design description on the CRT correct ? YES

********************************************************************************
ONE-WAY ANALYSIS OF VARIANCE: TISSUE CULTURE GROWTH
********************************************************************************
# of decimals for printing calculated values (<=7)? 4

DESIGN
# of treatments = 5
# of observations in treatment 1 = 10
# of observations in treatment 2 = 10
# of observations in treatment 3 = 10
# of observations in treatment 4 = 10
# of observations in treatment 5 = 10
Response = GROWTH

SUMMARY STATISTICS
Treatment Statistics
Treatment name      Total       Mean     Stan. Dev     N
CONTROL           701.0000    70.1000     3.9847      10
2% GLUCOSE        593.0000    59.3000     1.6364      10
2% FRUCT.         582.0000    58.2000     1.8738      10
1%GLU+FRU         580.0000    58.0000     1.4142      10
2%SUCROSE         641.0000    64.1000     1.7920      10
Overall          3097.0000    61.9400     5.1958      50

ANOVA TABLE
One-Way Analysis of Variance Table
Source     Df        SS           MS        F-Ratio    F-Prob
Total      49    1322.8200
TISSUE      4    1077.3200    269.3300     49.3680    0.00000
Error      45     245.5000      5.4556

We can see that the effects of the population treatment levels are significantly different.

Bartlett's test of homogeneity of variance:
Chi-square value = 13.939 with degrees of freedom = 4

$\chi^2(4, .05) = 9.488$ and $\chi^2(4, .01) = 13.277$. Both are smaller than the calculated $\chi^2$ value of 13.9386, so we know that the variances are not homogeneous.

Do you wish to specify another subfile ? NO
Enter desired number: 3           [Multiple comparisons]
Is the design displayed on the CRT the latest one? YES

********************************************************************************
MULTIPLE COMPARISONS
********************************************************************************
Which procedure would you like to use ? 1         [Least significant difference]
What level of Alpha are you going to use ? .05
Enter table value from Student's t with d.f. = 45 ? 2.014
Is a plot of LSD desired ? YES
Plot on CRT ? NO
Plotter identifier string (press CONT if 'HPGL')?
Plotter select code, bus # (defaults are 7,5)?
Beep will sound when plot done, then press CONT.
Which PEN color should be used? 1
Enter name for labelling Y axis (<11 characters) LENGTH
To interrupt plotting, press 'STOP' key.

[Plot: MULTIPLE COMPARISON PLOT: LSD, TISSUE CULTURE GROWTH. Y axis: LENGTH, 56.0 to 72.0. X axis: TISSUE LEVEL NUMBER, 1 to 5.]

Least Significant Difference
Error mean square = 5.4556
Degrees of freedom = 45
Harmonic average sample size = 10.0000
Alpha level = .05
Table value from Student's t = 2.014
LSD value = 2.1037

Multiple Comparisons on TISSUE
Level     Mean      Sample Size   Separation
  4      58.0000        10            a
  3      58.2000        10            a
  2      59.3000        10            a
  5      64.1000        10            b
  1      70.1000        10            c

This separates the treatments into three non-overlapping groups: treatments 4, 3, and 2 in one group, 5 in another, and 1 in the last.

Another Separation Procedure on TISSUE ? YES      [Select Duncan's Test]
Which procedure would you like to use ? 3
What level of Alpha are you going to use ? .05

Duncan's Test
Error mean square = 5.4556
Degrees of freedom = 45
Harmonic average sample size = 10.0000
Alpha level = .05

Table value for 5 means and d.f.= 45 ? 3.16
           for 4 means and d.f.= 45 ? 3.095
           for 3 means and d.f.= 45 ? 3.005
           for 2 means and d.f.= 45 ? 2.85

Means Separated   Table Value   Required Difference
       5            3.1600           2.3340
       4            3.0950           2.2860
       3            3.0050           2.2195
       2            2.8500           2.1051

Multiple Comparisons on TISSUE
Level     Mean      Sample Size   Separation
  4      58.0000        10            a
  3      58.2000        10            a
  2      59.3000        10            a
  5      64.1000        10            b
  1      70.1000        10            c

Same conclusion as above.

Another Separation Procedure on TISSUE ?
NO
NOTE: HARMONIC AVER SAMPLE SIZE OF 10 USED IN CALCULATING THE MULTIPLE COMPARISONS.
Enter number of desired function: 9               [Return to BSDM]

Two Way (Unbalanced) Example

The following data from Bancroft (1968, Ex. 1.3) is a two-way classification with factor A representing five different batches of silver and factor B representing two batches of iodine which are used to make silver iodide. The response, Y, is the reacting weights (coded). Apparently several samples were lost because the design is unbalanced.

            Iodine 1        Iodine 2
Silver 1    22, 25          -1, 40, 18
Silver 2    41, 41          23, 13
Silver 3    29, 20, 37      (none)
Silver 4    49, 50          61
Silver 5    55              (none)

The data is entered using two variables. Variable one is used to identify the rows and columns and variable two contains the response, Y. Hence, a value in variable one of 0301 indicates that the observation in variable two is from the third level of silver (A) and the first level of iodine (B).

The Two-Way Unbalanced routine is used, with the method of fitting constants selected as the desired procedure because of the presence of empty cells. This analysis indicates that the sampled batches of silver do not support the hypothesis of equality for the population means.

The multiple comparison procedure by Student, Newman & Keuls (SNK) shows no separation between the five samples of silver. This can probably be explained by both the conservative nature of the SNK procedure and the fact that the AOV procedure uses an adjusted mean square for silver.

********************************************************************************
* DATA MANIPULATION *
********************************************************************************
Enter DATA TYPE (Press CONTINUE for RAW DATA): 1  [Raw data]
Mode number = ?                                   [On mass storage]
Is data stored on program's scratch file (DATA)? NO
Data file name = ? SLVRIODN:INTERNAL
Was data stored by the BS&DM system ? YES
Is data medium placed in device INTERNAL ? YES
Is program medium placed in correct device ? YES

CODED REACTIN WEIGHTS OF SLIVER IODINE

Data file name: SLVRIODN:INTERNAL
Data type is: Raw data
Number of observations: 16
Number of variables: 2

Variable names:
  1. ROW,COLUMN
  2. RWEIGHT

Subfiles: NONE

SELECT ANY KEY                    [Select special function key labeled LIST]
Option number = ? 1
Enter method for listing data: 3

CODED REACTIN WEIGHTS OF SLIVER IODINE
Data type is: Raw data

         Variable # 1    Variable # 2
OBS#     (ROW,COLUMN)    (RWEIGHT)
  1       101.00000       22.00000
  2       101.00000       25.00000
  3       201.00000       41.00000
  4       201.00000       41.00000
  5       301.00000       29.00000
  6       301.00000       20.00000
  7       301.00000       37.00000
  8       401.00000       49.00000
  9       401.00000       50.00000
 10       501.00000       55.00000
 11       102.00000       -1.00000
 12       102.00000       40.00000
 13       102.00000       18.00000
 14       202.00000       23.00000
 15       202.00000       13.00000
 16       402.00000       61.00000

Option number = ?                 [Exit list routine]
SELECT ANY KEY                    [Select special function key labeled ADV STAT]
                                  [Remove BSDM media]
                                  [Insert AOV1 media]
Enter number of desired function: 2               [Two-way unbalanced design]
Data storage type = 2
Variable number for packed identification = 1
Enter # of rows, # of columns (separate by comma) 5,2
Do you wish to label the row and column factors ? YES
Enter name of row factor (<11 characters) SILVER
Enter name of column factor (<11 characters) IODINE
Enter the variable number for response 2
Is the above information correct ? YES

********************************************************************************
TWO-WAY UNBALANCED ANALYSIS OF VARIANCE: CODED REACTIN WEIGHTS OF SLIVER IODINE
********************************************************************************
# of decimal places for calculated values (<=7)? 4

DESIGN
# of rows = 5
# of columns = 2
Response = RWEIGHT

SUMMARY STATISTICS
Subclass Statistics
Row   Column     Total       Mean     Stan. Dev    N
 1      1       47.0000    23.5000     2.1213      2
 1      2       57.0000    19.0000    20.5183      3
 2      1       82.0000    41.0000      .0000      2
 2      2       36.0000    18.0000     7.0711      2
 3      1       86.0000    28.6667     8.5049      3
 4      1       99.0000    49.5000      .7071      2
 4      2       61.0000    61.0000     0.0000      1
 5      1       55.0000    55.0000     0.0000      1

Row Statistics
Row      Total       Mean      N
 1     104.0000    20.8000     5
 2     118.0000    29.5000     4
 3      86.0000    28.6667     3
 4     160.0000    53.3333     3
 5      55.0000    55.0000     1

Column Statistics
Col      Total       Mean      N
 1     369.0000    36.9000    10
 2     154.0000    25.6667     6

ANOVA TABLE
Preliminary AOV ( Test two way model )
Source       Df        SS           MS        F-Ratio    F-Prob
Total        15    4255.4375
Subclass      7    3213.7708    459.1101      3.5260    .04908
Error         8    1041.6667    130.2083

Preliminary AOV ( Test for Interaction )
Source       Df        SS           MS        F-Ratio    F-Prob
Total        15    4255.4375
Main          5    2722.2592    544.4518      4.1814    .03641
Int           2     491.5116    245.7558      1.8874    .21308
Error         8    1041.6667    130.2083

********************************************************************************
Analysis of Variance ( Method of Fitting Constants )
Source           Df        SS           MS        F-Ratio    F-Prob
Total            15    4255.4375
SILVER            4    2572.3042    643.0760
IODINE (Adj)      1     149.9550    149.9550      1.1517    .31450
IODINE            1     473.2042    473.2042
SILVER (Adj)      4    2249.0550    562.2638      4.3182    .03749
Int               2     491.5116    245.7558
Error             8    1041.6667    130.2083
********************************************************************************

Enter desired number: 3           [Multiple comparisons]
Is the design displayed on the CRT the latest one? YES

********************************************************************************
MULTIPLE COMPARISONS
********************************************************************************
Enter 1 or 2 to specify type of means 1
Which Factor/Main Effect (A or B) should be used ? A
Which procedure would you like to use ? 4         [Student-Newman-Keuls]
What level of Alpha are you going to use ? .05

Student-Newman-Keuls Test
Error mean square = 130.2083
Degrees of freedom = 8
Harmonic average sample size = 2.3622
Alpha level = .05

Table value for 5 means and d.f.= 8 ? 4.89
           for 4 means and d.f.= 8 ? 4.53
           for 3 means and d.f.= 8 ? 4.04
           for 2 means and d.f.= 8 ? 3.26

Means Separated   Table Value   Required Difference
       5            4.8900          36.3053
       4            4.5300          33.6325
       3            4.0400          29.9945
       2            3.2600          24.2035

Multiple Comparisons on SILVER
Level     Mean      Sample Size   Separation
  1      20.8000         5            a
  3      28.6667         3            a
  2      29.5000         4            a
  4      53.3333         3            a
  5      55.0000         1            a

Another Separation Procedure on SILVER ? NO
Another Factor to be used ? NO
Multiple Comparison Procedures on Two-Way Means ? NO
NOTE: HARMONIC AVER SAMPLE SIZE OF 2.36220472441 USED IN CALCULATING THE MULTIPLE COMPARISONS.
Enter number of desired function:                 [Return to BSDM]

One Way Analysis of Covariance Example

An experiment to evaluate the effects of various growth stimulants (X-4, BC, F32, and OX) on tomato seedlings was performed in which:

X = Initial length of seedling (mm)
Y = Growth in length (mm) during experiment

Stimulant X-4    Stimulant BC    Stimulant F32    Stimulant OX
   X     Y          X     Y          X     Y          X     Y
  29    22         15    30         16    12          5    23
  20    22          9    32         31     8         25    31
  14    20          1    26         26    13         16    28
  21    24          6    25         35    25         10    26
   6    12         19    37         12     7         24    33

The data was entered using the first mode of storage for the covariance program. That is, each X,Y pair was stored in two variables and each of the four treatments used different variable pairs. Hence, for the stimulant X-4, the initial length, X, was stored in Variable 1 and the growth, Y, was stored in Variable 2; while for the stimulant OX, the X value was stored in Variable 7 and the Y in Variable 8. Each variable has five observations.

The first part of the output from the One-Way Covariance routines shows the within-treatment statistics, including totals, means, standard deviations, sample sizes, correlation coefficients, and regression coefficients. Note that the overall correlation coefficient and regression coefficient are for all of the data points taken together without regard to treatment group. Hence, it should not be surprising that no overall relationship exists between the X and Y variables.
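The contrast between the within-treatment and overall coefficients can be checked by hand from the table above. The following Python sketch is only an illustration, not the library's own routine: the names `DATA`, `sums`, and `slope_and_corr` are invented here, and it assumes that the pooled slope is the sum of the within-treatment cross-products divided by the sum of the within-treatment X sums of squares, and that each adjusted Y mean is the treatment Y mean minus the pooled slope times the deviation of the treatment X mean from the overall X mean.

```python
import math

# (X, Y) = (initial length, growth) for each stimulant, copied from the table.
DATA = {
    "X-4": [(29, 22), (20, 22), (14, 20), (21, 24), (6, 12)],
    "BC":  [(15, 30), (9, 32), (1, 26), (6, 25), (19, 37)],
    "F32": [(16, 12), (31, 8), (26, 13), (35, 25), (12, 7)],
    "OX":  [(5, 23), (25, 31), (16, 28), (10, 26), (24, 33)],
}


def sums(pairs):
    """Return the corrected sums of squares and cross-products (Sxx, Syy, Sxy)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxx, syy, sxy


def slope_and_corr(pairs):
    """Regression coefficient of Y on X and the correlation coefficient."""
    sxx, syy, sxy = sums(pairs)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Within-treatment coefficients, one (slope, correlation) pair per stimulant.
within = {name: slope_and_corr(pairs) for name, pairs in DATA.items()}

# Overall coefficients: all 20 points taken together, ignoring treatment.
all_pairs = [pair for pairs in DATA.values() for pair in pairs]
overall_slope, overall_corr = slope_and_corr(all_pairs)

# Pooled within-treatment slope (assumed definition, see the lead-in).
pooled_slope = (sum(sums(p)[2] for p in DATA.values())
                / sum(sums(p)[0] for p in DATA.values()))

# Covariance-adjusted Y means (assumed definition, see the lead-in).
grand_mean_x = sum(x for x, _ in all_pairs) / len(all_pairs)
adjusted = {}
for name, pairs in DATA.items():
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    adjusted[name] = mean_y - pooled_slope * (mean_x - grand_mean_x)

for name, (slope, corr) in within.items():
    print(f"{name:4s} slope={slope:.4f} corr={corr:.4f} adj. mean={adjusted[name]:.4f}")
print(f"overall slope={overall_slope:.4f} corr={overall_corr:.4f}")
print(f"pooled slope={pooled_slope:.6f}")
```

Within each treatment the slopes are all positive and of similar size, while the between-treatment differences in the means pull the overall slope and correlation close to zero, which is why the overall coefficients say little about the X and Y relationship.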
The test for homogeneity of regression coefficients confirms that we can accept the hypothesis that all treatment regression coefficients are essentially the same. The test for significance of pooled regression confirms that the relationship between X and Y pooled across all treatments is significant (level = .0003).

Whereas the F ratio for treatment differences on the X's is non-significant (level = .12117), the F ratio on the original Y's is significant at the .00037 level. The analysis of covariance adjustment to the original data does not change the significance of the treatment effect ($\alpha$ = .00000), but rather makes the difference in the means even more pronounced. This is shown by studying the "Table of Means" and noting the adjustment made in the original Y means after the use of the covariate X.

The use of the Tukey HSD multiple comparison procedure shows that stimulants one and three differ from all other stimulants, while no significant difference can be shown between two and four.

********************************************************************************
* DATA MANIPULATION *
********************************************************************************
Enter DATA TYPE (Press CONTINUE for RAW DATA): 1  [Raw data]
Mode number = ?                                   [On mass storage]
Is data stored on program's scratch file (DATA)? NO
Data file name = ? TOMATO:INTERNAL
Was data stored by the BS&DM system ? YES
Is data medium placed in device INTERNAL ? YES
Is program medium placed in correct device ? YES

EFFECTS OF GROWTH STIMULANTS ON TOMATO SEEDLING LENGTHS

Data file name: TOMATO:INTERNAL
Data type is: Raw data
Number of observations: 5
Number of variables: 8

Variable names:
  1. X-4:I
  2. X-4:G
  3. BC:I
  4. BC:G
  5. F32:I
  6. F32:G
  7. OX:I
  8. OX:G

Subfiles: NONE

SELECT ANY KEY                    [Select LIST key]
Option number = ? 1
Enter method for listing data: 3

EFFECTS OF GROWTH STIMULANTS ON TOMATO SEEDLING LENGTHS
Data type is: Raw data

        Variable # 1   Variable # 2   Variable # 3   Variable # 4   Variable # 5
OBS#     (X-4:I)        (X-4:G)        (BC:I)         (BC:G)         (F32:I)
  1      29.00000       22.00000       15.00000       30.00000       16.00000
  2      20.00000       22.00000        9.00000       32.00000       31.00000
  3      14.00000       20.00000        1.00000       26.00000       26.00000
  4      21.00000       24.00000        6.00000       25.00000       35.00000
  5       6.00000       12.00000       19.00000       37.00000       12.00000

        Variable # 6   Variable # 7   Variable # 8
OBS#     (F32:G)        (OX:I)         (OX:G)
  1      12.00000        5.00000       23.00000
  2       8.00000       25.00000       31.00000
  3      13.00000       16.00000       28.00000
  4      25.00000       10.00000       26.00000
  5       7.00000       24.00000       33.00000

Option number = ?                 [Exit list procedure]
SELECT ANY KEY                    [Select ADV STAT key]
                                  [Remove BSDM media]
                                  [Insert AOV1 media]
Enter number of desired function: 3               [One way analysis of covariance]
How many treatments in this analysis ? 4
Enter a name for treatment/factor (<11 characters) TREATMENT
Do you want to assign names to the treatments ? YES
Enter the name for trt. 1 (<=10 characters) ? X-4
Enter the name for trt. 2 (<=10 characters) ? BC
Enter the name for trt. 3 (<=10 characters) ? F32
Enter the name for trt. 4 (<=10 characters) ? OX
Are the names displayed on the CRT correct ? YES
Treatment definition mode = ? 1
Enter the X var., Y var. for treatment 1 ? 1,2
Enter the X var., Y var. for treatment 2 ? 3,4
Enter the X var., Y var. for treatment 3 ? 5,6
Enter the X var., Y var. for treatment 4 ? 7,8
Is the design description on the CRT correct ? YES

********************************************************************************
ONE-WAY ANALYSIS OF COVARIANCE
EFFECTS OF GROWTH STIMULANTS ON TOMATO SEEDLING LENGTHS
********************************************************************************
# of decimal places for calculated values (<=7) ? 4

DESIGN
# of treatments = 4
# of observations in treatment 1 = 5
# of observations in treatment 2 = 5
# of observations in treatment 3 = 5
# of observations in treatment 4 = 5
Covariate X = X-4:I
Response Y = X-4:G

SUMMARY STATISTICS
Treatment Statistics
Treatment         Total       Mean     Stan. Dev    N
X-4        X     90.0000    18.0000     8.5732      5
           Y    100.0000    20.0000     4.6904      5
BC         X     50.0000    10.0000     7.1414      5
           Y    150.0000    30.0000     4.8477      5
F32        X    120.0000    24.0000     9.7724      5
           Y     65.0000    13.0000     7.1764      5
OX         X     80.0000    16.0000     8.6891      5
           Y    141.0000    28.2000     3.9623      5
Overall    X    340.0000    17.0000     9.4088
           Y    456.0000    22.8000     8.5076

Within Treatment Regressions
Treatment    Corr. Coef.    Regression Coef.
X-4             .8331            .4558
BC              .8449            .5735
F32             .6310            .4634
OX              .9730            .4437
Overall        -.0487           -.0440

ANOVA TABLE
One-Way Analysis of Variance Table (X-Variable)
Source       Df        SS           MS        F-Ratio    F-Prob
Total        19    1682.0000
Treatment     3     500.0000    166.6667      2.2561    .12117
Error        16    1182.0000     73.8750

One-Way Analysis of Variance Table (Y-Variable)
Source       Df        SS           MS        F-Ratio    F-Prob
Total        19    1375.2000
Treatment     3     924.4000    308.1333     10.9364    .00037
Error        16     450.8000     28.1750

We can see that the treatments do not differ significantly on the X variable, but do differ significantly on the Y variable.

Test of homogeneity of regression coefficients:
F-value = .0538 with 3 and 12 degrees of freedom
P(F > .05) = .98277

We consider all treatment regression coefficients to be the same.

Test of significance of pooled regression coefficient:
F-value = 21.8324 with 1 and 15 degrees of freedom
P(F > 21.83) = .00030

We can see that the relationship between X and Y pooled across all treatments is significant.

Pooled Regression Coefficient = .475465313029
Pooled Correlation Coefficient = .7699

********************************************************************************
One Way Analysis of Covariance Table
Source       Df        SS           MS        F-Ratio    F-Prob
Total        18    1371.9444
Treatment     3    1188.3559    396.1186     32.3647    0.00000
Error        15     183.5885     12.2392
********************************************************************************

We can see that the effects of treatments are significantly different.

Table of Y Means
Treatment name   Unadjusted Y Mean   Adjusted Y Mean   Stand. Dev    N
X-4                  20.0000            19.5245          1.5646      5
BC                   30.0000            33.3283          1.5646      5
F32                  13.0000             9.6717          1.5646      5
OX                   28.2000            28.6755          1.5646      5
********************************************************************************

Do you want to change response for this subfile? NO
Enter desired number: 3           [Multiple comparisons]
Is the design displayed on the CRT the latest one? YES

********************************************************************************
MULTIPLE COMPARISONS
********************************************************************************
Which procedure would you like to use ? 2         [Tukey's HSD]
What level of Alpha are you going to use ? .05
Table value for 4 means and d.f.= 15 ? 4.08
Is a plot of HSD desired ? YES
Plot on CRT ? NO
Plotter identifier string (press CONT if 'HPGL')?
Plotter select code, bus # (defaults are 7,5)?
Beep will sound when plot done, then press CONT.
Which PEN color should be used? 1
Enter name for labelling Y axis (<11 characters) GROWTH
To interrupt plotting, press 'STOP' key.

[Plot: MULTIPLE COMPARISON PLOT: TUKEY'S HSD, EFFECTS OF GROWTH STIMULANTS ON TOMATO SEEDLING LEN. Y axis: GROWTH, 6.0 to 37.0. X axis: LEVEL NUMBER.]

Tukey's HSD
Error mean square = 12.2392
Degrees of freedom = 15
Harmonic average sample size = 5.0000
Alpha level = .05
Table value from Studentized range = 4.08
HSD value = 6.3834

Level 3 differs from Level 1, which differs from Levels 4 and 2.

Multiple Comparisons on TREATMENT
Level     Mean      Sample Size   Separation
  3       9.6717         5            a
  1      19.5245         5            b
  4      28.6755         5            c
  2      33.3283         5            c

Another Separation Procedure on TREATMENT ? NO
NOTE: HARMONIC AVER SAMPLE SIZE OF 5 USED IN CALCULATING THE MULTIPLE COMPARISONS.
Enter number of desired function:                 [Return to BSDM]

Principal Components and Factor Analysis

General Information

Description

The Principal Components and Factor Analysis Software provides a variety of factor-analytic techniques. Input may be raw data, a correlation matrix, a covariance matrix, or a factor matrix. Factors are extracted from the correlation matrix. You may choose either the principal axes method or the maximum likelihood method to extract the initial factors. Orthogonal varimax or quartimax rotations and/or oblique oblimin rotations may be applied to the factor matrix. In the oblique rotation, you can control the degree of correlation among factors. Graphical presentation of the relationship between pairs of initial or rotated factors is also available.

If raw data has been input, the program computes the case scores and provides a plot of the case scores for each pair of factors. Case scores may be stored on a new file for further study.

For a brief discussion of the techniques and computing formulas used in these programs, see the Discussion Section.

Setting Up the Data

The first thing you need to do is to enter the data by using the Basic Statistics and Data Manipulation (BSDM) routines. The input may be the raw data, a correlation matrix, a covariance matrix, or a factor matrix.
If a correlation matrix or a covariance matrix is to be entered, only the distinct elements will be requested, i.e., only the portion on and above the main diagonal. After the data has been loaded into memory, you are ready to use the Principal Components and Factor Analysis programs.

Special Considerations

Factor or Principal Component Scores

If an observation has one or more missing values, the score for that observation will not be calculated and a blank line will be printed.

Storing the Correlation Matrix

If you want to continue the analysis at another time, you may store the correlation matrix. Note that the correlation matrix can later be input as data in BSDM.

Principal Components

Object of Program

A principal components analysis for a correlation matrix may be performed by selecting this option. Principal components will be printed. A table of eigenvalues is then printed. This includes the eigenvectors as well as the proportion and the cumulative proportion of the total variance accounted for by each component.

If raw data has been input, case scores on the components may be computed and stored. If a missing value is encountered in the calculation of component scores, the program will ignore that particular observation. Case scores are calculated for all observations in the data set even if the principal components were developed for only one subfile.

Special Considerations

Component Output Options

Four output options are available and are described on the CRT display. Each option tells the program how to determine how many components should be output. When using the minimum eigenvalue size option, many researchers choose a value of 1.00; when using the maximum cumulative percent option, some researchers use about 90 percent. The calculations, however, will be done for all principal components, i.e., one for each variable which has been included in the analysis.
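As a rough illustration of two of these rules, the sketch below counts how many components each rule would keep. It is not the program's own logic: the function names are invented, only the two options named above are covered, and the cumulative-percent rule is assumed to mean "the smallest number of components whose cumulative percent of total variance reaches the target".

```python
def count_by_min_eigenvalue(eigenvalues, minimum=1.0):
    """Number of components whose eigenvalue is at least `minimum`."""
    return sum(1 for value in eigenvalues if value >= minimum)


def count_by_cumulative_percent(eigenvalues, target_percent=90.0):
    """Smallest number of leading components reaching `target_percent`.

    Assumption for this sketch: components are taken in order of decreasing
    eigenvalue until the cumulative percent of total variance reaches the target.
    """
    total = sum(eigenvalues)
    running = 0.0
    ordered = sorted(eigenvalues, reverse=True)
    for count, value in enumerate(ordered, start=1):
        running += value
        if 100.0 * running / total >= target_percent:
            return count
    return len(ordered)


# Hypothetical eigenvalues of a 5-variable correlation matrix (they sum to 5).
example = [2.4, 1.2, 0.7, 0.4, 0.3]
print(count_by_min_eigenvalue(example, 1.0))
print(count_by_cumulative_percent(example, 90.0))
```

With these made-up eigenvalues the two rules disagree (two components against four), which is one reason the choice of output option is left to the user.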
The number of components which results from your selected option will be used to determine the number printed later on in this routine.

Plots

For both the principal components plot and the component scores plot, you may select component numbers up to and including the number of variables you originally specified for the present analysis. Of course, if you originally had twenty variables, a plot of the 19th or 20th components may not be very useful.

Storing Principal Components Scores

The component scores are calculated and stored in the data matrix for all components which you specify. Component scores are generated for all observations in the data set across all subfiles. This feature may be useful for cross validation of the components between subfiles.

Factor Analysis

Object of Program

The extraction and rotation of the initial factors may be performed by selecting this option. Factors are extracted from a correlation matrix by the principal axes method or by the maximum likelihood method. If the principal axes method is used, three types of initial communality estimates may be used as diagonal elements of the correlation matrix: squared multiple correlations, maximum absolute row correlations, or user-specified values. For the principal axes method, you determine the number of factors to be extracted from the original matrix. (You can either specify the number of factors to be extracted or specify the minimum eigenvalue bound.) The maximum likelihood method provides a statistical basis for judging the adequacy of a model with a specified number of factors.

The unrotated factors do not generally represent useful scientific factor constructs and hence it is usually necessary to rotate. Orthogonal quartimax or varimax rotations and/or oblique rotations may be performed on a factor matrix. After rotation, a table of the variance extracted by each factor is printed along with the new factor loading matrix.
The program can graphically represent the original variables in terms of their factor loadings in a space that corresponds to the common factors. Thus, using pairs of axes, one obtains p points (where p is the number of variables) whose coordinates are factor loadings with respect to pairs of the common factors (before and after rotations).

If the raw data has been input, factor scores for each factor may be computed and stored after each rotation. These factor scores can be plotted in pairs.

Special Considerations

Factor Extraction Methods

For more information on the comparisons between the principal axes and maximum likelihood methods of factor extraction, see references 1, 2 and 3.

Principal Axes Method

a. The maximum number of factors must be less than p, the number of variables in the analysis, and must also be less than 15.
b. In choosing the minimum eigenvalue size for inclusion of a factor, some analysts use a value around 1.00. Keep in mind that if the variables were uncorrelated, each eigenvalue would be 1.00 with the sum (total variance) equal to p.
c. The maximum number of iterations is set by default at 25. Some analysts believe that this number should be very small, say one or two.
d. The total variance is, by convention, p, the number of variables in the analysis.

Maximum Likelihood Method (MLM)

a. If p is the number of variables in the analysis, then the maximum number of factors (m) which can be extracted by the MLM cannot exceed the largest integer satisfying $m \le \tfrac{1}{2}\left[(2p+1) - (8p+1)^{1/2}\right]$. This quantity is calculated in the program and displayed as the maximum number of factors that you may extract. See reference 11 for a more detailed discussion.
b. This method may be very time consuming. If you have a large number of variables, we suggest that you consider using the principal axes method instead.
c. This method may not converge at all. If this seems to be the case (i.e., the number of iterations and/or "tries" within an iteration is excessive), the program will allow you to stop and change to the principal axes method.
d. The chi-square statistic, and hence the accuracy of the probability value, depends on the number of observations being quite large. If your sample size is small, you should interpret the chi-square values as only an approximation to the adequacy of the model. Some authors suggest that you should specify a fairly large value for alpha in the goodness-of-fit test, especially when the sample size is small.

Rotations

Oblique rotation schemes available in this set of programs consist of solutions generated under the oblimin criterion. A whole class of rotations may be performed, as the oblimin solution is indexed by a constant ranging between 0 and 1. The most important and generally applicable special case is bi-quartimin, which corresponds to an index value of .5. Other important special cases are quartimin (index = 0) and covarimin (index = 1.0). A thorough discussion of these methods is given in (3). Kaiser normalization will be used automatically in the program.

Output at each rotation stage consists of both primary factors and reference factors. These two types of factors are related by transformation, though they are subject to different interpretations. In fact, columns of the primary factor matrix are simply multiples of the corresponding columns in the reference factor matrix. It should be noted, that since they are the elements of the primary factors (as in the orthogonal case), these elements may be larger than 1.00. It is the primary factors which are used in factor score calculations. The distinction between the aforementioned concepts is well explained in (2) and (3).

Select New Variables

After completing an analysis on certain variables and subfiles, you may wish to select other variables and/or subfiles for further analyses.
You may specify the variables and subfiles you wish to investigate by choosing this option. When you decide to select new variables, the program will go back to the beginning of the PC and FA procedures. When entering the variable numbers, you may enter the numbers separated by commas, or by dashes when denoting consecutive variables, i.e., 1, 3, 6, 8-11 for variables 1, 3, 6, 8, 9, 10, 11.

Discussion

The purpose of this section is to reacquaint you with some of the fundamentals of principal components and factor analysis. Of course, it will not be possible to cover in this section all of the material needed to understand every aspect of principal components and factor analysis. Several of the references have very good discussions of the basics of factor analysis and how it can be used. In particular, Sections 1.1, 1.2, and 1.3 of reference 11 have a very good discussion of the basics of factor analysis. In addition, reference 9 has some good material in Chapters 1, 3, 4 and 5. The other references also have some useful material.

The basic idea of the multivariate statistical methods which fall into the category labeled Factor Analysis is to examine a matrix expressing the dependence structure of the response variables and to determine certain factors which have generated the dependence in these responses.

We measure p variables on n individuals. These p variables frequently are interrelated, that is, they are not independent of one another. The objective of factor analysis and principal components is to find certain hidden, or latent, factors which are fewer in number than the original p variables. Ideally, the observable variables may be represented as functions of the latent factors in such a way that the original dependence structure among the responses will be generated by the new system, to some degree of accuracy. Hopefully, the number of latent variables or factors will be considerably less than p, the original number of variables.
In simplest terms, the responses may be thought of as linear combinations of the latent factors, and the goal of factor analysis is to estimate the coefficients of these linear combinations. If we are fortunate, the coefficients of the latent factors, sometimes called factor loadings, will have some meaningful interpretation in terms of the original p variables. We would hope that the number of factors, or latent variables, would be considerably less than p. Ideally, two or three primary latent variables can be used in interpreting the results of the experiment. They are essentially new response variables that we can use in evaluating the results of the experiment.

This program performs a principal component analysis and factor analysis on a correlation matrix. Given the response variables $X_1, X_2, \ldots, X_p$, the technique of principal components tries to find the coefficients, say, $A_{11}, A_{21}, \ldots, A_{p1}$ such that the linear combination

$$Y_1 = A_{11}X_1 + A_{21}X_2 + \cdots + A_{p1}X_p$$

"explains" the greatest proportion of the total response variance. Having found the desired set of values, we then seek new coefficients, say, $A_{12}, A_{22}, \ldots, A_{p2}$ such that the linear combination

$$Y_2 = A_{12}X_1 + A_{22}X_2 + \cdots + A_{p2}X_p$$

is uncorrelated with $Y_1$ and so that $Y_2$ explains the largest portion of the response variance remaining after $Y_1$ has been removed. In principal component analysis, we proceed in this manner until we have obtained $Y_1, \ldots, Y_p$. Since the Y's are chosen to be uncorrelated, their total response variance will be the same as that of the original $X_1, \ldots, X_p$. These linear combinations of the X's are called principal components, $Y_1$ being the first principal component, $Y_2$ being the second principal component, etc. In fact, the coefficients $A_{1j}, A_{2j}, \ldots, A_{pj}$ of the jth principal component are the elements of the eigenvector of the sample correlation matrix R corresponding to the jth largest eigenvalue $l_j$. The importance of the jth component is measured by $l_j/p$.
Then, if a large proportion, say 80%, of the total response variance for the X's is accounted for by a few of the Y's, we will have obtained a smaller description of the initial dependence structure. This is the main object of principal component and factor analyses: reduction of dimensionality. The program computes the principal components, eigenvalues, proportion of the total variance, and cumulative proportion of the total variance accounted for by each component.

For a study of the dependence structure, factor analysis is another technique for explaining the covariance of the responses. Principal components is simply a transformation of the responses. Factor analysis proposes a model for the responses which may be written as

$$X_1 = \lambda_{11}Y_1 + \lambda_{12}Y_2 + \cdots + \lambda_{1m}Y_m + e_1$$
$$\vdots$$
$$X_p = \lambda_{p1}Y_1 + \lambda_{p2}Y_2 + \cdots + \lambda_{pm}Y_m + e_p$$

where $Y_j$ is called the jth common factor variable, $\lambda_{ij}$ is a coefficient reflecting the importance of the jth factor for the ith response variable, and $e_i$ is called a specific factor variable. Under this model, each response variable $X_i$ is expressed as a linear combination of a few common factor variables $Y_1, \ldots, Y_m$. Let $F = (\lambda_{ij})$; then F is the so-called factor loading matrix, the quantity

$$h_i^2 = \sum_{j=1}^{m} \lambda_{ij}^2$$

is called the communality of the ith variable, and the variance of $e_i$ is called the unique variance of the ith variable. If we replace the diagonal elements of the sample correlation matrix R with communalities and denote the result by $R^*$, then

$$R^* = FF'$$

This equation has been called "the fundamental factor theorem".

You can choose either the principal axes method or the maximum likelihood method to extract the initial factors. A brief comparison between these two methods can be found in reference 2.

Factors which are not rotated do not generally represent useful scientific factor constructs and hence it is usually necessary to rotate.
The desire for correlated (oblique) factors or uncorrelated (orthogonal) factors leads to either an oblique rotation or orthogonal rotation of the initial factor solution.

The program computes the case scores for either principal components or factors if the raw data has been input. For detailed information on the calculation and the interpretation of case scores, see Chapter 16 of reference 3. The program also provides a graphical presentation of the initial and rotated factors.

Methods and Formulae

Correlation Matrix:

Raw Data Input: Let the input consist of N cases with p variates per case and let $X = (x_{ij})$, $i = 1, \ldots, N$; $j = 1, \ldots, p$, denote the data input matrix. The covariance matrix $S = (s_{ij})$ is computed from

$$(N - 1)S = \sum_{i=1}^{N} x_i x_i' - N\bar{x}\bar{x}'$$

where $x_i' = (x_{i1}, \ldots, x_{ip})$, $i = 1, \ldots, N$, and $\bar{x}$ is the vector of variable means. The correlation matrix, which is used for the principal components analysis and/or factor analysis, is then given by $R = (r_{ij})$ where

$$r_{ij} = s_{ij}/(s_{ii}s_{jj})^{1/2}$$

Covariance or Correlation Matrix Input: Let the input consist of a matrix for p variates. For a covariance matrix, the $p(p+1)/2$ distinct elements of the matrix S are entered and the correlation matrix $R = (r_{ij})$ is computed by

$$r_{ij} = s_{ij}/(s_{ii}s_{jj})^{1/2}$$

In the third method of input, the distinct elements of R are entered directly.

Principal Components Analysis:

The eigenvalues and corresponding eigenvectors of R are obtained by a variant of the QR method (see page 219 of reference 5). Let the eigenvalues of R be denoted by $\theta_1 \ge \theta_2 \ge \cdots \ge \theta_p$ and let $W = (w_{ij})$ be a p x p matrix of column eigenvectors (i.e., the jth column of W consists of the elements of the eigenvector corresponding to the jth eigenvalue $\theta_j$). Then W is a matrix of principal components and $\theta_i$ is the variance accounted for by the ith component.

Case Scores:

For each data case a vector of component scores f is computed by

$$f = Wz$$

where W is the matrix of principal components and z is the vector of standardized values of the variables.
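The steps above (covariance to correlation, eigen-decomposition of R, proportion of variance, case scores) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program's algorithm: it uses cyclic Jacobi rotations in place of the QR variant named above, all function names are invented for the sketch, and each component score is taken as the inner product of one eigenvector with z, which corresponds to W'z under the column convention used for W.

```python
import math


def cov_to_corr(s):
    """Convert a covariance matrix to correlations: r_ij = s_ij / sqrt(s_ii * s_jj)."""
    p = len(s)
    return [[s[i][j] / math.sqrt(s[i][i] * s[j][j]) for j in range(p)]
            for i in range(p)]


def jacobi_eigen(matrix, tol=1e-20, max_sweeps=50):
    """Eigenvalues and eigenvectors of a symmetric matrix by cyclic Jacobi rotations.

    Returns (values, vectors) sorted by decreasing eigenvalue; vectors[j] is the
    unit eigenvector belonging to values[j]. Jacobi is a stand-in here, not the
    QR variant the manual refers to.
    """
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J' A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda j: a[j][j], reverse=True)
    values = [a[j][j] for j in order]
    vectors = [[v[i][j] for i in range(n)] for j in order]
    return values, vectors


def component_scores(vectors, z):
    """Score on each component: inner product of its eigenvector with z."""
    return [sum(w_i * z_i for w_i, z_i in zip(w, z)) for w in vectors]


# Hypothetical 2-variable covariance matrix, used only to exercise the functions.
R = cov_to_corr([[4.0, 2.4], [2.4, 9.0]])
values, vectors = jacobi_eigen(R)
proportions = [value / len(values) for value in values]
print(values, proportions)
print(component_scores(vectors, [1.0, -0.5]))
```

Because R has ones on its diagonal, the eigenvalues sum to p, so dividing each eigenvalue by p gives the proportion of total variance for that component.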
Factor Extractions

Principal Axes Method:
The main diagonal elements of R are either unaltered or adjusted by one of the following options:

(i) Squared multiple correlations on the main diagonal, where $r_{ii}$ is given by $r_{ii} = 1 - 1/r^{ii}$ and $r^{ii}$ is the ith diagonal element of $R^{-1}$. The Cholesky square root method is used to obtain $R^{-1}$.
(ii) Maximum absolute row value among $r_{ij}$, j = 1, ..., p.
(iii) User specified values.

The p eigenvalues and corresponding eigenvectors of R are obtained by the QR method. Let the eigenvalues of R be denoted by $\theta_1 \ge \theta_2 \ge \cdots \ge \theta_p$ and the matrix of column eigenvectors be denoted by $W = (w_1, w_2, \ldots, w_p)$. The number of factors obtained is

$$M = \min \{ m, \ \#\ \text{of}\ \theta_i\ \text{such that}\ \theta_i > 0 + c \}$$

where m is the maximum number of factors (user specified) and c is the minimum eigenvalue for factor inclusion (also user specified). Then the jth column of the factor loading matrix $F = (f_{ij})$ is $\sqrt{\theta_j}\, w_j$. New estimates of communalities are then given by

$$h_i^2 = \sum_{j=1}^{M} f_{ij}^2$$

If more than one iteration is requested, the diagonal of R is adjusted by the new estimates of communalities and the extraction procedure is repeated. Iterations are continued until the maximum number is reached or until the maximum change in the communality estimates is less than 0.0001. If for a particular iteration any of the estimates of communalities exceeds one, the process will terminate, a message will be printed, and the factor matrix for the previous iteration will be printed. Note that the number of factors may change during the iterative process.

Maximum Likelihood Method:
The Enslein procedure (see reference 13) is used to obtain the maximum likelihood solutions of the factor loading matrix F and the unique variance $\phi_{ii}$ of the ith variable. If k is the number of factors and

$$f_k(\Phi) = -\log \prod_{i=k+1}^{p} \theta_i + \sum_{i=k+1}^{p} \theta_i - (p - k)$$

where $\theta_1 \ge \theta_2 \ge \cdots \ge \theta_p$ are the eigenvalues of $\Phi^{-1/2} R\, \Phi^{-1/2}$ and where $\Phi = \mathrm{diag}(\phi_{11}, \phi_{22}, \ldots, \phi_{pp})$, the ML solution of $\phi_{ii}$ is the value $\hat{\phi}_{ii}$ which minimizes the value of $f_k(\Phi)$.
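As a small illustration of the function being minimized, the Python sketch below evaluates f_k from a given list of eigenvalues. It does not perform the minimization, and the eigenvalues are hypothetical; the function name is an assumption made for this example.

```python
import math


def ml_objective(theta, k):
    """f_k = -log(prod theta_i) + sum theta_i - (p - k), i = k+1, ..., p.

    `theta` holds all p eigenvalues; only the p - k smallest enter f_k.
    """
    tail = sorted(theta, reverse=True)[k:]
    return -sum(math.log(t) for t in tail) + sum(tail) - len(tail)


# Each term theta - log(theta) - 1 is zero when theta = 1 and positive
# otherwise, so f_k is zero only when the p - k smallest eigenvalues all equal 1.
print(ml_objective([3.0, 1.0, 1.0, 1.0], k=1))
print(ml_objective([3.0, 1.2, 1.0, 0.8], k=1))
```

The first call returns zero (a perfect fit with one factor); the second returns a small positive value.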
The factor loading matrix F is then computed by

$$F = \Phi^{1/2} W (H - I)^{1/2}$$

where $W = (w_1, w_2, \ldots, w_k)$, $H = \mathrm{diag}(\theta_1, \theta_2, \ldots, \theta_k)$, and where $w_1, w_2, \ldots, w_k$ are the eigenvectors corresponding to the k largest roots. The initial estimate of $\phi_{ii}$ is $(1 - k/2p)/r^{ii}$, where $r^{ii}$ is the ith diagonal element of $R^{-1}$. The minimization procedure of the method of Fletcher and Powell is applied to the function $f_k(\Phi)$. For a detailed explanation of the computation procedure, see reference 13.

The program performs a sequence of maximum likelihood factor analyses for $k = k_1, k_1 + 1, k_1 + 2, \ldots, k_2$, where $k_1$ is the minimum number of factors. The sequence terminates when the maximum number of factors $k_2$ is reached or when a proper solution has been found and is acceptable from the point of view of goodness-of-fit at a user specified level of significance. If for a particular k the solution is improper (Heywood, see reference 3), having q < k of the unique variances equal to "zero", the corresponding q variables are eliminated and the partial correlation matrix $R_{22 \cdot 1}$ is computed as follows:

(i) Find $R^{-1}$ by the square root method.
(ii) Delete the q columns and rows from $R^{-1}$ and evaluate the inverse of the resulting matrix, denoted by $R_1$.
(iii) $R_{22 \cdot 1} = D_1^{-1/2} R_1 D_1^{-1/2}$, where $D_1$ is a diagonal matrix with the diagonal elements of $R_1$.

The matrix $R_{22 \cdot 1}$ of order (p - q) is analyzed as before with the number of factors k - q, and the resulting solution is again examined for properness. The procedure repeats until a proper solution has been found for some k > 0. A goodness-of-fit test is performed on this solution by computing

$$\chi^2 = [N - 1 - (2p + 5)/6 - 2k/3] \log \frac{|\Phi + FF'|}{|R|}$$

with degrees of freedom

$$\nu = [(p - k)^2 - p - k]/2$$

Note that R can be either the original correlation matrix or the partial correlation matrix, and p is the order of R. If the computed chi-square value is greater than the tabled value with a prescribed level of significance, the value of k is increased by one and the above procedure is repeated.
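The arithmetic of the goodness-of-fit statistic can be sketched as follows. This Python fragment is an illustration only: it takes the two determinants as given rather than computing them from a fitted solution, and all numeric inputs are hypothetical.

```python
import math


def fit_test(n_cases, p, k, det_model, det_r):
    """Return (chi-square, degrees of freedom) for a k-factor solution.

    det_model is |Phi + FF'| and det_r is |R|; both are assumed to be
    available from the fitted solution.
    """
    multiplier = n_cases - 1 - (2 * p + 5) / 6 - 2 * k / 3
    chi_square = multiplier * math.log(det_model / det_r)
    # (p - k)**2 and p + k always have the same parity, so this is exact.
    dof = ((p - k) ** 2 - p - k) // 2
    return chi_square, dof


# Hypothetical case: 100 observations, 6 variables, 2 factors.
chi2, dof = fit_test(100, 6, 2, det_model=0.0525, det_r=0.05)
print(f"chi-square = {chi2:.3f} with {dof} degrees of freedom")
```

The computed value would then be compared with the tabled chi-square value at the chosen significance level, as described above.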
If the solution is acceptable, then the process terminates. The final solution is combined with the principal components of the eliminated variables (see equations (56), (57) of reference 4), if any, to give a complete solution for all the original variables.

Factor Rotation:

Orthogonal Rotation:

(i) Quartimax method: The object of the quartimax method is to determine the orthogonal transformation matrix T which will carry the original factor matrix F into a new factor matrix $B = (b_{ij})$ for which

$$Q = \sum_{i=1}^{p} \sum_{j=1}^{k} b_{ij}^4$$

is a maximum. See page 298 of reference 3 for a detailed discussion.

(ii) Varimax method: The orthogonal varimax criterion requires that the final factor matrix $B = (b_{ij})$ maximize the function

$$V = p \sum_{i=1}^{p} \sum_{j=1}^{k} (b_{ij}/h_i)^4 - \sum_{j=1}^{k} \left( \sum_{i=1}^{p} b_{ij}^2/h_i^2 \right)^2$$

where

$$h_i^2 = \sum_{j=1}^{k} f_{ij}^2$$

is the communality of the ith variable of the initial factor matrix. See page 304 of reference 3 for a detailed discussion.

Oblique Rotation:
Oblique oblimin rotation may be performed to minimize the value

$$B = \sum_{j<q=1}^{k} \left[ p \sum_{i=1}^{p} (v_{ij}^2/h_i^2)(v_{iq}^2/h_i^2) - \lambda \sum_{i=1}^{p} v_{ij}^2/h_i^2 \sum_{i=1}^{p} v_{iq}^2/h_i^2 \right]$$

where

$$h_i^2 = \sum_{j=1}^{k} f_{ij}^2$$

is the communality of the ith variable of the initial factor matrix, and $\lambda$ is the rotation constant in the range 0 to 1. Values of $\lambda$ which yield standard oblique rotations are:

(i) Quartimin: $\lambda$ = 0; most oblique
(ii) Biquartimin: $\lambda$ = 0.5; less oblique
(iii) Covarimin: $\lambda$ = 1; least oblique

Both reference and primary factors are obtained. See page 324 of reference 3 for a detailed discussion.

Factor Scores:
Computation of factor scores begins with the calculation of a factor score coefficient matrix C, where C is P x M, P is the number of variables and M the number of factors. If we let F be the given factor matrix (either orthogonal or oblique factors), and R the correlation matrix for the original data, C is calculated in one of two ways.
Orthogonal Factors:

$$C = R^{-1} F$$

Oblique Factors:

$$C = R^{-1} F Q$$

where F is an oblique primary factor matrix and Q is the correlation matrix of the primary factors.

Once C has been computed, the factor scores, f, for each data case are computed by

$$f = C'z$$

where z is the vector of standardized values of the variables. For detailed information on the calculation of the primary factor matrix and the Q matrix above, interpretation of the primary factors, reference structure matrix, and factor scores, see reference 3.

References

1. Enslein, K., Ralston, A., and Wilf, H. S. (eds.) (1977) Statistical Methods for Digital Computers, John Wiley & Sons, Inc., New York.
2. Gnanadesikan, R. (1977) Methods for Statistical Data Analysis of Multivariate Observations, John Wiley & Sons, New York.
3. Harman, H. H. (1967) Modern Factor Analysis, 2nd ed., University of Chicago Press, Chicago.
4. Joreskog, K. G. (1967) "Some Contributions to Maximum Likelihood Factor Analysis", Psychometrika, Vol. 32, pp. 443-482.
5. Martin, K. (1978) 9845B Numerical Analysis Library, Vol. 1, Hewlett-Packard Part No. 09845-10351.
6. Morrison, D. F. (1976) Multivariate Statistical Methods, 2nd ed., McGraw-Hill Book Company, New York.
7. Vecchia, D. F. Unpublished Notes for 9830A Factor Analysis.
8. Cooley, William W. and Lohnes, Paul R. (1971) Multivariate Data Analysis, John Wiley and Sons, Inc., New York.
9. Guertin, Wilson H. and Bailey, John P., Jr. (1970) Introduction to Modern Factor Analysis, Edwards Brothers, Inc., Ann Arbor.
10. Horst, Paul (1965) Factor Analysis of Data Matrices, Holt, Rinehart and Winston, Inc., New York.
11. Morrison, Donald A. (1965) Multivariate Statistical Methods, Holt, Rinehart and Winston, Inc., New York.
12. Comrey, Andrew L. (1973) A First Course in Factor Analysis, Academic Press, New York.
13. Enslein, Kurt (Ralston, A. & Wilf, H. eds.) Statistical Methods for Digital Computers, Volume 4, John Wiley and Sons, Inc., New York.
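The orthogonal rotation criteria described under Factor Rotation can be explored with a toy example. The Python sketch below evaluates the quartimax and varimax criteria for a two-factor loading matrix and finds the best rotation angle by a crude grid search over whole degrees. The grid search stands in for the library's actual rotation algorithm, which is not described here, and the loadings are hypothetical.

```python
import math


def rotate(F, degrees):
    """Apply an orthogonal (planar) rotation to a two-factor loading matrix."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [[a * c + b * s, -a * s + b * c] for a, b in F]


def communalities(B):
    return [sum(b * b for b in row) for row in B]


def quartimax(B):
    """Q = sum over all loadings of b_ij**4."""
    return sum(b ** 4 for row in B for b in row)


def varimax(B):
    """V = p * sum (b_ij/h_i)**4 - sum_j (sum_i b_ij**2 / h_i**2)**2."""
    p, k = len(B), len(B[0])
    h2 = communalities(B)
    first = p * sum((B[i][j] ** 2 / h2[i]) ** 2
                    for i in range(p) for j in range(k))
    second = sum(sum(B[i][j] ** 2 / h2[i] for i in range(p)) ** 2
                 for j in range(k))
    return first - second


# Hypothetical example: a clean two-factor pattern, deliberately rotated
# by 30 degrees so that the "initial" solution F is hard to interpret.
SIMPLE = [[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.6]]
F = rotate(SIMPLE, 30)

# Crude grid search over whole degrees (both criteria repeat every 90 degrees).
best_q = max(range(90), key=lambda d: quartimax(rotate(F, d)))
best_v = max(range(90), key=lambda d: varimax(rotate(F, d)))
B = rotate(F, best_q)

print("quartimax angle:", best_q, "  varimax angle:", best_v)
print("Q before and after:", round(quartimax(F), 4), round(quartimax(B), 4))
```

In this example both criteria select the same angle, and the communalities of B equal those of F, since an orthogonal rotation leaves them unchanged.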
Examples

Sample Problem #1

This example uses a simple artificial data set which is given below. The raw data was entered in keyboard mode. The principal component analysis was performed. Notice that the "% of total variance" row corresponds to random data. Component plots of component 1 vs. component 2 and component 1 vs. component 3 were generated. Component scores were output and a plot of component scores was made, again for the same pairs of components.

Factor analysis by the principal axes method was done. Communalities were found by iteration. The iterations are not output on the printer but do appear on the CRT. The number of factors chosen to explain the variation was 3 in this example. Factor rotation plots were made for factor 1 vs. factor 2 and factor 1 vs. factor 3. An orthogonal varimax rotation was performed. The contribution of factors, % of total variance, and factor plots were output. Factor scores were also output.

Case No.    X1    X2    X3    X4    X5
    1        7     9     6     5     2
    2        5     5     4     6     2
    3        1     2     3     4     5
    4        1     6     5     2     3
    5        4     6     5     2     5
    6        7     9     6     6     5
    7        6     5     3     2     1
    8        9     8     6     5     3
    9        4     6     5     2     1
   10        6     5     4     3     5
   11        3     2     1     6     5
   12        5     6     5     2     3
   13        6     5     4     5     4
   14        1     6     5     8     9
   15        9     8     9     6     5
   16        7     3     1     9     5
   17        1     5     9     3     7
   18        3     5     0     7     9
   19        6     2     4     8     6
   20        4     6     4     2     8

* DATA MANIPULATION *

Enter DATA TYPE (Press CONTINUE for RAW DATA): 1
Mode number = ?
Is data stored on program's scratch file (DATA)? NO
Data file name = ? PFACSMPB1:INTERNAL
Was data stored by the BS&DM system ? YES
Is data medium placed in device INTERNAL ? YES
Is program medium placed in correct device ?
YES
(Raw data, on mass storage)

SAMPLE PROBLEM #1
Data file name: PFACSMPB1:INTERNAL
Data type is: Raw data
Number of observations: 20
Number of variables: 5
Variable names:
1. X1
2. X2
3. X3
4. X4
5. X5
Subfiles: NONE

SELECT ANY KEY          (Press special function key labeled LIST)
Option number = ? 1
Enter method for listing data: 3          (List all data)

SAMPLE PROBLEM #1
Data type is: Raw data

          Variable # 1  Variable # 2  Variable # 3  Variable # 4  Variable # 5
OBS#      (X1)          (X2)          (X3)          (X4)          (X5)
   1      7.00000       9.00000       6.00000       5.00000       2.00000
   2      5.00000       5.00000       4.00000       6.00000       2.00000
   3      1.00000       2.00000       3.00000       4.00000       5.00000
   4      1.00000       6.00000       5.00000       2.00000       3.00000
   5      4.00000       6.00000       5.00000       2.00000       5.00000
   6      7.00000       9.00000       6.00000       6.00000       5.00000
   7      6.00000       5.00000       3.00000       2.00000       1.00000
   8      9.00000       8.00000       6.00000       5.00000       3.00000
   9      4.00000       6.00000       5.00000       2.00000       1.00000
  10      6.00000       5.00000       4.00000       3.00000       5.00000
  11      3.00000       2.00000       1.00000       6.00000       5.00000
  12      5.00000       6.00000       5.00000       2.00000       3.00000
  13      6.00000       5.00000       4.00000       5.00000       4.00000
  14      1.00000       6.00000       5.00000       8.00000       9.00000
  15      9.00000       8.00000       9.00000       6.00000       5.00000
  16      7.00000       3.00000       1.00000       9.00000       5.00000
  17      1.00000       5.00000       9.00000       3.00000       7.00000
  18      3.00000       5.00000       0.00000       7.00000       9.00000
  19      6.00000       2.00000       4.00000       8.00000       6.00000
  20      4.00000       6.00000       4.00000       2.00000       8.00000

Option number =          (Exit list procedure)
SELECT ANY KEY          (Select special function key labeled ADV STAT)
Remove BSDM media
Insert Principal Components & Factor Analysis media

Use all the variables in the analysis (YES/NO) ? YES
Is the above information correct ? YES

PRINCIPAL COMPONENTS AND FACTOR ANALYSIS
SAMPLE PROBLEM #1
where variables to be used are:
1. X1
2. X2
3. X3
4. X4
5. X5

CORRELATION MATRIX

          X1           X2           X3           X4
X2      .4204206
X3      .1753833     .6175669
X4      .2259743    -.2043786    -.2764709
X5     -.3753400    -.2005056    -.1251464     .3879237

Do you want to store the correlation matrix ? NO
Enter number of desired function: 2
Press "CONTINUE" when ready.
(We could store the correlation matrix for later use, if we wished. Function 2 selects principal component analysis.)

* PRINCIPAL COMPONENT ANALYSIS *

Enter the option for components output (1,2,3,or 4) 1          (Output all principal components)

COMPONENT MATRIX

                                     COMPONENT
Variable Name              1           2           3           4           5
1. X1                   .383267     .637731    -.297991    -.092255     .590843
2. X2                   .574271     .138684     .330269    -.584914    -.446965
3. X3                   .513971    -.090709     .507831     .673708     .125823
4. X4                  -.305216     .741708     .133991     .322234    -.484690
5. X5                  -.407427     .125325     .725451    -.302733     .447628

Eigenvalue             2.084182    1.255467    1.046971     .363811     .249569
% of total variance    41.68365    25.10934    20.93941     7.27622     4.99139
Cumulative variance %  41.68365    66.79298    87.73240    95.00861   100.00000

(Note: The first 3 principal components have eigenvalues bigger than 1.0.)

Do you wish to plot the principal components ? YES
Plot on CRT ? NO
Plotter identifier string (press CONT if 'HPGL')?
Enter select code, HPIB bus (defaults are 7,5)?
A beep will signify the end of the plot.
Which pen number should be used ? 1
Enter the pair of component numbers which will be used in this plot ? 1,2

[Plot: SAMPLE PROBLEM #1, Component Plot, COMPONENT 1 vs. COMPONENT 2]

Plot for another two factors ? YES
Which pen number should be used ? 1
Enter the pair of component numbers which will be used in this plot ? 1,3

[Plot: SAMPLE PROBLEM #1, Component Plot, COMPONENT 1 vs. COMPONENT 3]

Plot for another two factors ? YES
Which pen number should be used ? 1
Enter the pair of component numbers which will be used in this plot ? 2,3

[Plot: SAMPLE PROBLEM #1, Component Plot, COMPONENT 2 vs. COMPONENT 3]

Plot for another two factors ?
NO Enter the option nuMber <i,2,or 3)= i Select component scores COMPONENT SCORES COMPONENT Observation * i 2 3 4 5 i 2.07540 .71235 -.15044 -.23088 -.72271 2 .09139 .34176 -.93465 .51003 -.65276 3 -1.81738 -i. 30509 -.35682 .53949 -.01545 4 . 33929 -1.86345 -.00753 -.01155 •-.72163 325 b 6 7 8 9 10 ii 12 13 14 15 16 17 18 19 20 .44941 -1 .00182 .25200 - .37656 . 35664 1 . 42788 1 .19038 .82627 - .47569 -.36426 .71513 - .69652 -1 .81193 - .24860 .17100 1 .93132 1 .20276 - .23760 - .15167 .14705 1 .13760 -1 .21350 - .97337 .13479 -.39946 .12078 _ .20532 - .30636 _ .32603 .77361 2 .22775 - .08321 - .92196 . 153S7 -.07616 .94491 - . 85573 - .47841 - . 1S733 .21200 .03008 .38027 - .49735 .07921 .16732 1 .48126 . 36963 2 .17657 .05362 -.83926 2 .13151 1 .50862 1 .10035 .61701 .48187 1 .74141 1 . 94865 -1 .06175 .14396 .01767 .14576 -i .55787 2 .00757 1 .07660 .26029 2 .44801 .68660 .61275 -1 .35411 -.22557 1 .53268 1 .24477 - .18584 1 .07945 .56124 - .29196 - .80330 .94850 -1 .05530 .86858 Do you wish to plot the case scores ? YES Plot on CRT ? NO Plotter identifier string <press CONT if 'HPGL')? Enter sselect code, HPIB bus <defaults are 7,5)? A beep will signify the end of the plot. Which pen number should be used ? 1 326 Enter the pair of cowponent nuwbers which will be used in this plot ? i>2 3.8 2.4 i.e 1.2 .6 SRMPLE PROBLEM #1 Component Scores Plot (V ui z o Q. X. o o B.a I hr-H 1 h -.6 -1.8 -2.4 -3.0 H 1 1 1 1 StDNUSlONIItS <oc\i»- , »- , i(B •-• — « cu m i I i i COMPONENT 1 327 Plot for another two factors ? YES Which pen nuober should be used ? i Enter the pair of coMponent numbers which will be used in this plot 1 i,3 SAMPLE PROBLEM *1 Component Scores Plot en o Q. £ O O 3.8 2.4 1.8 1.2 .6 * 8.0 I 1 h -.6 X X -1.2 -1.8 -2.4 -3.8 -"H 1 1 1 1 X <D(\l(0S(0(Via>*'(9 S — — (UP) (!) (VI — — I I I I I COMPONENT 1 328 Plot for another two factors ? YES Which pen number should be used ? i Enter the pair of cociponent numbers which will be used in this plot ? 
2,3

[Plot: SAMPLE PROBLEM #1, Component Scores Plot, COMPONENT 2 vs. COMPONENT 3]

Plot for another two factors ? NO
Store the principal component case scores ? NO
Enter number of desired function: 3          (Select factor analysis)
Max. # of factors to be extracted (<= 15) : 3

(We must specify how many factors we want to use. From the principal component analysis it appears that three might be correct.)

* FACTOR ANALYSIS BY PRINCIPAL AXES METHOD *

A maximum of 3 factors will be extracted.
Enter Communality Estimate type (1,2,3,or 4) = 2          (Squared multiple correlation used on the diagonal of the correlation matrix as the initial estimates.)

COMMUNALITY ESTIMATION

Squared Multiple Correlation has been used to compute the communality estimates.

Initial Estimated Communalities of Variables:          (Starting values)

Variable     Communality
1. X1          .47407
2. X2          .50461
3. X3          .40850
4. X4          .42089
5. X5          .39380

Do you wish to specify a min. eigenvalue for factor inclusion ? NO
Do you want to refine the communality estimates using iteration ? YES
Enter the maximum # of iterations (default=25) 5
Max. number of iterations for factor extraction = 5

Communalities of Variables after 5 iterations          (Final estimates)

Variable     Communality
1. X1          .74634
2. X2          .72370
3. X3          .57824
4. X4          .67900
5. X5          .63413

UNROTATED FACTOR MATRIX

                                        FACTOR
Variable Name                      1           2           3
1. X1                           .540204     .628415    -.244171
2. X2                           .784661     .093539     .315046
3. X3                           .644004    -.120257     .386055
4. X4                          -.386153     .713522     .144134
5. X5                          -.522787     .114566     .589658

Contribution of factor          1.74468      .94036      .67638
% of total Variance Extracted  34.89350    18.80713    13.52766

Do you wish to perform any factor rotations ? YES

* FACTOR ROTATION *

Do you wish to plot the original factors ? YES
Plot on CRT ? NO
Plotter identifier string (press CONT if 'HPGL')?
Enter the select code, HP bus (defaults are 7,5)?
Which PEN number should be used? 1
The pair of factor numbers used in this plot =? 1,2
A beep will signify the end of the plot.

[Plot: SAMPLE PROBLEM #1, UNROTATED Factor Plot, FACTOR 1 vs. FACTOR 2]

Plot for another two factors ? YES
Which PEN number should be used? 1
The pair of factor numbers used in this plot =? 1,3
A beep will signify the end of the plot.

[Plot: SAMPLE PROBLEM #1, UNROTATED Factor Plot, FACTOR 1 vs. FACTOR 3]

Plot for another two factors ? YES
Which PEN number should be used? 1
The pair of factor numbers used in this plot =? 2,3
A beep will signify the end of the plot.

[Plot: SAMPLE PROBLEM #1, UNROTATED Factor Plot, FACTOR 2 vs. FACTOR 3]

Plot for another two factors ? NO
Enter the type of rotation (1 or 2) = 1          (Orthogonal rotation)
Enter the method of orthogonal rotation (1 or 2) 1          (Choose varimax method)

FACTOR MATRIX
ORTHOGONAL VARIMAX ROTATION

                                        FACTOR
Variable Name                      1           2           3
1. X1                           .218231     .041559    -.834861
2. X2                           .796148    -.099285    -.282820
3. X3                           .747073    -.139647    -.024948
4. X4                          -.244315     .738402    -.272169
5. X5                          -.026678     .656311     .450191

Contribution of factor          1.30000     1.00707     1.05435
% of total Variance Extracted  25.99992    20.14135    21.08702

(Note by the factor coefficients that factor 1 seems to be a weighted average of X2 and X3; factor 2 is a weighted average of X4 and X5, while factor 3 seems to be essentially X1 (and maybe X5).)

Do you wish to plot the rotated factors ? YES
Plot on CRT ? NO
Plotter identifier string (press CONT if 'HPGL')?
Enter the select code, HP bus (defaults are 7,5)?
Which PEN number should be used? 1
The pair of factor numbers used in this plot =? 1,2
A beep will signify the end of the plot.
[Plot: SAMPLE PROBLEM #1, VARIMAX ROTATED Factor Plot, FACTOR 1 vs. FACTOR 2]

Plot for another two factors ? YES
Which PEN number should be used? 1
The pair of factor numbers used in this plot =? 1,3
A beep will signify the end of the plot.

[Plot: SAMPLE PROBLEM #1, VARIMAX ROTATED Factor Plot, FACTOR 1 vs. FACTOR 3]

Plot for another two factors ? YES
Which PEN number should be used? 1
The pair of factor numbers used in this plot =? 2,3
A beep will signify the end of the plot.

[Plot: SAMPLE PROBLEM #1, VARIMAX ROTATED Factor Plot, FACTOR 2 vs. FACTOR 3]

Plot for another two factors ? NO
Enter the option number (1,2,or 3)= 1          (Print out factor scores)

FACTOR SCORE COEFFICIENTS
FACTOR MATRIX

                            FACTOR
Variable Name          1           2           3
1. X1               -.014160     .060858     .682742
2. X2                .576544     .074114     .043713
3. X3                .392323     .018432     .099292
4. X4               -.078039     .558876     .207201
5. X5                .162978     .479519     .277970

FACTOR SCORES

                            FACTOR
Observation #          1           2           3
      1             1.03930     -.25987     -.95596
      2             -.43066     -.22543     -.50906
      3            -1.13434     -.30973     1.11956
      4              .24275    -1.03780     1.06651
      5              .36361     -.56069      .49214
      6             1.21218      .58816     -.69300
      7             -.54262    -1.37420     -.58291
      8              .82101     -.04477    -1.35708
      9              .08832    -1.37066      .02261
     10             -.12901     -.31560     -.15906
     11            -1.55654      .20332      .31475
     12              .22038     -.94163     -.01234
     13             -.26501     -.03697     -.45482
     14              .45414     1.62051     1.23567
     15             1.44080      .62500    -1.08097
     16            -1.40375     1.05664    -1.05258
     17              .89618      .00956     1.64180
     18             -.65896     1.35218      .58881
     19            -1.05594      .98328     -.42485
     20              .39817      .03870      .80077

Do you wish to plot the factor scores ? YES
Plot on CRT ? NO
Plotter identifier string (press CONT if 'HPGL')?
Enter the select code, HP bus (defaults are 7,5)?
Which PEN number should be used? 1
The pair of factor numbers used in this plot =?
1,2
A beep will signify the end of the plot.

[Plot: SAMPLE PROBLEM #1, VARIMAX ROTATED Factor Scores Plot, FACTOR 1 vs. FACTOR 2]

Plot for another two factors ? NO
Do you wish to store the factor scores ? YES
Enter a title for the new data set : FACTOR SCORES
How many factor scores do you want to store ? 1
Name of data file = SCORE:INTERNAL
Is data medium placed in device INTERNAL ? YES
PROGRAM NOW STORING FACTOR SCORES
Is program medium replaced in device INTERNAL ? YES
*** The 1 factor analysis scores were stored in SCORE:INTERNAL ***
Do you wish to perform another rotation ? NO
Enter number of desired function: 4          (Return to BSDM)

Sample Problem #2

The correlation matrix for a set of six bone measurements of White Leghorn fowl is considered. The correlation matrix is the subject of Example 7.5, page 243 of Morrison (see reference 11). The six measurements are:

X1 = Skull length
X2 = Skull breadth
X3 = Humerus
X4 = Ulna
X5 = Femur
X6 = Tibia

Extraction of the principal components for the matrix reveals that 76% of the variance is explained by the first component and 88% by the first two components together. Thus, if one were interested in data reduction, it may be practical to use only the first two components (or factors). This particular example permits an easy interpretation of the factors or components. For example, the first factor may be interpreted as a general average dimension of all bones, with the wing and leg bones receiving slightly higher loadings. Further explanation of the components may be obtained in Morrison (11).

The data was input as a correlation matrix. A principal component analysis was done and it showed that two components accounted for over 88% of the total variance. Component plots were done for component 1 vs. component 2, component 1 vs. component 3, and component 2 vs. component 3.
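The 76% and 88% figures quoted above follow directly from the eigenvalues printed by the program for this problem. The short Python sketch below (an illustration, not part of the library) reproduces the "% of total variance" and "Cumulative % variance" rows; the helper names are assumptions made for this example.

```python
# Eigenvalues printed in the COMPONENT MATRIX output of Sample Problem #2.
EIGENVALUES = [4.567571, 0.714123, 0.412129, 0.173189, 0.075859, 0.057129]


def variance_table(eigenvalues):
    """Return rows of (component, eigenvalue, % of total, cumulative %)."""
    total = sum(eigenvalues)
    rows, running = [], 0.0
    for i, ev in enumerate(eigenvalues, start=1):
        pct = 100.0 * ev / total
        running += pct
        rows.append((i, ev, pct, running))
    return rows


def components_needed(eigenvalues, target_pct):
    """Smallest number of leading components reaching target_pct of the variance."""
    for i, _, _, cumulative in variance_table(eigenvalues):
        if cumulative >= target_pct:
            return i
    return len(eigenvalues)


TABLE = variance_table(EIGENVALUES)
for i, ev, pct, cum in TABLE:
    print(f"{i}  {ev:9.6f}  {pct:9.5f}  {cum:10.5f}")
print("components needed for 80%:", components_needed(EIGENVALUES, 80.0))
```

For a correlation matrix the eigenvalues sum to the number of variables, here 6, so each percentage is simply the eigenvalue divided by 6.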
Factor analysis by the method of principal axes was done. Communalities were calculated. Three factors were used in the factor analysis. The first two factors accounted for about 80% of the total variance. A factor plot was done for factor 1 vs. factor 2. Then an orthogonal varimax rotation was performed. The result of the rotation and a new factor plot were output.

* DATA MANIPULATION *

Enter DATA TYPE (Press CONTINUE for RAW DATA): 3          (This data was stored as a correlation matrix.)
Mode number = ? 2
Is data stored on program's scratch file (DATA)? NO
Data file name = ? BONELNGTH:INTERNAL
Was data stored by the BS&DM system ? YES
Is data medium placed in device INTERNAL ? YES
Is program medium placed in correct device ? YES

BONE LENGTHS OF WHITE LEGHORN FOWL (MORRISON P. 243)
Data file name: BONELNGTH:INTERNAL
Data type is: Correlation Matrix
Number of observations: 6
Number of variables: 6
Variable names:
1. SKULL LGTH
2. SKULL BDTH
3. HUMERUS
4. ULNA
5. FEMUR
6. TIBIA
Subfiles: NONE

SELECT ANY KEY          (Press special function key labeled LIST)

BONE LENGTHS OF WHITE LEGHORN FOWL (MORRISON P. 243)
Data type is: Correlation Matrix

          Variable # 1   Variable # 2   Variable # 3   Variable # 4   Variable # 5
VAR#      (SKULL LGTH)   (SKULL BDTH)   (HUMERUS)      (ULNA)         (FEMUR)
  1        1.00000         .58400         .61500         .60100         .57000
  2         .58400        1.00000         .57600         .53000         .52600
  3         .61500         .57600        1.00000         .94000         .87500
  4         .60100         .53000         .94000        1.00000         .87700
  5         .57000         .52600         .87500         .87700        1.00000
  6         .60000         .55500         .87800         .88600         .92400

          Variable # 6
VAR#      (TIBIA)
  1         .60000
  2         .55500
  3         .87800
  4         .88600
  5         .92400
  6        1.00000

SELECT ANY KEY
Use all the variables in the analysis (YES/NO) ? YES
Is the above information correct ?
YES Select special function key labeled-ADV STAT Remove BSDM media Insert Principal Components & Factor Analysis media 342 PRINCIPAL COMPONENTS AND FACTOR ANALYSIS BONE LENGTHS OF WHITE LEGHORN FOUL <MORRISON P. 243) where variables to be used are i. SKULL LGTH 2. SKULL BOTH 3. HUMERUS 4. ULNA 5. FEMUR 6. TIBIA CORRELATION MATRIX SKULL LGTH SKULL BDTH HUMERUS ULNA FEMUR SKULL BDTH .5840000 HUMERUS .6150000 .5760000 ULNA 6010000 .5300000 .9400000 FEMUR S700000 .5260000 .8750000 .8770000 TIBIA .6000000 .5550000 .8780000 .8860000 .9240000 Do you want to store the correlation Matrix ? NO Enter number of desired funtion; 2 Press ^CONTINUE" when ready. Select principal component analysis ******************************** * PRINCIPAL COMPONENT ANALYSIS *' ******************************** Enter the option for conponents ou tput ( 1 ,2,3, or 4) 1 Output all the principal components COMPONENT MATRIX Variable Naeie 1 . SKULL LGTH 2. SKULL BDTH 3. HUMERUS 4. ULNA 5. FEMUR 6. TIBIA 1 ,347463 .326404 .443411 .439972 .434532 .440140 COMPONENT 2 .536959 .696453 .187321 .251402 .278188 .225718 3 . 766673 636305 .040071 011196 .059205 .045735 4 .049099 ,0 02033 .524079 .488769 .514259 .468582 5 .027212 .008031 .168550 .151309 .669453 .706912 6 .0 02378 ,058829 .680900 .693763 . 132887 .184237 Eigenvalue 7. of total variance 4.567571 .714123 76.12618 11.90205 .412129 6.86882 173189 2.88648 .075859 1.26431 057129 .95216 Cumulative 7. variance 76.12618 88.02823 94.89705 97.78353 99.04784 100.00000 343 Do you wish to plot the principal cowponents ? YES Plot on CRT ? NO Plotter identifier string (press CONT if 'HPGL')? Enter select code, HPIB bus (defaults are 7,5)? A beep will signify the end of the plot. Which pen nunber should be used ? i Enter the pair of component numbers which will be used in this plot ? 1,2 BONE LENGTHS OF WHITE LEGHORN FOWL Component Plot K o 0. 
5 o i.e .8 .6 .4 .2 0.0 H -.2 -.4 -.6 -.8 -1.0 H h H f- (9 CD (0 *■ M 8 (VI ■ ■ ■ • ■ • • 1 i 1 1 1 s CO 09 COMPONENT 1 344 Plot for another two factors ? YES Which pen nuciber should be used ? i Enter the pair of component nuwbers which will be used in this plot ? i , 3 m o o. £ O o BONE LENGTHS OF WHITE LEGHORN FOWL Component Plot 1.0 .8 .6 .4 .2 e.e h -.2 -.4 -.6 -.8 -1.8 H 1 1 h H 1 (90D(DtNfiMt(DO(S • «•■•*«*■•* ~> I I I I s — I COMPONENT 1 345 Plot for another two factors ? YES Which pen nu fiber should be used ? 1 Enter the pair of conponent nunbers which will be used in this plot V 2,3 BONE LENGTHS OF WHITE LEGHORN FOWL Component Plot m 1.0 .8 .6 .4 .2 0.0 h -.2 -.4 -.6 -.8 -1.0 -\ h H 1 1 1 s 00 CO *■ (U ta • ■ • • ■ • ^4 1 1 1 1 1 s (U CO COMPONENT 2 Plot for another two factors ? NO Enter nuMber of desired funtiom 3 Method for extracting factorsd OR 2) i Max. # of factors to be extracted (<= 15) 3 Select factor analysis Use principal axes method 346 ******************************************** * FACTOR ANALYSIS BY PRINCIPAL. AXES METHOD * ******************************************** A maximum of 3 factors will be extracted. Enter CoMMunality EstiMate type (1,2,3, or 4) Squared multiple correlation COMMUNALITY ESTIMATION Squared Multiple Correlation has been used to coMpute the coMMunality estiMates. Initial EstiMated CoMMunalities of Variables : Variable CoMMunality i. SKULL LGTH 2. SKULL BDTH 3. HUMERUS 4. ULNA 5. FEMUR 6. TIBIA .46814 .42741 .90169 .90232 .87345 .88329 Do you wish to specify a Min. eigenvalue for factor inclusion ? NO Do you want to refine the coMMunality estiMates using iteration ? YES Enter the naxinun # of iterations <default=25) S Max. nuMber of iterations for factor extraction = 5 CoMMunalities of Variables after 5 iterations Variable i. SKULL LGTH 2. SKULL BDTH 3. HUMERUS 4. ULNA 5. FEMUR 6. TIBIA CoMMunali ty .60294 .56058 .93835 .94385 .91719 .93088 UNROTATED FACTOR MATRIX Variable NaMe 1. SKULL LGTH 2. SKULL BDTH 3. HUMERUS 4. 
ULNA 5. FEMUR 6. TIBIA Contribution of factor X of total Variance Extracted .684976 .636078 .951391 .945555 .928596 .942826 4.42422 73.73696 ■.365703 -.393993 .081564 .150044 .176294 .125079 .36486 6.08099 FACTOR 3 .003721 -.027403 .162951 .165112 -.154345 -.162222 10472 1 .74530 347 Do you wish to perforw any factor rotations ? YES W ^ 4 ^lr ^w \V w \V w w st 4 4 st 4 SU \V s^ * FACTOR ROTATION * Do you wish to plot the original factors ? YES Plot on CRT ? NO Plotter identifier string (press CONT if 'HPGL')? Enter the select code, HP bus (defaults are 7,5)? Which PEN nuMber should be used? i The pair of factor numbers used in this plot =? i>2 A beep will signify the end of the plot. BONE LENGTHS OF HHITE LEGHORN FOHL UNROTHTED Factor Plot § i- o 1.0 .8 .6 .4 .2 8.0 h -.2 -.4 -.6 -.8 -1.8 H h -I h H 1 Note that factors lie on top of each other. (See factor matrix) s CD CD <■ cu 6) ■ • • • • ■ *>4 1 1 1 1 I 8 (XI CD FACTOR 1 348 Plot for another two factors ? YES Which PEN nunber should be used? i The pair of factor nunbers used in this plot =? 1,3 A beep will signify the end of the plot. BONE LENGTHS OF HHITE LEGHORN FOWL UNROTHTED Factor Plot m s IE U. 1.8 .8 .6 .4 .2 0.0 I 1 h -.2 -.4 -.6 -.8 -1.0 -\ 1 lo<— I 1 Note that factors lie on top of each other. (See factor matrix) S o o ~ l I* cu O f S cu CD CD FACTOR 1 349 Plot for another two factors ? YES Which PEN nuctber should be used? i The pair of factor numbers used in this plot -^ 2,3 A beep will signify the end of the plot. BONE LENGTHS OF HHITE LEGHORN FOWL UNROTHTED Factor Plot m § i- o U. 1.8 .8 .6 .4 .2 8.8 f- -.2 -.4 -.6 -.8 -1.8 H h I 34 H h H 1 65 S CD (0 (VI f S s w (O FACTOR 2 Plot for another two factors ? NO Enter the type of rotation (1 or 2) = 1 Enter the Method of orthogonal rotationd or 2) i 350 FACTOR MATRIX ORTHOGONAL VARIMAX ROTATION Variable Na«e i . SKULL LGTH 2. SKULL BDTH 3. HUMERUS 4 ULNA 5. 
FEMUR 6. TIBIA

                                        FACTOR
Variable Name                      1           2           3
1. SKULL LGTH                   .351827    -.689172     .064838
2. SKULL BDTH                   .298532    -.686028     .028647
3. HUMERUS                      .809812    -.465665     .256342
4. ULNA                         .843788    -.405943     .259001
5. FEMUR                        .873357    -.388363     .060132
6. TIBIA                        .856571    -.438891     .067387

Contribution of factor          3.07714     1.67068      .14597
% of total Variance Extracted  51.28572    27.84462     2.43291

Do you wish to plot the rotated factors ? YES
Plot on CRT ? NO
Plotter identifier string (press CONT if 'HPGL')?
Enter the select code, HP bus (defaults are 7,5)?
Which PEN number should be used? 1
The pair of factor numbers used in this plot =? 1,2
A beep will signify the end of the plot.

[Plot: BONE LENGTHS OF WHITE LEGHORN FOWL, VARIMAX ROTATED Factor Plot, FACTOR 1 vs. FACTOR 2. Note that factors lie on top of each other. (See factor matrix)]

Plot for another two factors ? YES
Which PEN number should be used? 1
The pair of factor numbers used in this plot =? 1,3
A beep will signify the end of the plot.

[Plot: BONE LENGTHS OF WHITE LEGHORN FOWL, VARIMAX ROTATED Factor Plot, FACTOR 1 vs. FACTOR 3. Note that factors lie on top of each other. (See factor matrix)]

Plot for another two factors ? YES
Which PEN number should be used? 1
The pair of factor numbers used in this plot =? 2,3
A beep will signify the end of the plot.

[Plot: BONE LENGTHS OF WHITE LEGHORN FOWL, VARIMAX ROTATED Factor Plot, FACTOR 2 vs. FACTOR 3]

Plot for another two factors ? NO
Do you wish to perform another rotation ? NO
Enter number of desired function: 4          (Return to BSDM)

Monte Carlo Simulations

General Information

Description

The programs in this software package are meant primarily as a library of utility routines to be combined with the user's own programs. Hence, each routine is set up as an independent, modular unit with a standard of input and output parameters.
These subprograms contain no actual inputs or outputs, with the exception of error messages. With each routine, the package provides a general-purpose front-end driver. In some cases, such as the Spectral and Run tests, the driver plus the routine make sense as a stand-alone unit. In other cases, such as the various random number deviates, the drivers are simply meant to introduce the user to the subprogram itself.

The software package does not establish the printers or the mass storage devices. It is the user's responsibility to select the printer and mass storage device before using any of these routines.

The 9826/36 operating system includes a random number generator, RND.

General Instructions

How Do I Load A Stand Alone Program?

1. Insert the program disc into the computer.
2. None of the drivers asks for the desired printer or mass storage device. These must be set by the user from the keyboard.
3. Type: LOAD "File name", 10
   Press: EXECUTE.
4. At this point, appropriate inputs are requested, computations are performed, and the results are printed or saved on a mass storage device.

How Do I Add One Of The Utility Subprograms Onto My Program?

Each program file has a driver and then one or more subprograms. To incorporate just one of these subprograms into your routine, first load the entire file into memory, then save the particular subprogram in a temporary file. Finally, after you have written your own code, link the temporary file containing the desired subprogram on after your code.

1. Insert the program cartridge or disc into the computer.
2. Type: LOAD "File name"
   Press: EXECUTE
3. After the program has been loaded,
   Type: EDIT
   Press: EXECUTE
4. At this point, the screen looks as follows:

   10   Beginning of driver program.
   20   Driver program
        END
   100  SUB Sub_to_be_linked
        SUBEND

5.
If subprogram Sub_to_be_linked is the one desired and it goes from line 100 to line 500, then
   Type: SAVE "TEMP", 100,500
   Press: EXECUTE.
6. Type: SCRATCH A
   Press: EXECUTE.
7. Enter your program into memory. For this example, assume that the last line of your code is line 2500. Then
   Type: GET "TEMP", 2510
   Press: EXECUTE.
8. The desired subprogram is then linked on behind your routine.

Special Considerations

1. All the programs in this package have been set up using the random number generator RND. This may be replaced by the super random generator contained in RSUPER.
2. You now have two different random number generators at your disposal.
   RND: a multiplicative congruential generator. (See the section further on in General Information for more details.)
   RSUPER: a combination generator. (See "RSUPER" for further details.)
   It is strongly suggested that any serious Monte Carlo simulation be run with both of these generators.
3. This package is meant to provide a set of subprogram utilities which you can combine to meet your particular needs. Each utility may be viewed as an independent modular unit. This allows you to combine these building blocks into your own program.
4. Driver routines have been provided so that you can get a feel for how each utility works and, in the case of the various generators, how much confidence you can place in them. It is suggested that you first use these driver programs as is, and then later adapt them to your particular need.
5. To allow you the most flexibility, no references are made to printers or mass storage devices. Hence, to have a particular program run from a floppy disk in the internal disc drive and have all information printed on the CRT, you would type in the following before running your program:
   1. a. Type: MASS STORAGE IS ":INTERNAL"
      b. Press: EXECUTE
   2. a. Type: PRINTER IS 1
      b. Press: EXECUTE
6. Each of the driver programs for the random deviates allows you to:
   1.
generate a set of random numbers to be printed or saved on a mass storage device, or
   2. get a feeling for the quality of the generator by running through some randomly generated tests.
7. There may be occasions where you will not have enough memory to store all the random numbers you would like to have. A number of possible tricks are available to you:
   a. Presently all deviates are set up in full precision arrays. Can you store the deviates in an integer array? Where a full precision array requires 8 bytes per number, an integer requires only 2. Care must be taken here to dimension your array using an INTEGER statement rather than a DIM. Also, the parameters in the SUB statement must be changed to INTEGER.
   b. Can you generate and use the random numbers in a partitioned fashion? For example, generate 1000 deviates, use them; generate 1000 more, use them; etc.
   c. If b is not possible, can you make use of your mass storage device to recall the deviates as you need them? For example:
      i. generate 1000 deviates, store them; generate 1000 more, store them; etc.
      ii. bring the first 1000 deviates into memory, use them; bring the next 1000 in, use them; etc.
8. Entering a value of 1 for the printer's select code automatically causes the program to skip over the question requesting the printer's bus address.
9. If you choose to check through some examples of random data sets produced by one of the generators, default values are supplied for the parameters. For example, you may see a prompt such as:

   # OF RANDOM DEVIATES IN EACH SET? 100

   If the default number, 100, is acceptable to you, simply press CONTINUE and 100 deviates will be generated in each set. If you wish to have a different number generated, edit the number in the response line before pressing CONTINUE.
10.
If you store a set of random numbers produced by one of the generators, the data set may be read into a statistical data base created by Basic Statistics and Data Manipulation (BSDM) and then accessed by any other statistics routine. To access the data using BSDM, remember that the data was not stored by BSDM. Thus, you will need to supply a name for the data set, a variable name, number of observations, etc.

9826/36 Random Number Generator: RND

This generator uses a standard "multiplicative congruential generator". In this generator, a starting value called the seed is multiplied by a positive integer constant, and the result is taken modulo M:

$$X_{i+1} = A X_i \bmod M$$

The algorithm used in RND has a starting seed of 37480660. This seed may be set by the program to any new value by using the RANDOMIZE statement. In this routine, the value $A = 16807$ is used for the multiplier. The modulus is $M = 2^{31} - 1$.

The steps below generate the next random number in a sequence from the previous one (i.e., the seed) using RND:
1. Multiply the current seed by 16807.
2. Take the result of Step 1 modulo M.
3. Save the result of Step 2 as the new seed.
4. Convert the result of Step 2 to a number between 0 and 1. (Divide by $2^{31} - 1$.)
5. Go to Step 1.

References

1. Camp, Warren V. and Lewis, T.G., "Implementing a Pseudo-Random Number Generator on a Minicomputer", IEEE Transactions on Software Engineering, May, 1977.
2. Knuth, Donald E., The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Addison-Wesley, Reading, Mass., 1969.
3. Learmonth, J. and Lewis, P.A.W., "Naval Postgraduate School Random Number Generator Package LLRANDOM", Naval Postgraduate School, Monterey, Calif., 1973.
4. Learmonth, J. and Lewis, P.A.W., "Statistical Tests of Some Widely Used and Recently Proposed Uniform Random Number Generators", Naval Postgraduate School, Monterey, Calif., 1973.
5.
MacLaren, M.D. and Marsaglia, G., "Uniform Random Number Generators", JACM 12, Jan. 1965, p. 83-89.
6. Marsaglia, G. and Bray, T.A., "One-line Random Number Generators and Their Use in Combinations", CACM, Vol. 11, 1968, p. 757-759.
7. Musyck, E., "Search For a Perfect Generator of Random Numbers", Studiecentrum Voor Kernenergie, E. Plaskylaan 144, Brussels 4, Belgium, January, 1977.
8. Reddy, Y.V., "PL/I Process Generators", SIMULETTER, Vol. III, Oct. 1976, p. 25-29.
9. Wheeler, Robert E., "Random Variable Generators", SIMULETTER, Vol. III, Oct. 1976, p. 16-22.

Random Number Generators

Object of Program

Subprograms with optional drivers are provided to generate random deviates from some standard statistical distributions.

The subprograms have been set up as independent modules, so it is quite simple to use these routines in your own programs. Choose values for the required input parameters, call the subprogram, and the resulting outputs are returned to you. See the General Information section of this manual for detailed instructions.

Optional drivers have also been set up for your use. In general, the drivers:
i) allow you to directly generate a set of deviates to be printed or saved on a mass storage device; and
ii) provide the ability to check out the particular generator through the use of some standard tests in order to get a feel for the quality of the deviates produced.

Typical Program Flow

Choose to check through examples, or choose to consider a specific data set.
   Examples: use default parameter values. Numbers are generated and statistics on the deviates are printed.
   Specific data set: enter parameters. Print out the data set or store the data set.

(RBETA) Random Numbers Generated from a Beta Distribution

Description
Given a Beta distribution with V1 and V2 degrees of freedom, respectively, this subprogram generates a set of random deviates. The probability density function is:

$$f(x) = \frac{x^{V_1/2 - 1}\,(1 - x)^{V_2/2 - 1}}{B(V_1/2,\; V_2/2)} \quad \text{for } 0 \le x \le 1,$$

where B(*,*) is the beta function.
File Name "RBETA"
Calling Syntax CALL Random_beta (N,V1,V2,X(*) )

Input Parameters
N        number of deviates desired.
V1, V2   degrees of freedom on the Beta distribution.

Output Parameters
X(*)     array of dimension (1:N) containing the N deviates.

Algorithm
This routine generates deviates for the beta distribution with v1, v2 degrees of freedom. The method used is valid for both integer and non-integer v1 and v2:
1. Generate uniform random deviates u1 and u2.
2. Set y1 = u1^(2/v1); y2 = u2^(2/v2), repeating this process until finding y1 + y2 <= 1.
3. Then x = y1/(y1 + y2).

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969, p. 115.

(RBINOM) Random Integers Generated From a Binomial Distribution (T,P)

Description
Given that some event occurs with probability P and that we carry out T independent trials, this subprogram generates a set of integers with the binomial distribution (T,P). The probability density function is:

$$f(x) = \binom{T}{x} P^{x} (1 - P)^{T - x} \quad \text{for } x = 0, 1, \ldots, T.$$

File Name "RBINOM"
Calling Syntax CALL Random_binomial (N,P,T,X(*) )

Input Parameters
N        number of deviates.
P        probability of the event occurring.
T        number of independent trials.

Output Parameters
X(*)     array of dimension (1:N) containing integers randomly generated for the number of occurrences.

Algorithm
Given T and P:
1. Set Sum = 0.
2. For I = 1 to T.
3. Generate a uniform random deviate U.
4. If U <= P then Sum = Sum + 1.
5. Next I.
6. The binomial deviate is equal to Sum.

Reference
1. Reddy, Y.V., "PL/I Process Generators", SIMULETTER, Vol. III, Oct. 1976, p. 25-26.

(RCHISQ) Random Numbers From a Chi-square Distribution

Description
Given the number of degrees of freedom and the number of deviates desired, this subprogram generates a set of random numbers with the Chi-square distribution.
The probability density function is:

$$f(x) = \frac{(0.5)^{v/2}\, x^{v/2 - 1}\, \exp(-0.5x)}{G(v/2)} \quad \text{for } x > 0,$$

where v is the degrees of freedom and G(*) is the gamma function.

File Name "RCHISQ"
Calling Syntax CALL Random_chi_sq (N,V,X(*) )

Input Parameters
N        number of deviates desired.
V        degrees of freedom.

Output Parameters
X(*)     array of dimension (1:N) containing the N deviates.

Algorithm
This utility generates random deviates for the Chi-square distribution with v degrees of freedom. For each deviate, if v = 2*k, where k is an integer, set
   x = 2*(y1 + y2 + ... + yk)
where the y's are independent random variables with the exponential distribution, each with mean = 1. If v = 2*k + 1, set
   x = 2*(y1 + y2 + ... + yk) + z^2
where the y's are as before, and z is a random variable independent of the y's, with the normal distribution (mean = 0, standard deviation = 1).

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969, p. 115.

(REXPON) Random Numbers From an Exponential Distribution

Description
Given a mean, which you supply, this subprogram generates a set of exponential deviates. The probability density function is:

$$f(x) = \frac{\exp(-x/\mu)}{\mu} \quad \text{for } x > 0,$$

where $\mu$ is the mean of the distribution (the parameter Mu).

File Name "REXPON"
Calling Syntax CALL Random_expon (N,Mu,X(*) )

Input Parameters
N        number of deviates desired.
Mu       mean of the distribution.

Output Parameters
X(*)     array of dimension (1:N) containing the N deviates.

Algorithm
This routine uses the random minimization method (due to George Marsaglia) to compute an exponentially distributed variable without using the logarithm subroutine. Although this routine takes slightly more space, it is much faster than the traditional algorithm.

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969, p. 114.
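The Chi-square construction described for RCHISQ can be sketched in Python as follows. This is only an illustration under stated assumptions, not the library's HP BASIC code: the function names are hypothetical, Python's `random.Random` stands in for RND, and the mean-1 exponential deviates are drawn with the traditional logarithm method rather than the Marsaglia method that REXPON uses.

```python
import math
import random


def exponential_deviate(rng):
    """Mean-1 exponential deviate via the logarithm method (an assumption;
    REXPON itself avoids the logarithm)."""
    # rng.random() is in [0, 1), so 1 - U is in (0, 1] and the log is defined.
    return -math.log(1.0 - rng.random())


def chi_square_deviate(v, rng):
    """One Chi-square deviate with v degrees of freedom (v a positive integer)."""
    k, odd = divmod(v, 2)
    # x = 2*(y1 + ... + yk) with the y's exponential, mean 1.
    x = 2.0 * sum(exponential_deviate(rng) for _ in range(k))
    if odd:
        # v = 2*k + 1: add the square of an independent standard normal deviate.
        z = rng.gauss(0.0, 1.0)
        x += z * z
    return x


def random_chi_sq(n, v, seed=None):
    """Hypothetical counterpart of Random_chi_sq: returns a list of n deviates."""
    rng = random.Random(seed)
    return [chi_square_deviate(v, rng) for _ in range(n)]
```

Because a Chi-square variable with v degrees of freedom has mean v, the sample mean of a large set of deviates gives a quick sanity check on the construction.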
(RF) Random Numbers Generated From an F-Distribution

Description
Given an F-distribution (variance-ratio distribution) with V1 and V2 being the numerator and denominator degrees of freedom, respectively, this subprogram generates a set of corresponding random deviates. The probability density function is:

$$f(x) = \frac{G(V_1/2 + V_2/2)\,(V_1/V_2)^{V_1/2}\, x^{V_1/2 - 1}}{G(V_1/2)\, G(V_2/2)\, \bigl(1 + (V_1/V_2)\,x\bigr)^{V_1/2 + V_2/2}}$$

for x > 0, V1 and V2 positive integers.

File Name "RF"
Calling Syntax CALL Random_f (N,V1,V2,X(*) )

Input Parameters
N        number of deviates desired.
V1, V2   degrees of freedom on the F-distribution.

Output Parameters
X(*)     array of dimension (1:N) containing the N random numbers.

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969, p. 116.

(RGAMM1) Random Integers Generated From a Gamma (Alpha) Distribution

Description
This subprogram generates a set of Gamma (Alpha) deviates. The probability density function is:

$$f(x) = \frac{x^{\mathrm{Alpha} - 1} \exp(-x)}{G(\mathrm{Alpha})}$$

where Alpha > 0 is the distribution parameter and G(*) is the gamma function.

File Name "RGAMM1"
Calling Syntax CALL Random_gamma1 (N,Alpha,X(*) )

Input Parameters
N        number of random numbers desired.
Alpha    Gamma parameter.

Output Parameters
X(*)     array of dimension (1:N) containing numbers randomly generated with the given Gamma distribution.

(RGAMM2) Random Numbers Generated From a Gamma (A,B) Distribution

Description
This subprogram generates a set of Gamma (A,B) random deviates. The probability density function is:

$$f(x) = \frac{x^{B - 1} \exp(-x/A)}{G(B)\, A^{B}}$$

for x, A and B > 0, where G(*) is the gamma function.

File Name "RGAMM2"
Calling Syntax CALL Random_gamma2 (N,A,B,X(*) )

Input Parameters
N        number of random deviates desired.
A, B     Gamma parameters; B must be an integer.

Output Parameters
X(*)     array of dimension (1:N) containing deviates randomly generated with the Gamma distribution.

Algorithm
1. Given Gamma parameters A and B, generate B independent exponential deviates with mean = A.
2. The corresponding Gamma deviate is equal to the sum of the B exponential deviates.

(RGEOM) Random Integers Generated From a Geometric Distribution

Description
Given that a certain event occurs with probability P, this subprogram generates N random integers with the appropriate Geometric distribution; that is, each random integer represents the number of individual trials needed until the given event first occurs (or between occurrences of the event). The probability density function is:

$$f(x) = P\,(1 - P)^{x - 1} \quad \text{for } x = 1, 2, \ldots$$

File Name "RGEOM"
Calling Syntax CALL Random_geom (N,P,Integer(*) )

Input Parameters
N        number of random integers desired.
P        probability of a given event occurring.

Output Parameters
Integer(*)  array of dimension (1:N) containing integers randomly generated for the number of independent trials needed until the given event occurs.

Algorithm
The probability of the event first occurring on the Rth trial is P*(1-P)^(R-1). A convenient way to generate a variable with this distribution when P is small is to set R equal to the least integer greater than or equal to ln(U)/ln(1-P), where U is a uniformly generated random number.

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, p. 116.

(RLNORM) Random Lognormal Deviates

Description
This subprogram generates a set of random deviates such that the natural logarithm of the deviates follows a normal distribution with mean = Mu and standard deviation = Sigma. The probability density function is:

$$f(x) = \frac{\exp\bigl(-0.5\,[(\ln x - \mathrm{Mu})/\mathrm{Sigma}]^{2}\bigr)}{x\,(2\pi)^{0.5}\,\mathrm{Sigma}}$$

File Name "RLNORM"
Calling Syntax CALL Random_lognorm (N,Mu,Sigma,X(*) )

Input Parameters
N        number of deviates desired.
Mu       mean of the associated normal distribution.
Sigma    standard deviation of the associated normal distribution.
Output Parameters
X(*)     array of dimension (1:N) containing the N lognormal deviates.

Algorithm
1. Let S = log[(Sigma^2)/(Mu^2) + 1].
2. Let U = log(Mu) - 0.5*S.
3. Generate a normal deviate A, with mean = U and standard deviation = square root of S.
4. Then the lognormal deviate is equal to exp(A).

Reference
1. Reddy, Y.V., "PL/I Process Generators", SIMULETTER, Vol. III, Oct., 1976, p. 27.

(RNEGBI) Random Numbers Generated From a Negative Binomial Distribution

Description
This subprogram generates a set of Negative Binomial random deviates; that is, each random integer represents the number of trials needed until a given event occurs R times. The probability density function is:

$$f(x) = \binom{x - 1}{R - 1} P^{R} (1 - P)^{x - R}$$

for $0 \le P \le 1$, and x = 1, 2, ....

File Name "RNEGBI"
Calling Syntax CALL Random_neg_bin (N,R,P,X(*) )

Input Parameters
N        number of random integers desired.
R        failure value.
P        probability.

Algorithm
1. Given parameters R and P, generate R random geometric deviates with parameter P.
2. The corresponding Negative Binomial deviate is equal to the sum of the R geometric deviates.

Reference
1. Wheeler, R.E., "Random Variable Generators", SIMULETTER, Vol. IV, April, 1973, p. 22.

(RNORM) Normal Random Deviates With Mean = 0 And Standard Deviation = 1

Description
This subprogram calculates an even number of normally distributed variables with mean = 0 and standard deviation = 1. The probability density function is:

$$f(x) = \frac{\exp(-0.5\,x^{2})}{(2\pi)^{0.5}}$$

File Name "RNORM"
Calling Syntax CALL Random_normal (N,X(*) )

Input Parameters
N        number of normal deviates desired. N must be even.

Output Parameters
X(*)     array of dimension (1:N) containing the N normal deviates.

Algorithm
This utility generates random deviates for the normal distribution with mean = 0 and standard deviation = 1. An adapted form of the Polar Method is used. (See Reference 1.)

Special Considerations
1.
Due to the nature of the algorithm used, this routine generates an even number of normal deviates. If an odd number is requested, an error message is printed and the routine must be re-entered.
2. This method is rather slow, but it has essentially perfect accuracy and takes a minimum of storage space.

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969, p. 104.

(RNORM1) Normal Random Deviates With Specified Mean and Standard Deviation

Description
This subprogram generates a set of normal random deviates with mean = Mu and standard deviation = Sigma. The probability density function is:

$$f(x) = \frac{\exp\bigl[-(x - \mathrm{Mu})^{2} / (2\,\mathrm{Sigma}^{2})\bigr]}{(2\pi)^{0.5}\,\mathrm{Sigma}}$$

where Sigma > 0.

File Name "RNORM1"
Calling Syntax CALL Random_normal1 (N,Mu,Sigma,X(*) )

Input Parameters
N        number of deviates desired.
Mu       assume a normal distribution with mean = Mu.
Sigma    assume a normal distribution with standard deviation = Sigma.

Output Parameters
X(*)     array of dimension (1:N) containing the N normal deviates.

Algorithm
Given a mean = u and standard deviation = s,
1. Generate a deviate x with a normal distribution with mean = 0 and standard deviation = 1.
2. Then y = u + s * x.

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969, p. 113.

(RNORM2) Dependent Normally Distributed Random Variables (Bivariate Normal Deviates)

Description
This subprogram generates two dependent random variables which have a bivariate normal distribution with marginal means = Mu1, Mu2, marginal standard deviations = Sigma1, Sigma2, and correlation coefficient = Rho.

File Name "RNORM2"
Calling Syntax CALL Random_normal2 (Mu1,Mu2,Sigma1,Sigma2,Rho,X1(*),X2(*) )

Input Parameters
Mu1, Mu2         marginal means.
Sigma1, Sigma2   marginal standard deviations.
Rho              correlation coefficient.
Output Parameters
X1(*), X2(*)     two vectors of dependent normally distributed random variables.

Algorithm
If x1 and x2 are independent normal deviates with mean = 0 and standard deviation = 1, and if
   y1 = Mu1 + Sigma1*x1, and
   y2 = Mu2 + Sigma2*(Rho*x1 + SQR(1 - Rho^2)*x2)
then y1 and y2 are dependent random variables, normally distributed with means Mu1, Mu2 and standard deviations Sigma1 and Sigma2, and with correlation coefficient Rho.

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969, p. 113.

(RPAR1) Random Pareto Generator Of The First Kind

Description
This program generates sets of random Pareto deviates of the first kind. The probability density function is defined as follows:

$$f(x) = \frac{N A^{N}}{x^{N + 1}} \quad \text{for } x > A$$

File Name "RPAR1"
Calling Syntax CALL Random_pareto1 (Number,A,N,X(*) )

Input Parameters
Number   number of random deviates desired.
A, N     Pareto parameters.

Output Parameters
X(*)     array of dimension (1:Number) containing the Pareto deviates of the first kind.

Algorithm
1. Given parameters A and N, generate a uniform deviate U.
2. Then the Pareto deviate is equal to: A/(1-U)^(1/N).

(RPAR2) Random Pareto Generator Of The Second Kind

Description
This program generates sets of random Pareto deviates of the second kind. The probability density function is defined as follows:

$$f(x) = \frac{N B^{N}}{(B + x)^{N + 1}} \quad \text{for } x > 0.$$

File Name "RPAR2"
Calling Syntax CALL Random_pareto2 (Number,B,N,X(*) )

Input Parameters
Number   number of random deviates desired.
B, N     Pareto parameters.

Output Parameters
X(*)     array of dimension (1:Number) containing the Pareto deviates of the second kind.

Algorithm
1. Given parameters B and N, generate a uniform deviate U.
2. Then the Pareto deviate is equal to: B/(1-U)^(1/N) - B.

(RPOISS) Random Integers Generated From A Poisson Distribution

Description
This subprogram generates a set of Poisson deviates with a specified mean.
The probability density function is:

$$f(x) = \frac{\exp(-\mathrm{Mu})\, \mathrm{Mu}^{x}}{x!} \quad \text{for } x = 0, 1, \ldots,$$

where Mu is the mean of the distribution, and Mu > 0.

File Name "RPOISS"
Calling Syntax CALL Random_poisson (N,Mu,X(*) )

Input Parameters
N        number of random integers desired.
Mu       mean of the Poisson distribution.

Output Parameters
X(*)     array of dimension (1:N) containing integers randomly generated with the given Poisson distribution.

Algorithm
Given a mean of the distribution Mu,
1. Set: P = exp(-Mu), N = 0, Q = 1.
2. Generate a random variable U, uniformly distributed between 0 and 1.
3. Set: Q = Q*U.
4. If Q > P, then set N = N + 1 and return to step 2. Else, terminate the algorithm with output N.

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969, p. 116.

(RSPHER) Random Points on an M-dimensional Sphere of Radius One

Description
This subprogram generates a set of random points on an M-dimensional sphere of radius one.

File Name "RSPHER"
Calling Syntax CALL Random_sphere (N,M,X(*) )

Input Parameters
N        number of random points desired.
M        number of dimensions of the sphere.

Output Parameters
X(*)     array of dimension (1:N) containing the N random points.

Algorithm
1. Let X1, X2, ..., Xm be independent normal deviates (mean = 0, standard deviation = 1).
2. Let R = SQR(X1^2 + X2^2 + ... + Xm^2).
3. Then the point (X1/R, X2/R, ..., Xm/R) is a random point on the M-dimensional sphere of radius one.

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969, p. 116.

(RSUPER) Super Uniform Random Number Generator

Description
Given methods for generating two random sequences, this shuffling algorithm successively outputs the terms of a 'considerably more random' sequence.
This routine uses RND twice to generate 'super' random numbers and, due to the slow execution speed, should be used only in cases where no regular random number generator will do. The probability density function is:

$$f(x) = 1 \quad \text{for } 0 \le x \le 1$$

File Name "RSUPER"
Calling Syntax CALL Random_super (N,X(*) )

Input Parameters
N        number of random deviates desired.

Output Parameters
X(*)     array of dimension (1:N) containing N uniformly generated random numbers on the range (0,1).

Algorithm
This method was suggested by Bays and Durham (see Reference 1). Given methods for generating two pseudo-random sequences Xn and Yn, this routine will output terms of a 'considerably more random' sequence. A temporary table V(1:107) is used in the generation of the output sequence.
1. Fill table V with the first 107 elements of sequence Xn.
2. Set X, Y equal to the next numbers of the sequences Xn, Yn, respectively.
3. Set J = INT(107*Y + 1).
4. Output V(J) and set V(J) = X. Go to step 2.

In our routine, both sequences Xn and Yn are generated using RND. Knuth contends that the sequence obtained by applying this algorithm will satisfy virtually anyone's requirements for randomness in a computer-generated sequence.

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms, Second Edition. Reading, Mass.: Addison-Wesley, 1969, 1981.

Special Considerations
1. As a result of our own tests, this generator comes highly recommended. It performed extremely well on all of our tests of randomness. It is approximately three times as slow as RND alone, and it requires an extra 856 or so bytes for storage of the temporary array.
2. In using this routine, it is suggested that as many random deviates as possible be generated on one call. Each time the subprogram is entered, 107 new table values are created.
3.
If you are interested in repeatability of an experiment, remember that initial seeds must be set for RND (using RANDOMIZE).
4. If you plan on calling this routine a large number of times, a significant amount of time would be saved if the table V is set up once in your calling routine and then passed as an additional parameter to Random_super. This avoids the overhead of rebuilding the table each time you enter the routine.

(RT) Random Numbers Generated From A T-Distribution

Description
This subprogram generates a set of random deviates for a T-distribution with V degrees of freedom. The probability density function is:

$$f(x) = \frac{G\bigl((V + 1)/2\bigr)}{G(V/2)\,(V\pi)^{0.5}\,\bigl(1 + x^{2}/V\bigr)^{(V + 1)/2}} \quad \text{for } V = 1, 2, \ldots$$

File Name "RT"
Calling Syntax CALL Random_t (N,V,X(*) )

Input Parameters
N        number of random deviates desired.
V        degrees of freedom.

Output Parameters
X(*)     array of dimension (1:N) containing the N random deviates.

Algorithm
1. Let y1 be a normal deviate (mean = 0, standard deviation = 1).
2. Let y2 be independent of y1, having the Chi-square distribution with v degrees of freedom.
3. Then x = y1/SQR(y2/v) has the T distribution with v degrees of freedom.

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969, p. 116.

(RT1EXT) Random Type I Extreme-Value Generator

Description
This program generates sets of random Type I Extreme-Value deviates. The cumulative distribution function is defined as follows:

$$F(x) = \exp\bigl(-\exp[-\mathrm{Alpha}\,(x - \mathrm{Mu})]\bigr)$$

File Name "RT1EXT"
Calling Syntax CALL Random_type1ext (Number,Alpha,Mu,X(*) )

Input Parameters
Number     number of random deviates desired.
Alpha, Mu  Type I parameters.

Output Parameters
X(*)       array of dimension (1:Number) containing the Type I deviates.

Algorithm
1. Given parameters Alpha and Mu, generate a uniform deviate U.
2. Then the Type I deviate is equal to: -log[-log(U)]/Alpha + Mu.
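The RT1EXT step is an inverse-transform: setting the cumulative distribution function equal to U and solving for x gives the expression in step 2. A minimal Python sketch of this, with hypothetical function names and Python's `random.Random` standing in for RND, might look like:

```python
import math
import random


def type1_extreme_cdf(x, alpha, mu):
    """Cumulative distribution function F(x) = exp(-exp(-alpha*(x - mu)))."""
    return math.exp(-math.exp(-alpha * (x - mu)))


def type1_extreme_deviate(u, alpha, mu):
    """Invert F at a uniform value u in (0, 1): x = -log(-log(u))/alpha + mu."""
    return -math.log(-math.log(u)) / alpha + mu


def random_type1ext(number, alpha, mu, seed=None):
    """Hypothetical counterpart of Random_type1ext: returns `number` deviates."""
    rng = random.Random(seed)
    deviates = []
    while len(deviates) < number:
        u = rng.random()
        # Skip u == 0.0, for which log(u) is undefined.
        if u > 0.0:
            deviates.append(type1_extreme_deviate(u, alpha, mu))
    return deviates
```

Feeding a deviate back through `type1_extreme_cdf` should return the uniform value it was built from, which is a convenient check that the transform and the distribution function agree.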
(RT2EXT) Random Type II Extreme-Value Generator

Description
This program generates sets of random Type II Extreme-Value deviates. The cumulative distribution function is defined as follows:

$$F(x) = \exp\bigl[-(V/x)^{K}\bigr]$$

File Name "RT2EXT"
Calling Syntax CALL Random_type2ext (Number,V,K,X(*) )

Input Parameters
Number   number of random deviates desired.
V, K     Type II parameters.

Output Parameters
X(*)     array of dimension (1:Number) containing the Type II deviates.

Algorithm
1. Given parameters V and K, generate a uniform deviate U.
2. Then the Type II deviate is equal to: V*[-log(U)]^(-1/K).

(RUNIF) Uniform Random Number Generator

Description
This program generates sets of uniform random numbers. The probability density function is:

$$f(x) = 1 \quad \text{for } 0 \le x \le 1$$

Calling Syntax CALL Random_uniform (N,X(*) )

Input Parameters
N        number of random deviates desired.

Output Parameters
X(*)     array of dimension (1:N) containing N uniformly generated random numbers on the range (0,1).

(RWEIBU) Random Integers Generated From a Weibull Distribution

Description
This subprogram generates a set of Weibull deviates. The cumulative distribution function is:

$$F(x) = 1 - \exp\bigl[-x^{\mathrm{Beta}} / \mathrm{Alpha}\bigr]$$

File Name "RWEIBU"
Calling Syntax CALL Random_weibull (N,Alpha,Beta,X(*) )

Input Parameters
N            number of random deviates desired.
Alpha, Beta  Weibull parameters.

Output Parameters
X(*)         array of dimension (1:N) containing deviates randomly generated with the given Weibull distribution.

Reference
1. Wheeler, R.E., "Random Variable Generators", SIMULETTER, Vol. IV, April 1973, p. 22.

Tests for Randomness

Object of Programs

A standard set of statistical tests for randomness is provided. These tests are designed as independent subprograms with optional drivers. The driver programs have been set up to test the binary random number generator RND for randomness. The aim here is twofold:
i) to actually allow you to check the randomness of RND; and
ii) to show you how a typical test might be set up.
(TCHISQ) Chi-square Test

Description
This subprogram performs a Chi-square test on a set of observations placed in a set of categories with given probabilities.

File Name "TCHISQ"
Calling Syntax CALL Chi_sq_test (N,Cats,Prob(*),Obs(*),V,P)

Input Parameters
N        number of observations. This should be at least 5*Cats, but preferably much larger, for a valid test.
Cats     number of categories.
Prob(*)  array of dimension (1:Cats) containing the probabilities of any event occurring in a particular category. Care must be taken to ensure that no probability value is too small.
Obs(*)   array of dimension (1:Cats) containing the number of observations occurring in each category.

Output Parameters
V        Chi-square statistic. V is expected to have the Chi-square distribution with (Cats-1) degrees of freedom.
P        right-tailed probability; Prob (X>V).

Special Considerations
1. The Chi-square method can only be used with sets of independent observations.
2. The proper choice of N is somewhat obscure. Large values of N will tend to smooth out 'locally' non-random behavior, that is, blocks of numbers with a strong bias followed by blocks of numbers with the opposite bias. But N should be large enough that the expected value N*Prob(I) >= 5 for every category I. Preferably, N should be taken much larger than this. So, the method should probably be used with a number of different values of N.
3. From the Chi-square formula, we can see that a very small probability value would severely influence the Chi-square statistic. Hence, it is suggested that categories with very small probabilities be grouped together into one larger category.
4. You must supply the routine with the number of categories into which the data is to be partitioned. For example, to check the randomness of the first digit, ten categories will be sufficient. To check the first two digits, 100 categories are recommended.
Algorithm
A fairly large number, N, of independent observations is made. We count the number of observations falling into each of K categories, and compute the quantity

    V = (1/N) * SUM[I = 1 to K] ( Observed(I)^2 / Prob(I) ) - N

In the associated driver program, the right-tailed probability P(X>V) is then calculated using (K-1) as the number of degrees of freedom.

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969, p. 35-40.

(TKS) Kolmogorov-Smirnov Test

Description
Given a continuous cumulative distribution function F(X), this subprogram calculates the standard Kolmogorov-Smirnov statistics of maximum deviation.

File Name
"TKS"

Calling Syntax
CALL K_s_test (N,Knp,Knn)

Input Parameters
N      number of observations.
The distribution function F(X) must be provided as an in-line function to the subprogram.

Output Parameters
Knp    positive K-S statistic.
Knn    negative K-S statistic.

Algorithm
Given a distribution function F(x) = probability that (X <= x) for a random variable X, the statistics Knp (Kn positive) and Knn (Kn negative) can be obtained as follows:
1. Obtain the observations x1, x2, ..., xn.
2. Sort the observations: x1 <= x2 <= ... <= xn.
3. Knp = SQR(n) * maximum of [j/n - F(xj)] where 1 <= j <= n.
   Knn = SQR(n) * maximum of [F(xj) - (j-1)/n] where 1 <= j <= n.

Special Considerations
1. The method used in the driver program (using several tests for moderately sized N, then combining the observations later in another K-S test) tends to detect both local and global nonrandom behavior.

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969, p. 41-48.

(TMAXT) Maximum of T Test

Description
This routine generates groups of uniform random numbers, finds the maximum of each group and then applies the Kolmogorov-Smirnov test to the resulting set of numbers.
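The procedure just described, group maxima followed by the K-S statistics of the TKS section, can be sketched in Python. This is a sketch under stated assumptions: the function names are hypothetical, Python's random module stands in for RND, and the distribution function of the maximum of t uniform deviates is taken to be F(x) = x^t.

```python
import math
import random

def ks_statistics(xs, cdf):
    """Return (Knp, Knn) for observations xs against distribution function cdf."""
    n = len(xs)
    ordered = sorted(xs)
    knp = math.sqrt(n) * max(j / n - cdf(x) for j, x in enumerate(ordered, start=1))
    knn = math.sqrt(n) * max(cdf(x) - (j - 1) / n for j, x in enumerate(ordered, start=1))
    return knp, knn

def max_of_t(n, t, rng):
    """Take the maximum of each of n groups of t uniforms, then apply the K-S test."""
    maxima = [max(rng.random() for _ in range(t)) for _ in range(n)]
    return ks_statistics(maxima, lambda x: x ** t)
```

A call such as max_of_t(100, 5, random.Random(1)) returns the pair (Knp, Knn) for 100 groups of 5 numbers. Both statistics are non-negative by construction, because the j = n term of Knp and the j = 1 term of Knn can never be negative.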
File Name
"TMAXT"

Calling Syntax
CALL Max_of_t (N,T,Knp,Knn)

Input Parameters
N      number of groups to be tested.
T      size of each group.

Output Parameters
Knp    positive Kolmogorov-Smirnov statistic.
Knn    negative Kolmogorov-Smirnov statistic.

Algorithm
For 0 <= j < n, let V(j) = max(U(t*j), U(t*j+1), ..., U(t*j+t-1)), where the U's are uniformly distributed random numbers.
Now apply the Kolmogorov-Smirnov test to the sequence V(0), V(1), ..., V(n-1), with the distribution function F(x) = x^t, (0 <= x <= 1).

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969, p. 64.

(TPOKER) Modified Poker Test

Description
This subprogram calculates the number of distinct values in a given set of observations. A Chi-square test is then applied to the set of data.

File Name
"TPOKER"

Calling Syntax
CALL Poker_test (K,N,Digits,V,P)

Input Parameters
K         number of possible different digits in a set. The number of degrees of freedom is then (K-1). A reasonable number here is 5.
N         number of test sets to be used. N should be at least 5*(K-1), but preferably much larger, for a valid Chi-square test.
Digits    range on the allowed digits, [0,Digits-1]; 13 or 10 would be reasonable values here.

Output Parameters
V         Chi-square statistic. V is expected to have the Chi-square distribution with (K-1) degrees of freedom.
P         right-tailed probability; Prob (X>V).

Algorithm
In general, we look at n groups of k successive numbers. We count the number of k-tuples with r different values. For example, generate 1000 groups of 5 successive numbers, where the numbers range from 1 to 13. How many sets have all 5 numbers different? How many have 4 different? How many 3? 2? 1?
A Chi-square test is then made, using the probability

    P(r) = [ d*(d-1)*...*(d-r+1) / d^k ] * S(k,r)

where d is the number of possible digits considered and S(k,r) is the Stirling number of the second kind for k and r.
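The category probabilities P(r) can be tabulated with a short Python sketch. The function names are hypothetical and the code is illustrative only; S(k,r) is computed from the standard recurrence S(k,r) = r*S(k-1,r) + S(k-1,r-1).

```python
def stirling2(k, r):
    """Stirling number of the second kind: ways to split k items into r non-empty sets."""
    table = [[0] * (r + 1) for _ in range(k + 1)]
    table[0][0] = 1
    for i in range(1, k + 1):
        for j in range(1, min(i, r) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[k][r]

def poker_probabilities(d, k):
    """Return [P(1), ..., P(k)]: chance that a k-tuple of d digits has r distinct values."""
    probs = []
    falling = 1  # running product d*(d-1)*...*(d-r+1)
    for r in range(1, k + 1):
        falling *= d - r + 1
        probs.append(falling * stirling2(k, r) / d ** k)
    return probs
```

With d = 10 and k = 5 this gives P(5) = 0.3024 and P(4) = 0.504, and the five probabilities sum to 1. The very small value P(1) = 0.0001 shows why the Chi-square section recommends grouping rare categories together.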
Special Considerations
You will be required to enter a starting and ending value for the number of groups desired, as well as the increment between values. At each value, three independent tests are run.

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969, p. 57-58.

(TRUNS) Runs Test

Description
This subprogram sets up N random numbers and calculates the number of ascending or descending runs in the sequence. A special Chi-square statistic is then produced.

File Name
"TRUNS"

Calling Syntax
CALL Runs_test (N,Direction,V,P)

Input Parameters
N            number of random deviates used. The value of N should be 4000 or more.
Direction    Direction = 1 means an ascending run.
             Direction = -1 means a descending run.

Output Parameters
V            Chi-square statistic. Since adjacent runs are not independent, a standard Chi-square test cannot be used here. A special test, with six degrees of freedom, is used instead.
P            right-tailed probability; Prob (X>V).

Algorithm
In this algorithm, we examine the length of monotone subsequences of an original sequence of random numbers; that is, segments which are increasing or decreasing.
1. Calculate the increasing (or decreasing) run lengths and count how many runs have length 1, 2, ..., 6 or greater.
2. Since adjacent runs are not independent, we cannot apply a standard Chi-square test to the above data. Instead, we calculate a special statistic V (see Ref. 1, p. 61) which should have the Chi-square distribution with six degrees of freedom when N is large. The value of N should be at least 4000 for a valid test. This test may also be used for decreasing runs.

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969, p. 60-61.

(TSERAL) Serial Test

Description
This subprogram tests whether pairs of successive numbers are uniformly distributed in an independent manner.
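The idea behind the serial test can be sketched in Python as follows. This is an illustrative sketch with hypothetical names, not the library routine. It assumes that each uniform deviate u is mapped to an integer in 0, 1, ..., d-1 by truncating d*u, that non-overlapping pairs are counted, and that the d*d pair categories are equally likely.

```python
def serial_test_statistic(us, d):
    """Count non-overlapping pairs of digits and return (counts, V)."""
    ys = [min(int(d * u), d - 1) for u in us]  # integers 0..d-1
    pairs = len(ys) // 2
    counts = [0] * (d * d)
    for j in range(pairs):
        q, r = ys[2 * j], ys[2 * j + 1]
        counts[q * d + r] += 1  # category index for the pair (q, r)
    p = 1.0 / (d * d)  # every pair category has the same probability
    v = sum(c * c / p for c in counts) / pairs - pairs
    return counts, v
```

The value V would then be compared with the Chi-square distribution on d*d - 1 degrees of freedom. A strongly patterned input such as 0.1, 0.6, 0.1, 0.6 with d = 2 puts every pair in one category and gives V = 6.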
File Name
"TSERAL"

Calling Syntax
CALL Serial_test (N,D,D_squared,V,P)

Input Parameters
N            number of uniform random numbers to be tested.
D            number of digits permitted; 5 or 10 is a reasonable number here.
D_squared    D*D; this must be passed as a parameter to allow for dynamic allocation of arrays.

Output Parameters
V            Chi-square statistic. V is expected to have the Chi-square distribution with (D*D-1) degrees of freedom.
P            right-tailed probability; Prob (X>V).

Algorithm
Given
n = total number of uniform random numbers.
d = number of digits permitted; that is, the deviates created are used to create integers 1, 2, ..., d.
y(j) = jth random integer.
Then for each pair of integers (q,r) with 0 <= q, r < d, count the number of times the pair (y(2j), y(2j+1)) = (q,r) occurs, for 0 <= j < n. Finally, apply the Chi-square test to these k = d*d equi-probable categories with probability 1/(d*d) in each case.

Special Considerations
1. The number of digits permitted may be chosen as any convenient number. But care must be taken, since a valid Chi-square test should have n large compared to k; that is, n > 5*d*d at least. So,
   if d = 10 then n > 500
   if d = 20 then n > 2000
   etc.
2. This test may easily be adapted to triples, quadruples, etc., instead of pairs. But the value of d must be severely limited in order to avoid having too many categories. Frequently, in this case, less exact tests, such as the poker test or the maximum of t test, are used instead.

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969, p. 55-66.

(SPCTRL) Spectral Test

Description
This test is used in theoretically determining the value of the coefficient A, given the word size of the computer, M, in the linear congruential model described in the General Information section of this manual. The value of A is crucial in setting up a good uniform random number generator.
This is by far the most powerful test currently available on any sized machine. It tends to measure the statistical independence of adjacent n-tuples of numbers and is generally applied for N = 2, 3, 4 and perhaps a few higher values of N.

File Name
"SPCTRL"

Calling Syntax
CALL Spectral (A,M,N,Info,Q,V,Cn)

Input Parameters
A       the multiplier to be tested. It is essential that the linear congruential sequence be of maximal period.
M       modulus used in the model; in our case, M <= 2^49 - 1.
N       size of n-tuple to be measured. This test is generally applied for N = 2, 3, 4 and perhaps a few higher values of N.
Info    controls optional printing of intermediate information on program execution: a message each time a particular section of code is entered, as well as the total number of iterations required for convergence.
        Info = 1 ==> print out intermediate information.
        Info = 0 ==> do not print out the information.

Output Parameters
Q       V^2, equals the wave number squared.
V       smallest non-zero wave number in the spectrum.
Cn      = ( PI^(N/2) * V^N ) / ( (N/2)! * M )

Special Considerations
1. Since BASIC string routines are used to perform the multi-precision arithmetic, this program is very slow.
2. The subprogram allows at most 12 digits for A and M. If larger numbers are desired, some parameters must be changed to strings before entering the routine.
   Change:
      SUB Spectral (A,M,N,Info,Q,V,Cn)
      DIM
      Coef$ = VAL$(A)
      CALL Clean-up (Coef$)
      Base$ = VAL$(M)
      CALL Clean-up (Base$)
   To:
      SUB Spectral (Coef$,Base$,N,Info,Q,V,Cn)
3. As suggested in the literature, the driver has been set up for N = 2, 3, 4, 5, 6.
4. The multi-precision arithmetic routines are set up as independent subprograms so that the user may apply them to other contexts as well. Presently, each of these routines allows for up to 90 digits of accuracy. This can be increased simply by changing the DIM statements at the beginning of each routine.

Note
This test is quite slow. It is not unusual for it to run for a couple of hours with one pair.
5. The program has been set up with n-tuples of size 2, 3, 4, 5 and 6. For each of these values, the quantity Cn is calculated. Large values of Cn correspond to randomness; small values correspond to nonrandomness. Knuth suggests that the multiplier A passes the spectral test if the Cn values are all greater than or equal to 0.1, and that it passes the test with flying colors if all are greater than or equal to 1.

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969, p. 69-100.

Elementary Sampling Techniques

Object of Programs
This section provides some elementary sampling and shuffling techniques. Independent subprograms with optional driver routines are provided.

(SSEL) Selection Sampling

Description
Given a set of N objects, this program will select n of them at random in an unbiased manner (a simple random sample without replacement).

File Name
"SSEL"

Calling Syntax
CALL Sel_sampling (T_number,S_number,X(*))

Input Parameters
T_number    total number of records in the set.
S_number    number of records to be selected.

Output Parameters
X(*)        array of size (1:N) containing the index numbers of the records to be sampled.

Algorithm
To select n records at random from a set of N, where 0 < n <= N:
1. Set t = 0, m = 0.
2. Generate a random number U, uniformly distributed between zero and one.
3. If (N-t)*U >= (n-m), then go to step 5. Else go to step 4.
4. Select the next record index for the sample.
   m = m+1.
   t = t+1.
   If m < n then go to step 2. Else the sample is complete and the algorithm terminates.
5. Skip the next record index.
   t = t+1.
   Go to step 2.

Special Considerations
1. In order to avoid connections between samples obtained on different runs, care must be taken to use different starting seeds each time this program is run. RND (using RANDOMIZE) allows for this. The seed can either be initialized in the calling program or in the subprogram itself.
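The five steps of the selection sampling algorithm, together with a caller-supplied seed as recommended in the consideration above, can be sketched in Python. The function name is hypothetical and Python's random.Random stands in for RND and RANDOMIZE; this is an illustration, not the library's code.

```python
import random

def select_sample(total, wanted, rng):
    """Return `wanted` record indices (1-based, ascending) chosen from 1..total."""
    t = 0  # records examined so far
    m = 0  # records selected so far
    chosen = []
    while m < wanted:
        u = rng.random()
        if (total - t) * u >= (wanted - m):
            t += 1  # step 5: skip this record
        else:
            chosen.append(t + 1)  # step 4: select this record
            m += 1
            t += 1
    return chosen
```

A call such as select_sample(10, 4, random.Random(6190947)) uses a seed supplied by the calling program, so different runs can be given different seeds. The loop always terminates: once the number of remaining records equals the number still needed, the test in step 3 can no longer send the algorithm to the skip branch, because U is less than one.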
A simple way of initializing different seeds for different runs is to use the digits from the month, day, and time at which the program is run as the seed. For example, if you are running the program on June 19 at 9:47 am, then your seed would be 6190947.

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969, p. 122.

(SSHUFL) Shuffling

Description
Given an array of numbers, this program randomly shuffles the array.

File Name
"SSHUFL"

Calling Syntax
CALL Sshuffle (N,X(*))

Input Parameters
N       number of digits in the array to be shuffled.
X(*)    array of dimension (1:N) containing the digits to be shuffled.

Output Parameters
X(*)    array of dimension (1:N) containing the shuffled digits.

Algorithm
Let X1, X2, ..., Xt be a set of t numbers to be shuffled.
1. Set: j = t.
2. Generate a random number U, uniformly distributed between zero and one.
3. Set: k = greatest integer in [j*U + 1]. Hence, k is a random integer between 1 and j. Exchange Xk and Xj.
4. j = j-1. If j > 1 then return to step 2. Else the algorithm terminates at this point.

Reference
1. Knuth, Donald E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969, p. 124-125.

Appendix

Changes Necessary For Larger Data Sets

CAUTION
INCREASING THE SIZE OF THE DATA SET MAY CAUSE A PROBLEM. THERE MAY NOT BE ENOUGH ROOM ON THE PROGRAM DISC TO STORE THE ENLARGED DATA SET. TO FIND OUT, PROCEED AS FOLLOWS.

A. Perform the following check on each of your program tapes or discs (excluding Monte Carlo Random Number Generator):
1. Make sure nothing of value is in the scratch file "DATA". If there is, use the STORE routine to save it.
2. Type: PURGE "DATA"
3. Press: EXECUTE
4. Type: CREATE BDAT "DATA", 2+(8*n) DIV 1280,1280
   where n is the maximum number of data values you wish to use in the statistics routines (and is equal to the number of variables times the number of observations per variable).
5. Press: EXECUTE
In addition, follow the above procedure for the file named "BACKUP" on Basic Statistics and Data Manipulation. If you obtain an error using the above procedure on any of the program tapes or discs, you must transfer all data to a larger medium in order to expand the data set.

B. Make the following change to Basic Statistics and Data Manipulation:
1. Type: LOAD "FILE1"
2. Press: EXECUTE
3. Type: EDIT 80
4. Press: EXECUTE
5. By editing, make the line read Mno = n, where n is the maximum number of data values you wish to use in the statistics routines. This must be less than or equal to 1500.
6. Press: ENTER
7. Press: shift RESET
8. Type: PURGE "FILE1"
9. Press: EXECUTE
10. Type: STORE "FILE1"
11. Press: EXECUTE

Note
Maximum number of variables is 50 and cannot be changed by the user.

Statistics Library Data Formats

The following is a description of the data format used in the Statistics Library. Also included is an explanation of the steps you need to perform to have a program create data compatible with the library.

Method 1
Numeric Data Only

If you wish to have another program write a data file that is compatible with the library, it is important to note that the actual numeric data could be written in one of two forms:

[Diagram: the data matrix arranged either with the variables V1, V2, ..., Vp as rows and the observations O1, O2, ..., ON as columns, or with the observations as rows and the variables as columns.]

The statistics library will prompt you for additional information such as sample size (n), number of variables (p), title of the data set, and names of the variables.

The statements needed to store the data are as follows:

    05 OPTION BASE 1
    10 P=3                       ! P = no. of variables
    20 N=10                      ! N = no. of observations
    30 ALLOCATE X(P,N)           ! THIS COULD BE X(N,P)
    40 !
    50 ! Put data into matrix X
    60 !
    70 CREATE BDAT "FILE1",INT((8*P*N)/1280)+2,1280   ! 8 bytes per entry and 1280 bytes per logical record
    80 ASSIGN @File1 TO "FILE1"
    90 OUTPUT @File1;X(*)
    100 ASSIGN @File1 TO *
    110 END

Method 2
Numeric Data and Descriptive Data

If you wish to have another program write a data file that is compatible with the library, and you wish to have it store descriptive information as well, you need to prepare the file in a slightly different manner. The following data is stored in record 1 of the data file:

    Data set title               T$[80]
    Number of observations       No
    Number of variables          Nv             (max. is 50)
    Variable names               Vn$(50)[10]
    Number of subfiles           Ns             (max. is 20)
    Subfile names                Sn$(20)[10]
    Subfile characterizations    Sc(20)

Note
No, Nv, Ns, and the array Sc(*) should be declared in real precision.

Starting with record 2, the Statistics Library expects to find the data array. The statements needed to store the data are as follows:

    05 OPTION BASE 1
    10 P=3                       ! P = no. of variables
    20 N=10                      ! N = no. of observations
    30 ALLOCATE X(P,N)
    35 DIM T$[80],Vn$(50)[10],Sn$(20)[10],Sc(20)
    40 !
    50 ! Put data into matrix X and descriptive data into other variables
    60 !
    70 CREATE BDAT "FILE1",INT((8*P*N)/1280)+2,1280
    80 ASSIGN @File1 TO "FILE1"
    85 OUTPUT @File1,1;T$,No,Nv,Vn$(*),Ns,Sn$(*),Sc(*)   ! Write record 1
    90 OUTPUT @File1,2;X(*)                               ! Write records 2,3,...
    100 ASSIGN @File1 TO *
    110 END

When you use this format and the Statistics Library asks you the question, "Was the data stored by the BS&DM system?", answer Yes. This tells the library to expect the header record as record #1.
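The record count passed to CREATE BDAT in the listings of this appendix is simple integer arithmetic, and it can be checked before running the BASIC program. The Python sketch below is illustrative only; the function name and its default arguments are hypothetical and simply mirror the 8 bytes per value and 1280 bytes per logical record stated in the listing.

```python
def bdat_records(variables, observations, bytes_per_value=8, record_bytes=1280):
    """Number of logical records requested by INT((8*P*N)/1280)+2."""
    data_bytes = bytes_per_value * variables * observations
    return data_bytes // record_bytes + 2
```

For the example in the listing (P = 3, N = 10) the data occupy 240 bytes, so 2 records are requested. For the largest data set allowed by the appendix, 1500 values, the result is 11 records.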
Statistical Tables

Quantiles of the Spearman Test Statistic (a)

     n    p = .900    .950    .975    .990    .995    .999
     4       .8000   .8000
     5       .7000   .8000   .9000   .9000
     6       .6000   .7714   .8286   .8857   .9429
     7       .5357   .6786   .7450   .8571   .8929   .9643
     8       .5000   .6190   .7143   .8095   .8571   .9286
     9       .4667   .5833   .6833   .7667   .8167   .9000
    10       .4424   .5515   .6364   .7333   .7818   .8667
    11       .4182   .5273   .6091   .7000   .7455   .8364
    12       .3986   .4965   .5804   .6713   .7273   .8182
    13       .3791   .4780   .5549   .6429   .6978   .7912
    14       .3626   .4593   .5341   .6220   .6747   .7670
    15       .3500   .4429   .5179   .6000   .6536   .7464
    16       .3382   .4265   .5000   .5824   .6324   .7265
    17       .3260   .4118   .4853   .5637   .6152   .7083
    18       .3148   .3994   .4716   .5480   .5975   .6904
    19       .3070   .3895   .4579   .5333   .5825   .6737
    20       .2977   .3789   .4451   .5203   .5684   .6586
    21       .2909   .3688   .4351   .5078   .5545   .6455
    22       .2829   .3597   .4241   .4963   .5426   .6318
    23       .2767   .3518   .4150   .4852   .5306   .6186
    24       .2704   .3435   .4061   .4748   .5200   .6070
    25       .2646   .3362   .3977   .4654   .5100   .5962
    26       .2588   .3299   .3894   .4564   .5002   .5856
    27       .2540   .3236   .3822   .4481   .4915   .5757
    28       .2490   .3175   .3749   .4401   .4828   .5660
    29       .2443   .3113   .3685   .4320   .4744   .5567
    30       .2400   .3059   .3620   .4251   .4665   .5479

(a) The entries in this table are selected quantiles w(p) of the Spearman rank correlation coefficient rho when used as a test statistic. The lower quantiles may be obtained from the equation

    w(p) = -w(1-p)

The critical region corresponds to values of rho smaller than (or greater than) but not including the appropriate quantile. Note that the median of rho is 0.

This table was reprinted from Practical Nonparametric Statistics by W.J. Conover, with permission from John Wiley and Sons, Inc., and authors Dr. Gerald J. Glasser and Dr. Winter.
Quantiles of the Wilcoxon Signed Ranks Test Statistic (a)

     n   w.005  w.01  w.025  w.05  w.10  w.20  w.30  w.40  w.50   n(n+1)/2
     4      0     0     0      0     1     3     3     4     5       10
     5      0     0     0      1     3     4     5     6     7.5     15
     6      0     0     1      3     4     6     8     9    10.5     21
     7      0     1     3      4     6     9    11    12    14       28
     8      1     2     4      6     9    12    14    16    18       36
     9      2     4     6      9    11    15    18    20    22.5     45
    10      4     6     9     11    15    19    22    25    27.5     55
    11      6     8    11     14    18    23    27    30    33       66
    12      8    10    14     18    22    28    32    36    39       78
    13     10    13    18     22    27    33    38    42    45.5     91
    14     13    16    22     26    32    39    44    48    52.5    105
    15     16    20    26     31    37    45    51    55    60      120
    16     20    24    30     36    43    51    58    63    68      136
    17     24    28    35     42    49    58    65    71    76.5    153
    18     28    33    41     48    56    66    73    80    85.5    171
    19     33    38    47     54    63    74    82    89    95      190
    20     38    44    53     61    70    82    91    98   105      210

(a) The entries in this table are quantiles w(p) of the Wilcoxon signed ranks test statistic T, for selected values of p <= .50. Quantiles w(p) for p > .50 may be computed from the equation

    w(p) = n(n+1)/2 - w(1-p)

where n(n+1)/2 is given in the right hand column in the table. Note that P(T < w(p)) <= p and P(T > w(p)) <= 1 - p if H0 is true. Critical regions correspond to values of T less than (or greater than) but not including the appropriate quantile.

This table was reprinted from the Journal of the American Statistical Association, Dr. Robert L. McCornack, author, and with the permission of the American Statistical Association.
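The rule for upper quantiles given in the footnote can be applied with a one-line calculation. The Python sketch below is illustrative; the function name is hypothetical, and the lower quantile must be read from the table by the user.

```python
def wilcoxon_upper_quantile(n, lower_quantile):
    """Return w(p) for p > .50 from the tabulated w(1-p): n(n+1)/2 - w(1-p)."""
    return n * (n + 1) / 2 - lower_quantile
```

For n = 10 the table gives w.05 = 11, so the .95 quantile is 55 - 11 = 44.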
Quantiles of the Kolmogorov Test Statistic (a)

    One-Sided Test  p = .90   .95   .975  .99   .995           p = .90   .95   .975  .99   .995
    Two-Sided Test  p = .80   .90   .95   .98   .99            p = .80   .90   .95   .98   .99

    n =  1         .900  .950  .975  .990  .995     n = 21    .226  .259  .287  .321  .344
         2         .684  .776  .842  .900  .929         22    .221  .253  .281  .314  .337
         3         .565  .636  .708  .785  .829         23    .216  .247  .275  .307  .330
         4         .493  .565  .624  .689  .734         24    .212  .242  .269  .301  .323
         5         .447  .509  .563  .627  .669         25    .208  .238  .264  .295  .317
         6         .410  .468  .519  .577  .617         26    .204  .233  .259  .290  .311
         7         .381  .436  .483  .538  .576         27    .200  .229  .254  .284  .305
         8         .358  .410  .454  .507  .542         28    .197  .225  .250  .279  .300
         9         .339  .387  .430  .480  .513         29    .193  .221  .246  .275  .295
        10         .323  .369  .409  .457  .489         30    .190  .218  .242  .270  .290
        11         .308  .352  .391  .437  .468         31    .187  .214  .238  .266  .285
        12         .296  .338  .375  .419  .449         32    .184  .211  .234  .262  .281
        13         .285  .325  .361  .404  .432         33    .182  .208  .231  .258  .277
        14         .275  .314  .349  .390  .418         34    .179  .205  .227  .254  .273
        15         .266  .304  .338  .377  .404         35    .177  .202  .224  .251  .269
        16         .258  .295  .327  .366  .392         36    .174  .199  .221  .247  .265
        17         .250  .286  .318  .355  .381         37    .172  .196  .218  .244  .262
        18         .244  .279  .309  .346  .371         38    .170  .194  .215  .241  .258
        19         .237  .271  .301  .337  .361         39    .168  .191  .213  .238  .255
        20         .232  .265  .294  .329  .352         40    .165  .189  .210  .235  .252

    Approximation for n > 40:
                   1.07/SQR(n)   1.22/SQR(n)   1.36/SQR(n)   1.52/SQR(n)   1.63/SQR(n)

(a) The entries in this table are selected quantiles w(p) of the Kolmogorov test statistics T1, T1+, and T1- as defined by (6.1.1) for two-sided tests and by (6.1.2) and (6.1.3) for one-sided tests. Reject H0 at the level alpha if T exceeds the 1 - alpha quantile given in this table. These quantiles are exact for n <= 20 in the two-tailed test. The other quantiles are approximations which are equal to the exact quantiles in most cases.

This table was reprinted from the Journal of the American Statistical Association with the permission of the American Statistical Association, author Dr. J L. Miller.
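For sample sizes beyond the end of the table, the approximation row can be evaluated directly. The Python sketch below is illustrative; the function and dictionary names are hypothetical, and the coefficients are those printed in the approximation row, keyed by the one-sided p value.

```python
import math

# Coefficients from the "Approximation for n > 40" row, keyed by one-sided p.
KOLMOGOROV_COEFFICIENTS = {0.90: 1.07, 0.95: 1.22, 0.975: 1.36, 0.99: 1.52, 0.995: 1.63}

def kolmogorov_quantile_approx(n, p_one_sided):
    """Approximate quantile coefficient/sqrt(n); intended only for n > 40."""
    if n <= 40:
        raise ValueError("use the tabulated quantiles for n <= 40")
    return KOLMOGOROV_COEFFICIENTS[p_one_sided] / math.sqrt(n)
```

For example, with n = 100 the one-sided .975 quantile (two-sided .95) is approximately 1.36/10 = 0.136.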
403 Quantiles of the Mann-Whitney Test Statistic n p m=2 3 4 5 6 7 8 9 10 // 12 13 14 15 16 17 18 19 20 .001 .005 1 1 2 .01 1 1 1 1 1 1 2 2 .025 1 1 1 1 2 2 2 2 2 3 3 3 3 .05 1 1 1 2 2 2 2 3 3 4 4 4 4 5 5 5 .10 1 1 2 2 2 3 3 4 4 5 5 5 6 6 7 7 8 8 .001 1 1 1 1 .005 1 1 1 2 2 2 3 3 3 3 4 4 3 .01 1 1 2 2 2 3 3 3 4 4 5 5 5 6 .025 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 .05 1 1 2 3 3 4 5 5 6 6 7 8 8 9 10 10 11 12 .10 1 2 2 3 4 5 6 6 7 8 9 10 11 11 12 13 14 15 16 .001 1 1 1 2 2 2 3 3 4 4 4 .005 1 1 2 2 3 3 4 4 5 6 6 7 7 8 9 4 .01 1 2 2 3 4 4 5 6 6 7 9 8 9 10 10 11 .025 1 2 3 4 5 5 6 7 8 9 10 11 12 12 13 14 15 .05 1 2 3 4 5 6 7 8 9 10 11 12 13 15 16 17 18 19 .10 1 2 4 5 6 7 8 10 11 12 13 14 16 17 18 19 21 22 23 .001 1 2 2 3 3 4 4 5 6 6 7 8 8 .005 1 2 2 3 4 5 6 7 8 8 9 10 11 12 13 14 5 .01 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 .025 1 2 3 4 6 7 8 9 10 12 13 14 15 16 18 19 20 21 .05 1 2 3 5 6 7 9 10 12 13 14 16 17 19 20 21 23 24 26 .10 2 3 5 6 8 9 11 13 14 16 18 19 21 23 24 26 28 29 31 .001 2 3 4 5 5 6 7 8 9 10 11 12 13 .005 1 2 3 4 5 6 7 8 10 11 12 13 14 16 17 18 19 6 .01 2 3 4 5 7 8 9 10 12 13 14 16 17 19 20 21 23 .025 2 3 4 6 7 9 11 12 14 15 17 18 20 22 23 25 26 28 .05 1 3 4 6 8 9 11 13 15 17 18 20 22 24 26 27 29 31 33 .10 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 35 37 39 .001 1 2 3 4 6 7 8 9 10 11 12 14 15 16 17 .005 1 2 4 5 7 8 10 11 13 14 16 17 19 20 22 23 25 7 .01 1 2 4 5 7 8 10 12 13 15 17 18 20 22 24 25 27 29 .025 2 4 6 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 .05 1 3 5 7 9 12 14 16 18 20 22 25 27 29 31 34 36 38 40 .10 2 5 7 9 12 14 17 19 22 24 27 29 32 34 37 39 42 44 47 .001 1 2 3 5 6 7 9 10 12 13 15 16 18 19 21 22 .005 2 3 5 7 8 10 12 14 16 18 19 21 23 25 27 29 31 8 .01 1 3 5 7 8 10 12 14 16 18 21 23 25 27 29 31 33 35 .025 1 3 5 7 9 11 14 16 18 20 23 25 27 30 32 35 37 39 42 .05 2 4 6 9 11 14 16 19 21 24 27 29 32 34 37 40 42 45 48 .10 3 6 8 11 14 17 20 23 25 28 31 34 37 40 43 46 49 52 55 .001 2 3 4 6 8 9 11 13 15 16 18 20 22 24 26 27 .005 1 2 4 6 8 10 12 14 17 19 
21 23 25 28 30 32 34 37 9 .01 2 4 6 8 10 12 15 17 19 22 24 27 29 32 34 37 39 41 .025 1 3 5 8 11 13 16 18 21 24 27 29 32 35 38 40 43 46 49 .05 2 5 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 .10 3 6 10 13 16 19 23 26 29 32 36 39 42 46 49 53 56 59 63 .001 1 2 4 6 7 9 11 13 15 18 20 22 24 26 28 30 33 .005 1 3 5 7 10 12 14 17 19 22 25 27 30 32 35 38 40 43 10 .01 2 4 7 9 12 14 17 20 23 25 28 31 34 37 39 42 45 48 .025 1 4 6 9 12 15 18 21 24 27 30 34 37 40 43 46 49 53 56 .05 2 5 8 12 15 18 21 25 28 32 35 38 42 45 49 52 56 59 63 .10 4 7 11 14 18 22 25 29 33 37 40 44 48 52 55 59 63 67 71 .001 1 3 5 7 9 11 13 16 18 21 23 25 28 30 33 35 38 .005 1 3 6 8 11 14 17 19 22 25 28 31 34 37 40 43 46 49 11 .01 2 5 8 10 13 16 19 23 26 29 32 35 38 42 45 48 51 54 .025 1 4 7 10 14 17 20 24 27 31 34 38 41 45 48 52 56 59 63 .05 2 6 9 13 17 20 24 28 32 35 39 43 47 51 55 58 62 66 70 .10 4 8 12 16 20 24 28 32 37 41 45 49 53 58 62 66 70 74 79 404 Quantiles of the Mann-Whitney Test Statistic (continued) n p m = 2 3 4 5 6 7 8 9 10 // 12 13 14 15 16 n 18 19 20 .001 1 3 5 8 10 13 15 18 21 24 26 29 32 35 38 41 43 .005 2 4 7 10 13 16 19 22 25 28 32 35 38 42 45 48 52 55 12 .01 3 6 9 12 15 18 22 25 29 32 36 39 43 47 50 54 57 61 .025 2 5 8 12 15 19 23 27 30 34 38 42 46 50 54 58 62 66 70 .05 3 6 10 14 18 22 27 31 35 39 43 48 52 56 61 65 69 73 78 .10 5 9 13 18 22 27 31 36 40 45 50 54 59 64 68 73 78 82 87 .001 2 4 6 9 12 15 18 21 24 27 30 33 36 39 43 46 49 .005 2 4 8 11 14 18 21 25 28 32 35 39 43 46 50 54 58 61 13 .01 1 3 6 10 13 17 21 24 28 32 36 40 44 48 52 56 60 64 68 .025 2 5 9 13 17 21 25 29 34 38 42 46 51 55 60 64 68 73 77 .05 3 7 11 16 20 25 29 34 38 43 48 52 57 62 66 71 76 81 85 .10 5 10 14 19 24 29 34 39 44 49 54 59 64 69 75 80 85 90 95 .001 2 4 7 10 13 16 20 23 26 30 33 37 40 44 47 51 55 .005 2 5 8 12 16 19 23 27 31 35 39 43 47 51 55 59 64 68 14 .01 1 3 7 11 14 18 23 27 31 35 39 44 48 52 57 §1 66 70 74 .025 2 6 10 14 18 23 27 32 37 41 46 51 56 60 65 70 75 79 84 .05 4 8 12 17 22 27 32 37 42 
47 52 57 62 67 72 78 83 88 93 .10 5 11 16 21 26 32 37 42 48 53 59 64 70 75 81 86 92 98 103 .001 2 5 8 11 15 18 22 25 29 33 37 41 44 48 52 56 60 .005 3 6 9 13 17 21 25 30 34 38 43 47 52 56 61 65 70 74 15 .01 1 4 8 12 16 20 25 29 34 38 43 48 52 57 62 67 71 76 81 .025 2 6 11 15 20 25 30 35 40 45 50 55 60 65 71 76 81 86 91 .05 4 8 13 19 24 29 34 40 45 51 56 62 67 73 78 84 89 95 101 .10 6 11 17 23 28 34 40 46 52 58 64 69 75 81 87 93 99 105 111 .001 3 6 9 12 16 20 24 28 32 36 40 44 49 53 57 61 66 .005 3 6 10 14 19 23 28 32 37 42 46 51 56 61 66 71 75 80 16 .01 1 4 8 13 17 22 27 32 37 42 47 52 57 62 67 72 77 83 88 .025 2 7 12 16 22 27 32 38 43 48 54 60 65 71 76 82 87 93 99 .05 4 9 15 20 26 31 37 43 49 55 61 66 72 78 84 90 96 102 108 .10 6 12 18 24 30 37 43 49 55 62 68 75 81 87 94 100 107 113 120 .001 1 3 6 10 14 18 22 26 30 35 39 44 48 53 58 62 67 71 .005 3 7 11 16 20 25 30 35 40 45 50 55 61 66 71 76 82 87 17 .01 1 5 9 14 19 24 29 34 39 45 50 56 61 67 72 78 83 89 94 .025 3 7 12 18 23 29 35 40 46 52 58 64 70 76 82 88 94 100 106 .05 4 10 16 21 27 34 40 46 52 58 65 71 78 84 90 97 103 110 116 .10 7 13 19 26 32 39 46 53 59 66 73 80 86 93 100 107 114 121 128 .001 1 4 7 11 15 19 24 28 33 38 43 47 52 57 62 67 72 77 .005 3 7 12 17 22 27 32 38 43 48 54 59 65 71 76 82 88 93 18 .01 1 5 10 15 20 25 31 37 42 48 54 60 66 71 77 83 89 95 101 .025 3 8 13 19 25 31 37 43 49 56 62 68 75 81 87 94 100 107 113 .05 5 10 17 23 29 36 42 49 56 62 69 76 83 89 96 103 110 117 124 .10 7 14 21 28 35 42 49 56 63 70 78 85 92 99 107 114 121 129 136 .001 1 4 8 12 16 21 26 30 35 41 46 51 56 61 67 72 78 83 .005 1 4 8 13 18 23 29 34 40 46 52 58 64 70 75 82 88 94 100 19 .01 2 5 10 16 21 27 33 39 45 51 57 64 70 76 83 89 95 102 108 .025 3 8 14 20 26. 
33 39 46 53 59 66 73 79 86 93 100 107 114 120 .05 5 11 18 24 31 38 45 52 59 66 73 81 88 95 102 110 117 124 131 .10 8 15 22 29 37 44 52 59 67 74 82 90 98 105 113 121 129 136 144 .001 1 4 8 13 17 22 27 33 38 43 49 55 60 66 71 77 83 89 .005 1 4 9 14 19 25 31 37 43 49 55 61 68 74 80 87 93 100 106 20 .01 2 6 11 17 23 29 35 41 48 54 61 68 74 81 88 94 101 108 115 .025 3 9 15 21 28 35 42 49 56 63 70 77 84 91 99 106 113 120 128 .05 5 12 19 26 33 40 48 55 63 70 78 85 93 101 108 116 124 131 139 .10 8 16 23 31 39 47 55 63 71 79 87 95 103 111 120 128 136 144 152 This table was reprinted from Practical Nonparametric Statistics by W.J. Conover, with permission from John Wiley and Sons, Inc., and author L.R. Verdooren . 405 Percentage Points of the Duncan New Multiple Range Test \^ p ni ^y 2 3 4 5 6 7 8 9 10 12 14 16 18 20 50 100 1 18.0 18.0 18.0 18.0 18 18.0 18 18.0 18 18.0 18.0 18.0 18.0 18.0 18 18.0 2 6.09 6.09 6.09 6.09 6.09 6.09 6.09 6.09 6.09 6.09 6.09 6.09 6.09 6.09 6.09 6 09 3 4.50 4.50 4.50 4.50 4.50 4.50 4.50 4.50 4.50 4.50 4.50 4.50 4.50 4.50 4.50 4 50 4 3.93 4.01 4.02 4.02 4.02 4.02 4.02 4.02 4.02 4 02 4.02 4.02 4.02 4.02 4.02 4 02 5 3.64 3.74 3.79 3.83 3.83 3.83 3.83 3.83 3 83 3.83 3.83 3.83 3.83 3.83 3.83 3.83 6 3.46 3.58 3.64 3.68 3.68 3.08 3.68 3.68 3.68 3.68 3.68 3.68 3.68 3.68 3.68 3 68 7 3.35 3.47 3.54 3.58 3.60 3.61 3.61 3.61 3.61 3.61 3.61 3.61 3.61 3.61 3.61 3 61 8 3.26 3.39 3.47 3.52 3.55 3.56 3.50 3.56 3.50 3.56 3.56 3.56 3.56 3.56 3 56 3.56 9 3.20 3.34 3.41 3.47 3.50 3.52 3.52 3.52 3.52 3.52 3.52 3.52 3.52 3.52 3.52 3.52 10 3 15 3.30 3 37 3.43 3.46 3.47 3.47 3.47 3.47 3.47 3.47 3 47 3 47 3 48 3 48 3 48 11 3 11 3.27 3.35 3.39 3.43 3.44 3.45 3.46 3.46 3.46 3.46 3.46 3.47 3 48 3 48 3.48 12 3.08 3.23 3.33 3.36 3.40 3.42 3 44 3.44 3.46 3.46 3.46 3.46 3 47 3.48 3.48 3 4S 13 3.06 3 21 3 30 3.35 3.38 3.41 3 42 3 44 3.45 3.45 3.46 3.46 3 47 3.47 3.47 3 47 14 3.03 3.18 3 27 3.33 3 37 3.39 3.41 3.42 3.44 3.45 3.46 3.46 3.47 3 47 3.47 3 47 15 3 01 3.16 3.25 3 31 3.30 
3.38 3.40 3.42 3.43 3.44 3.45 3.46 3.47 3 47 3.47 3 47 1G 3.00 3.15 3.23 3.30 3.34 3 37 3.39 3.41 3.43 3.44 3.45 3.46 3.47 3 47 3.47 3 47 17 2.98 3.13 3.22 3.28 3.33 3.36 3.38 3 40 3.42 3.44 3.45 3.46 3.47 3 47 3.47 3.47 IS 2 97 3.12 3.21 3.27 3.32 3 35 3.37 3 39 3.41 3.43 3.45 3.46 3.47 3.47 3 47 3 47 19 2.96 3.11 3.19 3.26 3.31 3.35 3.37 3.39 3.41 3.43 3 44 3.46 3 47 3 47 3.47 3 47 20 2.95 3.10 3.18 3 25 3.30 3 34 3.30 3.38 3 40 3.43 3.44 3.46 3 46 3.47 3 47 3 47 22 2.93 3 08 3.17 3 24 3 29 3.32 3.35 3.37 3.30 3.42 3 44 3.45 3.46 3 47 3 47 3 47 24 2.92 3.07 3.15 3.22 3.28 3.31 3.34 3.37 3 38 3.41 3 44 3.45 3 46 3.47 3 47 3 47 20 2.91 3.06 3.14 3.21 3.27 3.30 3.34 3.30 3.38 3.41 3 43 3.45 3.46 3.47 3 47 3 47 28 2.90 3.04 3 13 3.20 3.26 3.30 3.33 3.35 3.37 3.40 3.43 3.45 3 46 3 47 3 47 3 47 :to 2.89 3.04 3.12 3.20 3 25 3.29 3 32 3.35 3.37 3.40 3.43 3.44 3.46 3.47 3.47 3.47 40 2.86 3.01 3.10 3.17 3.22 3 27 3.30 3.33 3.35 3.39 3.42 3.44 3.46 3.47 3.47 3.47 no 2.83 2.98 3.08 3.14 3.20 3.24 3.28 3.31 3 33 3.37 3.40 3.43 3 45 3.47 3 48 3.48 100 2.80 2.95 3 .05 3.12 3.18 3.22 3.26 3.29 3.32 3.36 3.40 3.42 3 45 3.47 3 53 3.53 oo 2.77 2.92 3.02 3.09 3.15 3.19 3.23 3.2U 3.29 3.34 3.38 3 41 3.44 3.47 3.61 3.67 ♦Using special protection levels based on degrees of freedom. This tabic was reprinted from Biometrics , Vol. 11 with the permission of the Biometric Society and author D.B. Duncan. 406 Percentage Points of the Studentized Range, q=(x n -X!)/s v . 
Ujiper 10% points X 2 3 4 5 6 7 8 9 10 1 8-93 13-44 16-36 18-49 20-15 21-51 22-64 23-62 24-48 2 413 5-73 6-77 7-64 8-14 8-63 905 9-41 9-72 3 3-33 4-47 5-20 5-74 616 6-51 6-81 706 7-29 4 301 3-98 4-59 603 5-39 5-68 5-93 6-14 6-33 5 2-85 3-72 4-26 4-66 4-98 6-24 5-46 5-65 6-82 6 2-75 3-56 407 4-44 4-73 4-97 617 5-34 5-60 7 2-68 3-45 3-93 4-28 4-55 4-78 4-97 514 5-28 8 2-63 3-37 3-83 417 4-43 4-65 4-83 4-99 5-13 9 2-59 3-32 3-76 4-08 4-34 4-54 4-72 4-87 601 10 2-56 3-27 3-70 402 4-26 4-47 4-64 4-78 4-91 11 2-54 3-23 3-66 3-96 4-20 4-40 4-57 4-71 4-84 12 2-52 3-20 3-62 3-92 416 4-35 4-51 4-65 4-78 13 2-50 318 3-59 3-88 412 4-30 4-46 4-60 4-72 14 2-49 3-16 3-56 3-85 4-08 4-27 4-42 4-66 4-68 15 2-48 3-14 3-54 3-83 405 4-23 4-39 4-52 4-64 16 2-47 312 3-52 3-80 403 4-21 4-36 4-49 4-61 17 2-46 311 3-50 3-78 400 4-18 4-33 4-46 4-58 18 2-45 3-10 3-49 3-77 3-98 4-10 4-31 4-44 4-55 19 2-45 309 3-47 3-75 3-97 4-14 4-29 4-42 4-53 20 2-44 308 3-46 3-74 3-95 4-12 4-27 4-40 4-51 24 2-42 3-05 3-42 3-69 3-90 407 4-21 4-34 4-44 30 2-40 3-02 3-39 3-65 3-85 402 4-16 4-28 4-38 40 2-38 2-99 3-35 3-60 3-80 3-96 4-10 4-21 4-32 60 2-36 2-96 3-31 3-56 3-75 3-91 4-04 416 4-25 120 2-34 2-93 3-28 3-52 3-71 3-86 3-99 410 4-19 00 2-33 2-90 3-24 3-48 3-66 3-81 3-93 404 4-13 X 11 12 13 14 15 16 17 18 19 20 1 25-24 25-92 26-54 27-10 27-62 2810 28-54 28-96 29-35 29-71 2 10-01 10-26 10-49 10-70 10-89 1107 11-24 11-39 11-54 11-68 3 7-49 7-67 7-83 7-98 812 8-25 8-3T 8-48 8-58 8-68 4 6-49 6-65 6-78 6-91 702 713 7-23 7-33 7-41 7-50 5 6-97 610 6-22 6-34 6-44 6-54 6-63 6-71 6-79 6-86 6 5-64 5-76 5-87 6-98 607 616 6-25 6-32 6-40 6-47 7 6-41 5-53 5-64 5-74 5-83 5-91 5-99 606 613 619 8 5-25 5-36 5-46 5-56 5-64 5-72 6-80 5-87 6-93 600 9 513 5-23 6-33 5-42 5-51 5-58 5-66 5-72 6-79 5-85 10 503 513 6-23 5-32 5-40 5-47 5-54 5-61 5-67 5-73 11 4-95 5-05 515 5-23 5-31 6-38 5-45 5-51 5-57 5-63 12 4-89 4-99 6-08 5-16 6-24 5-31 5-37 5-44 6-49 5-55 13 4-83 4-93 6-02 510 518 6-25 5-31 6-37 5-43 5-48 14 4-79 4-88 4-97 5-05 
512 5-19 5-26 5-32 5-37 5-43 15 4-75 4-84 4-93 601 5-08 5-15 5-21 6-27 5-32 5-38 16 4-71 4-81 4-89 4-97 604 611 6-17 5-23 5-28 5-33 17 4-68 4-77 4-86 4-93 501 507 5-13 6-19 5-24 5-30 18 4-65 4-75 4-83 4-90 4-98 5-04 510 516 5-21 5-28 19 4-63 4-72 4-80 4-88 4-95 501 507 513 5-18 5-23 20 4-61 4-70 4-78 4-85 4-92 4-99 505 5- 10 616 5-20 24 4-54 4-63 4-71 4-78 4-85 4-91 4-97 502 507 512 30 4-47 4-56 4-64 4-71 4-77 4-83 4-89 4-94 4-99 603 40 4-41 4-49 4-56 4-63 4-69 4-75 4-81 4-86 4-90 4-95 60 4-34 4-42 4-49 4-56 4-62 4-67 4-73 4-78 4-82 4-86 120 4-28 4-35 4-42 4-48 4-54 4-60 4-65 4-69 4-74 4-78 00 4-21 4-28 4-35 4-41 4-47 4-52 4-67 4-61 4-65 4-69 n: size of sample from which range obtained, v. degrees of freedom of independent a,. 407 Percentage Points of the Studentized Range, q=(x n -x 1 )/s v . (continued) Upper 5 % points X 2 3 4 5 6 7 8 9 10 1 17-97 26-98 32-82 3708 40-41 4312 45-40 47-36 4907 2 6-08 8-33 9-80 10-88 11-74 12-44 1303 13-54 13-99 3 4-50 5-91 6-82 7-50 804 8-48 8-85 918 9-46 4 3-93 504 5-76 6-29 6-71 7-05 7-35 7-60 7-83 5 3-64 4-60 5-22 5-67 6-03 6-33 6-58 6-80 6-99 6 3-46 4-34 4-90 5-30 8-63 5-90 6-12 6-32 6-49 7 3-34 4-16 4-68 508 5-36 5-61 5-82 6-00 616 8 3-26 4-04 4-53 4-89 5-17 5-40 5-60 5-77 5-92 9 3-20 3-95 4-41 4-76 502 5-24 5-43 5-59 5-74 10 3-15 3-88 4-33 4-63 4-91 5-12 5-30 5-46 5-60 11 3-11 3-82 4-26 4-57 4-82 5-03 5-20 5-35 5-49 12 3-08 3-77 4-20 4-51 4-75 4-95 512 5-27 5-39 13 306 3-73 415 4-45 4-69 4-SS 505 5-19 5-32 14 3-03 3-70 4-11 4-41 4-64 4-83 4-99 5-13 5-25 15 301 3-67 4-08 4-37 4-59 4-78 4-94 5-08 5-20 16 300 3-65 405 4-33 4-56 4-74 4-90 5-03 5- 15 17 2-98 3-63 402 4-30 4-52 4-70 4-86 4-99 511 18 2-97 3-61 400 4-23 4-49 4-67 4-82 4-96 507 19 2-96 3-59 3-98 4-25 4-47 4-65 4-79 4-92 504 20 2-95 3-58 396 4-23 4-45 4-62 4-77 4-90 501 24 2-92 3-53 3-90 4-17 4-37 4-54 4-68 4-81 4-92 30 2-89 3-49 3-85 4-10 4-30 4-46 4-60 4-72 4-82 40 2-86 3-44 3-79 4-04 4-23 4-39 4-52 4-63 4-73 60 2-83 3-40 3-74 3-98 416 4-31 4-44 4-55 4-65 120 2-80 
3-36 3-68 3-92 410 4-24 4-36 4-47 4-56 00 2-77 3-31 3-63 3-86 403 4-17 4-29 4-39 4-47 X 11 12 13 14 15 16 17 18 19 20 1 50-59 51-96 53-20 54-33 55-36 56-32 57-22 58-04 58-83 59-56 2 14-39 14-75 1508 15-38 15-65 15-91 16- 14 16-37 16-57 16-77 3 9-72 9-95 1015 10-35 10-52 10-69 10-84 10-98 11-11 11-24 4 803 8-21 8-37 8-52 8-66 8-79 8-91 903 913 9-23 5 7-17 7-32 7-47 7-60 7-72 7-83 7-93 803 812 8-21 6 6-65 6-79 6-92 703 7-14 7-24 7-34 7-43 7-51 7-59 7 6-30 6-43 6-55 6-66 6-76 6-85 6-94 7-02 7-10 7-17 8 605 6-18 6-29 6-39 6-48 6-57 6-65 6-73 6-80 6-87 9 5-87 5-98 609 6-19 6-28 6-36 6-44 6-51 6-58 6-64 10 5-72 6-83 5-93 603 6-11 6- 19 6-27 6-34 6-40 6-47 11 5-61 5-71 5-81 5-90 5-98 606 6-13 6-20 6-27 6-33 12 6-51 5-61 5-71 5-80 5-88 6-95 602 609 6-15 6-21 13 5-43 6-53 5-63 6-71 5-79 6-86 5-93 5-99 605 6-11 14 5-36 6-46 5-55 5-64 5-71 5-79 5-85 5-91 5-97 603 15 5-31 5-40 5-49 6-57 5-65 5-72 5-78 5-85 5-90 6-96 16 5-26 6-35 6-44 5-52 5-59 6-66 6-73 5-79 5-84 5-90 17 5-21 5-31 539 5-47 5-54 5-61 5-67 6-73 5-79 5-84 18 5-17 5-27 5-35 5-43 5-50 5-57 5-63 5-69 5-74 6-79 19 5-14 6-23 5-31 5-39 5-46 5-53 5-59 6-65 5-70 6-75 20 5-11 5-20 5-28 5-36 5-43 5-49 5-55 5-61 5-66 6-71 24 501 6-10 5-18 5-25 5-32 5-38 5-44 5-49 5-55 5-59 30 4-92 500 5-08 5-15 5-21 6-27 5-33 6-38 5-43 6-47 40 4-82 4-90 4-98 504 5-11 516 5-22 6-27 5-31 6-36 60 4-73 4-81 4-88 4-94 500 606 6-11 5-15 5-20 5-24 120 4-64 4-71 4-78 4-84 4-90 4-95 5-00 504 509 6-13 00 4-55 4-62 4-68 4-74 4-80 4-86 4-89 4-93 4-97 6-01 n: size of sample from which range obtained, v. degrees of freedom of independent *,. 408 Percentage Points of the Studentized Range, q=(x n -x,)/s v . 
(continued) Upper 1 % points \ n V \ 2 3 4 5 6 7 8 9 10 1 9003 1350 164-3 1856 202-2 215-8 227-2 2370 245-6 2 1404 1902 22-29 24-72 26-63 28-20 29-53 30-68 31-69 3 8-26 10-62 1217 13-33 14-24 15-00 15-64 16-20 16-69 4 6-51 8-12 9-17 9-96 10-58 1110 11-55 11-93 12-27 5 5-70 6-98 7-80 8-42 8-91 9-32 9-67 9-97 10-24 6 5-24 6-33 703 7-56 7-97 8-32 8-61 8-87 9-10 7 4-95 5-92 6-54 7-01 7-37 7-68 7-94 8-17 8-37 8 4-75 5-64 6-20 6-62 6-96 7-24 7-47 7-68 7-86 9 4-60 5-43 5-96 6-35 6-66 6-91 713 7-33 7-49 10 4-48 5-27 5-77 6- 14 6-43 6-67 6-87 7-05 7-21 11 4-39 5- 15 5-62 5-97 6-25 6-48 6-67 6-84 6-99 12 4-32 5-05 5-50 5-84 6-10 6-32 6-51 6-67 6-81 13 4-26 4-96 5-40 5-73 5-98 6- 19 6-37 6-53 6-07 14 4-21 4-89 5-32 5-63 5-88 608 6-26 6-41 6-54 15 417 4-84 6-25 5-56 5-80 5-99 6-16 6-31 6-44 16 413 4-79 5- 19 5-49 5-72 5-92 6-08 6-22 6-35 17 410 4-74 5-14 5-43 5-66 5-85 601 6-15 6-27 18 4-07 4-70 509 5-38 6-60 5-79 5-94 6-08 6-20 19 4-05 4-67 505 5-33 5-55 5-73 6-89 6-02 6- 14 20 402 4-64 5-02 5-29 5-51 5-69 6-84 5-97 609 24 3-96 4-55 4-91 517 5-37 5-54 5-69 5-81 5-92 30 3-89 4-45 4-80 505 5-24 5-40 5-54 5-65 5-76 40 3-82 4-37 4-70 4-93 511 5-26 5-39 5-50 5-60 60 3-76 4-28 4-59 4-82 4-99 513 6-25 5-36 5-45 120 3-70 4-20 4-50 4-71 4-87 5-01 612 5-21 5-30 oo 3-64 4-12 4-40 4-60 4-76 4-88 4-99 508 5-16 \ n v \ 11 12 13 14 15 16 17 18 19 20 1 253-2 2600 266-2 271-8 277-0 281-8 286-3 290-4 294-3 2980 2 32-59 33-40 3413 34-81 35-43 3600 36-53 •3703 37-50 37-95 3 1713 17-53 17-89 18-22 18-52 18-81 19-07 19-32 19-55 19-77 4 12-57 12-84 1309 13-32 13-53 13-73 13-91 14-08 14-24 14-40 5 10-48 10-70 10-89 11-08 11-24 11-40 11-55 11-68 11-81 11-93 6 9-30 9-48 9-65 9-81 9-95 1008 10-21 10-32 10-43 10-54 7 8-55 8-71 8-86 900 9- 12 9-24 9-35 946 9-55 9-65 8 8-03 8-18 8-31 8-44 8-55 8-66 8-76 8-85 8-94 903 9 7-65 7-78 7-91 803 8-13 8-23 8-33 8-41 8-49 8-57 10 7-36 7-49 7-60 7-71 7-81 7-91 7-99 8-08 815 8-23 11 713 7-25 7-36 7-46 7-56 7-65 7-73 7-81 7-88 7-95 12 6-94 706 7-17 7-26 7-36 7-44 7-52 
7-59 7-66 7-73 13 6-79 6-90 7-01 710 7-19 7-27 7-35 7-42 7-48 7-55 14 6-66 6-77 6-87 6-96 7-05 713 7-20 7-27 7-33 7-39 15 6- 55 6-66 6-76 6-34 6-93 700 7-07 714 7-20 7-26 16 6-46 6-56 6-66 6-74 6-82 6-90 6-97 703 7 09 715 17 6-38 6-48 6-57 6-66 6-73 6-81 6-87 6-94 7-00 7-05 18 6-31 641 6-50 6-58 6-65 6-73 6-79 6-85 6-91 6-97 19 6-25 6-34 643 6-51 6-58 6-65 6-72 6-78 6-84 6-89 20 6-1'J 6-28 6-37 6-45 6-52 6-59 6-65 6-71 6-77 6-82 24 602 6-11 6- 19 6-26 6-33 6-39 6-45 6-51 6-56 6-61 30 5- 8 j 5-93 601 6-08 614 6-20 6-26 631 6-36 6-41 40 5-69 5-76 5-83 5-90 5-96 602 607 6-12 616 6-21 60 5-53 5-60 5-67 5-73 5-78 5-84 5-89 5-93 6-97 601 120 5-37 5-44 5-50 5-56 5-61 5-66 5-71 5-75 5-79 5-83 oo 5-23 5-29 5-35 5-40 5-45 5-49 5-54 5-57 5-61 5-65 This table was reprinted from Biometrika Tables for Statisticians, Vol. 1, 3rd Edition, Table 29, with the permission of the Biometrika Trustees. The Normal Probability Function 409 The integral P(X) and ordinate Z(X) in terms of the standardized deviate X HX) ■00 01 •02 •03 •04 •06 •06 ■07 ■08 ■09 •10 ■11 •It •IS •14 •16 •16 ■17 ■18 ■19 •SO •tl •It ■ts Si S6 S6 S7 S8 S9 ■SO •SI •31 S3 ■S4 ■35 ■S6 37 ■38 •39 ■40 •41 ■4* ■43 ■44 ■45 •46 ■47 ■48 ■49 ■60 •5000000 •5039S94 ■6079783 •5119665 •5159534 •6199388 •6239222 •5279032 •5318814 •5358564 •5398278 •6437953 •5477684 •5517168 •6556700 •5596177 •5635595 •5674949 •5714237 •5753454 •6792597 ■6831662 •5870644 •5909541 •5948349 ■5987063 •6025681 •6064199 •6102612 •6140919 •6179114 ■6217195 •6255153 •6293000 •6330717 •6368307 •6405764 •6443088 •6480273 •6517317 ■6654217 •6590970 •6627573 •6664022 •6700314 •6730448 •6772419 ■6808225 •6843863 •6879331 •6914625 6 + 39894 39890 39882 39870 39854 39834 39810 39782 39750 39714 39075 39631 39584 39532 39477 39418 39355 39288 39217 39143 39065 38983 38897 38808 38715 38618 38518 38414 38306 38195 38081 379C3 37842 37717 37589 37468 37323 37185 37044 36900 36753 36602 36449 36293 36133 35971 35806 35638 35467 35294 4 8 12 16 20 24 28 32 36 40 44 48 
51 65 69 63 67 71 74 78 86 89 93 97 100 104 107 111 114 118 121 125 128 131 135 138 141 144 147 160 153 156 169 162 165 168 171 173 176 Z(X) •3989423 •3989223 •3988625 •3987628 •3986233 ■3984439 ■3982248 •3979661 •3976677 •3973298 •3969525 •3965360 •3960802 •3956854 •3950517 •3944793 •3938684 •3932190 ■3925315 •3918060 ■3910427 •3902419 •3894038 •3885286 •3876166 ■38G6681 •3856834 ■3846627 •3836063 •3825146 •3813878 •38022C4 •3790305 •3778007 •3705372 •3752403 ■3739106 •3725483 •3711539 •3697277 ■3682701 ■3667817 ■3652627 ■3637136 ■3621349 •3605270 •3588903 •3572253 •3565325 •3538124 •3520653 199 598 997 1395 1793 2191 2588 2984 3379 3773 416G 4558 4948 5337 5724 6110 6493 C875 7255 7633 8008 8381 8752 9120 9485 9847 10207 10564 10917 11268 11615 11 958 12298 12635 12968 13297 13623 13944 14262 14575 14886 15190 15491 15787 16079 16367 16650 16928 17202 17470 P 399 399 399 398 398 397 397 39G 395 394 393 392 390 389 387 386 384 382 380 378 375 373 371 368 365 362 357 354 350 347 344 340 337 333 329 325 322 318 313 309 305 301 296 292 288 278 274 269 264 ■50 ■51 ■5t ■53 ■54 ■55 ■56 ■57 ■58 ■59 ■60 ■61 ■6S •6S ■64 ■65 •66 ■67 ■68 ■69 10 •71 •7t •73 14 •75 •76 •77 •78 ■79 ■80 ■81 ■8S ■83 ■84 ■85 ■86 •87 ■88 ■89 ■90 ■91 ■91 ■93 ■94 •95 •96 ■97 •98 99 1-00 P(X) •6914625 •6949743 •6984682 •7019440 •7054015 •71)88403 •7122603 •7156612 •7190-127 •7224047 •7257469 •7290691 •7323711 •7356527 •7389137 ■7421539 •7453731 •7485711 •7517478 •7549029 •7580363 •7611479 •7642375 7673049 •7703500 •7733726 •7763727 •7793501 •7823046 •7852361 •7881446 •7910299 •7938919 •7967306 ■7995458 •8023375 •8051055 ■8078498 •8105703 •8132671 •8159399 ■8185887 •6212136 ■8238145 •8263912 ■6289439 ■8314724 •8339768 ■8364569 •8389129 •8413447 i + 35118 34939 34758 34574 34388 34200 34009 33815 33620 33422 33222 33020 32816 32610 32402 32192 31930 31767 31551 31334 31116 30896 30674 30451 30226 30001 29773 29545 29316 29085 28853 28620 28387 28152 27917 27680 27443 27205 26967 26728 26489 26249 26008 
25768 25527 25285 26044 24802 24560 24318 176 179 181 184 186 189 191 193 196 198 200 202 204 206 208 210 212 214 215 217 219 220 222 223 225 226 227 228 230 231 232 233 234 235 235 236 237 238 238 239 239 240 240 241 241 241 242 242 242 242 242 Z(X) = e-»*7V(2ff), P(X) =\-Q{X) -11 Z{u) du. 410 The Normal Probability Function (continued) Z(X) •3520653 •3502919 ■3484925 •3466677 •3448180 ■3429439 •3410458 ■3391243 •3371799 -3352132 •3332246 •3312147 •3291840 •3271330 •3250623 •3229724 •3208638 •3187371 •3165929 3144317 •3122539 •3100603 •3078513 •3056274 •3033893 •3011374 ■2988724 ■2965948 •2943050 2920038 •2896916 •2873689 •2850364 •2826945 ■2803438 ■2779849 •2756182 •2732444 •2708640 •2684774 •2660852 •2636880 •2612863 •2588805 •2564713 ■2540591 •2516443 •2492277 •2468095 •2443904 ■2419707 17734 17994 18248 18497 18741 18981 19215 19444 19667 19886 20099 20307 20510 20707 20899 21086 21267 21442 21613 21777 21936 22090 22239 22381 22519 22650 22777 22897 23013 23122 23227 23325 23419 23507 23589 23666 23738 23805 23866 23922 83972 24017 24058 24093 24122 24147 24167 24182 24191 24196 264 259 254 249 244 239 234 229 224 219 213 208 203 197 192 187 181 176 170 165 159 154 148 143 137 132 126 121 115 110 104 99 93 88 63 77 72 66 61 56 51 45 40 35 30 25 20 15 10 5 100 1-01 102 V03 1-04 V05 106 1-07 108 109 110 111 lit IIS 1H 115 1-16 1-17 1-18 119 ISO 1*1 vn 1-23 V24 1-25 1-26 1-27 128 1-29 ISO 1-31 1-32 133 134 1-35 1-36 1ST 138 1-39 Vlfi 11,1 1-lfi 143 l-U 1-45 1-46 1-47 1-48 1-49 1-50 P(X) ■8413447 •8437524 •8461358 •8484950 •8508300 •8531409 •8554277 ■8576903 ■8599289 ■8621434 •8643339 •8665005 •8686431 •8707619 •8728568 ■8749281 ■8769756 •8789995 •8809999 •8829768 ■8849303 ■8868606 •8887676 •8906514 •8925123 •8943502 •8961653 •8979577 •8997274 •9014747 •9031995 •9049021 •9065825 •9082409 •9098773 •9114920 •9130850 •9146565 •9162067 •91773515 ■9192433 ■9207302 •9221962 •9236415 •9250663 •9264707 •9278550 ■9292191 ■9305634 ■9318879 •9331928 6 •+ 24076 23834 23592 
23351 23109 22868 22626 22386 22145 21905 21665 21426 21188 20950 20712 20475 20239 20004 19769 19535 19302 19070 18839 18609 18379 18151 17924 17697 17472 17248 17026 16804 16584 16365 16147 15930 15715 15501 15289 15078 14868 14660 14453 14248 14044 13842 13642 13443 13245 13049 8* 242 242 242 242 242 241 241 241 240 240 240 239 239 238 237 237 236 235 235 234 233 232 231 230 229 228 227 226 225 224 223 222 220 219 218 217 215 214 212 211 210 208 207 205 204 202 201 199 197 196 194 Z(X) •2419707 •2395511 •2371320 •2347138 •2322970 ■2298821 •2274696 •2250599 •2226535 •2202508 •2178522 2154582 2130691 2106856 2083078 2059363 2035714 2012135 1988631 1965205 1941861 1918602 1895432 1872354 1849373 1826491 1803712 1781038 1758474 1736022 1713686 1691468 1669370 1647397 1625551 1603833 1582248 1560797 1539483 1518308 1497275 1476385 1455641 1435046 1414600 1394306 1374165 1354181 1334353 1314684 1295176 24196 24191 24182 24168 24149 24126 24097 24064 24027 23986 23940 23890 23836 23778 23715 23649 23578 23504 23426 23344 23259 23170 23077 22981 22882 22779 22673 22564 22452 22337 22218 22097 21973 21847 21717 21585 21451 21314 21175 21033 20890 20744 20596 20446 20294 20140 19985 19828 19669 19508 + 5 10 14 19 24 28 33 37 41 46 00 54 68 62 66 70 74 78 82 85 89 93 96 99 103 106 109 112 115 118 121 124 127 129 132 134 137 139 142 144 146 148 150 152 164 155 157 159 160 162 Note sign of second difference, 8*. 
The Normal Probability Function (continued) 411 X P(X) S + a» Z(X) 1*0 •9331928 12855 12662 12471 1228S 12094 11908 194 •1295176 1*1 ■9344783 193 •1275830 l*t -9357445 191 •1256646 1*3 -9369916 189 •1237628 1*4 ■9382198 188 •1218775 1-55 ■9394292 186 •1200090 1*6 ■9406201 11724 11541 11360 11181 11004 184 •1181573 1*7 ■9417924 183 •1163225 1*3 ■9429466 181 •1145048 1*9 ■9440826 179 •1127042 1*0 •9452007 177 •1109208 1*1 •9463011 10828 10654 10482 10311 10142 176 •1091548 l*t •9473839 174 •1074061 1*3 ■9484493 172 •1056748 1*4 ■9494974 170 •1039611 1*5 ■9605285 169 ■1022649 1*6 ■9515428 9975 9810 9647 9485 9326 167 •1005864 1*7 ■9525403 165 ■0989255 1*3 ■9535213 163 •0972823 1*9 ■9544860 162 ■0956568 1-70 ■9554345 160 -0940491 1-71 ■9563671 9167 9011 8856 8704 8553 158 ■0924591 17t -9572838 156 ■0908870 1-73 ■9581849 155 •0893326 1-74 •9590705 153 ■0877961 1-76 -9599408 151 ■0862773 1-76 ■9607961 8403 8256 6110 7966 7824 149 ■0847764 ITT ■9616364 147 -0832932 178 •9624620 146 •0818278 1-79 •9632730 144 ■0803801 1*0 ■9640697 142 ■0789502 181 -9648521 7684 7545 7409 7273 7140 140 ■0775379 V8t •9656205 139 •0761433 1*3 •9663750 137 0747663 1*4 •9671159 135 •0734068 1*5 •9678432 133 O720C49 186 ■9685572 7009 6879 6751 6624 6500 132 ■0707404 1*7 ■9692581 130 0694333 1*8 ■9699460 128 0681436 1*9 •9706210 126 •0668711 V90 •9712834 125 0656158 V91 •9719334 6377 6255 6136 6018 5902 123 0643777 1-99 •9725711 121 0631566 V93 ■9731966 120 0619524 1*4 •9738102 118 0607602 1-95 -9744119 116 0595947 1-96 9750021 6787 6674 6563 5453 115 0584409 1-97 •9755808 113 0573038 1-98 •9761482 in 0561831 1-99 •9767045 110 0550789 too •9772499 108 0539910 19346 19183 19018 18853 18685 18517 18348 18177 18006 17834 17661 17487 17312 17137 16962 16786 16609 16432 16255 16077 15899 15722 15544 15366 15188 15010 14832 14654 14477 14300 14123 1394ft 13770 13594 13419 13245 13071 12897 12725 12553 12382 12211 12041 11873 11705 11538 11372 11206 11042 10879 + 162 163 165 166 167 168 169 170 171 172 
173 174 174 175 176 176 177 177 177 178 178 178 178 178 178 178 178 178 177 177 177 176 176 176 175 175 174 173 173 172 171 170 170 169 168 167 166 165 164 163 162
X     P(X)       δ to next row (+)  δ²
2.00  0.9772499  5345  108
2.01  0.9777844  5239  106
2.02  0.9783083  5134  105
2.03  0.9788217  5031  103
2.04  0.9793248  4929  102
2.05  0.9798178  4829  100
2.06  0.9803007  4731  98
2.07  0.9807738  4634  97
2.08  0.9812372  4539  95
2.09  0.9816911  4445  94
2.10  0.9821356  4352  92
2.11  0.9825708  4262  91
2.12  0.9829970  4172  89
2.13  0.9834142  4084  88
2.14  0.9838226  3998  86
2.15  0.9842224  3913  85
2.16  0.9846137  3829  84
2.17  0.9849966  3747  82
2.18  0.9853713  3666  81
2.19  0.9857379  3587  79
2.20  0.9860966  3509  78
2.21  0.9864474  3432  77
2.22  0.9867906  3357  75
2.23  0.9871263  3283  74
2.24  0.9874545  3210  73
2.25  0.9877755  3138  71
2.26  0.9880894  3068  70
2.27  0.9883962  2999  69
2.28  0.9886962  2932  68
2.29  0.9889893  2865  66
2.30  0.9892759  2800  65
2.31  0.9895559  2736  64
2.32  0.9898296  2674  63
2.33  0.9900969  2612  62
2.34  0.9903581  2552  60
2.35  0.9906133  2492  59
2.36  0.9908625  2434  58
2.37  0.9911060  2377  57
2.38  0.9913437  2321  56
2.39  0.9915758  2267  55
2.40  0.9918025  2213  54
2.41  0.9920237  2160  53
2.42  0.9922397  2108  52
2.43  0.9924506  2058  51
2.44  0.9926564  2008  50
2.45  0.9928572  1960  49
2.46  0.9930531  1912  48
2.47  0.9932443  1865  47
2.48  0.9934309  1820  46
2.49  0.9936128  1775  45
2.50  0.9937903        44
$Z(X) = e^{-\frac{1}{2}X^2}/\sqrt{2\pi}$, $P(X) = 1 - Q(X) = \int_{-\infty}^{X} Z(u)\,du$.
412 The Normal Probability Function (continued) Z{X) ■0539910 ■0529192 •0518636 ■0508239 •0498001 ■0487920 ■0477996 ■0468226 •0458611 •0449148 ■0439836 ■0430674 •0421661 ■0412795 •0404076 •0395500 •0387069 •0378779 •0370629 •0362619 •0354746 •0347009 •0339408 0331939 ■0324603 •0317397 •0310319 ■0303370 029«546 ■0289847 •0283270 •0276816 •0270481 •0264265 •0258166 •0252182 ■0246313 ■0240556 ■0234910 O220374 ■0223945 ■0218624 O213407 ■021 18294 11203284 ■0198374 0193563 ■0188850 •0184233 •0179711 •0175283 10717 10557 10397 10238 10081 9924 9769 9616 9463 9312 9162 9013 8866 8720 8575 8432 8290 8149 8010 7873 7737 7602 7468 7337 7206 7077 6950 6824 6699 6576 6455 6335 6216 6009 6984 6870 5757 6646 5536 5428 6322 6817 6113 6011 4910 4811 4713 4617 4522 4428 + 162 161 160 159 157 156 155 154 153 161 150 149 147 146 145 143 142 140 139 138 136 135 133 132 130 129 1S7 126 125 123 122 120 119 117 116 114 113 HI 110 108 107 105 104 102 101 99 98 96 95 93 92 t-50 tSl SSi t-53 t-5+ t-55 t-58 t-57 t-58 t-59 teo tei t-6t t-63 t-6* S-65 t-66 t-67 1-68 tH9 t-70 til t-72 t-73 en t-75 t-76 t-77 g-78 t-79 t-80 t-81 t8t t-83 i-84 t-85 t-86 t-87 t-88 t-89 t-90 t-91 t-9t t-93 t-91, t-95 toe t-97 t-98 2-99 300 P(X) •9937903 ■9939634 ■9941323 ■9942969 ■9944574 ■9946139 ■9947664 ■9949151 ■9950600 •9952012 ■9953388 •9954729 •9956035 •9957308 •9958547 ■9969754 ■9960930 ■9962074 ■9963189 •9964274 •9965330 •9966358 •9967359 •9968333 •9909280 ■9970202 •9971099 •9971972 •9972821 ■9973646 ■9974449 •9975229 ■9975988 ■0976726 •9977443 ■9978140 ■9978818 ■9979476 •9980116 ■9980738 ■9981342 ■9981929 ■9982498 •93830.-12 •90*3589 -9984111 ■9984618 •9985110 •9985588 ■9986051 ■998G501 t + 1731 1688 1646 1606 166S 1626 1487 1449 1412 1376 1341 1306 1272 1230 1207 1176 1146 1115 1085 1050 1028 1001 974 948 922 897 873 849 825 803 781 759 738 717 697 678 658 640 622 604 687 570 653 637 622 607 492 478 464 450 44 43 42 41 40 39 39 38 37 36 35 35 34 33 32 32 31 30 29 29 28 27 27 26 26 25 24 24 23 23 22 22 21 
21 20 20 19 19 18 18 17 17 16 16 16 16 15 14 14 14 13 Z(X) •0175283 ■0170947 ■0166701 •0162545 ■0158476 0154493 0150596 0146782 0143051 0139401 0135830 Ol 32337 Ol 28921 0125581 0122316 OH9122 •0116001 Ol 12951 0109969 Ol 07056 ■0104209 O101428 0098712 •00960.-.8 O0934C0 0090936 0088465 O086058 O083f.07 0081308 O079155 O076965 0074829 0072744 0070711 0068728 O066793 0064907 O063067 0061274 0059525 O067821 0060160 ■0051541 O052963 O061426 0049929 0048470 0047050 O045066 0044318 4336 4246 4167 4069 3982 3897 3814 3731 3650 3571 3493 3416 3340 3266 3193 3121 3051 2981 2013 2«47 2781 2717 2G54 2502 2631 •471 2413 2355 2209 2244 2189 2136 2084 2033 1983 1934 1886 1839 1793 1748 1704 1661 1619 1578 1537 1497 1459 1421 1384 1347 + 92 91 89 88 60 86 84 82 81 80 78 77 76 74 73 72 70 69 68 67 66 64 C3 62 61 60 69 57 56 55 54 63 52 51 60 49 48 47 46 45 43 42 41 40 40 39 38 37 36 35 Note sign of second difference, 8*. 413 The Normal Probability Function (continued) X P{X) 8 + <$• Z(X) i 8* + X P(X) s + 8* s-oo •9986501 437 424 411 399 387 375 13 •0044318 1312 1277 1243 1210 1178 1146 35 3-50 •9997674 3 301 •9986938 13 •0043007 35 3-51 •9997759 86 3 S-02 •9987361 13 •0041729 34 SSI •9997842 83 3 SOS •9987772 12 •0040486 33 3-53 •9997922 80 3 3-04 ■9938171 12 0039276 32 3-54 ■9997999 77 3 3-05 ■9988558 12 •0038098 32 355 •9998074 74 72 3 3-06 •9988933 364 353 342 332 322 11 ■0036951 1115 1085 1056 1027 999 31 356 -9998146 3 3-07 •9989297 11 •0035836 30 357 ■9998215 69 2 3-08 •9989650 11 ■0034751 29 3-58 •9998282 67 2 309 ■9989992 10 •0033695 29 359 •9998347 65 2 3-10 •9990324 10 O0326G8 28 360 9998409 62 60 2 3:11 •9990646 312 302 293 284 276 10 •0031669 971 944 918 893 668 27 361 9998469 2 Sit •9990957 10 •0030698 27 S-6S •9998527 58 2 313 ■9991260 9 •0029754 26 3-63 ■9998583 56 2 3H •991)1553 9 •0028835 26 3-64 •9998637 54 2 SIS •9991836 9 •0027943 25 3-65 ■9998689 52 60 2 3-16 •9992112 267 258 250 242 235 9 •0027076 843 820 797 774 752 24 3-66 •9998739 48 2 317 ■9992378 8 
•0026231 24 3-67 •9998787 2 318 ■9992636 8 ■0025412 23 3-68 •9998834 47 2 319 ■9992886 8 ■0024615 23 369 •9998879 45 43 42 2 3-20 ■9993129 8 ■0023341 22 3-70 •9996922 2 SSI ■9993363 227 220 213 206 200 ■0023089 731 710 689 669 650 21 3-71 •9998964 40 39 37 36 35 2 331 •9993590 •0022358 21 3-79 ■9999004 383 •9993810 ■0021649 20 3-73 -9999043 3-H -9994024 •0020960 20 3-74 ■99990B0 385 -9994230 -0020290 19 375 ■9999116 sue •9994429 193 187 181 175 169 6 •0019641 631 612 695 677 660 19 3-76 ■9999150 33 32 31 30 29 sun ■9994623 6 ■0019010 18 3-77 •9999184 3S8 •9994810 6 ■0018397 18 3-78 ■9999216 SS9 ■9994991 6 •0017803 17 3-79 ■9999247 3-30 -9995166 6 •0017226 17 3-80 •9999277 SSI ■9995335 164 159 153 148 143 6 ■0016666 643 627 612 496 481 17 381 ■9999305 28 27 26 25 24 331 ■9995499 5 -0016122 16 383 -9999333 333 ■9995658 6 ■0015695 16 383 •9999359 3-34 ■9995811 ■0016084 15 384 •9999385 3-85 -9995959 ■0014587 15 S-85 ■9999409 3-36 -9996103 139 134 130 125 121 O014106 467 453 439 426 413 16 3-86 •9999433 23 22 21 20 19 3-37 -9996242 ■0013639 14 387 •9999456 338 •9996376 •0013187 14 3-88 0999478 339 •9996505 ■0012748 13 3-89 O099499 3-40 -9996631 •0012322 13 3-90 -9999519 3-41 ■9990752 117 113 109 106 102 001 18M) 400 388 376 364 353 13 3-91 -9999539 19 18 17 17 16 3-4* ■9990869 ■0011510 12 3-9t 0999557 3-43 •999C982 ■0011122 12 3-93 O990575 3-u ■9997091 ■0010747 12 394 ■9990593 3-45 ■9997197 -0010383 n 396 ■9999609 3-46 ■9997299 99 95 92 89 3 -0010030 342 331 320 310 11 3-96 0999625 16 15 14 14 3-47 •9997398 3 0009689 11 3-97 •9999641 3-48 ■9997493 3 0009358 10 3-98 •9999655 3-49 •9997586 3 0009037 10 3-99 400 | 0999670 3-60 •9997674 3 0008727 10 0999683 Z(X) = e-«*7V(2ff), P(X)=l-<UX)=j X Z{u)du. 
414 The Normal Probability Function (continued) Z(X) •0008727 •0008426 •0008135 •0007853 ■0007681 •0007317 •0007061 ■0006814 •0006575 •0006343 •0006118 •0005902 •0005693 •0005400 •0005294 •0005105 ■0004921 ■000-1744 •0004573 ■0004408 •0004248 ■0004093 ■0003944 ■0003800 •0003661 ■0003526 •0003396 •0003271 •0003149 •0003032 •0002919 ■0002810 •0002705 •0002004 •0002506 •0002411 ■0002320 •0002232 •0002147 ■0002005 •0001987 ■0001910 ■0001837 •0<O1766 ■0001698 ■0001633 ■0001569 •0001508 ■0001449 •0001393 ■0001338 301 291 282 273 264 256 247 239 232 224 217 210 203 196 189 183 177 171 1G5 160 155 149 144 139 135 130 125 121 117 113 109 105 102 98 95 91 88 85 82 79 76 73 71 68 60 63 61 59 67 65 + 10 10 » » 9 8 8 8 8 8 7 7 7 7 6 6 6 6 6 6 6 6 5 5 6 5 4 4 4 4 4 4 4 4 3 3 8 t* t S* X P(X) + Z(X) + 400 •9999683 13 13 12 12 11 11 1 ■0001333 63 61 49 47 45 43 2 4-oi •999'J(;06 1 •0001286 2 402 •9999709 -0001235 2 40s ■9999721 ■0001186 2 4-04 ■9999733 ■0001140 2 40s -9999744 ■0001094 2 409 407 •9999755 ■9099765 10 10 9 9 9 ■0001051 •0001009 42 40 39 37 36 408 ■9999775 ■0000969 409 410 •0999784 ■9999793 ■0000930 •0000893 411 •9999802 8 8 8 7 7 •0000857 35 33 32 31 30 411 •9999811 ■0000822 413 ■9999819 O000789 4U ■9999826 •0000757 41s ■9999834 •0000726 419 ■9999841 7 7 6 e e ■0000697 28 27 26 25 24 417 418 •9999848 ■9999864 •0000668 ■0000641 4-19 •9999861 •0000615 4*0 ■99998U7 •0000589 421 ■9999872 6 6 6 6 6 •0000565 23 22 22 21 20 422 ■9999378 •0000542 4-23 -9999883 •0000519 4H •9999888 •0000498 425 ■99U9893 •0000477 4-26 ■9999893 ■0000467 19 18 18 17 16 4-27 •9999902 -0000438 4-28 •9999907 -0000420 429 -9999911 •0000402 4S0 -9999916 ■0000385 431 ■9999918 4 3 3 3 3 •0000369 16 15 14 4-st •9999922 ■0000351 433 •9999925 •0000339 4*4 -9999929 ■0000324 14 4-35 •9999932 ■0000310 13 436 •9999033 3 3 3 3 a ■0000297 13 4-37 •9991W38 ■00002H4 12 438 ■9999941 ■0000272 12 4-39 •9999943 •0000261 XI 440 •9999946 •0000249 11 iil •9999948 s 2 2 2 2 0000239 10 4V •9999951 ■0000228 10 4-43 
■9999953 •0000218 9 4-U •9990958 •0000209 9 445 ■9999957 •00002UO 9 4-46 •9999959 1 3 2 > •0000191 s 447 •9999961 ■0000183 g 448 ■9999963 ■0000175 8 449 •9999064 •0000167 7 4-50 •9999966 •00001C0 1 Note sign of second difference, S*. 415 The Normal Probability Function (continued) X P(X)* Z(X)» 450 66023 159837 4-61 67586 162797 *5t 69080 146051 4-53 70508 139500 454 71873 133401 4-66 73177 127473 466 74423 121797 457 75614 116362 4-68 76751 111159 469 77838 106177 460 78875 101409 4-61 79867 96845 16* 80813 92477 463 81717 88297 464 82580 84208 465 83403 80472 466 84100 76812 467 84940 73311 468 86656 69962 469 8C340 66760 4-70 86992 63698 4-71 87614 60771 47t 88208 67972 473 88774 66296 W4 89314 62739 4-75 89829 50295 476 90320 47960 4-77 90789 45728 4-78 91235 43596 4-79 91C61 41559 480 92067 39613 431 92453 37755 4-8t 92822 35980 483 93173 34285 4 84 93508 32067 485 93827 31122 486 94131 29647 4-87 94420 28239 4 88 94696 26895 489 94958 25613 490 96208 24390 491 95446 23222 49t 96673 22108 493 95889 21046 4-94 96094 20033 495 96289 19066 496 96475 18144 497 96652 17265 498 96821 1G428 499 9G981 15629 X P(X)* Z(X)* 6-00 97133 148C7 6-01 97278 14141 6-Ot 97416 13450 603 97548 12791 604 97672 12162 605 97791 11564 606 97904 10994 6-07 98011 10451 6-Ot 98113 9934 609 98210 9441 610 98302 8972 611 98389 8626 Sit 98472 8101 613 98551 7696 614 98626 7311 616 98698 6944 616 98766 6695 617 98830 6263 618 98891 6947 619 98949 6647 6*0 99004 6361 6*1 99056 6089 6*t 99105 4831 6*3 99152 4586 6*4 99197 4351 6*5 99240 4128 6*6 99280 3917 6*7 99318 3716 6*8 99354 3525 6*9 99388 3344 680 99421 3171 631 99452 3007 631 99481 2852 6-33 99509 2704 6-34 99535 2563 636 99660 2430 636 99584 2303 6-37 99606 2183 638 99628 2069 639 99648 1960 640 99667 1857 641 99085 1760 6-4$ 9970:2 1667 543 99718 '579 644 99734 1495 645 99748 1416 646 99762 1341 6-47 99775 1270 6-48 99787 1202 64$ 99799 1138 X P(X)* Z(X)* 650 99810 1077 651 99821 1019 651 99831 965 663 99840 913 654 99849 864 655 
99857 817
5.56  99865  773
5.57  99873  731
5.58  99880  691
5.59  99886  654
5.60  99893  618
5.61  99899  585
5.62  99905  553
5.63  99910  522
5.64  99915  494
5.65  99920  467
5.66  99924  441
5.67  99929  417
5.68  99933  394
5.69  99936  372
5.70  99940  351
5.71  99944  332
5.72  99947  313
5.73  99950  296
5.74  99953  280
5.75  99955  264
5.76  99958  249
5.77  99960  235
5.78  99963  222
5.79  99965  210
5.80  99967  198
5.81  99969  187
5.82  99971  176
5.83  99972  166
5.84  99974  157
5.85  99975  148
5.86  99977  139
5.87  99978  131
5.88  99979  124
5.89  99981  117
5.90  99982  110
5.91  99983  104
5.92  99984  98
5.93  99985  92
5.94  99986  87
5.95  99987  82
5.96  99987  77
5.97  99988  73
5.98  99989  68
5.99  99990  65
6.00  99990  61

$Z(X) = e^{-\frac{1}{2}X^2}/\sqrt{2\pi}$, $P(X) = 1 - Q(X) = \int_{-\infty}^{X} Z(u)\,du$.

* The entries for P(X) and Z(X) on this page are given to 10 decimal places; thus 0.99999 should be prefixed to each entry for P(X), and a decimal point, followed by four, five, ..., eight zeros, as appropriate, to each entry for Z(X).

This table was reprinted from Biometrika Tables for Statisticians, Vol. 1, 3rd Edition, Table 1, with the permission of the Biometrika Trustees.
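The two definitions printed under the table, the ordinate Z(X) and the integral P(X), can be evaluated directly, which is a convenient way to spot-check a tabulated entry. The following is a minimal Python sketch and is not part of the HP library. The function names `normal_ordinate`, `normal_integral` and `upper_tail` are hypothetical, and the use of the complementary error function from Python's standard `math` module is an assumption of this sketch, not the method used to prepare the printed table.

```python
import math


def normal_ordinate(x: float) -> float:
    """Z(X) = exp(-X^2 / 2) / sqrt(2 * pi), the standard normal ordinate."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_integral(x: float) -> float:
    """P(X) = integral of Z(u) du from -infinity to X.

    Uses the identity P(X) = erfc(-X / sqrt(2)) / 2.
    """
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def upper_tail(x: float) -> float:
    """Q(X) = 1 - P(X), computed directly so that large X does not
    suffer from cancellation when subtracting from 1."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


if __name__ == "__main__":
    # Print P(X) and Z(X) to 10 decimal places, the precision used on
    # the last page of the table (X = 4.50 to 6.00).
    for x in (0.0, 1.0, 1.96, 3.0, 6.0):
        print(f"{x:4.2f}  P(X) = {normal_integral(x):.10f}  "
              f"Z(X) = {normal_ordinate(x):.10f}")
```

For example, `normal_integral(1.96)` agrees with the tabulated value 0.9750021 to seven decimal places. For the entries at X = 4.50 and beyond, comparing `upper_tail(x)` against 1 minus the tabulated P(X) avoids losing digits to rounding near 1.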
4k ON Percentage Points of the F-distribution (Variance Ratio) Upper 25 % points 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 oo 1 583 7-60 8 20 8-58 8-82 898 9- 10 919 926 932 941 9-49 9-58 9 63 967 971 9-76 9-80 985 2 257 300 3 15 323 328 331 334 3 35 337 338 339 341 343 343 344 3-45 346 347 3-48 3 202 2-28 230 239 241 2-42 243 2 44 2-44 2-44 2-45 246 2-46 246 2-47 247 247 2-47 2-47 4 1-81 200 205 206 207 208 208 208 208 208 208 208 208 208 208 2 08 208 2 08 208 5 1-69 1 86 1-88 1-89 1 89 1-89 1-89 1-89 1-89 1-89 1-89 189 1-88 1-88 1-88 1 88 1-87 1-87 1-87 6 162 1 76 1-78 1-79 1-79 1-78 1-78 1-78 1-77 1-77 1-77 1 76 1-76 1-75 1-75 1-75 174 1-74 1-74 7 1 57 1-70 1-72 1-72 1 71 1-71 1-70 1-70 1 69 1 69 1 68 1 68 1-67 1-67 1-66 1 66 1-65 1 65 1-65 8 154 1 66 1-67 106 1 66 1-65 1-64 1-64 163 1 63 1 62 1 62 1 61 1 60 160 1-69 1-69 1-58 1-58 9 1-51 162 1-63 1-63 1 62 1-61 1-60 1-60 159 1-59 1-68 1-57 1 56 1-66 1-55 1-54 1-54 1-53 1-63 10 1 49 1-60 1 60 1-59 159 1 58 1-57 1-66 156 155 1 54 1 53 1-52 1-52 1-51 1-51 1-50 1 49 1-48 11 1-47 1-58 1-58 1 57 1 56 1-55 1-54 1-53 J 53 1-52 1 51 1-50 1-49 1-49 1 48 1 47 1-47 1 46 145 12 1 46 1 56 1-56 1 55 1 54 1-53 1-52 1-61 1-51 1-50 1-49 148 1-47 1 46 1 45 1-46 1-44 1-43 1-42 13 1 45 1-55 1 55 1 53 1-52 1 51 1 50 149 149 1-48 1-47 1-46 1 45 1-44 1 43 1-42 1 42 1 41 1-40 14 1-44 1-53 1-53 1 52 1-51 1-50 1-49 1-48 1-47 1 46 145 144 1-43 1-42 1-41 1-41 1-40 1-39 1-38 15 1-43 1-52 1 52 1-61 1-49 1-48 1-47 1-46 1 46 1-45 1 44 1-43 1-41 1 41 1-40 1 39 1-38 1-37 1-36 16 1 42 1-51 1 51 1-60 1-48 1-47 1-46 1 45 1 44 1-44 1 43 1 41 1 40 1-39 1-38 1 37 1 36 1-35 1-34 17 1-42 I 51 1-50 1-49 1-47 1 46 1 45 1 44 1 43 1 43 141 1 40 1 39 1 38 1-37 1 36 1-35 1 34 1-33 18 1-41 1 50 1-49 1-48 1 46 1 45 144 143 1-42 1-42 1-40 1 39 1 38 1-37 1-36 1 35 1-34 1 33 1 32 19 1-41 149 1-49 1-47 1 46 144 143 1-42 1 41 1-41 1 40 1-38 1 37 1 36 1-35 134 133 1 32 130 20 1-40 1-49 1-48 1-47 1-45 1 44 143 1-42 1-41 1-40 - 1-39 1 37 1-30 1-35 1-34 1-33 1-32 
1-31 1-29 21 1-40 1 48 1-48 1 46 1-44 143 1 42 1 41 1-40 1-39 1-38 1-37 1-35 1 34 1-33 1-32 1 31 1 30 1-28 22 1 40 1-48 1-47 145 144 1-42 1 41 1-40 1-39 1-39 1 37 1 36 1 3t 1 33 1-32 1-31 1 30 1 29 1-28 23 1 39 1 47 1-47 1 45 1 43 1-42 1 41 1-40 139 1-38 1-37 1 35 1-31 1-33 1-32 1-31 1 30 1 28 1-27 24 1 39 1-47 146 1 44 1-43 1-41 1-40 1-39 1-38 1-38 1 36 1-35 1-33 1 32 1-31 1-30 1-29 1-28 1-26 25 1-39 147 1 46 1 44 142 1 41 1-40 1-39 1 38 1-37 1-36 134 1 33 132 131 1-29 1-28 1-27 1 25 26 1-38 1 46 1 45 1 44 142 141 1 39 1-38 1-37 1-37 1 35 1 34 1-32 1-31 1-30 1-29 1 28 1-26 1 25 27 1-38 I 46 1 45 143 142 1-40 1 39 1-38 1-37 1 36 1 35 1 33 1-32 1 31 1-30 1-28 1-27 1 26 1 24 28 1 38 1 46 1 45 1 43 1-41 1 40 1 39 1-38 1-37 1-36 1 34 1 33 1-31 1 30 1-29 1-28 1 27 1 25 1 24 29 1 38 1 45 145 1 43 1-41 1-40 1-38 1-37 1-36 1-35 1 34 1-32 l-3t 1 30 1-29 1-27 1 26 1 25 1 23 30 1-38 145 144 1-42 1 41 139 1-38 1-37 1-36 1-35 1-34 1-32 1-30 1 29 1-28 1-27 1 26 1-24 1-23 40 1 36 1-44 1-42 140 1-39 1 37 1-36 1 35 1 34 1-33 1-31 1-30 1-28 1 26 1 25 1-24 1-22 1 21 1 19 60 1 35 1-42 1-41 1-38 1-37 1 35 1-33 1-32 1-31 1-30 1-29 1-27 1-25 1-24 1-22 1-21 1 19 1 17 115 120 1 34 1-40 1 39 1-37 1 35 1 33 1-31 1-30 1-29 1-28 1-26 1-24 1-22 1 21 119 118 1 16 113 110 00 1 32 1-39 1-37 1 35 133 1 31 1 29 1-28 1-27 1-25 124 122 119 118 1-16 1 14 1 12 108 100 ^~gi~~ I — » whore «}=sjSi/i' 1 and s\ = S t fv t are independent mean squares estimating a common variance o** and based on i^and v % degrees of freedom, respectively* Percentage Points of the F-distribution (Variance Ratio) (continued) Upper 10 % points 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 CO 1 3986 4950 63-59 65-83 67-24 68 20 5891 69-44 69-86 60 19 60-71 61-22 6174 6200 6226 62-53 6279 6306 63-33 2 853 9-00 916 9-24 9-29 9-33 9-35 9-37 938 9-39 9-41 942 944 9-45 9-46 9-47 9-47 9-48 9-49 3 6-54 6-46 6-39 6-34 6-31 5 28 5-27 625 6 24 623 6-22 6-20 618 618 517 6-16 615 6-14 513 4 4 64 4-32 419 411 4-05 401 3-98 3-95 3-94 392 3 90 
3.87   3.84   3.83   3.82   3.80   3.79   3.78   3.76
    5   4.06   3.78   3.62   3.52   3.45   3.40   3.37   3.34   3.32   3.30   3.27   3.24   3.21   3.19   3.17   3.16   3.14   3.12   3.10
    6   3.78   3.46   3.29   3.18   3.11   3.05   3.01   2.98   2.96   2.94   2.90   2.87   2.84   2.82   2.80   2.78   2.76   2.74   2.72
    7   3.59   3.26   3.07   2.96   2.88   2.83   2.78   2.75   2.72   2.70   2.67   2.63   2.59   2.58   2.56   2.54   2.51   2.49   2.47
    8   3.46   3.11   2.92   2.81   2.73   2.67   2.62   2.59   2.56   2.54   2.50   2.46   2.42   2.40   2.38   2.36   2.34   2.32   2.29
    9   3.36   3.01   2.81   2.69   2.61   2.55   2.51   2.47   2.44   2.42   2.38   2.34   2.30   2.28   2.25   2.23   2.21   2.18   2.16
   10   3.29   2.92   2.73   2.61   2.52   2.46   2.41   2.38   2.35   2.32   2.28   2.24   2.20   2.18   2.16   2.13   2.11   2.08   2.06
   11   3.23   2.86   2.66   2.54   2.45   2.39   2.34   2.30   2.27   2.25   2.21   2.17   2.12   2.10   2.08   2.05   2.03   2.00   1.97
   12   3.18   2.81   2.61   2.48   2.39   2.33   2.28   2.24   2.21   2.19   2.15   2.10   2.06   2.04   2.01   1.99   1.96   1.93   1.90
   13   3.14   2.76   2.56   2.43   2.35   2.28   2.23   2.20   2.16   2.14   2.10   2.05   2.01   1.98   1.96   1.93   1.90   1.88   1.85
   14   3.10   2.73   2.52   2.39   2.31   2.24   2.19   2.15   2.12   2.10   2.05   2.01   1.96   1.94   1.91   1.89   1.86   1.83   1.80
   15   3.07   2.70   2.49   2.36   2.27   2.21   2.16   2.12   2.09   2.06   2.02   1.97   1.92   1.90   1.87   1.85   1.82   1.79   1.76
   16   3.05   2.67   2.46   2.33   2.24   2.18   2.13   2.09   2.06   2.03   1.99   1.94   1.89   1.87   1.84   1.81   1.78   1.75   1.72
   17   3.03   2.64   2.44   2.31   2.22   2.15   2.10   2.06   2.03   2.00   1.96   1.91   1.86   1.84   1.81   1.78   1.75   1.72   1.69
   18   3.01   2.62   2.42   2.29   2.20   2.13   2.08   2.04   2.00   1.98   1.93   1.89   1.84   1.81   1.78   1.75   1.72   1.69   1.66
   19   2.99   2.61   2.40   2.27   2.18   2.11   2.06   2.02   1.98   1.96   1.91   1.86   1.81   1.79   1.76   1.73   1.70   1.67   1.63
   20   2.97   2.59   2.38   2.25   2.16   2.09   2.04   2.00   1.96   1.94   1.89   1.84   1.79   1.77   1.74   1.71   1.68   1.64   1.61
   21   2.96   2.57   2.36   2.23   2.14   2.08   2.02   1.98   1.95   1.92   1.87   1.83   1.78   1.75   1.72   1.69   1.66   1.62   1.59
   22   2.95   2.56   2.35   2.22   2.13   2.06   2.01   1.97   1.93   1.90   1.86   1.81   1.76   1.73   1.70   1.67   1.64   1.60   1.57
   23   2.94   2.55   2.34   2.21   2.11   2.05   1.99   1.95   1.92   1.89   1.84   1.80   1.74   1.72   1.69   1.66   1.62   1.59   1.55
   24   2.93   2.54   2.33   2.19   2.10   2.04   1.98   1.94   1.91   1.88   1.83   1.78   1.73   1.70   1.67   1.64   1.61   1.57   1.53
   25   2.92   2.53   2.32   2.18   2.09   2.02   1.97   1.93   1.89   1.87   1.82   1.77   1.72   1.69   1.66   1.63   1.59   1.56   1.52
   26   2.91   2.52   2.31   2.17   2.08   2.01   1.96   1.92   1.88   1.86   1.81   1.76   1.71   1.68   1.65   1.61   1.58   1.54   1.50
   27   2.90   2.51   2.30   2.17   2.07   2.00   1.95   1.91   1.87   1.85   1.80   1.75   1.70   1.67   1.64   1.60   1.57   1.53   1.49
   28   2.89   2.50   2.29   2.16   2.06   2.00   1.94   1.90   1.87   1.84   1.79   1.74   1.69   1.66   1.63   1.59   1.56   1.52   1.48
   29   2.89   2.50   2.28   2.15   2.06   1.99   1.93   1.89   1.86   1.83   1.78   1.73   1.68   1.65   1.62   1.58   1.55   1.51   1.47
   30   2.88   2.49   2.28   2.14   2.05   1.98   1.93   1.88   1.85   1.82   1.77   1.72   1.67   1.64   1.61   1.57   1.54   1.50   1.46
   40   2.84   2.44   2.23   2.09   2.00   1.93   1.87   1.83   1.79   1.76   1.71   1.66   1.61   1.57   1.54   1.51   1.47   1.42   1.38
   60   2.79   2.39   2.18   2.04   1.95   1.87   1.82   1.77   1.74   1.71   1.66   1.60   1.54   1.51   1.48   1.44   1.40   1.35   1.29
  120   2.75   2.35   2.13   1.99   1.90   1.82   1.77   1.72   1.68   1.65   1.60   1.55   1.48   1.45   1.41   1.37   1.32   1.26   1.19
    ∞   2.71   2.30   2.08   1.94   1.85   1.77   1.72   1.67   1.63   1.60   1.55   1.49   1.42   1.38   1.34   1.30   1.24   1.17   1.00

$F = \dfrac{s_1^2}{s_2^2} = \dfrac{S_1/\nu_1}{S_2/\nu_2}$, where $s_1^2 = S_1/\nu_1$ and $s_2^2 = S_2/\nu_2$ are independent mean squares estimating a common variance $\sigma^2$ and based on $\nu_1$ and $\nu_2$ degrees of freedom, respectively.

Percentage Points of the F-distribution (Variance Ratio) (continued)

Upper 5% points
ν2\ν1      1      2      3      4      5      6      7      8      9     10     12     15     20     24     30     40     60    120      ∞
    1  161.4  199.5  215.7  224.6  230.2  234.0  236.8  238.9  240.5  241.9  243.9  245.9  248.0  249.1  250.1  251.1  252.2  253.3  254.3
    2  18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38  19.40  19.41  19.43  19.45  19.45  19.46  19.47  19.48  19.49  19.50
    3  10.13   9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81   8.79   8.74   8.70   8.66   8.64   8.62   8.59   8.57   8.55   8.53
    4   7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00   5.96   5.91   5.86   5.80   5.77   5.75   5.72   5.69   5.66   5.63
    5   6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77   4.74   4.68   4.62   4.56   4.53   4.50   4.46   4.43   4.40   4.36
    6   5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06   4.00   3.94   3.87   3.84   3.81   3.77   3.74   3.70   3.67
    7   5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68   3.64   3.57   3.51   3.44   3.41   3.38   3.34   3.30   3.27   3.23
    8   5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39   3.35   3.28   3.22   3.15   3.12   3.08   3.04   3.01   2.97   2.93
    9   5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18   3.14   3.07   3.01   2.94   2.90   2.86   2.83   2.79   2.75   2.71
   10   4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.98   2.91   2.85   2.77   2.74   2.70   2.66   2.62   2.58   2.54
   11   4.84   3.98   3.59   3.36   3.20   3.09   3.01   2.95   2.90   2.85   2.79   2.72   2.65   2.61   2.57   2.53   2.49   2.45   2.40
   12   4.75   3.89   3.49   3.26   3.11   3.00   2.91   2.85   2.80   2.75   2.69   2.62   2.54   2.51   2.47   2.43   2.38   2.34   2.30
   13   4.67   3.81   3.41   3.18   3.03   2.92   2.83   2.77   2.71   2.67   2.60   2.53   2.46   2.42   2.38   2.34   2.30   2.25   2.21
   14   4.60   3.74   3.34   3.11   2.96   2.85   2.76   2.70   2.65   2.60   2.53   2.46   2.39   2.35   2.31   2.27   2.22   2.18   2.13
   15   4.54   3.68   3.29   3.06   2.90   2.79   2.71   2.64   2.59   2.54   2.48   2.40   2.33   2.29   2.25   2.20   2.16   2.11   2.07
   16   4.49   3.63   3.24   3.01   2.85   2.74   2.66   2.59   2.54   2.49   2.42   2.35   2.28   2.24   2.19   2.15   2.11   2.06   2.01
   17   4.45   3.59   3.20   2.96   2.81   2.70   2.61   2.55   2.49   2.45   2.38   2.31   2.23   2.19   2.15   2.10   2.06   2.01   1.96
   18   4.41   3.55   3.16   2.93   2.77   2.66   2.58   2.51   2.46   2.41   2.34   2.27   2.19   2.15   2.11   2.06   2.02   1.97   1.92
   19   4.38   3.52   3.13   2.90   2.74   2.63   2.54   2.48   2.42   2.38   2.31   2.23   2.16   2.11   2.07   2.03   1.98   1.93   1.88
   20   4.35   3.49   3.10   2.87   2.71   2.60   2.51   2.45   2.39   2.35   2.28   2.20   2.12   2.08   2.04   1.99   1.95   1.90   1.84
   21   4.32   3.47   3.07   2.84   2.68   2.57   2.49   2.42   2.37   2.32   2.25   2.18   2.10   2.05   2.01   1.96   1.92   1.87   1.81
   22   4.30   3.44   3.05   2.82   2.66   2.55   2.46   2.40   2.34   2.30   2.23   2.15   2.07   2.03   1.98   1.94   1.89   1.84   1.78
   23   4.28   3.42   3.03   2.80   2.64   2.53   2.44   2.37   2.32   2.27   2.20   2.13   2.05   2.01   1.96   1.91   1.86   1.81   1.76
   24   4.26   3.40   3.01   2.78   2.62   2.51   2.42   2.36   2.30   2.25   2.18   2.11   2.03   1.98   1.94   1.89   1.84   1.79   1.73
   25   4.24   3.39   2.99   2.76   2.60   2.49   2.40   2.34   2.28   2.24   2.16   2.09   2.01   1.96   1.92   1.87   1.82   1.77   1.71
   26   4.23   3.37   2.98   2.74   2.59   2.47   2.39   2.32   2.27   2.22   2.15   2.07   1.99   1.95   1.90   1.85   1.80   1.75   1.69
   27   4.21   3.35   2.96   2.73   2.57   2.46   2.37   2.31   2.25   2.20   2.13   2.06   1.97   1.93   1.88   1.84   1.79   1.73   1.67
   28   4.20   3.34   2.95   2.71   2.56   2.45   2.36   2.29   2.24   2.19   2.12   2.04   1.96   1.91   1.87   1.82   1.77   1.71   1.65
   29   4.18   3.33   2.93   2.70   2.55   2.43   2.35   2.28   2.22   2.18   2.10   2.03   1.94   1.90   1.85   1.81   1.75   1.70   1.64
   30   4.17   3.32   2.92   2.69   2.53   2.42   2.33   2.27   2.21   2.16   2.09   2.01   1.93   1.89   1.84   1.79   1.74   1.68   1.62
   40   4.08   3.23   2.84   2.61   2.45   2.34   2.25   2.18   2.12   2.08   2.00   1.92   1.84   1.79   1.74   1.69   1.64   1.58   1.51
   60   4.00   3.15   2.76   2.53   2.37   2.25   2.17   2.10   2.04   1.99   1.92   1.84   1.75   1.70   1.65   1.59   1.53   1.47   1.39
  120   3.92   3.07   2.68   2.45   2.29   2.17   2.09   2.02   1.96   1.91   1.83   1.75   1.66   1.61   1.55   1.50   1.43   1.35   1.25
    ∞   3.84   3.00   2.60   2.37   2.21   2.10   2.01   1.94   1.88   1.83   1.75   1.67   1.57   1.52   1.46   1.39   1.32   1.22   1.00

$F = \dfrac{s_1^2}{s_2^2} = \dfrac{S_1/\nu_1}{S_2/\nu_2}$, where $s_1^2 = S_1/\nu_1$ and $s_2^2 = S_2/\nu_2$ are independent mean squares estimating a common variance $\sigma^2$ and based on $\nu_1$ and $\nu_2$ degrees of freedom, respectively.
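The footnote defines the variance ratio as a quotient of two independent mean squares. A minimal Python sketch of how such a ratio might be compared with a tabulated upper 5% point follows. It assumes each mean square is the ordinary sample variance of one sample (so ν = n − 1); the function names, the dictionary `UPPER_5_PERCENT`, and the data are hypothetical, and only three entries of the ν2 = 10 row are copied in.

```python
from statistics import variance

# Small excerpt of the upper 5% table, keyed by (v1, v2). Illustrative only.
UPPER_5_PERCENT = {(4, 10): 3.48, (5, 10): 3.33, (10, 10): 2.98}


def variance_ratio(sample_1, sample_2):
    """Return (F, v1, v2), where F = s1^2 / s2^2 and v = n - 1 for each sample."""
    s1_sq = variance(sample_1)  # mean square S1 / v1
    s2_sq = variance(sample_2)  # mean square S2 / v2
    return s1_sq / s2_sq, len(sample_1) - 1, len(sample_2) - 1


def exceeds_upper_5_percent(sample_1, sample_2):
    """True if the observed ratio is larger than the tabulated 5% point."""
    f, v1, v2 = variance_ratio(sample_1, sample_2)
    return f > UPPER_5_PERCENT[(v1, v2)]


# Made-up data: 5 observations (v1 = 4) and 11 observations (v2 = 10).
sample_a = [0.0, 10.0, 20.0, 30.0, 40.0]
sample_b = [float(i) for i in range(1, 12)]

f, v1, v2 = variance_ratio(sample_a, sample_b)
print(f"F = {f:.2f} on ({v1}, {v2}) degrees of freedom")
print("exceeds tabulated 5% point:", exceeds_upper_5_percent(sample_a, sample_b))
```

The lookup raises `KeyError` for any (ν1, ν2) pair not copied into the excerpt, which keeps the sketch from silently using a wrong critical value.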
Percentage Points of the F-distribution (Variance Ratio) (continued)

Upper 2.5% points

ν2\ν1      1      2      3      4      5      6      7      8      9     10     12     15     20     24     30     40     60    120      ∞
    1  647.8  799.5  864.2  899.6  921.8  937.1  948.2  956.7  963.3  968.6  976.7  984.9  993.1  997.2   1001   1006   1010   1014   1018
    2  38.51  39.00  39.17  39.25  39.30  39.33  39.36  39.37  39.39  39.40  39.41  39.43  39.45  39.46  39.46  39.47  39.48  39.49  39.50
    3  17.44  16.04  15.44  15.10  14.88  14.73  14.62  14.54  14.47  14.42  14.34  14.25  14.17  14.12  14.08  14.04  13.99  13.95  13.90
    4  12.22  10.65   9.98   9.60   9.36   9.20   9.07   8.98   8.90   8.84   8.75   8.66   8.56   8.51   8.46   8.41   8.36   8.31   8.26
    5  10.01   8.43   7.76   7.39   7.15   6.98   6.85   6.76   6.68   6.62   6.52   6.43   6.33   6.28   6.23   6.18   6.12   6.07   6.02
    6   8.81   7.26   6.60   6.23   5.99   5.82   5.70   5.60   5.52   5.46   5.37   5.27   5.17   5.12   5.07   5.01   4.96   4.90   4.85
    7   8.07   6.54   5.89   5.52   5.29   5.12   4.99   4.90   4.82   4.76   4.67   4.57   4.47   4.42   4.36   4.31   4.25   4.20   4.14
    8   7.57   6.06   5.42   5.05   4.82   4.65   4.53   4.43   4.36   4.30   4.20   4.10   4.00   3.95   3.89   3.84   3.78   3.73   3.67
    9   7.21   5.71   5.08   4.72   4.48   4.32   4.20   4.10   4.03   3.96   3.87   3.77   3.67   3.61   3.56   3.51   3.45   3.39   3.33
   10   6.94   5.46   4.83   4.47   4.24   4.07   3.95   3.85   3.78   3.72   3.62   3.52   3.42   3.37   3.31   3.26   3.20   3.14   3.08
   11   6.72   5.26   4.63   4.28   4.04   3.88   3.76   3.66   3.59   3.53   3.43   3.33   3.23   3.17   3.12   3.06   3.00   2.94   2.88
   12   6.55   5.10   4.47   4.12   3.89   3.73   3.61   3.51   3.44   3.37   3.28   3.18   3.07   3.02   2.96   2.91   2.85   2.79   2.72
   13   6.41   4.97   4.35   4.00   3.77   3.60   3.48   3.39   3.31   3.25   3.15   3.05   2.95   2.89   2.84   2.78   2.72   2.66   2.60
   14   6.30   4.86   4.24   3.89   3.66   3.50   3.38   3.29   3.21   3.15   3.05   2.95   2.84   2.79   2.73   2.67   2.61   2.55   2.49
   15   6.20   4.77   4.15   3.80   3.58   3.41   3.29   3.20   3.12   3.06   2.96   2.86   2.76   2.70   2.64   2.59   2.52   2.46   2.40
   16   6.12   4.69   4.08   3.73   3.50   3.34   3.22   3.12   3.05   2.99   2.89   2.79   2.68   2.63   2.57   2.51   2.45   2.38   2.32
   17   6.04   4.62   4.01   3.66   3.44   3.28   3.16   3.06   2.98   2.92   2.82   2.72   2.62   2.56   2.50   2.44   2.38   2.32   2.25
   18   5.98   4.56   3.95   3.61   3.38   3.22   3.10   3.01   2.93   2.87   2.77   2.67   2.56   2.50   2.44   2.38   2.32   2.26   2.19
   19   5.92   4.51   3.90   3.56   3.33   3.17   3.05   2.96   2.88   2.82   2.72   2.62   2.51   2.45   2.39   2.33   2.27   2.20   2.13
   20   5.87   4.46   3.86   3.51   3.29   3.13   3.01   2.91   2.84   2.77   2.68   2.57   2.46   2.41   2.35   2.29   2.22   2.16   2.09
   21   5.83   4.42   3.82   3.48   3.25   3.09   2.97   2.87   2.80   2.73   2.64   2.53   2.42   2.37   2.31   2.25   2.18   2.11   2.04
   22   5.79   4.38   3.78   3.44   3.22   3.05   2.93   2.84   2.76   2.70   2.60   2.50   2.39   2.33   2.27   2.21   2.14   2.08   2.00
   23   5.75   4.35   3.75   3.41   3.18   3.02   2.90   2.81   2.73   2.67   2.57   2.47   2.36   2.30   2.24   2.18   2.11   2.04   1.97
   24   5.72   4.32   3.72   3.38   3.15   2.99   2.87   2.78   2.70   2.64   2.54   2.44   2.33   2.27   2.21   2.15   2.08   2.01   1.94
   25   5.69   4.29   3.69   3.35   3.13   2.97   2.85   2.75   2.68   2.61   2.51   2.41   2.30   2.24   2.18   2.12   2.05   1.98   1.91
   26   5.66   4.27   3.67   3.33   3.10   2.94   2.82   2.73   2.65   2.59   2.49   2.39   2.28   2.22   2.16   2.09   2.03   1.95   1.88
   27   5.63   4.24   3.65   3.31   3.08   2.92   2.80   2.71   2.63   2.57   2.47   2.36   2.25   2.19   2.13   2.07   2.00   1.93   1.85
   28   5.61   4.22   3.63   3.29   3.06   2.90   2.78   2.69   2.61   2.55   2.45   2.34   2.23   2.17   2.11   2.05   1.98   1.91   1.83
   29   5.59   4.20   3.61   3.27   3.04   2.88   2.76   2.67   2.59   2.53   2.43   2.32   2.21   2.15   2.09   2.03   1.96   1.89   1.81
   30   5.57   4.18   3.59   3.25   3.03   2.87   2.75   2.65   2.57   2.51   2.41   2.31   2.20   2.14   2.07   2.01   1.94   1.87   1.79
   40   5.42   4.05   3.46   3.13   2.90   2.74   2.62   2.53   2.45   2.39   2.29   2.18   2.07   2.01   1.94   1.88   1.80   1.72   1.64
   60   5.29   3.93   3.34   3.01   2.79   2.63   2.51   2.41   2.33   2.27   2.17   2.06   1.94   1.88   1.82   1.74   1.67   1.58   1.48
  120   5.15   3.80   3.23   2.89   2.67   2.52   2.39   2.30   2.22   2.16   2.05   1.94   1.82   1.76   1.69   1.61   1.53   1.43   1.31
    ∞   5.02   3.69   3.12   2.79   2.57   2.41   2.29   2.19   2.11   2.05   1.94   1.83   1.71   1.64   1.57   1.48   1.39   1.27   1.00

$F = \dfrac{s_1^2}{s_2^2} = \dfrac{S_1/\nu_1}{S_2/\nu_2}$, where $s_1^2 = S_1/\nu_1$ and $s_2^2 = S_2/\nu_2$ are independent mean squares estimating a common variance $\sigma^2$ and based on $\nu_1$ and $\nu_2$ degrees of freedom, respectively.
Percentage Points of the F-distribution (Variance Ratio) (continued)

Upper 1% points

ν2\ν1      1      2      3      4      5      6      7      8      9     10     12     15     20     24     30     40     60    120      ∞
    1   4052 4999.5   5403   5625   5764   5859   5928   5981   6022   6056   6106   6157   6209   6235   6261   6287   6313   6339   6366
    2  98.50  99.00  99.17  99.25  99.30  99.33  99.36  99.37  99.39  99.40  99.42  99.43  99.45  99.46  99.47  99.47  99.48  99.49  99.50
    3  34.12  30.82  29.46  28.71  28.24  27.91  27.67  27.49  27.35  27.23  27.05  26.87  26.69  26.60  26.50  26.41  26.32  26.22  26.13
    4  21.20  18.00  16.69  15.98  15.52  15.21  14.98  14.80  14.66  14.55  14.37  14.20  14.02  13.93  13.84  13.75  13.65  13.56  13.46
    5  16.26  13.27  12.06  11.39  10.97  10.67  10.46  10.29  10.16  10.05   9.89   9.72   9.55   9.47   9.38   9.29   9.20   9.11   9.02
    6  13.75  10.92   9.78   9.15   8.75   8.47   8.26   8.10   7.98   7.87   7.72   7.56   7.40   7.31   7.23   7.14   7.06   6.97   6.88
    7  12.25   9.55   8.45   7.85   7.46   7.19   6.99   6.84   6.72   6.62   6.47   6.31   6.16   6.07   5.99   5.91   5.82   5.74   5.65
    8  11.26   8.65   7.59   7.01   6.63   6.37   6.18   6.03   5.91   5.81   5.67   5.52   5.36   5.28   5.20   5.12   5.03   4.95   4.86
    9  10.56   8.02   6.99   6.42   6.06   5.80   5.61   5.47   5.35   5.26   5.11   4.96   4.81   4.73   4.65   4.57   4.48   4.40   4.31
   10  10.04   7.56   6.55   5.99   5.64   5.39   5.20   5.06   4.94   4.85   4.71   4.56   4.41   4.33   4.25   4.17   4.08   4.00   3.91
   11   9.65   7.21   6.22   5.67   5.32   5.07   4.89   4.74   4.63   4.54   4.40   4.25   4.10   4.02   3.94   3.86   3.78   3.69   3.60
   12   9.33   6.93   5.95   5.41   5.06   4.82   4.64   4.50   4.39   4.30   4.16   4.01   3.86   3.78   3.70   3.62   3.54   3.45   3.36
   13   9.07   6.70   5.74   5.21   4.86   4.62   4.44   4.30   4.19   4.10   3.96   3.82   3.66   3.59   3.51   3.43   3.34   3.25   3.17
   14   8.86   6.51   5.56   5.04   4.69   4.46   4.28   4.14   4.03   3.94   3.80   3.66   3.51   3.43   3.35   3.27   3.18   3.09   3.00
   15   8.68   6.36   5.42   4.89   4.56   4.32   4.14   4.00   3.89   3.80   3.67   3.52   3.37   3.29   3.21   3.13   3.05   2.96   2.87
   16   8.53   6.23   5.29   4.77   4.44   4.20   4.03   3.89   3.78   3.69   3.55   3.41   3.26   3.18   3.10   3.02   2.93   2.84   2.75
   17   8.40   6.11   5.18   4.67   4.34   4.10   3.93   3.79   3.68   3.59   3.46   3.31   3.16   3.08   3.00   2.92   2.83   2.75   2.65
   18   8.29   6.01   5.09   4.58   4.25   4.01   3.84   3.71   3.60   3.51   3.37   3.23   3.08   3.00   2.92   2.84   2.75   2.66   2.57
   19   8.18   5.93   5.01   4.50   4.17   3.94   3.77   3.63   3.52   3.43   3.30   3.15   3.00   2.92   2.84   2.76   2.67   2.58   2.49
   20   8.10   5.85   4.94   4.43   4.10   3.87   3.70   3.56   3.46   3.37   3.23   3.09   2.94   2.86   2.78   2.69   2.61   2.52   2.42
   21   8.02   5.78   4.87   4.37   4.04   3.81   3.64   3.51   3.40   3.31   3.17   3.03   2.88   2.80   2.72   2.64   2.55   2.46   2.36
   22   7.95   5.72   4.82   4.31   3.99   3.76   3.59   3.45   3.35   3.26   3.12   2.98   2.83   2.75   2.67   2.58   2.50   2.40   2.31
   23   7.88   5.66   4.76   4.26   3.94   3.71   3.54   3.41   3.30   3.21   3.07   2.93   2.78   2.70   2.62   2.54   2.45   2.35   2.26
   24   7.82   5.61   4.72   4.22   3.90   3.67   3.50   3.36   3.26   3.17   3.03   2.89   2.74   2.66   2.58   2.49   2.40   2.31   2.21
   25   7.77   5.57   4.68   4.18   3.85   3.63   3.46   3.32   3.22   3.13   2.99   2.85   2.70   2.62   2.54   2.45   2.36   2.27   2.17
   26   7.72   5.53   4.64   4.14   3.82   3.59   3.42   3.29   3.18   3.09   2.96   2.81   2.66   2.58   2.50   2.42   2.33   2.23   2.13
   27   7.68   5.49   4.60   4.11   3.78   3.56   3.39   3.26   3.15   3.06   2.93   2.78   2.63   2.55   2.47   2.38   2.29   2.20   2.10
   28   7.64   5.45   4.57   4.07   3.75   3.53   3.36   3.23   3.12   3.03   2.90   2.75   2.60   2.52   2.44   2.35   2.26   2.17   2.06
   29   7.60   5.42   4.54   4.04   3.73   3.50   3.33   3.20   3.09   3.00   2.87   2.73   2.57   2.49   2.41   2.33   2.23   2.14   2.03
   30   7.56   5.39   4.51   4.02   3.70   3.47   3.30   3.17   3.07   2.98   2.84   2.70   2.55   2.47   2.39   2.30   2.21   2.11   2.01
   40   7.31   5.18   4.31   3.83   3.51   3.29   3.12   2.99   2.89   2.80   2.66   2.52   2.37   2.29   2.20   2.11   2.02   1.92   1.80
   60   7.08   4.98   4.13   3.65   3.34   3.12   2.95   2.82   2.72   2.63   2.50   2.35   2.20   2.12   2.03   1.94   1.84   1.73   1.60
  120   6.85   4.79   3.95   3.48   3.17   2.96   2.79   2.66   2.56   2.47   2.34   2.19   2.03   1.95   1.86   1.76   1.66   1.53   1.38
    ∞   6.63   4.61   3.78   3.32   3.02   2.80   2.64   2.51   2.41   2.32   2.18   2.04   1.88   1.79   1.70   1.59   1.47   1.32   1.00

$F = \dfrac{s_1^2}{s_2^2} = \dfrac{S_1/\nu_1}{S_2/\nu_2}$, where $s_1^2 = S_1/\nu_1$ and $s_2^2 = S_2/\nu_2$ are independent mean squares estimating a common variance $\sigma^2$ and based on $\nu_1$ and $\nu_2$ degrees of freedom, respectively.
Percentage Points of the F-distribution (Variance Ratio) (continued)

Upper 0.5% points

ν2\ν1      1      2      3      4      5      6      7      8      9     10     12     15     20     24     30     40     60    120      ∞
    1  16211  20000  21615  22500  23056  23437  23715  23925  24091  24224  24426  24630  24836  24940  25044  25148  25253  25359  25465
    2  198.5  199.0  199.2  199.2  199.3  199.3  199.4  199.4  199.4  199.4  199.4  199.4  199.4  199.5  199.5  199.5  199.5  199.5  199.5
    3  55.55  49.80  47.47  46.19  45.39  44.84  44.43  44.13  43.88  43.69  43.39  43.08  42.78  42.62  42.47  42.31  42.15  41.99  41.83
    4  31.33  26.28  24.26  23.15  22.46  21.97  21.62  21.35  21.14  20.97  20.70  20.44  20.17  20.03  19.89  19.75  19.61  19.47  19.32
    5  22.78  18.31  16.53  15.56  14.94  14.51  14.20  13.96  13.77  13.62  13.38  13.15  12.90  12.78  12.66  12.53  12.40  12.27  12.14
    6  18.63  14.54  12.92  12.03  11.46  11.07  10.79  10.57  10.39  10.25  10.03   9.81   9.59   9.47   9.36   9.24   9.12   9.00   8.88
    7  16.24  12.40  10.88  10.05   9.52   9.16   8.89   8.68   8.51   8.38   8.18   7.97   7.75   7.65   7.53   7.42   7.31   7.19   7.08
    8  14.69  11.04   9.60   8.81   8.30   7.95   7.69   7.50   7.34   7.21   7.01   6.81   6.61   6.50   6.40   6.29   6.18   6.06   5.95
    9  13.61  10.11   8.72   7.96   7.47   7.13   6.88   6.69   6.54   6.42   6.23   6.03   5.83   5.73   5.62   5.52   5.41   5.30   5.19
   10  12.83   9.43   8.08   7.34   6.87   6.54   6.30   6.12   5.97   5.85   5.66   5.47   5.27   5.17   5.07   4.97   4.86   4.75   4.64
   11  12.23   8.91   7.60   6.88   6.42   6.10   5.86   5.68   5.54   5.42   5.24   5.05   4.86   4.76   4.65   4.55   4.44   4.34   4.23
   12  11.75   8.51   7.23   6.52   6.07   5.76   5.52   5.35   5.20   5.09   4.91   4.72   4.53   4.43   4.33   4.23   4.12   4.01   3.90
   13  11.37   8.19   6.93   6.23   5.79   5.48   5.25   5.08   4.94   4.82   4.64   4.46   4.27   4.17   4.07   3.97   3.87   3.76   3.65
   14  11.06   7.92   6.68   6.00   5.56   5.26   5.03   4.86   4.72   4.60   4.43   4.25   4.06   3.96   3.86   3.76   3.66   3.55   3.44
   15  10.80   7.70   6.48   5.80   5.37   5.07   4.85   4.67   4.54   4.42   4.25   4.07   3.88   3.79   3.69   3.58   3.48   3.37   3.26
   16  10.58   7.51   6.30   5.64   5.21   4.91   4.69   4.52   4.38   4.27   4.10   3.92   3.73   3.64   3.54   3.44   3.33   3.22   3.11
   17  10.38   7.35   6.16   5.50   5.07   4.78   4.56   4.39   4.25   4.14   3.97   3.79   3.61   3.51   3.41   3.31   3.21   3.10   2.98
   18  10.22   7.21   6.03   5.37   4.96   4.66   4.44   4.28   4.14   4.03   3.86   3.68   3.50   3.40   3.30   3.20   3.10   2.99   2.87
   19  10.07   7.09   5.92   5.27   4.85   4.56   4.34   4.18   4.04   3.93   3.76   3.59   3.40   3.31   3.21   3.11   3.00   2.89   2.78
   20   9.94   6.99   5.82   5.17   4.76   4.47   4.26   4.09   3.96   3.85   3.68   3.50   3.32   3.22   3.12   3.02   2.92   2.81   2.69
   21   9.83   6.89   5.73   5.09   4.68   4.39   4.18   4.01   3.88   3.77   3.60   3.43   3.24   3.15   3.05   2.95   2.84   2.73   2.61
   22   9.73   6.81   5.65   5.02   4.61   4.32   4.11   3.94   3.81   3.70   3.54   3.36   3.18   3.08   2.98   2.88   2.77   2.66   2.55
   23   9.63   6.73   5.58   4.95   4.54   4.26   4.05   3.88   3.75   3.64   3.47   3.30   3.12   3.02   2.92   2.82   2.71   2.60   2.48
   24   9.55   6.66   5.52   4.89   4.49   4.20   3.99   3.83   3.69   3.59   3.42   3.25   3.06   2.97   2.87   2.77   2.66   2.55   2.43
   25   9.48   6.60   5.46   4.84   4.43   4.15   3.94   3.78   3.64   3.54   3.37   3.20   3.01   2.92   2.82   2.72   2.61   2.50   2.38
   26   9.41   6.54   5.41   4.79   4.38   4.10   3.89   3.73   3.60   3.49   3.33   3.15   2.97   2.87   2.77   2.67   2.56   2.45   2.33
   27   9.34   6.49   5.36   4.74   4.34   4.06   3.85   3.69   3.56   3.45   3.28   3.11   2.93   2.83   2.73   2.63   2.52   2.41   2.29
   28   9.28   6.44   5.32   4.70   4.30   4.02   3.81   3.65   3.52   3.41   3.25   3.07   2.89   2.79   2.69   2.59   2.48   2.37   2.25
   29   9.23   6.40   5.28   4.66   4.26   3.98   3.77   3.61   3.48   3.38   3.21   3.04   2.86   2.76   2.66   2.56   2.45   2.33   2.21
   30   9.18   6.35   5.24   4.62   4.23   3.95   3.74   3.58   3.45   3.34   3.18   3.01   2.82   2.73   2.63   2.52   2.42   2.30   2.18
   40   8.83   6.07   4.98   4.37   3.99   3.71   3.51   3.35   3.22   3.12   2.95   2.78   2.60   2.50   2.40   2.30   2.18   2.06   1.93
   60   8.49   5.79   4.73   4.14   3.76   3.49   3.29   3.13   3.01   2.90   2.74   2.57   2.39   2.29   2.19   2.08   1.96   1.83   1.69
  120   8.18   5.54   4.50   3.92   3.55   3.28   3.09   2.93   2.81   2.71   2.54   2.37   2.19   2.09   1.98   1.87   1.75   1.61   1.43
    ∞   7.88   5.30   4.28   3.72   3.35   3.09   2.90   2.74   2.62   2.52   2.36   2.19   2.00   1.90   1.79   1.67   1.53   1.36   1.00

$F = \dfrac{s_1^2}{s_2^2} = \dfrac{S_1/\nu_1}{S_2/\nu_2}$, where $s_1^2 = S_1/\nu_1$ and $s_2^2 = S_2/\nu_2$ are independent mean squares estimating a common variance $\sigma^2$ and based on $\nu_1$ and $\nu_2$ degrees of freedom, respectively.

Percentage Points of the F-distribution (Variance Ratio) (continued)

Upper 0.1% points

ν2\ν1      1      2      3      4      5      6      7      8      9     10     12     15     20     24     30     40     60    120      ∞
    1  4053*  5000*  5404*  5625*  5764*  5859*  5929*  5981*  6023*  6056*  6107*  6158*  6209*  6235*  6261*  6287*  6313*  6340*  6366*
    2  998.5  999.0  999.2  999.2  999.3  999.3  999.4  999.4  999.4  999.4  999.4  999.4  999.4  999.5  999.5  999.5  999.5  999.5  999.5
    3  167.0  148.5  141.1  137.1  134.6  132.8  131.6  130.6  129.9  129.2  128.3  127.4  126.4  125.9  125.4  125.0  124.5  124.0  123.5
    4  74.14  61.25  56.18  53.44  51.71  50.53  49.66  49.00  48.47  48.05  47.41  46.76  46.10  45.77  45.43  45.09  44.75  44.40  44.05
    5  47.18  37.12  33.20  31.09  29.75  28.84  28.16  27.64  27.24  26.92  26.42  25.91  25.39  25.14  24.87  24.60  24.33  24.06  23.79
    6  35.51  27.00  23.70  21.92  20.81  20.03  19.46  19.03  18.69  18.41  17.99  17.56  17.12  16.89  16.67  16.44  16.21  15.99  15.75
    7  29.25  21.69  18.77  17.19  16.21  15.52  15.02  14.63  14.33  14.08  13.71  13.32  12.93  12.73  12.53  12.33  12.12  11.91  11.70
    8  25.42  18.49  15.83  14.39  13.49  12.86  12.40  12.04  11.77  11.54  11.19  10.84  10.48  10.30  10.11   9.92   9.73   9.53   9.33
    9  22.86  16.39  13.90  12.56  11.71  11.13  10.70  10.37  10.11   9.89   9.57   9.24   8.90   8.72   8.55   8.37   8.19   8.00   7.81
   10  21.04  14.91  12.55  11.28  10.48   9.92   9.52   9.20   8.96   8.75   8.45   8.13   7.80   7.64   7.47   7.30   7.12   6.94   6.76
   11  19.69  13.81  11.56  10.35   9.58   9.05   8.66   8.35   8.12   7.92   7.63   7.32   7.01   6.85   6.68   6.52   6.35   6.17   6.00
   12  18.64  12.97  10.80   9.63   8.89   8.38   8.00   7.71   7.48   7.29   7.00   6.71   6.40   6.25   6.09   5.93   5.76   5.59   5.42
   13  17.81  12.31  10.21   9.07   8.35   7.86   7.49   7.21   6.98   6.80   6.52   6.23   5.93   5.78   5.63   5.47   5.30   5.14   4.97
   14  17.14  11.78   9.73   8.62   7.92   7.43   7.08   6.80   6.58   6.40   6.13   5.85   5.56   5.41   5.25   5.10   4.94   4.77   4.60
   15  16.59  11.34   9.34   8.25   7.57   7.09   6.74   6.47   6.26   6.08   5.81   5.54   5.25   5.10   4.95   4.80   4.64   4.47   4.31
   16  16.12  10.97   9.00   7.94   7.27   6.81   6.46   6.19   5.98   5.81   5.55   5.27   4.99   4.85   4.70   4.54   4.39   4.23   4.06
   17  15.72  10.66   8.73   7.68   7.02   6.56   6.22   5.96   5.75   5.58   5.32   5.05   4.78   4.63   4.48   4.33   4.18   4.02   3.85
   18  15.38  10.39   8.49   7.46   6.81   6.35   6.02   5.76   5.56   5.39   5.13   4.87   4.59   4.45   4.30   4.15   4.00   3.84   3.67
   19  15.08  10.16   8.28   7.26   6.62   6.18   5.85   5.59   5.39   5.22   4.97   4.70   4.43   4.29   4.14   3.99   3.84   3.68   3.51
   20  14.82   9.95   8.10   7.10   6.46   6.02   5.69   5.44   5.24   5.08   4.82   4.56   4.29   4.15   4.00   3.86   3.70   3.54   3.38
   21  14.59   9.77   7.94   6.95   6.32   5.88   5.56   5.31   5.11   4.95   4.70   4.44   4.17   4.03   3.88   3.74   3.58   3.42   3.26
   22  14.38   9.61   7.80   6.81   6.19   5.76   5.44   5.19   4.99   4.83   4.58   4.33   4.06   3.92   3.78   3.63   3.48   3.32   3.15
   23  14.19   9.47   7.67   6.69   6.08   5.65   5.33   5.09   4.89   4.73   4.48   4.23   3.96   3.82   3.68   3.53   3.38   3.22   3.05
   24  14.03   9.34   7.55   6.59   5.98   5.55   5.23   4.99   4.80   4.64   4.39   4.14   3.87   3.74   3.59   3.45   3.29   3.14   2.97
   25  13.88   9.22   7.45   6.49   5.88   5.46   5.15   4.91   4.71   4.56   4.31   4.06   3.79   3.66   3.52   3.37   3.22   3.06   2.89
   26  13.74   9.12   7.36   6.41   5.80   5.38   5.07   4.83   4.64   4.48   4.24   3.99   3.72   3.59   3.44   3.30   3.15   2.99   2.82
   27  13.61   9.02   7.27   6.33   5.73   5.31   5.00   4.76   4.57   4.41   4.17   3.92   3.66   3.52   3.38   3.23   3.08   2.92   2.75
   28  13.50   8.93   7.19   6.25   5.66   5.24   4.93   4.69   4.50   4.35   4.11   3.86   3.60   3.46   3.32   3.18   3.02   2.86   2.69
   29  13.39   8.85   7.12   6.19   5.59   5.18   4.87   4.64   4.45   4.29   4.05   3.80   3.54   3.41   3.27   3.12   2.97   2.81   2.64
   30  13.29   8.77   7.05   6.12   5.53   5.12   4.82   4.58   4.39   4.24   4.00   3.75   3.49   3.36   3.22   3.07   2.92   2.76   2.59
   40  12.61   8.25   6.60   5.70   5.13   4.73   4.44   4.21   4.02   3.87   3.64   3.40   3.15   3.01   2.87   2.73   2.57   2.41   2.23
   60  11.97   7.76   6.17   5.31   4.76   4.37   4.09   3.87   3.69   3.54   3.31   3.08   2.83   2.69   2.55   2.41   2.25   2.08   1.89
  120  11.38   7.32   5.79   4.95   4.42   4.04   3.77   3.55   3.38   3.24   3.02   2.78   2.53   2.40   2.26   2.11   1.95   1.76   1.54
    ∞  10.83   6.91   5.42   4.62   4.10   3.74   3.47   3.27   3.10   2.96   2.74   2.51   2.27   2.13   1.99   1.84   1.66   1.45   1.00

* Multiply these entries by 100.

This 0.1% table is based on the following sources: Colcord & Deming (1935); Fisher & Yates (1953, Table V), used with the permission of the authors and of Messrs Oliver and Boyd; Norton (1952).

This table was reprinted from Biometrika Tables for Statisticians, Vol. 1, 3rd Edition, Table 18, with the permission of the Biometrika Trustees.
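The starred entries in the ν2 = 1 row of the 0.1% table are printed at one hundredth of their value. If the table is keyed into a program as text, a small helper can apply the footnote automatically. The sketch below is only an illustration; the function name `table_entry_value` is hypothetical and it assumes entries are typed with an ordinary decimal point.

```python
def table_entry_value(entry: str) -> float:
    """Convert a printed table entry to its numeric value.

    Entries ending in '*' follow the table footnote: multiply by 100.
    """
    text = entry.strip()
    if text.endswith("*"):
        return float(text[:-1]) * 100.0
    return float(text)


# First entries of the v2 = 1 and v2 = 2 rows of the 0.1% table.
print(table_entry_value("4053*"))  # starred entry, scaled by 100
print(table_entry_value("998.5"))  # unstarred entry, used as printed
```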
Percentage Points of the t-distribution

   Q =     0.4    0.25    0.05   0.025   0.005  0.0025  0.0005
  2Q =     0.8     0.5     0.1    0.05    0.01   0.005   0.001
     ν
     1   0.325   1.000   6.314  12.706  63.657  127.32  636.62
     2   0.289   0.816   2.920   4.303   9.925  14.089  31.598
     3   0.277   0.765   2.353   3.182   5.841   7.453  12.924
     4   0.271   0.741   2.132   2.776   4.604   5.598   8.610
     5   0.267   0.727   2.015   2.571   4.032   4.773   6.869
     6   0.265   0.718   1.943   2.447   3.707   4.317   5.959
     7   0.263   0.711   1.895   2.365   3.499   4.029   5.408
     8   0.262   0.706   1.860   2.306   3.355   3.833   5.041
     9   0.261   0.703   1.833   2.262   3.250   3.690   4.781
    10   0.260   0.700   1.812   2.228   3.169   3.581   4.587
    11   0.260   0.697   1.796   2.201   3.106   3.497   4.437
    12   0.259   0.695   1.782   2.179   3.055   3.428   4.318
    13   0.259   0.694   1.771   2.160   3.012   3.372   4.221
    14   0.258   0.692   1.761   2.145   2.977   3.326   4.140
    15   0.258   0.691   1.753   2.131   2.947   3.286   4.073
    16   0.258   0.690   1.746   2.120   2.921   3.252   4.015
    17   0.257   0.689   1.740   2.110   2.898   3.222   3.965
    18   0.257   0.688   1.734   2.101   2.878   3.197   3.922
    19   0.257   0.688   1.729   2.093   2.861   3.174   3.883
    20   0.257   0.687   1.725   2.086   2.845   3.153   3.850
    21   0.257   0.686   1.721   2.080   2.831   3.135   3.819
    22   0.256   0.686   1.717   2.074   2.819   3.119   3.792
    23   0.256   0.685   1.714   2.069   2.807   3.104   3.767
    24   0.256   0.685   1.711   2.064   2.797   3.091   3.745
    25   0.256   0.684   1.708   2.060   2.787   3.078   3.725
    26   0.256   0.684   1.706   2.056   2.779   3.067   3.707
    27   0.256   0.684   1.703   2.052   2.771   3.057   3.690
    28   0.256   0.683   1.701   2.048   2.763   3.047   3.674
    29   0.256   0.683   1.699   2.045   2.756   3.038   3.659
    30   0.256   0.683   1.697   2.042   2.750   3.030   3.646
    40   0.255   0.681   1.684   2.021   2.704   2.971   3.551
    60   0.254   0.679   1.671   2.000   2.660   2.915   3.460
   120   0.254   0.677   1.658   1.980   2.617   2.860   3.373
     ∞   0.253   0.674   1.645   1.960   2.576   2.807   3.291

$Q = 1 - P(t \mid \nu)$ is the upper-tail area of the distribution for $\nu$ degrees of freedom, appropriate for use in a single-tail test. For a two-tail test, $2Q$ must be used.

This table was reprinted from Biometrika Tables for Statisticians, Vol. 1, 3rd Edition, Table 12, with the permission of the Biometrika Trustees.

Percentage Points of the χ²-Distribution

 ν \ Q          0.995          0.990          0.975          0.950          0.900          0.750          0.500
     1  392704×10^-10   157088×10^-9   982069×10^-9   393214×10^-8      0.0157908      0.1015308       0.454936
     2      0.0100251      0.0201007      0.0506356       0.102587       0.210721       0.575364        1.38629
     3      0.0717218       0.114832       0.215795       0.351846       0.584374       1.212534        2.36597
     4       0.206989       0.297109       0.484419       0.710723       1.063623        1.92256        3.35669
     5       0.411742       0.554298       0.831212       1.145476        1.61031        2.67460        4.35146
     6       0.675727       0.872090        1.23734        1.63538        2.20413        3.45460        5.34812
     7       0.989256       1.239043        1.68987        2.16735        2.83311        4.25485        6.34581
     8        1.34441        1.64650        2.17973        2.73264        3.48954        5.07064        7.34412
     9        1.73493        2.08790        2.70039        3.32511        4.16816        5.89883        8.34283
    10        2.15586        2.55821        3.24697        3.94030        4.86518        6.73720        9.34182
    11        2.60322        3.05348        3.81575        4.57481        5.57778        7.58414        10.3410
    12        3.07382        3.57057        4.40379        5.22603        6.30380        8.43842        11.3403
    13        3.56503        4.10692        5.00875        5.89186        7.04150        9.29907        12.3398
    14        4.07467        4.66043        5.62873        6.57063        7.78953        10.1653        13.3393
    15        4.60092        5.22935        6.26214        7.26094        8.54676        11.0365        14.3389
    16        5.14221        5.81221        6.90766        7.96165        9.31224        11.9122        15.3385
    17        5.69722        6.40776        7.56419        8.67176        10.0852        12.7919        16.3382
    18        6.26480        7.01491        8.23075        9.39046        10.8649        13.6753        17.3379
    19        6.84397        7.63273        8.90652        10.1170        11.6509        14.5620        18.3377
    20        7.43384        8.26040        9.59078        10.8508        12.4426        15.4518        19.3374
    21        8.03365        8.89720       10.28293        11.5913        13.2396        16.3444        20.3372
    22        8.64272        9.54249        10.9823        12.3380        14.0415        17.2396        21.3370
    23        9.26043       10.19567        11.6886        13.0905        14.8480        18.1373        22.3369
    24        9.88623        10.8564        12.4012        13.8484        15.6587        19.0373        23.3367
    25        10.5197        11.5240        13.1197        14.6114        16.4734        19.9393        24.3366
    26        11.1602        12.1981        13.8439        15.3792        17.2919        20.8434        25.3365
    27        11.8076        12.8785        14.5734        16.1514        18.1139        21.7494        26.3363
    28        12.4613        13.5647        15.3079        16.9279        18.9392        22.6572        27.3362
    29        13.1211        14.2565        16.0471        17.7084        19.7677        23.5666        28.3361
    30        13.7867        14.9535        16.7908        18.4927        20.5992        24.4776        29.3360
    40        20.7065        22.1643        24.4330        26.5093        29.0505        33.6603        39.3353
    50        27.9907        29.7067        32.3574        34.7643        37.6886        42.9421        49.3349
    60        35.5345        37.4849        40.4817        43.1880        46.4589        52.2938        59.3347
    70        43.2752        45.4417        48.7576        51.7393        55.3289        61.6983        69.3345
    80        51.1719        53.5401        57.1532        60.3915        64.2778        71.1445        79.3343
    90        59.1963        61.7541        65.6466        69.1260        73.2911        80.6247        89.3342
   100        67.3276        70.0649        74.2219        77.9295        82.3581        90.1332        99.3341
     X        -2.5758        -2.3263        -1.9600        -1.6449        -1.2816        -0.6745         0.0000

$Q = Q(\chi^2 \mid \nu) = 1 - P(\chi^2 \mid \nu) = 2^{-\nu/2} \{\Gamma(\tfrac{1}{2}\nu)\}^{-1} \int_{\chi^2}^{\infty} e^{-x/2} x^{\nu/2 - 1} \, dx.$

Percentage Points of the χ²-Distribution (continued)

 ν \ Q     0.250     0.100     0.050     0.025     0.010     0.005     0.001
     1   1.32330   2.70554   3.84146   5.02389   6.63490   7.87944    10.828
     2   2.77259   4.60517   5.99146   7.37776   9.21034   10.5966    13.816
     3   4.10834   6.25139   7.81473   9.34840   11.3449   12.8382    16.266
     4   5.38527   7.77944   9.48773   11.1433   13.2767   14.8603    18.467
     5   6.62568   9.23636   11.0705   12.8325   15.0863   16.7496    20.515
     6   7.84080   10.6446   12.5916   14.4494   16.8119   18.5476    22.458
     7   9.03715   12.0170   14.0671   16.0128   18.4753   20.2777    24.322
     8   10.2189   13.3616   15.5073   17.5345   20.0902   21.9550    26.125
     9   11.3888   14.6837   16.9190   19.0228   21.6660   23.5894    27.877
    10   12.5489   15.9872   18.3070   20.4832   23.2093   25.1882    29.588
    11   13.7007   17.2750   19.6751   21.9200   24.7250   26.7568    31.264
    12   14.8454   18.5493   21.0261   23.3367   26.2170   28.2995    32.909
    13   15.9839   19.8119   22.3620   24.7356   27.6882   29.8195    34.528
    14   17.1169   21.0641   23.6848   26.1189   29.1412   31.3194    36.123
    15   18.2451   22.3071   24.9958   27.4884   30.5779   32.8013    37.697
    16   19.3689   23.5418   26.2962   28.8454   31.9999   34.2672    39.252
    17   20.4887   24.7690   27.5871   30.1910   33.4087   35.7185    40.790
    18   21.6049   25.9894   28.8693   31.5264   34.8053   37.1565    42.312
    19   22.7178   27.2036   30.1435   32.8523   36.1909   38.5823    43.820
    20   23.8277   28.4120   31.4104   34.1696   37.5662   39.9968    45.315
    21   24.9348   29.6151   32.6706   35.4789   38.9322   41.4011    46.797
    22   26.0393   30.8133   33.9244   36.7807   40.2894   42.7957    48.268
    23   27.1413   32.0069   35.1725   38.0756   41.6384   44.1813    49.728
    24   28.2412   33.1962   36.4150   39.3641   42.9798   45.5585    51.179
    25   29.3389   34.3816   37.6525   40.6465   44.3141   46.9279    52.618
    26   30.4346   35.5632   38.8851   41.9232   45.6417   48.2899    54.052
    27   31.5284   36.7412   40.1133   43.1945   46.9629   49.6449    55.476
    28   32.6205   37.9159   41.3371   44.4608   48.2782   50.9934    56.892
    29   33.7109   39.0875   42.5570   45.7223   49.5879   52.3356    58.301
    30   34.7997   40.2560   43.7730   46.9792   50.8922   53.6720    59.703
    40   45.6160   51.8051   55.7585   59.3417   63.6907   66.7660    73.402
    50   56.3336   63.1671   67.5048   71.4202   76.1539   79.4900    86.661
    60   66.9815   74.3970   79.0819   83.2977   88.3794   91.9517    99.607
    70   77.5767   85.5270   90.5312   95.0232   100.425   104.215   112.317
    80   88.1303   96.5782   101.879   106.629   112.329   116.321   124.839
    90   98.6499   107.565   113.145   118.136   124.116   128.299   137.208
   100   109.141   118.498   124.342   129.561   135.807   140.169   149.449
     X   +0.6745   +1.2816   +1.6449   +1.9600   +2.3263   +2.5758   +3.0902

For $\nu > 100$ take $\chi^2 = \nu \left\{ 1 - \dfrac{2}{9\nu} + X \sqrt{\dfrac{2}{9\nu}} \right\}^3$ or $\chi^2 = \tfrac{1}{2} \left\{ X + \sqrt{2\nu - 1} \right\}^2$, according to the degree of accuracy required. $X$ is the standardized normal deviate corresponding to $P = 1 - Q$, and is shown in the bottom line of the table.

This table was reprinted from Biometrika Tables for Statisticians, Vol. 1, 3rd Edition, Table 8, with the permission of the Biometrika Trustees.

Hewlett-Packard  Part No. 98820-13111  Printed in U.S.A.  E0782  First Edition: July 1982
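The note to the χ² table gives two approximations for ν > 100, both driven by the normal deviate X from the table's bottom line. The following Python sketch evaluates them and checks them against the last tabulated row (ν = 100, Q = 0.050). The function names are hypothetical, and ν = 100 is used only because it is the largest row available for comparison.

```python
import math


def chi2_cube_approx(nu: float, x: float) -> float:
    """chi^2 = nu * (1 - 2/(9 nu) + X * sqrt(2/(9 nu)))^3."""
    c = 2.0 / (9.0 * nu)
    return nu * (1.0 - c + x * math.sqrt(c)) ** 3


def chi2_sqrt_approx(nu: float, x: float) -> float:
    """chi^2 = (1/2) * (X + sqrt(2 nu - 1))^2."""
    return 0.5 * (x + math.sqrt(2.0 * nu - 1.0)) ** 2


NU = 100
X_AT_Q_050 = 1.6449    # bottom line of the table, Q = 0.050 column
TABULATED = 124.342    # table entry for nu = 100, Q = 0.050

for name, func in (("cube", chi2_cube_approx), ("square root", chi2_sqrt_approx)):
    value = func(NU, X_AT_Q_050)
    print(f"{name:12s} {value:9.3f}   difference from table {value - TABULATED:+.3f}")
```

At this row the cube form agrees with the tabulated value to within 0.01, while the square-root form is off by roughly 0.3, which is one way to read "according to the degree of accuracy required".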