Summer Research Fellowship Programme of India's Science Academies 2017
Study of pointing accuracy of 3.6m Devasthal Optical
Telescope
Mohan Agrawal
July 9, 2017
Contents
1 Introduction 2
1.1 Ways to point a telescope . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 About DOT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Corrections involved 4
3 Modeling Process 7
3.1 Kinematic model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Geometrical errors in kinematic model . . . . . . . . . . . . . . . . . . . 10
4 Model Fitting 12
4.1 Multiple Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.1.1 Criteria of estimate . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.1.2 The Gauss-Markov assumptions . . . . . . . . . . . . . . . . . . . 14
4.1.3 The Gauss-Markov Theorem and related properties . . . . . . . . 14
4.1.4 ANOVA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.1.5 Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.2 Regressing the pointing data . . . . . . . . . . . . . . . . . . . . . . . . . 18
5 Obtaining statistics 20
5.1 Residual analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5.2 Revisiting assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.3 Model validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
6 Practical considerations 29
6.1 Off axis imager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
6.2 Star selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.3 Sources of uncertainity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
7 Conclusion 34
8 Acknowledgements 34
Note: All the data, code and graphs used in the report can be found at
https://github.com/mohanagr/pointing-model
1
Abstract
Pointing accuracy is the first and foremost requirement for any astronomical observation.
With this report an effort has been made to study and document various ways of modeling
an alt-az telescope’s pointing accuracy, obatining statistical inferences from the fit models,
peering into the physical causes of the modeled errors and providing a comprehensive list
of shortcomings and limitations of the approach used. The study has been made using
the data obtained from 3.6m Devasthal Telescope. A kinematic model has been used to
model the pointing errors, vector form of least square method used to fit the model and
various statistical parameters obtained that allow one to asses the goodness of fit of the
model and provide solution to some common pointing difficulties that may arise during
an observation e.g. the imager being used is not aligned with the telescope’s rotator’s
axis. At the end several other sources/causes of error that may affect the accuracy, have
been summarized.
1 Introduction
With amateur telescopes when one wants to look at a bright object, say a planet, all that
needs to be done is, swing the telescope to the location where the planet is approximately
located and manually move it until it is focused in the eyepiece. But what if the target
object is faint? i.e. a deep sky object. Then the usual method is to use a finding-chart.
This approach (although still being used at some places) is highly inefficient simply
because of the sheer amount of wasted time.
Instead, large and professional telescopes have stepper motors & encoders (either
absolute or incremental) and have physical readouts. Here all an observer needs to do
is enter the desired location, usually as (RA, Dec), and the telescope (hopefully) points
to that direction. Although this approach points the telescope with minimum effort, it
is riddled with errors - the celestial target may or may not appear in the FOV (Field
Of View) and even if it does, it is certain that it will not appear in the center of the
FOV/crosshair. Taking the simplest of examples, it is possible that the encoders have
some zero-errors or the bearings have runout. The direction where one wants to look
is different from the direction the telescope’s looking. Therefore there arises a need
to introduce some correcting offset in the values that are to be input. Or still better,
generate a model that takes into account various errors that can be present (both static
and dynamic), takes the desired coordinates as input and gives the corrected coordinates
as output - the coordinates which should actually be entered in order for the image to be
centered in the FOV.
1.1 Ways to point a telescope
The term ’pointing’ itself refers to the initial acquisition of the target. Depending on the
type of target (bright/faint) and the accuracy demands (large observatories often need
sub-arcsecond pointing accuracies) there are different ways to point the telescopes.
1. Blind pointing : Also known as dead reckoning, this is the most basic way wherein
one enters the target coordinates and the telescope moves (usually to some different
coordinates since certain errors always present). This is generally used when the
accuracy required is not too demanding (several arc seconds is acceptable) e.g. in
case of wide-field imaging.
2
2. Offset pointing : When the pointing demands are higher than what the system is
capable of, a common way is to introduce a constant offset in the input values. This
offset can be calculated by first entering the actual coordinates of the (bright) object,
then centering the object in the FOV and noting down the resultant coordinates
shown by telescope readouts. The difference then gives the offset. Using this offset
for the rest of the observations often gives a decent performance with absolute
accuracy being less than 10
00
and often nears 1
00
2
00
if the observations are being
made only over a small patch of the sky.
3. Blind offset pointing : When the target object itself is faint, an offset is calculated
as above using a bright object in the vicinity of the target object.
4. Pointing models : This method involves constructing a mathematical model using
a set of calibration stars that can then predict correct encoder demands for any
future observations. Pointing models are used in all modern TCS (Telescope Control
Systems) and are capable of catering to sub-arcsecond pointing demands. The full-
blown pointing models (consisting of about 20 terms) generated for the 3.6m DOT
(Devasthal Optical Telescope) give a pointing accuracy of 0.5
00
0.6
00
.
This report is concerned with the development and usage of method no. 4. A pointing
model has been fit manually and the subsequent results are compared with that of the
results obtained from TPOINT software, to assess the performance of TPOINT’s model
and to calculate certain statistics that TPOINT does not provide. A python script
has also been written to compare various TPOINT models and generate bar plots to
enable the user to have a quick look at the performance of various models and choose
appropriately.
1.2 About DOT
The 3.6m Devasthal Optical Telescope (DOT) is an alt-azimuth, Ritchey Cr´etien type
telescope. The optical diagram is figure 1. The ’Main Port’ sits at FPA axial. There
are two FPA-side, holding the two ’Side Ports’. The pointing data used here has been
obtained from a CCD imager mounted at Main Port. Quick comparison can be done for
pointing models for different ports (although not much variation is expected) as shown
in figure 19 later in the report.
3
Figure 1: Optical diagram for the telescope featuring hyperbolic primary and secondary
mirrors. The focal length with the corrector is 32795.8mm. Image adapted from telescope
manuals.
2 Corrections involved
Before proceeding to build a model, it is necessary to know various corrections have to
be considered by a pointing model. These corrections can be broadly categorized into:
1. Corrections for apparent position : These include effects of gravitonal fields
of Sun and Moon on the earth - precession an nutation, proper motion of the
stars, annual parallax (significant for nearby stars), light deflection (because of
gravitational lensing by the Sun) and Earth’s rotation. These are the geometrical
corrections in the positions of the stars on the celestial sphere i.e. not for someone
on Earth.
2. Corrections for topocentric position : Topocentric means the positions which
will be observed by an observer on earth. These include correcting for diurnal
abberation and accounting for atmospheric refraction effect.
3. Telescope specific corrections : These corrections account for the imperfections
of the telescope (to be discussed in subsequent sections). Their aim is to mop up
any significant difference left between actual coordinates and the encoder demands
after the aforementioned corrections have been applied.
The corrections are shown in the following figure
[1]
.
4
Figure 2: The list of corrections that lead from the actual star coordiantes obtained from
a catalogue to the ’instrumental place’ - coordinates that will be actually input in the
telescope.
The apparent [α, δ] is obtained after applying celestial corrections. RA is then con-
verted to HA (Hour angle) using LST (Local Sidereal Time). Apparent [HA, Dec] then
gets corrected for earth specific effects including refraction and is called the Observed
Place [h, δ]. This is then converted to Horizontal Coordinate System i.e. [Az, El] repre-
senting Azimuth & Elevation. These alt-az coordinates then undergo corrections by the
telescope’s pointing model (e.g. the one generated by TPOINT) and the final result is
used to orient the telescope, called the Instrument Place
[2]
.
Some definitions consider ’apparent place/position’ to be the one which is observed
from earth, here that is called ’observed place’. ’Apparent position’ is the one which is
obtained after considering the effect of proper motion - from catalogue epoch to present
date; effects of ’general precession’ which is the well-known precession of Earth’s pole
around the pole of the ecliptic with a 26, 000 year period (consequently the vernal equinox
moves around the celestial equator, advancing by 50.25
00
/year) which produces an effect
on star positions of upto 50
00
(also being the reason as to why most catalogues specify
an epoch); and the nutation effects (residual wobbles, generally because of moon) which
affect the pointing by at most 10
00
. These effects combined, give us the ’apparent place’.
Effects such as light deflection (< 2
00
), Diurnal abberration(< 0.3
00
) and parallax (< 1
00
5
for nearest of the stars) can be neglected. Further most TCS’s allow the user to enter
effects such as parallax etc., if user does want them to be considered.
While calculating observed place, the most significant contribution is by that of re-
fraction. As shown below, continuous variation in the refractive index of the air raises
the apparent position of the star.
Figure 3: The effect of refraction on pointing
Usually, this effect scales with cot ζ (where ζ is the distance to zenith, 90
Elevation)
and can be as large as 1
0
for ζ = 45
. This effect raises the z coordiante of the aim vector
(position vector of the star in the sky) and the change is given by
[3]
:
z = K/z
where K(P, T ) 2.77 ×10
7
P × (1.0419 0.00380T )
P is the pressure at the observatory in mbar and T is the ambient temperature in
C.
For DOT this calculation is done by the TCS.
Most catalogues give what is called here the ’apparent place’ which is then input into
the TCS and steps 2 & 3 are computed; even if apparent place is not available, the TCS
will give user the option to specify the epoch of the observation, parallax and proper
motion and then the TCS will compute all 3 steps.
This study is concerned specifically with step 3. The data that is used to fit the
observations looks as shown in the following figure:
Figure 4: Format of data used to fit pointing models. The two leftmost values represent
Apparent (Az, Alt) and the two rightmost represent Instrument (Az, Alt), the one shown
by encoders
6
It is important to note that the difference of the two sets of values shown above is
assumed to have come solely from telescope’s inefficacies; which is to be corrected by step
3. In the sections that follow, this step will be discussed in detail, as to how these errors
are modeled. For calculating effect of precession, nutation, aberration etc. explicitly one
may refer [2].
3 Modeling Process
The pointing error is represented as a pair (∆Az, E). In order to build a model, we
need for formulate those errors i.e. we need to be able to represent the errors as:
A = f(A, E) + η (1)
E = g(A, E) + ξ (2)
η and ξ represent inherent irreducible random error in the model.
The goal of modeling is to generate a deterministic model that relates telescope mount
axis positions to the pointing error. Although no random variables are involved in the
above representation, the model is not truly deterministic. Some stochastic variables are
either excluded or too difficult to integrate into the model. They contribute as a random
error e.g. wind effects, local thermal variations etc. It is the choice of f and g that
differentiates the two common approaches.
One approach is to fit a spherical harmonic function model for both the errors. Zonal
harmonics are taken upto 4
th
order and the tesseral terms only to the first order. The
function may look something like:
f(A, E) = a
0
+ a
1
sin A + a
2
cos A cos E + a
3
sin A cos E + a
4
sin
2
E
+ a
5
cos A sin E cos E + a
6
sin A sin E cos E + a
7
sin
3
E...
It has been shown
[4]
that such a model lacks stability, often has high correlation
between the terms and performs badly when extrapolated. Further from the point of
view of a non-statiscian it provides absolutely no physical insights i.e. what caused those
errors or relating the errors with physical misalignment/errors of the telescope.
Other, more popular approach is to use a kinematic model. In such a model we derive
expressions for different types of geometrical errors, misalignments that might actually
be present in the telescope and their effect on pointing i.e. contribution to (∆A, E).
3.1 Kinematic model
A kinematic model incorporates geometrical errors that can be induced in a telescope
because of mechanical misalignments, flexures and low order bearing effects. Former two
are by far the dominating effects. There is nothing emprical in such a model, every error
has a physical basis, which is why as will be seen later, every term in the model comes
out to be highly significant. (very low p-values in hypothesis testing).
A telescope employs a two degree-of-freedom motion system (rotations about Az and
El axes) which is why the basic kinematic model also consists of two rotation motions
only. A number of errors are possible with such a system.
7
(a) Non perpendicularity
(b) Collimation error
Figure 5: Figure representing different geometrical errors. Blue vector in (a) represents
original axis and green shows the tilted one. Red vector shows the originally expected
pointing direction and the brown one shows the displaced direction. (Image adapted from
wildacard-innovations.com)
The Azimuth and Elevation axes might not be perpendicular.
There might be collimation error. The optical path might not be parallel to tele-
scope’s boresight e.g. the optical tube is not perpendicular to the elevation axis.
The telescope pier might be tilted (NS or EW).
Telescope tube might droop down because of gravitational loading. (flexure)
In a kinematic model, the errors each of the above mentioned factors introduce in
(∆A, E) can be calculated using 3-D geometry, which then form the required f(A, E)
and g(A, E).
There are various ways to represent a kinematic model. Here we will use a rotation-
only derivative of D-H (Denavit-Hartenberg) representation. This is because translational
errors are hardly present in a telescope & even if they are, they hardly affect pointing of
the telescope. It is the rotational alignment which is error prone and has an on-sky effect.
For all practical purposes the D-H convention is equivalent to using Euler angles or more
8
specifically Tait-Bryan angles for rotation. They are used to represent the orientation
of a rigid body with respect to a fixed coordinate system. The rotations in Tait-Bryan
system are about x-y-z axes. These rotations can be of two type:
1. Intrinsic : About XYZ axes which is attached to the rigid body (i.e. it’s orienta-
tion changes w.r.t. ground).
2. Extrinsic : About xyz axes which remains fixed (to the ground).
Here we will use extrinsic rotations. It is worthwhile to note that the XYZ axes
represent telescope’s mount axes (ones which will change orientation e.g. when the mount
is tilted). For our purposes the rigid body talked about earlier will be the telescope’s
boresight or the Optical Tube Assembly (OTA) (unless specified). Rotation about xyz
axes are represented as follows:
R
x
=
1 0 0
0 cos x sin x
0 sin x cos x
1 0 0
0 1 x
0 x 1
(3)
R
y
=
cos y 0 sin y
0 1 0
sin y 0 cos y
1 0 y
0 1 0
y 0 1
(4)
R
z
=
cos z sin z 0
sin z cos z 0
0 0 1
1 z 0
z 1 0
0 0 1
(5)
For R
y
the signs have been reversed compared to the other two because the rotations
are considered positive in the anti-clock direction when looking towards the axis. As
shown in figure 6, image b. θ for y axis implies that the rotation is into the page. As
per Euler’s rotation theorem, any orientation of a rigid body in space can be achieved by
3 elemental rotations about these axes.
Figure 6: Principal axes rotations
9
For an ideal telescope when it’s elevation and azimuth angles are set to zero it should
point at Northern horizon, which in our case represents the pointing vector
0 1 0
T
.
It is necessary to remember that all rotations are extrinsic (about xyz, attached to
ground). Now when the telescope is moved to a position (A, E), it is given two rotations
using R
z
and R
y
such that:
R
A
=
cos A sin A 0
sin A cos A 0
0 0 1
z = A has been substituted in R
z
since the rotation is clockwise (towards east or
x-axis). And for rotation around elevation axis, x = E in R
x
.
R
E
=
1 0 0
0 cos E sin E
0 sin E cos E
Because rotating the telescope about azimuth changes the orientation of elevation
axis, R
E
should be applied before R
A
. Hence the total rotation matrix is:
R
S
= R
A
R
B
=
cos A sin A cos E sin A sin E
sin A cos A cos E cos A sin E
0 sin E cos E
(6)
and produces the ideal, undeflected line of sight vector P
i
.
P
i
= R
S
P
0
=
sin A cos E
cos A cos E
sin E
(7)
From any pointing vector P, azimuth (A) and elevation angles can be found out by:
A = tan
1
(
P 1
P 2
) (8)
and
E = sin
1
(P 3) (9)
3.2 Geometrical errors in kinematic model
Each of the error discussed in previous section produces a small rotation of the line of
sight vector. The amount by which it changes (A, E) of the line of sight can be calculated.
The geometrical errors which have been used in this model have been summarized in table
1. As an example, derivation of the non-perpendicularity error has been shown.
To avoid any confusion it is necessary to understand that while visualizing these ro-
tations, one must keep the physical effects that cause these errors (as seen by someone
standing on the ground) separate from the way the final position is achieved mathemat-
ically. E.g. consider the non perpendicularity error. E.g. it is obvious for an observer
to see that the motions of Azimuth and Elevation axes will influence each other. When
the telescope tube is raised to some elevation, it will not rise straight up instead it will
rise at an angle, as shown in figure 5(a). In order to get to this position mathematically,
we first rotate the vector by desired amount (the elevation input) around the x axis and
10
Effect Symbol Matrix Extrinsic rotation axis
Optic axis misalignment CA R
M
z
Tube Flexure TF R
F
x
Elevation setting E R
E
x
Az-El non perpendicularity NPAE R
N
z
Azimuth setting A R
A
z
Mount tilt(NS) AN R
X
x
Mount tilt(EW) AW R
Y
y
Table 1: Summary of geometrical error effects
then rotate it about y axis by the amount of non-perpendicularity (NPAE ) to get the
desired tilt effect. The final orientation of tube will be exactly the same as one would
visualize when a single rotation is provided to an imperfect telescope. The xyz axes do
not represent the telescope.
On a telescope that suffers from NPAE, the final telescope vector can be given as:
P
N
= R
A
R
N
R
E
P
0
=
NP AE sin A cos E + cos E sin A
NP AE sin E sin A + cos A cos E
sin E
(10)
Using eqn. 6 and 7:
A
0
=
n sin E cos A + cos E sin A
n sin E sin A + cos E cos A
(11)
E
0
= E (12)
Although the NP error will influence the vertical as well as azimuthal orientation,
given the fact that the errors are usually very small in magnitude, R
N
has been taken as
given below.
R
N
=
1 0 NP AE
0 1 0
NP AE 0 1
(13)
This is the reason why the effect of NP over elevation can be neglected and E
0
comes
out to be independent of E in eqn 12.
Using the difference formula for tangent
tan(A
0
A) =
tan A
0
tan A
1 + tan A
0
tan A
(14)
and again taking the assumption that the error in Azimuth is small, letting
tan(A
0
A) A
0
A (15)
On substituting eqn. 11 in 14 (using 15) and simplyfying, we obtain the result:
A = NP AE tan E (16)
Continuting the previous discussion, similar is the case with NS and EW mount tilt
errors. Although it is obvious for someone on the ground that the mount tilt will effect
11
Effect ∆A ∆E
A encoder index error IA 0
E encoder index error 0 IE
Optic axis misalignment CA sec E 0
Tube Flexure 0 T F cot E
Az-El non perpendicularity NP AE tan E 0
Mount tilt(NS) AN sin A tan E AN cos A
Mount tilt(EW) AW cos A tan E AW sin A
Table 2: Error terms
both the telescope’s mount axis and the line of sight (tilting the telescope plane), we are
not considered with the telescope’s plane. Our only concern is what happens to the line
of sight because of the error. Hence we rotate the LOS vector by the respective error
amounts around x and y axis to emulate the effect of mount tilt. A natural question
then arises, what if some other rotation is given after mount tilt, now that in reality, the
telescope’s alt and az axes have tilted? The answer is, although we rotate the LOS vector
about the fixed extrinsic axes, we also keep in mind, what physical changes the actual
errors are making in reality. So, no rotation is given about an axis has previously been
transformed. E.g. Elevation setting is done before including NPAE since NPAE tilts the
elevation axis. The only rotations a telescope is capable of having is around Alt and Az
axes. They have already been given before the tilt effect is included.
The gist of this discussion is, there is a specific order to the rotations of the line of
sight vector if the effect of all the errors are to be considered.
The final telescope vector is given as:
P
T
= R
Y
R
X
R
A
R
N
R
E
R
F
R
M
P
0
(17)
If all the error amounts listed in table 1 were known one could in principle calculate
the actual pointing direction using eqn. 17. Since they are not, a common strategy is to
evaluate the effect of each error on altitude and azimuth (as was done for NPAE, eqn.
16), combine all the effects and then do some sort of statistical fit using available pointing
data (figure 4) to determine the error estimates. These effects are listed in table 2.
We finally obtain the relations talked about in eqn. 1 and 2.
d
A = f(A, E) = IACA sec E+NP AE tan E +AN sin A tan E +AW cos A tan E (18)
and
d
E = g(A, E) = IE T F cot E + AN cos A + AW sin A (19)
d
A and
d
E has been used instead of A and E because here we have excluded the
random error terms η and ξ. The ’hat’ symbol represents that the functions are our
estimates of the actual error. More on this will follow in subsequent sections.
4 Model Fitting
A derivative of multiple regression model was fit to the pointing data using least-squares
minimization technique. Before getting to the results obtained after the fitting, it is
12
necessary to examine the form of the model, the assumptions involved and the er-
rors/limitations of the approach.
4.1 Multiple Regression
In a multiple regression model we have the following components:
X being an n ×k matrix. n represents the number of observations and k represents
the number of independent variables for each of the n observation. E.g. sec E,
tan E etc. in our case for each of the observation in the dataset. (Accurate example
to follow later).
y being a n ×1 vector of observations on the dependent variable (n different obser-
vations of A or E in our case).
being a n × 1 vector of random disturbances (which are inherent to the model,
similar to η or ξ).
β being a k ×1 vector of unknown population parameters that we want to estimate
(e.g. NP AE or CA).
This is can be represented as:
y = Xβ + (20)
This is assumed to be an accurate description of the properties of the popuation
(all the stars in the sky). Our goal is to obtain estimates of β, the population
parameter vector.
4.1.1 Criteria of estimate
Our estimates for the population parameter β are given by
ˆ
β or b. Now when we predict
the output ’y’ using the same values that were used to fit the model, we get an estimate
of the observations of the dependent variable - ˆy i.e.
ˆy = Xb (21)
We want to minimize the difference between ˆy and y. There are many notions of such
least-distance. One of them is to minimize the summation of the square of errors known
as RSS (Residual Sum of Squares) given by
P
n
i=1
e
i
, where e
i
is simply y
i
ˆy
i
.
In matrix form:
RSS = e
T
· e = (y Xb)
T
· (y Xb)
= y
0
y 2b
0
X
0
y + b
0
X
0
Xb
(22)
Using matrix differentiation, differentiating above expression w.r.t. b and equating it
to 0, we get our estimates for β.
b = (X
0
X)
1
X
0
y (23)
13
4.1.2 The Gauss-Markov assumptions
These are the assumptions involved. These assumptions are needed to justify a number
of properties of least-squared estimators which make them such a popular choice. These
properties are listed in section 4.1.3.
It is important for one to not confuse e (residual) and (inherent random disturbance)
and not use the results about one for the other.
1. The zero conditional-mean assumption which states that the random disturbances
average out to 0.
E(|X) = 0 (24)
This also implies from eqn. 20 that:
E(y) = Xβ (25)
2. Homoskedasticity of the disturbances and no autocorrelation.
E( ·
T
|X) = σ
2
I (26)
·
T
simply gives the variance-covariance matrix of the disturbances. Homoskedas-
ticity implies that the disturbances have constant variance. No autocorrelation
implies that the disturbances in each of sample’s observation have no correlation
with each other. Hence the variance-covariance matrix is a diagonal matrix with
constant entries.
This assumption is important when one wants to calculate confidence intervals for
the predicted values.
3. Linearity between X and y which enables us to write eqn. 20. One has badly be-
haved residuals (ideally they should be randomly distributed) when this assumption
is violated. The cure then is to transform either LHS or RHS of the equation, say
replace y by log y or
y
4. Normal distribution of the disturbances.
|X N(0, σ
2
) (27)
This assumption is taken to make hypothesis testing easier (obtaining p-value using
t-distribution) and calculating confidence intervals for parameter estimates. It can
be justified on the basis of Central Limit Theorem.
4.1.3 The Gauss-Markov Theorem and related properties
This theorem states that the OLS (oridnary least squared) estimator is the Best Linear,
Unbiased, Effiecient estimator. This is why it is used for almost every fitting purposes. It
is important to state this theorem and it’s properties because it is very easy in statistics
to mistake one formula for the other or use two terms interchangibly when they actually
mean different things.
14
1. b is an unbiased estimator of β. That is, using eqn. 20 and 23
b = β + (X
0
X)
1
X
0
(28)
E(b) = E(β) + (X
0
X)
1
E(X
0
)
= β
(29)
since E(X
0
) = 0
This tells us that mean of estimated value of β approaches the true value over
several samples taken from the population.
2. b is a linear function of disturbances as shown by equation 28.
3. b has minimum variance among all linear, unbiased estimators.
4.1.4 ANOVA
ANOVA refers to ’Analysis of Variances’. When a model is fit to a data, the total
variability in the data can be divided in two parts: the variance which is explained by
the fitted model and the variance which is left unexplained.
Figure 7: Variability in the data
SSM =
n
X
i=1
( ˆy
i
¯y) (30)
SSE =
n
X
i=1
( ˆy
i
y
i
) (31)
SST = SSM + SSE (32)
15
DF M (Degrees of freedom for the model) = p 1 (33)
DF E (Degrees of freedom for error) = n p (34)
DF T (Total degrees of freedom) = n 1 (35)
Hence,
MSM (Explained variance) = SSM/DF M (36)
MSE (Unexplained variance) = SSE/DF E (37)
This concept is showed in figure 7.
This is particularly important for this enables us to calculate the F-statistic or perform
a F-test, which is used to check the following null hypothesis which checks whether or
not the y values bear any significance to the terms on RHS (feasibiity of the fit).
H
0
: β
1
= β
2
= . . . β
j
= 0
H
a
: β
j
6= 0 for at least one j
F-statistic is calculated using MSM/M SE. Generally a high value of F-statistic
indicates that null hypothesis should be rejected (more specifically, one has to calculate
a (at least) 95% confidence interval using F-distribution and then check if F-statistic lies
in the interval or not. If not, H
0
is rejected).
In our case of regressing pointing data, a ridiculously high F-statistic is obtained
(> 10
4
). It is expected because the terms which we are using to relate the errors are
actual, physical, geometrical errors. Such a high value shows that there is absolutely
nothing empirical in the model.
The goodness of fit of the model is calculated by R
2
statistic which is nothing by
the ratio of SSM/SST i.e. the fraction of total variance explained by the model. It is
worthwhile noting here that, when more terms are added to the model, irrespective of
whether or not they bear any significance, R
2
will increase. This is why many statisticians
prefer a statistic called adjusted R
2
which penalizes the model for adding more terms and
is given as
¯
R
2
= 1 (1 R
2
)
n1
np
.
Although R
2
can be calculated in our case, it too is very high (> 0.99) and bears less
significance, more on which will be clear in next section.
By far the most important parameter of ANOVA is MSE. It is used to estimate the
variance of random disturbances in the model
[5]
- σ
2
.
s
2
=
b
σ
2
= MSE =
RSS
n p
(38)
MSE or RMSE (
MSE) is one absolute fit of criterion. If any changes to the model
decrease the RMSE, those changes are desirable, whether it is adding/removing terms or
transforming the values. RMSE for the fit model in this study was around 0.4 (better
than TPOINT estimate) arcseconds (although the way it was obtained differs a little).
The variance-covariance matrix of the residuals can also be found out using MSE.
Following proof (39 through 41) which is often skipped in a lot of literature, reiterates
the importance of aforementioned assumptions.
16
e = y ˆy = Xβ + Xb
= Xβ + X(X
0
X)
1
X
0
y
= Xβ + X(X
0
X)
1
X
0
(Xβ + )
= (I H)
(39)
Matrix X(X
0
X)
1
X
0
is called the hat-maker matrix since it puts a hat on whatever it
operates on e.g. ˆy = Hy. It’s important property is that it’s idempotent and symmetric
i.e. H · H
T
= H.
Now,
V ar(e) = e · e
T
= (I H)(I H)
T
·
T
= (I H) ·
T
= (I H)σ
2
(Using assumption 4)
(40)
Since by eqn. 39, s
2
is an estimate of σ
2
,
Est. V ar(e) = s
2
(I H) (41)
If our assumption is perfectly valid (which is not in most cases) eqn. 41 is a pretty
good estimate, but one will see that explicitly calculating LHS of eqn. 40 and RHS of
eqn. 41 will not yield the same result. This check has been performed in the study and
hence mentioned here.
Lastly, we are also interested in checking the variances of our parameter estimates i.e.
b.
V ar(b|X) = E[(b E(b|X))(b E(b|X))
T
|X] (42)
= E[(b β)(b β)
T
|X] (43)
= σ
2
(X
0
X)
1
(44)
The conditional variance tells us, how much variance is left if we use E(Y |X) to predict
Y.
SE (Standard Error) = Est. V ar(b|X) = s
2
(X
0
X)
1
(45)
There is often a confusion between standard error and standard deviation. b is not
only an estimate of β but it is also a random variable itself, since it depends on the choice
of the sample X from the population. This is why we have used ’conditoned’ mean and
’conditioned’ variance everywhere. b(X) is an estimate (of β, as function of X) and b is a
random variable. So, the standard error of an estimate (b(x)) equals standard deviation
of the random variable (b). Standard error simply shows how far away your estimate is
from the conditoned expected value (mean of b over all possible X’s), which here, is also
the true value of b (β) as given by eqn. 29.
17
4.1.5 Hypothesis Testing
An example of hypothesis testing was shown in the previous section when F-test was
done. It is natural to question what if one wants to test the significance of one particular
coefficient in b. Using the method described here one can assess the significance of a
coefficient and calculate a confidence interval for it.
From the discussion in previous section (esp. eqn. 28, 29 & 44) i.e. if we are willing
to assume that the disturbances() are normally distibuted, we can write the following:
b = N(β, σ
2
(X
0
X)
1
) (46)
We state the following null-hypothesis:
H
0
: ˆy doesn
0
t depend on β
i
i.e. β
i
= 0
H
a
: β
i
6= 0
In order to check the significance of β
i
we look at it’s estimate b
i
. We want to determine
if b
i
is far enough from 0 so that we can be confident that it is not zero. How far is far
enough? In practice, we calculate a t-statistic given by:
t =
b
i
SE(b
i
)
(47)
where SE(b
i
) is given by diagonal element (i,i) of eqn. 45. It shows the no. of standard
deviations that b
i
is far away from 0. If it really is independent eqn. 47 will have a
t-distribution with n p degrees of freedom. Using this distribution we calculate the
probability of of b
i
|t| which is called as the p-value. This is essentially asking "what
are the chances of obtaining an even more extreme value of b
i
when it is assumed to be
0?”. A small p-value indicates a relation between response and predictor variables.
Using the concept of SE and p-values we can calculate 95% confidence interval for b
values. which is given as:
(1 α)% confidence interval for b
i
= b
i
± SE
b
i
× t
α
2
,np
(48)
if α = 0.5 it tells us the interval in which we have 95% probability of finding the true
value of b
i
, it simply gives us an idea of how far away b
i
is from β
i
.
4.2 Regressing the pointing data
The form of multile regression discussed earlier (eqn. 29) can not be used directly to
fit a model to pointing data. It would be ideal if we wanted to minimize the sum of
squared error in either of eqn. 18 or 19 only i.e. (f
ˆ
f)
2
or (g ˆg)
2
but the coordinates
represent a vector and we want to minimze the error distance between two points (true
and estimated) in the sky. The distance between two nearby points in the sky is given
as:
D =
p
(∆A cos E)
2
+ (∆E)
2
(49)
The above formula can be obtained fairly simply from spherical trigonometry. It is (∆D)
2
that we want to minimize. Further as shown in table 2, there are some coefficients that
are common for both A and E. Running two different models for them does not allow
for a common coefficients.
18
The pointing data has the true star coordinates and the coordinates shown by the
encoder. For constructing a regression model we will calculate the difference between
those two values, (∆A, E). This represents the y. Using the formulations developed in
eqn. 18 & 19, we will estimate those true errors as (
d
A,
d
E) using regression method
discussed below. This represents our ˆy. It is then trivial to see that the pointing errors
as in eqn. 49 is the error in model error estimates - ∆(
d
A). For brevity we shall adopt
to using f and g instead (as shown in eqn. 29).
In order to get across this difficulty, we will develop a model that looks like the
following
[6]
:
X
A
=
"
(1)
i
(0)
i
(sec E)
i
(0)
i
. . . (cos A tan E)
i
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
#
n×p
X
E
=
"
(0)
i
(1)
i
(0)
i
(cot E)
i
. . . (sin A)
i
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
#
n×p
(50)
i represents the number of observation, going from 1 to n
β
p×1
=
IA IE CA T F NP AE AN AW
T
(51)
f = X
A
β + η
g = X
E
β + ξ
(52)
It is worthwhile to note that each of X
A
β and X
E
β fit our usual notion of a model (with
η and ξ representing the familiar ). The term in β which is not present in a particular X
has the column corresponding to it, all zeroes. E.g. column 3 in X
E
is 0 since CA does
not affect E and column 4 corresponding to X
A
is 0 since TF does not affect A (please
refer table 2). It is also noteworthy that it is not necessary for the corrections in altitude
and azimuth to share the same coefficient (e.g. they can be different AN
az
and AN
alt
)
and depends on the type of telescope
[7]
. While constructing two separate models, it was
seen that the two diifferent coefficients have nearly the same value (varying only by few
milli arcseconds). Hence it is a same assumption to assume same coefficients for both the
correction in single model approach.
Since we do not want to run two separate models, in order to capture the vector-like
notion and minimize the squared error as shown in eq. 49, we formulate the model as:
f
g
2n×1
=
X
A
X
E
2n×p
β
p×1
+
η
ξ
2n×1
(53)
Both the models have been stacked on top on each other, giving us the familiar form of:
y = Xβ +
The only difference being that the dimensions have changed from n to 2n. Replacing that
in equations mentioned in section 4.1 will give us the correct estimates (for MSE etc.).
19
One can easily verify that sum of squared error is as expected by eqn. 49:
RSS = e
T
· e = (y ˆy)
T
· (y ˆy)
=
(f
ˆ
f) (g ˆg)
·
(f
ˆ
f)
(g ˆg)
= (∆f)
2
+ (∆g)
2
(54)
The only thing that seems off is the missing cos E term. That can be accounted for if eqn.
2 is multiplied by cos E on both sides (replace f by f cos E). That will make absolutely
no difference in the b values.
Replacing X in eqn. 23 using eqn. 53, we get the estimated value of β:
b = (X
T
A
X
A
+ X
T
E
X
E
)
1
(X
T
A
f + X
T
E
g) (55)
The dimensions of entities in equation 20. has changed, we have done nothing new
here. All the other estimates have been calculated using the methods described in previous
section. E.g. MSE is now given as:
MSE =
RSS
2n p
(56)
and residual pointing error is given as:
P =
r
(∆f cos E)
2
+ (∆E)
2
n
=
r
2n p
n
MSE (57)
5 Obtaining statistics
All calculations were done using ipython and Jupyter notebooks, utilising standard plot-
ting, machine learning and numerical computation libraries - scikit-learn, numpy, scipy,
matplotlib, pandas. Such models are usually fit using TPOINT but manual statistical
analysis was done because TPOINT:
does not give any information about the distribution of estimated parameters (con-
fidence intervals etc).
does not do any kind of cross-validtion to estimate MSE for the population (all-sky).
may (and does) miss some significant terms.
gives no idea about underfitting/overfitting.
does not check for normality of residuals (and other assumptions).
Initially two separate models were also fit since the correlation between f and g was
nearly negligible (< 0.02). Based on the assumption that minimizing f and g would
minimize their squared sum. Although this approach produced comparable results, it is
not technically accurate since Corr(X, Y ) only measures the linear dependence between
two vectors. If two sets of values are independent then they have Corr(X, Y ) = 0 but
vice versa is usually not true. Hence we shall stick to the model developed above.
On fitting only the 7 geometrical terms statistics were obtained as shown in fig.
8. RMSE is about 27 arcsecond and sky pointing accuracy is about 36 arcseconds.
20