Summer Research Fellowship Programme of India's Science Academies 2017
Unsupervised Anomalous
Trajectory Detection for
Crowded Scenes
Final Technical Report
Submitted in fulfilment of:
IAS-INSA-NASI Summer Research Fellowship Program
2017
Research work conducted at the:
Virtual Reality Laboratory
Indian Institute of Space Science and Technology,
Thiruvananthapuram
Author:
Deepan DAS
Indian Institute of
Engineering Science and
Technology, Shibpur
Appl. No.: ENGS3935
Supervisor:
Dr. Deepak MISHRA
Indian Institute of Space
Science and Technology,
Thiruvananthapuram
May-July, 2017
Contents
1 Introduction 7
1.1 Crowded Scene . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.1 General characteristics . . . . . . . . . . . . . . . . . . . . 7
1.2 The need for Anomaly Detection . . . . . . . . . . . . . . . . . . 8
1.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2 Trajectory Extraction 12
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.1 Representation . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3 Trajectory Representation 14
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 Shape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.3 Location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.4 Density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.5 Standard Deviation . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.6 Spatio-temporal Density . . . . . . . . . . . . . . . . . . . . . . . 16
4 Clustering 17
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.2 Mean Shift . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.2.1 Non-Parametric Density Estimation . . . . . . . . . . . . 18
4.2.2 Kernel Density Estimation . . . . . . . . . . . . . . . . . 19
4.2.3 Density Gradient Ascent . . . . . . . . . . . . . . . . . . . 20
5 Anomalous Trajectory Detection 22
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.2 Anomaly Detection . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.2.1 Shannon Entropy . . . . . . . . . . . . . . . . . . . . . . . 23
5.2.2 Proposed Approach . . . . . . . . . . . . . . . . . . . . . 23
6 Results 25
6.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6.1.1 The UCF Database . . . . . . . . . . . . . . . . . . . . . 25
6.1.2 RPI Database . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.2.1 Crowded Subway Exit Sequence . . . . . . . . . . . . . . 27
6.2.2 Crowded Intersection Sequence . . . . . . . . . . . . . . . 30
6.2.3 Pilgrim Sequence . . . . . . . . . . . . . . . . . . . . . . . 33
6.2.4 Marathon Sequence . . . . . . . . . . . . . . . . . . . . . 33
6.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
7 Summary 36
8 Conclusion 37
8.1 Novelty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
8.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
8.3 Further Improvements . . . . . . . . . . . . . . . . . . . . . . . . 38
List of Figures
1.1 Discriminative Local and Global Anomalies . . . . . . . . . . . . 8
1.2 General Overview of the Work . . . . . . . . . . . . . . . . . . . 11
2.1 Tracking the centroids in each box for the Station Video . . . . . 13
2.2 Extracted Trajectories for the Marathon video . . . . . . . . . . 13
6.1 Crowded Subway Exit scene and extracted trajectories . . . . . . 27
6.2 Density Based Clustering and Anomalies - Sequence I . . . . . . 27
6.3 Shape Based Clustering and Anomalies - Sequence I . . . . . . . 28
6.4 Position Based Clustering and Anomalies - Sequence I . . . . . . 28
6.5 Standard Deviation Based Clustering and Anomalies - Sequence I 29
6.6 Overall Anomalies and Scenario Comparison - Sequence I . . . . 29
6.7 Crowded Intersection scene and extracted trajectories . . . . . . 30
6.8 Density Based Clustering and Anomalies - Sequence II . . . . . . 30
6.9 Shape Based Clustering and Anomalies - Sequence II . . . . . . . 31
6.10 Position Based Clustering and Anomalies - Sequence II . . . . . . 31
6.11 Standard Deviation Based Clustering and Anomalies - Sequence II 32
6.12 Overall Anomalies and Scenario Comparison - Sequence II . . . . 32
6.13 Overall Anomalies and Scenario Comparison - Sequence III . . . 33
6.14 Overall Anomalies and Scenario Comparison - Sequence IV . . . 33
List of Tables
6.1 Comparative Results . . . . . . . . . . . . . . . . . . . . . . . . . 34
Declaration
This Report is a presentation of my original research work. Wherever contri-
butions of others are involved, every effort is made to indicate this clearly, with
due reference to the literature, and acknowledgement of collaborative research
and discussions. The work was done under the guidance of Dr. Deepak Mishra,
at the Virtual Reality Laboratory, Indian Institute of Space Science and Tech-
nology, Thiruvananthapuram.
Deepan Das
IIEST Shibpur
Appl. No.: ENGS3935
In my capacity as supervisor of the candidate’s report, I certify that the above
statements are true to the best of my knowledge.
Dr. Deepak Mishra
Associate Professor
IIST Thiruvananthapuram
email:deepak.mishra@iist.ac.in
Acknowledgement
I would like to express my deepest appreciation to the Indian Academy of Sci-
ences, the National Academy of Sciences in India and the Indian National Sci-
ence Academy for having offered me the Summer Research Fellowship and an
opportunity to conduct research work under the guidance of Dr. Deepak Mishra
at the Indian Institute of Space Science and Technology, Thiruvananthapuram.
Furthermore, I would like to acknowledge with much appreciation the crucial
role of Dr. Deepak Mishra, who guided me throughout the entire duration of
the project with immaculate academic suggestions and granted me permission
to use all the required equipment and materials in the Virtual Reality
Laboratory at IIST Thiruvananthapuram to complete the task, Unsupervised
Anomalous Trajectory Detection in Crowded Scenes. I also appreciate the
guidance of Dr. Gorthi Subramaniam and all other members of the Virtual
Reality Laboratory for relevant insights and meaningful discussions during
the project.
I would also like to extend my gratitude to Dr. Kurien Isaac, IIST Thiru-
vananthapuram for arranging all the necessary facilities for my accommodation
at IIST Thiruvananthapuram.
Chapter 1
Introduction
1.1 Crowded Scene
Computer vision research aims to match the human ability to extract useful
information about behavioural patterns and to interpret abnormalities. However,
psychosocial research[37] suggests that human abilities have certain glaring
limitations when it comes to observing simultaneously changing signals or
patterns. The automatic detection of patterns in crowded
scenes that do not conform to a set of established dominant patterns has become
extremely crucial for various sites including airports, railway stations, pilgrim-
ages and possibly, even naval routes. A crowd can be viewed as a considerably
large number of individuals in the same physical environment, characterized by
a largely dominant activity pattern when viewed as a whole. However, a crowd
also contains ambiguities at an individual level, that are determined by the dy-
namics and nature of movements exhibited by the individuals that form the
crowd. Crowds can be considered to be organised in a hierarchical fashion: as
a collection of individuals bounded by the same local constraints as the groups
they belong to, that further combine to form a crowded scene. This hierarchical
representation of the physical and behavioural attributes of a crowded scene
enables us to analyse and predict proceedings at various functional levels of the
crowd. The fact that, crowded scenes are composed of a large number of people
exhibiting a varied range of behaviours in a relatively constrained space makes
it difficult to specify the attributes of a pedestrian that makes up a crowd. Data
acquisition from such scenes has therefore, become extremely important for a
holistic understanding of all the implied phenomena in a crowd scene.
1.1.1 General characteristics
Pedestrians, the elementary constituents of a crowd, display an extensive range
of activities that are both unpredictable and random, unless they are
constrained by external factors. They tend to appear and disappear
at relatively precise and repeating locations that act as sources or sinks for
crowd flow.

Figure 1.1: Discriminative Local and Global Anomalies

These locations are mainly doors, gateways or positions at the edge
of the frames. Pedestrian movement may also involve remaining stagnant at
a particular position in the frame for a sustained duration of time or moving
periodically, depending on various conditions. Most crowded videos exhibit
random patterns with no major connectivities, except for a few attributes
distributed over certain sections of people. These people tend to form groups
that behave like coherent clusters, the crowd being formed of several such
distinct clusters. [26]
1.2 The need for Anomaly Detection
The primary reason to study crowds is safety. Amongst other things, the need
to detect and prevent possible disasters drives the work of Computer vision re-
searchers across the world. Most of the crowd analysis research is initiated by
the need to develop mechanisms that keep individuals safe and secure. Tragic
incidents, like the Kumbh Mela Stampede[19], the Boston marathon bombing[7]
or the Hajj stampede[33] underline the need to understand crowd dynamics better
and with increasing accuracy. The study is also critical in the development
and design of public spaces that can cope with the world's exponentially rising
population. The complexity of crowd dynamics and the heterogeneity of each
individual in the scene make the task all the more challenging. Computer
graphics and vision technologies help in crowd management, anomaly detection,
activity recognition and crowd understanding, amongst many other things. The
enormous amount of data captured every moment makes it impossible for manual
methods and techniques to follow and analyse motion
patterns. In this regard, the process of making intelligent decisions has to be
aided with computer-based technology. Research in video surveillance is,
fortunately, converging on automating this process with increasingly minimal
human intervention. The proposed work aims to detect anomalous behaviour in
a crowded scene by detecting those trajectories of individuals in a crowd that
do not conform to the representative and characteristic clusters that develop
in the crowded scene.
1.3 Related Work
The information contained in a crowded video can be utilised for anomaly detec-
tion in various ways. In general, given a dataset D = {x_1, x_2, ..., x_n},
where each object x_i may be an image patch, a pixel, a spatio-temporal 3-D
brick, a trajectory or a texture, our aim is to find a function that
discriminates each element of the set as Normal or Abnormal:

f : x → {normal, abnormal}
In one of the first works on the topic, Ketterlin (Ketterlin 1997) considers
generic sequences (thus modelling trajectories as sequences of points) together
with a conceptual hierarchy over the sequence elements, used to compute both
the cluster representatives and the distance between two sequences. Automatic
abnormality detection finds applications in crowd surveillance[31], social
media behaviour monitoring[40] and event retrieval[9]. Earlier approaches like
[31] focused on generating discriminative models to divide the scene into
semantic parts. Some of the significant works on anomaly detection in crowded scenes
include those of Kim et al.[27] where a mixture of probabilistic principal com-
ponent analysers is used to learn patterns of local optical flow and a model is
obtained. Sun et al.[38] use a mixture of spatio-temporal saliency and motion
vectors over spatio-temporal volumes to measure the global intensity of anoma-
lies in the entire scene. Cong et al[16] use a multi-scale histogram of Optical
Flow as the feature descriptor and use it as a basis for sparse reconstruction. A
sparse reconstruction cost is computed for every incoming frame to determine
the normalcy of the frame. Wang et al.[39] use clustering to group pixels
into activities and short video clips into different interactions and propose dif-
ferent hierarchical Bayesian models for learning crowd behaviour. Earlier, Ali
et al.[1] use Lagrangian Particle dynamics to model coherent crowd flow as fluid
flow. Unsupervised methods, using Neural networks have been developed, like
the Self Organizing Maps method [30], or the Learning Vector Quantization[25]
method. However, these are difficult to implement, as is generally true of
neural networks, and for real motion sequences their convergence is slow. A
Trajectory Directional Histogram (TDH) can be used to encode
the statistical directional distribution of the trajectories[29], but does not encode
the spatial information. For data represented with a single Gaussian distribu-
tion, PCA coefficients can be used to represent trajectories as sub-trajectories[6].
Antonini et al.[4] transform the input trajectories using Independent Compo-
nent Analysis and then use Euclidean distance to find the similarities between
various trajectories. The features useful for representing a trajectory range
from speed and angle [21] to richer descriptors; the density, mean position and
direction features of [36] suit the task well. For the trajectory clustering
task, several methods have been used, including fuzzy k-means [24],
k-medoids[8], spectral clustering [5] and hierarchical clustering. However, these clustering
algorithms either require the explicit measure of distance between trajectories
or a predetermined number of clusters, and sometimes both. Due to the lack of a
well-defined distance measure and of a predetermined optimum number of
clusters, we resort to the non-parametric method of Mean
Shift Clustering[3]. Considering the problem of anomaly detection, the Shannon
entropy has been a successful and popular tool in various video processing
methods [17], and has demonstrated its ability for proper video key-frame
selection and visualization [42].
1.4 Overview
Our work essentially proposes a clustering-based anomalous trajectory detector,
enabling the surveillance system to monitor anomalies at a very local level. The
entire process is unsupervised and depends on a process flow of trajectory extrac-
tion followed by trajectory modelling, based on a certain number of features that
are extracted. These feature vectors for each trajectory are then clustered inde-
pendently using a non-parametric, Gaussian kernel based Mean shift clustering
method. After having obtained the cluster centres, the anomalous trajectories
are detected and brought out from within each cluster, based on an Entropy
score for each feature space. The scope of this work extends beyond crowded
scene analysis and can also be used for various other scene-understanding prob-
lems like naval traffic studies, animal migratory routes study. The report has
been organized as follows: Chapter 2 covers the Trajectory extraction, followed
by the Feature extraction task in Chapter 3. Chapter 4 covers the mean shift
clustering task, followed by the anomaly detection algorithm in Chapter 5. The
report concludes with results and a concluding comment.
Figure 1.2: General Overview of the Work
Chapter 2
Trajectory Extraction
2.1 Introduction
One of the challenges facing an anomalous trajectory detector mechanism such
as this relates to the collection of trajectory or behaviour data from the video
itself. Advances in real-world capturing technologies have led to large databases
of crowd videos. At a broad level, pedestrian tracking algorithms can be
classified as either online or offline trackers. Online trackers use only the
past and present frames for tracking, whereas offline trackers also use future
frames and hence are not of much use to us here. Research in this field has
progressed from developing efficient tracking techniques for relatively isolated
objects to identifying patterns of motion in highly dense crowded scenes, where
it may seem very difficult to accurately track individual objects[11]. Object mo-
tion may be represented by trajectories of feature descriptor points, segmented
object contours [2], or object model centroids [15].
2.1.1 Representation
Formally, a trajectory is defined as a set of points
{(x_t, y_t), t = T_init, ..., T_final}.
This set of points represents discrete spatial locations spread along a
temporal axis. It is generally expected that feature points identified on the same physical
object or coherently moving objects will yield similar trajectories. This prop-
erty has been used in the block-tracking mechanism described in the following
section.
2.2 Approach
As discussed above, motion trajectories are an effective way to capture complex
spatio-temporal patterns in a video scene.
For the problem of anomalous trajectory detection, the proper extraction of
trajectories is an essential first task. For this purpose, a block-tracking
mechanism has been put to use [36].

Figure 2.1: Tracking the centroids in each box for the Station Video

Figure 2.2: Extracted Trajectories for the Marathon video

Since the videos under consideration are
densely populated with very small moving objects, it is generally accepted that
standard object tracking algorithms would perform poorly, the main challenge
being a high degree of occlusion. The technique used here focuses on tracking
non-overlapping blocks of a fixed size: each frame is divided into blocks of
size p × p. For each block, the Harris corner detector[23] is used to detect
low-level feature points, and the centroid of these points in the block is then
tracked using the standard Kanade-Lucas-Tomasi (KLT) tracking algorithm,
yielding a trajectory for each block. Since our aim is to identify clusters
from these trajectories, there is no need to link the tracks or impose
preconditions on them; however, only trajectories longer than a threshold γ are
taken into consideration. The block method helps in obtaining long, reliable
tracks that would otherwise require cumbersome individual tracking. A further
challenge in a high-density crowd is the appearance of new feature points and
the disappearance of tracked feature points over time. To adjust for this, the
existing set of blocks is refreshed every n frames, depending on the level of
activity in the video: blocks that have left the frame are removed, whereas
newly appearing ones are added.
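The block bookkeeping described above can be sketched as follows. This is a minimal numpy illustration, not the report's implementation: it assumes the feature points (e.g. Harris corners) have already been detected, and the hypothetical `block_centroids` helper only computes the per-block centroid that would then be handed to the KLT tracker.

```python
import numpy as np

def block_centroids(points, frame_shape, p):
    """Group 2-D feature points into non-overlapping p x p blocks and
    return the centroid of the points inside each occupied block.

    points      -- (N, 2) array of (x, y) feature-point coordinates
    frame_shape -- (height, width) of the frame (documents the domain)
    p           -- block side length in pixels
    """
    points = np.asarray(points, dtype=float)
    # Block index of each point: (x // p, y // p).
    idx = (points // p).astype(int)
    centroids = {}
    for b in set(map(tuple, idx)):
        mask = np.all(idx == b, axis=1)      # points falling in block b
        centroids[b] = points[mask].mean(axis=0)
    return centroids
```

In the full pipeline, each centroid would be re-detected and re-tracked whenever the set of blocks is refreshed after n frames.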
Chapter 3
Trajectory Representation
3.1 Introduction
Computer vision tasks become cumbersome when they depend on many tunable
parameters, which makes the use of a feature space for analysing images and
video frames a reliable approach. However, it should first be ensured that the
extracted features provide a reliable representation of the input, and that
only a very few parameters, derived from basic intuition, are involved in the
process. A feature space is essentially a mapping obtained by processing the
data in small sets at a time; each image or frame can then be represented as a
point in a multi-dimensional parameter space.
As mentioned earlier, the clustering process cannot be implemented directly
on the trajectories. We need to extract relevant feature information from the
trajectories for the clustering operation. Recently, Khalid and Naftel [7] showed
that time-series modeling can be applied to object trajectory classification and
pattern discovery. In their work, high-dimensional trajectory data is projected
to a suitable lower-dimensional coefficient space. Classification and pattern dis-
covery analysis is performed in the lower-dimensional space. They concluded
that motion trajectories were well represented by frequency-domain coefficients.
The coefficient vectors were used as input to a neural network algorithm that
learned similarities between object trajectories in an unsupervised manner. In
their work related to counting pedestrians in a video sequence, Antonini and
Thiran [8] showed that motion trajectories projected onto the independent com-
ponent analysis (ICA) space yielded a better representation for clustering than
the original time-series data. Essentially, it can be concluded that raw
trajectory data is too noisy and voluminous to be operated upon directly, and a
proper representation has to be obtained from it. For this, the extraction of
density, location and direction features [36] proves useful. We also introduce
two further features, the standard deviation of the trajectory coordinates and
a spatio-temporal density parameter, to take into account the varying densities
across a trajectory along both the spatial and temporal domains. We discuss
each feature in turn.
3.2 Shape
The trajectory sketches a particular shape across the spatio-temporal scene,
and this shape can be represented by a polynomial function. The coefficients of
this function are calculated separately for the x and y coordinates. The shape
feature f_s is represented as:

f_s = [a_0, ..., a_3, b_0, ..., b_3]

where the coefficients a_i and b_i are fitted over the trajectory data as
follows:

x(t) = a_0 + a_1 t + a_2 t² + a_3 t³

y(t) = b_0 + b_1 t + b_2 t² + b_3 t³
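Assuming the trajectory is sampled at uniform frame intervals, the cubic fit can be sketched with numpy's least-squares polynomial fitting (the function name is illustrative):

```python
import numpy as np

def shape_feature(traj):
    """Fit cubic polynomials x(t) and y(t) to a trajectory and return
    the shape feature f_s = [a0, ..., a3, b0, ..., b3]."""
    traj = np.asarray(traj, dtype=float)   # shape (T, 2): columns x, y
    t = np.arange(len(traj))
    # np.polyfit returns the highest-degree coefficient first, so reverse
    # to match the a0 + a1*t + a2*t^2 + a3*t^3 ordering used above.
    a = np.polyfit(t, traj[:, 0], 3)[::-1]
    b = np.polyfit(t, traj[:, 1], 3)[::-1]
    return np.concatenate([a, b])
```

For a purely linear track, the quadratic and cubic coefficients come out (numerically) zero, so straight-line and curved trajectories separate cleanly in this feature space.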
3.3 Location
Trajectories separated by a considerably large distance may seem to belong to
the same class due to similarity in other features: although far apart, they
may exhibit similar motion. To account for this, we use the mean x and y
coordinates to construct our position feature vector:

f_l = [mean_x, mean_y]
3.4 Density
The inclusion of this feature is inspired by the successful implementation of
density-based clustering methods. A multi-scale density feature is extracted
from each trajectory. For a trajectory T_j, varying densities, depending on the
size ε of its neighbourhood, are computed as follows:

n_{j,ε} = |{T_i | ∀i ≠ j, d(f_j, f_i) < ε}|

where d(·) denotes the Euclidean distance. The feature vector based on this
density is represented as:

F_j = [n_{j,ε1}, n_{j,ε2}, n_{j,ε3}]

The three values correspond to three neighbourhood sizes ε1, ε2, ε3.
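The multi-scale counts can be sketched as below (a numpy illustration with hypothetical names; `feats` holds one feature vector f_j per trajectory and `eps_scales` the neighbourhood sizes ε):

```python
import numpy as np

def density_feature(feats, eps_scales):
    """Multi-scale density: for each trajectory's feature vector f_j,
    count how many other trajectories lie within Euclidean distance
    eps of it, for each neighbourhood size eps in eps_scales."""
    F = np.asarray(feats, dtype=float)           # shape (m, d)
    # Pairwise Euclidean distances between all feature vectors.
    D = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                  # exclude the i == j case
    # Row j of the result is [n_{j,eps1}, n_{j,eps2}, ...].
    return np.stack([(D < e).sum(axis=1) for e in eps_scales], axis=1)
```

The O(m²) pairwise-distance matrix is fine at the scale of a few thousand trajectories; a spatial index would be needed well beyond that.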
3.5 Standard Deviation
Standard Deviation is an extremely popular measure that is used to quantify the
amount of variation or dispersion in a set of time-series data. A low standard
deviation indicates that the data points tend to be close to the mean of the
set, while a higher value indicates a wider spread. Trajectory data, modelled as
time-series data, can also be characterized by the standard deviation. With
E[X] = µ:

σ = √(E[(X − µ)²])
3.6 Spatio-temporal Density
It is usually found that most abnormalities occur in densely populated crowded
regions. Kolios et al.[22] propose a system to support density queries over a
database of uniform speed rectilinear trajectories, that is efficiently able to dis-
cover the spatial locations where moving objects are dense. In particular, flock
patterns are studied, which can be defined as groups of at least n moving
objects such that there exists a time interval, of width larger than a given
threshold, during which all such objects lie inside a circle of given radius.
Spatio-temporal
clustering also considers spatially and temporally referenced events, instead of
moving objects. Kulldorff[28] developed a well-known statistical scan for
epidemiological data, which searches for spatio-temporal cylinders where the rate of
disease cases is higher than outside the cylinder. In this work, we are interested
in distances that describe the similarity of objects along time and therefore are
computed by analysing the way distance between the objects varies. We restrict
ourselves to consider only pairs of contemporary instantiations of objects. The
distance between trajectories adopted here is computed in the most natural way,
as the average distance between the objects over a time interval T:

D(τ_1, τ_2)|_T = (1/|T|) ∫_T d(τ_1(t), τ_2(t)) dt
This metric aids in the visualization of the cumulated spatial distance for each
trajectory across all the frames.
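In discrete time, with both trajectories sampled at the same (contemporary) frames, the integral reduces to a mean of per-frame distances; a small sketch:

```python
import numpy as np

def avg_trajectory_distance(t1, t2):
    """Average Euclidean distance between two trajectories sampled at
    the same time instants: the discrete version of
    D(tau1, tau2)|_T = (1/|T|) * integral of d(tau1(t), tau2(t)) dt."""
    t1 = np.asarray(t1, dtype=float)   # shape (T, 2)
    t2 = np.asarray(t2, dtype=float)   # shape (T, 2)
    return np.linalg.norm(t1 - t2, axis=1).mean()
```

Restricting the computation to frames where both tracks exist enforces the "contemporary instantiations" requirement from the text.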
Chapter 4
Clustering
4.1 Introduction
The term Cluster Analysis was first coined in the work by Clements[14] in 1954.
Since then, cluster analysis has progressed rapidly into an excellent tool for
data analysis, although clustering problems are important and often very
difficult to solve. A host of different features can be used to map the
higher-dimensional data in terms of the parameters defined, and the nature of
these parameters depends upon the computer vision task at hand. It is observed
that significantly dominant and usual features correspond to the denser regions
of the feature space; characterizing all data samples by quantifying their
degree of belongingness to these dense regions is the fundamental idea behind
clustering.
Kernel-based methods find use in many applications, in both supervised and
unsupervised learning. One way of using them is to construct a kernel density
estimate over the data space and then search for the modes of the estimated
density. Amongst these methods, Mean Shift[18, 10] and the Mountain method[43]
are the two most popular techniques.
4.2 Mean Shift
Fukunaga and Hostetler[18] proposed the mean shift algorithm, a powerful and
versatile non-parametric iterative algorithm with applications in fields
ranging from image analysis and object tracking to texture segmentation and
data mining.

The first step is to estimate the density of the data points. Given a set of
input points, mean shift considers them to be sampled from an underlying
probability density function, whose dense regions in the feature space
correspond to its modes. Mean shift associates each data point with a nearby
peak of this density: it defines a window around the point, computes the mean
of the data points within the window, shifts the centre of the window to that
mean, and repeats until convergence. After each iteration, the window shifts to
a denser region of the dataset. The process converges at the modes of the
density estimate, which serve as cluster centres. We now work through the
mathematical basis behind this procedure.
4.2.1 Non-Parametric Density Estimation
Consider P to be the probability that a vector x, drawn from a distribution
p(x), will fall in a given region R:

P = ∫_R p(x′) dx′    (4.1)

If N vectors are drawn from the distribution, then the probability that k of
these N vectors fall in the region is given by the binomial distribution:

P(k) = (N choose k) P^k (1 − P)^(N−k)    (4.2)

From the properties of binomial probability distributions, the mean and the
variance of the ratio k/N are:

E[k/N] = P    (4.3)

var[k/N] = P(1 − P)/N    (4.4)

Therefore, as N → ∞, the distribution becomes sharper, with decreasing
variance, and the ratio k/N gives a good estimate of the probability P. On the
other hand, if we assume that the region R grows small enough that p(x) is
almost uniform over it, then:

P = ∫_R p(x) dx ≈ p(x)V    (4.5)

P ≈ k/N    (4.6)

p(x) ≈ k/(NV)    (4.7)
where V is the volume surrounding the sample x, N is the total number of
samples, and k is the number of samples inside V. Equation (4.7)
can be applied to practical density estimation problems, with two fundamental
approaches:
- We can fix V and determine k from the data, leading to Kernel Density
Estimation, the subject of our approach.
- We can fix k and determine V from the data, leading to the formulation of the
k-Nearest Neighbour approach.
In clustering our data with the ultimate aim of anomaly detection, we cannot
arbitrarily fix the number of examples that may be grouped together with a data
point. Hence, the method of Kernel Density Estimation is used.
4.2.2 Kernel Density Estimation
As mentioned previously, Kernel Density Estimation is the most popular density
estimation method. A KDE can be thought of as a generalization of histograms
to define density estimates. Besides simplifying the mathematical treatment of
densities, more importantly, KDE enables one to use continuous optimization to
find maxima of the density, which is used in the Mean Shift Clustering algorithm.
This is also known as the Parzen window method. This helps us in analysing
the feature space with the underlying density function. The modes are located
among the zeros of the gradient of this density function f(x).
Suppose there are n data points x_i in the d-dimensional space R^d. The kernel
density estimator with kernel K(x) and a d × d bandwidth matrix H, computed at
the point x, is given by:

f̂(x) = (1/n) Σ_{i=1}^{n} K_H(x − x_i)    (4.8)

K_H(x) = |H|^(−1/2) K(H^(−1/2) x)    (4.9)

The kernel K(x) is a bounded function that satisfies the following conditions:

∫_{R^d} K(x) dx = 1    (4.10)

∫_{R^d} x K(x) dx = 0    (4.11)

∫_{R^d} x x^T K(x) dx = c_K I    (4.12)

lim_{||x||→∞} ||x||^d K(x) = 0    (4.13)

The term c_K is a constant. We will be using the special class of radially
symmetric kernels, satisfying:

K(x) = c_{k,d} k(||x||²)

where c_{k,d} is the normalization constant that makes K(x) integrate to one.
Employing only one bandwidth parameter h, we have:

f̂(x) = (1/(n h^d)) Σ_{i=1}^{n} K((x − x_i)/h)

Employing the profile notation k for the kernel, we get:

f̂_{h,k}(x) = (c_{k,d}/(n h^d)) Σ_{i=1}^{n} k(||(x − x_i)/h||²)    (4.14)
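For the one-dimensional Gaussian kernel, the estimator above reduces to the familiar Parzen estimate; a minimal sketch, assuming profile k(u) = e^(−u/2) with normalization constant 1/√(2π):

```python
import numpy as np

def kde(x, data, h):
    """1-D kernel density estimate at point(s) x with a Gaussian kernel
    and a single bandwidth h."""
    data = np.asarray(data, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Gaussian profile k(u) = exp(-u/2); normalization 1/sqrt(2*pi).
    u = ((x[:, None] - data[None, :]) / h) ** 2
    return np.exp(-u / 2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))
```

Averaging one Gaussian bump per sample is exactly the "generalized histogram" view of KDE mentioned above.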
4.2.3 Density Gradient Ascent
The gradient of the density estimate (4.14) is:

∇f̂_{h,k}(x) = (2 c_{k,d}/(n h^(d+2))) Σ_{i=1}^{n} (x − x_i) k′(||(x − x_i)/h||²)    (4.15)

We define the function g(x) = −k′(x) and its associated kernel

G(x) = c_{g,d} g(||x||²)

The kernel K(x) was called the shadow of G(x) in a slightly different
context[41]. Assuming that the derivative of the kernel profile k exists for
all x ∈ [0, ∞), substituting g into (4.15) gives:

∇f̂_{h,k}(x) = (2 c_{k,d}/(n h^(d+2))) Σ_{i=1}^{n} (x_i − x) g(||(x − x_i)/h||²)

∇f̂_{h,k}(x) = (2 c_{k,d}/(n h^(d+2))) [Σ_{i=1}^{n} g(||(x − x_i)/h||²)] [ (Σ_{i=1}^{n} x_i g(||(x − x_i)/h||²)) / (Σ_{i=1}^{n} g(||(x − x_i)/h||²)) − x ]    (4.16)

The first term in this product is proportional to the density estimate at x
with kernel G:

f̂_{h,G}(x) = (c_{g,d}/(n h^d)) Σ_{i=1}^{n} g(||(x − x_i)/h||²)    (4.17)

The second term is referred to as the mean shift:

m_{h,G}(x) = (Σ_{i=1}^{n} x_i g(||(x − x_i)/h||²)) / (Σ_{i=1}^{n} g(||(x − x_i)/h||²)) − x    (4.18)
Mean shift can be defined as the difference between the weighted mean, using
the kernel G for weights, and the centre of the kernel window. From (4.17) and
(4.18), (4.16) becomes:

∇f̂_{h,K}(x) = f̂_{h,G}(x) (2 c_{k,d}/(h² c_{g,d})) m_{h,G}(x)    (4.19)

m_{h,G}(x) = (h² c_{g,d}/(2 c_{k,d})) (∇f̂_{h,K}(x)/f̂_{h,G}(x))    (4.20)
At any location, the mean shift vector computed with kernel G is proportional to the normalized density gradient estimate obtained with kernel K. Moreover, the mean shift vector always points in the direction of maximum increase in the density. This general result was first remarked upon by Fukunaga and Hostetler [18]. The relation is intuitive: the local mean is shifted toward the region in which the majority of the points reside. The mean shift procedure is obtained by successive steps that include:
- computation of the mean shift vector;
- translation of the kernel window by the mean shift vector.
The condition for convergence and mode detection is that the gradient must equal zero, since a mode is a local maximum:
\nabla \hat{f}_{h,K}(x) = 0
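The two-step procedure above can be sketched directly from (4.18): a point is repeatedly replaced by the weighted mean of the samples around it until the shift becomes negligible, i.e. the gradient has effectively vanished. The flat profile g(u) = 1 for u ≤ 1 (uniform kernel) is an assumed choice for illustration.

```python
import numpy as np

def mean_shift_mode(x, samples, h, tol=1e-6, max_iter=100):
    """Iterate x <- x + m_{h,G}(x) with a flat profile g(u) = 1 for u <= 1.
    Assumes at least one sample lies within bandwidth h of the seed."""
    for _ in range(max_iter):
        u = np.sum(((x - samples) / h) ** 2, axis=1)   # ||(x - x_i)/h||^2
        w = (u <= 1.0).astype(float)                   # g(u) weights
        shifted = (w[:, None] * samples).sum(axis=0) / w.sum()  # weighted mean
        if np.linalg.norm(shifted - x) < tol:          # shift ~ 0: mode found
            break
        x = shifted
    return x

# Two well-separated 1-D groups; each seed converges to its own mode.
pts = np.array([[0.0], [0.2], [0.4], [5.0], [5.2], [5.4]])
mode_a = mean_shift_mode(np.array([0.0]), pts, h=1.0)
mode_b = mean_shift_mode(np.array([5.4]), pts, h=1.0)
```

With the flat profile, the iteration converges in a few steps to the local sample mean of each group, which plays the role of the cluster centre.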
These modes serve as cluster centres, and this clustering technique is applied to all the independent feature matrices. For each feature, we obtain a set of cluster centres C = [c_1, c_2, \ldots, c_n], which act as the modes of the data in each cluster. Each cluster holds a specific, non-overlapping set of trajectories whose properties are characteristic of that cluster. Based on this cluster information, our aim is to identify the anomalous instances or trajectories within each cluster.
Chapter 5
Anomalous Trajectory
Detection
5.1 Introduction
After clustering all trajectories into one or more distinct clusters for each feature, we need to identify the anomalies among them. It is generally assumed that events which occur less often are more likely to be classified as anomalies. The crowd as a whole is characterized by a few dominant patterns, on the basis of which the entire set of trajectories is clustered. Each of these clusters has a cluster centre that represents the mode of the distribution in that region, constrained by the specific feature. The anomalous trajectories present throughout the crowded scene may belong to any one of these clusters but, as a general property, will not have a substantial degree of belongingness to any of them. Conversely, trajectories assigned to a specific cluster with a substantial degree of belongingness are expected to be normal trajectories that define the overall nature of the crowd.
The anomaly detection in this work involves two parts:
- Information Theory based outlier detection in each cluster
- A voting mechanism to identify all anomalous trajectories
These two broad tasks will help us identify all the anomalous trajectories in a
crowded scene.
5.2 Anomaly Detection
The anomalous trajectory detection task has been modelled as one of identifying trajectories, in both densely and sparsely crowded regions of the scene, that deviate substantially from the other trajectories. A distance-based approach has been used earlier [3]: a trajectory T_j \in C_k whose trajectory mean \mu_T lies far from the centre \mu_k of the cluster is considered an outlier. In other words,

\frac{\|\mu_T - \mu_k\|}{\sigma_{f,k}} > \tau
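A minimal numeric sketch of this baseline rule, with invented values for the cluster centre, spread and threshold:

```python
import numpy as np

# Hypothetical statistics for one cluster C_k under feature f.
mu_k = np.array([2.0, 3.0])   # cluster centre
sigma_fk = 0.5                # cluster spread sigma_{f,k}
tau = 3.0                     # outlier threshold

def is_outlier(mu_T):
    """Distance-based rule: flag T_j if ||mu_T - mu_k|| / sigma_{f,k} > tau."""
    return np.linalg.norm(mu_T - mu_k) / sigma_fk > tau

typical = is_outlier(np.array([2.2, 3.1]))   # lies close to the centre
extreme = is_outlier(np.array([6.0, 9.0]))   # lies far from the centre
```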
However, this is a rather primitive approach: it reduces the problem to the belongingness of a trajectory to a single cluster and the distance between the two means. For instance, a trajectory with similar speed and similar shape features may extend well beyond the other trajectories of the same cluster, producing a large difference between the means even though it is not anomalous. Therefore, we resort to an Information Theory approach that quantifies the belongingness of each trajectory to all of these clusters as a probability distribution.
5.2.1 Shannon Entropy
Shannon entropy is the most fundamental concept of Information Theory and has widespread applications across domains. It is a characteristic property of a probability distribution that provides a measure of the uncertainty associated with that distribution. The greatest advantage of entropy measures is that they summarize a feature distribution in a single number. Entropy was first defined and proposed by Clausius [13] in the early 1850s. In 1948, Claude Shannon [35] adopted entropy into Information Theory as a measure of the uncertainty associated with a random variable: the more random the variable, the larger the entropy; conversely, the more certain the variable, the smaller the entropy.
For a probability distribution p(X = x_i) of a discrete random variable X, the Shannon entropy is defined as:

H_s(X) = \sum_{i=1}^{n} p(x_i) \log_a \frac{1}{p(x_i)}  (5.1)
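For example, (5.1) with a = 2 yields one bit for a fair coin and zero bits for a certain outcome. A small sketch:

```python
import numpy as np

def shannon_entropy(p, a=2):
    """H_s(X) = sum_i p(x_i) * log_a(1 / p(x_i)); zero-probability terms
    contribute nothing and are dropped."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.sum(p * (np.log(1.0 / p) / np.log(a))))

h_fair = shannon_entropy([0.5, 0.5])      # maximum uncertainty for two outcomes
h_certain = shannon_entropy([1.0, 0.0])   # no uncertainty
```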
The key to anomalous data detection is the establishment of a predefined threshold for deciding whether a trajectory is anomalous. The threshold should be data adaptive, adjusting itself to the changing properties of the data. The Shannon entropy measure has been used successfully in many applications, such as video key frame selection [42], video processing [17], network anomaly detection [34], network baseline anomaly estimation [20] and worm detection [32].
5.2.2 Proposed Approach
Our approach is based on the idea that an anomalous trajectory will exhibit a higher level of entropy than normal trajectories. Since ours is a feature-based approach, we propose a distance-based entropy measure in the feature space itself, instead of examining the distance between the mean position of each trajectory and the cluster centre. We describe each trajectory by its features, and cluster centres are obtained for each of these features. If the feature descriptor of a trajectory is unlikely under the distribution of that feature's descriptors, the trajectory is classified as an anomaly. Instead of comparing the distance between a single cluster centre and a trajectory feature descriptor, we build a probability distribution from the distances between the trajectory feature descriptor and all the cluster centres. The entropy of each trajectory is computed from this distribution, and if it exceeds a certain threshold, the trajectory is classified as an anomaly. The threshold is adaptive to the data and is adjusted according to the heuristic description of the anomalies in the scene.
C_i = [c_{1,i}, c_{2,i}, \ldots, c_{n,i}]

distvec_j = [distance(c_{1,i}, f_j), \ldots, distance(c_{n,i}, f_j)]
where C_i represents the cluster centres for a specific feature i, and the vector distvec_j contains the distance measures between each cluster centre and the trajectory feature descriptor f_j. We further build the probability distribution P_j in the following manner:
p_{j,k} = \frac{distance(c_{k,i}, f_j)}{\sum_{m=1}^{n} distance(c_{m,i}, f_j)}  (5.2)

P_j = [p_{j,1}, p_{j,2}, \ldots, p_{j,n}]  (5.3)
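A sketch of (5.2)-(5.3): the distances from one trajectory feature descriptor to all cluster centres are normalized into a probability vector. The centre coordinates and descriptor below are invented for illustration.

```python
import numpy as np

def distance_distribution(centres, f_j):
    """P_j from (5.2): distances to all cluster centres, normalized to sum to 1."""
    dists = np.linalg.norm(centres - f_j, axis=1)   # distvec_j
    return dists / dists.sum()

# Hypothetical cluster centres C_i for one feature, and one descriptor f_j
# lying very close to the first centre.
centres = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
P_j = distance_distribution(centres, np.array([0.1, 0.1]))
```

The closest centre receives the smallest probability mass, so a descriptor tightly bound to one cluster produces a skewed, low-entropy distribution relative to its distances.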
Corresponding to this probability distribution, we obtain an entropy measure for each trajectory, as follows:

H_i = -\sum_{k=1}^{n} p_{i,k} \log_a p_{i,k}  (5.4)

H = [H_1, H_2, \ldots, H_{numTraj}]  (5.5)
From among these entropy values, those greater than a certain threshold identify the anomalous trajectories for that particular feature. In a crowd flow, attributes change randomly and continually: a section of the crowd moving in a definite way may thin out after random intervals of time, and the crowd may suddenly slow down or speed up, changing the feature parameters of the trajectories at each instant. Moreover, new trajectories introduced after a fixed interval of time may have features similar to those of a particular cluster yet exhibit one or more abnormalities owing to their later introduction. Therefore, we cannot simply club together all trajectories marked as anomalous by the above procedure as our desired set of abnormalities. Instead, we employ a voting mechanism wherein a trajectory is marked as an abnormality only if it has been marked anomalous in a majority of the feature spaces. The voting score of each trajectory at the end of the detection mechanism thus decides whether it is an abnormality.
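The voting step can be sketched as a majority count over per-feature anomaly flags; the flag values below are hypothetical.

```python
import numpy as np

# Rows: trajectories; columns: feature spaces (e.g. shape, location, density, ...).
# flags[t, f] is True if trajectory t was marked anomalous for feature f.
flags = np.array([
    [True,  True,  True,  False],   # flagged in 3 of 4 feature spaces
    [True,  False, False, False],   # flagged in only 1 feature space
    [False, False, False, False],   # never flagged
])

votes = flags.sum(axis=1)               # voting score per trajectory
abnormal = votes > flags.shape[1] / 2   # abnormal iff a majority of features agree
```

Only the first trajectory survives the vote; a single-feature flag, which may simply reflect a transient change in the crowd, is discarded.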
Chapter 6
Results
6.1 Dataset
In order to effectively test the methods developed, experiments were run on two crowd video datasets. Both datasets have been used previously for analysing and studying crowded scenes, and are described below:
6.1.1 The UCF Database
The dataset contains videos of crowds and other high-density moving objects, collected mainly from the BBC Motion Gallery and the Getty Images website. The Pilgrim sequence, the Marathon sequence and the Crowded Intersection sequence have been used from this dataset for comparison [1].
Pilgrim Sequence
The Pilgrim sequence is a high-density crowd video showing the random motion of Hajj pilgrims in an open space. This is exactly the kind of video that needs to be monitored properly, as arbitrary rogue trajectories can cause disorder and discord in the crowd.
Crowded Intersection Sequence
This sequence is recorded at a crowded traffic intersection containing a considerable number of anomalous elements in the form of cyclists and improperly moving vehicles. The normal flow of traffic is broken by the slow motion of a bus in the initial stages and later by the random movement of cyclists. This presents an ideal case for surveillance, and our method can provide ample insight into handling the traffic better.
Marathon Sequence
The Marathon sequence shows the smooth, uniform movement of a coherent crowd going around a semi-circular bend. There are no observable anomalies: the entire crowd moves along the set pattern. This sequence acts as a check for the method, which performs well, reporting no major anomalies and handling the curved trajectories correctly.
6.1.2 RPI Database
The database contains four videos of crowded places. The videos were initially used by Cheriyadat et al. [12] to generate dominant trajectories in crowded video. One of its sequences presents itself as an ideal candidate for our task.
Crowded Subway Exit Sequence
This surveillance sequence shows a number of people converging at a subway exit. The crowd flow is smooth at the exit initially but, owing to the convergence of several streams of people, it eventually becomes congested and slows down. The method can be used to detect the tracks that cause the congestion and to help design adequate measures to relieve it.
6.2 Results
It is to be mentioned that the proposed method extracts its own trajectories from the crowded videos, and therefore no ground truth was available to us. We develop the ground truth from the extracted trajectories based on close and minute observation of the videos, and the test results are compared against this marked ground truth. The results indicate that the method exhibits excellent specificity, i.e. the probability of classifying a normal trajectory as anomalous is extremely low. However, the sensitivity of the approach can be improved by raising the true positive rate. It is to be noted that the method indicates almost all anomalous trajectories in the expected regions of interest with commendable accuracy. The graphical plots illustrate the effectiveness of the results. The following part presents the results for the Crowded Subway Exit sequence (Sequence I) and the Crowded Intersection sequence (Sequence II).