Keystroke Dynamics For Mobile Devices


KEYSTROKE DYNAMICS FOR MOBILE DEVICES – ALGORITHM AND AUTHENTICATION

A Thesis Presented to the Faculty of San Diego State University

In Partial Fulfillment of the Requirements for the Degree Master of Science in Computer Science

by Mradul Shrivasatva
Summer 2011

Copyright © 2011 by Mradul Shrivasatva. All Rights Reserved.

DEDICATION

I dedicate this thesis to my father, Mr. Brajendra K. Shrivastava, for being my constant source of inspiration, and to my mother, Mrs. Madhu Shrivastava, for her unconditional love and encouragement. Without their support, this would not have been possible.

ABSTRACT OF THE THESIS

Keystroke Dynamics for Mobile Devices – Algorithm and Authentication
by Mradul Shrivasatva
Master of Science in Computer Science
San Diego State University, 2011

Mobile handsets play a significant role in modern society, providing access to personal and confidential data and applications anywhere, anytime. Given the importance of these devices, attention is now turning to mobile security applications. The majority of these applications rely on one-factor authentication, such as verifying passwords and PINs. This thesis describes how to authenticate the user with a two-factor biometric security technique known as keystroke dynamics. This type of security checks both what you type and how you type it. The implementation of keystroke dynamics on mobile phones is divided into two phases. In the first phase, data from the user’s samples is collected and stored in a database. The second phase implements the algorithm and authenticates users on the basis of the data collected from the samples. This thesis covers the second phase of the project. The smart phones used for the implementation are built on Android OS.
TABLE OF CONTENTS

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ACKNOWLEDGEMENTS
CHAPTER
1 INTRODUCTION
2 KEYSTROKE DYNAMICS BACKGROUND AND TERMINOLOGIES
    2.1 Introduction to Mobile Phone Security
    2.2 Biometric Security
        2.2.1 Success Factors of Biometrics
        2.2.2 Accuracy
        2.2.3 Speed
        2.2.4 Resistance to Counterfeiting
        2.2.5 Reliability
        2.2.6 Data Storage Requirements
        2.2.7 Enrollment Time
        2.2.8 Perceived Intrusiveness
    2.3 Mobile Phone Security
    2.4 Keystroke Dynamics
    2.5 Statistical Terminologies
        2.5.1 Measure of Central Tendency
        2.5.2 Mean
        2.5.3 Mode
        2.5.4 Median
        2.5.5 Standard Deviation
        2.5.6 Range
3 TECHNOLOGIES
    3.1 Requirements
    3.2 Java
        3.2.1 Simplicity
        3.2.2 Robustness
        3.2.3 Multithreading
    3.3 Android
        3.3.1 Android Architecture
            3.3.1.1 Applications
            3.3.1.2 Application Framework
            3.3.1.3 Libraries
            3.3.1.4 Android Runtime
            3.3.1.5 Linux Kernel
            3.3.1.6 The Manifest File
        3.3.2 Development in Eclipse with ADT
    3.4 SQLite Database
4 DISCUSSION OF ALGORITHM
    4.1 Activities and Control Flow
    4.2 Algorithm
5 DATABASE SCHEMA
6 CONCLUSION
    6.1 Testing
    6.2 Limitations
    6.3 Future Enhancements
BIBLIOGRAPHY

LIST OF TABLES

Table 6.1. FAR & FRR Values at Different Thresholds

LIST OF FIGURES

Figure 2.1. CER and error rate relationship.
Figure 2.2. Biometric authentication process low impact.
Figure 3.1. Android architecture.
Figure 4.1. Activity for logging in the user.
Figure 4.2. Activity for collecting samples.
Figure 4.3. Registration activity for new users.
Figure 4.4. Data values collected for total login time.
Figure 6.1. ROC curve.

LIST OF ABBREVIATIONS

ADT     Android Development Tools
APIs    Application Programming Interfaces
AWT     Abstract Window Toolkit
CER     Crossover Error Rate
EER     Equal Error Rate
FAR     False Acceptance Rate
FRR     False Rejection Rate
PDAs    Personal Digital Assistants
PINs    Personal Identification Numbers
ROC     Receiver Operating Characteristic
SIM     Subscriber Identification Module

ACKNOWLEDGEMENTS

I would like to thank my thesis advisor Dr. Joseph Lewis and Mr. Peter Bartoli, Department of Computer Science, San Diego State University, for guiding me throughout my thesis term. Mr. Peter Bartoli was always supportive in discussing solutions to the problems faced during the implementation. Dr. Joseph Lewis has also been very helpful in discussing the content of this thesis. I would also like to thank Dr. Kris Stewart and Dr.
Carmelo Interlando for taking the time to review my thesis and for guiding me to complete this report. I offer my regards to all of those who supported me in any respect during the completion of the thesis.

CHAPTER 1

INTRODUCTION

Global access to information has become an integral part of everyday life. Transferring personal or corporate data anytime, anyplace has become a necessity, and the ability to communicate and work on a single device has given smart mobile phones an important role. Unfortunately, this kind of global access also increases the chances of malicious attacks and intrusions on mobile phones and other portable devices. In addition, it can be conjectured that the more advanced capabilities of third-generation mobile handsets, such as paying for products with digital money, surfing the internet, buying and selling stocks, transferring money, and managing bank accounts, will make them even more desirable targets for identity theft. The rise in mobile computing also raises a number of security issues, in particular attackers accessing the data stored on the device. Handsets do come with built-in authentication and keypad-lock passwords; however, these can be defeated very easily.

Advanced security techniques are now being tried against unauthorized access to crucial information, and biometric security techniques are being implemented on mobile phones. This thesis explains one such technique, known as keystroke dynamics authentication. It is a two-factor biometric security system that authenticates on both the correctness of the password and the correctness of the typing pattern. A user can be identified by their typing rhythm on a particular mobile device. In particular, the report examines the user’s interactions with the touch keypad on the mobile phone screen.
Biometrics is based not on what the user knows or what they carry, but on who the user is, through their unique individual characteristics. One biometric that lends itself to a mobile context, because the keypad already resides on the handset, is keystroke analysis, which authenticates the user by their typing style. This thesis describes authenticating the user with keystroke dynamics, which is an integral part of its implementation on an Android mobile device. The collection of the different time-interval values was implemented by Ritesh Dedhia. For more details on data collection for keystroke dynamics, please refer to the thesis report Mobile Keystroke Dynamics - Data Collection. The primary objective of both theses is to implement and test keystroke dynamics on mobile devices. The project was split into these two parts: in the first part the time-interval values of a user are stored in the database, and in the second part they are used to authenticate the user.

The thesis is organized into six chapters. Chapter 2 gives the background of the project and related work. Chapter 3 discusses the technology used to build the project. Chapter 4 explains the algorithm used to calculate the range and verify the user. Chapter 5 describes the database structure of the project. Chapter 6 concludes with the testing and the final results.

CHAPTER 2

KEYSTROKE DYNAMICS BACKGROUND AND TERMINOLOGIES

2.1 INTRODUCTION TO MOBILE PHONE SECURITY

Mobile handsets have found an important place in modern society, with millions currently in use. The majority of these devices use inherently weak authentication mechanisms based upon passwords and Personal Identification Numbers (PINs). This report presents a feasibility study of a biometric technique known as keystroke analysis, which authenticates the user based upon their typing characteristics.
In particular, this thesis examines the interaction that occurs when a user enters a 10-digit number and then seeks authentication. This chapter covers background material: the evolution of smart phones, terminology related to biometric techniques, mobile devices, keystroke dynamics, and the work involved in this project. It begins with a brief history of, and the need for, biometric security on mobile devices. A discussion of keystroke dynamics follows.

2.2 BIOMETRIC SECURITY

Biometrics is concerned with the identification and verification of individuals based on human characteristics. Biometric approaches are typically subdivided into two categories: physiological and behavioral. Physiological biometrics are based on bodily characteristics, such as fingerprints, facial features, and iris patterns. Behavioral biometrics are based on the way people do things, such as keystroke dynamics, mouse movement, and speech.

All biometric systems work in a four-stage process consisting of the following steps:
• Capture: The biometric system collects a sample of biometric features, such as a fingerprint or voice, from the person who wants to log in to the system.
• Extraction: Unique features are extracted from the sample by the system and converted into a digital biometric code. This code is then stored as the biometric template for that individual.
• Comparison: The template is then compared with a new sample. The stored biometric data is referred to as the biometric template, or simply the template or reference template, for that person.
• Match/non-match: The system then decides whether the features extracted from the new sample are a match or a non-match with the template.

When identity needs checking, the person interacts with the biometric system, and a new biometric sample is taken and compared with the template.
If the template and the new sample match, the person’s identity is confirmed; otherwise a non-match is reported.

A biometric authentication system has a three-layered architecture:
• Enroll: A sample is captured from a device, processed into a usable form from which a template is constructed, and returned to the application.
• Verify: One or more samples are captured, processed into a usable form, and then matched against an input template. The results of the comparison are returned.
• Identify: One or more samples are captured, processed into a usable form, and matched against a set of templates. A list is generated to show how closely the samples compare against the top candidates in the set.

A biometric template is an individual’s reference sample, first captured from the selected biometric device. Later, the individual’s identity is verified by comparing subsequently collected data against the individual’s biometric template stored in the system. Typically, three to four samples are captured during the enrollment process to arrive at a representative template. The resulting biometric templates, as well as the overall enrollment process, are key to the success of the biometric application. If the quality of the template is poor, the user will need to re-enroll.

The template may be stored within the biometric device, remotely in a central repository, or on a portable card. Storing the template on the biometric device has the advantage of fast access to the data, with no dependency on the network or another system to access the template. This method works well when the application has few users. Storing the template in a central repository is a good option in a high-performance, secure environment. Keep in mind that the size of the biometric template varies from one vendor product to the next and is typically between 9 bytes and 1.5 KB.
For example, as a fingerprint is scanned, up to 100 minutia points are captured and run through an algorithm to create a 256-byte binary template. An ideal configuration could be one in which copies of the templates of frequent users are stored locally for fast access, while others are downloaded from the system if the template cannot be found locally.

Storing the template on a card or a token has the advantage that users carry their template with them and can use it at any authorized reader. Users might prefer this method because they maintain control and ownership of their template. However, if the token is lost or damaged, the user would need to re-enroll. If the user base does not object to storing templates on the network, then an ideal solution is to store the template on the token as well as on the network. If the token is lost or damaged, the user can present acceptable identity information and be verified against the copy of the template stored on the network.

The enrollment time is the time it takes to enroll or register a user with the biometric system. It depends on a number of variables, such as the users’ experience with the device, the use of custom software, and the type of information collected at the time of enrollment [1].

All biometrics are measured against certain criteria, known as biometric performance measures:
• False Acceptance Rate (FAR): This determines how often an intruder can successfully bypass the biometric authentication. A lower rate is more secure; for example, an FAR of 1% means that the chance of fooling the system is 1 in 100 [2].
• False Rejection Rate (FRR): This signifies how often a real user will not be verified successfully. A high rate translates into more user retries, so usability suffers [3].
• Equal Error Rate (EER): The relationship between FAR and FRR is inverse, although not always linear in behavioral biometrics. The EER is the point at which the FAR and FRR are equal.
The best technologies have the lowest EER. It is also known as the Crossover Error Rate (CER; see Figure 2.1).
• Receiver Operating Characteristic (ROC) curve: The characteristic graph of a biometric system, where the x-axis represents the threshold of the biometric system and the y-axis represents the FAR and FRR values [4].

Figure 2.1. CER and error rate relationship.

2.2.1 Success Factors of Biometrics

When considering the purchase and implementation of a biometric identification system, an organization should address the following eight critical success factors:
• Accuracy
• Speed
• Resistance to counterfeiting
• Reliability
• Data storage requirements
• Enrollment time
• Perceived intrusiveness
• User acceptance

2.2.2 Accuracy

Biometric devices have improved significantly over the past several years. However, there are still no guarantees of 100% accuracy. It is your responsibility to select the level of inaccuracy that you and your employees can tolerate. When judging error rates, consider the two principal types of errors: Type I and Type II. Type I errors include all instances in which a biometric system denies access to an authorized user. The identification of an unauthorized user as an authorized user is an example of a Type II error. By adjusting the sensitivity of the biometric sensor, you can increase or decrease the occurrence of each error type. However, as you decrease Type I errors, you increase Type II errors, and the opposite is also true. The key objective in implementing a biometric system is the proper balance between these two error types. The most common method is to focus on the CER. This is the point at which the frequency of Type I errors (FRR) and the frequency of Type II errors (FAR) are equal. When shopping for the right system for your business, the CER is the best indicator of overall accuracy. The CER is expressed as a percentage, and lower values are better. Values of two to five percent are generally considered acceptable.
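
The relationship between FAR, FRR, and the crossover point can be illustrated with a small Java sketch. It is not taken from the thesis implementation. It assumes that each authentication attempt yields a distance score where lower means a closer match, and that an attempt is accepted when its score is at or below the threshold. The class name, method names, and sample scores are hypothetical.

```java
import java.util.Locale;

// Illustrative sketch only: names and data are assumptions, not the thesis code.
class ErrorRates {
    /** FAR: fraction of impostor attempts accepted (Type II errors) at this threshold. */
    static double far(double[] impostorScores, double threshold) {
        int accepted = 0;
        for (double s : impostorScores) {
            if (s <= threshold) accepted++;
        }
        return (double) accepted / impostorScores.length;
    }

    /** FRR: fraction of genuine attempts rejected (Type I errors) at this threshold. */
    static double frr(double[] genuineScores, double threshold) {
        int rejected = 0;
        for (double s : genuineScores) {
            if (s > threshold) rejected++;
        }
        return (double) rejected / genuineScores.length;
    }

    /** Returns the candidate threshold where |FAR - FRR| is smallest (approximate crossover). */
    static double crossoverThreshold(double[] genuine, double[] impostor, double[] candidates) {
        double best = candidates[0];
        double bestGap = Double.MAX_VALUE;
        for (double t : candidates) {
            double gap = Math.abs(far(impostor, t) - frr(genuine, t));
            if (gap < bestGap) {
                bestGap = gap;
                best = t;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] genuine = {0.10, 0.20, 0.30, 0.60};   // made-up scores of the real user
        double[] impostor = {0.40, 0.70, 0.80, 0.90};  // made-up scores of intruders
        double[] candidates = {0.25, 0.50, 0.75};
        double t = crossoverThreshold(genuine, impostor, candidates);
        System.out.printf(Locale.ROOT, "threshold=%.2f FAR=%.2f FRR=%.2f%n",
                t, far(impostor, t), frr(genuine, t));
    }
}
```

With these made-up scores the program prints `threshold=0.50 FAR=0.25 FRR=0.25`: lowering the threshold to 0.25 drives FAR to zero but raises FRR to 0.50, which is the trade-off between the two error types described above.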

2.2.3 Speed

When considering the probability that your users will accept the use of biometrics, the speed at which a sensor and its controlling software accept or reject authentication attempts is the most important factor. The effective throughput, or how many users a biometric sensor can process in a given period, is a function of the entire authentication process. Figure 2.2 depicts the several stages involved [5]. Acceptable throughput is typically five seconds per person or six to ten people per minute. User frustration begins to set in at lower throughput rates.

Figure 2.2. Biometric authentication process low impact. Source: Tom Olzak. Keystroke Dynamics: Low Impact Biometric Verification, 2006. www.infosecwriters.com/text_resources/pdf/Keystroke_TOlzak.pdf, accessed Dec. 2010.

2.2.4 Resistance to Counterfeiting

Some biometric solutions might be susceptible to counterfeiting. For example, some early systems allowed an intruder to use lifted finger or hand prints to gain entry. Today’s systems are, in general, more sophisticated; they use the entire geometry of a finger or hand instead of just the line patterns that make up prints. Make sure to ask the right questions if you consider using a biometric access control system. When possible, request a demonstration of the system’s resistance to counterfeiting.

2.2.5 Reliability

Sensors must continue to operate at a low CER between failures. A gradual degradation in throughput affects user acceptability and organizational productivity.

2.2.6 Data Storage Requirements

The amount of storage necessary to support a biometric system depends on what data is actually stored. Voice recognition systems might use a great deal of storage, since voice files are usually large. Current finger architecture recognition technology, however, simply stores a relatively small hash value created when a user is enrolled.
Whenever a sensor scans the finger again, it recomputes the hash value and compares it to the stored value. Whatever biometric solution you choose, make sure you understand the impact on your storage environment.

2.2.7 Enrollment Time

Another factor influencing user acceptance is the time required to enroll a new user into the biometric system. An acceptable enrollment duration is usually two minutes or less per person. This enrollment rate not only reduces employee frustration but also helps reduce the administrative costs associated with system management.

2.2.8 Perceived Intrusiveness

Second only to throughput, the amount of personal intrusiveness a sensor presents to your employees is a major determinant of user acceptance. The following is a list of common fears that grow out of biometric implementations:
• Fear that the company stores unique personal information
• Fear that the company is collecting personal health information for insurance purposes (retinal scans look at patterns that are also used to determine certain health conditions)
• Fear that the red light in retinal scanning sensors is physically harmful
• Fear of contracting diseases through contact with publicly used sensors

One way to deal with these issues is to hold open and honest discussions about how the systems work, the health risks involved, and how the organization plans to use the information. Remember, user acceptance doesn’t depend on how you perceive biometric authentication. Rather, it depends on how your employees perceive it. Another way to address the issues surrounding intrusiveness is to deploy a solution that is not only non-intrusive but also adds no additional effort to authentication or authorization activities.

2.3 MOBILE PHONE SECURITY

Nowadays, mobile devices such as cellular phones and Personal Digital Assistants (PDAs) are widespread, with over three billion users.
Most mobile devices are operated through a touch display, which is widely used because the touch-screen interface is user-friendly. Currently, mobile devices are used not only to make or receive calls, take photos, and play video games, but also to assist in business, for example by providing internet access, direct access to e-mail and corporate data, money transfers, and bank account management. As a consequence, the authentication of users on mobile devices has become an important issue.

Authentication on mobile devices can be classified into three fundamental approaches. The first approach uses a PIN or a password, which is a secret-knowledge-based technique. This technique offers a standard level of protection and provides cheap and quick authentication. Unfortunately, it is not enough to safeguard the mobile device and the data accessed through it, because passwords are never completely protected by their owners; sharing passwords with friends or reusing them on other systems are unavoidable problems. Moreover, a survey [6] has shown that most users find using a PIN very inconvenient and lack confidence in the protection that the PIN facility provides. The second approach is the token-based technique, or Subscriber Identification Module (SIM). In this approach, when users do not want to use the mobile, the mobile’s SIM must be removed. However, removing the SIM is not recommended because it is inconvenient. The last approach applies a biometric technique. This technique is based on a unique characteristic of a person and improves on the current authentication methods.

2.4 KEYSTROKE DYNAMICS

Whatever kind of mobile phone people use, they cannot avoid interacting with it through keystrokes. However, each person may press the keys in a different style, because typing style is based on the user’s experience and individual skill, which is difficult to imitate [7].
Keystroke dynamics is based on the assumption that different people have unique habitual rhythm patterns in the way they type. The first study was done in 1980 by Gaines [8], who showed that keystroke timing is a feasible authentication measure. Research on user authentication using keystroke dynamics is still ongoing, and the number of studies is increasing. The assessment of keystroke dynamics is based on either traditional statistical analysis or the relatively newer pattern recognition techniques.

Password-based authentication systems have remained prevalent for the past three decades. The continued dominance of passwords is due to a lack of suitable security alternatives and/or extensions. Keystroke dynamics is a behavioral biometric, and it provides an answer to the authentication and security problem. The principle behind keystroke dynamics is to extract and analyze the way an individual types, as opposed to only what the individual types. This technology is cheaper than fingerprint or retinal scan technology, which requires expensive extra hardware for data collection. Keystroke dynamics does not require any extra hardware, and the software that collects keystroke data is easily reusable, whereas hardware is not. Researchers are therefore trying to find a suitable mechanism for its commercial use.

There are four steps involved in keystroke dynamics. First, a user registers or enrolls his or her timing vector patterns. Second, a feature subset is selected. Third, a classifier is built using the timing patterns. Fourth, whenever a new timing vector pattern is presented, it is either accepted or rejected based on the classification made by the classifier. A number of different keystroke characteristics can be used for identification, such as the interval between keystrokes, the duration of keystrokes, the frequency of errors, the pressure of keystrokes, the rate of typing, and statistics of the text.
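
The timing characteristics listed above, such as keystroke duration and the interval between keystrokes, can be derived from raw press and release timestamps. The following Java sketch is illustrative only and is not the thesis implementation. It assumes timestamps in milliseconds, a single scalar feature per typing sample, a population standard deviation, and a simple "within k standard deviations of the enrolled mean" acceptance rule standing in for the classifier. All class, method, and parameter names are hypothetical.

```java
// Illustrative sketch only: names, data, and the acceptance rule are assumptions.
class KeystrokeProfileSketch {
    /** Duration of each key press: release time minus press time (milliseconds). */
    static long[] holdTimes(long[] down, long[] up) {
        long[] hold = new long[down.length];
        for (int i = 0; i < down.length; i++) {
            hold[i] = up[i] - down[i];
        }
        return hold;
    }

    /** Interval between successive keys: next press time minus previous release time. */
    static long[] latencies(long[] down, long[] up) {
        long[] lat = new long[down.length - 1];
        for (int i = 0; i + 1 < down.length; i++) {
            lat[i] = down[i + 1] - up[i];
        }
        return lat;
    }

    /** Arithmetic mean of the enrolled feature values. */
    static double mean(double[] v) {
        double sum = 0;
        for (double x : v) sum += x;
        return sum / v.length;
    }

    /** Population standard deviation of the enrolled feature values. */
    static double stdDev(double[] v) {
        double m = mean(v);
        double sq = 0;
        for (double x : v) sq += (x - m) * (x - m);
        return Math.sqrt(sq / v.length);
    }

    /** Accepts a new sample when it lies within k standard deviations of the enrolled mean. */
    static boolean accept(double sample, double mean, double std, double k) {
        return Math.abs(sample - mean) <= k * std;
    }

    public static void main(String[] args) {
        long[] down = {0, 150, 320};  // made-up key press times
        long[] up = {90, 230, 400};   // made-up key release times
        System.out.println(java.util.Arrays.toString(holdTimes(down, up)));
        System.out.println(java.util.Arrays.toString(latencies(down, up)));

        // Enrollment: one feature value per typing sample (here, a made-up total typing time).
        double[] enrolled = {2000, 2100, 1900, 2000};
        double m = mean(enrolled);
        double s = stdDev(enrolled);
        System.out.println(accept(2050, m, s, 2.0));
        System.out.println(accept(2600, m, s, 2.0));
    }
}
```

The multiplier `k` plays the role of a threshold: a larger value tolerates more variation in the genuine user's typing but also admits more impostors, so in practice it would be tuned against measured error rates.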
To capture keystroke dynamics, users need to type their own password a number of times during enrollment. In our experiment, the duration of each key press, the keystroke latency between two successive keys, and the digraph time (the time between one key press and the next key press) are measured in real time. The mean and standard deviation of these measurements are computed, and the user profile is created.

2.5 STATISTICAL TERMINOLOGIES

Statistics is a set of concepts, rules, and procedures that help us organize numerical information, understand the statistical techniques underlying decisions that affect our daily lives, and make informed decisions. The following are definitions of some statistical terms used in the project.

2.5.1 Measure of Central Tendency

A distribution is created when data values are arranged in order from highest to lowest. It is typically represented graphically as a frequency distribution, plotting the value of the variable against the frequency of that value’s occurrence. It was shown long ago that if the number of data points is very large, the distribution often approximates a normal or Gaussian distribution (the bell-shaped curve). Researchers can learn a lot from these plots, including the shape of the distribution, the range of values, and the most common value. Measures of central tendency, that is, the values that the distribution seems to cluster around, are some of the most commonly calculated statistics.

2.5.2 Mean

The mean is one of the most used statistics in all kinds of research. It is the arithmetic average of the values, calculated as the sum of all values divided by the number of values. Although the mean provides a simple summary of a distribution, it doesn’t indicate anything about the range of values. The values may be clustered tightly around the mean or may be so spread out that a peak is hard to identify. Another shortcoming of the mean is that it is sensitive to extreme values.
A single outlying data point can skew the distribution so that the mean no longer represents the peak of the curve. Because of this, other descriptors of central tendency may actually be more useful.

2.5.3 Mode

The mode is the most frequently occurring value. It is where the distribution displays a peak (or peaks, in the case of a multi-modal distribution). It is the only descriptor of central tendency possible with nominal data. (Nominal data classifies items into mutually exclusive groups that can only be compared as equal or not equal, for example, “male” or “female.”)

2.5.4 Median

The median is the midpoint of the distribution, with half of the values on either side of it. Another way to say this is that the median represents the 50th percentile. The median can tell us about the shape of the distribution. In a normal distribution, the mean, median, and mode are the same. If the distribution is skewed, the median typically lies between the mode and the mean, because the mean is pulled further toward the extreme values. The median is usually the best measure of central tendency in a skewed distribution, for example the salaries in the NBA, where a few people earn much, much more than the others [9].

2.5.5 Standard Deviation

The standard deviation is a widely used measure of variability or diversity in statistics and probability theory. It shows how much variation or “dispersion” there is from the “average” (mean, or expected/budgeted value). A low standard deviation indicates that the data points tend to be very close to the mean, whereas a high standard deviation indicates that the data is spread out over a large range of values.

2.5.6 Range

The range of a function f is {y | there exists an x in the domain of f such that y = f(x)}. In this case, the codomain of f must be specified, but it is often assumed to be the set of all real numbers. In descriptive statistics, the range of a data set is the difference between its largest and smallest values.

CHAPTER 3

TECHNOLOGIES

This chapter briefly discusses the technologies used to develop the software.
The chapter starts with the requirements of the project, followed by a description of the language used for building it. The chapter concludes with the tools and database used.

3.1 REQUIREMENTS
The prime objective of building keystroke dynamics is to analyze its accuracy on mobile phones. If the technique performs well, it could serve as a login authentication mechanism for mobile phones and other portable computers. The requirements gathered are listed below.
• The main objective of the thesis is to check whether this technique can be implemented on mobile phones in the future. Hence, the platform used should be current and likely to remain relevant. Android OS with firmware 1.6 or above has been chosen for the development.
• The language used on Android OS is Java. Hence, the Java SDK is also required to develop software on Android OS.
• The database used on mobile devices is primarily SQLite. Unlike most other SQL databases, SQLite does not have a separate server process. SQLite reads and writes directly to ordinary disk files.
• Mobile devices running Android OS typically come with a touchpad. Thus the touch keyboard should be used for giving input to the software.

3.2 JAVA
The KeyStrokeDynamics software is built using Java Software Development Kit 6 on Android OS. The integrated development environment used for writing the Java classes is Eclipse Galileo. The reasons for choosing Java over other programming languages include the following.
• The Android userspace is largely dominated by Java technologies that run on top of Google’s custom Dalvik virtual machine.
• Java is simple, easy to implement and object oriented.
• Java offers good performance together with a very large set of application programming interfaces (APIs).
• Java is robust and secure.
• Java supports multithreaded programming, which can make program execution faster, and it is dynamic.
• Java is platform independent, architecturally neutral and interpreted.
• Java has an excellent set of graphical user interface APIs in the form of the Abstract Window Toolkit (AWT) as well as Java Swing.
These points are elaborated below.

An object oriented programming language is one which lets you create objects. An object is an entity that derives its attributes and functions from the class to which it belongs. An object oriented model is a collection of interacting objects, which is different from conventional procedural programming. Java is object oriented because it focuses on creating objects and making them work together. The process of creating an object is known as instantiation. Java features all of the object oriented concepts mentioned below.
• Polymorphism: A single method name can produce different results when it is invoked with different sets of arguments.
• Inheritance: Classes are arranged hierarchically, and a child class can access the methods and attributes of its parent class.
• Data encapsulation: The attributes, variables and methods of a particular class are given an access level, such as public, private or protected, based on their role in the programming model.
Features like these make programs loosely coupled, so that complexity is reduced and the programming models become highly independent and modular.

3.2.1 Simplicity
The Java compiler automatically translates Java classes into byte-code that the virtual machine can execute. The most important feature that makes Java simple is its automatic memory management. Java uses automatic garbage collection to release the memory of objects that are no longer in use, unlike C++, where the programmer is responsible for freeing the memory associated with a deleted object.

3.2.2 Robustness
A robust programming language is very stable, secure and does not fall prey to third party trapdoors. Hence it is very reliable. Java is robust because it is a widely supported language that was intended for use in networked environments.
No programming language can assure foolproof reliability, but Java has comparatively few security holes. For example, a faulty Java program is far less likely to crash the computer than a faulty C program. Java is dynamic in the sense that it puts a lot of emphasis on runtime error checking and on eliminating situations that are error prone.

3.2.3 Multithreading
A multithreaded program divides a process into several threads. A thread is the smallest unit of program execution. These individual threads run concurrently, which can increase the execution speed of the program. Java has a separate API dedicated to multithreaded programming that is smoothly integrated into the language, unlike C++, where operating system specific procedures have to be called in order to enable multithreading.

3.3 ANDROID
Android is a software stack for mobile devices that includes an operating system, middleware and key applications. The Android SDK provides the tools and APIs necessary to begin developing applications on the Android platform using the Java programming language. Android was built from the ground up to enable developers to create compelling mobile applications that take full advantage of all a handset has to offer. It was built to be truly open. For example, an application can call upon any of the phone’s core functionality such as making calls, sending text messages, or using the camera, allowing developers to create richer and more cohesive experiences for users. Android is built on the open Linux kernel. Furthermore, it utilizes a custom virtual machine that was designed to optimize memory and hardware resources in a mobile environment. Android is open source; it can be liberally extended to incorporate new cutting edge technologies as they emerge. The platform will continue to evolve as the developer community works together to build innovative mobile applications [10].
Following are the features of Android:
• Application framework enabling reuse and replacement of components
• Dalvik virtual machine optimized for mobile devices
• Integrated browser based on the open source WebKit engine
• Optimized graphics powered by a custom 2D graphics library; 3D graphics based on the OpenGL ES 1.0 specification (hardware acceleration optional)
• SQLite for structured data storage
• Media support for common audio, video, and still image formats (MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, GIF)
• GSM Telephony (hardware dependent)
• Bluetooth, EDGE, 3G, and WiFi (hardware dependent)
• Camera, GPS, compass, and accelerometer (hardware dependent)
• Rich development environment including a device emulator, tools for debugging, memory and performance profiling, and a plugin for the Eclipse IDE

3.3.1 Android Architecture
Figure 3.1 shows the major components of the Android operating system. Each section is described in more detail below.

Figure 3.1. Android architecture.

3.3.1.1 APPLICATIONS
Android is shipped with a set of core applications including an email client, SMS program, calendar, maps, browser, contacts, and others. All applications are written using the Java programming language.

3.3.1.2 APPLICATION FRAMEWORK
By providing an open development platform, Android offers developers the ability to build extremely rich and innovative applications. Developers are free to take advantage of the device hardware, access location information, run background services, set alarms, add notifications to the status bar, and much, much more. Developers have full access to the same framework APIs used by the core applications. The application architecture is designed to simplify the reuse of components; any application can publish its capabilities and any other application may then make use of those capabilities (subject to security constraints enforced by the framework). This same mechanism allows components to be replaced by the user.
Underlying all applications is a set of services and systems, including:
• A rich and extensible set of Views that can be used to build an application, including lists, grids, text boxes, buttons, and even an embeddable web browser.
• Content Providers that enable applications to access data from other applications (such as Contacts), or to share their own data.
• A Resource Manager, providing access to non-code resources such as localized strings, graphics, and layout files.
• A Notification Manager that enables all applications to display custom alerts in the status bar.
• An Activity Manager that manages the lifecycle of applications and provides a common navigation backstack.

3.3.1.3 LIBRARIES
Android includes a set of C/C++ libraries used by various components of the Android system. These capabilities are exposed to developers through the Android application framework. Some of the core libraries are listed below:
• System C library - a BSD-derived implementation of the standard C system library (libc), tuned for embedded Linux-based devices
• Media Libraries - based on PacketVideo’s OpenCORE; the libraries support playback and recording of many popular audio and video formats, as well as static image files, including MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG
• Surface Manager - manages access to the display subsystem and seamlessly composites 2D and 3D graphic layers from multiple applications
• LibWebCore - a modern web browser engine which powers both the Android browser and an embeddable web view
• SGL - the underlying 2D graphics engine
• 3D libraries - an implementation based on OpenGL ES 1.0 APIs; the libraries use either hardware 3D acceleration (where available) or the included, highly optimized 3D software rasterizer
• FreeType - bitmap and vector font rendering
• SQLite - a powerful and lightweight relational database engine available to all applications.
3.3.1.4 ANDROID RUNTIME
Android includes a set of core libraries that provides most of the functionality available in the core libraries of the Java programming language. Every Android application runs in its own process, with its own instance of the Dalvik virtual machine. Dalvik has been written so that a device can run multiple VMs efficiently. The Dalvik VM executes files in the Dalvik Executable (.dex) format, which is optimized for minimal memory footprint. The VM is register-based, and runs classes compiled by a Java language compiler that have been transformed into the .dex format by the included “dx” tool. The Dalvik VM relies on the Linux kernel for underlying functionality such as threading and low-level memory management.

3.3.1.5 LINUX KERNEL
Android relies on Linux version 2.6 for core system services such as security, memory management, process management, network stack, and driver model. The kernel also acts as an abstraction layer between the hardware and the rest of the software stack [10].

3.3.1.6 THE MANIFEST FILE
Before Android can start an application component, it must learn that the component exists. Therefore, applications declare their components in a manifest file that is bundled into the Android package, the .apk file that also holds the application’s code, files, and resources. The manifest is a structured XML file and is always named AndroidManifest.xml for all applications. It does a number of things in addition to declaring the application’s components, such as naming any libraries the application needs to be linked against (besides the default Android library) and identifying any permissions the application expects to be granted. But the principal task of the manifest is to inform Android about the application’s components [11].

3.3.2 Development in Eclipse with ADT
The Android Development Tools (ADT) plugin for Eclipse adds powerful extensions to the Eclipse integrated development environment.
It allows you to create and debug Android applications more easily and quickly. If you use Eclipse, the ADT plugin gives you a considerable boost in developing Android applications:
• It gives you access to other Android development tools from inside the Eclipse IDE. For example, ADT lets you access the many capabilities of the DDMS tool: take screenshots, manage port-forwarding, set breakpoints, and view thread and process information directly from Eclipse.
• It provides a New Project Wizard, which helps you quickly create and set up all of the basic files you’ll need for a new Android application.
• It automates and simplifies the process of building your Android application.
• It provides an Android code editor that helps you write valid XML for your Android manifest and resource files.
• It will even export your project into a signed APK, which can be distributed to users.

3.4 SQLITE DATABASE
SQLite is an in-process library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. The code for SQLite is in the public domain and is thus free for use for any purpose, commercial or private. SQLite is currently found in more applications than we can count, including several high-profile projects.

SQLite is an embedded SQL database engine. Unlike most other SQL databases, SQLite does not have a separate server process. SQLite reads and writes directly to ordinary disk files. A complete SQL database with multiple tables, indices, triggers, and views is contained in a single disk file. The database file format is cross-platform: you can freely copy a database between 32-bit and 64-bit systems or between big-endian and little-endian architectures. These features make SQLite a popular choice as an application file format. Think of SQLite not as a replacement for Oracle but as a replacement for fopen().

SQLite is a compact library.
With all features enabled, the library size can be less than 300KiB, depending on compiler optimization settings. (Some compiler optimizations such as aggressive function inlining and loop unrolling can cause the object code to be much larger.) If optional features are omitted, the size of the SQLite library can be reduced below 180KiB. SQLite can also be made to run in minimal stack space (4KiB) and very little heap (100KiB), making SQLite a popular database engine choice on memory constrained gadgets such as cellphones, PDAs, and MP3 players. There is a tradeoff between memory usage and speed. SQLite generally runs faster the more memory you give it. Nevertheless, performance is usually quite good even in low-memory environments.

SQLite is very carefully tested prior to every release and has a reputation for being very reliable. Most of the SQLite source code is devoted purely to testing and verification. An automated test suite runs millions and millions of test cases involving hundreds of millions of individual SQL statements and achieves 100% branch test coverage. SQLite responds gracefully to memory allocation failures and disk I/O errors. Transactions are ACID even if interrupted by system crashes or power failures. All of this is verified by the automated tests using special test harnesses which simulate system failures. Of course, even with all this testing, there are still bugs. But unlike some similar projects (especially commercial competitors) SQLite is open and honest about all bugs and provides bug lists, including lists of critical bugs and minute-by-minute chronologies of bug reports and code changes.

Following are some of the features of SQLite.
• Transactions are atomic, consistent, isolated, and durable (ACID) even after system crashes and power failures.
• Zero-configuration - no setup or administration needed.
• Implements most of SQL92.
• A complete database is stored in a single cross-platform disk file.
• Supports terabyte-sized databases and gigabyte-sized strings and blobs.
• Small code footprint: less than 325KiB fully configured or less than 190KiB with optional features omitted.
• Faster than popular client/server database engines for most common operations.
• Simple, easy to use API.
• Written in ANSI-C. TCL bindings included. Bindings for dozens of other languages available separately.
• Well-commented source code with 100% branch test coverage.
• Available as a single ANSI-C source-code file that you can easily drop into another project.
• Self-contained: no external dependencies.
• Cross-platform: Unix (Linux and Mac OS X), OS/2, and Windows (Win32 and WinCE) are supported out of the box. Easy to port to other systems.
• Sources are in the public domain. Use for any purpose.
• Comes with a standalone command-line interface (CLI) client that can be used to administer SQLite databases [11].

CHAPTER 4
DISCUSSION OF ALGORITHM

4.1 ACTIVITIES AND CONTROL FLOW
When a user starts the application, a login activity is launched where a registered user submits his 10-digit numeric password and an unregistered user can register by clicking the New User button. See Figure 4.1.

Figure 4.1. Activity for logging in the user.

After the user enters the password and clicks the Login button, an error message is displayed if the password is not found in the database. If it is found but the sample count is less than 15, the CollectSample activity is displayed, which collects different types of time intervals while samples are entered and stores them in the database. A counter for the remaining samples is displayed on the activity and is decremented after each successful input from the user until the last sample is entered. See Figure 4.2.

Figure 4.2. Activity for collecting samples.

If the user login is not found or the user clicks the New User button, the registration activity is displayed, where the user is asked to enter his first name, last name and a 10-digit number. See Figure 4.3.
Figure 4.3. Registration activity for new users.

While the user is typing on the touchpad to submit a sample, factors such as dwell time (the time interval between the press and the release of a key), flight time (the time interval between the release of one key and the press of the next key), total time and the number of times the delete key is used are calculated. Upon clicking the submit button, the values are stored in their respective tables. Every user is identified by his unique 10-digit number and a user id, but every sample entered by a single user is identified by its timestamp. Hence, the values calculated while typing the unique number are stored in the database along with the user id, the 10-digit number, the calculated time interval and the timestamp.

4.2 ALGORITHM
The algorithm for comparing the calculated time intervals is based on the frequency distribution of the values. The database structure for the application is described in the next chapter; for now, a table named total_time, which holds the records of the total time taken by the user to enter his unique number, is used for explanation. The maximum and minimum values of the total time are fetched from the table, and their difference is divided by the number of intervals; the result is called the size of the interval. The number of intervals for each table is inversely related to the difference between the maximum and minimum values, i.e. the higher the difference, the fewer the intervals, and this is managed by looking at the values in the database. This is because a higher difference implies that the values of the factor for each key pressed tend to change by a bigger margin from the previously calculated value. After the values have been segregated into the different ranges, the frequency distribution is calculated. Each entry in the resulting table contains the frequency or count of the occurrences of values within a particular group or interval, and in this way the table summarizes the distribution of values in the sample.
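As a worked illustration of the interval size and frequency distribution just described, the following is a minimal sketch rather than the thesis implementation. The class name FrequencySketch, its method names and the sample values are hypothetical, and it assumes that each interval is open on the left and closed on the right, with the minimum value counted in the first interval.

```java
import java.util.Arrays;

// Hypothetical helper, not the thesis code: bins samples into equal-width intervals.
final class FrequencySketch {

    /** Width of each interval: (max - min) / intervalCount. */
    static double intervalSize(long min, long max, int intervalCount) {
        return (double) (max - min) / intervalCount;
    }

    /** Counts how many samples fall into each of intervalCount equal-width intervals. */
    static int[] frequencies(long[] samples, int intervalCount) {
        long min = Arrays.stream(samples).min().getAsLong();
        long max = Arrays.stream(samples).max().getAsLong();
        double size = intervalSize(min, max, intervalCount);
        int[] counts = new int[intervalCount];
        for (long s : samples) {
            // Interval i covers (min + i*size, min + (i+1)*size];
            // the minimum itself is assigned to interval 0 by the clamp below.
            int i = size == 0 ? 0 : (int) Math.ceil((s - min) / size) - 1;
            counts[Math.max(0, Math.min(intervalCount - 1, i))]++;
        }
        return counts;
    }
}
```

For example, with the made-up total times 2100, 2250, 2320, 2350, 2400, 2450, 2900 and 3100 ms and five intervals, the interval size is (3100 - 2100) / 5 = 200 ms and the frequencies are 2, 4, 0, 1 and 1, so the second interval, from 2300 ms to 2500 ms, holds the most samples.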
The interval with the highest frequency is the desirable range of the total time. The next time the user enters the unique code, his total time is calculated and checked to see whether it lies in the selected interval. A similar process is applied to all the tables in the database. If the calculated time intervals fall within their desirable ranges, the user is said to be authenticated. The formula for the length of the interval can be expressed as

    intervalSize = (max - min) / interval

where max is the maximum and min is the minimum value of the factor under consideration, and interval is the number of intervals over which the values are distributed (intervalCount in the pseudo code). The pseudo code is:

    int[] rangeArray = new int[9];   // frequency count per interval (up to nine intervals)

    // Called once for every stored sample; increments the count of the
    // interval that contains columnValue.
    void calculateRange(long max, long min, long columnValue, int intervalCount) {
        double rangeSize = (double) (max - min) / intervalCount;
        // Interval i covers (min + i*rangeSize, min + (i+1)*rangeSize].
        int i = (int) Math.ceil((columnValue - min) / rangeSize) - 1;
        i = Math.max(0, Math.min(intervalCount - 1, i));   // min and max fall in the end intervals
        rangeArray[i]++;
    }

    // Returns true when enteredValue lies in an interval with the highest frequency.
    boolean getInterval(long max, long min, long enteredValue, int intervalCount) {
        double rangeSize = (double) (max - min) / intervalCount;
        int arrayMax = rangeArray[0];
        for (int i = 1; i < intervalCount; i++) {
            if (rangeArray[i] > arrayMax) {
                arrayMax = rangeArray[i];
            }
        }
        for (int i = 0; i < intervalCount; i++) {
            if (rangeArray[i] == arrayMax) {
                double interval1 = min + i * rangeSize;
                double interval2 = min + (i + 1) * rangeSize;
                if (enteredValue >= interval1 && enteredValue <= interval2) {
                    return true;
                }
            }
        }
        return false;
    }

Figure 4.4 shows the table login_name, which is used as an example for the description. As the figure shows, twenty-one samples of the user with user_id 1 are present. The total time in milliseconds taken by the user to enter his 10-digit unique code is stored in the total_time column.

Figure 4.4. Data values collected for total login time.

CHAPTER 5
DATABASE SCHEMA
The database structure of keystroke dynamics is fairly simple and optimized. The database used for storing the values is SQLite.
Below are descriptions of all the tables:
• Registration: Before submitting samples, a user is asked to register. The table stores the personal details of each user, such as the first name, last name and phone number entered at the time of registration.
  Columns: _id, Fname, Lname, Phone
• Login_details: This table stores the login details of each user. Each time the user logs in, an entry is made in this table with the timestamp. The entries in this table are not deleted if the user login is unsuccessful. Therefore, the same user can have more than one entry in the table, since the timestamps are different.
  Columns: _id, Userid, Phone, Timestamp
• Login_time: The total time required for the user to log in is stored in this table. If the user login is unsuccessful, the entry is deleted from this table. Thus, only the successful login attempts of the user are maintained.
  Columns: _id, Userid, Total_time, Timestamp
• Dwell_details: The time elapsed between the key press event and the key release event of each button is stored in this table, with the timestamp and the value of the button typed.
  Columns: _id, Userid, Key_typed, Dwell_time, Timestamp
• Flight_details: This table stores the flight details of the keys. The time elapsed between the release of the previous key and the press of the next key is stored along with the key sequence.
  Columns: _id, Userid, Key_num, Flight_time, Timestamp
• Row_switch_details: This table stores the flight time along with the row switch details. Each time the row changes between key presses (for example, a key in the first row is pressed, followed by a key in the third row), an entry is made consisting of the row switch count and the flight time, also known as the row switch time. The row switch count is pre-determined: a change from the first to the second row gets a ‘1’, from the first to the third row gets a ‘2’, and so on.
  Columns: _id, Userid, Row_switch_count, Row_Switch_time, Timestamp
• Delete_frequency: This table stores the count of the button click events of the delete key in each sample or successful login.
  Columns: _id, Userid, Key, Timestamp

CHAPTER 6
CONCLUSION
In this thesis project, the emphasis is on the importance of keystroke dynamics for mobile devices. The implementation of keystroke dynamics on mobile devices is cost effective and compatible, as integration of external hardware is not required. The conclusion of this thesis is based on comparing the stored data of a user with the login input for authentication.

6.1 TESTING
The test results reflect the practical aspects that were tested and observed. Keystroke dynamics is a two-factor biometric security technique; hence, for a successful login, firstly the password should be known and secondly the typing rhythm should match. For testing, the first factor was removed: the password was known to the tester so that the false acceptance rate (FAR) could be calculated. After the samples were submitted, authentication was tested on the mobile application. The login information was submitted by the same method that was used while submitting samples, i.e. using the touchpad of the mobile device and entering the ten-digit password. Table 6.1 stores the testing results for eight threshold levels.

Table 6.1. FAR & FRR Values at Different Thresholds
Threshold   1     2     3     4     5     6     7     8
FAR         1     .89   .49   .26   .12   .08   .04   .02
FRR         0     .11   .32   .46   .65   .76   .84   .87

Based on these FRR and FAR values, the ROC curve is plotted and the intersection point of the FRR and FAR curves is calculated; this point is called the equal error rate (EER) of the biometric security system. From Figure 6.1, it can be seen that FRR intersects FAR near 0.38 on the y-axis. This shows that the EER of keystroke dynamics is high. A biometric system is considered accurate if its EER is very low. The above result is due to the limitation of using only a 10-digit numeric password. Alphanumeric passwords may give more accurate results, as the keypad for alphabets is larger than the numeric keypad and more keys are used for typing the password.
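The crossing point quoted above can be checked numerically from the values in Table 6.1. The sketch below is not part of the thesis application; the class name EerSketch and its method are hypothetical, and it assumes that the FAR and FRR curves are straight lines between adjacent thresholds.

```java
// Hypothetical helper: estimates the equal error rate (EER) by linear interpolation.
final class EerSketch {

    /** Returns the rate at which the FAR and FRR curves cross, or NaN if they never cross. */
    static double equalErrorRate(double[] far, double[] frr) {
        for (int i = 0; i + 1 < far.length; i++) {
            double d0 = far[i] - frr[i];          // gap between the curves at threshold i
            double d1 = far[i + 1] - frr[i + 1];  // gap at the next threshold
            if (d0 == 0) {
                return far[i];                    // curves meet exactly at a threshold
            }
            if (d0 * d1 < 0) {
                // Sign change: the curves cross a fraction t of the way to the next threshold.
                double t = d0 / (d0 - d1);
                return far[i] + t * (far[i + 1] - far[i]);
            }
        }
        int last = far.length - 1;
        return far[last] == frr[last] ? far[last] : Double.NaN;
    }
}
```

With the FAR and FRR rows of Table 6.1, the sign change occurs between thresholds 3 and 4, and the interpolated value is approximately 0.384, which agrees with the reading of about 0.38 taken from the figure.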
Figure 6.1. ROC curve (FAR and FRR plotted against threshold).

6.2 LIMITATIONS
The behavior of keystroke dynamics on mobile devices differs from that on desktops in terms of both data collection and verification. Firstly, keystrokes on touchpad keyboards cannot be read as responsively as on the QWERTY keyboard of a desktop, which increases the variation in sample values. Secondly, keys that are in frequent use on large keyboards for typing special characters, such as Shift and Caps Lock, are not present on touchpad keyboards. Thirdly, keys on touchpads are smaller than those on a normal keyboard, which increases the chance of typing errors.

Implementing keystroke dynamics on Android devices restricted us to the Android APIs. Keystroke events for alphabetic and special characters are not detected by the Android API. Hence, the alternative was to use the 10-digit numeric identification number. This may be resolved in a future release of the Android firmware. Thus, the use of keystroke dynamics on mobile devices is not viable until the Android API provides key up and key down events for the alphanumeric keypad on touch screens.

6.3 FUTURE ENHANCEMENTS
The existing application was built on the Android APIs using the Java language. Hence, this application is only supported on Android devices. A recent Adobe technology allows applications to be created using Adobe AIR, which is supported on many other smartphones such as BlackBerry devices and iPhones. The Adobe Flash Platform already enables developers to deliver consistent application experiences across multiple browsers and operating systems. With the introduction of Adobe Flex® SDK “Hero” and Adobe Flash Builder™ “Burrito,” along with the availability of the Adobe AIR® runtime on mobile devices, developers can now build mobile Flex applications for touchscreen smartphones and tablets with the same ease and quality as on desktop platforms.
At present, the latency of the touchpad on Android phones is high, because of which garbage time values sometimes get stored in the database. If such a value is very large, it can increase the interval size considerably. Hence it is necessary to ensure that there are no garbage values in the database. At the hardware level, improvement in touchpad responsiveness can be expected from mobile phone companies in the near future, so that keystrokes can be detected quickly. Moreover, an update to the Android API is required to support keystroke event detection on the alphanumeric keyboard.

BIBLIOGRAPHY
[1] WE Excel Software Pvt. Ltd. Working of Biometrics Technology, 2011. http://www.weexcel.in/NewsDesc1.aspx?cod=86, accessed Sept. 2010.
[2] Webopedia.com. False Acceptance, 2011. http://www.webopedia.com/TERM/F/false_acceptance.html, accessed Jan. 2010.
[3] Webopedia.com. False Rejection, 2011. http://www.webopedia.com/TERM/F/false_rejection.html, accessed Jan. 2010.
[4] John C. Checco. Keystroke Dynamics and Corporate Security, 2003. http://www.checco.com/about/john.checco/publications/2003_Keystroke_Biometrics_Intro.pdf, accessed Dec. 2010.
[5] Tom Olzak. Keystroke Dynamics: Low Impact Biometric Verification, 2006. www.infosecwriters.com/text_resources/pdf/Keystroke_TOlzak.pdf, accessed Dec. 2010.
[6] N. L. Clarke and S. M. Furnell. Authentication of users on mobile telephones – A survey of attitudes and practices. Computers & Security, 24:519-527, 2005.
[7] Hataichanok Saevanee and Pattarasinee Bhatarakosol. User authentication using combination of behavioral biometrics over the touchpad acting like touch screen of mobile device. Proceedings of the International Conference on Computer and Electrical Engineering, Phuket, Thailand, 2008.
[8] R. Gaines, W. Lisowski, S. Press and N. Shapiro. Authentication by keystroke timing: some preliminary results (Rand Report R-2560-NSF). Santa Monica, CA: Rand Corporation, 1980.
[9] Weber State University. Statistics Background, n.d. http://faculty.weber.edu/wlorowitz/3053/Statistics%20Background.pdf, accessed Dec. 2010.
[10] Android Developers. What is Android?, 2010. http://developer.android.com/guide/basics/what-is-android.html, accessed Dec. 2010.
[11] SQLite.org. About SQLite, 2010. http://www.sqlite.org/about.html, accessed Dec. 2010.