Monday, August 19, 2013

Week 15


This week I continued working on the Android randomness collection.

Networking:
First, I wrote a simple server program using Java sockets that accepts and prints any incoming communication. I then wrote a corresponding client program and ran both on the same machine. Once that worked, I verified that the pair also worked on separate machines. Finally, I modified the client program for Android and ran it on the Nexus S. This mainly involved moving the network communication into its own thread, since Android does not allow networking on the UI thread. With that change, I had working communication between the Android phone and a server.
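
A minimal version of this client/server exchange might look like the following (class name, port handling, and message are illustrative, not the actual project code):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch of the client/server pair described above. The server thread
// accepts one connection and reads a line; the client connects and sends it.
public class EchoSketch {

    // Runs one server/client exchange on a free local port and returns
    // the line the server received.
    public static String runOnce() throws Exception {
        final String[] received = new String[1];
        final ServerSocket server = new ServerSocket(0); // pick any free port

        Thread serverThread = new Thread(() -> {
            try (Socket client = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream()))) {
                received[0] = in.readLine();
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        serverThread.start();

        // Client side: on Android this part must run off the UI thread.
        try (Socket socket = new Socket("localhost", server.getLocalPort());
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
            out.println("hello from client");
        }

        serverThread.join();
        server.close();
        return received[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("server received: " + runOnce());
    }
}
```

The only Android-specific change on top of this is wrapping the client side in a worker thread (or AsyncTask) so it never touches the UI thread.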

Next, I wrote the final server program. The networking is very similar to the previous program, but I restructured it to spawn a new thread for every client. I then wrote synchronized methods that separate the messages by type (wifi, bluetooth, antenna) and write them to files local to the server.
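
The thread-per-client structure with synchronized dispatch might be sketched like this (the message format "type:payload" and all names are assumptions; the real server writes each type to its own file, while this sketch collects into in-memory lists so the dispatch logic is easy to see):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the thread-per-client collection server. Each incoming line is
// assumed to look like "wifi:MyNetwork,-60" or "bluetooth:AA:BB:CC:DD:EE:FF".
public class CollectionServerSketch {
    private static final Map<String, List<String>> store = new HashMap<>();

    // Synchronized so threads handling different clients cannot interleave
    // updates to the same type's records.
    static synchronized void record(String type, String payload) {
        store.computeIfAbsent(type, k -> new ArrayList<>()).add(payload);
    }

    static synchronized List<String> recordsFor(String type) {
        return store.getOrDefault(type, new ArrayList<>());
    }

    // Accepts nClients connections, spawning one worker thread per client.
    public static void serve(ServerSocket server, int nClients) throws Exception {
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < nClients; i++) {
            Socket client = server.accept();
            Thread t = new Thread(() -> {
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(client.getInputStream()))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        int sep = line.indexOf(':');
                        if (sep > 0) {
                            record(line.substring(0, sep), line.substring(sep + 1));
                        }
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            t.start();
            workers.add(t);
        }
        for (Thread t : workers) t.join();
    }
}
```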

Android Services:
As stated in my previous post, I want to alter my existing Android code to use services so that the user doesn't have to leave the collection app open for it to do its job. Therefore, I did some research on services and the code needed to implement them. Following instructions I found online, I modified the previous bluetooth program to use a service. At present, I am debugging this, since the service never seems to start. I am confident, however, that I will resolve this quickly, and that once I do, the other two parts will be easy to finish.
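
The service wrapper follows the standard Android pattern, roughly like the sketch below (class and comments are illustrative, not my exact code, and this requires the Android framework to compile):

```java
// Rough sketch of wrapping the bluetooth collector in an Android Service.
public class BluetoothCollectionService extends android.app.Service {
    @Override
    public int onStartCommand(android.content.Intent intent, int flags, int startId) {
        // Kick off collection here, e.g. register the BroadcastReceiver
        // and call BluetoothAdapter.startDiscovery().
        return START_STICKY; // ask the system to restart the service if killed
    }

    @Override
    public android.os.IBinder onBind(android.content.Intent intent) {
        return null; // started (not bound) service; no binder needed
    }
}
```

The service must also be declared with a service element in AndroidManifest.xml and launched with startService(); a missing manifest entry is one common reason a service appears to never start, so that is the first thing I am checking.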

Sunday, August 4, 2013

Week 13


Last week, I began working on an Android App to collect data on available wifi networks. This week, I was able to finish this app as well as to write apps to collect bluetooth and cell tower data.

Wifi and Bluetooth:
The Android API provides the built-in classes WifiManager and BluetoothAdapter. I used these classes to set up BroadcastReceivers. Each app continuously scans for wifi/bluetooth networks, and upon completion of each scan, the respective BroadcastReceiver is called to print the data.
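
The wifi side of this scan loop looks roughly like the following (standard WifiManager API names, but an illustrative sketch rather than the exact app code; it requires the Android framework to compile):

```java
// Sketch of the continuous wifi scan: each completed scan triggers this
// receiver, which prints the results and immediately starts the next scan.
public class WifiScanReceiver extends android.content.BroadcastReceiver {
    @Override
    public void onReceive(android.content.Context context, android.content.Intent intent) {
        android.net.wifi.WifiManager wifi = (android.net.wifi.WifiManager)
                context.getSystemService(android.content.Context.WIFI_SERVICE);
        for (android.net.wifi.ScanResult result : wifi.getScanResults()) {
            android.util.Log.d("wifi", result.SSID + " " + result.level + " dBm");
        }
        wifi.startScan(); // kick off the next scan
    }
}
```

The receiver is registered for WifiManager.SCAN_RESULTS_AVAILABLE_ACTION, and one initial startScan() call starts the loop; the bluetooth app follows the same pattern with BluetoothAdapter.startDiscovery() and BluetoothDevice.ACTION_FOUND.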

Cell Towers:
The Android API provides the built-in class TelephonyManager. This class is very helpful, but I ran into a problem using its getAllCellInfo method. Apparently, this method is not supported on Nexus phones and therefore always returns null, so I was not able to access the built-in CellInfo objects. However, I was able to use the getNeighboringCellInfo method to get the RSSI, Cell ID (CID), and Location Area Code (LAC) for each available cell tower.
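
The workaround reads the neighboring-cell list directly, along these lines (a fragment, not the full app: context stands for the enclosing Activity, and it requires the Android framework to compile):

```java
// Sketch of reading cell tower data via getNeighboringCellInfo, since
// getAllCellInfo() returned null on the Nexus S.
android.telephony.TelephonyManager tm = (android.telephony.TelephonyManager)
        context.getSystemService(android.content.Context.TELEPHONY_SERVICE);
for (android.telephony.NeighboringCellInfo cell : tm.getNeighboringCellInfo()) {
    int cid  = cell.getCid();  // Cell ID
    int lac  = cell.getLac();  // Location Area Code
    int rssi = cell.getRssi(); // signal strength
    android.util.Log.d("cell", cid + " " + lac + " " + rssi);
}
```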

I was able to test all three apps, and they seem to be working well.

I then began learning more about Android services. These are essentially processes that can be run in the background. I would like to convert my current code to make use of services so that the user doesn't need to worry about anything other than starting and stopping the app. This will be one of my main goals next week.

Since the end goal is to deploy this app on many phones for data collection, each phone will need to send its acquired data back to a central location. I will therefore write a server application to collect and store the data sent by the test apps. I plan to write this program next week and deploy it on the spqr bicuspid server.


Finally, I will be on vacation the week of August 5th, so my work will continue on August 12th.

Monday, July 29, 2013

Week 12


During the beginning of the week, I continued to learn more about the Android programming environment. I have been using a mixture of Google's online resources and a number of YouTube tutorials. At this point, I feel that I have a good understanding of the basic building blocks of an Android application, and that I will be able to pick up anything else I need as I create apps.

On Thursday, Vishal and I met with Amir to discuss the sources of mobile randomness we wish to investigate. We came up with the following six sources:
- Available Bluetooth Device Statistics
- Available Wifi Network Statistics
- Antenna Signal Strength
- Battery Power/Voltage
- Battery Charge/Discharge Rate
- Running Process Statistics

Vishal and I split the list, and each of us will work on three sources. For now, we will each create our own apps, with the eventual goal of combining them into a single app that can collect from all six sources. We also hope to set up some sort of networking so that we can send the acquired information to a central location rather than storing it locally.

I took the first three sources and have begun working on recording available wifi networks and their statistics. Android has a built-in WifiManager class that facilitates any wifi interaction. I have written a test application, but am getting errors when running it on the emulator. I don't know if the errors are due to my program or the fact that it is running on an emulator. In the next week, I will debug the wifi app, and work on the other two sources.

Monday, July 22, 2013

Week 11


The next step in our survey of entropy sources will be mobile devices. Mobile devices are extremely prevalent, and commonly perform tasks that require a high degree of security. These devices offer the advantages of mobility and thus an unpredictable environment, as well as several embedded devices such as cameras and microphones. However, they are more susceptible to theft and lack the computing resources of more traditional machines.

I read a paper: “The Sources of Randomness in Mobile Devices” (from Masaryk University, Czech Republic). The authors study the entropy-gathering potential of mobile devices. They focus mainly on the microphone and camera inputs, and through statistical analysis they show that these devices are capable of providing truly random bits. However, they do not give a good idea of the rate at which such bits might be generated.

For our project, I anticipate analyzing sources such as the microphone, camera, battery level, signal strength, and GPS position. We might also be able to use some of the sources we have already examined, such as bluetooth, wifi, and process statistics.

In order to investigate these sources, I have been learning how to create apps for the android operating system. Google provides a good training tutorial that I have been using. I have learned quite a bit by creating several apps with basic functionality. I have a bit farther to go in this training, but it should help me to create some apps to record the sources we are interested in.

Monday, July 15, 2013

Week 10


Last week I began working on a program to record wifi statistics using the C interface of the iwlib library. After some struggles trying to find the statistics I was looking for in the various structures throughout the code, I decided to use the built-in iwlist scan command instead. I therefore wrote a parser program and a shell script. The script runs the command iwlist wlan0 scan and pipes the results into the parser, which goes through the results, extracts the useful information, and writes it to a file. We now have the capability to record available wifi networks and their related statistics.
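
The parsing step can be sketched as below. The real script pipes live iwlist wlan0 scan output into the parser; here a canned sample (my approximation of iwlist's format) stands in so the extraction is visible:

```shell
#!/bin/sh
# Sketch of the iwlist parsing step: pull the ESSID and quality out of each
# "Cell" block and emit one CSV line per network.
parse_scan() {
    awk '
        /ESSID/ {
            gsub(/.*ESSID:"|"$/, "")   # strip everything but the network name
            essid = $0
        }
        /Quality/ {
            split($1, q, "=")          # q[2] is the quality fraction
            print essid "," q[2]
        }
    '
}

# Sample input approximating two cells of iwlist output:
cat <<'EOF' | parse_scan > networks.csv
          Cell 01 - Address: 00:11:22:33:44:55
                    ESSID:"HomeNet"
                    Quality=60/70  Signal level=-50 dBm
          Cell 02 - Address: 66:77:88:99:AA:BB
                    ESSID:"Lab"
                    Quality=40/70  Signal level=-70 dBm
EOF
cat networks.csv
```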

I then began work on a similar program for recording available bluetooth statistics. This proved a little more difficult, since most statistics require a connection. I was able to find the MAC address and clock offset for each available device without establishing a connection, using the command hcitool inq (inquire). However, while this command is useful, it often does not find all of the available devices. I therefore wrote a program to run it continuously and record any new devices encountered.
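
The "record only new devices" logic amounts to deduplicating on MAC address across repeated inquiries. A sketch follows; since hcitool needs bluetooth hardware, the real collection loop is shown commented out and the dedup function is fed canned addresses instead:

```shell
#!/bin/sh
# Sketch of the repeated-inquiry recorder: keep a file of devices seen so
# far and append only addresses not already in it.

SEEN=seen_devices.txt
: > "$SEEN"

# Read "MAC rest-of-line" records and append any address not yet recorded.
record_new() {
    while read -r addr rest; do
        if ! grep -q "^$addr" "$SEEN"; then
            echo "$addr $rest" >> "$SEEN"
        fi
    done
}

# Real collection loop (requires a bluetooth adapter):
# while true; do
#     hcitool inq | tail -n +2 | record_new
#     sleep 5
# done

# Demonstration with a repeated discovery of the same device:
printf '%s\n' "00:11:22:33:44:55 offset 0x1234" \
              "AA:BB:CC:DD:EE:FF offset 0x0042" \
              "00:11:22:33:44:55 offset 0x1234" | record_new
cat "$SEEN"
```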

I then spoke with Amir, and we decided to move into mobile devices. He provided me with a Nexus S Android phone. I have never done any mobile programming, so I started an online tutorial on mobile app development at http://www.coreservlets.com/android-tutorial/. This is going well and I am learning a lot.

Monday, July 8, 2013

Week 9


After looking into the event codes with Amir, we decided to just record all of the events so that we could look at them after recording some data. Therefore, I changed the code I had written to record every event to a different file. We will now be able to observe both the quality and frequency of each random event.

We have now written scripts to record online sources, Linux random hardware events, and process statistics. With all of these sources done, we decided to begin looking into wifi signals. I downloaded the source code for the iwlib library, which interfaces with the Linux kernel to retrieve the available wifi information. I took the relevant code and wrote a new program to record, for each wifi source, its ESSID, frequency, quality, and level. I am still working on finding a way to record the router MAC address.

When I last spoke to Amir, he said he was going to buy a bluetooth dongle so that we could also look at available bluetooth signals in a similar manner as we are currently doing for wifi signals.

Monday, July 1, 2013

Week 8


This week, I continued the tasks I had been working on last week. First, I had never really learned how to write shell scripts correctly, and since I've been writing so many lately, I took some time to do a shell scripting tutorial at linuxcommand.org. This really helped, and I edited some of the scripts I've written in past weeks for better functionality. I then wrote a script to periodically log the process statistics of the running machine. It writes all the data from the top command to a file at a user-defined interval.
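
The logging script is essentially a timed loop around top in batch mode, along these lines (file name, interval, and the snapshot-count parameter are illustrative; the real script runs until stopped):

```shell
#!/bin/sh
# Sketch of the periodic process-statistics logger: append one batch-mode
# snapshot of top to the log file every INTERVAL seconds.

LOGFILE=${1:-process_stats.log}
INTERVAL=${2:-60}   # seconds between snapshots
COUNT=${3:-1}       # snapshots to take (illustrative; real script loops forever)

i=0
while [ "$i" -lt "$COUNT" ]; do
    date >> "$LOGFILE"            # timestamp each snapshot
    top -b -n 1 >> "$LOGFILE"     # one non-interactive snapshot of all processes
    i=$((i + 1))
    if [ "$i" -lt "$COUNT" ]; then
        sleep "$INTERVAL"
    fi
done
```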

I then spent some time digging deeper into the Linux kernel to track down the event codes. I found some relatively detailed documentation about their usage, and determined that our assumptions were mostly correct. There are two potential problems though:

1. The EV_KEY event code is used for any key or button press. This is exactly what we want for the keyboard, but it means that this will also encode any mouse button press.

2. The EV_ABS event code records the absolute position of the mouse. This might be what we want, but there is also an event code EV_REL which encodes the relative movement of the mouse. It seems to me that EV_ABS would be a better random source for a touchscreen environment, whereas EV_REL would be better for a system with a typical mouse.

I will speak to Amir about this and determine if we should change the current version of the code.

I also spent some time trying to determine the prevalence of different hardware sensors in different machines. I am having a very difficult time finding this information though. My current thinking is that anything more than CPU temperature would be a bad source due to lack of hardware support across multiple platforms.

Monday, June 24, 2013

Week 7


I spoke with Amir, and we decided to forgo writing random data to a file from the kernel. Instead, I altered the kernel to write all of the relevant data to the syslog log file on every random event. This required much less complexity than writing to a file since I was able to use the existing printk system call which exists for this purpose.

The four random events we are interested in capturing are mouse, keyboard, disk, and interrupt. The mouse and keyboard events are captured when their drivers call the add_input_randomness function and pass it an event type. This function then calls the add_timer_randomness function which adds the random bits to the entropy pool.
The disk events are captured when the add_disk_randomness function is called. This function then calls the add_timer_randomness function to add the random bits to the entropy pool.
The interrupt events are captured and handled completely separately by the add_interrupt_randomness function.

In order to separate these events, I wrote data to the log file in the add_timer_randomness and add_interrupt_randomness functions. The latter was trivial, since interrupt events are the only ones that reach this function. However, to separate the events that call add_timer_randomness, I added an event_type argument to the function. The type from add_input_randomness is passed through directly, and a new type was defined and passed from add_disk_randomness. For each event, the number of bits credited (deemed random) is written to the log file along with all of the data mixed into the entropy pool.
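
Loosely, the change looked like this (a paraphrase, not the actual kernel diff: the exact signatures differ, EVENT_DISK's value is an assumption, and credited_bits stands in for the kernel's entropy-credit variable):

```c
/* Sketch of the logging change in random.c: an event_type argument
 * distinguishes the callers of add_timer_randomness. */

#define EVENT_DISK 0x100  /* new, assumed type value for disk events */

static void add_timer_randomness(struct timer_rand_state *state,
                                 unsigned num, int event_type)
{
    /* ... existing mixing and entropy-credit logic ... */

    /* Log the event type, credited bits, and mixed-in data to syslog. */
    printk(KERN_INFO "rand-event type=%d credited=%d data=%u\n",
           event_type, credited_bits, num);
}
```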
There was one oddity I encountered. The possible event types passed to the add_input_randomness function are defined as:

EV_SYN: Used as markers to separate events. Events may be separated in time or in space, such as with the multitouch protocol.
EV_KEY: Used to describe state changes of keyboards, buttons, or other key-like devices.
EV_REL: Used to describe relative axis value changes, e.g. moving the mouse 5 units to the left.
EV_ABS: Used to describe absolute axis value changes, e.g. describing the coordinates of a touch on a touchscreen.
EV_MSC: Used to describe miscellaneous input data that do not fit into other types.
EV_SW: Used to describe binary state input switches.
EV_LED: Used to turn LEDs on devices on and off.
EV_SND: Used to output sound to devices.
EV_REP: Used for autorepeating devices.
EV_FF: Used to send force feedback commands to an input device.
EV_PWR: A special type for power button and switch input.
EV_FF_STATUS: Used to receive force feedback device status.


I assumed that the event types EV_KEY and EV_ABS correspond to keyboard and mouse events respectively. However, I have not verified this. Also, the type EV_MSC seems to get passed on a regular basis, and I don't know what event is causing this. The current code ignores all types but EV_KEY and EV_ABS. Therefore, it would be best to dig deeper into the code to confirm the relation between events and these types.

After altering the kernel code, I wrote a user space program to parse the syslog file and look for the events logged from the kernel. This parser separates the events and writes them each to a file depending on the event type.
I then wrote an installer shell script to make the kernel changes, install the new kernel, and set the parser to run on boot. The installation takes a long time, but it means that we can install this on any machine running Linux.
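
The parser's core is a split-by-type pass over the syslog lines. A sketch follows; the "rand-event type=..." format is an assumed stand-in for the actual printk format:

```shell
#!/bin/sh
# Sketch of the syslog parser: lines tagged rand-event are appended to one
# file per event type (events_<type>.log).
parse_events() {
    awk '
        /rand-event/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^type=/) {
                    split($i, t, "=")
                    print $0 >> ("events_" t[2] ".log")
                }
        }
    '
}

# Sample syslog lines standing in for the real log:
cat <<'EOF' | parse_events
Jun 24 10:00:01 host kernel: rand-event type=key credited=3 data=30
Jun 24 10:00:02 host kernel: rand-event type=disk credited=2 data=8
Jun 24 10:00:03 host kernel: rand-event type=key credited=1 data=48
EOF
wc -l events_key.log events_disk.log
```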


Next, I began to look into potential randomness in CPU temperature and fan speed. On Linux, the lm-sensors program allows you to read the values of any hardware sensors that are installed on your machine. This works very well, but is highly dependent on the hardware of different machines. My laptop for example has CPU temperature sensors, but no fan speed sensor. Therefore, I am in the process of trying to find which sensors are common in different types of machines, as well as the refresh rates for these sensors.

I also wrote a shell script which logs all of the currently running processes and the information associated with them. We are hoping that process statistics like memory usage can serve as sources of randomness. I now need to write a script to periodically log this info so we can sit down and analyze it.

Monday, June 17, 2013

Week 6

This week I have been attempting to work with and alter the Linux kernel. After some difficulties recompiling and installing the kernel, I found a guide from a University of Kentucky OS class which allowed me to install my new kernel. After this, I was able to add code that logs the random data to the Linux syslog file through the printk function. While this was successful, the goal is to write the data to a normal file.

Information on writing to files from within the Linux kernel is difficult to find, since it is considered something never to be done in kernel code. However, since I am not writing code for contribution to the kernel distribution, it is still something I want to pursue. I found several different ways to do this and have tried all of them, but each one causes the kernel to hang on boot. I am attempting to debug the problem, but it is very time consuming, since each attempt requires me to recompile the kernel, which generally takes about 10-15 minutes.

I will continue to debug this problem in the coming week. Once this is solved, I just need to modify the code so it can tell which random event occurred, and to use this to write to a different file for each event.

Monday, June 10, 2013

Week 5


The code for the Linux Random Number Generator implements a continuously reseeded Twisted Generalized Feedback Shift Register (TGFSR). Last week, I found the places in the code where entropy is added and the PRNG is reseeded, as well as the places where the entropy estimate is calculated. However, since the entropy estimate is based solely on the length of time between events, I could see no correlation between the entropy estimate and which input bits were considered random. Therefore, I decided to look deeper into the TGFSR to understand how it works and to verify what I found before.

After reading several papers, I learned that the TGFSR uses a primitive polynomial as a cyclic generator for a finite field. It generates all the powers of x mod the primitive polynomial and evaluates them at x=1. This produces a Tausworthe sequence of statistically random bits. Using a delay factor, this sequence is transformed into a linearly independent matrix that forms the initial seed for the generator. When a random number is requested, the rows of the matrix corresponding to the powers of the primitive polynomial (called the taps) are XORed together and hashed via SHA-1. The pre-hash value is then put back into the matrix, and the taps are all moved up by one. The LRNG adds randomness by performing the same procedure, except that the value to be put back into the matrix is first XORed with the bits produced by the random event.
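
In symbols, the TGFSR recurrence (as given in Matsumoto and Kurita's formulation; n, m, and the matrix A are the generator's parameters) is:

```latex
% Each new word is an earlier word XORed with a "twisted" copy of the oldest
% word, where A is a fixed invertible matrix over GF(2).
x_{l+n} = x_{l+m} \oplus x_{l} A, \qquad l = 0, 1, 2, \dots
```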

This functionality seems to support the observation that there is no correlation between the entropy estimate and which bits are random. As long as some of the bits produced by the random event are indeed random, the mixing procedure in conjunction with the SHA-1 hash will ensure an unpredictable output.

In light of this, we decided to proceed by writing all of the random bits from each event to separate files so that we could still do some analysis. I began by trying to modify the Linux kernel in a very small way and then recompile and reinstall it. I am attempting this on an Ubuntu virtual machine running in VirtualBox on a Mac laptop. I am able to modify and recompile the source code, but I am having difficulty installing the kernel with my changes. The instructions in the book Linux Kernel Development are for the original GRUB boot loader, but the VM seems to be using GRUB 2. I have found some forum posts online, but am now getting a file system error when I try to boot the new version. I will continue this work in the coming week.

Monday, June 3, 2013

Weeks 3/4


The next goal for our project is to gather a stream of random bits generated from the Linux kernel. As described in the paper “Analysis of the Linux Random Number Generator” by Gutterman & Pinkas, the Linux Random Number Generator (LRNG) gathers random bits from the following sources:
* Keyboard
* Mouse
* Hard Drive
* Interrupts

We wish to gather a sampling of these random bits in order to test their statistical properties.

I downloaded the latest Linux source (version 3.9.3) and began looking through the RNG. The majority of the relevant code can be found in the file random.c. As documented in this file, and as described in the paper by Gutterman & Pinkas, the LRNG functions as a Twisted Generalized Feedback Shift Register. In essence, it is a pseudo-random number generator that is continuously reseeded as new random events occur.

Each random event produces two random words. One is based on the timing, and the other is based on the type of event. The LRNG runs an algorithm to determine the number of bits that were actually random in those words, and increments the current entropy estimate accordingly. When a random byte is requested from the pool, the output is hashed using the SHA-1 algorithm. It seems that the code is somewhat dependent on the randomness of SHA-1 in that if some part of the input is random, then it will hash to an unpredictable output. The difficulty for us will be determining which of the pre-hash bits are actually random according to the LRNG's estimate, since these are the ones we want to analyze.
I have located each function that adds entropy to the pool for each type of random event. So once I determine how to extract which bits we are interested in, I should be able to output those bits to a file.

I have installed a Linux virtual machine on a Mac laptop through VirtualBox and plan to make changes to the kernel source in order to write the random bits from each source to a separate file.

I have read parts of Robert Love's book “Linux Kernel Development”. The book is based on the Linux kernel version 2.6.10, but still seems to be a good reference for the kernel's functionality.

I have also been reading up on the functionality of the Twisted Generalized Feedback Shift Register PRNG. It involves quite a bit of complex abstract algebra, but I think I have a decent grasp of its basic functionality. This knowledge is extremely important since the entire LRNG code in random.c is based on it.

I am continuing to examine the kernel code, and am continuing to understand more of it. In the next week, I will continue this process with the goal of extracting the random bits as they are mixed into the entropy pool.

Monday, May 20, 2013

SPQR Week #2


Gathering Randomness from Online Sources:
Our project will be to examine different sources of entropy that can be used to provide cryptographically secure random bits. There are a number of online sources that will provide random bits, generated from a number of sources. Although such online sources should not be used for sensitive purposes, they are useful for examining the sources from which they reportedly gather entropy. With this intention, I have written four scripts to gather random bits from the following four online sources:

HotBits: Entropy gathered from "timing successive pairs of radioactive decays detected by a Geiger-Müller tube interfaced to a computer."

random.org: Entropy gathered from atmospheric noise.

EntropyPool: Entropy gathered from "local processes, files and devices, Web page hits and remote Web sites."

randomnumbers.info: Entropy gathered from a quantum photon detector.


Hypothesis testing and cryptographic math:
Having completed the previous task, I spent the rest of the week brushing up on some math. A large part of this project will be testing various entropy sources to see if the resulting random bit streams are in fact random. This will be done mainly with the aid of several software packages, primarily the NIST test suite. However, it's been over a year since I took statistics, so I spent some time brushing up on hypothesis testing so that I will be able to understand the tests when we begin applying them.

I also continued to read up on the basics of cryptography. I spent some time this week working through some of the math behind some of the encryption schemes and making sure that I understood how they work.

Monday, May 13, 2013

Background Papers on Cryptographic Sources of Randomness


Mining Your P’s and Q’s: Detection of Widespread Weak Keys in Network Devices:
Summary: The authors conducted an Internet-wide scan of RSA public keys and DSA signatures. With this information, they searched for pairs of hosts whose RSA keys share prime factors, or whose DSA signatures share ephemeral keys. Both scenarios potentially compromise each host’s private key. The study showed that devices are vulnerable when they generate keys on boot, before the OS has been able to collect sufficient entropy to seed their pseudo-random number generators.

Thoughts: The results of this study seem to indicate that currently used entropy sources provide good randomness when given sufficient collection time. However, the authors note that several popular implementations of RSA and DSA do not use the Linux interface /dev/random, which blocks until sufficient entropy has been collected, but instead use /dev/urandom, which immediately returns the requested number of random bytes. The authors also note that the greatest vulnerabilities come from embedded or headless devices that have no user input and, sometimes, no disk drives, both of which are commonly used to generate entropy.
This discussion seems to indicate that our work should focus on sources of randomness that would be available to such embedded devices. Also, it would be helpful if an entropy source could be discovered that would quickly generate randomness. This would discourage developers from dismissing the use of a blocking implementation.


Analysis of the Linux Random Number Generator:
Summary: The authors analyzed the source code of the Linux Random Number Generator (LRNG). They provide details as to its functionality and mention several potential security flaws. The LRNG maintains three separate entropy pools. The primary pool takes entropy from mouse/keyboard inputs, disk I/O operations, and certain interrupts. The secondary and urandom pools then take entropy from the primary pool. The secondary pool feeds the /dev/random interface, which blocks until sufficient entropy has been gathered. The urandom pool feeds the /dev/urandom interface, which immediately returns the requested number of random bytes, regardless of the amount of entropy gathered.

Thoughts: Most of the vulnerabilities described by the authors seem to be a result of the specific implementation of the LRNG, and could be remedied by the solutions that the authors suggest. The largest vulnerability however, is for embedded devices with no hard disk such as certain routers. As the authors note, the only real source of entropy is network interrupts which are also potentially observable by an attacker. The best advancement to the field would be to find a better source of entropy for embedded devices such as this.


When Good Randomness Goes Bad: Virtual Machine Reset Vulnerabilities and Hedging Deployed Cryptography:
Summary: The authors discuss how a VM snapshot may cache cryptographic randomness. Running from this snapshot can then lead to using repeated, and thus predictable randomness. The most significant/likely vulnerability is that DSA may generate a repeated r value due to the RNG being seeded with the same, old value.
The authors also discuss a method for hedging against potential randomness failures. The general idea is to hash the generated random bits before using them in any cryptographic operation. This does not alleviate the problem of bad randomness, but ensures slightly stronger security in the face of a bad RNG.

Thoughts: The authors have pointed out a significant vulnerability that comes about from repeated randomness. They point out that the problem can be solved by ensuring that fresh randomness is incorporated into any cryptographic operations immediately before it is computed. They do mention however, that this may be a problem if such randomness sources are also cached. Therefore, they suggest using hardware generated randomness for this purpose rather than an entropy pool such as the Linux random number generator, which may be cached. This discussion highlights the largest problem in the field: there is a tradeoff between performance and security. It may take a significant amount of time to generate sufficient entropy for many cryptographic operations to be considered secure. However, many developers are more interested in the performance of their applications, and are not willing to wait for this security.
The authors also discuss hedging which adds some security in the face of a bad RNG. However, they note that this merely allows a function to fail gracefully. Therefore, despite this method, effective random number generation is still extremely important.


A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications:
Summary: This paper describes a test suite that can be used to determine if a given random number generator outputs bit sequences that are sufficiently random. The main purpose of the tests described is to ensure that the random number generator produces bits that are uniformly random, and completely independent. The following table outlines each test and its purpose.

The Frequency (Monobit) Test: Determine whether the numbers of ones and zeros in a sequence are approximately equal, as would be expected for a truly random sequence.
Frequency Test within a Block: Determine whether the frequency of ones in an M-bit block is approximately M/2, as would be expected under an assumption of randomness.
The Runs Test: Determine whether the number of runs of ones and zeros of various lengths is as expected for a random sequence.
Tests for the Longest-Run-of-Ones in a Block: Determine whether the length of the longest run of ones within the tested sequence is consistent with the length that would be expected in a random sequence.
The Binary Matrix Rank Test: Check for linear dependence among fixed-length substrings of the original sequence.
The Discrete Fourier Transform (Spectral) Test: Detect periodic features (i.e., repetitive patterns that are near each other) in the tested sequence that would indicate a deviation from the assumption of randomness.
The Non-overlapping Template Matching Test: Detect generators that produce too many occurrences of a given non-periodic (aperiodic) pattern.
The Overlapping Template Matching Test: Same as the Non-overlapping Template Matching Test, except that when the pattern is found, the window slides only one bit before resuming the search.
Maurer's "Universal Statistical" Test: Detect whether or not the sequence can be significantly compressed without loss of information.
The Linear Complexity Test: Determine whether or not the sequence is complex enough to be considered random.
The Serial Test: Determine whether the number of occurrences of the 2^m m-bit overlapping patterns is approximately the same as would be expected for a random sequence.
The Approximate Entropy Test: Compare the frequency of overlapping blocks of two consecutive/adjacent lengths (m and m+1) against the expected result for a random sequence.
The Cumulative Sums (Cusums) Test: Determine whether the cumulative sum of the partial sequences occurring in the tested sequence is too large or too small relative to the expected behavior for random sequences.
The Random Excursions Test: Determine if the number of visits to a particular state within a cycle deviates from what one would expect for a random sequence.
The Random Excursions Variant Test: Detect deviations from the expected number of visits to various states in the random walk.

Thoughts: This test suite will be very useful for our purposes. It will be crucial in testing the sources of randomness that we come up with. An RNG is of no use if its results are not random and independent, as an attacker may be able to gain information if the RNG is used for cryptographic purposes.


How to Generate Cryptographically Strong Sequences of Pseudo Random Bits:
Summary: The authors present a pseudo random number generator that, given a random seed, is capable of producing a statistically random bit sequence. The generator relies on the intractability of the Discrete Logarithm Problem to show that given any sequence of generated bits, an attacker cannot predict the next bit with probability greater than 0.5 in polynomial time.
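
Concretely, the construction in this paper (due to Blum and Micali) can be sketched as follows, where p is a large prime, g a primitive root mod p, and x_0 the random seed:

```latex
% Blum-Micali generator: iterate modular exponentiation and output the
% "most significant bit" predicate of each state.
x_{i+1} = g^{x_i} \bmod p, \qquad
b_i =
\begin{cases}
1 & \text{if } x_i < (p-1)/2 \\
0 & \text{otherwise}
\end{cases}
```

Predicting the next output bit from previous ones better than chance would require solving the discrete logarithm problem, which is assumed intractable.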

Thoughts: Pseudo-random number generators are extremely important for cryptographic applications, since such functions often cannot wait for a true random number generator to provide enough randomness for each bit requested. A true RNG can instead act as the seed for a PRNG that produces a statistically random sequence. This assumes, however, that the PRNG sequence appears random and independent without knowledge of the seed. For our purposes, PRNGs are important in that we must find entropy sources that can generate a seed for such a generator.