Monday, June 24, 2013

Week 7


I spoke with Amir, and we decided to forgo writing random data to a file from the kernel. Instead, I altered the kernel to write all of the relevant data to the syslog log file on every random event. This required much less complexity than writing to a file since I was able to use the existing printk system call which exists for this purpose.

The four random events we are interested in capturing are mouse, keyboard, disk, and interrupt. The mouse and keyboard events are captured when their drivers call the add_input_randomness function and pass it an event type. This function then calls the add_timer_randomness function which adds the random bits to the entropy pool.
The disk events are captured when the add_disk_randomness function is called. This function then calls the add_timer_randomness function to add the random bits to the entropy pool.
The interrupt events are captured and handled completely separately by the add_interrupt_randomness function.

In order to separate these events, I put wrote data to the log file in the add_timer_randomness and add_interrupt_randomness functions. The latter was trivial since the interrupt events are the only ones to call this function. However, to separate the events that call add_timer_randomness, I added an event_type argument to the function. The type from add_input_randomness was passed directly, and a new type was defined and passed from the add_disk_randomness function. For each event, the number of bits credited (deemed random) is written to the log file along with all of the data mixed into the entropy pool.
There was one oddity I encountered. The possible event types passed to the add_input_randomness function are defined as:

EVENT TYPE
DESCRIPTION
EV_SYN
Used as markers to separate events. Events may be separated in time or in space, such as with the multitouch protocol.
EV_KEY
Used to describe state changes of keyboards, buttons, or other key-like devices.
EV_REL
Used to describe relative axis value changes, e.g. moving the mouse 5 units to the left.
EV_ABS
Used to describe absolute axis value changes, e.g. describing the coordinates of a touch on a touchscreen.
EV_MSC
Used to describe miscellaneous input data that do not fit into other types.
EV_SW
Used to describe binary state input switches.
EV_LED
Used to turn LEDs on devices on and off.
EV_SND
Used to output sound to devices.
EV_REP
Used for autorepeating devices.
EV_FF
Used to send force feedback commands to an input device.
EV_PWR
A special type for power button and switch input.
EV_FF_STATUS
Used to receive force feedback device status.


I assumed that the event types EV_KEY and EV_ABS correspond to keyboard and mouse events respectively. However, I have not verified this. Also, the type EV_MSC seems to get passed on a regular basis, and I don't know what event is causing this. The current code ignores all types but EV_KEY and EV_ABS. Therefore, it would be best to dig deeper into the code to confirm the relation between events and these types.

After altering the kernel code, I wrote a user space program to parse the syslog file and look for the events logged from the kernel. This parser separates the events and writes them each to a file depending on the event type.
I then wrote an installer shell script to make the kernel changes, install the new kernel, and set the parser to run on boot. The installation takes a long time, but it means that we can install this on any machine running Linux.


Next, I began to look into potential randomness in CPU temperature and fan speed. On Linux, the lm-sensors program allows you to read the values of any hardware sensors that are installed on your machine. This works very well, but is highly dependent on the hardware of different machines. My laptop for example has CPU temperature sensors, but no fan speed sensor. Therefore, I am in the process of trying to find which sensors are common in different types of machines, as well as the refresh rates for these sensors.

I also wrote a shell script which logs the all of the currently running processes and all of the information associated with them. We are hoping that we can use process statistics like memory usage as sources of randomness. I now need to write a script to periodically log this info so we can sit down and analyze the information.

Monday, June 17, 2013

Week 6

This week I have been attempting to work with and alter the linux kernel. After some difficulties recompiling and installing the kernel, I found a guide from a University of Kentucky OS class which allowed me to install my new kernel. After this, I was able to add code to log the random data to the linux syslog file through the printk function. While this was successful, the goal is to write the data to a normal file.

Information on writing to files from within the linux kernel is difficult to find since it is considered something never to be done in kernel code. However, since I am not writing code for contribution to the kernel distribution, it is still something I want to pursue. I was able to find several different ways to do this, and have tried all of them. However, each one has caused the kernel to hang on boot. I am attempting to debug the problems, but it is very time consuming since each attempt requires me to recompile the kernel. This process generally takes about 10-15 minutes.

I will continue to debug this problem in the coming week. Once this is solved, I just need to modify the code so it can tell which random event occurred, and to use this to write to a different file for each event.

Monday, June 10, 2013

Week 5


The code for the Linux Random Number Generator a continuously reseeded Twisted Generalized Feedback Shift Register. Last week, I found the places in the code where entropy is added and the PRNG is reseeded as well as the places where the entropy estimation is calculated. However, since the entropy estimation is solely based on the length of time between events, I could see no correlation between the entropy estimation and which input bits were considered random. Therefore, I decided to look deeper into the TGFSR to understand how it works and to verify what I found before. After reading several papers, I discovered that the TGFSR works by using a primitive polynomial as a cyclic generator for a finite field. It generates all the powers of x mod the primitive polynomial and evaluates them at x=1. This produces a Tausworthe sequence of statistically random bits. Using a delay factor, this sequence is transformed into a linearly independent matrix that forms the initial seed for the generator. When a random number is requested, the rows of the matrix corresponding to the powers of the primitive polynomial (called the taps) are xored together and hashed via SHA-1. The pre-hash value is then put back into the matrix and the taps are all moved up by one. The LRNG adds randomness by doing the same procedure as described above, except that the value to be put back into the matrix is first xored with the bits produced by the random event.

This functionality seems to support the observation that there is no correlation between the entropy estimate and which bits are random. As long as some of the bits produced by the random event are indeed random, the mixing procedure in conjunction with the SHA-1 hash will ensure an unpredictable output.

In light of this, we decided to proceed by writing all of the random bits from each event to separate files so that we could still do some analysis. I began working on this by trying to modify the Linux kernel in a very small way and then to recompile and reinstall. I am attempting this on a Ubuntu Virtual Machine on a Mac laptop through Virtual Box. I am able to modify and recompile the source code, but I am having difficulty installing the kernel with my changes. The instructions from the book Linux Kernel Development are for using the original Grub boot loader. However, Virtual Box seems to be using Grub2. I have found some forum posts on-line, but am now getting a file system error when I try to boot the new version. I will continue this work in the coming week.

Monday, June 3, 2013

Weeks 3/4


The next goal for our project is to gather a stream of random bits generated from the Linux kernel. As described in the paper “Analysis of the Linux Random Number Generator” by Gutterman & Pinkas, the Linux Random Number Generator (LRNG) gathers random bits from the following sources:
* Keybord
* Mouse
* Hard Drive
* Interrupts

We wish to gather a sampling of these random bits in order to test their statistical properties.

I downloaded the latest Linux source (version 3.9.3) and began looking through the RNG. The majority of relevant code can be found in the file random.c. As documented in this file, and as described in the paper by Gutterman & Pinkas, the LRNG functions as a Twisted Generalized Feedback Shift Register. The functionality is is a pseudo-random number generator that is continuously reseeded as new random events occur.

Each random event produces two random words. One is based on the timing, and the other is based on the type of event. The LRNG runs an algorithm to determine the number of bits that were actually random in those words, and increments the current entropy estimate accordingly. When a random byte is requested from the pool, the output is hashed using the SHA-1 algorithm. It seems that the code is somewhat dependent on the randomness of SHA-1 in that if some part of the input is random, then it will hash to an unpredictable output. The difficulty for us will be determining which of the pre-hash bits are actually random according to the LRNG's estimate, since these are the ones we want to analyze.
I have located each function that adds entropy to the pool for each type of random event. So once I determine how to extract which bits we are interested in, I should be able to output those bits to a file.

I have installed a Linux virtual machine on a Mac laptop through virtual box and plan to make changes to the kernel source in order to write the random bits from each source to a separate file.

I have read parts of Robert Love's book “Linux Kernel Development”. The book is based of the Linux kernel version 2.6.10, but still seems to be a good reference for the kernel's functionality.

I have also been reading up on the functionality of the Twisted Generalized Feedback Shift Register PRNG. It involves quite a bit of complex abstract algebra, but I think I have a decent grasp of its basic functionality. This knowledge is extremely important since the entire LRNG code in random.c is based on it.

I am continuing to examine the kernel code, and am continuing to understand more of it. In the next week, I will continue this process with the goal of extracting the random bits as they are mixed into the entropy pool.