
Task1 - Person Authentication

For this task of the person identification challenge, we recorded the dynamics of typing with a JavaScript application. In each typing session, the users were asked to type some sentences, and the keyboard events "keyup", "keydown" and "keypress" were captured by the JavaScript application and stored on our web server. In the file keystrokes-12users-raw-data.txt, each record starts with the keyword TYPING PATTERN, which is followed by the identifier of the typing session. Each subsequent line corresponds to one keyboard event (keydown, keyup or keypress) and contains the following pieces of information:

  • type of the keyboard event (keydown/keyup/keypress)

  • event.keyCode (the keyCode field of the corresponding JavaScript event)

  • event.which (the which field of the corresponding JavaScript event)

  • event.charCode (the charCode field of the corresponding JavaScript event)

  • event.shiftKey (the shiftKey field of the corresponding JavaScript event)

  • the value returned by JavaScript's Date.getTime() function, i.e., the number of milliseconds since the 1st of January 1970.
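The record layout described above can be read with a short parser. The following is only a sketch under stated assumptions: the field separator within an event line is not specified here, so the code accepts commas and/or whitespace; it also assumes the session identifier follows the keyword TYPING PATTERN on the same line and that the timestamp is an integer. The names KeyEvent and parse_raw are hypothetical and chosen for illustration.

```python
import re
from typing import Dict, Iterable, List, NamedTuple


class KeyEvent(NamedTuple):
    """One keyboard event; field order follows the list above."""
    event_type: str    # "keydown", "keyup" or "keypress"
    key_code: str      # event.keyCode, kept as text
    which: str         # event.which, kept as text
    char_code: str     # event.charCode, kept as text
    shift_key: str     # event.shiftKey, kept as text (exact encoding is an assumption)
    timestamp_ms: int  # milliseconds since 1 January 1970


HEADER = "TYPING PATTERN"


def parse_raw(lines: Iterable[str]) -> Dict[str, List[KeyEvent]]:
    """Group event lines by the typing-session identifier of their record."""
    patterns: Dict[str, List[KeyEvent]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(HEADER):
            # Assumption: the session identifier is the rest of the header line.
            current = line[len(HEADER):].strip(" \t:,")
            patterns[current] = []
            continue
        if current is None:
            continue  # ignore anything before the first record header
        # Assumption: fields are separated by commas and/or whitespace.
        fields = [f for f in re.split(r"[,\s]+", line) if f]
        if len(fields) != 6:
            continue  # skip lines that do not match the expected layout
        patterns[current].append(KeyEvent(*fields[:5], int(fields[5])))
    return patterns
```

With the real file, this could be called as parse_raw(open("keystrokes-12users-raw-data.txt")). If the actual separator or header layout differs, the two lines marked as assumptions are the ones to adjust.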

Additionally, you are given the true identity of the users (coded by integer numbers from 1 to 12) for 5 typing patterns per user in the file keystrokes-12users-train-labels.txt. Each line of this file contains two numbers separated by a comma:

  • the identifier of a typing pattern (pattern id for short), and
  • the identifier of the user who typed that pattern (user id for short).

The file keystrokes-12users-test-hypothetic-labels.txt contains the hypothetical identities of the users for the rest of the typing patterns (i.e., for those typing patterns for which the true identity of the user is not given in keystrokes-12users-train-labels.txt). The hypothetical identities are given in the same format as the true identities (i.e., as pairs of pattern ids and user ids).
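Since both label files share the same "pattern id, user id" format, one small reader can load either of them. This is a minimal sketch; the function name read_labels is hypothetical, and it assumes the files have no header row.

```python
import csv
from typing import Dict, Iterable


def read_labels(lines: Iterable[str]) -> Dict[int, int]:
    """Map pattern id -> user id from lines of the form 'pattern_id,user_id'."""
    labels: Dict[int, int] = {}
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        labels[int(row[0].strip())] = int(row[1].strip())
    return labels
```

For example, read_labels(open("keystrokes-12users-train-labels.txt")) would give the true identities, and the same call on keystrokes-12users-test-hypothetic-labels.txt would give the hypothetical ones.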

Your task is to decide if the hypothetical identities in keystrokes-12users-test-hypothetic-labels.txt match the true identities of the users who typed those patterns.


Your solution for the above task should be a list of pairs. The first number of each pair should be the pattern id. The second number depends on your prediction: if you predict that the hypothetical user identity for the pattern is the same as the true user identity, the second number should be 1; otherwise it should be 0. Please store your pairs in a CSV file, each line of which corresponds to one of your pairs, and upload this file as your submission. An example of a file that can be uploaded as a submission is keystrokes-12users-test-random-submission.txt. This file contains all the pattern ids for which predictions are expected, but the predictions (i.e., 0s or 1s depending on whether the hypothetical user identifiers are predicted to match the true user identifiers) are filled in by a random number generator.
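One possible way to produce such a file is sketched below. It assumes that you already have a dictionary of hypothetical user ids per pattern and a dictionary of user ids predicted by your own classifier; the names build_submission and write_submission are hypothetical. Comparing a predicted user id with the hypothetical one is only one way to arrive at the 0/1 decision; a model that outputs the decision directly works just as well.

```python
import csv
from typing import Dict, List, TextIO, Tuple


def build_submission(hypothetical: Dict[int, int],
                     predicted: Dict[int, int]) -> List[Tuple[int, int]]:
    """Return (pattern_id, 1 or 0) pairs: 1 if the predicted user equals the hypothetical one."""
    rows = []
    for pattern_id in sorted(hypothetical):
        # A pattern without a prediction never matches, so it gets 0.
        match = int(predicted.get(pattern_id) == hypothetical[pattern_id])
        rows.append((pattern_id, match))
    return rows


def write_submission(rows: List[Tuple[int, int]], out: TextIO) -> None:
    """Write one 'pattern_id,prediction' pair per line."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(rows)
```

A call such as write_submission(rows, open("my-submission.txt", "w", newline="")) would then create the file to upload; the output file name here is arbitrary.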

When submitting your solutions, you will need to provide your "login". Please use your nickname as "login". Your solutions will only be displayed on the leaderboard once the administrator has approved your nickname. You may also provide a short description of your solution.


Evaluation of the submissions is based on accuracy. The results reported on the leaderboard are calculated based on a subset of the predictions. In order to ensure the validity of the results, the organizers of the challenge calculate the accuracy on the rest of the predictions as well. Submissions are not displayed on the leaderboard if the accuracies on the two subsets are substantially different.
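For local validation, accuracy can be computed as the fraction of patterns whose 0/1 prediction agrees with the correct 0/1 answer. The sketch below is illustrative only: the function name accuracy is hypothetical, and how the organizers split the predictions into the two subsets is not stated, so the subset is simply passed in as a list of pattern ids.

```python
from typing import Dict, Iterable, Optional


def accuracy(predictions: Dict[int, int],
             truth: Dict[int, int],
             ids: Optional[Iterable[int]] = None) -> float:
    """Fraction of pattern ids (all of truth, or the given subset) predicted correctly."""
    selected = list(truth) if ids is None else list(ids)
    if not selected:
        raise ValueError("no pattern ids to evaluate")
    # A missing prediction counts as an error.
    correct = sum(1 for i in selected if predictions.get(i) == truth[i])
    return correct / len(selected)
```

Evaluating the same predictions on two disjoint held-out subsets of your labelled data, and checking that the two values are close, mirrors the consistency check described above.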

What to win?

If you have a model that performs well and you want to write a joint research paper, please feel free to contact us by e-mail: buza at biointelligence dot hu.

"Honest Usage" Policy

Finally, we mention that there are many ways of cheating. As this is a machine learning challenge, the intended behaviour is to train a classifier and use it to make predictions. While you are training a classifier, you are expected to use the labelled data as training data in order to fit your model. Any form of cheating or dishonest behaviour is discouraged. If dishonest behaviour is detected, the affected results will be removed from the leaderboard.

If you have any questions, please feel free to contact us by e-mail:
buza at biointelligence dot hu

Good luck with the challenge!