LetterRecognition          package:mlbench          R Documentation

_L_e_t_t_e_r _I_m_a_g_e _R_e_c_o_g_n_i_t_i_o_n _D_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     The objective is to identify each of a large number of
     black-and-white rectangular pixel displays as one of the 26
     capital letters in the English alphabet.  The character images
     were based on 20 different fonts and each letter within these 20
     fonts was randomly distorted to produce a file of 20,000 unique
     stimuli.  Each stimulus was converted into 16 primitive numerical
     attributes (statistical moments and edge counts) which were then
     scaled to fit into a range of integer values from 0 through 15.
     Typically a model is trained on the first 16,000 items and then
     used to predict the letter category for the remaining 4,000.
     See the article cited below for more details.
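
     The conventional split described above can be sketched as
     follows.  This is a minimal illustration that assumes the mlbench
     package is installed; the object names 'train' and 'test' are
     chosen here for illustration and are not part of the package.

```r
# Assumption: the mlbench package is installed.
data(LetterRecognition, package = "mlbench")

# First 16000 rows for training, remaining 4000 rows for testing.
train <- LetterRecognition[1:16000, ]
test  <- LetterRecognition[16001:20000, ]

# Check the sizes and how the letters are distributed in each part.
dim(train)
dim(test)
table(train$lettr)
table(test$lettr)
```

     Because the split is by row position rather than by random
     sampling, the two tables give a quick check that each letter is
     represented in both parts.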

_U_s_a_g_e:

     data(LetterRecognition)

_F_o_r_m_a_t:

     A data frame with 20,000 observations on 17 variables: the first
     is a factor with levels A-Z and the remaining 16 are numeric.

        [,1]  lettr  capital letter
        [,2]  x.box  horizontal position of box
        [,3]  y.box  vertical position of box
        [,4]  width  width of box
        [,5]  high   height of box
        [,6]  onpix  total number of on pixels
        [,7]  x.bar  mean x of on pixels in box
        [,8]  y.bar  mean y of on pixels in box
        [,9]  x2bar  mean x variance
       [,10]  y2bar  mean y variance
       [,11]  xybar  mean x y correlation
       [,12]  x2ybr  mean of x^2 y
       [,13]  xy2br  mean of x y^2
       [,14]  x.ege  mean edge count left to right
       [,15]  xegvy  correlation of x.ege with y
       [,16]  y.ege  mean edge count bottom to top
       [,17]  yegvx  correlation of y.ege with x
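
     The layout listed above can be checked directly after loading the
     data.  The following is a small sketch, assuming the mlbench
     package is installed; it only inspects the data frame and makes
     no changes to it.

```r
# Assumption: the mlbench package is installed.
data(LetterRecognition, package = "mlbench")

# Overall structure: one factor column followed by 16 numeric columns.
str(LetterRecognition)

# The first column should be a factor with one level per capital letter.
is.factor(LetterRecognition$lettr)
nlevels(LetterRecognition$lettr)

# The 16 attributes are described as scaled to the range 0 through 15.
range(as.matrix(LetterRecognition[, -1]))
```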

_S_o_u_r_c_e:

        *  Creator: David J. Slate

        *  Odesta Corporation; 1890 Maple Ave; Suite 115; Evanston, IL
           60201

        *  Donor: David J. Slate (dave@math.nwu.edu) (708) 491-3867   

     These data have been taken from the UCI Repository Of Machine
     Learning Databases at

        *  <URL: ftp://ftp.ics.uci.edu/pub/machine-learning-databases>

        *  <URL: http://www.ics.uci.edu/~mlearn/MLRepository.html>

     and were converted to R format by
     Friedrich.Leisch@ci.tuwien.ac.at.

_R_e_f_e_r_e_n_c_e_s:

     P. W. Frey and D. J. Slate (1991): "Letter Recognition Using
     Holland-style Adaptive Classifiers".  Machine Learning, Vol. 6,
     No. 2, March 1991.

     The research for this article investigated the ability of several
     variations of Holland-style adaptive classifier systems to learn
     to correctly guess the letter categories associated with vectors
     of 16 integer attributes extracted from raster scan images of the
     letters. The best accuracy obtained was a little over 80%.  It
     would be interesting to see how well other methods do with the
     same data.
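
     As one way of trying another method on the same data, the sketch
     below fits a linear discriminant analysis with 'lda' from the
     MASS package and measures accuracy on the last 4000 rows.  The
     choice of LDA is an assumption made here purely for illustration;
     it is not the method used in the article, and no particular
     accuracy is claimed.  The mlbench and MASS packages are assumed
     to be installed.

```r
# Assumptions: mlbench and MASS are installed; LDA is an illustrative
# baseline only, not the classifier studied by Frey and Slate.
library(MASS)
data(LetterRecognition, package = "mlbench")

# Same split as in the description: first 16000 train, last 4000 test.
train <- LetterRecognition[1:16000, ]
test  <- LetterRecognition[16001:20000, ]

# Fit LDA using all 16 attributes to predict the letter.
fit <- lda(lettr ~ ., data = train)

# Predict letter classes for the held-out rows.
pred <- predict(fit, newdata = test)$class

# Proportion of held-out items classified correctly.
accuracy <- mean(pred == test$lettr)
accuracy

# Cross-tabulate predictions against the true letters.
confusion <- table(predicted = pred, actual = test$lettr)
```

     The 'confusion' table shows which letters are mistaken for one
     another, which is often more informative than the overall
     accuracy alone.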

