Soybean               package:mlbench               R Documentation

_S_o_y_b_e_a_n _D_a_t_a_b_a_s_e

_D_e_s_c_r_i_p_t_i_o_n:

     The data contain 19 classes, only the first 15 of which have been
     used in prior work.  The folklore seems to be that the last four
     classes are unjustified by the data since they have so few
     examples.  There are 35 categorical attributes, some nominal and
     some ordered.  The value "dna" means "does not apply".  Attribute
     values are encoded numerically, with the first value encoded as
     "0", the second as "1", and so forth.
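
     Since the smallest classes are said to have too few examples, one
     way to set them aside is to filter by class frequency.  The
     following is a minimal sketch, not part of mlbench: the helper
     name drop_rare_classes, its min_n threshold, and the toy data
     frame are all hypothetical and chosen only for illustration.

```r
# Hypothetical helper: keep only rows whose class has at least `min_n` examples.
drop_rare_classes <- function(df, min_n, class_col = "Class") {
  counts <- table(df[[class_col]])
  keep <- names(counts)[counts >= min_n]
  out <- df[df[[class_col]] %in% keep, , drop = FALSE]
  # Remove factor levels that no longer occur after filtering
  if (is.factor(out[[class_col]])) {
    out[[class_col]] <- droplevels(out[[class_col]])
  }
  out
}

# Toy data for illustration only (these are not the real Soybean counts)
toy <- data.frame(
  Class = factor(c(rep("a", 5), rep("b", 4), "c")),
  x = 1:10
)
kept <- drop_rare_classes(toy, min_n = 3)
print(table(kept$Class))
```

     Applied to the Soybean data frame, a suitable min_n would have to
     be chosen by inspecting table(Soybean$Class) first.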

_U_s_a_g_e:

     data(Soybean)
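
     A slightly fuller sketch of loading and inspecting the data,
     assuming the mlbench package is installed (the block is skipped
     otherwise).  The particular checks shown are only suggestions.

```r
# Assumes the mlbench package is installed; does nothing otherwise.
if (requireNamespace("mlbench", quietly = TRUE)) {
  data(Soybean, package = "mlbench")
  print(dim(Soybean))                    # observations and variables
  print(length(unique(Soybean$Class)))   # number of distinct classes
  print(table(Soybean$Class))            # examples per class
  print(colSums(is.na(Soybean)))         # missing values per variable
}
```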

_F_o_r_m_a_t:

     A data frame with 683 observations on 36 variables: 35
     categorical attributes, all encoded numerically, and one nominal
     variable denoting the class.

       [,1]   Class            the 19 classes
       [,2]   date             apr(0),may(1),june(2),july(3),aug(4),sept(5),oct(6).
       [,3]   plant.stand      normal(0),lt-normal(1).
       [,4]   precip           lt-norm(0),norm(1),gt-norm(2).
       [,5]   temp             lt-norm(0),norm(1),gt-norm(2).
       [,6]   hail             yes(0),no(1).
       [,7]   crop.hist        dif-lst-yr(0),s-l-y(1),s-l-2-y(2),s-l-7-y(3).
       [,8]   area.dam         scatter(0),low-area(1),upper-ar(2),whole-field(3).
       [,9]   sever            minor(0),pot-severe(1),severe(2).
       [,10]  seed.tmt         none(0),fungicide(1),other(2).
       [,11]  germ             90-100%(0),80-89%(1),lt-80%(2).
       [,12]  plant.growth     norm(0),abnorm(1).
       [,13]  leaves           norm(0),abnorm(1).
       [,14]  leaf.halo        absent(0),yellow-halos(1),no-yellow-halos(2).
       [,15]  leaf.marg        w-s-marg(0),no-w-s-marg(1),dna(2).
       [,16]  leaf.size        lt-1/8(0),gt-1/8(1),dna(2).
       [,17]  leaf.shread      absent(0),present(1).
       [,18]  leaf.malf        absent(0),present(1).
       [,19]  leaf.mild        absent(0),upper-surf(1),lower-surf(2).
       [,20]  stem             norm(0),abnorm(1).
       [,21]  lodging          yes(0),no(1).
       [,22]  stem.cankers     absent(0),below-soil(1),above-s(2),ab-sec-nde(3).
       [,23]  canker.lesion    dna(0),brown(1),dk-brown-blk(2),tan(3).
       [,24]  fruiting.bodies  absent(0),present(1).
       [,25]  ext.decay        absent(0),firm-and-dry(1),watery(2).
       [,26]  mycelium         absent(0),present(1).
       [,27]  int.discolor     none(0),brown(1),black(2).
       [,28]  sclerotia        absent(0),present(1).
       [,29]  fruit.pods       norm(0),diseased(1),few-present(2),dna(3).
       [,30]  fruit.spots      absent(0),col(1),br-w/blk-speck(2),distort(3),dna(4).
       [,31]  seed             norm(0),abnorm(1).
       [,32]  mold.growth      absent(0),present(1).
       [,33]  seed.discolor    absent(0),present(1).
       [,34]  seed.size        norm(0),lt-norm(1).
       [,35]  shriveling       absent(0),present(1).
       [,36]  roots            norm(0),rotted(1),galls-cysts(2).
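
     The codes in the table above can be mapped back to readable
     labels.  The following is a minimal sketch in base R: the helper
     decode_codes is hypothetical (it is not provided by mlbench), and
     it assumes the codes are stored either as numbers or as factor
     levels "0", "1", and so on.

```r
# Labels for `date`, copied from the table above;
# the label at position i corresponds to code i - 1.
date_labels <- c("apr", "may", "june", "july", "aug", "sept", "oct")

# Hypothetical helper: turn zero-based codes into a labeled factor.
decode_codes <- function(codes, labels, ordered = FALSE) {
  # Go through character first so that factor input ("0", "1", ...)
  # is read by its level names, not by internal integer positions.
  codes <- as.integer(as.character(codes))
  factor(codes, levels = seq_along(labels) - 1L,
         labels = labels, ordered = ordered)
}

decoded <- decode_codes(c(0, 3, 6, NA), date_labels)
print(decoded)
```

     Missing values stay missing, and ordered = TRUE can be passed for
     attributes that are treated as ordered.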

_S_o_u_r_c_e:

        *  Source: R.S. Michalski and R.L. Chilausky "Learning by Being
           Told and Learning from Examples: An Experimental Comparison
           of the Two Methods of Knowledge Acquisition in the Context
           of Developing an Expert System for Soybean Disease
           Diagnosis", International Journal of Policy Analysis and
           Information Systems, Vol. 4, No. 2, 1980.

        *  Donor: Ming Tan & Jeff Schlimmer (Jeff.Schlimmer%cs.cmu.edu)

     These data have been taken from the UCI Repository Of Machine
     Learning Databases at

        *  <URL: ftp://ftp.ics.uci.edu/pub/machine-learning-databases>

        *  <URL: http://www.ics.uci.edu/~mlearn/MLRepository.html>

     and were converted to R format by
     Evgenia.Dimitriadou@ci.tuwien.ac.at.

_R_e_f_e_r_e_n_c_e_s:

     Tan, M., & Eshelman, L. (1988). Using weighted networks to
     represent classification knowledge in noisy domains. Proceedings
     of the Fifth International Conference on Machine Learning (pp.
     121-134). Ann Arbor, Michigan: Morgan Kaufmann. Notes: IWN
     recorded a 97.1% classification accuracy; 290 training and 340
     test instances.

     Fisher, D.H., & Schlimmer, J.C. (1988). Concept simplification
     and predictive accuracy. Proceedings of the Fifth International
     Conference on Machine Learning (pp. 22-28). Ann Arbor, Michigan:
     Morgan Kaufmann. Notes: explains why this database is highly
     predictable.

