BreastCancer             package:mlbench             R Documentation

_W_i_s_c_o_n_s_i_n _B_r_e_a_s_t _C_a_n_c_e_r _D_a_t_a_b_a_s_e

_D_e_s_c_r_i_p_t_i_o_n:

     The objective is to identify each of a number of benign or
     malignant classes. Samples arrive periodically as Dr. Wolberg
     reports his clinical cases. The database therefore reflects this
     chronological grouping of the data.  This grouping information
     appears immediately below, having been removed from the data
     itself.  Each variable except for the first was converted into 11
     primitive numerical attributes with values ranging from 0 through
     10.  There are 16 missing attribute values. See the references
     cited below for more details.
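
     The missing attribute values mentioned above can be located with a
     per-column count. The following is a minimal sketch: the helper
     name 'count_missing' and the toy data frame are made up for
     illustration and are not part of mlbench.

```r
# Hypothetical helper: number of NA entries in each column of a data frame.
count_missing <- function(df) {
  vapply(df, function(col) sum(is.na(col)), integer(1))
}

# Small made-up data frame standing in for the real data.
toy <- data.frame(
  Id          = c("a", "b", "c"),
  Bare.nuclei = factor(c("1", NA, "10")),
  Class       = factor(c("benign", "malignant", "benign")),
  stringsAsFactors = FALSE
)

missing_per_column <- count_missing(toy)
print(missing_per_column)
```

     Once the data set has been loaded, 'count_missing(BreastCancer)'
     applies the same helper to the real data frame.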

_U_s_a_g_e:

     data(BreastCancer)
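
     As a short illustration of the call above, the data set can be
     loaded and inspected as follows. This sketch assumes that the
     mlbench package is installed and does nothing otherwise.

```r
# Assumes the mlbench package is installed; the block is skipped otherwise.
if (requireNamespace("mlbench", quietly = TRUE)) {
  # Load into the current environment (explicit, so it also works inside functions).
  data(BreastCancer, package = "mlbench", envir = environment())
  print(dim(BreastCancer))          # observations and variables
  print(table(BreastCancer$Class))  # counts per class
  print(head(BreastCancer, 3))      # first few rows
}
```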

_F_o_r_m_a_t:

     A data frame with 699 observations on 11 variables, one being a
     character variable, 9 being ordered or nominal, and 1 target
     class.

       [,1]   Id               Sample code number
       [,2]   Cl.thickness     Clump Thickness
       [,3]   Cell.size        Uniformity of Cell Size
       [,4]   Cell.shape       Uniformity of Cell Shape
       [,5]   Marg.adhesion    Marginal Adhesion
       [,6]   Epith.c.size     Single Epithelial Cell Size
       [,7]   Bare.nuclei      Bare Nuclei
       [,8]   Bl.cromatin      Bland Chromatin
       [,9]   Normal.nucleoli  Normal Nucleoli
       [,10]  Mitoses          Mitoses
       [,11]  Class            Class
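
     Because the nine predictors are stored as ordered or nominal
     factors, many numerical methods need them converted first. The
     sketch below shows one way to do this; the helper name
     'factors_to_numeric' and the toy data are hypothetical, and the
     conversion assumes that the factor levels are digit strings.

```r
# Hypothetical helper: turn factor columns whose levels are digit strings
# into numeric columns, leaving the columns named in `keep` untouched.
factors_to_numeric <- function(df, keep = c("Id", "Class")) {
  cols <- setdiff(names(df), keep)
  # as.character() first, so the level labels (not the internal codes) are used
  df[cols] <- lapply(df[cols], function(x) as.numeric(as.character(x)))
  df
}

# Made-up rows that mimic the column types described above.
toy <- data.frame(
  Id           = c("s1", "s2", "s3"),
  Cl.thickness = factor(c("5", "10", "1"), levels = as.character(1:10),
                        ordered = TRUE),
  Bare.nuclei  = factor(c("1", NA, "10"), levels = as.character(1:10)),
  Class        = factor(c("benign", "malignant", "benign")),
  stringsAsFactors = FALSE
)

num <- factors_to_numeric(toy)
str(num)
```

     Calling 'as.numeric()' directly on a factor would return the
     internal level codes, which is why the labels are extracted with
     'as.character()' first.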

_S_o_u_r_c_e:

        *  Creator: Dr. William H. Wolberg (physician), University of
           Wisconsin Hospital, Madison, Wisconsin, USA

        *  Donor: Olvi Mangasarian (mangasarian@cs.wisc.edu)

        *  Received: David W. Aha (aha@cs.jhu.edu)

     These data have been taken from the UCI Repository Of Machine
     Learning Databases at

        *  <URL: ftp://ftp.ics.uci.edu/pub/machine-learning-databases>

        *  <URL: http://www.ics.uci.edu/~mlearn/MLRepository.html>

     and were converted to R format by
     Evgenia.Dimitriadou@ci.tuwien.ac.at.

_R_e_f_e_r_e_n_c_e_s:

     1. Wolberg, W.H., & Mangasarian, O.L. (1990). Multisurface method of
     pattern separation for medical diagnosis applied to breast
     cytology. In Proceedings of the National Academy of Sciences, 87,
     9193-9196.
      - Size of data set: only 369 instances (at that point in time)
      - Collected classification results: 1 trial only
      - Two pairs of parallel hyperplanes were found to be consistent
        with 50% of the data
      - Accuracy on remaining 50% of dataset: 93.5%
      - Three pairs of parallel hyperplanes were found to be consistent
        with 67% of data
      - Accuracy on remaining 33% of dataset: 95.9%

     2. Zhang, J. (1992). Selecting typical instances in instance-based
     learning.  In Proceedings of the Ninth International Machine
     Learning Conference (pp. 470-479).  Aberdeen, Scotland: Morgan
     Kaufmann.
      - Size of data set: only 369 instances (at that point in time)
      - Applied 4 instance-based learning algorithms
      - Collected classification results averaged over 10 trials
      - Best accuracy result:
         - 1-nearest neighbor: 93.7%
         - trained on 200 instances, tested on the other 169
      - Also of interest:
         - Using only typical instances: 92.2% (storing only 23.1
           instances)
         - trained on 200 instances, tested on the other 169
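
     The 1-nearest-neighbor evaluation summarized above (train on one
     subset, test on the rest) can be sketched in base R as follows.
     This is an illustration only: the helper 'predict_1nn', the
     Euclidean distance, and the made-up data are assumptions, and the
     sketch does not reproduce the setup or the results of the cited
     paper.

```r
# Hypothetical helper: 1-nearest-neighbour prediction with Euclidean distance.
# `train_x` and `test_x` are numeric matrices with one row per instance;
# `train_y` holds the training labels.
predict_1nn <- function(train_x, train_y, test_x) {
  nearest <- apply(test_x, 1, function(row) {
    d2 <- colSums((t(train_x) - row)^2)  # squared distance to every training row
    which.min(d2)                        # index of the closest training row
  })
  train_y[nearest]
}

# Made-up two-class data, not the breast cancer measurements.
train_x <- rbind(c(1, 1), c(2, 1), c(9, 8), c(10, 10))
train_y <- factor(c("benign", "benign", "malignant", "malignant"))
test_x  <- rbind(c(1, 2), c(8, 9))
test_y  <- factor(c("benign", "malignant"))

pred     <- predict_1nn(train_x, train_y, test_x)
accuracy <- mean(pred == test_y)  # fraction of test instances classified correctly
print(accuracy)
```

     Applying this to the real data would first require numeric
     predictors and a decision on how to treat the rows with missing
     values.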

