mysqlDBApply             package:RMySQL             R Documentation

_A_p_p_l_y _R/_S-_P_l_u_s _f_u_n_c_t_i_o_n_s _t_o _r_e_m_o_t_e _g_r_o_u_p_s _o_f _D_B_M_S _r_o_w_s (_e_x_p_e_r_i_m_e_n_t_a_l)

_D_e_s_c_r_i_p_t_i_o_n:

     Applies R/S-Plus functions to groups of remote DBMS rows without
     bringing the entire result set into R at once.  The result set is
     expected to be sorted by the grouping field.

_U_s_a_g_e:

     mysqlDBApply(res, INDEX, FUN = stop("must specify FUN"), 
              begin = NULL, 
              group.begin =  NULL, 
              new.record = NULL, 
              end = NULL, 
              batchSize = 100, maxBatch = 1e6, 
              ..., simplify = TRUE)

_A_r_g_u_m_e_n_t_s:

     res: a result set (see 'dbSendQuery').

   INDEX: a character or integer specifying the field name or field
          number that defines the various groups.

     FUN: a function to be invoked upon identifying the last row of
          each group. This function will be passed a data frame
          holding the records of the current group, a character string
          with the group label, plus any other arguments passed to
          'dbApply' as '...'.

   begin: a function of no arguments to be invoked just before
          retrieving the first row from the result set.

     end: a function of no arguments to be invoked just after
          retrieving the last row from the result set.

group.begin: a function of one argument (the group label) to be 
          invoked upon identifying the first row of a new group.

new.record: a function to be invoked as each individual record is
          fetched.  The first argument to this function is a one-row
          data.frame holding the new record.

batchSize: the default number of rows to fetch from the remote result
          set at a time. If needed, this is automatically extended to
          hold groups bigger than 'batchSize'.

maxBatch: the absolute maximum number of rows per group that may be
          extracted from the result set.

     ...: any additional arguments to be passed to 'FUN'.

simplify: not yet implemented.

_D_e_t_a_i_l_s:

     'dbApply' is meant to handle large amounts of data from the DBMS
     somewhat gracefully by bringing manageable chunks into R (about
     'batchSize' records at a time, but not more than 'maxBatch').  The
     idea is that R can handle the data from any individual group, but
     not all the groups at the same time.

     The MySQL implementation 'mysqlDBApply' allows us to register R 
     functions that get invoked when certain fetching events occur.
     These include the ``begin'' event (no records have yet been
     fetched), ``group.begin'' (the record just fetched belongs to a
     new group), ``new.record'' (every fetched record generates this
     event), ``group.end'' (the record just fetched was the last row of
     the current group; this is when 'FUN' is invoked), and ``end''
     (the very last record from the result set has been fetched). Awk
     and perl programmers will find this paradigm very familiar
     (although SAP's ABAP language is closer to what we're doing).
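
     The event order described above can be imitated on a local data
     frame.  The following is a minimal sketch, not the RMySQL
     implementation: 'local.apply.sketch', 'log.event' and the data
     frame 'd' are hypothetical names, and firing the group-begin
     callback before the new-record callback on the first row of a
     group is an assumption.

          ## hypothetical helper: walk a sorted data frame row by row
          ## and fire the callbacks as each event is detected
          local.apply.sketch <- function(df, INDEX, FUN, begin = NULL,
                                         group.begin = NULL,
                                         new.record = NULL,
                                         end = NULL, ...) {
            out <- list()
            labels <- as.character(df[[INDEX]])
            n <- nrow(df)
            start <- 1
            if (!is.null(begin)) begin()
            for (i in seq_len(n)) {
              ## a group starts at row 1 or when the label changes
              if (i == 1 || labels[i] != labels[i - 1]) {
                start <- i
                if (!is.null(group.begin)) group.begin(labels[i])
              }
              if (!is.null(new.record)) new.record(df[i, , drop = FALSE])
              ## a group ends at row n or just before the label changes
              if (i == n || labels[i] != labels[i + 1]) {
                grp <- labels[i]
                out[[grp]] <- FUN(df[start:i, , drop = FALSE], grp, ...)
              }
            }
            if (!is.null(end)) end()
            out
          }

          ## made-up data, already sorted by the grouping field
          d <- data.frame(Agent = c("a", "a", "b"), DATA = c(1, 3, 10))
          events <- character(0)
          log.event <- function(...) events <<- c(events, paste(...))
          out <- local.apply.sketch(d, "Agent",
                   FUN = function(x, grp) {
                     log.event("group.end", grp)
                     mean(x$DATA)
                   },
                   begin = function() log.event("begin"),
                   group.begin = function(grp)
                     log.event("group.begin", grp),
                   end = function() log.event("end"))

     Afterwards 'events' lists the callbacks in the order they fired,
     and 'out' is a list with one element per group.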

_V_a_l_u_e:

     A list with as many elements as there were groups in the result
     set.

_N_o_t_e:

     This is an experimental version implemented only in R (there are
     plans, time permitting, to implement it in S-Plus).

     The terminology that we're using is closer to SQL than R.  In R,
     what we refer to as ``groups'' are the individual levels of a
     factor (the grouping field in our terminology).
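
     As a local analogy (a sketch on a made-up in-memory data frame
     'd', not a call into RMySQL), the grouping field plays the role
     that a factor plays in 'split':

          d <- data.frame(Agent = c("a", "a", "b"), DATA = c(1, 3, 10))
          ## one list element per level of the grouping field
          local.out <- lapply(split(d, d$Agent),
                              function(x) range(x$DATA))

     The difference is that 'split' needs all rows in memory at once,
     which is what 'dbApply' is meant to avoid.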

_S_e_e _A_l_s_o:

     'MySQL', 'dbSendQuery', 'fetch'.

_E_x_a_m_p_l_e_s:

     ## Not run: 
     ## compute quantiles for each network agent
     con <- dbConnect(MySQL(), group="vitalAnalysis")
     res <- dbSendQuery(con, 
                  "select Agent, ip_addr, DATA from pseudo_data order by Agent")
     out <- dbApply(res, INDEX = "Agent", 
             FUN = function(x, grp) quantile(x$DATA, names=FALSE))
     ## release the result set and close the connection
     dbClearResult(res)
     dbDisconnect(con)
     ## End(Not run)

