r - How can I efficiently generate a data frame of simulated values?


I am trying to create a data frame of simulated values based on existing distribution parameters. My main data frame contains the mean and standard deviation for each observation, for example:

    example.data <- data.frame(country = c("a", "b", "c"),
                               score_mean = c(0.5, 0.4, 0.6),
                               score_sd = c(0.1, 0.1, 0.2))
    #   country score_mean score_sd
    # 1       a        0.5      0.1
    # 2       b        0.4      0.1
    # 3       c        0.6      0.2

I usually draw from a normal distribution with the score_mean and score_sd parameters using sapply() and a custom function:

    # mean of 100 normal draws for one observation
    score.simulate <- function(score.mean, score.sd) {
      return(mean(rnorm(100, mean = score.mean, sd = score.sd)))
    }
    simulated.scores <- sapply(example.data$score_mean, FUN = score.simulate,
                               score.sd = example.data$score_sd)
    # [1] 0.4936432 0.3753853 0.6267956

This produces one round (or column) of simulated values. However, I want to generate many columns (say 100 or 1,000). To do this I have to wrap my sapply() call in an anonymous function inside lapply(), and then convert the resulting list to a data frame with ldply() from plyr:

    results.list <- lapply(1:5, FUN = function(x) sapply(example.data$score_mean, FUN = score.simulate, score.sd = example.data$score_sd))
    library(plyr)
    simulated.scores <- as.data.frame(t(ldply(results.list)))
    #           V1        V2        V3        V4        V5
    # V1 0.5047807 0.4902808 0.4857900 0.5008957 0.4993375
    # V2 0.3996402 0.4128029 0.3875678 0.4044486 0.3982045
    # V3 0.6017469 0.6055446 0.6058766 0.5894703 0.5960403

This works, but (1) it looks really convoluted, especially the as.data.frame(t(ldply(lapply(... FUN = function(x) sapply ...)))) construction, and (2) it is really slow with many iterations or larger data: my actual dataset has 3,000 rows, and 1,000 iterations take 1 to 2 minutes.
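Within base R, the lapply() / ldply() / t() chain can be collapsed with replicate(), which evaluates an expression a given number of times and simplifies the results into a matrix with one column per replicate. A minimal sketch, assuming 5 replicates (n.sims and sim.matrix are hypothetical names). It also swaps sapply() for mapply(), on the assumption that each row should use its own sd: in the sapply() call above, score.sd = example.data$score_sd passes the whole sd vector to every call, so rnorm() recycles all three sds within each row's 100 draws.

```r
example.data <- data.frame(country = c("a", "b", "c"),
                           score_mean = c(0.5, 0.4, 0.6),
                           score_sd = c(0.1, 0.1, 0.2))

# mean of 100 normal draws for one (mean, sd) pair
score.simulate <- function(score.mean, score.sd) {
  mean(rnorm(100, mean = score.mean, sd = score.sd))
}

n.sims <- 5  # number of simulated columns (assumed value)

# mapply() pairs score_mean[i] with score_sd[i]; replicate() repeats that
# n.sims times and binds the length-3 results as columns of a 3 x n.sims matrix
sim.matrix <- replicate(n.sims,
                        mapply(score.simulate,
                               example.data$score_mean,
                               example.data$score_sd))

# a matrix without column names becomes a data frame with columns V1..V5
simulated.scores <- as.data.frame(sim.matrix)
```

This removes the plyr dependency and the transpose, but it still calls rnorm() once per row per replicate, so on its own it does not address the speed problem.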

Is there a more efficient way to create a data frame of simulated values?

The fastest way I can think of is to take advantage of the vectorisation built into rnorm. Both the mean and sd arguments are vectorised, but you can only supply a single integer for the number of draws. If you supply vectors for the mean and sd arguments, R will recycle through them until it has generated the required number of draws. Therefore, simply make the n argument to rnorm a multiple of the length of your mean vector, where the multiplier is the number of copies you want for each row of your data frame. In the function below, that multiplier is n.
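The recycling described above can be made visible with a small sketch. It sets sd = 0 purely for illustration (an assumption, not part of the answer), because rnorm() then returns each mean exactly, so the order of the draws shows how the mean vector is cycled; means, n.copies and draw.matrix are hypothetical names.

```r
means <- c(1, 10, 100)
n.copies <- 2

# n is a multiple of length(means), so each mean is used exactly n.copies times;
# with sd = 0 every draw equals its mean
draws <- rnorm(n.copies * length(means), mean = means, sd = 0)
# draws is c(1, 10, 100, 1, 10, 100)

# filling the matrix column by column puts one copy of the data in each column
draw.matrix <- matrix(draws, ncol = n.copies, byrow = FALSE)
```

Because matrix() fills column-major, row i of draw.matrix always corresponds to means[i], which is what lets the answer reshape a single rnorm() call into one column per copy.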

I cannot think of a faster way using base::rnorm on its own.

Worked example

    # example data
    df <- data.frame(country = c("a", "b", "c"), mean = c(1, 10, 100), sd = c(1, 2, 10))

    # function that returns a matrix and takes column vectors as arguments for mean and sd
    normv <- function(n, mean, sd) {
      out <- rnorm(n * length(mean), mean = mean, sd = sd)
      return(matrix(out, ncol = n, byrow = FALSE))
    }

    # reproducible results (note the order of magnitude of the rows matches the input sample data)
    set.seed(1)
    normv(5, df$mean, df$sd)
    #            [,1]      [,2]       [,3]        [,4]        [,5]
    # [1,]  0.3735462  2.595281   1.487429   0.6946116   0.3787594
    # [2,] 10.3672866 10.659016  11.476649  13.0235623   5.5706002
    # [3,] 91.6437139 91.795316 105.757814 103.8984324 111.2493092
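normv() returns a matrix, while the question asks for a data frame. A minimal sketch of the last step, assuming 1,000 copies per row and that the country labels should be kept alongside the draws (sims and simulated.df are hypothetical names):

```r
df <- data.frame(country = c("a", "b", "c"), mean = c(1, 10, 100), sd = c(1, 2, 10))

# one rnorm() call for all draws; column j holds the j-th copy of every row
normv <- function(n, mean, sd) {
  out <- rnorm(n * length(mean), mean = mean, sd = sd)
  matrix(out, ncol = n, byrow = FALSE)
}

set.seed(1)
sims <- normv(1000, df$mean, df$sd)

# bind the labels to the draws; unnamed matrix columns become X1, X2, ...
simulated.df <- data.frame(country = df$country, sims)
```

All 3,000 draws come from a single rnorm() call, which is where the speed-up over the per-row sapply() loop comes from. Note that each cell here is a single draw, whereas the question's score.simulate() returns the mean of 100 draws.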
