{"id":376,"date":"2013-03-11T19:01:34","date_gmt":"2013-03-11T19:01:34","guid":{"rendered":"http:\/\/blopig.com\/blog\/?p=376"},"modified":"2013-03-11T19:01:34","modified_gmt":"2013-03-11T19:01:34","slug":"arrrrgh-or-how-to-apply-a-fitted-model-to-new-data","status":"publish","type":"post","link":"https:\/\/www.blopig.com\/blog\/2013\/03\/arrrrgh-or-how-to-apply-a-fitted-model-to-new-data\/","title":{"rendered":"aRrrrgh! or how to apply a fitted model to new data"},"content":{"rendered":"<p>Recently I&#8217;ve been battling furiously with R while analysing some loop modelling accuracy data. The idea was simple:<\/p>\n<ol>\n<li><span style=\"line-height: 1.714285714;font-size: 1rem\">Fit a general linear model to some data<\/span><\/li>\n<li><span style=\"line-height: 1.714285714;font-size: 1rem\">Get out a formula to predict a variable (let&#8217;s call it &#8220;accuracy&#8221;) based on some input parameters<\/span><\/li>\n<li><span style=\"line-height: 1.714285714;font-size: 1rem\">Apply this formula to new data and see how well the predictor does<\/span><\/li>\n<\/ol>\n<p>It turns out, it&#8217;s not that simple to actually implement. Fitting a general linear model in R produces coefficients in a vector.<\/p>\n<pre class=\"lang:r decode:true\">model &lt;- glm(accuracy ~ param1 + param2 * param3, data=trainingset)\r\ncoef(model)<\/pre>\n<pre class=\"lang:r highlight:0 decode:true\">            (Intercept)                  param1                  param2 \r\n            0.435395087            -0.093295388             0.148154339 \r\n                 param3           param2:param3\r\n            0.024399530             0.021100300<\/pre>\n<p>There seems to be no easy way to insert these coefficients into your formula and apply the resulting equation to new data. The only easy thing to do is to plot the fitted values against the variable we&#8217;re trying to predict, i.e. 
plot our predictions on the training set itself:<\/p>\n<pre class=\"lang:r decode:true\">plot(model$fitted.values, trainingset$accuracy, xlab=\"score\", ylab=\"accuracy\", main=\"training set\")<\/pre>\n<p>I&#8217;m sure there must be a better way of doing this, but many hours of Googling led me nowhere. So here is how I did it. I ended up writing my own parser function, which works only on very simple formulae using the + and * operators and without any R code inside the formula.<\/p>\n<pre class=\"lang:r decode:true\"># Evaluate the fitted formula for a single row of data\r\ncoefapply &lt;- function(coefficients, row)\r\n{\r\n  result &lt;- 0\r\n  for (i in seq_along(coefficients))\r\n  {\r\n    subresult &lt;- as.numeric(coefficients[i])\r\n    # Skip coefficients that could not be estimated (NA)\r\n    if (!is.na(subresult))\r\n    {\r\n      name &lt;- names(coefficients[i])\r\n      if (name != \"(Intercept)\")\r\n      {\r\n        # Interaction terms are named \"a:b\"; multiply by each variable in turn\r\n        subnames &lt;- strsplit(name, \":\", fixed=TRUE)[[1]]\r\n        for (n in subnames)\r\n        {\r\n          subresult &lt;- subresult * as.numeric(row[[n]])\r\n        }\r\n      }\r\n      result &lt;- result + subresult\r\n    }\r\n  }\r\n  return(result)\r\n}\r\n\r\n# Apply coefapply to every row of a data frame\r\ncalculate_scores &lt;- function(data, coefficients)\r\n{\r\n  scores &lt;- vector(mode=\"numeric\", length=nrow(data))\r\n  for (i in seq_len(nrow(data)))\r\n  {\r\n    row &lt;- data[i,]\r\n    scores[i] &lt;- coefapply(coefficients, row)\r\n  }\r\n  return(scores)\r\n}<\/pre>\n<p>Now we can apply our formula to a new dataset and plot the accuracy achieved on the new data:<\/p>\n<pre class=\"lang:r decode:true\">model_coef &lt;- coef(model)\r\n\r\n# Test if our scores are the same values as the model's fitted values\r\n# (fraction of rows that agree within a small tolerance; should be 1)\r\ntraining_scores &lt;- calculate_scores(trainingset, model_coef)\r\nsum(abs(training_scores - model$fitted.values) &lt; 0.000000000001) \/ length(training_scores)\r\n\r\n# Calculate scores for our test set and plot them\r\ntest_scores &lt;- calculate_scores(testset, model_coef)\r\nplot(test_scores, testset$accuracy, xlab=\"score\", ylab=\"accuracy\", main=\"test 
set\")<\/pre>\n<p>It works for my purpose. Maybe one day someone will see this post, chuckle, and then enlighten me with their perfectly simple and elegant alternative.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recently I&#8217;ve been battling furiously with R while analysing some loop modelling accuracy data. The idea was simple: Fit a general linear model to some data Get out a formula to predict a variable (let&#8217;s call it &#8220;accuracy&#8221;) based on some input parameters Apply this formula to new data and see how well the predictor [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","wikipediapreview_detectlinks":true,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"ngg_post_thumbnail":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[14,15],"tags":[19,18],"ppma_author":[501],"class_list":["post-376","post","type-post","status-publish","format-standard","hentry","category-howto","category-technical","tag-programming","tag-r"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"authors":[{"term_id":501,"user_id":2,"is_guest":0,"slug":"seb","display_name":"Sebastian 
Kelm","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/7b3bce7bd485f4cfaa499e250df856f941b4c972b74d8a89f4244f0a2595d15a?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/376","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/comments?post=376"}],"version-history":[{"count":11,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/376\/revisions"}],"predecessor-version":[{"id":388,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/posts\/376\/revisions\/388"}],"wp:attachment":[{"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/media?parent=376"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/categories?post=376"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/tags?post=376"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.blopig.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=376"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}