Sunday 27 January 2013

Data science updates

I have successfully completed the "Computing for Data Analysis" course on Coursera! Analyzing data sets has never been this fun! The R plotting libraries are pretty cool, and I especially like the different data visualization functions. I also have a soft spot for the lapply, sapply, and tapply functions; they remind me of the foldl and foldr functions in Haskell.
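To make the comparison concrete, here is a minimal sketch in base R, using toy data made up purely for illustration. Strictly speaking, the apply family is closer to Haskell's map, while base R's Reduce is the nearer relative of foldl and foldr, so the sketch shows both. The variable names are my own and not from the course.

```r
# Toy data, invented for this example only.
scores <- c(90, 75, 60, 85)
groups <- c("a", "b", "a", "b")

# lapply: apply a function to each element; always returns a list.
squares_list <- lapply(1:3, function(x) x^2)

# sapply: same idea, but simplifies the result to a vector when it can.
squares_vec <- sapply(1:3, function(x) x^2)

# tapply: split a vector by a grouping variable, then apply a function per group.
group_means <- tapply(scores, groups, mean)

# Reduce with the default direction folds from the left, like foldl:
# ((1 - 2) - 3) - 4
left_fold <- Reduce(function(acc, x) acc - x, 1:4)

# Reduce with right = TRUE folds from the right, like foldr:
# 1 - (2 - (3 - 4))
right_fold <- Reduce(function(x, acc) x - acc, 1:4, right = TRUE)

print(squares_vec)
print(group_means)
print(c(left_fold, right_fold))
```

Subtraction is used in the folds because it is not associative, so the left and right versions give different answers (-8 and -2), which makes the direction of the fold visible.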

It is time I got hold of some huge datasets (ones that can't be loaded into memory) and tried working with them in R. To this end, I recently requested the newly released "Click dataset" from Indiana University, which is about 2.5 TB of data (when compressed). Unfortunately, they denied my request: because of the "sensitive nature" of the data, I would have to be associated with a research institution. I do empathize with them. This is not a problem though, as there are plenty of large datasets out there.

I have also enrolled in the "Data analysis" course on Coursera as a follow-up. Here are some of my short-term goals with respect to data analysis in R:
  1. Experiment with the machine learning libraries in R
  2. Participate in a Kaggle competition
  3. Perform object oriented programming in R
  4. Visualize huge datasets
  5. Take more data analysis courses on Coursera

1 comment:

  1. That's pretty cool! I am going through the Data Science specialization on Coursera at the moment myself. R is definitely a really powerful tool.
