Team Leader and Functional Director for Big Data Network Support, Dr Nathan Cunningham, ran an introductory course on Big Data and Statistics at Essex University’s Big Data and Analytics Summer School in August. The course provided an introduction to a variety of key concepts, including: MapReduce, Data Transformation (Apache Pig), Cascading data frameworks, Hadoop MapReduce, Data Classification (Mahout), Statistical Analysis for Massive Data Sets (bigmemory, biganalytics, big linear regression) and When to Build and When to Outsource. Find out more about what the Big Data Network Support team are working on in our Big Data Network Support area.
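MapReduce comes up several times in the course outline. The sketch below shows the idea in plain Python on a single machine, using a word count over a few lines of text. It runs three phases:

- a map step that emits (word, 1) pairs;
- a shuffle step that groups those pairs by key;
- a reduce step that sums each group.

This is an illustrative sketch only. The function names are hypothetical, and it is not how Hadoop, Pig or Cascading are actually invoked. Those frameworks distribute the same three phases across a cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input line
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all emitted values by key, as a framework would
    # do between the map and reduce phases
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values for a single key into one result
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Chain the three phases; here everything runs in one process
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data and statistics", "big data in R"]
    print(word_count(docs))
```

Because each map call looks at only one line and each reduce call looks at only one key, both phases can run in parallel on separate machines. That independence is what lets MapReduce scale to very large data sets.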
So Why R?
Although alternative tools exist, Python being one example, R has the advantage of being one of the few open-source programming languages built specifically for statistical analysis. It ships with its own built-in statistical algorithms. Beyond these, the sheer number of mathematical models and machine learning algorithms available in base R and third-party packages is staggering and continues to grow: the Comprehensive R Archive Network (CRAN) listed almost 7,000 add-on libraries as of August 2015.
Beyond its analytical and machine learning capabilities, R also supports interactive graphics through external packages. These include support for Google’s visualisation API, which provides access to structured data, visualises that data using JavaScript within web pages and enables the creation of gadgets. They also include support for JavaScript libraries such as D3.js.