Registration is now open for two new edX-hosted online courses (MOOCs) to be taught by AMPLab’s Anthony Joseph and Ameet Talwalkar. Both feature AMPLab’s Spark system (now Apache Spark) and cover Big Data processing and scalable machine learning, respectively. Summaries and registration links are below.
MOOC led by Prof. Anthony Joseph, UC Berkeley
Start date: June 1, 2015
Organizations use their data for decision support and to build data-intensive products and services, such as recommendation, prediction, and diagnostic systems. The collection of skills required by organizations to support these functions has been grouped under the term Data Science. This course will attempt to articulate the expected output of Data Scientists and then teach students how to use PySpark (part of Apache Spark) to deliver against these expectations. The course assignments include Log Mining, Textual Entity Recognition, and Collaborative Filtering exercises that teach students how to manipulate data sets using parallel processing with PySpark.
MOOC led by Prof. Ameet Talwalkar, UCLA
Start date: June 29, 2015
This course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. You will gain hands-on experience applying these principles using Apache Spark, a cluster computing system well-suited for large-scale machine learning tasks. You will implement scalable algorithms for fundamental statistical models (linear regression, logistic regression, matrix factorization, principal component analysis) while tackling key problems from various domains: online advertising, personalized recommendation, and cognitive neuroscience.