Course Information:
- IDS 561 - Analytics for Big Data
- Spring 2022
- Wed. 6pm - 8:[email protected] Center F #4
- yuhenghu at uic dot edu
- Office hours: by email
Overview
The “big data” paradigm has drawn significant attention in recent years as the costs of acquiring and storing data have plummeted. The bottleneck has instead shifted to fast, in-depth analysis. This shift has created its own set of problems, the most obvious being that large datasets are often computationally expensive to process. Algorithms that efficiently process data that fits in memory may become prohibitively expensive on larger datasets. Consequently, it can be difficult to gain insights from the underlying data.
This is an introductory course in big data analytics and data science. It has three main goals. First, it gives students an appreciation of the issues involved in doing data science on datasets that do not fit in main memory. Second, it provides working knowledge of, and experience with, some of the current distributed frameworks (e.g., Hadoop). Third, it gives students hands-on opportunities to implement solutions using real-world datasets.
Academic Integrity
You are expected to adhere to the highest standards of academic honesty. Unless otherwise specified, collaboration on assignments is not allowed; collaboration among team members is allowed only on the selected final term projects. Use of published materials is allowed, but the sources must be explicitly cited in your solutions. Violations will be reviewed and sanctioned according to the University Policy on Academic Integrity. "Academic integrity is the pursuit of scholarly activity free from fraud and deception and is an educational objective of this institution. Academic dishonesty includes, but is not limited to, cheating, plagiarizing, fabricating of information or citations, facilitating acts of academic dishonesty by others, having unauthorized possession of examinations, submitting work for another person or work previously used without informing the instructor, or tampering with the academic work of other students." For more information about violations of academic integrity and their consequences, consult http://vcsa.uic.edu/
Prerequisites
IDS 400/401 and IDS 572
Recommended textbooks
In-class survey
- about [email protected] class: click here
Weekly Schedule
Class Date | Topic | Assignment | Note |
Jan 12 | Introduction to Big Data Analytics | HW #0 Environment setup and testing | |
Jan 19 | Big data with MapReduce I | ||
Jan 26 | Big data with MapReduce II | HW #1 MapReduce using Python | Lab #1 on Jan 27th |
Feb 2 | Big data with Spark I | Pop-up quiz 1 | |
Feb 9 | Big data with Spark II | ||
Feb 16 | Real-time Streaming Computing Models | HW #2 Programming with Spark | HW #1 due, Lab #2 on Feb 17 |
Feb 23 | Query Processing on Big Data Platforms | Project proposal due | |
Mar 2 | Data Frame on Spark | Pop-up quiz 2 | |
Mar 9 | Distributed Data Store I: CAP theorem | HW #3 Data Processing using Spark | HW #2 due, Lab #3 on Mar 10 |
Mar 16 | Distributed Data Store II: NoSQL and Cloud Computing | Mid-term project progress report due | |
Mar 23 | No class (Spring Break) | ||
Mar 30 | Machine Learning for Big Data: Clustering | Pop-up quiz 3; HW #3 due (extended to Apr 3rd) | |
Apr 6 | Machine Learning for Big Data: Recommender Systems | HW #4 Recommender system in Spark | Lab #4 on Apr 7 |
Apr 13 | Machine Learning for Big Data: Graph analysis + Frequent Pattern Mining | ||
Apr 20 | Machine Learning for Big Data: Text Processing | HW #4 due, Pop-up quiz 4 | |
Apr 27 | Final Presentation | ||