COMPX523-22A (HAM)

Data Stream Mining

15 Points

Division of Health Engineering Computing & Science
School of Computing and Mathematical Sciences
Department of Computer Science

Staff

Convenor(s)

Lecturer(s)

Administrator(s): maria.admiraal@waikato.ac.nz

Placement/WIL Coordinator(s)

Tutor(s)

Student Representative(s)

Lab Technician(s)

Librarian(s): alistair.lamb@waikato.ac.nz

You can contact staff by:

  • Calling +64 7 838 4466, selecting option 1, then entering the extension.
  • Extensions starting with 4, 5, 9 or 3 can also be direct dialled:
    • For extensions starting with 4: dial +64 7 838 extension.
    • For extensions starting with 5: dial +64 7 858 extension.
    • For extensions starting with 9: dial +64 7 837 extension.
    • For extensions starting with 3: dial +64 7 2620 + the last 3 digits of the extension e.g. 3123 = +64 7 262 0123.

Paper Description

Machine learning algorithms leverage data to mimic intelligent behaviour for specific tasks, ranging from detecting anomalies in industrial processes to recommending movies. Recent technological advances have enabled efficient transfer, storage, and processing of data. These advances have also affected machine learning algorithms by allowing ever-growing model complexity, which sometimes yields higher accuracy.
However, to learn effectively from fast data, it is also essential to account for changes that may have catastrophic effects on the machine learning algorithms. For example, an algorithm may flag legitimate credit card usage as fraud because the users' behaviour has changed.
The reason behind these changes may not be accessible to the algorithm, making it unable to react to them immediately. For example, online credit card usage patterns may change because customers cannot leave their homes.
This paper combines theoretical and practical aspects of machine learning for data streams. We present examples using the methods implemented in the Massive Online Analysis (MOA) framework and the river (Python) framework.
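
As an illustration of the kind of change described above, the following minimal sketch (plain Python, not the MOA or river API) watches a stream of 0/1 prediction errors from a hypothetical classifier and flags a possible change when the error rate over the most recent predictions rises well above the long-run error rate. The function name `first_drift_index`, the window size, and the threshold are assumptions chosen for illustration only; established drift detectors rely on statistically grounded tests, not a fixed threshold.

```python
from collections import deque


def first_drift_index(errors, window=50, threshold=0.3):
    """Return the index at which a possible change is first flagged, or None.

    errors: iterable of 0/1 values, where 1 means the prediction was wrong.
    A change is flagged when the error rate over the last `window`
    predictions exceeds the error rate over the whole stream so far
    by more than `threshold` (both values are illustrative assumptions).
    """
    recent = deque(maxlen=window)  # sliding window of the latest errors
    total_errors = 0
    for i, error in enumerate(errors):
        recent.append(error)
        total_errors += error
        if len(recent) < window:
            continue  # not enough data for a windowed estimate yet
        recent_rate = sum(recent) / window
        overall_rate = total_errors / (i + 1)
        if recent_rate - overall_rate > threshold:
            return i
    return None


# Hypothetical stream: the classifier is always right for 200 predictions,
# then user behaviour changes and it is always wrong.
stream = [0] * 200 + [1] * 200
print(first_drift_index(stream))
print(first_drift_index([0] * 400))
```

In this toy stream the change is flagged shortly after position 200, once enough errors have accumulated in the window, while the stable stream is never flagged.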

The learning outcomes for this paper are linked to Washington Accord graduate attributes WA1-WA11. Explanation of the graduate attributes can be found at: https://www.ieagreements.org/

Paper Structure

Class attendance is expected. The course notes provided are not comprehensive; additional material will be covered in class. You are responsible for all material covered in class.

Learning Outcomes

Students who successfully complete the paper should be able to:

  • Select and apply appropriate algorithms for data stream mining problems (WA2, WA3, WA4)
  • Design and implement new algorithms in a data stream mining framework like MOA, river, or similar (WA1, WA3, WA5)
  • Compare and evaluate different algorithms/solutions for a problem and summarize their findings in a report (WA4)

Assessment

"Samples of your work may be required as part of the Engineering New Zealand accreditation process for BE(Hons) degrees. Any samples taken will have the student name and ID redacted. If you do not want samples of your work collected then please email the engineering administrator, Natalie Shaw (natalie.shaw@waikato.ac.nz), to opt out."

Assessment Components

The internal assessment/exam ratio (as stated in the University Calendar) is 100:0. There is no final exam.


Component Description                          Due Date     Time      Percentage of overall mark   Submission Method               Compulsory
1. Assignment 1 - coding and experimentation   8 Apr 2022   11:30 PM  20                           Online: Submit through Moodle
2. Assignment 2 - coding and experimentation   6 May 2022   11:30 PM  20                           Online: Submit through Moodle
3. Assignment 3 - implementation               27 May 2022  11:30 PM  30                           Online: Submit through Moodle
4. Assignment 4 - report                       10 Jun 2022  11:30 PM  30                           Online: Submit through Moodle
Assessment Total:                                                     100
Failing to complete a compulsory assessment component of a paper will result in an IC grade

Required and Recommended Readings

Required Readings

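Bifet, Albert, Ricard Gavaldà, Geoff Holmes, and Bernhard Pfahringer. Machine learning for data streams: with practical examples in MOA. MIT Press, 2018. https://mitpress.mit.edu/books/machine-learning-data-streams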

There is an official online version available at https://moa.cms.waikato.ac.nz/book/

Recommended Readings

Gama, Joao. Knowledge discovery from data streams. CRC Press, 2010.

Online Support

Moodle is used for this paper.

Workload

About 150 hours in total, including lectures, reading time, and assignments.

Linkages to Other Papers

Three 300 level Computer Science papers, including COMP321 Practical Data Mining or COMP316 Artificial Intelligence Techniques and Applications.
Corresponding Papers: COMP523 Data Stream Mining

Prerequisite(s)

Prerequisite papers: COMPX305 or COMPX310 or COMP316 or COMP321 and a further 30 points at 300 level in Computer Science

Corequisite(s)

Equivalent(s)

Restriction(s)

Restricted papers: COMP423, COMP523
