7072CEM Assignment Help
Machine Learning Assignment Help
During this module, you learned about different machine learning techniques, their associated concepts and their applications. We explored a number of classification algorithms, such as Generalized Logistic Regression, Linear Discriminant Analysis, Optimized K-nearest Neighbour, Bayesian and Statistical Methods, Support Vector Machines and Decision Trees. We also covered clustering algorithms, such as K-means, and feature selection and extraction methods, such as PCA. In this assignment, you will select an application related to a classification, clustering or anomaly detection problem and explore how best to apply machine learning algorithms to solve it. You are free to choose any of the following datasets (or another one agreed with your tutor in advance) and must apply at least THREE classification or clustering techniques to it.
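As an illustration of what "apply at least THREE techniques" can look like in practice, the sketch below compares three of the module's classifiers (logistic regression, k-nearest neighbour and a decision tree) under the same 5-fold stratified cross-validation. It assumes scikit-learn is installed, and it uses a synthetic dataset from `make_classification` purely as a placeholder, since the lab datasets and Kaggle datasets are not allowed; the model settings and the macro-F1 metric are illustrative choices, not requirements of the brief.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Placeholder data only: substitute your own approved dataset here.
X, y = make_classification(n_samples=500, n_features=20, n_informative=8, random_state=0)

# Three techniques from the module. Scaling sits inside each pipeline so it is
# refitted on the training folds only (no leakage into the validation fold).
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Identical, shuffled, stratified folds for every model, so scores are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: macro-F1 = {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the mean and standard deviation over folds, rather than a single split, gives the paper a fairer basis for comparing the three methods.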
SOURCES OF DATASETS
- Bags of Words (Classification)
- Daily and Sports Activities Dataset (Classification and Clustering)
- Dresses Attribute Sales Dataset (Regression)
- Ensemble classifier for citation function classification (including citation importance classification as a special case of citation function classification) (provided by Dr. Xiaorui Jiang)
A wide range of datasets suitable for the CW are available here:
- Or another dataset of your choice (make sure the techniques you apply suit the dataset)
Must-Read Notice 1: The example datasets used in the labs are NOT allowed.
Must-Read Notice 2: Kaggle datasets are NOT allowed! Using one will result in a mark of ZERO.
You may choose from and combine the algorithms mentioned above, or you may use or develop a new classification or clustering algorithm.
The purpose of this coursework is to:
- Examine the fundamental concepts of machine learning, their implementation and application.
- Perform appropriate preparation of a dataset and evaluate the performance of different learning algorithms on this dataset.
- Gain practical experience in selecting machine learning algorithms for solving a real-life classification or clustering problem.
You will be required to:
- Work and write up individually. Working in groups is NOT ALLOWED.
- Actively participate in all activities.
- You are welcome to submit progress on your work regularly to receive formative feedback and improve your final submission.
- Must-Read Notice: Before you start, READ the four samples at the bottom of the “Module Essentials >> Assessments” page, and use these samples to guide your own report writing. This is critical for understanding the elements and requirements of the CW.
Your final submission will consist of two scientific outputs:
- A “scientific paper” of up to 6 A4 pages (written individually, not as a group), based on the experience and results gained during the project work.
- A viva video in which you present your dataset preparation, data wrangling, model training and testing, a demonstration of running the pipeline (especially producing prediction outputs from your model), model evaluation, etc.
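For the viva demonstration of "producing prediction outputs from your model", a minimal hold-out workflow might look like the sketch below. It assumes scikit-learn; the synthetic three-class data from `make_classification`, the RBF-kernel SVM and the 75/25 split are placeholder choices for illustration, not settings required by the brief.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder three-class data: replace with your own prepared dataset.
X, y = make_classification(n_samples=400, n_features=15, n_informative=6,
                           n_classes=3, random_state=1)

# Hold out 25% of the data as an untouched test set, preserving class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1)

# Train: scaler and SVM are fitted on the training split only.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)

# Predict: these are the outputs you would show running in the video.
y_pred = model.predict(X_test)
for true_label, predicted in zip(y_test[:5], y_pred[:5]):
    print(f"true={true_label} predicted={predicted}")

# Evaluate: per-class precision/recall/F1 plus the confusion matrix.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred, digits=3))
```

Keeping the test split untouched until this final step means the reported figures reflect performance on data the model never saw during training or tuning.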
You are encouraged to target a specific conference or journal and submit the proposed paper to it. Submission guidelines can be found on the web page of the conference or journal you choose.
List of reputed conferences and journals:
- IJCNN Conference
- NeurIPS Conference
- International Conference on Machine Learning (ICML)
- Machine Learning Journal
- Neural Networks Journal
- Others (please let us know)
The paper should broadly include the following sections:
- Introduction (where you introduce the problem along with a short literature review of related work; a longer literature review is better placed in a section of its own)
- Problem and Dataset(s) Description (where you describe in detail the problem you want to solve and its significance)
- Methods (where you briefly describe the machine learning methods and/or other methods employed to solve the problem)
- Experimental setup (including data pre-processing, feature selection and extraction, classification/clustering parameters)
- Discussion and Conclusions
These are generic section titles, which you may adapt appropriately to the application/problem that is being investigated. You may include sections describing modifications of algorithms or developments that are novel and specific to your work.
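For a clustering project, the "Experimental setup" section typically needs to justify the pre-processing, the feature extraction and the chosen clustering parameters. The sketch below is one hedged way to do that, assuming scikit-learn: it standardises the features, keeps enough PCA components to explain 95% of the variance, and selects the number of K-means clusters by silhouette score. The `make_blobs` data, the 95% threshold and the candidate range k = 2..7 are illustrative assumptions; silhouette is only one of several possible selection criteria.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Placeholder unlabelled data: replace with your own prepared feature matrix.
X, _ = make_blobs(n_samples=600, n_features=10, centers=4, random_state=0)

# Pre-processing: put every feature on a comparable scale before PCA and K-means.
X_scaled = StandardScaler().fit_transform(X)

# Feature extraction: keep the smallest number of components explaining >= 95% variance.
X_reduced = PCA(n_components=0.95, random_state=0).fit_transform(X_scaled)

# Clustering parameter: try several values of k and score each partition.
silhouettes = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_reduced)
    silhouettes[k] = silhouette_score(X_reduced, labels)

best_k = max(silhouettes, key=silhouettes.get)
print(f"PCA kept {X_reduced.shape[1]} components; best k by silhouette = {best_k}")
```

Tabulating the silhouette score for every candidate k, rather than reporting only the winner, documents the parameter choice in a way a reviewer can check.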
You will need to follow the formatting guidelines of the IEEE Manuscript Template for Conference Proceedings (A4).
You may include figures, tables, pseudo-code, and appendices with the actual code that has been developed. You are free to use any programming language you are comfortable with (e.g., Python, Matlab, R, etc.), but Python is the main language throughout this module.
More information on how to write a paper is available at the following link: “Crafting Papers on Machine Learning”, by Pat Langley (which can be found here if the previous link does not work).
General project guidelines and milestones:
Please note, the following guidelines are good practice and should lead to better results, but you have the freedom to pick whatever suits your style:
- Work individually!
- Select a real-world classification/regression/clustering problem and one or more appropriate dataset(s) as suggested previously. You may also use the following links, which list numerous problems and datasets:
- UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/;
- ICML 2019 accepted papers: https://icml.cc/Conferences/2019/Schedule?type=Poster;
- Kaggle competitions: http://www.kaggle.com/competitions (note that Kaggle datasets are NOT allowed for this coursework);
- Stanford machine learning projects:
- You do NOT need to write a proposal about the dataset and machine learning problem.
- In the weeks leading up to the submission deadline, you will select, implement and apply appropriate machine learning algorithms to the selected problem, perform data pre-processing if needed, and record the results of your experiments.
- You will write up your final paper, and submit it by the deadline specified on the first page.
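Data pre-processing often means handling missing values and mixing numeric with categorical attributes. The sketch below shows one common scikit-learn pattern for this: median imputation plus standardisation for numeric columns, and most-frequent imputation plus one-hot encoding for categorical columns, combined with `ColumnTransformer`. The tiny DataFrame and its column names (`price`, `rating`, `style`) are hypothetical and do not come from any of the datasets listed above; pandas and scikit-learn are assumed to be installed.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy table with missing values in both numeric and categorical columns.
df = pd.DataFrame({
    "price": [20.0, np.nan, 35.5, 12.0],
    "rating": [4.5, 3.0, np.nan, 5.0],
    "style": ["casual", "party", np.nan, "casual"],
})
numeric_cols = ["price", "rating"]
categorical_cols = ["style"]

# Numeric: fill gaps with the column median, then standardise.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical: fill gaps with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

X_ready = preprocess.fit_transform(df)
print(X_ready.shape)
```

In a real experiment, this transformer would be placed in a `Pipeline` together with the model, so that imputation values and scaling statistics are learned from the training data only and not from the test data.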