7153CEM – M138CEM Assignment Help
Big Data Analytics and Data Visualisation Assignment help
Submission arrangement online via Aula:
File types and method of recording: WORD using the “Assignments” link in 7153CEM/ M138CEM
Mark and Feedback date (DD/MM/YY): 3 weeks after submission
Mark and Feedback method (e.g. in lecture, electronic via Aula): electronic via Aula
Module Learning Outcomes Assessed:
3. Demonstrate sound knowledge of different data analytical techniques for different structured
and unstructured big data sets to support decision-making.
4. Critically identify and select appropriate analytical techniques for big data analysis using
examples from case studies.
5. Critically evaluate and apply appropriate methods that are suitable for visualising big data.
Task:
1. Select a dataset of your choice from Kaggle. The dataset should be suitable for Big Data analytics
(Please see description below).
2. Use PySpark (exclusively) to analyze the dataset. You should perform at least one of the following
data analysis tasks (regression, clustering, classification, etc). You have to explain your choice of the
techniques used.
3. Use Tableau (exclusively) to explore your dataset and/or to show the results of your analysis. All
figures must be created using Tableau.
4. Critically analyze your findings: the results and the methods used.
Procedure:
You have to write a project proposal (maximum of 1 A4 page), giving the title of the project, a
brief description of the problem and the tasks you plan to apply, the dataset you are using
(its name and a direct link to it, clear and detailed description of how you plan to use it), and a
brief work plan. You have to submit the proposal by 7 October 2022 at 18.00 via 7153CEM/
M138CEM project proposal link. It is your responsibility to make sure the dataset is suitable for
the CW according to the description, as this is part of the grade of the CW.
This document is for Coventry University students for their own use in completing their
assessed work for this module and should not be passed to third parties or posted on any
website. Any infringements of this rule should be reported to
[email protected]
No two students can work on the same dataset. Once you have submitted your project
proposal you have to IMMEDIATELY state, under the assigned post on Aula, that you have
chosen that dataset (direct link to it) so that no other student can work on it. Once you choose
a dataset, the choice is final (no resubmission of the project proposal is allowed), so you have
to choose carefully.
The dataset should be freely accessible (no registration required). If there are several files at
the link, you have to clearly specify which one you plan to work on.
Before you choose a dataset, you have to check the assigned post on Aula to make sure that no
other student has chosen this dataset before you. It is your full responsibility to make sure no
other student has chosen it. If it turns out two or more students chose the same dataset, the
only project proposal that will be considered is the one that appears earlier on the
corresponding post on Aula.
The only place where you can state which dataset you have selected is the corresponding post
on Aula, and this is the only place students need to check to see if the dataset has already been
chosen by another student.
If you have not submitted a project proposal by the project proposal deadline, or you have
submitted one but later change your mind and want to work on another dataset,
you can still do that, but in this case you will get a zero on the project proposal. You also have
to follow the same procedure as above (check that the dataset hasn’t been chosen before, and
then state your choice under the corresponding Aula post). You also have to explain, at the very
beginning of your CW, why you have changed the dataset (in the case where you submitted a
project proposal and changed your mind later).
Because your colleagues only have access to the Aula post to know which dataset you have
chosen, you get a zero on the project proposal if the dataset in the proposal doesn’t match the
one on the Aula post.
Your final CW submission will include a report (up to 3000 words – strict limit) where you
present your work.
Clarifications:
o You can use any operating system that you prefer to install your program.
o Coding the task you are performing yourself is a plus.
o Given the nature of this module and the task, you should document everything you do.
o Everything you do should be reproducible: The link to the dataset should be provided (a direct
link to the dataset itself, not the site where it is hosted). The code used, in its entirety (as text, NOT
as a file), should be included in the appendix using a suitable tool for including code in a WORD file
(a WORD code syntax highlighter, for example). If you use code that is not yours, whether totally
or partially, this should be very clearly indicated.
o You should provide in the appendix clear evidence, using screen captures, that you installed the
software and ran every part of the experiments. The screen captures should also clearly show
the device (i.e. user ID) on which the experiments were conducted/the software was installed.
o Except for the dataset, NO LINKS of any kind to your work are allowed. Everything should be
included in the report itself (the body or the appendix).
o Plagiarism and collusion are taken extremely seriously. Any part, from any source, of any type,
in any language, should be COMPLETELY AND CLEARLY cited. If you use a figure/table/image
that is not yours, this should be indicated in the caption.
o The use of any online platform (AWS, Google Colab, Databricks, etc.) is not recommended or
needed. However, if you need to use one, you have two options ONLY:
Either you install all the software required for the CW on your own device, CLEARLY
showing your name as the user, with all the libraries and modules needed
to do your CW, to show that you can install and configure all the software required
for the CW. You may after that, if you choose, install all of it again on an online
platform and run your CW on that platform.
Or you use a platform that shows CLEARLY, without any ambiguity, that this is your
personal account, with your name clearly appearing on the screen captures you include in
your CW.
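To support the evidence and reproducibility points above, a short script can print the user ID, machine and software versions so that they appear inside the screen captures. This is only a sketch using the Python standard library; the function name environment_summary and the set of fields reported are assumptions, and it supplements the required screen captures rather than replacing them.

```python
import getpass
import platform
import sys
from datetime import datetime, timezone


def environment_summary() -> dict:
    """Collect details that identify who ran the code, where, and with what."""
    try:
        user = getpass.getuser()
    except Exception:
        # Some restricted environments expose no user name at all.
        user = "unknown"

    info = {
        "user": user,
        "host": platform.node(),
        "os": f"{platform.system()} {platform.release()}",
        "python": sys.version.split()[0],
        "timestamp_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }

    # Report the PySpark version only if it is installed on this machine.
    try:
        import pyspark
        info["pyspark"] = pyspark.__version__
    except ImportError:
        info["pyspark"] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in environment_summary().items():
        print(f"{key:>14}: {value}")
```

Running this at the start of each session, in the same terminal or notebook that appears in the screen capture, ties the output that follows to a named user and an installed software version.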
Report Structure:
Your report should typically have:
o A title.
o An introduction in which you briefly describe your project, the dataset you are working on, the
data analysis task(s) you are performing and the software you are using.
o A background/related work/data analysis section, focusing on what is related to your project.
o An implementation part, in which you introduce the dataset you are applying your program
to/the data analysis task you are performing, the program you are using – with description and
figures showing how it is installed, configured and how it works.
o An experimental section in which you give a full description of the experimental protocol, how you
conducted the experiments, and the results.
o A discussion of your findings.
o A conclusion.
o References.
o Appendix (not included in the word count)