7153CEM – M138CEM Big Data Analytics and Data Visualisation
- Select a dataset of your choice from The dataset should be suitable for Big Data analytics (Please see description below).
- Use PySpark (exclusively) to analyze the dataset. You should perform at least one of the following data analysis tasks: regression, clustering, classification, etc. You have to explain your choice of the techniques used.
- Use Tableau (exclusively) to explore your dataset and/or to show the results of your analysis. All figures must be created using Tableau.
- Critically analyze your findings: the results and the methods
- You have to write a project proposal (maximum of 1 A4 page), giving the title of the project, a brief description of the problem and the tasks you plan to apply, the dataset you are using (its name and a direct link to it, with a clear and detailed description of how you plan to use it), and a brief work plan. You have to submit the proposal by 7 October 2022 at 18.00 via the 7153CEM/M138CEM project proposal link. It is your responsibility to make sure the dataset is suitable for the CW according to the description, as this is part of the grade of the CW.
- No two students can work on the same dataset. Once you have submitted your project proposal, you have to IMMEDIATELY state, under the assigned post on Aula, that you have chosen that dataset (with a direct link to it) so that no other student can work on it. Once you choose a dataset, the choice is final (no resubmission of the project proposal is allowed), so you have to choose carefully.
- The dataset should be freely accessible (no registration required). If there are several files in the link, you have to clearly specify which one you plan to work on.
- Before you choose a dataset, you have to check the assigned post on Aula to make sure that no other student has chosen this dataset before you. It is your full responsibility to make sure no other student has chosen it. If it turns out that two or more students chose the same dataset, the only project proposal that will be considered is the one that appears earliest on the corresponding post on Aula.
- The only place where you can state which dataset you have selected is the corresponding post on Aula, and this is the only place students need to check to see if the dataset has already been chosen by another student
- If you haven’t submitted a project proposal by the project proposal deadline, or you have submitted one but later changed your mind and want to work on another dataset, you can still do that, but in this case you will get a zero on the project proposal. You also have to follow the same procedures as above (check that the dataset hasn’t been chosen before, and then state your choice under the corresponding Aula post). If you submitted a project proposal and later changed the dataset, you also have to explain why at the very beginning of your CW.
- Because your colleagues only have access to the Aula post to know which dataset you have chosen, you get a zero on the project proposal if the dataset in the proposal doesn’t match the one on the Aula post
- Your final CW submission will include a report (up to 3000 words – strict limit) where you present your