ITC516 Data Mining and Visualisation for Business Intelligence – Assessment 2 – Weka and Written Exercise
Task 1: Weka data exploration [3 marks]
Load a dataset by clicking the Open file button in the top left corner of the panel. Inside the data folder, which is supplied when Weka is installed, you will find a file named weather.nominal.arff. Open this file.
As shown in the Weka interface, the weather data has 14 instances, and 5 attributes called outlook, temperature, humidity, windy, and play. Click on the name of an attribute in the left subpanel to see information about the selected attribute on the right, such as its values and other details. This information is also shown in the form of a histogram. All attributes in this dataset are “nominal” — that is, they have a predefined finite set of values. The last attribute, play, is the “class” attribute; its value can be yes or no. Answer the following:
(a) What are the values that the temperature and windy attributes have? [1 mark]
(b) What is the class value of instance number 6 in the weather data? [1 mark]
(c) Load the weather.numeric.arff dataset and open it in the editor by clicking the Edit button in the row of buttons at the top of the Preprocess panel. How many numeric and how many nominal attributes does this dataset have? Also report the statistical information that Weka shows for the numeric attributes (for example, minimum, maximum, mean and standard deviation). [1 mark]
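As an optional cross-check for part (c), the attribute types can also be read directly from the header of an ARFF file, where nominal attributes list their values in braces and numeric attributes are declared as numeric, real or integer. The sketch below is a minimal illustration only: the function name count_attribute_types and the example header are made up for this purpose, and the parser assumes simple attribute names without quotes or spaces.

```python
def count_attribute_types(arff_text):
    """Count numeric and nominal attributes declared in an ARFF header."""
    counts = {"numeric": 0, "nominal": 0, "other": 0}
    for line in arff_text.splitlines():
        stripped = line.strip()
        lowered = stripped.lower()
        if lowered.startswith("@data"):
            break  # the header ends where the data section starts
        if not lowered.startswith("@attribute"):
            continue
        # Split into '@attribute', the attribute name, and the type spec.
        parts = stripped.split(None, 2)
        if len(parts) < 3:
            continue
        type_spec = parts[2].strip()
        if type_spec.startswith("{"):
            counts["nominal"] += 1  # values listed in braces
        elif type_spec.lower() in ("numeric", "real", "integer"):
            counts["numeric"] += 1
        else:
            counts["other"] += 1  # e.g. string or date attributes
    return counts


# Hypothetical header, NOT one of the Weka sample files.
EXAMPLE_ARFF = """@relation example
@attribute colour {red, green, blue}
@attribute weight numeric
@attribute height real
@attribute label {yes, no}
@data
red,1.5,20,yes
"""

type_counts = count_attribute_types(EXAMPLE_ARFF)
print(type_counts)
```

The counts should agree with the Type field that Weka displays for each attribute in the Preprocess panel.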
Task 2: Working with a new data file in Weka [4 marks]
1) Consider the Marks dataset (Marks.txt is available in the student resources section of your subject's Interact2 site), which contains the assessment results of 40 students in a subject consisting of four assignments and a final exam.
a) Using a text editor, create an ARFF file for this dataset and open it in Weka. [1 mark]
b) Apply the unsupervised Discretize filter to the Assignment-4 marks. Put a screenshot of the filter output in your assignment and make some remarks on the data. [1 mark]
c) Practice filling in the missing values for all columns in the Viewer window in Weka, both manually and by using filters. Put a screenshot of the filter outputs in your assignment and comment on the values that Weka suggests for the missing values. [2 marks]
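To make the three steps in parts a) to c) more concrete, the sketch below mimics them outside Weka on a few made-up rows. It is an illustration under stated assumptions, not the required method: the column names and marks are hypothetical (the real layout of Marks.txt may differ), the binning assumes equal-width intervals (which, as far as we know, is the default behaviour of the unsupervised Discretize filter, with 10 bins), and the fill rule assumes a mean-for-numeric, mode-for-nominal strategy like that of the ReplaceMissingValues filter. Weka's handling of values that fall exactly on a bin boundary may differ from this sketch.

```python
from collections import Counter


def to_arff(relation, attributes, rows):
    """Build ARFF text for all-numeric attributes; None is written as '?'."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in attributes]
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"


def equal_width_bins(values, bins=10):
    """Return a bin index (0 .. bins-1) for each value, using equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        return [0] * len(values)  # all values identical: one bin
    # The maximum value would land in bin 'bins', so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def fill_missing(column):
    """Replace None with the mean (numeric column) or the mode (nominal)."""
    present = [v for v in column if v is not None]
    if not present:
        return list(column)  # nothing to base a replacement on
    if all(isinstance(v, (int, float)) for v in present):
        fill = sum(present) / len(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in column]


# Hypothetical attribute names and marks, for illustration only.
names = ["Assignment1", "Assignment2", "Assignment3", "Assignment4", "FinalExam"]
sample_rows = [
    [12, 15, 9, 18, 55],
    [10, None, 11, 14, 61],  # None marks a missing Assignment2 result
    [8, 13, 10, 6, 47],
]

arff_text = to_arff("marks", names, sample_rows)
assignment2_filled = fill_missing([row[1] for row in sample_rows])
assignment4_bins = equal_width_bins([row[3] for row in sample_rows], bins=3)

print(arff_text)
print(assignment2_filled)
print(assignment4_bins)
```

Comparing the replacement values from such a calculation with what Weka writes into the Viewer is one way to support your comments in part c).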
Task 3: Visualization and Analysis [4 marks]
In this task you will use Weka for data visualization and analysis. It builds the practical and technical skills you need to compare and evaluate output patterns in visualizations.
Load the iris.arff dataset in Weka and answer the following questions.
(a) What is the range of possible values for each of the 4 attributes that can be observed in the dataset? [2 marks]
(b) Present a scatter plot visualization of this dataset. Based on the attribute-pair plots, identify which two classes overlap the most and which class is most clearly separated from the others. Alternatively, you may use the 3D visualization feature provided in Weka to answer the same question, using different combinations of any three of the four attributes in the dataset. [2 marks]
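The idea behind parts (a) and (b) can also be checked numerically: compute the range of an attribute for each class and see which class ranges intersect. The sketch below is a rough illustration with made-up values and placeholder class labels A, B and C; it is not the iris data, and the helper names class_ranges and ranges_overlap are hypothetical. Overlap on a single attribute is only a hint, so the scatter plots remain the main evidence.

```python
def class_ranges(rows, attr_index, class_index=-1):
    """Return {class label: (min, max)} for one attribute."""
    ranges = {}
    for row in rows:
        label, value = row[class_index], row[attr_index]
        lo, hi = ranges.get(label, (value, value))
        ranges[label] = (min(lo, value), max(hi, value))
    return ranges


def ranges_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Made-up (attribute value, class label) pairs, for illustration only.
example_rows = [
    (1.0, "A"), (1.4, "A"),
    (3.0, "B"), (4.5, "B"),
    (4.2, "C"), (6.0, "C"),
]

per_class = class_ranges(example_rows, attr_index=0)
print(per_class)
print(ranges_overlap(per_class["A"], per_class["B"]))
print(ranges_overlap(per_class["B"], per_class["C"]))
```

In this toy example, classes B and C overlap on the attribute while class A stands apart; the same kind of comparison can be repeated for each attribute of the real dataset.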
Task 4: Written Exercise [4 marks]
Topic: Security, Privacy and Ethics in Data Mining.
In this task, you are required to read the journal articles provided below and write a short discussion paper based on the topic of security, privacy and ethics in data mining. You must:
- identify the major security, privacy and ethical implications in data mining;
- evaluate how significant these implications are for the business sector; and
- support your response with appropriate examples and references.
The recommended word length for this task is 700 to 1000 words.