# 7089CEM: Introduction to Statistical Methods for Data Science

## Introduction to Statistical Methods for Data Science Assignment help

Data:
The ‘simulated’ EEG time-series data and the sound signal are provided in two separate CSV files. The X.csv file contains the EEG signals $\mathbf{x}_1$ and $\mathbf{x}_2$, measured from the prefrontal and auditory cortices respectively, and the y.csv file contains the sound signal $\mathbf{y}$ (i.e. the voice of the meditation guide). The file time.csv contains the sampling times of all three signals in seconds. In total there are 2 minutes of signal, collected at a sampling frequency of 20 Hz. All signals are subject to additive noise, assumed to be independent and identically distributed (“i.i.d.”) Gaussian with zero mean and unknown variance, caused by distortions during recording.

You should first perform an initial exploratory data analysis, by investigating:
• Time series plots (of audio and EEG signals)
• Distribution for each signal
• Correlation and scatter plots (between the audio and brain signals) to examine their dependencies
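The numerical side of these checks can be sketched as follows. This is a minimal illustration, not the required solution: the arrays are synthetic stand-ins (an assumption, since the column layout of X.csv, y.csv and time.csv is not described here), and the helper name `summarize` is hypothetical. Plots would be produced from the same arrays with any plotting library.

```python
import numpy as np

# Synthetic stand-in data (assumption): the real values would be read from
# X.csv, y.csv and time.csv, whose exact column layout is not given here.
rng = np.random.default_rng(0)
n = 2400                    # 2 minutes sampled at 20 Hz
t = np.arange(n) / 20.0     # sampling times in seconds
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.5 * x1 - 0.2 * x2 + rng.normal(scale=0.1, size=n)


def summarize(signal):
    """Return basic descriptors of one signal's distribution."""
    return {
        "mean": float(np.mean(signal)),
        "std": float(np.std(signal, ddof=1)),
        "min": float(np.min(signal)),
        "max": float(np.max(signal)),
    }


summaries = {name: summarize(s) for name, s in [("x1", x1), ("x2", x2), ("y", y)]}

# Pearson correlation matrix; rows and columns are ordered x1, x2, y.
corr = np.corrcoef(np.vstack([x1, x2, y]))

# Histogram counts give a plot-free view of a signal's distribution.
counts, bin_edges = np.histogram(y, bins=20)
```

The entries `corr[0, 2]` and `corr[1, 2]` are the correlations of each EEG signal with the audio signal, which is the dependency the scatter plots are meant to show visually.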

Regression – modelling the relationship between audio and EEG signals
We would like to determine a suitable mathematical model to explain the relationship between the audio signal $\mathbf{y}$ and the two brain signals $\mathbf{x}_1$ and $\mathbf{x}_2$, assuming such a relationship can be described by a polynomial regression model. Below are 5 candidate nonlinear polynomial regression models, only one of which can ‘truly’ describe this relationship. The objective is to identify this ‘true’ model from the candidate models by following Tasks 2.1 –

The candidate models have the following structures:
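Model 1: $y = \theta_1 x_1^3 + \theta_2 x_2^5 + \theta_{bias} + \varepsilon$
Model 2: $y = \theta_1 x_1^4 + \theta_2 x_2^2 + \theta_{bias} + \varepsilon$
Model 3: $y = \theta_1 x_1^3 + \theta_2 x_2 + \theta_3 x_1 + \theta_{bias} + \varepsilon$
Model 4: $y = \theta_1 x_1 + \theta_2 x_1^2 + \theta_3 x_1^3 + \theta_4 x_2^3 + \theta_{bias} + \varepsilon$
Model 5: $y = \theta_1 x_1^3 + \theta_2 x_1^4 + \theta_3 x_2 + \theta_{bias} + \varepsilon$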
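Each candidate model is linear in its parameters, so it can be written as a design matrix with one column per term and a final column of ones for the bias. The sketch below is one way to build these matrices. The function name `design_matrix` is hypothetical, and the exponents are an interpretation of the model structures listed above (for example, Model 1 is read as using $x_1^3$ and $x_2^5$).

```python
import numpy as np


def design_matrix(model, x1, x2):
    """Build the design matrix X for candidate model 1 to 5.

    Columns follow the order theta_1, theta_2, ..., theta_bias, so the
    last column is always the constant 1 for the bias term.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    ones = np.ones_like(x1)
    columns = {
        1: [x1**3, x2**5, ones],
        2: [x1**4, x2**2, ones],
        3: [x1**3, x2, x1, ones],
        4: [x1, x1**2, x1**3, x2**3, ones],
        5: [x1**3, x1**4, x2, ones],
    }
    return np.column_stack(columns[model])


# Tiny usage example with made-up inputs.
X4 = design_matrix(4, [1.0, 2.0], [3.0, 4.0])
```

Keeping the bias as the last column matches the ordering of the parameter vector, which makes the estimated coefficients easy to read off.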

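Estimate the model parameters $\boldsymbol{\theta} = \{\theta_1, \theta_2, \cdots, \theta_{bias}\}^T$ for every candidate model using Least Squares, $\hat{\boldsymbol{\theta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$, with the provided input and output datasets (use all the data for training).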
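The least squares estimate can be computed by solving the normal equations rather than forming the inverse explicitly, which gives the same result with better numerical behaviour. The following is a minimal sketch on synthetic data: the "true" parameter values and the noise level are made up for illustration, and the function name `least_squares` is hypothetical.

```python
import numpy as np


def least_squares(X, y):
    """Solve (X^T X) theta = X^T y, equivalent to (X^T X)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)


# Synthetic example (assumption): data generated from a Model 3 structure
# with made-up parameters, standing in for the provided datasets.
rng = np.random.default_rng(1)
n = 2400
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
theta_true = np.array([0.8, -0.5, 0.3, 1.0])      # hypothetical values
X = np.column_stack([x1**3, x2, x1, np.ones(n)])  # columns of Model 3
y = X @ theta_true + rng.normal(scale=0.1, size=n)

theta_hat = least_squares(X, y)
```

`np.linalg.lstsq(X, y, rcond=None)` is an alternative that also copes with ill-conditioned design matrices, which can matter when high powers of the inputs are involved.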

Based on the estimated model parameters, compute the model residual (error) sum of squares (RSS) for every candidate model: $RSS = \sum_{i=1}^{n} (y_i - \mathbf{x}_i \hat{\boldsymbol{\theta}})^2$
Here $\mathbf{x}_i$ denotes the $i$th row ($i$th data sample) of the input data matrix $\mathbf{X}$, and $\hat{\boldsymbol{\theta}}$ is a column vector.
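In matrix form the RSS is the squared length of the residual vector $\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\theta}}$. A small sketch, using a made-up three-point dataset and a hypothetical helper named `rss`:

```python
import numpy as np


def rss(X, y, theta_hat):
    """Residual sum of squares: sum over i of (y_i - x_i theta_hat)^2."""
    residuals = y - X @ theta_hat
    return float(residuals @ residuals)


# Made-up data: a straight-line fit with a slope column and a bias column.
X = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
y = np.array([1.0, 2.0, 4.0])
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

value = rss(X, y, theta_hat)
```

Because every candidate model is fitted to the same data, the RSS values are directly comparable across models, although a model with more terms will tend to achieve a lower RSS.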

Compute the log-likelihood function for every candidate model: $\ln p(D \mid \hat{\boldsymbol{\theta}}) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\hat{\sigma}^2) - \frac{1}{2\hat{\sigma}^2} RSS$
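The log-likelihood can be evaluated directly from the RSS and the number of samples $n$. The text above does not say how $\hat{\sigma}^2$ is obtained, so the sketch below assumes the common estimate $\hat{\sigma}^2 = RSS/(n-1)$ when no variance is supplied. Both that default and the function name `log_likelihood` are assumptions made for illustration.

```python
import math


def log_likelihood(rss, n, sigma2=None):
    """Gaussian log-likelihood of the data given the fitted parameters.

    If sigma2 is not supplied, it is estimated as rss / (n - 1); this
    estimator is an assumption, not something stated in the text above.
    """
    if sigma2 is None:
        sigma2 = rss / (n - 1)
    return (
        -n / 2 * math.log(2 * math.pi)
        - n / 2 * math.log(sigma2)
        - rss / (2 * sigma2)
    )


# Made-up numbers: with rss = 10 and n = 11 the variance estimate is 1.
ll = log_likelihood(rss=10.0, n=11)
```

With this variance estimate the log-likelihood decreases as the RSS grows, so for a fixed $n$ a better-fitting model always has the higher log-likelihood.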

February 12th, 2023 | Categories: Database assignment help