7089CEM Assignment Help
Introduction to Statistical Methods for Data Science Assignment Help
Data:
The "simulated" EEG time-series data and the sound signal are provided in separate CSV files. The X.csv file contains the EEG signals $\mathbf{x}_1$ and $\mathbf{x}_2$, measured from the prefrontal and auditory cortices respectively, and the y.csv file contains the sound signal $\mathbf{y}$ (i.e. the voice of the meditation guide). The file time.csv contains the sampling time of all three signals in seconds. There are 2 minutes of signal in total, collected with a sampling frequency of 20 Hz. All signals are subject to additive noise (assumed independent and identically distributed ("i.i.d.") Gaussian with zero mean) with unknown variance due to distortions during recording.
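As a quick sanity check on the figures above, 2 minutes at 20 Hz implies about 2400 samples spaced 0.05 s apart. The sketch below only does that arithmetic. The variable names are illustrative, and the assumption that the time axis starts at 0 s should be checked against time.csv.

```python
# Illustrative arithmetic only; names are assumptions, not part of the brief.
duration_s = 2 * 60   # 2 minutes of signal, in seconds
fs_hz = 20            # sampling frequency in Hz

n_samples = duration_s * fs_hz   # expected number of samples
dt = 1 / fs_hz                   # sampling interval in seconds

# A time axis starting at 0 s (assumed start; compare with time.csv).
time_axis = [i * dt for i in range(n_samples)]

print(n_samples, dt)
```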
Task 1: Preliminary data analysis
You should first perform an initial exploratory data analysis, by investigating:
• Time series plots (of audio and EEG signals)
• Distribution for each signal
• Correlation and scatter plots (between the audio and brain signals) to examine their dependencies
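The numerical side of these checks can be sketched as follows. This is a minimal example, assuming NumPy is available; the data here are synthetic stand-ins (the real columns would come from X.csv and y.csv), and the helper name `summarize` is hypothetical. Plotting is left out so the sketch stays dependency-light.

```python
import numpy as np

# Synthetic stand-in data (assumption): replace with the columns loaded
# from X.csv and y.csv.
rng = np.random.default_rng(0)
n = 2400
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.5 * x1 + rng.normal(scale=0.1, size=n)

def summarize(name, signal):
    """Return basic descriptive statistics for one signal."""
    return {
        "name": name,
        "mean": float(np.mean(signal)),
        "std": float(np.std(signal, ddof=1)),  # sample standard deviation
        "min": float(np.min(signal)),
        "max": float(np.max(signal)),
    }

summaries = [summarize(nm, s) for nm, s in (("x1", x1), ("x2", x2), ("y", y))]

# Pearson correlation matrix; rows and columns are ordered x1, x2, y.
corr = np.corrcoef(np.vstack([x1, x2, y]))

for s in summaries:
    print(s)
print(np.round(corr, 3))
```

The last row of `corr` gives the correlation of the audio signal with each EEG signal, which is the quantity the scatter plots visualise.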
Task 2: Regression – modelling the relationship between audio and EEG signals
We would like to determine a suitable mathematical model to explain the relationship between the audio signal $\mathbf{y}$ and the two brain signals $\mathbf{x}_1$ and $\mathbf{x}_2$, assuming such a relationship can be described by a polynomial regression model. Below are 5 candidate nonlinear polynomial regression models, and only one of them can "truly" describe such a relationship. The objective is to identify this "true" model from those candidate models following Tasks 2.1 –
The candidate models have the following structures:
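Model 1: $y = \theta_1 x_1^3 + \theta_2 x_2^5 + \theta_{\mathrm{bias}} + \varepsilon$
Model 2: $y = \theta_1 x_1^4 + \theta_2 x_2^2 + \theta_{\mathrm{bias}} + \varepsilon$
Model 3: $y = \theta_1 x_1^3 + \theta_2 x_2 + \theta_3 x_1 + \theta_{\mathrm{bias}} + \varepsilon$
Model 4: $y = \theta_1 x_1 + \theta_2 x_1^2 + \theta_3 x_1^3 + \theta_4 x_2^3 + \theta_{\mathrm{bias}} + \varepsilon$
Model 5: $y = \theta_1 x_1^3 + \theta_2 x_1^4 + \theta_3 x_2 + \theta_{\mathrm{bias}} + \varepsilon$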
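Each candidate structure corresponds to a design matrix whose columns are the polynomial terms plus a column of ones for the bias. The sketch below shows one way to build them, assuming NumPy and the term powers listed in the code; the function name `design_matrix` and the choice to put the bias column last are illustrative.

```python
import numpy as np

def design_matrix(model, x1, x2):
    """Build the design matrix X for candidate model 1-5.

    Columns follow the order theta_1, theta_2, ..., with the bias
    column of ones placed last (an illustrative convention).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    ones = np.ones_like(x1)
    columns = {
        1: [x1**3, x2**5, ones],
        2: [x1**4, x2**2, ones],
        3: [x1**3, x2, x1, ones],
        4: [x1, x1**2, x1**3, x2**3, ones],
        5: [x1**3, x1**4, x2, ones],
    }
    return np.column_stack(columns[model])

# Tiny illustrative input (made-up numbers, not the assignment data).
x1_demo = np.array([1.0, 2.0])
x2_demo = np.array([3.0, 4.0])
for m in range(1, 6):
    print(m, design_matrix(m, x1_demo, x2_demo).shape)
```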
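Estimate model parameters $\boldsymbol{\theta} = \{\theta_1, \theta_2, \cdots, \theta_{\mathrm{bias}}\}^T$ for every candidate model using Least Squares ($\hat{\boldsymbol{\theta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$), using the provided input and output datasets (use all the data for training).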
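The least-squares estimate can be sketched as below, assuming NumPy. The function name `least_squares` is illustrative, and the data are a synthetic toy problem rather than the assignment signals. Solving the normal equations with `np.linalg.solve` gives the same result as the explicit inverse in the formula while avoiding forming the inverse.

```python
import numpy as np

def least_squares(X, y):
    """Estimate theta_hat = (X^T X)^{-1} X^T y via the normal equations."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Solve (X^T X) theta = X^T y instead of inverting X^T X explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy check on synthetic data (assumption): y = 2*x^3 + 1 + small noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=200)
X = np.column_stack([x**3, np.ones_like(x)])
y = 2.0 * x**3 + 1.0 + rng.normal(scale=0.01, size=200)

theta_hat = least_squares(X, y)
print(theta_hat)  # should be close to the generating values 2 and 1
```

If `X.T @ X` is badly conditioned, `np.linalg.lstsq(X, y, rcond=None)` is a more robust alternative that minimises the same criterion.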
Based on the estimated model parameters, compute the model residual (error) sum of squares (RSS) for every candidate model:
$$\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \mathbf{x}_i \hat{\boldsymbol{\theta}})^2$$
Here $\mathbf{x}_i$ denotes the $i$-th row ($i$-th data sample) of the input data matrix $\mathbf{X}$, and $\hat{\boldsymbol{\theta}}$ is a column vector.
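In matrix form the residual sum of squares is the squared length of the residual vector, the output minus the design matrix times the estimated parameters. A minimal sketch, assuming NumPy; the function name `rss` and the three-sample example are made up for illustration.

```python
import numpy as np

def rss(X, y, theta_hat):
    """Residual sum of squares: sum over i of (y_i - x_i theta_hat)^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    theta_hat = np.asarray(theta_hat, dtype=float)
    residuals = y - X @ theta_hat      # one residual per data sample
    return float(residuals @ residuals)

# Made-up example: predictions are 1, 2, 3, so the residuals are 0, 0, 1.
X_demo = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
y_demo = np.array([1.0, 2.0, 4.0])
theta_demo = np.array([1.0, 0.0])

rss_demo = rss(X_demo, y_demo, theta_demo)
print(rss_demo)  # 1.0
```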
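Compute the log-likelihood function for every candidate model:
$$\ln p(D \mid \hat{\boldsymbol{\theta}}) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\hat{\sigma}^2) - \frac{1}{2\hat{\sigma}^2}\mathrm{RSS}$$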
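The log-likelihood can be evaluated directly from the RSS and the number of samples. The sketch below uses only the standard library. This section does not define the residual variance estimate, so the default of RSS divided by (n - 1) is an assumption made here; RSS divided by n is another common choice. The function name `log_likelihood` is illustrative.

```python
import math

def log_likelihood(rss, n, sigma2_hat=None):
    """Gaussian log-likelihood of a fitted model, given its RSS and n samples."""
    if sigma2_hat is None:
        # Assumption: residual variance estimated as RSS / (n - 1).
        sigma2_hat = rss / (n - 1)
    return (
        -n / 2 * math.log(2 * math.pi)
        - n / 2 * math.log(sigma2_hat)
        - rss / (2 * sigma2_hat)
    )

# Made-up numbers: RSS = 9 with n = 10 gives a variance estimate of 1,
# so the middle term vanishes.
ll_demo = log_likelihood(rss=9.0, n=10)
print(ll_demo)
```

Since every candidate model is fitted to the same data, a higher log-likelihood corresponds to a lower RSS under this variance estimate.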