Quantitative Data Analysis

Module Code: DS7006

Deadline: 11th January 2016

Project Outline:

Individual Data Analysis Project. Submit: 11th January 2016. You will be provided with a data set as an SQLite database to analyse using R. The data set should be assessed for reliability and explored, and hypotheses should be raised and tested. The report should provide the reader with a clear understanding of:

• which variables you chose to use,

• which techniques you used and in which order,

• why you chose to apply each of these techniques,

• what outcome resulted at each stage of the analysis and what it means.

State clearly the hypotheses that you develop through the data exploration. Use tables and visualisations as appropriate to present your analysis. Show the key elements of your SQL and R scripts, organised in an appendix.

Where texts and other background reading are cited, a list of references should be provided using Harvard style. Your report should tell the ‘story’ of your analysis project rather than just being a ‘catalogue’ of things done to data. Both assignments must be submitted through Moodle before midnight on the due date.

PLEASE NOTE: you must select your choice of data for Region: London, South East and East of England.

The SQLite data set is attached separately in *.sql format.

DS7006 – Quantitative Data Analysis

What makes a good individual data analysis project?

This should not be more than 4,000 words and although it should contain graphs and tables (even maps), these should be carefully constructed and there for the express purpose of illustrating what is discussed in the text. So, for example, giving Q-Q plots for 20 variables is unnecessary – just a few to illustrate typical characteristics found would do. Marks will be knocked off for gratuitous use of graphics and tables to bulk up the assignment to make it look BIG!

Set the scene: Begin by briefly introducing the topic you are analysing. Are there any relevant theories about the phenomenon you are looking at? Is there a key literature? What are the objectives of your analysis, and what is the story you want to tell through data analysis? “In this project, I am going to show through data analysis that…”.

Data acquisition: You have been supplied with a SQLite database containing a range of variables for districts in England. Some data are from the 2001 census, others relate to 2003-2005. There is an Excel spreadsheet indicating what each field is. Map SHP files are also provided. All the tables can be joined using the same primary key. You should choose a range of variables that might be analytically interesting and, through one or more SQL queries, join them together and export the result as a CSV file ready to use in R. Choose which are your dependent and independent variables. For count data you will need to normalise by using an appropriate base population (e.g. per thousand population; per thousand pensioners).
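
As an illustration of the join, normalise and export steps, the following is a minimal R sketch rather than a prescribed method. It assumes the DBI and RSQLite packages are installed, and it builds a tiny in-memory database because the real schema is not shown here: the table names (population, crime), the column names and all the values are hypothetical.

```r
# Assumes the DBI and RSQLite packages are installed.
library(DBI)

# Build a tiny in-memory stand-in for the supplied database.
# Table names, column names and values are hypothetical.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "population", data.frame(
  district_id = c("D01", "D02", "D03"),
  total_pop   = c(120000, 85000, 200000)
))
dbWriteTable(con, "crime", data.frame(
  district_id = c("D01", "D02", "D03"),
  burglaries  = c(600, 255, 1400)
))

# Join on the shared primary key and convert the count to a rate
# per thousand population (1000.0 forces real-number division).
districts <- dbGetQuery(con, "
  SELECT p.district_id,
         p.total_pop,
         c.burglaries,
         1000.0 * c.burglaries / p.total_pop AS burglaries_per_1000
  FROM population AS p
  JOIN crime AS c ON c.district_id = p.district_id
  ORDER BY p.district_id
")
dbDisconnect(con)

# Export as CSV, ready to be read back with read.csv().
csv_path <- file.path(tempdir(), "districts.csv")
write.csv(districts, csv_path, row.names = FALSE)
print(districts)
```

With the real database you would connect to the supplied file instead of ":memory:" and replace the hypothetical names with those listed in the Excel spreadsheet.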

Data exploration: this includes univariate, bivariate and multivariate exploration. Its purpose is to understand and check the veracity of individual variables and to search for possible relationships between the dependent and independent variables. As a result of this, were any corrections made to the data? Using neat tables, boxplots, scatter graphs etc., what did you find out about central tendency, spread, outliers, missing values, correlations etc.? Does this raise any new hypotheses you could test?
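
The kind of checks meant here can be sketched in base R as follows. The data frame, its column names and its values are hypothetical, chosen only so that one missing value and one outlier are present.

```r
# Hypothetical example data: one row per district.
districts <- data.frame(
  unemployment_rate   = c(3.1, 4.5, 2.8, 6.2, 5.0, NA, 3.9, 12.5),
  burglaries_per_1000 = c(4.0, 5.5, 3.2, 7.9, 6.1, 5.0, 4.8, 9.5)
)

# Univariate: central tendency, spread and missing values.
print(summary(districts))
missing_counts <- colSums(is.na(districts))

# Outliers by the boxplot rule (more than 1.5 times the
# hinge spread beyond the lower or upper hinge).
outliers <- boxplot.stats(districts$unemployment_rate)$out

# Bivariate: Pearson correlation using only complete rows.
r <- cor(districts$unemployment_rate, districts$burglaries_per_1000,
         use = "complete.obs")

# Plots (uncomment when running interactively).
# boxplot(districts$unemployment_rate, main = "Unemployment rate")
# plot(burglaries_per_1000 ~ unemployment_rate, data = districts)
```

Whether a flagged value is an error to correct or a genuine extreme district is a judgement to explain in the report; the boxplot rule only points at candidates.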

(Factor analysis): are you dealing with many variables? Should you carry out a factor analysis on the independent variables to find out the key dimensions within your data? What did you find?
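
One possible way to run a factor analysis in R is factanal() from the stats package. The sketch below uses simulated data in which six hypothetical variables are driven by two hidden dimensions (labelled deprivation and ageing purely for illustration), so it shows the mechanics rather than any real finding.

```r
set.seed(1)  # reproducible simulated data
n <- 200
deprivation <- rnorm(n)  # hypothetical hidden dimension 1
ageing      <- rnorm(n)  # hypothetical hidden dimension 2

# Six hypothetical independent variables, three per dimension.
districts <- data.frame(
  unemployment = deprivation + rnorm(n, sd = 0.5),
  no_car       = deprivation + rnorm(n, sd = 0.5),
  overcrowding = deprivation + rnorm(n, sd = 0.5),
  pensioners   = ageing + rnorm(n, sd = 0.5),
  lone_pension = ageing + rnorm(n, sd = 0.5),
  limiting_ill = ageing + rnorm(n, sd = 0.5)
)

# Maximum-likelihood factor analysis with a varimax rotation.
fa <- factanal(districts, factors = 2, rotation = "varimax")

# For each variable, find the factor on which it loads most strongly.
loadings_matrix <- unclass(fa$loadings)
dominant_factor <- apply(abs(loadings_matrix), 1, which.max)

# Hide small loadings to make the structure easier to read.
print(fa$loadings, cutoff = 0.3)
```

The number of factors (two here) is an assumption built into the simulation; with real data it needs to be justified, for example from the eigenvalues of the correlation matrix.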

(Classification): are you dealing with many cases that if hierarchically classified might show something interesting and new? How many groups did you choose and what distinguishes each group from the others?
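
A hierarchical classification of this kind can be sketched with hclust() in base R. The six districts, the two variables and their values are hypothetical, and Ward's method with two groups is one reasonable choice among several, not a requirement.

```r
# Hypothetical data: six districts described by two variables.
districts <- data.frame(
  unemployment_rate = c(3.0, 3.2, 2.9, 8.1, 7.8, 8.4),
  pct_no_car        = c(15, 17, 14, 42, 45, 40),
  row.names         = paste0("D0", 1:6)
)

# Standardise so that no variable dominates the distance measure.
scaled <- scale(districts)

# Hierarchical clustering on Euclidean distances (Ward's method).
tree <- hclust(dist(scaled), method = "ward.D2")

# Cut the tree into a chosen number of groups.
groups <- cutree(tree, k = 2)

# Group means show what distinguishes each group from the others.
group_means <- aggregate(districts, by = list(group = groups), FUN = mean)
print(group_means)

# plot(tree)  # dendrogram, useful when deciding how many groups to keep
```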

Hypothesis testing: Clearly state your null hypotheses. Make sure you choose the right test. Are the data you are using in the test paired or independent? If using a parametric test, are your data normally distributed? Show the test of normality. What confidence level are you using to accept or reject your null hypothesis (if not 95%, i.e. a significance level of 0.05, then give a reason)? What is the outcome of the test and what is your interpretation of what this means?
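
The sequence of decisions described here (state the null hypothesis, check normality, pick a test, compare the p-value with the significance level) might look like the following in base R. The two groups and their values are hypothetical, and the choice between Welch's t-test and the Wilcoxon rank-sum test is one common convention rather than the only valid route.

```r
# Hypothetical rates per 1,000 population for districts in two regions.
london     <- c(7.9, 6.1, 9.5, 8.2, 7.4, 6.8, 8.9, 7.1)
south_east <- c(4.0, 5.5, 3.2, 4.8, 5.0, 4.4, 3.9, 5.2)

alpha <- 0.05  # significance level, i.e. 95% confidence

# H0: the mean rate is the same in both regions.

# Step 1: Shapiro-Wilk test of normality for each group
# (its own null hypothesis is that the data are normal).
normal_london     <- shapiro.test(london)$p.value > alpha
normal_south_east <- shapiro.test(south_east)$p.value > alpha

# Step 2: the groups are independent (different districts), so use
# Welch's t-test if both look normal, otherwise the Wilcoxon rank-sum test.
if (normal_london && normal_south_east) {
  result <- t.test(london, south_east)  # Welch's test is the default
} else {
  result <- wilcox.test(london, south_east, exact = FALSE)
}

# Step 3: compare the p-value with the chosen significance level.
reject_null <- result$p.value < alpha
print(result)
```

For paired data (for example the same districts measured at two dates) the same functions take paired = TRUE, and the normality check applies to the differences rather than to each group.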

(Regression): is it appropriate to build a multiple regression model to show how all the independent variables work together to predict a target dependent variable? What is the R² and the significance of the regression model?
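
A minimal sketch of a multiple regression in base R follows, showing where the R² and the overall significance of the model come from. The data are simulated and the variable names are hypothetical, so the coefficients carry no real-world meaning.

```r
set.seed(42)  # reproducible simulated data
n <- 40
districts <- data.frame(
  unemployment_rate = runif(n, 2, 10),
  pct_no_car        = runif(n, 10, 50)
)
# Simulated dependent variable with a known linear structure plus noise.
districts$burglaries_per_1000 <- 1 + 0.6 * districts$unemployment_rate +
  0.05 * districts$pct_no_car + rnorm(n, sd = 0.5)

# Multiple regression: one dependent and two independent variables.
model <- lm(burglaries_per_1000 ~ unemployment_rate + pct_no_car,
            data = districts)
fit <- summary(model)

# Proportion of variance explained, raw and adjusted for model size.
r_squared     <- fit$r.squared
adj_r_squared <- fit$adj.r.squared

# Overall model significance: p-value of the F-statistic.
f <- fit$fstatistic
model_p <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

print(fit)
```

The adjusted R² is usually the fairer figure to report when comparing models with different numbers of independent variables.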

Conclusions: what conclusions can you draw from the data analysis? What are the main findings? What are the strengths and weaknesses of what you have done? Are there any implications for future analysis?

References: list of key references using Harvard.

Appendices: your SQL commands and R scripts.
