This project is designed to give you a flavor of how quantitative research is conducted in the real world, and how some of the econometric techniques we discussed in class are applied. Several statistical packages are available, but you are required to conduct this project using R/RStudio.
This project will involve the following tasks:
1. Find data on the CDC website, and find out what variables are available and how they are defined.
2. Formulate an empirical model with six or more variables. What you want to investigate with the model must make sense.
3. Download at least two datasets as SAS transport (XPT) files (the variables you choose must come from two or more datasets).
4. Load the SAS transport data files into RStudio.
5. Merge the datasets into one.
6. Clean up the data.
7. Summarize descriptive statistics of the variables of interest.
8. Run a regression or regressions to estimate the coefficients.
9. Interpret the estimated coefficients, and conduct hypothesis tests.
10. Discuss the findings of the results and any problems with the model or the results. For example, are you missing any key variables? Is this likely to lead to omitted variable bias?
11. Write a 5-to-6-page report or memorandum in 12pt Times New Roman, double-spaced, with an appendix that includes all the commands and outputs from RStudio (the page count does not include the appendix). The tone of your report does not have to be formal, but the report must describe in detail what you did and how you did it for each item listed above. The particular results you get are not important; what matters is that you use the data to back up your arguments and tell a complete story. However, your model and everything you do must make basic sense, or no credit will be given. For instance, running a regression with an ID number as a variable will result in an automatic F for this project.
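
As a rough illustration of tasks 4 through 9, the sketch below walks through the workflow in base R. It is only one possible approach, not a template for your model. To keep it runnable without downloads, it simulates two small data frames in place of real CDC files. Every variable name in it (id, age, income, female, bmi, smoker, sbp) is hypothetical, and the loading call in the comment assumes the haven package is installed. Note that the identifier is used only as the merge key and never enters the regression.

```r
# Task 4 (loading). With real files you might use something like
#   demo <- haven::read_xpt("first_file.XPT")   # file name is a placeholder
# Here two data frames are simulated instead so the example is self-contained.
set.seed(1)
n <- 200
demo <- data.frame(
  id     = 1:n,                              # respondent identifier (merge key only)
  age    = sample(20:80, n, replace = TRUE),
  income = round(runif(n, 0, 5), 2),
  female = rbinom(n, 1, 0.5)
)
exam <- data.frame(
  id     = sample(1:n, 180),                 # not every respondent has exam data
  bmi    = rnorm(180, mean = 28, sd = 5),
  smoker = rbinom(180, 1, 0.2)
)
# Simulated outcome that depends on age, bmi, and smoking plus noise.
exam$sbp <- 100 + 0.4 * demo$age[exam$id] + 0.5 * exam$bmi +
  5 * exam$smoker + rnorm(180, mean = 0, sd = 10)
exam$bmi[sample(180, 10)] <- NA              # introduce some missing values

# Task 5 (merging): inner join on the shared identifier.
merged <- merge(demo, exam, by = "id")

# Task 6 (cleaning): keep rows with no missing values in the model variables.
vars  <- c("sbp", "age", "income", "female", "bmi", "smoker")
clean <- merged[complete.cases(merged[, vars]), ]

# Task 7 (descriptive statistics).
print(summary(clean[, vars]))

# Tasks 8 and 9 (regression and hypothesis tests): the coefficient table
# reports estimates, standard errors, t statistics, and p-values.
fit <- lm(sbp ~ age + income + female + bmi + smoker, data = clean)
print(coef(summary(fit)))
```

With real survey files, the cleaning step usually also involves recoding special values (such as "refused" or "don't know" codes) to NA, so check each variable's definition before dropping or keeping rows.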








Jermaine Byrant
Nicole Johnson



