Stata study notes

2017-04-10 05:47:21  Source website: 百味书屋

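Ready to start learning?

Basic STATA operations

Allocate memory (needed only in Stata 11 and earlier; Stata 12 and later manage memory automatically):

set mem 500m, perm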

Display what you type

display 1

display "clive"

Display the structure of the dataset: describe

describe   (or its abbreviation d)

Edit the data in the Data Editor: edit

edit

Rename a variable (var1 becomes var2)

rename var1 var2

Display the contents of the dataset: list / browse

list in 1      (the first observation)

list in 2/10   (observations 2 to 10)

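Importing data when the data file is text (.csv)

insheet (Stata 13 and later also provide import delimited, which replaces it):

insheet using "C:\Documents and Settings\Administrator\桌面\ST9007\dataset\Fees1.csv", clear

A dataset can be imported only when memory is empty; otherwise STATA stops with the error "you must start with an empty dataset". There are two ways around this:

Clear all variables from memory first: drop _all

Or add the clear option to the import command, as above.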
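A minimal round trip, assuming Stata 13 or later and a writable working directory; the file name auto_demo.csv is hypothetical. It writes the built-in auto dataset to CSV and reads it back, first with import delimited and then, as in these notes, with drop _all followed by insheet.

```stata
* Sketch: auto_demo.csv is a hypothetical file written from the built-in auto data
sysuse auto, clear
export delimited using "auto_demo.csv", replace

* Stata 13+: import delimited; the clear option empties memory first
import delimited using "auto_demo.csv", clear
describe, short

* Older style used in these notes: empty memory, then insheet
drop _all
insheet using "auto_demo.csv"
describe, short
```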

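Open a saved dataset (.dta), replacing the one in memory: use

use path\filename, clear   (clear discards the data currently in memory)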

Recording commands and output (log)

1. Start a log file: log using "J:\phd\output.log", replace

2. Pause logging: log off

3. Resume logging: log on

4. Close the log file: log close

Creating and saving do-files (doedit, do)

1. Open the Do-file Editor: doedit

2. Type the commands

3. Save the file with the .do extension

4. Run it: do path\filename
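A sketch of a small do-file that uses the log commands above; the names analysis.do, analysis_demo.log and the log name demo are hypothetical. Typed into doedit, saved as analysis.do and run with do analysis.do, it writes a text log; output produced while the log is switched off is not recorded. A named log is used so that it cannot be confused with any other log that happens to be open.

```stata
* analysis.do (hypothetical name): minimal do-file with a named text log
capture log close demo                       // close the log "demo" if left open
log using "analysis_demo.log", replace text name(demo)
sysuse auto, clear                           // built-in example data
summarize price
log off demo                                 // pause recording
display "this output is not written to the log"
log on demo                                  // resume recording
regress price mpg
log close demo
```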

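Appending datasets (stacking them vertically; the datasets have the same variables and structure): append

insheet using "J:\phd\Fees1.csv", clear

save "J:\phd\Fees1.dta", replace

insheet using "J:\phd\Fees2.csv", clear

append using "J:\phd\Fees1.dta"

save "J:\phd\Fees1.dta", replace

The last line overwrites Fees1.dta with the combined data.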
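The same append pattern as a self-contained sketch, using the built-in auto data in place of the Fees*.csv files (an assumption, since those files are not available): the data are split into domestic and foreign cars and then stacked back together. tempfile avoids leaving permanent files behind.

```stata
* Sketch on the built-in auto data (stand-in for Fees1/Fees2)
sysuse auto, clear
keep if foreign == 0            // domestic cars
tempfile domestic
save `domestic'

sysuse auto, clear
keep if foreign == 1            // foreign cars now in memory

* Stack the saved domestic cars underneath
append using `domestic'
tabulate foreign
```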

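Merging horizontally (adding variables from another dataset to the current one): merge

1. insheet using "J:\phd\Fees1.csv", clear

sort companyid yearend

save "J:\phd\Fees1.dta", replace

describe

insheet using "J:\phd\Fees6.csv", clear

sort companyid yearend

merge companyid yearend using "J:\phd\Fees1.dta"

(This is the old syntax, which STATA still accepts. From Stata 11 on the usual form is merge 1:1 companyid yearend using "J:\phd\Fees1.dta", which assumes each companyid-yearend pair is unique in both files and does not require sorting.)

save "J:\phd\Fees1.dta", replace

describe

2. merge creates the variable _merge:

_merge==1: obs. from the master data only (the data in memory, here Fees6)

_merge==2: obs. from the using data only (Fees1.dta)

_merge==3: obs. from both the master and the using data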
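A sketch of the Stata 11+ syntax on the built-in auto data, where make uniquely identifies each car and plays the role of companyid yearend (an assumption for illustration). Four cars are deliberately dropped from the master so that _merge takes two different values.

```stata
* Sketch on the built-in auto data; make stands in for companyid yearend
* Using data: price and mpg for all 74 cars
sysuse auto, clear
keep make price mpg
tempfile prices
save `prices'

* Master data: weight and length, with the first four cars removed
sysuse auto, clear
keep make weight length
drop in 1/4

merge 1:1 make using `prices'
tabulate _merge        // 2 = using only (the four dropped cars), 3 = matched
```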

Help files: help

help describe

Descriptive statistics

summarize incorporationyear          (a single variable)

summarize incorporationyear-big6     (a range of consecutive variables, in dataset order)

summarize _all, or simply summarize  (all variables)

More detailed statistics

summarize incorporationyear, detail

Centiles: centile

centile auditfees, centile(0(10)100)   (every 10th centile)

centile auditfees, centile(0(5)100)    (every 5th centile)
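The same commands on the built-in auto data, whose variable names differ from the audit-fee data in the notes. summarize leaves its results in r() (for example r(p50), the median, after detail), which is handy for later calculations; price-rep78 is a range of consecutive variables in dataset order.

```stata
* Sketch on the built-in auto data
sysuse auto, clear
summarize price                    // a single variable
summarize price-rep78              // consecutive variables: price, mpg, rep78
summarize price, detail            // percentiles, skewness, kurtosis
display "median price = " r(p50)
centile price, centile(25 50 75)   // quartiles
```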

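tabulate: frequencies and percentages of categorical variables

tabulate companytype

tabulate companytype big6, column   (column percentages)

tabulate companytype big6, row      (row percentages)

tab companytype big6 if companytype<=3, row col   (row and column percentages, restricted to companytype<=3)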

Counting the observations that satisfy a condition

count if big6==1

count if big6==0 | big6==1
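A sketch with the built-in auto data, where rep78 (repair record, 1 to 5) and foreign stand in for companytype and big6. One point worth knowing for conditions such as companytype<=3: STATA treats a missing value (.) as larger than any number, so a condition like rep78 >= 4 also selects the missing observations unless they are excluded explicitly.

```stata
* Sketch on the built-in auto data (rep78 ~ companytype, foreign ~ big6)
sysuse auto, clear
tabulate rep78                             // frequencies and percentages
tabulate rep78 foreign, column             // column percentages
tabulate rep78 foreign if rep78 <= 3, row col

count if foreign == 1
count if rep78 >= 4                        // also counts missing rep78
count if rep78 >= 4 & !missing(rep78)      // excludes missing rep78
```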

Descriptive statistics of a continuous variable within each group of a discrete variable:

by companytype, sort: summarize auditfees, detail

or, sorting first:

sort companytype

by companytype: summarize auditfees
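Both forms on the built-in auto data, with foreign as the grouping variable (a stand-in for companytype). by requires the data to be sorted by the grouping variable, which is why the second form sorts first; bysort is shorthand for by ..., sort. tabstat gives one compact table instead of one block of output per group.

```stata
* Sketch on the built-in auto data (foreign ~ companytype, price ~ auditfees)
sysuse auto, clear
by foreign, sort: summarize price
* equivalent two-step form
sort foreign
by foreign: summarize price
* shorthand
bysort foreign: summarize price, detail
* one table of selected statistics per group
tabstat price, by(foreign) statistics(n mean sd p50)
```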

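Transforming variables

Based on company type, code companies with publicly issued shares (types 2, 3 and 5) as 1 and all others as 0:

gen listed=0

replace listed=1 if companytype==2

replace listed=1 if companytype==3

replace listed=1 if companytype==5

replace listed=. if companytype==.

The last line keeps companies whose type is missing as missing instead of coding them 0.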
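The gen/replace sequence can be cross-checked against a one-line version using inlist(). In this sketch companytype is a hypothetical stand-in built from rep78 in the built-in auto data (values 1 to 5, some missing), since the original data are not available. In STATA, . == . is true, so the final assert also compares the missing values.

```stata
* Sketch: companytype is a hypothetical stand-in built from rep78
sysuse auto, clear
gen companytype = rep78

* Step-by-step version, as in the notes
gen listed = 0
replace listed = 1 if companytype == 2
replace listed = 1 if companytype == 3
replace listed = 1 if companytype == 5
replace listed = . if companytype == .

* One-line version: 1 for types 2, 3, 5; 0 otherwise; missing if type missing
gen listed2 = inlist(companytype, 2, 3, 5) if !missing(companytype)

assert listed == listed2          // identical, including missing values
```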

Generating a new variable: gen

generate newvar = expression

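Models

format x1 %10.3f   (sets the display format of x1: a width of 10 with three decimal places; only the display changes, not the stored values)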

Basic simple regression (one explanatory variable)

regress y x

Saving regression results

The estimated coefficients are stored in _b[varname], and the coefficient of the constant in _b[_cons].
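A sketch with the built-in auto data, regressing price on mpg. Besides _b[], regress also stores standard errors in _se[varname] and other results in e(); the scalar name slope_mpg is illustrative. _b[] is overwritten by the next estimation, so copy anything you need into a scalar.

```stata
* Sketch on the built-in auto data
sysuse auto, clear
regress price mpg
display "slope     = " _b[mpg]
display "intercept = " _b[_cons]
display "std. err. of slope = " _se[mpg]
* keep a coefficient for later; _b[] changes with the next regression
scalar slope_mpg = _b[mpg]
ereturn list                       // everything regress left in e()
```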

Predicted values and residuals

predict yhat

predict yres, resid

yres is the difference between the actual value and the predicted value.
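A sketch verifying that definition on the built-in auto data. The double storage type is an added choice so that the check is not blurred by float rounding. With a constant in the model, the OLS residuals also average to zero.

```stata
* Sketch on the built-in auto data
sysuse auto, clear
regress price mpg
predict double yhat               // fitted values (default after regress: xb)
predict double yres, resid        // residuals
* check: residual = actual - fitted
generate double check = price - yhat
summarize yres check
```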

Scatter plot of the residuals against x

twoway (scatter yres x) (lfit yres x)
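A sketch of the residual plot on the built-in auto data. With OLS the fitted line of the residuals on x is flat at zero by construction, because the residuals are uncorrelated with every regressor; the plot is therefore useful for spotting patterns in the spread (such as heteroscedasticity) or curvature, not for its slope. rvpplot, run after regress, draws a residual-versus-predictor plot directly.

```stata
* Sketch on the built-in auto data
sysuse auto, clear
regress price mpg
predict double yres, resid
twoway (scatter yres mpg) (lfit yres mpg)
* built-in alternative after regress: residual-versus-predictor plot
rvpplot mpg
* the fitted line is flat: regressing the residuals on mpg gives a zero slope
regress yres mpg
```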

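Measuring how precisely the coefficients are estimated: the standard error.

Precision is judged by comparing each coefficient with its standard error, i.e. the t value (the coefficient divided by its standard error). The p-value is computed from the t distribution: it is the probability, if the true coefficient were zero, of obtaining a t value at least as large in absolute value as the one observed. These statistics are valid only if the residuals satisfy the following assumptions:

Homoscedasticity (i.e., the residuals have a constant variance)

Non-correlation (i.e., the residuals are not correlated with each other)

Normality (i.e., the residuals are normally distributed)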
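A sketch recomputing the t value and p-value that regress reports, using the built-in auto data. The p-value is two-sided and uses the t distribution with the residual degrees of freedom e(df_r); r(table) is the matrix behind the coefficient table (row 3 holds t, row 4 the p-values; column 1 is mpg here). The scalar names are illustrative.

```stata
* Sketch on the built-in auto data
sysuse auto, clear
regress price mpg
matrix T = r(table)                            // rows: b, se, t, pvalue, ll, ul, ...
scalar t_mpg = _b[mpg] / _se[mpg]              // t = coefficient / standard error
scalar p_mpg = 2 * ttail(e(df_r), abs(t_mpg))  // two-sided p-value
display "by hand: t = " t_mpg "   p = " p_mpg
display "regress: t = " el(T, 3, 1) "   p = " el(T, 4, 1)
```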

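What the items in the regression output mean

• Degrees of freedom of each sum of squares (the example has n = 11 observations and k = 2 coefficients):

For the ESS (explained sum of squares, "Model" in the STATA output), df = k - 1, where k = number of regression coefficients (df = 2 - 1)

For the RSS (residual sum of squares, "Residual"), df = n - k, where n = number of observations (= 11 - 2)

For the TSS (total sum of squares, "Total"), df = n - 1 (= 11 - 1)

MS: the last column (MS) reports the ESS, RSS and TSS divided by their respective degrees of freedom.

R-squared = ESS / TSS

Adjusted R-squared = 1 - (1 - R2)(n - 1)/(n - k). Because R-squared never falls when a regressor is added, the adjustment penalizes adding explanatory variables that contribute little.

Root MSE = square root of RSS/(n - k): the standard deviation of the residuals, i.e. the typical size of a prediction error.

F-statistic = (ESS/(k - 1)) / (RSS/(n - k)): tests whether all slope coefficients are jointly zero, i.e. the overall significance of the model.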
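A sketch rebuilding these statistics from the sums of squares that regress stores, on the built-in auto data. STATA calls the explained sum of squares e(mss) ("Model"); k counts the constant, so k = e(df_m) + 1. The scalar names are illustrative.

```stata
* Sketch on the built-in auto data
sysuse auto, clear
regress price mpg weight
scalar n_obs  = e(N)
scalar k_coef = e(df_m) + 1            // coefficients including the constant
scalar ess    = e(mss)                 // explained ("Model") sum of squares
scalar rss    = e(rss)                 // residual sum of squares
scalar tss    = ess + rss              // total sum of squares
scalar r2       = ess / tss
scalar r2_adj   = 1 - (1 - r2) * (n_obs - 1) / (n_obs - k_coef)
scalar root_mse = sqrt(rss / (n_obs - k_coef))
scalar f_stat   = (ess / (k_coef - 1)) / (rss / (n_obs - k_coef))
display "R2 = " r2 "  adj R2 = " r2_adj "  Root MSE = " root_mse "  F = " f_stat
```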

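Heteroscedasticity (hettest)

Testing for heteroscedasticity:

Run hettest after the regression (in newer versions of STATA the command is estat hettest):

• reg auditfees nonauditfees totalassets big6 listed

• hettest

Heteroscedasticity does not bias the coefficients, but it does bias their standard errors. It can arise because the data are bounded, which produces high skewness; some heteroscedasticity can be removed by taking logarithms. When heteroscedasticity is found, adjust the standard errors with the Huber/White/sandwich estimator, which STATA does when the robust option is added to the regression:

reg auditfees nonauditfees totalassets big6 listed, robust

With robust, the coefficients and R-squared are unchanged, but the standard errors differ. Often the robust standard errors are larger, so the t values fall, the p-values rise and the F value falls; they can, however, also be smaller than the ordinary ones.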
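A sketch on the built-in auto data (standing in for the audit-fee data): test for heteroscedasticity, then re-estimate with robust standard errors and compare. vce(robust) is the current spelling of the robust option; the scalar names are illustrative.

```stata
* Sketch on the built-in auto data
sysuse auto, clear
regress price mpg weight
estat hettest                        // Breusch-Pagan / Cook-Weisberg test
scalar b_ols  = _b[weight]
scalar se_ols = _se[weight]
scalar r2_ols = e(r2)

regress price mpg weight, vce(robust)   // same as: , robust
display "coef: OLS " b_ols "   robust " _b[weight]
display "s.e.: OLS " se_ols "   robust " _se[weight]
```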

Correlated errors (correlated residuals)

In panel data, the residuals of a given firm are correlated across years ("time-series dependence"): there are likely to be unobserved company-specific characteristics that are relatively constant over time and affect every year, so the observations are not independent.

The standard errors are then biased downward. This problem can be avoided by adjusting the standard errors for the clustering of yearly observations across a given company.

To deal with correlated errors:

Add the options robust cluster() to the regression (in newer versions vce(cluster companyid) is equivalent and is also robust to heteroscedasticity):

reg lnaf lnta big6 listed, robust cluster(companyid)
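A sketch comparing ordinary, robust and clustered standard errors. Because the audit-fee panel is not available, workers in the built-in nlsw88 data are clustered by industry purely to show the syntax; with such a small number of clusters, clustered standard errors are themselves imprecise. All three runs use the same sample, so the coefficient is identical and only the standard errors change.

```stata
* Sketch on the built-in nlsw88 data (industry ~ companyid, for syntax only)
sysuse nlsw88, clear
* same estimation sample for every model (industry has missing values)
keep if !missing(wage, grade, industry)
regress wage grade
scalar b_ols  = _b[grade]
scalar se_ols = _se[grade]
regress wage grade, vce(robust)
scalar se_rob = _se[grade]
regress wage grade, vce(cluster industry)   // same as: , robust cluster(industry)
display "s.e. of grade: OLS " se_ols "  robust " se_rob "  clustered " _se[grade]
display "number of clusters: " e(N_clust)
```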

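How to check whether the residuals of the same company are correlated across years

reg lnaf lnta

predict res, resid

keep companyid year res

sort companyid year

drop if companyid==companyid[_n-1] & year==year[_n-1]   (drops duplicate company-year observations, which reshape does not allow)

reshape wide res, i(companyid) j(year)

browse

pwcorr res1998-res2002

After reshape, each company is one row with its residuals res1998, ..., res2002 side by side, so pwcorr shows how strongly a company's residual in one year is correlated with its residual in another.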
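Because the audit-fee panel is not available, this sketch simulates a hypothetical panel of 200 firms observed from 1998 to 2002 with a time-constant firm effect, then runs the same steps. With the assumed variances (firm effect and noise both with variance 1), residuals of the same firm in different years should be correlated at roughly 0.5, which is the time-series dependence that cluster() is meant to handle.

```stata
* Sketch: simulated, hypothetical panel (all values are made up)
clear
set seed 12345
set obs 200                                   // 200 hypothetical firms
generate companyid = _n
generate firm_effect = rnormal()              // unobserved, constant over time
expand 5                                      // five years per firm
bysort companyid: generate year = 1997 + _n   // 1998 ... 2002
generate lnta = rnormal()
generate lnaf = 1 + 0.5*lnta + firm_effect + rnormal()

regress lnaf lnta
predict res, resid
keep companyid year res
sort companyid year
drop if companyid==companyid[_n-1] & year==year[_n-1]   // no duplicates here
reshape wide res, i(companyid) j(year)
pwcorr res1998-res2002
```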

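Points to note when using panel data:

If robust is used to control for heteroscedasticity but cluster() is not used to control for time-series dependence, the t statistics are still biased upward.

If heteroscedasticity is not controlled either, the upward bias in the t statistics is even more severe.

Therefore, when using panel data, add the robust cluster() options; otherwise your "significant" results from pooled regressions may be spurious.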

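When does multicollinearity arise?

• We have seen that when there is perfect collinearity between independent variables, STATA has to exclude one of them. For example, the year dummies satisfy year_1 + year_2 + year_3 + year_4 + year_5 = 1, which duplicates the constant:

reg lnaf year_1 year_2 year_3 year_4 year_5

STATA automatically throws away one of the year dummies so that the model can be estimated. (With the nocons option, which removes the constant, all five dummies can be estimated: reg lnaf year_1 year_2 year_3 year_4 year_5, nocons.)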
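A sketch of the dummy-variable case on the built-in auto data, with repair-record dummies standing in for the year dummies (an assumption for illustration). Observations with missing rep78 are dropped first so that the five dummies sum to exactly 1 in every row.

```stata
* Sketch on the built-in auto data (rep_1-rep_5 ~ year_1-year_5)
sysuse auto, clear
keep if !missing(rep78)
tabulate rep78, generate(rep_)        // creates rep_1 ... rep_5
* with a constant the five dummies are perfectly collinear (they sum to 1),
* so regress omits one of them
regress price rep_1-rep_5
* without a constant all five can be estimated
regress price rep_1-rep_5, noconstant
```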

• Even if the independent variables are not perfectly collinear, there can still be a problem if they are highly correlated.

Consequences:

the standard errors of the coefficients become large (i.e., the coefficients are not estimated precisely)

the coefficient estimates can be highly unstable

How to measure it:

Variance-inflation factors (VIF) can be used to check for multicollinearity (in newer versions the command is estat vif):

reg lnaf lnta big6 lnta1

vif

reg lnaf lnta big6

vif

Severity of multicollinearity: a VIF of 10 indicates high multicollinearity, and 20 very high.
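A sketch on the built-in auto data, where length is included because it is strongly correlated with weight. The VIF of a regressor is 1 / (1 - R2) from regressing that regressor on the other explanatory variables; the hand calculation for weight below should match the value estat vif lists for it. The scalar name vif_weight is illustrative.

```stata
* Sketch on the built-in auto data
sysuse auto, clear
regress price mpg weight length
estat vif                             // older name: vif
* VIF by hand for weight: 1 / (1 - R-squared of weight on the other regressors)
regress weight mpg length
scalar vif_weight = 1 / (1 - e(r2))
display "VIF(weight) = " vif_weight
```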
