Machine Learning Methods That Economists Should Know About

Source: 《比较》 (Comparative Studies), 2022, Issue 2. Publication date: April 1, 2022.

By Susan Athey and Guido Imbens

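10. Conclusion

The rapidly developing machine learning literature has given empirical researchers in economics a large set of tools. In this review, we have described some of the methods that we consider most useful for economists, and we strongly advocate making them part of the core graduate econometrics curriculum. Familiarity with these methods will help scholars carry out more sophisticated empirical work and communicate effectively with colleagues in other fields.

(Translated by Yu Jiang)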

Layout editor: Wu Qiuhan