{smcl}
{* *! version 1.2.1  07mar2013}{...}
{findalias asfradohelp}{...}
{vieweralsosee "" "--"}{...}
{vieweralsosee "[R] help" "help help"}{...}
{viewerjumpto "Syntax" "readhtmltable##syntax"}{...}
{viewerjumpto "Description" "readhtmltable##description"}{...}
{viewerjumpto "Options" "readhtmltable##options"}{...}
{viewerjumpto "Remarks" "readhtmltable##remarks"}{...}
{viewerjumpto "Examples" "readhtmltable##examples"}{...}
{title:Title}

{phang}
{bf:readhtmltable} {hline 2} Read a web page and convert any tables it contains to a data set


{marker syntax}{...}
{title:Syntax}

{p 8 17 2}
{cmdab:readhtmltable:} {it:page}
[{cmd:,} {it:options}]

{synoptset 20 tabbed}{...}
{synopthdr}
{synoptline}
{syntab:Main}
{synopt:{opt v:arnames}}Use the first row as variable names{p_end}
{synopt:{opt h:tml}}Keep HTML markup{p_end}
{synopt:{opt d:ebug}}Save intermediate results{p_end}
{synoptline}


{marker description}{...}
{title:Description}

{pstd}
{cmd:readhtmltable} reads the web page given by {it:page} (a URL) and converts any tables it contains into a Stata data set.


{marker options}{...}
{title:Options}

{dlgtab:Main}

{phang}
{opt varnames} specifies that the entry in the first row of each column be used as that column's variable name. If that entry is empty, the default variable name is used.

{phang}
{opt html} keeps HTML markup (anything between < and >). By default, HTML markup is removed.

{phang}
{opt debug} keeps the intermediate results generated by the program, which are dropped by default. This option is probably useful only for debugging.


{marker remarks}{...}
{title:Remarks}

{pstd}If the web page contains multiple tables, they are still placed in a single data set, and it is up to the user to separate them (this may change in future versions). The default variable names are of the form t#c# (e.g.
t1c1 is table 1, column 1) but can be overridden with the {opt varnames} option.{p_end}

{pstd}Note that the new data set replaces any data set currently in memory, and this program does not (yet) check whether the data in memory should be saved first!{p_end}

{pstd}This program is still early in its development. Use it at your own risk, although it is normally easy to tell whether a page has been read properly.{p_end}


{marker examples}{...}
{title:Examples}

{pstd}Read in the SSCC training schedule for the current semester{p_end}
{phang}{cmd:. readhtmltable https://ssc.wisc.edu/sscc_jsp/training/}{p_end}

{pstd}Use the first row as variable names{p_end}
{phang}{cmd:. readhtmltable https://ssc.wisc.edu/sscc_jsp/training/, varnames}{p_end}

{pstd}Some pages will not read properly. Here the problem is cells that span multiple columns and/or rows (FWIW, R's {cmd:readHTMLTable()} has the same problem with this page).{p_end}
{phang}{cmd:. readhtmltable http://www.nfl.com/stats/team, varnames}{p_end}
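
{pstd}Because {cmd:readhtmltable} replaces the data in memory without asking, you may want to save your current data first ({it:mydata} is only a placeholder file name){p_end}
{phang}{cmd:. save mydata, replace}{p_end}
{phang}{cmd:. readhtmltable https://ssc.wisc.edu/sscc_jsp/training/}{p_end}

{pstd}A possible way to separate the first table from the others, sketched under two assumptions: that the default t#c# names are in use (no {opt varnames}), and that, as that naming suggests, each table occupies its own set of columns, so rows belonging only to other tables are empty in table 1's columns. The helper variable {it:nonmiss} is a hypothetical name{p_end}
{* sketch only: assumes each table occupies its own t#c# columns}{...}
{phang}{cmd:. keep t1c*}{p_end}
{phang}{cmd:. egen nonmiss = rownonmiss(t1c*), strok}{p_end}
{phang}{cmd:. drop if nonmiss == 0}{p_end}
{phang}{cmd:. drop nonmiss}{p_end}

{pstd}The {cmd:strok} option lets {cmd:rownonmiss()} count string variables, treating empty strings as missing, so the sketch works whether the columns were read as strings or numbers.{p_end}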