Lawrence H. Smith
Address: (removed)
Silver Spring, MD
Phone: (removed)
E-mail: lh "at" smith.net

Objective   To apply mathematics, statistics, and computer algorithms to challenging problems in computational linguistics, information retrieval, text mining, machine learning, and high-performance computing.

Education  
Indiana University Bloomington PhD in Mathematics 1/1998
University of Illinois at Urbana-Champaign MS in Mathematics 12/1982
University of Illinois at Urbana-Champaign BS in Engineering Physics 5/1981
(degree dates shown)

Positions  
National Center for Biotechnology Information Staff Scientist 6/2008
National Center for Biotechnology Information Systems Analyst Contractor 8/2002
National Library of Medicine Medical Informatics Fellow 2/2002
Department of Defense Applied Mathematician 7/2000
National Cancer Institute Bioinformatics Research Fellow 7/1998
Dow AgroSciences LLC Systems Development Consultant 11/1993
Data Parallel Systems, Inc. (DPSI) GUI Developer 6/1994
Dome Software, Inc. Systems Analyst 9/1987
Micro Database Systems, Inc. Programmer 5/1985
Naval Ocean Systems Center Information Scientist 11/1983
Data General Corp. Diagnostics Programmer 6/1982
(start dates shown)

Computer Skills
Machine learning applications and algorithm development.
System analysis, design, and development.
Programming in C/C++, Perl, Java, plus assembler, LISP, Prolog, PostScript, SPlus, and MATLAB.
GUI development with JavaScript/AJAX, HTML, and X-Windows/Motif.
Relational database modeling with Oracle, MySQL, and MS-SQL.
Network programming with MPI, TCP/IP, and DECnet.
Operating systems: UNIX, VAX/VMS, and Windows.

Experience Detail
 
National Center for Biotechnology Information, Bethesda, MD
Staff Scientist/Systems Analyst Contractor
8/2002
to present
I conduct original research in statistical natural language processing and machine learning to analyze large-scale, complex biomedical text and numerical data. I transitioned from contractor to staff scientist in 6/2008, in order to continue with independent research projects.
  • Implemented a C++ class library for machine learning experiments on very large databases, optimized for parallel grid computing.
  • Developed and published a new method for accurately identifying related sentence pairs from diverse abstracts in MEDLINE.
  • Discovered a method that effectively characterizes the popularity of MEDLINE articles, approximating actual user "clickthrough" behavior, and demonstrated the feasibility of forecasting public interest in articles.
  • Developed a criterion that identifies synonymous phrases in MEDLINE content to improve query indexing, and directed a summer student in a project to collect data for evaluation.
  • Collaborated internationally with nineteen teams of academic and commercial researchers to improve algorithms for recognizing gene mentions in MEDLINE abstracts.
  • Established and published a baseline for utilization of parse data in the gene mention problem.
  • Lead developer of the MedTag SQL database, for collecting, disseminating, and analyzing diversely annotated linguistic corpora.
  • Lead developer of a robust system to identify and extract bibliographic references from garbled OCR text using Perl and C++.
  • Lead developer of MedPost, a hidden Markov model part-of-speech tagger implemented in C++. MedPost outperforms current state-of-the-art taggers on biomedical text.
  • Lead developer of a novel hidden Markov model algorithm for automatically aligning complex real-world sequence and text data, implemented as a portable C++ class library.
  • Lead developer of a fuzzy set query information retrieval algorithm, with query term expansion, implemented in C++.
  • Co-organizer of the BioCreative II Workshop, responsible for evaluating and summarizing participants' results in the Gene Mention task.
  • Program Committee member of numerous biocomputing conferences, including the Pacific Symposium on Biocomputing and the Association for Computational Linguistics BioNLP workshops.
  • Peer review of numerous scientific papers for Bioinformatics, Journal of Applied Bioinformatics, BMC Bioinformatics, and Journal of Biomedical Discovery and Collaboration.

National Library of Medicine, Bethesda, MD
Medical Informatics Fellow
2/2002
to 8/2002
As a Medical Informatics Fellow at the Lister Hill Center for Biomedical Communications, I designed and prototyped a web-based CGI program to provide in-depth information to the public about genetic diseases and the impact of the Human Genome Project.

Department of Defense, Ft. Meade, MD
Applied Mathematician
7/2000
to 2/2002
As an applied mathematician at the DoD, I used MATLAB to research signal analysis techniques for voice signals, including formant and pitch parameter estimation; vowel, speaker, and gender identification; and theoretical models of speech production.

National Cancer Institute, Bethesda, MD
Bioinformatics Research Fellow
7/1998
to 7/2000
As a research fellow in the Laboratory of Molecular Pharmacology working for Dr. John Weinstein, I developed data analysis tools and new mathematical methods for analyzing and visualizing complex data using SPlus and PostScript.
  • Lead Developer of a suite of statistical data analysis tools.
  • Lead Developer of an application to visualize genes, drugs, and cells based on measurements of a large number of attributes.
  • Co-developer of the MedMiner internet text mining system.

Dow AgroSciences LLC, Zionsville, IN
Systems Development Consultant
11/1993
to 7/1998
I co-developed the VISTA system, originally designed for DowElanco by Dome Software.
  • Lead Developer of a MACCS-formatted chemical structure database.
  • Lead Developer of a calculations module with rapid addition of new calculations, a graphical user interface, and interactive display of raw data and fitted curves.
  • Co-developer of a client/server module that allows multidisciplinary researchers to design a factorial experiment with up to four factors, and record routine experimental data for reporting and analysis.
  • Co-developer of a relational data model to store bioassay results.
  • Research Consultant on statistical calculations, including parameter estimation using linear regression, probit analysis, and logistic curve fitting.

Data Parallel Systems, Inc., Bloomington, IN
GUI Developer
6/1994
to 6/1995
DPSI was commissioned entirely by MasPar, Inc. to develop a relational database system that took full advantage of MasPar's massively parallel (SIMD) computers. The database was sold to Wal-Mart Stores, Inc. for data mining.
  • Lead Developer of a Unix-based Motif user interface for use in managing the company's parallel-computer relational database product. As the sole developer, I was responsible for establishing requirements, designing, and implementing the interface.

Dome Software, Inc., Indianapolis, IN
Systems Analyst
9/1987
to 6/1994
Originally named Scientific Software Products, the company's mission was to develop an object-oriented database with a graphical interface accessible by computer network. After initial development, the product was crucial in securing several lucrative contracts.
  • Manager of programmers implementing system enhancements.
  • Lead Developer of the Parley communications tool, the core of the VISTA research information management system.
  • Lead Developer of a networked database tool for rapid system prototyping and application development used in contracts with Boehringer-Mannheim Corporation and Eastman Kodak Company.
  • Technical Writer of reference, tutorial and marketing materials.
  • Technical Advisor providing training, technical and project organization advice.
  • Developer of pre-sales prototyping for Dome technical staff.

Micro Database Systems, Inc., Lafayette, IN
Programmer
5/1985
to 9/1987
Mdbs developed and marketed cross-platform database system development products, getting its start with CP/M on Z80 computers. Its KnowledgeMan was one of the original 'Works' programs, providing integrated database, spreadsheet, word processing, communications, and scripting.
  • Lead Developer of an expert system inference engine, implementing forward and backward chaining. The inference engine was integrated into a spreadsheet program (KnowledgeMan) to become a new software product (Guru).
  • Lead Developer of an expert system analysis tool to diagnose and correct faulty production rules.
  • Technical Advisor on fuzzy valued data types which were used in the Guru expert system tool.
  • Co-developer of a cross-platform graphical user interface for editing a database of expert system production rules.

Naval Ocean Systems Center, San Diego, CA
Information Scientist
11/1983
to 5/1985
This was a research lab of the US Navy. I was hired into a new professional demonstration program which required four assignments averaging three months each.
  • Lead Developer of an expert system for assessing the results of ongoing large-scale simulations of a torpedo system.
  • Lead Developer of software in Pascal to analyze data collected from underwater microphones and apply Doppler-shift prediction of moving sound sources.
  • Lead Developer of a Prolog interpreter in LISP which was used to benchmark the performance of a supercomputer applied to artificial intelligence algorithms.
  • Technical Advisor evaluating theoretical merits of a proposed fuzzy-logic-based expert system. I developed a fully functional prototype to demonstrate my analysis.

Data General Corp., Westbrook, ME
Diagnostics Programmer
6/1982
to 11/1983
I worked in the diagnostic software group in a manufacturing plant for computer peripheral storage devices. I was responsible for developing software to test units produced on the assembly line, as well as developing the in-house portion of a corporate-wide system to monitor operations.
  • Lead Developer of a report-generating tool for accessing data from a real-time assembly-line monitoring system.
  • Lead Developer of a stand-alone microcomputer system for diagnostic testing.

Publication Detail
 

Smith LH, Wilbur WJ. (2010) Finding related sentence pairs in MEDLINE. Information Retrieval, DOI 10.1007/s10791-010-9126-8. [pdf]

Smith LH, Wilbur WJ. (2009) The value of parsing as feature generation for gene mention recognition. Journal of Biomedical Informatics, 42 (5), 895-904. [pdf]

Smith LH, et al. (2008) Overview of BioCreative II Gene Mention Recognition. Genome Biology, 2008, 9 (Suppl 2):S2. [pdf]

Wilbur WJ, Smith LH and Tanabe LK. (2007) BioCreative 2. Gene Mention Task. In Proceedings of the Second BioCreative Challenge Evaluation Workshop, pp. 7-16. Madrid, April, 2007. [pdf]

Demner-Fushman D, et al. (2007) Combining Resources to Find Answers to Biomedical Questions. In Proceedings of the Text Retrieval Conference (TREC-2007). [pdf]

Demner-Fushman D, et al. (2006) Finding Relevant Passages in Scientific Articles: Fusion of Automatic Approaches vs. an Interactive Team Effort. In Proceedings of the Text Retrieval Conference (TREC-2006). [pdf]

Aronson AR, et al. (2006) Fusion of knowledge-intensive and statistical approaches for retrieving and annotating textual genomics documents. In Proceedings of the Text Retrieval Conference (TREC-2005). [pdf]

Smith LH. (2005) On ordering free groups. Journal of Symbolic Computation, 40(6), 1285-1290. [pdf]

Smith LH, Tanabe L, Rindflesch T and Wilbur WJ. (2005) MedTag: A Collection of Biomedical Annotations. In Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, pp. 32-37. Detroit, June 2005. [pdf]

Smith L, Rindflesch T and Wilbur WJ. (2005) The Importance of the Lexicon in Tagging Biological Text. Natural Language Engineering, 12(2), 1-17. [pdf]

Smith L. (2005) Review of The Turing Test, Verbal Behavior as the Hallmark of Intelligence, by Stuart Shieber. Linguist List 16.1815. [link]

Smith LH. (2005) "Exploratory Genomic Data Analysis", In Medical Informatics, Knowledge Management and Data Mining in Biomedicine, Chen, H. et al., Eds., Springer, 2005. [book]

Aronson AR, et al. (2004) Knowledge-intensive and statistical approaches to the retrieval and annotation of genomics MEDLINE citations. In Proceedings of the Text Retrieval Conference (TREC-2004), NIST Special Publication 500-261. [pdf]

Smith L and Wilbur WJ. (2004) Retrieving definitional content for ontology development. Computational Biology and Chemistry, 28(2004), 387-391. [pdf]

Smith L, Rindflesch T and Wilbur WJ. (2004) MedPost: a Part of Speech Tagger for Biomedical Text. Bioinformatics, 20(14), 2320-2321. [pdf]

Yeganova L, Smith L and Wilbur WJ. (2004) Identification of related gene/protein names based on an HMM of name variations. Computational Biology and Chemistry, 28(2), 97-107. [pdf]

Kayaalp M, Aronson AR, Humphrey SM, Ide NC, Tanabe LK, Smith LH, Demner D, Loane RR, Mork JG, Bodenreider O. (2003) Methods for accurate retrieval of MEDLINE citations in functional genomics. In Proceedings of the Text Retrieval Conference (TREC-2003), NIST Special Publication 500-255, pp. 441-450. [pdf]

Smith L, Yeganova L and Wilbur WJ. (2003) Hidden Markov models and optimized sequence alignment. Computational Biology and Chemistry, 27(1), 77-84. [pdf]

Lee JK, Bussey KJ, Gwadry FG, Reinhold W, Riddick G, Pelletier SL, Nishizuka S, Szakacs G, Annereau J-P, Shankavaram U, Lababidi S, Smith LH, Gottesman MM, Weinstein JN. (2003) Comparing cDNA and Oligonucleotide Array Data: Concordance of Gene Expression Across Platforms for the NCI-60 Cancer Cells. Genome Biol, 4(12), R82. [pdf]

Smith LH and Nelson DJ. (2002) Multiple-tube resonance model. In Proceedings of SPIE, 4791, pp. 33-42. [pdf]

Nelson DJ and Smith LH. (2002) Tuning Time-Frequency methods for the detection of metered HF speech. In Proceedings of SPIE, 4791, pp. 71-79. [pdf]

Zhou Y, Gwadry FG, Reinhold WC, Miller LD, Smith LH, Scherf U, Liu ET, Kohn KW, Pommier Y, Weinstein JN. (2002) Transcriptional Regulation of Mitotic Genes by Camptothecin-Induced DNA Damage: Microarray Analysis of Dose- and Time-Dependent Effects. Cancer Res, 62(6), 1688-1695. [pdf]

Weinstein JN, Scherf U, Lee JK, Nishizuka S, Gwadry F, Bussey AK, Kim S, Smith LH, Tanabe L, Richman S, Alexander J, Kouros-Mehr H, Maunakea A, Reinhold WC. (2002) The Bioinformatics of Microarray Gene Expression Profiling. Cytometry, 47(1), 46-49. [pdf]

Lee JK, Scherf U, Smith LH, Tanabe L, Weinstein JN. (2001) Analysis of gene expression data of the NCI 60 cancer cell lines using Bayesian hierarchical effects model. In Proceedings of SPIE, 4266, pp. 228-235. [pdf]

Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, Kohn KW, Reinhold WC, Myers TG, Andrews DT, Scudiero DA, Eisen MB, Sausville EA, Pommier Y, Botstein D, Brown PO and Weinstein JN. (2000) A Gene Expression Database for the Molecular Pharmacology of Cancer. Nature Genetics, 24(3), 236-244. [pdf]

Tanabe L, Smith LH, Lee JK, Scherf U, Hunter L and Weinstein JN. (1999) MedMiner: An internet tool for filtering and organizing biomedical information, with application to gene expression profiling. Biotechniques, 27, 1210-1217. [pdf]

Smith LH. (1998) "Computing Resolutions over Associative Algebras with Ordered Basis", PhD Dissertation, Indiana University.