Experience Detail |
|
National Center for Biotechnology Information, Bethesda, MD
Staff Scientist/Systems Analyst Contractor |
8/2002 to present |
I conduct original research in statistical natural language processing
and machine learning to analyze large-scale, complex biomedical text
and numerical data. In 6/2008 I transitioned from contractor to staff
scientist to continue my independent research projects.
-
Lead Developer of the MedTag SQL database for collecting, disseminating,
and analyzing diversely annotated linguistic corpora.
-
Lead Developer of a robust system, implemented in Perl and C++, to identify
and extract bibliographic references from garbled OCR text.
-
Lead Developer of MedPost, a hidden Markov model part-of-speech
tagger implemented in C++. MedPost outperforms current state-of-the-art
taggers on biomedical text.
-
Lead Developer of a novel hidden Markov model algorithm for automatically
aligning complex real-world sequence and text data, implemented
as a portable C++ class library.
-
Lead Developer of a fuzzy-set query information retrieval algorithm
with query term expansion, implemented in C++.
-
Co-organizer of the BioCreative II Workshop, responsible for evaluating
and summarizing participants' results in the Gene Mention task.
-
Program Committee member for numerous biocomputing conferences,
including the Pacific Symposium on Biocomputing and the Association
for Computational Linguistics BioNLP workshops.
-
Peer reviewer of numerous scientific papers for Bioinformatics,
Journal of Applied Bioinformatics, BMC Bioinformatics, and
Journal of Biomedical Discovery and Collaboration.
|
National Library of Medicine, Bethesda, MD
Medical Informatics Fellow |
2/2002 to 8/2002 |
|
As a Medical Informatics Fellow at the Lister Hill Center for
Biomedical Communications, I designed and prototyped a web-based
CGI program to provide in-depth information to the public about
genetic diseases and the impact of the Human Genome Project.
|
Department of Defense, Ft. Meade, MD
Applied Mathematician |
7/2000 to 2/2002 |
|
As an applied mathematician at the DoD, I used MATLAB to research signal
analysis techniques for voice signals, including formant and pitch
parameter estimation; vowel, speaker, and gender identification;
and theoretical models of speech production.
|
National Cancer Institute, Bethesda, MD
Bioinformatics Research Fellow |
7/1998 to 7/2000 |
As a research fellow in the Laboratory of Molecular Pharmacology,
working for Dr. John Weinstein, I developed data analysis tools
and new mathematical methods for analyzing and visualizing complex
data using S-PLUS and PostScript.
-
Lead Developer of a suite of statistical data analysis tools.
-
Lead Developer of an application to visualize genes, drugs, and
cells based on measurements of a large number of attributes.
-
Co-developer of the MedMiner internet text mining system.
|
Dow AgroSciences LLC, Zionsville, IN
Systems Development Consultant |
11/1993 to 7/1998 |
I co-developed the VISTA system, originally designed for DowElanco
by Dome Software.
-
Lead Developer of a MACCS-formatted chemical structure database.
-
Lead Developer of a calculations module
supporting rapid addition of new calculations,
with a graphical user interface
and interactive display of raw data and fitted curves.
-
Co-developer of a client/server module
that allows multidisciplinary researchers
to design factorial experiments with up to four factors
and to record routine experimental data for reporting and analysis.
-
Co-developer of a relational data model to store bioassay results.
-
Research Consultant on statistical calculations, including parameter
estimation using linear regression, probit analysis, and logistic
curve fitting.
|
Data Parallel Systems, Inc., Bloomington, IN
GUI Developer |
6/1994 to 6/1995 |
DPSI worked exclusively under commission from MasPar, Inc. to develop a
relational database system that took full advantage of MasPar's massively
parallel (SIMD) computers. The database was sold to Wal-Mart Stores,
Inc. for data mining.
-
Lead Developer of a Unix-based Motif user interface for
managing the company's parallel-computer relational database product.
As the sole developer, I was responsible for establishing requirements for,
designing, and implementing the interface.
|
Dome Software, Inc., Indianapolis, IN
Systems Analyst |
9/1987 to 6/1994 |
Originally named Scientific Software Products, the company had the
mission of developing an object-oriented database with a graphical
interface accessible over a computer network. After initial development,
the product was crucial in securing several lucrative contracts.
-
Manager of programmers implementing system enhancements.
-
Lead Developer of the Parley communications tool, the core of
the VISTA research information management system.
-
Lead Developer of a networked database tool
for rapid system prototyping and application development,
used in contracts
with Boehringer-Mannheim Corporation and Eastman Kodak Company.
-
Technical Writer of reference, tutorial, and marketing materials.
-
Technical Advisor providing training and advice on technical matters
and project organization.
-
Developer of pre-sales prototypes for Dome technical staff.
|
Micro Database Systems, Inc., Lafayette, IN
Programmer |
5/1985 to 9/1987 |
Mdbs developed and marketed cross-platform database system development
products, having started with CP/M on Z80 computers. Its
KnowledgeMan was one of the original 'Works' programs, providing
integrated database, spreadsheet, word processing, communications,
and scripting functions.
-
Lead Developer of an expert system inference engine implementing
forward and backward chaining. The inference engine was integrated
into KnowledgeMan to create a new software product, Guru.
-
Lead Developer of an expert system analysis tool to diagnose
and correct faulty production rules.
-
Technical Advisor on the fuzzy-valued data types used
in the Guru expert system tool.
-
Co-developer of a cross-platform graphical user interface for
editing a database of expert system production rules.
|
Naval Ocean Systems Center, San Diego, CA
Information Scientist |
11/1983 to 5/1985 |
The Naval Ocean Systems Center was a research laboratory of the US Navy.
I was hired into a new professional demonstration program
that required four assignments averaging three months each.
-
Lead Developer of an expert system for assessing the results of
ongoing large-scale simulations of a torpedo system.
-
Lead Developer of Pascal software to analyze data collected
from underwater microphones (hydrophones) and to predict the
Doppler shift of moving sound sources.
-
Lead Developer of a Prolog interpreter, written in LISP,
that was used to benchmark
the performance of a supercomputer
on artificial intelligence algorithms.
-
Technical Advisor evaluating the theoretical merits of a proposed
fuzzy-logic-based expert system.
I developed a fully functional prototype to
demonstrate my analysis.
|
Data General Corp., Westbrook, ME
Diagnostics Programmer |
6/1982 to 11/1983 |
I worked in the diagnostic software group
at a manufacturing plant for computer peripheral storage devices.
I was responsible for developing software
to test units produced on the assembly line,
as well as developing the in-house portion of a corporate-wide
system to monitor operations.
-
Lead Developer of a report generating tool for accessing data
from a real-time assembly-line monitoring system.
-
Lead Developer of a stand-alone microcomputer system
for diagnostic testing.
|
|
Publication Detail |
|
Smith LH, Wilbur WJ.
(2008)
The value of parsing as feature generation for
gene mention recognition.
Journal of Biomedical Informatics, to appear.
Smith LH, et al.
(2008)
Overview of BioCreative II Gene Mention Recognition.
Genome Biology, 9(Suppl 2), S2.
Wilbur WJ, Smith LH and Tanabe LK.
(2007)
BioCreative 2. Gene Mention Task.
In Proceedings of the Second BioCreative Challenge Evaluation Workshop,
pp. 7-16. Madrid, April 2007.
Demner-Fushman D, et al.
(2007)
Combining Resources to Find Answers to Biomedical Questions.
In Proceedings of the Text Retrieval Conference (TREC-2007).
Demner-Fushman D, et al.
(2006)
Finding Relevant Passages in Scientific Articles: Fusion of Automatic Approaches vs. an Interactive Team Effort.
In Proceedings of the Text Retrieval Conference (TREC-2006).
Aronson AR, et al.
(2006)
Fusion of knowledge-intensive and statistical
approaches for retrieving and annotating textual genomics documents.
In Proceedings of the Text Retrieval Conference (TREC-2005).
Smith LH.
(2005)
On ordering free groups.
Journal of Symbolic Computation,
40(6), 1285-1290.
Smith LH, Tanabe L, Rindflesch T and Wilbur WJ.
(2005)
MedTag: A Collection of Biomedical Annotations.
In Proceedings of the ACL-ISMB Workshop on Linking
Biological Literature, Ontologies and Databases: Mining Biological Semantics,
pp. 32-37. Detroit, June 2005.
Smith L, Rindflesch T and Wilbur WJ.
(2005)
The Importance of the Lexicon in Tagging Biological Text.
Natural Language Engineering, 12(2), 1-17.
Smith L.
(2005)
Review of The Turing Test: Verbal Behavior as the Hallmark of Intelligence,
by Stuart Shieber.
Linguist List 16.1815.
http://www.ling.ed.ac.uk/linguist/issues/16/16-1815.html
Smith LH.
(2005)
Exploratory Genomic Data Analysis.
In Medical Informatics: Knowledge Management and Data Mining
in Biomedicine,
Chen H, et al., Eds.
Springer.
Aronson AR, et al.
(2004)
Knowledge-intensive and statistical approaches to
the retrieval and annotation of genomics MEDLINE citations.
In Proceedings of the Text Retrieval Conference (TREC-2004),
NIST Special Publication 500-261.
Smith L and Wilbur WJ.
(2004)
Retrieving definitional content for ontology development.
Computational Biology and Chemistry, 28, 387-391.
Smith L, Rindflesch T and Wilbur WJ.
(2004)
MedPost: a Part of Speech Tagger for Biomedical Text.
Bioinformatics, 20(14), 2320-2321.
Yeganova L, Smith L and Wilbur WJ.
(2004)
Identification of related gene/protein names based on an HMM of name
variations.
Computational Biology and Chemistry,
28(2), 97-107.
Kayaalp M, Aronson AR, Humphrey SM, Ide NC, Tanabe LK, Smith LH, Demner D,
Loane RR, Mork JG, Bodenreider O.
(2003)
Methods for accurate retrieval of
MEDLINE citations in functional genomics.
In Proceedings of the Text Retrieval
Conference (TREC-2003), NIST Special Publication 500-255, pp. 441-450.
Smith L, Yeganova L and Wilbur WJ.
(2003)
Hidden Markov models and optimized sequence alignment.
Computational Biology and Chemistry, 27(1), 77-84.
Lee JK, Bussey KJ, Gwadry FG, Reinhold W, Riddick G, Pelletier SL, Nishizuka
S, Szakacs G, Annereau J-P, Shankavaram U, Lababidi S, Smith LH, Gottesman MM,
Weinstein JN.
(2003)
Comparing cDNA and Oligonucleotide Array Data: Concordance
of Gene Expression Across Platforms for the NCI-60 Cancer Cells.
Genome Biology, 4(12), R82.
Smith LH and Nelson DJ.
(2002)
Multiple-tube resonance model.
In Proceedings of SPIE, 4791, pp. 33-42.
Nelson DJ and Smith LH.
(2002)
Tuning Time-Frequency methods for the detection
of metered HF speech.
In Proceedings of SPIE, 4791, pp. 71-79.
Zhou Y, Gwadry FG, Reinhold WC, Miller LD, Smith LH, Scherf U, Liu ET, Kohn KW,
Pommier Y, Weinstein JN.
(2002)
Transcriptional Regulation of Mitotic Genes by
Camptothecin-Induced DNA Damage: Microarray Analysis of Dose- and Time-Dependent
Effects.
Cancer Research, 62(6), 1688-1695.
Weinstein JN, Scherf U, Lee JK, Nishizuka S, Gwadry F, Bussey AK, Kim S,
Smith LH, Tanabe L, Richman S, Alexander J, Kouros-Mehr H, Maunakea A,
Reinhold WC.
(2002)
The Bioinformatics of Microarray Gene Expression Profiling.
Cytometry, 47(1), 46-49.
Lee JK, Scherf U, Smith LH, Tanabe L, Weinstein JN.
(2001)
Analysis of gene
expression data of the NCI 60 cancer cell lines using Bayesian hierarchical
effects model.
In Proceedings of SPIE, 4266, pp. 228-235.
Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, Kohn KW, Reinhold WC,
Myers TG, Andrews DT, Scudiero DA, Eisen MB, Sausville EA, Pommier Y,
Botstein D, Brown PO and Weinstein JN.
(2000)
A Gene Expression Database for the Molecular Pharmacology of Cancer.
Nature Genetics, 24(3), 236-244.
Tanabe L, Smith LH, Lee JK, Scherf U, Hunter L and Weinstein JN.
(1999)
MedMiner: An internet tool for filtering and organizing biomedical
information, with application to gene expression profiling.
Biotechniques, 27, 1210-1217.
Smith LH.
(1998)
Computing Resolutions over Associative Algebras with Ordered Basis.
PhD Dissertation, Indiana University.
|