Lawrence H. Smith
Silver Spring, MD
http://lh.smith.net/
lh@smith.net

CAREER INTERESTS
I am interested in conducting basic research of a mathematical, statistical, or linguistic nature in any field of application.

EDUCATION
PhD in Math, Indiana University (1/98)
MS in Math, University of Illinois (12/82)
BS in Engineering Physics, University of Illinois (5/81)

POSITIONS
Staff Scientist, National Center for Biotechnology Information (6/2008 - present)
Systems Analyst Contractor, National Center for Biotechnology Information (8/2002 - 6/2008)
Medical Informatics Fellow, National Library of Medicine (2/2002 - 8/2002)
Applied Mathematician, Department of Defense (7/2000 - 2/2002)
Bioinformatics Research Fellow, National Cancer Institute (7/1998 - 7/2000)
Systems Development Consultant, Dow AgroSciences LLC (11/1993 - 7/1998)
GUI Developer, Data Parallel Systems, Inc. (DPSI) (6/1994 - 6/1995)
Systems Analyst, Dome Software, Inc. (9/1987 - 6/1994)
Programmer, Micro Database Systems, Inc. (5/1985 - 9/1987)
Information Scientist, Naval Ocean Systems Center (11/1983 - 5/1985)
Diagnostics Programmer, Data General Corp. (6/1982 - 11/1983)

COMPUTER TECHNOLOGIES
System analysis, design, and development
Machine learning applications and algorithm development
Programming in C/C++, Assembler, LISP, Prolog, Java, Perl, PostScript
GUI development on Macintosh, Windows, X-Windows/Motif, HTML, JavaScript
Relational database modeling with ORACLE, Access, SQL
Network programming with TCP/IP, AppleTalk, DECnet
Operating systems: UNIX, VMS, Macintosh, MS-DOS, Windows 3.1, Win/95

National Center for Biotechnology Information
In order to pursue research in linguistics and Natural Language Processing, I moved from the fellowship at the NLM to the Computational Biology Branch of the NCBI, which is housed within the NLM. As a contractor supporting the research of Dr.
John Wilbur, I conducted original research in statistical NLP to help users search and analyze the abstracts of biomedical research articles and data stored at the Library. I subsequently transitioned from contractor to staff scientist in order to continue with independent research projects.

National Library of Medicine
As a Medical Informatics Fellow at the National Library of Medicine, I designed and prototyped a web-based service to provide in-depth information to the public about genetic diseases and the impact of the Human Genome Project. This involved standard web and database systems development and familiarization with medical genetics.

Department of Defense
As an applied mathematician at the DoD, I researched signal analysis techniques for voice signals, including formant and pitch parameter estimation; vowel, speaker, and gender identification; and theoretical models of speech production.

National Cancer Institute
As a research fellow in the Laboratory of Molecular Pharmacology working for Dr. John Weinstein, I developed web-based graphical interfaces to statistical data analysis tools. The statistical analysis was performed using SPlus running on an SGI that was accessed through cgi-bin programming. Biological data, such as drug activity and gene expression, were generated and collected by scientists in the lab, who then used the web-based tools to perform analysis and initiate new research. Another significant portion of my time was spent developing new mathematical methods for analyzing and visualizing these data.
* Designed and developed a web site for the work group, including dynamic content displaying results of statistical analysis. Lab members and colleagues used the web site to conduct research in gene and drug interactions.
* Conducted an exhaustive study to find the most effective methods for extracting data from gene expression microarrays.
* Developed a technique for visualizing genes, drugs, and cells based on measurements of large numbers of attributes.
* Assisted in the development of a web-based tool to increase the accuracy of PubMed in identifying literature abstracts relevant to gene and drug studies.

Dow AgroSciences LLC
Originally known as DowElanco, this is an agricultural chemicals company that started as a joint venture between Dow Chemical Company and Eli Lilly and Company. I helped design and develop a modular system to track chemical inventories and data obtained from diverse scientific testing, and to make this information available to chemists, biologists, and biochemists in the form of reports and ad hoc queries. The system is written in C and has had both a Macintosh and a Windows/95 graphical user interface (based on the portable tool Open Interface from Neuron Data), communicating with programs running on a VAX/VMS system housing an ORACLE database. All data is stored and retrieved using SQL.
* Designed and developed a testing module to store results of biotechnology tests (e.g. electrophoresis) performed on genetically engineered samples.
* Developed a client/server module that allows researchers in diverse disciplines to design a factorial experiment, conveniently combining up to four factors, and record routine experimental data for reporting and analysis. The first scientific discipline began using the system six months after the project started in 1995. Because the design is flexible, incremental enhancements have been made since then to meet the needs of the remaining two disciplines.
* Helped design a relational data model to store bioassay results. The model has also proven flexible enough to store micro-titre plate and biotechnology test data.
* Worked with scientists to identify and give advice on required statistical calculations, which included parameter estimation using linear regression, probit analysis, and logistic curve fitting.
Designed and developed the calculations as well as a method for rapid addition of new calculation routines, an interface for users to invoke calculations, and a method for interactively displaying graphs of raw data with superimposed curves for the user to review.
* Designed and developed a non-interactive program to validate and load MACCS-formatted chemical structure data from batch files into a relational database system. Because the program is configurable, it has been possible to add new chemical data fields with minimal programming changes.

Data Parallel Systems, Inc.
DPSI was commissioned entirely by MasPar, Inc. to develop a relational database system that took full advantage of MasPar's massively parallel (SIMD) computers. Although the database product is not currently marketed, it was sold to Wal-Mart Stores, Inc. and has been used there to perform data mining.
* Developed a Unix-based Motif user interface for use in managing the company's parallel-computer relational database product. As the only employee assigned to the project, I was responsible for establishing requirements, designing, and implementing the interface while the database system was in its early stages of development.

Dome Software, Inc.
Originally named Scientific Software Products, the company had as its initial mission the development of an object-oriented database using ORACLE running on a VAX to store data and using AppleTalk to provide access to Macintosh computers running a graphical user interface. Although the product is not currently marketed, it formed the basis for the company winning several project contracts.
* Supported internal staff on project contracts by providing training, technical and project organization advice, and pre-sales prototyping.
* Developed product specifications for a network communication tool used to develop client-server applications combining the PC, Macintosh, VAX/VMS, and Unix and integrating DECnet, AppleTalk, and TCP/IP communication protocols.
Contributed to programming, reference and tutorial documentation, marketing materials, and pre-sales prototypes.
* Managed one to three programmers to develop system enhancements.
* Wrote technical reference and tutorial documentation for application developers using the company's object-oriented database.
* Proposed a Macintosh/HyperCard scripting interface for the company's object-oriented database product. The interface was used for rapid prototyping and application development in project contracts with Boehringer-Mannheim Corporation and Eastman Kodak Company.
* As part of a team of eight programmers, implemented system internals using C for a networked object-oriented database system.

Micro Database Systems, Inc.
Mdbs developed and marketed cross-platform database system development products, getting its start with CP/M on Z80 computers. Its KnowledgeMan was one of the original 'Works' systems, providing integrated database, spreadsheet, word processing, communications, and scripting. The company then branched into artificial intelligence by integrating an expert system with fuzzy sets.
* Lobbied for and obtained support to implement enhancements to an expert system development tool that made it possible for users to diagnose and correct their production rules when unexpected behavior was encountered.
* Developed a cross-platform graphical user interface using the C curses library for editing a database of expert system production rules.
* Provided technical advice on implementing fuzzy-valued data types. A new programming system was subsequently designed in which all values were considered fuzzy sets, and all operations accepted and produced fuzzy sets.
* Designed and developed an expert system inference engine using C, implementing forward and backward chaining. The inference engine was integrated into a spreadsheet program (KnowledgeMan) to become a new software product (Guru).

Naval Ocean Systems Center
This was a research lab of the US Navy. I was hired into a new professional demonstration program that provided for four assignments averaging three months each to get to know the center's functions and to identify a position of greatest need.
* Designed an expert system for assessing the results of ongoing large-scale simulations of a torpedo system. Implemented a prototype using LISP.
* Developed software in Pascal to analyze data collected from underwater microphones and apply Doppler-shift prediction of moving sound sources.
* Developed a Prolog interpreter in LISP for use in benchmarking the performance of a supercomputer applied to artificial intelligence algorithms, such as natural language processing.
* Evaluated the theoretical merits of a proposed fuzzy-logic-based expert system. As part of the evaluation, I developed a simplified prototype in Pascal that provided a working demonstration.

Data General Corp.
The facility where I worked was a manufacturing plant for computer peripheral storage devices. I worked in the diagnostic software group, which was responsible for developing software to test all units produced on the assembly line. All test programs were written in Assembler without any operating system support. The group was also responsible for developing the in-house portion of a corporate-wide system to monitor operations.
* Designed and developed a report-generating tool using DG/L for accessing data from a real-time assembly-line monitoring system.
* Created a system using Assembler and DG/L that enabled engineers in a manufacturing environment to download diagnostic programs onto floppy disks and run them in stand-alone mode on remote Data General computers.

PUBLICATIONS

Smith LH, Wilbur WJ. (2008) The value of parsing as feature generation for gene mention recognition. Biomedical Informatics, to appear.

Smith LH, et al. (2008) Overview of BioCreative II Gene Mention Recognition.
Genome Biology, 2008, 9(Suppl 2):S2.

Wilbur WJ, Smith LH and Tanabe LK. (2007) BioCreative 2. Gene Mention Task. In Proceedings of the Second BioCreative Challenge Evaluation Workshop, pp. 7-16. Madrid, April, 2007.

Demner-Fushman D, et al. (2007) Combining Resources to Find Answers to Biomedical Questions. In Proceedings of the Text Retrieval Conference (TREC-2007).

Demner-Fushman D, et al. (2006) Finding Relevant Passages in Scientific Articles: Fusion of Automatic Approaches vs. an Interactive Team Effort. In Proceedings of the Text Retrieval Conference (TREC-2006).

Aronson AR, et al. (2006) Fusion of knowledge-intensive and statistical approaches for retrieving and annotating textual genomics documents. In Proceedings of the Text Retrieval Conference (TREC-2005).

Smith LH. (2005) On ordering free groups. Journal of Symbolic Computation, 40(6), 1285-1290.

Smith LH, Tanabe L, Rindflesch T and Wilbur WJ. (2005) MedTag: A Collection of Biomedical Annotations. In Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, pp. 32-37. Detroit, June 2005.

Smith L, Rindflesch T and Wilbur WJ. (2005) The Importance of the Lexicon in Tagging Biological Text. Natural Language Engineering, 12(2), 1-17.

Smith L. (2005) Review of "The Turing Test, Verbal Behavior as the Hallmark of Intelligence", by Stuart Shieber. Linguist List 16.1815. http://www.ling.ed.ac.uk/linguist/issues/16/16-1815.html

Smith LH. (2005) "Exploratory Genomic Data Analysis", in Medical Informatics, Knowledge Management and Data Mining in Biomedicine, Chen, H. et al., Eds., Springer, 2005.

Aronson AR, et al. (2005) Knowledge-intensive and statistical approaches to the retrieval and annotation of genomics MEDLINE citations. In Proceedings of the Text Retrieval Conference (TREC-2004), NIST Special Publication 500-261.

Smith L and Wilbur WJ. (2004) Retrieving definitional content for ontology development.
Computational Biology and Chemistry, 28(2004), 387-391.

Smith L, Rindflesch T and Wilbur WJ. (2004) MedPost: a Part of Speech Tagger for Biomedical Text. Bioinformatics, 20(14), 2320-2321.

Yeganova L, Smith L and Wilbur WJ. (2004) Identification of related gene/protein names based on an HMM of name variations. Computational Biology and Chemistry, 28(2), 97-107.

Kayaalp M, Aronson AR, Humphrey SM, Ide NC, Tanabe LK, Smith LH, Demner D, Loane RR, Mork JG, Bodenreider O. (2003) Methods for accurate retrieval of MEDLINE citations in functional genomics. In Proceedings of the Text Retrieval Conference (TREC-2003), NIST Special Publication 500-255, pp. 441-450.

Smith L, Yeganova L and Wilbur WJ. (2003) Hidden Markov models and optimized sequence alignment. Computational Biology and Chemistry, 27(1), 77-84.

Lee JK, Bussey KJ, Gwadry FG, Reinhold W, Riddick G, Pelletier SL, Nishizuka S, Szakacs G, Annereau J-P, Shankavaram U, Lababidi S, Smith LH, Gottesman MM, Weinstein JN. (2003) Comparing cDNA and Oligonucleotide Array Data: Concordance of Gene Expression Across Platforms for the NCI-60 Cancer Cells. Genome Biology, 4(12), R82.

Smith LH and Nelson DJ. (2002) Multiple-tube resonance model. In Proceedings of SPIE, 4791, pp. 33-42.

Nelson DJ and Smith LH. (2002) Tuning Time-Frequency methods for the detection of metered HF speech. In Proceedings of SPIE, 4791, pp. 71-79.

Zhou Y, Gwadry FG, Reinhold WC, Miller LD, Smith LH, Scherf U, Liu ET, Kohn KW, Pommier Y, Weinstein JN. (2002) Transcriptional Regulation of Mitotic Genes by Camptothecin-Induced DNA Damage: Microarray Analysis of Dose- and Time-Dependent Effects. Cancer Research, 62(6), 1688-1695.

Weinstein JN, Scherf U, Lee JK, Nishizuka S, Gwadry F, Bussey AK, Kim S, Smith LH, Tanabe L, Richman S, Alexander J, Kouros-Mehr H, Maunakea A, Reinhold WC. (2002) The Bioinformatics of Microarray Gene Expression Profiling. Cytometry, 47(1), 46-49.

Lee JK, Scherf U, Smith LH, Tanabe L, Weinstein JN.
(2001) Analysis of gene expression data of the NCI 60 cancer cell lines using Bayesian hierarchical effects model. In Proceedings of SPIE, 4266, pp. 228-235.

Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, Kohn KW, Reinhold WC, Myers TG, Andrews DT, Scudiero DA, Eisen MB, Sausville EA, Pommier Y, Botstein D, Brown PO and Weinstein JN. (2000) A Gene Expression Database for the Molecular Pharmacology of Cancer. Nature Genetics, 24(3), 236-244.

Tanabe L, Smith LH, Lee JK, Scherf U, Hunter L and Weinstein JN. (1999) MedMiner: An internet tool for filtering and organizing biomedical information, with application to gene expression profiling. Biotechniques, 27, 1210-1217.

Smith LH. (1998) "Computing Resolutions over Associative Algebras with Ordered Basis", PhD Dissertation, Indiana University.