One of the world's top-2 pharmaceutical companies.
The company conducts clinical trials on human subjects to answer specific questions about the effects of newly discovered drugs and vaccines, generating safety and efficacy data. During a trial, the primary investigators are required to document and file the details of the trial in a Clinical Trial Protocol document. The client needed to extract structured information from these documents in a form amenable to rapid searching and advanced analytics. A further challenge was the lack of uniformity in structure, content, and style across clinical trial protocol documents prepared in different parts of the world.
We designed and built an algorithm for text mining and pattern recognition. The model extracted values for all key parameters provided and generated a single consolidated spreadsheet from thousands of free-text documents.
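As a rough illustration of this kind of pattern-based extraction, the sketch below pulls a few parameters out of free-text protocol snippets and writes them to one CSV table. It is written in Python for illustration only; the project itself used R, and this is not the algorithm that was delivered. The parameter names (`phase`, `sample_size`, `protocol_id`), the regular expressions, and the helper functions are all hypothetical, since the actual parameter list is not given here.

```python
import csv
import io
import re

# Hypothetical parameters and patterns (assumptions for illustration only).
PATTERNS = {
    "phase": re.compile(
        r"\bphase\s*[:\-]?\s*(IV|III|II|I|[1-4])\b", re.IGNORECASE
    ),
    "sample_size": re.compile(
        r"\b(?:sample size|number of (?:subjects|participants))"
        r"\s*[:\-]?\s*(\d[\d,]*)",
        re.IGNORECASE,
    ),
    "protocol_id": re.compile(
        r"\bprotocol\s*(?:number|no\.|id\b)\s*[:\-]?\s*([A-Z0-9][A-Z0-9\-]*)",
        re.IGNORECASE,
    ),
}


def extract_parameters(text):
    """Return one value per parameter; empty string when nothing matches."""
    row = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        row[name] = match.group(1) if match else ""
    # Normalize thousands separators such as "1,200" to "1200".
    row["sample_size"] = row["sample_size"].replace(",", "")
    return row


def consolidate(documents):
    """Build a single CSV string with one row per document."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["document", *PATTERNS], lineterminator="\n"
    )
    writer.writeheader()
    for doc_name, text in documents.items():
        writer.writerow({"document": doc_name, **extract_parameters(text)})
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample texts, not taken from real protocols.
    samples = {
        "a.txt": "Protocol No. ABC-123. This is a Phase III study. "
                 "Number of subjects: 1,200.",
        "b.txt": "PHASE: 2 trial; sample size 450",
    }
    print(consolidate(samples))
```

Keeping one pattern per parameter in a dictionary is one way to cope with documents that vary in structure and style: each new phrasing found in the corpus can be handled by widening a single expression without touching the rest of the pipeline.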
PDF-XML, PDF-Text, R