Mining Cancer Clinical Notes
Use the Head and Neck Cancer Medication Data to apply NLP/TM methods and investigate the information content of the clinical notes. Chapter 7 presented some preliminary TM analysis; this exercise goes further.
- Use MEDICATION_SUMMARY to construct a VCorpus object
- Clean the VCorpus object
- Build a document term matrix (DTM)
- Add a column to indicate early and later cancer stage according to seer_stage (refer to Chapter 7)
- Use the DTM to construct word clouds for the early-stage cases, the later-stage cases, and the entire dataset
- Interpret the wordclouds
- Compute the TF-IDF (Term Frequency - Inverse Document Frequency)
- Apply LASSO to the unweighted and the TF-IDF-weighted DTM, respectively, and evaluate the results using AUC
- Try the cosine similarity transformation, apply LASSO, and compare the results
- Try other performance measures in cv.glmnet(), such as “class” (misclassification error)
- Does it appear that these classifiers may provide an automated machine interpretation of unstructured free text?
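
For reference, the TF-IDF weighting and the cosine similarity used in the steps above can be written as follows. This is a sketch based on common textbook definitions, not necessarily the exact form used in the chapter; the symbols n_{t,d}, N, df_t, x_i and x_j are notation assumed here for illustration. Implementations differ in log base and normalization; for example, weightTfIdf() in the tm package uses a base-2 logarithm and, by default, divides term counts by document length, which matches the form below.

```latex
% Assumed notation (not from the chapter):
%   n_{t,d} : number of occurrences of term t in document d
%   N       : total number of documents in the corpus
%   df_t    : number of documents that contain term t
\[
  \mathrm{tf}(t,d) = \frac{n_{t,d}}{\sum_{t'} n_{t',d}}, \qquad
  \mathrm{idf}(t) = \log_2 \frac{N}{\mathrm{df}_t}, \qquad
  \text{tf-idf}(t,d) = \mathrm{tf}(t,d)\,\mathrm{idf}(t)
\]

% x_i, x_j : rows of the DTM, i.e. term vectors of documents i and j
\[
  \cos(x_i, x_j) = \frac{x_i \cdot x_j}{\lVert x_i \rVert_2 \,\lVert x_j \rVert_2}
\]
```

Under these definitions, a term that occurs in every document has idf equal to 0, so TF-IDF down-weights ubiquitous terms that carry little information about cancer stage. Cosine similarity depends only on the direction of the term vectors, so a long note and a short note with the same mix of terms are treated as similar.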