Mining Cancer Clinical Notes
Use the Head and Neck Cancer Medication Data to apply NLP/TM methods and investigate the corpus. You have already explored these data in Chapter 7. Now we need to go a step further.
- Use the MEDICATION_SUMMARY to construct a VCorpus object.
- Clean the VCorpus object.
- Build a document term matrix (DTM).
- Add a column to indicate early and later stage according to seer_stage (refer to Chapter 7).
- Use the DTM to construct a word cloud for early stage, later stage and the complete archive.
- Interpret the results of the three generated word clouds.
- Compute the TF-IDF (Term Frequency - Inverse Document Frequency).
- Apply LASSO on the unweighted and weighted DTM respectively and evaluate the results according to AUC.
- Try a cosine similarity transformation, apply LASSO, and compare the results.
- Use other measures, such as “class” (misclassification error), for the type.measure argument of cv.glmnet().
- Does it appear that these automated machine learning methods understand human language well?
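
The steps above can be sketched end to end on a small simulated corpus. This is only an illustrative outline, not a solution for the real data: the drug vocabulary, the sampling weights, the 0/1 stage labels and the helper names (make_note, cosine_sim, cv_auc) are all made up here, and the tm and glmnet packages are assumed to be installed. The cosine step uses the document-by-document cosine similarity matrix as the feature matrix, which is one possible reading of "cosine similarity transformation".

```r
# Toy sketch: simulated notes, NOT the Head and Neck Cancer data.
library(tm)
library(glmnet)

set.seed(1234)

# Hypothetical vocabulary; the two groups draw the same terms with different weights
vocab   <- c("aspirin", "ibuprofen", "lisinopril", "metformin",
             "morphine", "oxycodone", "fentanyl", "ondansetron")
p_early <- c(4, 4, 4, 4, 1, 1, 1, 1)
p_later <- c(1, 1, 1, 1, 4, 4, 4, 4)
make_note <- function(p) paste(sample(vocab, 8, replace = TRUE, prob = p), collapse = " ")

notes <- c(replicate(40, make_note(p_early)), replicate(40, make_note(p_later)))
stage <- rep(c(0, 1), each = 40)  # 0 = early stage, 1 = later stage (made-up labels)

# Build and clean the VCorpus
corpus <- VCorpus(VectorSource(notes))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Unweighted and TF-IDF weighted document term matrices
dtm       <- DocumentTermMatrix(corpus)
dtm_tfidf <- weightTfIdf(dtm)
X         <- as.matrix(dtm)
X_tfidf   <- as.matrix(dtm_tfidf)

# Term frequencies per group; these named vectors could feed a word cloud,
# e.g. wordcloud::wordcloud(names(freq_all), freq_all) if that package is available
freq_all   <- sort(colSums(X), decreasing = TRUE)
freq_early <- sort(colSums(X[stage == 0, , drop = FALSE]), decreasing = TRUE)
freq_later <- sort(colSums(X[stage == 1, , drop = FALSE]), decreasing = TRUE)

# Cosine similarity between every pair of documents (rows of m)
cosine_sim <- function(m) {
  norms <- sqrt(rowSums(m^2))
  norms[norms == 0] <- 1  # guard against all-zero rows
  tcrossprod(m) / outer(norms, norms)
}
X_cos <- cosine_sim(X_tfidf)

# Cross-validated LASSO (alpha = 1 is the glmnet default), scored by AUC.
# nfolds = 5 keeps at least 10 observations per fold, which glmnet expects for AUC.
cv_auc <- function(x) {
  cv <- cv.glmnet(x, stage, family = "binomial", type.measure = "auc", nfolds = 5)
  max(cv$cvm)  # best mean cross-validated AUC over the lambda path
}

results <- c(raw = cv_auc(X), tfidf = cv_auc(X_tfidf), cosine = cv_auc(X_cos))
print(round(results, 3))
```

To score by misclassification error instead, type.measure = "class" can be passed to cv.glmnet(); in that case lower values are better, so min(cv$cvm) would replace max(cv$cvm). On the real data, the notes vector would come from MEDICATION_SUMMARY and the stage labels from seer_stage.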