Use the SOCR Jobs Data to practice learning via Apriori Association Rules
Focus on the Description feature. Replace all underscore characters "_" with spaces.
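A minimal sketch of the underscore replacement. The data frame jobs_raw and its two rows are hypothetical stand-ins for the actual SOCR Jobs Data, which you would load yourself.

```r
# Hypothetical stand-in for the Description column of the SOCR Jobs Data
jobs_raw <- data.frame(
  Description = c("Registered_nurses_provide_patient_care",
                  "Software_developers_design_applications"),
  stringsAsFactors = FALSE
)

# fixed = TRUE treats "_" as a literal character, not a regular expression
jobs_raw$Description <- gsub("_", " ", jobs_raw$Description, fixed = TRUE)
jobs_raw$Description
```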
Review Chapter 7 and use the tm package to convert the text data to plain text. (Hint: you also need to apply stemDocument; we will discuss more details in Chapter 19.)
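One possible cleaning pipeline with tm is sketched below. The two sample descriptions are made up, and the choice and order of cleaning steps (lower-casing, removing numbers, punctuation and English stop words, then stemming) is an assumption rather than the prescribed solution. Note that stemDocument relies on the SnowballC package being installed.

```r
library(tm)  # stemDocument() additionally needs the SnowballC package

# Made-up descriptions standing in for the cleaned Description feature
desc <- c("Registered nurses provide patient care and treatment.",
          "Physicians diagnose and treat 25 patients.")

corpus <- VCorpus(VectorSource(desc))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stemDocument)
corpus <- tm_map(corpus, stripWhitespace)

# Back to a plain character vector, one string of word stems per job
clean_desc <- trimws(unname(sapply(corpus, as.character)))
clean_desc
```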
Generate a “transaction” matrix by considering each job as one record and description words as “transaction” items. (Hint: You need to fill missing values since records do not have the same length of description.)
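The padding idea in the hint could look like the sketch below. The vector clean_desc holds made-up stemmed descriptions, and padding with empty strings (rather than NA) is an assumption chosen so that no literal "NA" item ends up in the CSV file later.

```r
# Made-up stemmed descriptions, one string per job
clean_desc <- c("regist nurs provid patient care",
                "physician diagnos treat patient",
                "softwar develop design applic")

# One job per record; each word stem is an item
tokens <- strsplit(clean_desc, " +")
tokens <- lapply(tokens, unique)  # keep each item at most once per record

# Pad shorter records with empty strings so all rows have equal length
max_len <- max(lengths(tokens))
padded <- lapply(tokens, function(w) c(w, rep("", max_len - length(w))))

job_items <- do.call(rbind, padded)  # character matrix: jobs x item slots
dim(job_items)
```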
Save the data using write.csv() and then use read.transactions() in the arules package to read the CSV data file. Visualize the item support using item frequency plots. Which terms appear most frequently?
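A hedged sketch of the save, reload, and plot steps. The small job_items matrix and the temporary file path are illustrative assumptions. Because write.csv() adds a row of column names, header = TRUE is used here so that row is not read as a transaction.

```r
library(arules)

# Made-up padded item matrix (one row per job, "" marks padding)
job_items <- rbind(c("nurs", "patient", "care", "treatment"),
                   c("physician", "diagnos", "patient", "treatment"),
                   c("softwar", "develop", "design", ""))

csv_path <- tempfile(fileext = ".csv")  # hypothetical file location
write.csv(job_items, csv_path, row.names = FALSE)

# header = TRUE skips the column-name row (V1, V2, ...) written by write.csv()
jobs <- read.transactions(csv_path, format = "basket", sep = ",",
                          header = TRUE, rm.duplicates = TRUE)
summary(jobs)

# Relative support of the most frequent items, as a plot and as numbers
itemFrequencyPlot(jobs, topN = 10, type = "relative")
sort(itemFrequency(jobs), decreasing = TRUE)
```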
Fit a model: myrules <- apriori(data=jobs, parameter=list(support=0.02, confidence=0.6, minlen=2)). Try out several rule thresholds trading off gain and accuracy.
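One way to compare thresholds is to count the rules each setting produces, as sketched below. The five toy transactions are invented, and the support values are much larger than the 0.02 above only because the toy data set is tiny.

```r
library(arules)

# Toy transactions standing in for the real jobs data (assumption)
jobs <- as(list(
  c("nurs", "patient", "care"),
  c("physician", "patient", "care", "diagnos"),
  c("therapist", "patient", "treatment"),
  c("softwar", "develop", "design"),
  c("web", "develop", "design")
), "transactions")

# Every combination of candidate support and confidence thresholds
grid <- expand.grid(support = c(0.2, 0.4), confidence = c(0.6, 0.9))

grid$n_rules <- mapply(function(s, conf) {
  r <- apriori(jobs,
               parameter = list(support = s, confidence = conf, minlen = 2),
               control = list(verbose = FALSE))
  length(r)  # number of rules found at these thresholds
}, grid$support, grid$confidence)

grid
```

Looser thresholds return more rules, many of them weak; stricter thresholds return fewer, more reliable rules.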
Evaluate the rules you obtained with lift and visualize their metrics.
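The sketch below ranks rules by lift and draws a simple support versus confidence scatter plot with base graphics, scaling point size by lift. The toy transactions and thresholds are assumptions. The arulesViz package provides richer rule plots, but base graphics are used here to keep the example dependency-free.

```r
library(arules)

# Toy transactions standing in for the real jobs data (assumption)
jobs <- as(list(
  c("nurs", "patient", "care"),
  c("physician", "patient", "care", "diagnos"),
  c("therapist", "patient", "treatment"),
  c("softwar", "develop", "design"),
  c("web", "develop", "design")
), "transactions")

rules <- apriori(jobs,
                 parameter = list(support = 0.2, confidence = 0.6, minlen = 2),
                 control = list(verbose = FALSE))

# Rules with the highest lift first
inspect(head(sort(rules, by = "lift"), 5))

# Quality measures as a data frame: support, confidence, lift, ...
q <- quality(rules)
summary(q$lift)

# Point size grows with lift
plot(q$support, q$confidence, cex = 2 * q$lift / max(q$lift),
     xlab = "support", ylab = "confidence",
     main = "Rule metrics (point size ~ lift)")
```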
Mine medical-related rules (e.g., rules that include “treatment”, “patient”, “care”, “diagnos”; note that these are word stems).
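A possible way to pull out the medical-related rules is sketched below, again on invented toy transactions. It first collects the item labels that contain any of the stems and then uses subset() with %in%. arules also offers the %pin% operator for partial matching directly.

```r
library(arules)

# Toy transactions standing in for the real jobs data (assumption)
jobs <- as(list(
  c("nurs", "patient", "care"),
  c("physician", "patient", "care", "diagnos"),
  c("therapist", "patient", "treatment"),
  c("softwar", "develop", "design"),
  c("web", "develop", "design")
), "transactions")

rules <- apriori(jobs,
                 parameter = list(support = 0.2, confidence = 0.6, minlen = 2),
                 control = list(verbose = FALSE))

# Item labels that contain any of the medical word stems
med_stems <- c("treatment", "patient", "care", "diagnos")
med_items <- grep(paste(med_stems, collapse = "|"),
                  itemLabels(rules), value = TRUE)

# Keep rules whose lhs or rhs contains at least one medical item
med_rules <- subset(rules, items %in% med_items)
inspect(head(med_rules, 5))
```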
Sort the association rules for both the full set and the medical-related subset.
Save these rules into a CSV file.
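The sorting and saving steps might be sketched as follows. The toy transactions, the choice of lift as the sort key, and the temporary output paths are all assumptions. The rules are converted to a data frame first so that the standard write.csv() can be used.

```r
library(arules)

# Toy transactions standing in for the real jobs data (assumption)
jobs <- as(list(
  c("nurs", "patient", "care"),
  c("physician", "patient", "care", "diagnos"),
  c("therapist", "patient", "treatment"),
  c("softwar", "develop", "design"),
  c("web", "develop", "design")
), "transactions")

rules <- apriori(jobs,
                 parameter = list(support = 0.2, confidence = 0.6, minlen = 2),
                 control = list(verbose = FALSE))

med_items <- grep("treatment|patient|care|diagnos",
                  itemLabels(rules), value = TRUE)

# Sort both rule sets by decreasing lift
sorted_all <- sort(rules, by = "lift")
sorted_med <- sort(subset(rules, items %in% med_items), by = "lift")

# Hypothetical output locations
out_all <- tempfile(fileext = ".csv")
out_med <- tempfile(fileext = ".csv")

# One row per rule, with the rule text and its quality measures
write.csv(as(sorted_all, "data.frame"), out_all, row.names = FALSE)
write.csv(as(sorted_med, "data.frame"), out_med, row.names = FALSE)
```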