Working with website data
Network data and visualization
- Download 03_les miserablese_GraphData.txt
- Visualize this undirected network graph
- Summarize the graph and explain the output
- Calculate the node degrees and the centrality measures of this graph
- Identify some important nodes (corresponding to characters in the novel)
- Will the results change if we assume the graph is directed?
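
The graph steps above can be sketched on a toy network. This is only an illustration under stated assumptions: the edge list and node names below are hypothetical stand-ins, not the Les Miserables data, and the igraph package is assumed to be installed.

```r
library(igraph)

# Hypothetical toy edge list; the assignment would read the downloaded file instead.
edges <- data.frame(
  from = c("A", "A", "B", "C", "D"),
  to   = c("B", "C", "C", "D", "A")
)

# Build the same edge list as an undirected and as a directed graph
g_undirected <- graph_from_data_frame(edges, directed = FALSE)
g_directed   <- graph_from_data_frame(edges, directed = TRUE)

# Visualize and summarize the undirected graph
plot(g_undirected, vertex.label.color = "black")
summary(g_undirected)

# Degree and betweenness centrality for the undirected graph
deg_u <- degree(g_undirected)
btw_u <- betweenness(g_undirected)

# In a directed graph, degree splits into in-degree and out-degree
deg_in  <- degree(g_directed, mode = "in")
deg_out <- degree(g_directed, mode = "out")

# Rank nodes by degree to spot candidates for "important" nodes
sort(deg_u, decreasing = TRUE)
```

Comparing deg_u with deg_in and deg_out shows one way the results can change once edge direction is taken into account.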
Data conversion and parallel computing
- Download CaseStudy12_ AdultsHeartAttack_Data.xlsx or load it directly from its online location
- Load this data as a data frame
- Use export() or write.xlsx() to re-save the xlsx file
- Use the rio package to convert this “.xlsx” file to “.csv”
- Generate a generalized tabular data structure from the data frame
- Generate a data.table
- Create disk-based data frames and perform basic calculations
- Perform basic calculations on the last 5 columns as a big matrix
- Use DIAGNOSIS, SEX, DRG, CHARGES, LOS and AGE to predict DIED with randomForest, setting ntree=20000. Note: sample without replacement to obtain the largest possible balanced dataset
- Run train() in caret and measure the execution time
- Detect the number of available cores and create an appropriate number of clusters
- Rerun train() in parallel and compare the execution times
- Use foreach and doMC to design a parallelized random forest with ntree=20000 and compare the execution time against sequential execution.
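
The balanced sampling and the sequential-versus-parallel timing can be sketched as follows. This is a minimal illustration, not a reference solution: the data frame toy is synthetic and hypothetical (only a few of the column names are borrowed from the list above), the tree count is reduced from 20000 to keep the run short, and the cap of 4 workers is an arbitrary choice. doMC relies on process forking, so this sketch assumes a Unix-like system.

```r
library(randomForest)
library(foreach)
library(doMC)
library(parallel)

set.seed(1234)

# Hypothetical synthetic stand-in for the heart attack data (not the real file)
n <- 600
toy <- data.frame(
  AGE     = round(runif(n, 20, 90)),
  LOS     = rpois(n, 5),
  CHARGES = rlnorm(n, meanlog = 9, sdlog = 0.5),
  DIED    = factor(rbinom(n, 1, 0.15))
)

# Balance the classes: draw, without replacement, as many rows from each
# class as the smaller class contains
n_min <- min(table(toy$DIED))
idx <- unlist(lapply(split(seq_len(nrow(toy)), toy$DIED),
                     function(i) sample(i, n_min, replace = FALSE)))
balanced <- toy[idx, ]

n_trees <- 400  # the assignment asks for 20000; reduced here for speed

# Sequential fit, timed with system.time()
t_seq <- system.time(
  rf_seq <- randomForest(DIED ~ ., data = balanced, ntree = n_trees)
)

# Detect cores, leave one free, and cap the worker count (arbitrary cap)
n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 1
workers <- max(1, min(n_cores - 1, 4))
registerDoMC(cores = workers)

# Split the trees across workers; the first worker takes any remainder
chunks <- rep(n_trees %/% workers, workers)
chunks[1] <- chunks[1] + n_trees %% workers

# Each worker grows a sub-forest; randomForest::combine merges them
t_par <- system.time(
  rf_par <- foreach(nt = chunks, .combine = randomForest::combine,
                    .packages = "randomForest") %dopar% {
    randomForest(DIED ~ ., data = balanced, ntree = nt)
  }
)

# Elapsed wall-clock seconds for both runs
c(sequential = t_seq[["elapsed"]], parallel = t_par[["elapsed"]])
```

On a small dataset like this one, the parallel run may not be faster, because starting the workers and combining the sub-forests adds overhead; the gap is expected to matter more as the tree count and the data grow.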