Working with website data
 
 Network data and visualization
- Download 03_les miserablese_GraphData.txt
 
- Visualize this undirected network graph
 
- Summarize the graph and explain the output
 
- Calculate the degree and the centrality of this graph
 
- Find out some important nodes (corresponding to novel characters)
 
- Will the results change if we assume the graph is directed?
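
The steps above could be approached along the following lines. This is only a minimal sketch, assuming the igraph package is installed. The small edge list is a made-up stand-in for the downloaded file (it is not the real data), and the object names (edges, g_undirected, g_directed) are illustrative.

```r
library(igraph)

# Toy edge list standing in for the downloaded file (NOT the real data).
edges <- data.frame(
  from = c("Valjean", "Valjean", "Valjean", "Cosette", "Javert"),
  to   = c("Javert", "Cosette", "Marius", "Marius", "Thenardier")
)

# Build the same network twice: once undirected, once directed.
g_undirected <- graph_from_data_frame(edges, directed = FALSE)
g_directed   <- graph_from_data_frame(edges, directed = TRUE)

plot(g_undirected)       # visualize the undirected graph
summary(g_undirected)    # one-line summary: vertex and edge counts, attributes

# Degree and two common centrality measures.
deg <- degree(g_undirected)
btw <- betweenness(g_undirected)
clo <- closeness(g_undirected)

# Candidate "important" nodes: highest degree first.
sort(deg, decreasing = TRUE)

# With directed edges, degree splits into in-degree and out-degree.
deg_in  <- degree(g_directed, mode = "in")
deg_out <- degree(g_directed, mode = "out")
```

Comparing deg with deg_in and deg_out is one way to see how assuming direction changes which nodes look important.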
 
 
 Data conversion and parallel computing
- Download CaseStudy12_ AdultsHeartAttack_Data.xlsx or read it directly from its online location
 
- Load this data as a data frame
 
- Use export() or write.xlsx() to write out a new copy of the xlsx file
- Use the rio package to convert this “.xlsx” file to “.csv”
- Generate generalized tabular data structures
 
- Generate a data.table
- Create disk-based data frames and perform basic calculations
 
- Perform basic calculations on the last 5 columns, stored as a big matrix
 
- Use DIAGNOSIS, SEX, DRG, CHARGES, LOS and AGE to predict DIED with randomForest, setting ntree=20000. Note: sample without replacement to obtain a balanced dataset that is as large as possible
- Run train() in caret and measure the execution time
- Detect the number of available cores and create a cluster with an appropriate number of workers
 
- Rerun train() in parallel and compare the execution times
- Use foreach and doMC to design a parallelized random forest with ntree=20000 and compare the execution time against sequential execution.
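
The final comparison could be set up roughly as follows. This is a hedged sketch rather than a reference solution. It assumes the randomForest, foreach and doMC packages are installed (doMC works on Unix-like systems only). It uses the built-in iris data as a stand-in for the heart attack data and only 200 trees instead of 20000 so that it runs quickly. The names n_cores, total_trees and trees_per_worker are illustrative.

```r
library(randomForest)
library(foreach)
library(doMC)   # Unix-like systems only

# Leave one core free; fall back to 1 if only one core is detected.
n_cores <- max(1, parallel::detectCores() - 1, na.rm = TRUE)
registerDoMC(cores = n_cores)

set.seed(1234)
total_trees      <- 200   # the exercise asks for 20000
trees_per_worker <- ceiling(total_trees / n_cores)

# Sequential baseline: one forest with all trees.
t_seq <- system.time(
  rf_seq <- randomForest(Species ~ ., data = iris, ntree = total_trees)
)

# Parallel version: each worker grows a smaller forest, and
# randomForest::combine() merges them into a single forest.
t_par <- system.time(
  rf_par <- foreach(ntree = rep(trees_per_worker, n_cores),
                    .combine = randomForest::combine,
                    .packages = "randomForest") %dopar%
    randomForest(Species ~ ., data = iris, ntree = ntree)
)

# Compare elapsed wall-clock times.
rbind(sequential = t_seq, parallel = t_par)[, "elapsed"]
```

On a dataset this small the parallel run may not be faster, since starting the workers has its own overhead; the gap should become clearer with ntree=20000 on the full data. Note also that a forest merged with combine() no longer carries the out-of-bag error components.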
 