1 Working with website data

2 Network data and visualization

  • Download 03_les miserablese_GraphData.txt
  • Visualize this undirected network graph
  • Summarize the graph and explain the output
  • Calculate the degree and the centrality of this graph
  • Identify some important nodes (corresponding to characters in the novel)
  • Will the results change if we assume the graph is directed?
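
The network steps above can be sketched as follows. This is only a minimal illustration, assuming the igraph package and a small toy edge list invented here (the vertex names A to E and the edges are hypothetical and are not the contents of the exercise file). It shows how the same edge list yields different degree and betweenness values depending on whether it is read as undirected or directed.

```r
library(igraph)

# Toy edge list, invented for illustration only; NOT the exercise data.
edges <- data.frame(
  from = c("A", "A", "A", "B", "C"),
  to   = c("B", "C", "D", "C", "E")
)

# Build the same network twice: once undirected, once directed.
g_und <- graph_from_data_frame(edges, directed = FALSE)
g_dir <- graph_from_data_frame(edges, directed = TRUE)

# Summarize the graph: number of vertices and edges, plus igraph's own summary.
cat("vertices:", vcount(g_und), " edges:", ecount(g_und), "\n")
summary(g_und)

# Degree: one value per vertex when undirected, split into in/out when directed.
deg_und <- degree(g_und)
deg_in  <- degree(g_dir, mode = "in")
deg_out <- degree(g_dir, mode = "out")

# Betweenness centrality under both interpretations.
btw_und <- betweenness(g_und)
btw_dir <- betweenness(g_dir)

# Rank vertices by undirected degree to spot candidate "important" nodes.
print(sort(deg_und, decreasing = TRUE))

# Visualize, scaling vertex size by degree.
plot(g_und, vertex.size = 10 + 5 * deg_und)
```

With the real data, the edge list would be read from the downloaded file instead of typed in; the exact reading call depends on the file layout, which is not specified here.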

3 Data conversion and parallel computing

  • Download CaseStudy12_ AdultsHeartAttack_Data.xlsx or read it directly from its online location
  • Load this data as a data frame
  • Use export() or write.xlsx() to re-save the xlsx file
  • Use the rio package to convert this “.xlsx” file to “.csv”
  • Generate generalized tabular data structures
  • Generate a data.table
  • Create disk-based data frames and perform basic calculations
  • Perform basic calculations on the last 5 columns as a big matrix
  • Use DIAGNOSIS, SEX, DRG, CHARGES, LOS and AGE to predict DIED with randomForest, setting ntree=20000. Note: sample without replacement to obtain the largest possible balanced dataset
  • Run train() in caret and measure the execution time
  • Detect the number of available cores and create a cluster with an appropriate number of workers
  • Rerun train() in parallel and compare the execution times
  • Use foreach and doMC to design a parallelized random forest with ntree=20000 and compare its execution time against sequential execution.
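
The balanced sampling and the foreach/doMC comparison in the last steps could look roughly like the sketch below. It rests on several assumptions: the data are synthetic stand-ins generated here (the columns x1, x2 and y are hypothetical, not the case-study variables), the tree count is reduced to 2000 so the sketch runs quickly, and the trees are split evenly across workers and merged with randomForest::combine(). Note that doMC relies on forking and is not available on Windows.

```r
library(randomForest)
library(foreach)
library(doMC)

set.seed(1234)

# Synthetic, deliberately imbalanced stand-in data (NOT the case-study data).
n <- 600
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- factor(ifelse(dat$x1 + dat$x2 + rnorm(n, sd = 0.5) > 1, "yes", "no"))

# Balance the classes: draw, without replacement, as many rows from each
# class as the smallest class has (sample() defaults to replace = FALSE).
n_min <- min(table(dat$y))
balanced <- do.call(rbind, lapply(split(dat, dat$y), function(d) {
  d[sample(nrow(d), n_min), ]
}))

# Reduced tree count for the sketch; the exercise asks for ntree = 20000.
total_trees <- 2000

# Sequential baseline.
t_seq <- system.time(
  rf_seq <- randomForest(y ~ ., data = balanced, ntree = total_trees)
)

# Detect cores, leave one free, and register the doMC backend.
n_workers <- max(1, parallel::detectCores() - 1)
registerDoMC(cores = n_workers)

# Split the trees across workers; the first worker takes any remainder.
trees_per_worker <- rep(total_trees %/% n_workers, n_workers)
trees_per_worker[1] <- trees_per_worker[1] + total_trees %% n_workers
trees_per_worker <- trees_per_worker[trees_per_worker > 0]

# Each worker grows a smaller forest; combine() merges them into one.
t_par <- system.time(
  rf_par <- foreach(nt = trees_per_worker,
                    .combine = randomForest::combine,
                    .packages = "randomForest") %dopar% {
    randomForest(y ~ ., data = balanced, ntree = nt)
  }
)

# Compare elapsed wall-clock times.
print(rbind(sequential = t_seq["elapsed"], parallel = t_par["elapsed"]))
```

A combined forest does not carry over out-of-bag error estimates, so the comparison here is about execution time only; accuracy would need to be checked on held-out data.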