1 Introduction

This module reviews the landmark 1953 solution of the DNA double helix, which dramatically accelerated our understanding of inheritance, genetic information storage, and biological reproduction. It uses R code for visualizations, mathematical modeling, and interactive experiments.

2 Historical Context: Foundations of Heredity

The effort to understand DNA began long before its structure was known, grounded in early observations of inheritance and evolution. Charles Darwin’s 1859 theory of evolution by natural selection proposed that species adapt over generations, but lacked a mechanism for heredity. In the 1860’s, Gregor Mendel, conducted pea plant experiments, discovered laws of inheritance, segregation and independent assortment, showing traits are passed via discrete units, genes. These ideas were refined in early 1900’s capitalizing on biology, math and statistics, laying groundwork for genetics.

In the 1940s, biophysics breakthroughs identified DNA as the genetic material. Around 1944, Oswald Avery’s experiment transformed bacteria, showing DNA carries hereditary information. Later, in 1952 Hershey-Chase experiment confirmed this using bacteriophages, a finding that provided decisive evidence that DNA (not the protein) is the genetic material. These studies shifted focus to nucleic acids/DNA, setting the stage for many structural bio-discoveries.

3 Rosalind Franklin and X-ray Diffraction: The Crucial Clue

Rosalind Franklin was trained as a physical chemist and crystallographer, who applied physics methods to study the DNA structure. At King’s College London, she used X-ray crystallography to obtain diffraction data from DNA fibers revealing two forms: A-DNA (dry, crystalline) and B-DNA (wet, helical). She aligned DNA fibers into ordered samples and recorded their diffraction patterns on film after exposing them to X-rays. These patterns arise from wave interference, governed by Bragg’s law, \(n\lambda = 2d \sin\theta\), where \(n\) is an integer, \(\lambda\) is wavelength, \(d\) is spacing, and \(\theta\) is angle.

Photo 51, Franklin’s famous B-DNA image, displayed an X-shaped pattern indicating a helical structure with \(3.4 Å\) base-pair spacing and \(34 Å\) per turn. This data, shared with the biologist James Watson and biophysist/polymath Francis Crick. Franklin analysis of the DNA diffraction pattern indicated that the phosphate groups were positioned on the outside of the molecule and that DNA consisted of more than one strand—insights that were essential for constructing the double helix model.

4 Simulating a Diffraction Pattern in R

A simulation of an X-shaped pattern using a density plot; more advanced simulations utilize optical diffraction experiments.

The simulation models the X-ray diffraction pattern of a helical macromolecular structure like DNA using principles from Fourier optics and crystallography. The intensity on each horizontal layer line, corresponding to the meridional (axial) reciprocal space coordinate \(y\) (\(Z\)), is computed via the squared Bessel functions of the first kind, \(I_l(R) = J_l(2\pi R r)^2\), approximated here as \(J_l(x)^2\) with \(x\) spanning the range \([-20, 20]\) as a proxy for radial reciprocal space coordinate \(R\), and \(0\leq l\leq 5\) to capture the primary orders reflecting the helix’s periodicity, e.g., pitch of \(\sim 3.4 nm\) per turn in B-DNA. These intensities are broadened with narrow Gaussian kernels (\(\sigma=0.2\)) centered at integer \(y=\pm l\) to simulate experimental resolution limits and summed symmetrically over a \(500\times 200\) 2D grid. This yields a raster density plot that ideally displays discrete horizontal bands with varying brightness along \(x\)-axis, fading outward and forming an approximate \(X\) or diamond shape due to Bessel function zeros and maxima, which is emblematic of the crossed meridional and equatorial features in Rosalind Franklin’s Photo 51. Note that the bottom half of the resulting image appears solid gray because of a vectorization error in the R code: the replication commands use rep(intens_l, each = ny) and rep(dnorm_y_plus, times = npts_x), which misalign the outer product of intensities and Gaussians in the grid’s x-fast ordering, causing incorrect element-wise multiplication that assigns zero or uniformly low intensity values to the lower y regions (negative y, plotted at the bottom), while the upper half coincidentally receives partial patterned contributions before the mismatch propagates, rendering the lower portion featureless and gray under the black-to-white gradient scale.

Axial disorder (sigma_z) Encodes stacking disorder / longitudinal displacements. It damps intensity as: \[ I_n \propto \exp\!\left[-\tfrac{1}{2}\left(q_z^2 \sigma_z^2\right)\right], \qquad q_z = \frac{2\pi n}{\text{pitch}}.\] So intensity on layer line \(n\) is multiplied by \[\text{weight}_n = \exp\!\left[ -\tfrac{1}{2} \left( \frac{2\pi n\,\sigma_z}{\text{pitch}} \right)^{\!2} \right],\] which decays as \(\propto e^{-n^{2}}\) but now with a physical parameter \(\sigma_z\) in  (RMS axial disorder). Now higher-order n lines are physically weaker.

library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
## PARAMETERS ---------------------------------------------------------------

pitch <- 34.0                 # DNA pitch in Å
radius <- 10.0                # helix radius in Å
resolution_limit <- 3.4       # Photo 51 resolution in Å
R_max <- 1 / resolution_limit # max reciprocal radius (~0.294 1/Å)
n_points <- 800               # points per curve

# Maximum layer-line order allowed by 3.4 Å along the axis
n_max <- floor(pitch / resolution_limit)   # ~10

# RMS axial disorder (in Å); models imperfect tilts/thermal fluctuations
sigma_z <- 1.0   # try 0.5, 1.0, 1.5 to see how fast layer lines vanish

## INITIALIZE DATA FRAME ---------------------------------------------------

plot_data <- data.frame(
  R = numeric(),
  Z = numeric(),
  Intensity = numeric(),
  Layer = integer(),
  Strand = character()
)

## HELIX LAYER LINES (n != 0) WITH DISORDER WEIGHT -------------------------

for (n in -n_max:n_max) {
  if (n == 0) next  # handle central line separately
  
  # Debye–Waller-like weight from axial disorder:
  # weight_n = exp(- 0.5 * (q_z^2 * sigma_z^2)), q_z = 2*pi*n/pitch
  q_z <- 2 * pi * n / pitch
  weight_n <- exp(-0.5 * (q_z^2) * sigma_z^2)
  
  R_pos <- seq(0, R_max, length.out = n_points / 2)
  R_neg <- seq(-R_max, 0, length.out = n_points / 2)
  
  base_pos <- (besselJ(2 * pi * radius * abs(R_pos), abs(n)))^2
  base_neg <- (besselJ(2 * pi * radius * abs(R_neg), abs(n)))^2
  
  intensity_pos <- weight_n * base_pos
  intensity_neg <- weight_n * base_neg
  
  plot_data <- rbind(
    plot_data,
    data.frame(R = R_pos, Z = n / pitch,
               Intensity = intensity_pos, Layer = n, Strand = "Helix"),
    data.frame(R = R_neg, Z = n / pitch,
               Intensity = intensity_neg, Layer = n, Strand = "Helix")
  )
}

## CENTRAL LAYER LINE (n = 0) ----------------------------------------------

R_central <- seq(-R_max, R_max, length.out = n_points)
intensity_central <- (besselJ(2 * pi * radius * abs(R_central), 0))^2

# For n = 0, q_z = 0, so weight_0 = 1 (no axial decay)
plot_data <- rbind(
  plot_data,
  data.frame(R = R_central, Z = 0,
             Intensity = intensity_central, Layer = 0, Strand = "Helix")
)

## X-PATTERN (FOUR DIAGONAL ARMS) ------------------------------------------

cross_points <- seq(0, R_max, length.out = 300)
for (angle in c(pi/4, 3*pi/4, 5*pi/4, 7*pi/4)) {
  cross_R <- cross_points * cos(angle)
  cross_Z <- cross_points * sin(angle) / pitch * 8  # visual scaling for X shape
  
  cross_intensity <- exp(-(cross_points * 30)^2) * 0.6
  
  plot_data <- rbind(
    plot_data,
    data.frame(R = cross_R, Z = cross_Z,
               Intensity = cross_intensity,
               Layer = 999, Strand = "Cross")
  )
}

## DOUBLE-HELIX INTERFERENCE (ALSO DAMPED BY DISORDER) ---------------------

helix_separation <- 2.0  # Å between strands

for (n in -n_max:n_max) {
  if (n == 0) next
  
  q_z <- 2 * pi * n / pitch
  weight_n <- exp(-0.5 * (q_z^2) * sigma_z^2)   # same axial disorder factor
  
  R_full <- seq(-R_max, R_max, length.out = n_points / 2)
  
  base_intensity <- (besselJ(2 * pi * radius * abs(R_full), abs(n)))^2
  
  # interference between two strands; phase chosen to enhance X-shape
  interference <- 1 + cos(2 * pi * helix_separation * abs(n) / pitch + pi/4)
  
  intensity_double <- weight_n * base_intensity * interference * 0.8
  
  plot_data <- rbind(
    plot_data,
    data.frame(R = R_full, Z = n / pitch,
               Intensity = intensity_double,
               Layer = n + 200, Strand = "DoubleHelix")
  )
}

## CENTRAL BEAM -------------------------------------------------------------

central_R <- seq(-R_max / 5, R_max / 5, length.out = 100)
central_Z <- seq(-0.05, 0.05, length.out = 100)
central_grid <- expand.grid(R = central_R, Z = central_Z)
central_intensity <- exp(-(central_grid$R^2 + central_grid$Z^2) * 5000)

plot_data <- rbind(
  plot_data,
  data.frame(R = central_grid$R, Z = central_grid$Z,
             Intensity = central_intensity * 2,
             Layer = 1000, Strand = "CentralBeam")
)

## NORMALIZE INTENSITY & FILTER VERY LOW VALUES ----------------------------

# Normalize to [0, 1]
max_I <- max(plot_data$Intensity, na.rm = TRUE)
plot_data$Intensity <- plot_data$Intensity / max_I

# Remove very faint points
plot_data <- subset(plot_data, Intensity > 0.003)

## BUILD GGPLOT OBJECT -----------------------------------------------------

p <- ggplot(
  plot_data,
  aes(
    x = R,
    y = Z,
    color = Intensity,
    size = Intensity,
    text = paste0(
      "R = ", round(R, 3), " 1/Å<br>",
      "Z = ", round(Z, 3), " 1/Å<br>",
      "Intensity = ", round(Intensity, 3), "<br>",
      "Layer n = ", Layer, "<br>",
      "Strand = ", Strand
    )
  )
) +
  geom_point(alpha = 0.7) +
  scale_color_gradientn(
    colors = c("black", "navy", "blue", "cyan", "yellow", "red"),
    limits = c(0, 1)
  ) +
  scale_size_continuous(range = c(0.1, 3)) +
  labs(
    title = paste("DNA Diffraction Pattern (", resolution_limit, "Å Resolution)"),
    x = "Radial Frequency R (1/Å)",
    y = "Axial Frequency Z (1/Å)",
    subtitle = paste(
      "Layer lines up to n =", n_max,
      "damped by axial disorder σ_z =", sigma_z, "Å"
    )
  ) +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "white"),
    panel.background = element_rect(fill = "white"),
    text = element_text(color = "black"),
    axis.text = element_text(color = "black"),
    legend.position = "none"
  ) +
  geom_hline(yintercept = 0, color = "black", alpha = 0.3, linetype = "dashed") +
  geom_vline(xintercept = 0, color = "black", alpha = 0.3, linetype = "dashed") +
  coord_cartesian(
    xlim = c(-R_max, R_max),
    ylim = c(-max(plot_data$Z), max(plot_data$Z))
  ) +
  annotate(
    "text",
    x = R_max * 0.7,
    y = max(plot_data$Z) * 0.8,
    label = paste("Resolution:", resolution_limit, "Å"),
    color = "black",
    size = 3
  )

## INTERACTIVE PLOTLY WRAPPER ----------------------------------------------

ggplotly(p, tooltip = "text")

#Tilt distribution (spread of fiber orientations) that broadens the layer lines in Z.

Each DNA double helix is tilted by a small random angle α around some axis. In reciprocal space (or diffraction pattern), this mixes R and Z, broadening layer lines in Z. For small tilt \(\alpha\), broadening in Z scales roughly with the magnitude of q: {Z,} |q|,, In code, we take |q| = , and (0,_).

library(ggplot2)
library(plotly)

## PARAMETERS ---------------------------------------------------------------

pitch <- 34.0                 # DNA pitch in Å
radius <- 10.0                # helix radius in Å
resolution_limit <- 3.4       # Photo 51 resolution in Å
R_max <- 1 / resolution_limit # max reciprocal radius (~0.294 1/Å)
n_points <- 800               # points per curve

# Maximum layer-line order allowed by 3.4 Å along the axis
n_max <- floor(pitch / resolution_limit)   # ~10

# RMS axial disorder (in Å): imperfect stacking / axial displacements
sigma_z <- 1.0   # try 0.5, 1.0, 1.5...

# RMS tilt of fibers (in degrees): orientation distribution
tilt_sigma_deg <- 2          # small tilt spread
tilt_sigma <- tilt_sigma_deg * pi / 180  # convert to radians

## INITIALIZE DATA FRAME ---------------------------------------------------

plot_data <- data.frame(
  R = numeric(),
  Z = numeric(),
  Intensity = numeric(),
  Layer = integer(),
  Strand = character()
)

## HELIX LAYER LINES (n != 0) WITH AXIAL DISORDER --------------------------

for (n in -n_max:n_max) {
  if (n == 0) next  # handle central line separately
  
  # Debye–Waller-like weight from axial disorder:
  # weight_n = exp(- 0.5 * (q_z^2 * sigma_z^2)), q_z = 2*pi*n/pitch
  q_z <- 2 * pi * n / pitch
  weight_n <- exp(-0.5 * (q_z^2) * sigma_z^2)
  
  R_pos <- seq(0, R_max, length.out = n_points / 2)
  R_neg <- seq(-R_max, 0, length.out = n_points / 2)
  
  base_pos <- (besselJ(2 * pi * radius * abs(R_pos), abs(n)))^2
  base_neg <- (besselJ(2 * pi * radius * abs(R_neg), abs(n)))^2
  
  intensity_pos <- weight_n * base_pos
  intensity_neg <- weight_n * base_neg
  
  plot_data <- rbind(
    plot_data,
    data.frame(R = R_pos, Z = n / pitch,
               Intensity = intensity_pos, Layer = n, Strand = "Helix"),
    data.frame(R = R_neg, Z = n / pitch,
               Intensity = intensity_neg, Layer = n, Strand = "Helix")
  )
}

## CENTRAL LAYER LINE (n = 0) ----------------------------------------------

R_central <- seq(-R_max, R_max, length.out = n_points)
intensity_central <- (besselJ(2 * pi * radius * abs(R_central), 0))^2

plot_data <- rbind(
  plot_data,
  data.frame(R = R_central, Z = 0,
             Intensity = intensity_central, Layer = 0, Strand = "Helix")
)

## X-PATTERN (FOUR DIAGONAL ARMS) ------------------------------------------

cross_points <- seq(0, R_max, length.out = 300)
for (angle in c(pi/4, 3*pi/4, 5*pi/4, 7*pi/4)) {
  cross_R <- cross_points * cos(angle)
  cross_Z <- cross_points * sin(angle) / pitch * 8  # visual scaling for X shape
  
  cross_intensity <- exp(-(cross_points * 30)^2) * 0.6
  
  plot_data <- rbind(
    plot_data,
    data.frame(R = cross_R, Z = cross_Z,
               Intensity = cross_intensity,
               Layer = 999, Strand = "Cross")
  )
}

## DOUBLE-HELIX INTERFERENCE (ALSO DAMPED BY AXIAL DISORDER) ---------------

helix_separation <- 2.0  # Å between strands

for (n in -n_max:n_max) {
  if (n == 0) next
  
  q_z <- 2 * pi * n / pitch
  weight_n <- exp(-0.5 * (q_z^2) * sigma_z^2)   # same axial disorder factor
  
  R_full <- seq(-R_max, R_max, length.out = n_points / 2)
  
  base_intensity <- (besselJ(2 * pi * radius * abs(R_full), abs(n)))^2
  
  # interference between two strands; phase chosen to enhance X-shape
  interference <- 1 + cos(2 * pi * helix_separation * abs(n) / pitch + pi/4)
  
  intensity_double <- weight_n * base_intensity * interference * 0.8
  
  plot_data <- rbind(
    plot_data,
    data.frame(R = R_full, Z = n / pitch,
               Intensity = intensity_double,
               Layer = n + 200, Strand = "DoubleHelix")
  )
}

## CENTRAL BEAM -------------------------------------------------------------

central_R <- seq(-R_max / 5, R_max / 5, length.out = 100)
central_Z <- seq(-0.05, 0.05, length.out = 100)
central_grid <- expand.grid(R = central_R, Z = central_Z)
central_intensity <- exp(-(central_grid$R^2 + central_grid$Z^2) * 5000)

plot_data <- rbind(
  plot_data,
  data.frame(R = central_grid$R, Z = central_grid$Z,
             Intensity = central_intensity * 2,
             Layer = 1000, Strand = "CentralBeam")
)

## NORMALIZE INTENSITY & FILTER VERY LOW VALUES ----------------------------

max_I <- max(plot_data$Intensity, na.rm = TRUE)
plot_data$Intensity <- plot_data$Intensity / max_I

plot_data <- subset(plot_data, Intensity > 0.003)

## TILT DISTRIBUTION: BROADEN LAYER LINES IN Z -----------------------------

# Magnitude of scattering vector for each point
q_mag <- sqrt(plot_data$R^2 + plot_data$Z^2)

# Z-broadening from tilt: σ_Z_tilt ~ |q| * σ_alpha
sigma_Z_tilt <- q_mag * tilt_sigma

set.seed(51)  # for reproducibility if desired
plot_data$Z_tilt <- plot_data$Z + rnorm(nrow(plot_data), mean = 0, sd = sigma_Z_tilt)

## STATIC GGPLOT -----------------------------------------------------------

z_range <- range(plot_data$Z_tilt, na.rm = TRUE)

p <- ggplot(plot_data, aes(x = R, y = Z_tilt,
                           color = Intensity, size = Intensity,
                           text = paste0(
                             "R = ", round(R, 3), " 1/Å<br>",
                             "Z = ", round(Z_tilt, 3), " 1/Å<br>",
                             "Intensity = ", round(Intensity, 3), "<br>",
                             "Layer n = ", Layer, "<br>",
                             "Strand = ", Strand
                           ))) +
  geom_point(alpha = 0.7) +
  scale_color_gradientn(
    colors = c("black", "navy", "blue", "cyan", "yellow", "red"),
    limits = c(0, 1)
  ) +
  scale_size_continuous(range = c(0.1, 3)) +
  labs(
    title = paste("DNA Diffraction Pattern (", resolution_limit, "Å Resolution)"),
    x = "Radial Frequency R (1/Å)",
    y = "Axial Frequency Z (1/Å, broadened by tilt)",
    subtitle = paste0(
      "Axial disorder σ_z = ", sigma_z, " Å, ",
      "tilt RMS = ", tilt_sigma_deg, "°"
    )
  ) +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "white"),
    panel.background = element_rect(fill = "white"),
    text = element_text(color = "black"),
    axis.text = element_text(color = "black"),
    legend.position = "none"
  ) +
  geom_hline(yintercept = 0, color = "black", alpha = 0.3, linetype = "dashed") +
  geom_vline(xintercept = 0, color = "black", alpha = 0.3, linetype = "dashed") +
  coord_cartesian(
    xlim = c(-R_max, R_max),
    ylim = z_range
  ) +
  annotate(
    "text",
    x = R_max * 0.7,
    y = 0.8 * max(z_range),
    label = paste("Resolution:", resolution_limit, "Å"),
    color = "black",
    size = 3
  )

## INTERACTIVE PLOTLY WRAPPER ----------------------------------------------

ggplotly(p, tooltip = "text")

#Simulated diffraction pattern of \(\alpha\)-helix the continuous-helix Bessel model Explicit α-helix parameters:

\[ P \approx 5.4~\text{\AA}, \qquad d \approx 1.5~\text{\AA}, \qquad u \approx 3.6. \]

library(ggplot2)
library(plotly)

## ALPHA-HELIX PARAMETERS ---------------------------------------------------

pitch <- 5.4                  # alpha-helix pitch P in Å (~3.6 residues/turn)
rise_per_residue <- 1.5       # axial rise per residue d in Å
residues_per_turn <- pitch / rise_per_residue  # ~3.6 residues/turn

radius <- 2.3                 # effective helix radius (backbone / Cα) in Å

## RESOLUTION & RECIPROCAL-SPACE LIMIT -------------------------------------

resolution_limit <- 2.0       # high resolution in Å (more layer lines visible)
R_max <- 1 / resolution_limit # max radial spatial frequency (1/Å)

n_points <- 800               # points per curve

# Maximal layer-line order allowed along the axis at this resolution:
# Z_n = n / pitch, and |Z_n| <= 1 / resolution_limit
n_max <- floor(pitch / resolution_limit)   # floor(5.4 / 1.2) = 4

## AXIAL DISORDER (DEBYE–WALLER-LIKE) --------------------------------------
# Models imperfect stacking / thermal fluctuations along the helix axis.
# Intensity on layer line n is multiplied by:
#   weight_n = exp(-0.5 * q_z^2 * sigma_z^2),  q_z = 2*pi*n / pitch
# so higher |n| layer lines are damped by axial disorder sigma_z (in Å).

sigma_z <- 0.5   # try 0.3, 0.5, 1.0 Å and see how high-n lines fade

## INITIALIZE DATA FRAME ---------------------------------------------------

plot_data <- data.frame(
  R = numeric(),
  Z = numeric(),
  Intensity = numeric(),
  Layer = integer(),
  Strand = character()
)

## HELIX LAYER LINES (n != 0) WITH AXIAL DISORDER -------------------------

for (n in -n_max:n_max) {
  if (n == 0) next  # handle central line separately
  
  # Axial scattering vector component and Debye–Waller factor:
  q_z <- 2 * pi * n / pitch
  weight_n <- exp(-0.5 * (q_z^2) * sigma_z^2)
  
  R_pos <- seq(0, R_max, length.out = n_points / 2)
  R_neg <- seq(-R_max, 0, length.out = n_points / 2)
  
  # Continuous helix approximation: intensity on layer line n ~ J_n^2(2π r |R|)
  base_pos <- (besselJ(2 * pi * radius * abs(R_pos), abs(n)))^2
  base_neg <- (besselJ(2 * pi * radius * abs(R_neg), abs(n)))^2
  
  intensity_pos <- weight_n * base_pos
  intensity_neg <- weight_n * base_neg
  
  plot_data <- rbind(
    plot_data,
    data.frame(R = R_pos, Z = n / pitch,
               Intensity = intensity_pos, Layer = n, Strand = "AlphaHelix"),
    data.frame(R = R_neg, Z = n / pitch,
               Intensity = intensity_neg, Layer = n, Strand = "AlphaHelix")
  )
}

## CENTRAL LAYER LINE (n = 0) ----------------------------------------------
# For n = 0, q_z = 0, so no axial Debye–Waller damping.

R_central <- seq(-R_max, R_max, length.out = n_points)
intensity_central <- (besselJ(2 * pi * radius * abs(R_central), 0))^2

plot_data <- rbind(
  plot_data,
  data.frame(R = R_central, Z = 0,
             Intensity = intensity_central, Layer = 0, Strand = "AlphaHelix")
)

## CENTRAL BEAM -------------------------------------------------------------

central_R <- seq(-R_max / 5, R_max / 5, length.out = 100)
central_Z <- seq(-0.05, 0.05, length.out = 100)
central_grid <- expand.grid(R = central_R, Z = central_Z)
central_intensity <- exp(-(central_grid$R^2 + central_grid$Z^2) * 5000)

plot_data <- rbind(
  plot_data,
  data.frame(R = central_grid$R, Z = central_grid$Z,
             Intensity = central_intensity * 2,
             Layer = 1000, Strand = "CentralBeam")
)

## NORMALIZE INTENSITY & FILTER VERY LOW VALUES ----------------------------

max_I <- max(plot_data$Intensity, na.rm = TRUE)
plot_data$Intensity <- plot_data$Intensity / max_I

plot_data <- subset(plot_data, Intensity > 0.003)

## BUILD GGPLOT OBJECT -----------------------------------------------------

p <- ggplot(
  plot_data,
  aes(
    x = R,
    y = Z,
    color = Intensity,
    size = Intensity,
    text = paste0(
      "R = ", round(R, 3), " 1/Å<br>",
      "Z = ", round(Z, 3), " 1/Å<br>",
      "Intensity = ", round(Intensity, 3), "<br>",
      "Layer n = ", Layer, "<br>",
      "Strand = ", Strand, "<br>",
      "Pitch P = ", pitch, " Å<br>",
      "Rise per residue d = ", rise_per_residue, " Å<br>",
      "Residues/turn u ≈ ", round(residues_per_turn, 2)
    )
  )
) +
  geom_point(alpha = 0.7) +
  scale_color_gradientn(
    colors = c("black", "navy", "blue", "cyan", "yellow", "red"),
    limits = c(0, 1)
  ) +
  scale_size_continuous(range = c(0.1, 3)) +
  labs(
    title = paste("Alpha-Helix Diffraction Pattern (", resolution_limit, "Å Resolution)"),
    x = "Radial Frequency R (1/Å)",
    y = "Axial Frequency Z (1/Å)",
    subtitle = paste0(
      "Pitch P = ", pitch, " Å; Rise per residue d = ", rise_per_residue,
      " Å; u ≈ ", round(residues_per_turn, 2), " residues/turn; ",
      "layer lines n = -", n_max, "…", n_max,
      " damped by axial disorder σ_z = ", sigma_z, " Å"
    )
  ) +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "white"),
    panel.background = element_rect(fill = "white"),
    text = element_text(color = "black"),
    axis.text = element_text(color = "black"),
    legend.position = "none"
  ) +
  geom_hline(yintercept = 0, color = "black", alpha = 0.3, linetype = "dashed") +
  geom_vline(xintercept = 0, color = "black", alpha = 0.3, linetype = "dashed") +
  coord_cartesian(
    xlim = c(-R_max, R_max),
    ylim = c(-max(plot_data$Z), max(plot_data$Z))
  ) +
  annotate(
    "text",
    x = R_max * 0.7,
    y = max(plot_data$Z) * 0.8,
    label = paste("Resolution:", resolution_limit, "Å"),
    color = "black",
    size = 3
  )

## INTERACTIVE PLOTLY WRAPPER ----------------------------------------------

ggplotly(p, tooltip = "text")

#Generic single-stranded continuous helix (Hands On 5 Assignment) The code below simulates diffraction from a generic single-strand continuous helix.
Note: this is not a full model of double-stranded B-DNA. It represents a single helical density with a given pitch and radius. Use this template to run simulations and answer the questions for your Hands-On 5 assignment.

library(ggplot2)
library(plotly)

## PARAMETERS: set for each helix you study ---------------------------

pitch <- 34.0          # helix pitch P in Å
radius <- 10.0         # helix radius r in Å

resolution_limit <- 3.4       # change this to explore effects
R_max <- 1 / resolution_limit # reciprocal-space limit (1/Å)

n_points <- 800               # points along each layer line

# Maximal layer-line order allowed by this axial resolution
n_max <- floor(pitch / resolution_limit)

# Axial disorder (RMS in Å): sigma_z = 0 means perfect order
sigma_z <- 1.0

## INITIALIZE DATA FRAME ----------------------------------------------

plot_data <- data.frame(
  R = numeric(),
  Z = numeric(),
  Intensity = numeric(),
  Layer = integer(),
  Helix = character()
)

## HELIX LAYER LINES (n ≠ 0) with Debye–Waller damping ----------------

for (n in -n_max:n_max) {
  if (n == 0) next
  
  q_z <- 2 * pi * n / pitch
  weight_n <- exp(-0.5 * (q_z^2) * sigma_z^2)
  
  R_pos <- seq(0, R_max, length.out = n_points / 2)
  R_neg <- seq(-R_max, 0, length.out = n_points / 2)
  
  base_pos <- (besselJ(2 * pi * radius * abs(R_pos), abs(n)))^2
  base_neg <- (besselJ(2 * pi * radius * abs(R_neg), abs(n)))^2
  
  intensity_pos <- weight_n * base_pos
  intensity_neg <- weight_n * base_neg
  
  plot_data <- rbind(
    plot_data,
    data.frame(R = R_pos, Z = n / pitch,
               Intensity = intensity_pos, Layer = n,
               Helix = "generic"),
    data.frame(R = R_neg, Z = n / pitch,
               Intensity = intensity_neg, Layer = n,
               Helix = "generic")
  )
}

## CENTRAL LAYER LINE (n = 0) -----------------------------------------

R_central <- seq(-R_max, R_max, length.out = n_points)
intensity_central <- (besselJ(2 * pi * radius * abs(R_central), 0))^2

plot_data <- rbind(
  plot_data,
  data.frame(R = R_central, Z = 0,
             Intensity = intensity_central, Layer = 0,
             Helix = "generic")
)

## CENTRAL BEAM (optional cosmetic) -----------------------------------

central_R <- seq(-R_max / 5, R_max / 5, length.out = 100)
central_Z <- seq(-0.05, 0.05, length.out = 100)
central_grid <- expand.grid(R = central_R, Z = central_Z)
central_intensity <- exp(-(central_grid$R^2 + central_grid$Z^2) * 5000)

plot_data <- rbind(
  plot_data,
  data.frame(R = central_grid$R, Z = central_grid$Z,
             Intensity = central_intensity * 2,
             Layer = 1000, Helix = "central")
)

## NORMALIZE & FILTER -------------------------------------------------

max_I <- max(plot_data$Intensity, na.rm = TRUE)
plot_data$Intensity <- plot_data$Intensity / max_I
plot_data <- subset(plot_data, Intensity > 0.003)

## PLOT (ggplot + plotly) ---------------------------------------------

p <- ggplot(
  plot_data,
  aes(
    x = R,
    y = Z,
    color = Intensity,
    size = Intensity,
    text = paste0(
      "R = ", round(R, 3), " 1/Å<br>",
      "Z = ", round(Z, 3), " 1/Å<br>",
      "Intensity = ", round(Intensity, 3), "<br>",
      "Layer n = ", Layer
    )
  )
) +
  geom_point(alpha = 0.7) +
  scale_color_gradientn(
    colors = c("black", "navy", "blue", "cyan", "yellow", "red"),
    limits = c(0, 1)
  ) +
  scale_size_continuous(range = c(0.1, 3)) +
  labs(
    title = paste("Helical Diffraction Pattern (Resolution", resolution_limit, "Å)"),
    x = "Radial Frequency R (1/Å)",
    y = "Axial Frequency Z (1/Å)",
    subtitle = paste(
      "pitch =", pitch, "Å, radius =", radius, "Å, sigma_z =", sigma_z, "Å",
      "| n = -", n_max, "…", n_max
    )
  ) +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "white"),
    panel.background = element_rect(fill = "white"),
    text = element_text(color = "black"),
    axis.text = element_text(color = "black"),
    legend.position = "none"
  ) +
  geom_hline(yintercept = 0, color = "black", alpha = 0.3, linetype = "dashed") +
  geom_vline(xintercept = 0, color = "black", alpha = 0.3, linetype = "dashed") +
  coord_cartesian(
    xlim = c(-R_max, R_max),
    ylim = c(-max(plot_data$Z), max(plot_data$Z))
  )

ggplotly(p, tooltip = "text")

5 Mathematical Model of the Double Helix

Crick and Watson’s 1953 model described DNA as a right-handed double helix with antiparallel strands, base pairs (\(A-T\), \(G-C\)), and a sugar-phosphate backbone. Mathematically, the strands follow parametric equations for helices.

For one strand \[x_1(t) = r \cos(t), \quad y_1(t) = r \sin(t), \quad z_1(t) = \frac{h t}{2\pi}\]

For the second strand (offset by \(\pi\)) \[x_2(t) = r \cos(t + \pi), \quad y_2(t) = r \sin(t + \pi), \quad z_2(t) = \frac{h t}{2\pi},\] where \(r \approx 1\) nm (radius), \(h \approx 3.4\) nm (pitch per turn), and \(t\) ranges over multiple turns. Base pairs connect strands every \(0.34 nm\) along \(z\).

Chargaff’s rules

\[[A] = [T],\qquad [G] = [C],\] ensure complementary pairing.

6 Hands-on Visualization: Plotting the Double Helix with Plotly

Let’s construct the double helix interactively using Plotly in 3D. One can explore and experiment by adjust the parameters.

# # Oversimplified model
# library(plotly)
# 
# # Parameters (approximating DNA dimensions)
# r <- 1  # radius in nm
# h <- 3.4  # pitch per turn in nm
# turns <- 5  # number of turns
# t <- seq(0, 2 * pi * turns, length.out = 1000 * turns)
# 
# # Strand 1
# x1 <- r * cos(t)
# y1 <- r * sin(t)
# z1 <- (h / (2 * pi)) * t
# 
# # Strand 2
# x2 <- r * cos(t + pi)
# y2 <- r * sin(t + pi)
# z2 <- (h / (2 * pi)) * t
# 
# # Create plotly figure
# fig <- plot_ly() %>%
#   add_trace(x = x1, y = y1, z = z1, type = 'scatter3d', mode = 'lines',
#             line = list(width = 4, color = 'blue')) %>%
#   add_trace(x = x2, y = y2, z = z2, type = 'scatter3d', mode = 'lines',
#             line = list(width = 4, color = 'red')) %>%
#   layout(scene = list(aspectmode = 'cube',
#                       xaxis = list(title = 'X (nm)'),
#                       yaxis = list(title = 'Y (nm)'),
#                       zaxis = list(title = 'Z (nm)')),
#          title = "Interactive 3D Model of DNA Double Helix")
# 
# fig

# # More sophisticated doble-helix rendition
# library(plotly)
# library(viridis)
# 
# # Enhanced parameters for more accurate DNA representation
# r <- 1.0      # helix radius in nm
# h <- 3.4      # pitch (height per complete turn) in nm
# turns <- 5    # number of turns
# rise_per_bp <- 0.34  # rise per base pair in nm
# bp_per_turn <- h / rise_per_bp  # base pairs per turn
# 
# # Create parameter t with more points for smoother helix
# t <- seq(0, 2 * pi * turns, length.out = 1000 * turns)
# 
# # Strand 1 (5' to 3' direction)
# x1 <- r * cos(t)
# y1 <- r * sin(t)
# z1 <- (h / (2 * pi)) * t
# 
# # Strand 2 (3' to 5' direction, antiparallel)
# x2 <- r * cos(t + pi)
# y2 <- r * sin(t + pi)
# z2 <- (h / (2 * pi)) * t
# 
# # Create base pairs (rungs connecting the two strands)
# bp_z <- seq(0, max(z1), by = rise_per_bp)
# bp_t <- (2 * pi * bp_z) / h
# 
# # Base pair coordinates - these should connect the two strands
# bp_x1 <- r * cos(bp_t)
# bp_y1 <- r * sin(bp_t)
# bp_x2 <- r * cos(bp_t + pi)
# bp_y2 <- r * sin(bp_t + pi)
# 
# # Create random base pairs following Chargaff's rules
# set.seed(42)
# base_pairs <- sample(c("A-T", "T-A", "G-C", "C-G"), length(bp_z), replace = TRUE)
# bp_colors <- ifelse(base_pairs %in% c("A-T", "T-A"), "darkgreen", "purple")
# 
# # Create more realistic strand representation with thickness
# create_tube <- function(x, y, z, radius = 0.1, sides = 8) {
#   # Create a tube around the central curve
#   tube_x <- c()
#   tube_y <- c()
#   tube_z <- c()
#   tube_i <- c()
#   tube_j <- c()
#   tube_k <- c()
#   
#   for (i in 1:(length(x)-1)) {
#     # Direction vector
#     dx <- x[i+1] - x[i]
#     dy <- y[i+1] - y[i]
#     dz <- z[i+1] - z[i]
#     
#     # Normalize
#     length_vec <- sqrt(dx^2 + dy^2 + dz^2)
#     dx <- dx / length_vec
#     dy <- dy / length_vec
#     dz <- dz / length_vec
#     
#     # Find perpendicular vectors
#     if (abs(dz) > 0.9) {
#       perp1 <- c(1, 0, 0)
#       perp2 <- c(0, 1, 0)
#     } else {
#       perp1 <- c(dy, -dx, 0)
#       perp1 <- perp1 / sqrt(sum(perp1^2))
#       perp2 <- c(dx*dz, dy*dz, -dx^2 - dy^2)
#       perp2 <- perp2 / sqrt(sum(perp2^2))
#     }
#     
#     # Create circle around the point
#     for (j in 0:sides) {
#       angle <- 2 * pi * j / sides
#       cx <- x[i] + radius * (cos(angle) * perp1[1] + sin(angle) * perp2[1])
#       cy <- y[i] + radius * (cos(angle) * perp1[2] + sin(angle) * perp2[2])
#       cz <- z[i] + radius * (cos(angle) * perp1[3] + sin(angle) * perp2[3])
#       
#       tube_x <- c(tube_x, cx)
#       tube_y <- c(tube_y, cy)
#       tube_z <- c(tube_z, cz)
#     }
#   }
#   
#   # Create faces for the mesh
#   n_points <- length(x) - 1
#   for (i in 1:(n_points-1)) {
#     for (j in 0:(sides-1)) {
#       idx1 <- (i-1)*(sides+1) + j + 1
#       idx2 <- idx1 + 1
#       idx3 <- idx1 + (sides+1)
#       idx4 <- idx3 + 1
#       
#       tube_i <- c(tube_i, idx1-1, idx1-1, idx2-1)
#       tube_j <- c(tube_j, idx2-1, idx3-1, idx3-1)
#       tube_k <- c(tube_k, idx3-1, idx4-1, idx4-1)
#     }
#   }
#   
#   list(x = tube_x, y = tube_y, z = tube_z, i = tube_i, j = tube_j, k = tube_k)
# }
# 
# # Create tube meshes for the strands
# tube1 <- create_tube(x1, y1, z1, radius = 0.1)
# tube2 <- create_tube(x2, y2, z2, radius = 0.1)
# 
# # Fix the base pair connectors to properly link across strands
# bp_connectors_x <- c()
# bp_connectors_y <- c()
# bp_connectors_z <- c()
# bp_connectors_i <- c()
# bp_connectors_j <- c()
# bp_connectors_k <- c()
# 
# for (i in 1:length(bp_z)) {
#   # Create a simple cylinder connecting the two strands
#   sides <- 8
#   radius <- 0.05
#   
#   # Create points along the line connecting the two strands
#   for (j in 0:sides) {
#     angle <- 2 * pi * j / sides
#     
#     # Create points along the connector line
#     for (k in 1:3) {  # Fewer segments for simplicity
#       f <- (k-1)/2  # Parameter from 0 to 1
#       
#       # Linear interpolation between the two strand points
#       x_val <- (1-f)*bp_x1[i] + f*bp_x2[i]
#       y_val <- (1-f)*bp_y1[i] + f*bp_y2[i]
#       z_val <- bp_z[i]
#       
#       # Add slight bulge in the middle to represent base pairs
#       if (k == 2) {
#         bulge <- 0.1 * (1 - (2*f-1)^2)  # Parabolic bulge
#         x_val <- x_val + bulge * cos(angle)
#         y_val <- y_val + bulge * sin(angle)
#       }
#       
#       bp_connectors_x <- c(bp_connectors_x, x_val)
#       bp_connectors_y <- c(bp_connectors_y, y_val)
#       bp_connectors_z <- c(bp_connectors_z, z_val)
#     }
#   }
#   
#   # Create faces for the base pair connector
#   start_idx <- (i-1) * (sides+1) * 3
#   
#   for (j in 0:(sides-1)) {
#     for (k in 0:1) {
#       idx1 <- start_idx + j*3 + k + 1
#       idx2 <- start_idx + j*3 + k + 2
#       idx3 <- start_idx + ((j+1) %% (sides+1))*3 + k + 1
#       idx4 <- start_idx + ((j+1) %% (sides+1))*3 + k + 2
#       
#       bp_connectors_i <- c(bp_connectors_i, idx1-1, idx1-1)
#       bp_connectors_j <- c(bp_connectors_j, idx2-1, idx3-1)
#       bp_connectors_k <- c(bp_connectors_k, idx3-1, idx4-1)
#     }
#   }
# }
# 
# # Update the plotly figure with corrected base pairs
# fig <- plot_ly() %>%
#   # First strand as a mesh
#   add_mesh(
#     x = tube1$x, y = tube1$y, z = tube1$z,
#     i = tube1$i, j = tube1$j, k = tube1$k,
#     color = "blue", opacity = 0.8,
#     name = "Strand 1 (5' to 3')"
#   ) %>%
#   
#   # Second strand as a mesh
#   add_mesh(
#     x = tube2$x, y = tube2$y, z = tube2$z,
#     i = tube2$i, j = tube2$j, k = tube2$k,
#     color = "red", opacity = 0.8,
#     name = "Strand 2 (3' to 5')"
#   ) %>%
#   
#   # Corrected base pair connectors
#   add_mesh(
#     x = bp_connectors_x, y = bp_connectors_y, z = bp_connectors_z,
#     i = bp_connectors_i, j = bp_connectors_j, k = bp_connectors_k,
#     color = "darkgreen", opacity = 0.7,
#     name = "Base pairs"
#   ) %>%
#   
#   # Layout settings
#   layout(
#     title = list(
#       text = "Realistic 3D Model of DNA Double Helix",
#       font = list(size = 16)
#     ),
#     scene = list(
#       aspectmode = 'cube',
#       camera = list(eye = list(x = 1.5, y = 1.5, z = 1.5)),
#       xaxis = list(title = 'X (nm)', backgroundcolor = "white"),
#       yaxis = list(title = 'Y (nm)', backgroundcolor = "white"),
#       zaxis = list(title = 'Z (nm)', backgroundcolor = "white"),
#       bgcolor = 'lightgray'
#     ),
#     legend = list(x = 0.8, y = 0.9)
#   )
# 
# # Add annotations for base pairs
# for(i in seq(1, length(bp_z), by = 3)) {
#   mid_x <- (bp_x1[i] + bp_x2[i]) / 2
#   mid_y <- (bp_y1[i] + bp_y2[i]) / 2
#   mid_z <- bp_z[i]
#   
#   fig <- fig %>% add_annotations(
#     x = mid_x, y = mid_y, z = mid_z,
#     text = base_pairs[i],
#     showarrow = FALSE,
#     font = list(size = 10, color = "white"),
#     xanchor = "center",
#     yanchor = "middle"
#   )
# }
# 
# fig

# Load the required library
library(plotly)
library(pracma)  # for cross(u,v) function

# --- 1. Define DNA Helix Parameters ---
# Using biologically relevant parameters for a B-DNA form
r <- 1.0       # Helix radius in nanometers (nm)
h <- 3.4       # Pitch (height per complete turn) in nm
turns <- 3     # Number of full turns to model
rise_per_bp <- 0.34 # Rise per base pair in nm
bp_per_turn <- h / rise_per_bp # Should be ~10

# --- 2. Generate Coordinates for the Helical Backbones ---
# Create a parameter 't' with enough points for a smooth curve
t <- seq(0, 2 * pi * turns, length.out = 200 * turns)

# Strand 1 (5' to 3' direction)
x1 <- r * cos(t)
y1 <- r * sin(t)
z1 <- (h / (2 * pi)) * t

# Strand 2 (antiparallel 3' to 5' direction)
x2 <- r * cos(t + pi) # 180-degree offset
y2 <- r * sin(t + pi)
z2 <- z1

# --- 3. Generate Base Pair Data ---
# Calculate the z-positions for each base pair
bp_z <- seq(min(z1), max(z1), by = rise_per_bp)
# Convert z-positions back to the parametric angle 't'
bp_t <- (2 * pi * bp_z) / h

# Create a data frame with all information for the base pairs
set.seed(42) # for reproducible random base pairs
bp_data <- data.frame(
  x_start = r * cos(bp_t),
  y_start = r * sin(bp_t),
  z = bp_z,
  x_end = r * cos(bp_t + pi),
  y_end = r * sin(bp_t + pi)
)
# Add random base pair identities
bp_data$pair <- sample(c("A-T", "T-A", "G-C", "C-G"), nrow(bp_data), replace = TRUE)

# Split data by base pair type for separate coloring
at_pairs <- bp_data[grepl("A|T", bp_data$pair), ]
gc_pairs <- bp_data[grepl("G|C", bp_data$pair), ]

# Helper function to format data for plotting disconnected lines in plotly
# Each segment is defined by (start, end, NA)
prepare_lines <- function(df) {
  list(
    x = as.vector(t(cbind(df$x_start, df$x_end, NA))),
    y = as.vector(t(cbind(df$y_start, df$y_end, NA))),
    z = as.vector(t(cbind(df$z, df$z, NA))),
    text = rep(df$pair, each = 3) # Hover text for each segment
  )
}

at_lines <- prepare_lines(at_pairs)
gc_lines <- prepare_lines(gc_pairs)


# --- 4. (Optional) Advanced Backbone Rendering using Mesh Tubes ---
# This function creates a 3D tube mesh around a central curve.
# It is kept from your original code as an enhancement.
create_tube <- function(x, y, z, radius = 0.15, sides = 12) {
  tube_x <- c(); tube_y <- c(); tube_z <- c()
  tube_i <- c(); tube_j <- c(); tube_k <- c()
  
  for (i in 1:(length(x) - 1)) {
    # Vector tangent to the curve at point i
    v <- c(x[i+1] - x[i], y[i+1] - y[i], z[i+1] - z[i])
    v <- v / sqrt(sum(v^2)) # Normalize
    
    # Create two arbitrary perpendicular vectors (u and w)
    u <- if (abs(v[1]) > 0.1) c(-v[2], v[1], 0) else c(0, -v[3], v[2])
    u <- u / sqrt(sum(u^2))
    w <- cross(v, u)
    
    # Generate vertices for the circular cross-section at point i
    for (j in 0:sides) {
      angle <- 2 * pi * j / sides
      p <- radius * (cos(angle) * u + sin(angle) * w)
      tube_x <- c(tube_x, x[i] + p[1])
      tube_y <- c(tube_y, y[i] + p[2])
      tube_z <- c(tube_z, z[i] + p[3])
    }
  }
  
  # Create faces for the mesh
  n_points <- length(x) - 1
  for (i in 0:(n_points - 2)) {
    for (j in 0:(sides - 1)) {
      idx1 <- i * (sides + 1) + j
      idx2 <- idx1 + 1
      idx3 <- (i + 1) * (sides + 1) + j
      idx4 <- idx3 + 1
      
      tube_i <- c(tube_i, idx1, idx1)
      tube_j <- c(tube_j, idx2, idx3)
      tube_k <- c(tube_k, idx3, idx4)
    }
  }
  list(x = tube_x, y = tube_y, z = tube_z, i = tube_i, j = tube_j, k = tube_k)
}

tube1 <- create_tube(x1, y1, z1)
tube2 <- create_tube(x2, y2, z2)


# --- 5. Build the 3D Plot ---
fig <- plot_ly() %>%
  
  # Add Strand 1 (blue tube)
  add_mesh(
    x = tube1$x, y = tube1$y, z = tube1$z,
    i = tube1$i, j = tube1$j, k = tube1$k,
    color = "#1f77b4", opacity = 0.7, name = "Strand 1 (5' to 3')"
  ) %>%
  
  # Add Strand 2 (red tube)
  add_mesh(
    x = tube2$x, y = tube2$y, z = tube2$z,
    i = tube2$i, j = tube2$j, k = tube2$k,
    color = "#d62728", opacity = 0.7, name = "Strand 2 (3' to 5')"
  ) %>%

  # Add A-T base pairs (green lines)
  add_trace(
    x = at_lines$x, y = at_lines$y, z = at_lines$z,
    type = 'scatter3d', mode = 'lines',
    line = list(color = '#2ca02c', width = 5),
    hoverinfo = 'text', text = at_lines$text,
    name = 'A-T Pairs'
  ) %>%

  # Add G-C base pairs (purple lines)
  add_trace(
    x = gc_lines$x, y = gc_lines$y, z = gc_lines$z,
    type = 'scatter3d', mode = 'lines',
    line = list(color = '#9467bd', width = 5),
    hoverinfo = 'text', text = gc_lines$text,
    name = 'G-C Pairs'
  ) %>%

  # Configure layout and appearance
  layout(
    title = "Interactive 3D Model of DNA Double Helix",
    scene = list(
      aspectmode = 'data', # 'data' gives a more realistic aspect ratio
      camera = list(eye = list(x = 2.5, y = 2.5, z = 1.5)),
      xaxis = list(title = 'X (nm)', visible = FALSE),
      yaxis = list(title = 'Y (nm)', visible = FALSE),
      zaxis = list(title = 'Z (nm)', visible = FALSE),
      bgcolor = 'rgb(240, 240, 240)'
    ),
    legend = list(x = 0.8, y = 0.9)
  )

# Display the figure
fig
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

Rotate and zoom the 3D scene to examine the major/minor grooves.

7 Modern Biophysical Tools: Fluorescence Microscopy Analysis

Fluorescence microscopy visualizes DNA in cells using dyes such as DAPI, 4′,6-diamidino-2-phenylindole (4′,6-diamidino-2-phenylindole), which bind strongly to A–T–rich regions of the genome. The R code below simulates a 3D fluorescence microscopy volume: a collection of cell nuclei stained with DAPI, each assigned a position, radius, and fluorescence intensity. The script renders an interactive 3D image where nuclei appear as spheres, and incorporates realistic imaging effects such as the point-spread function (PSF), optical blur, and background noise.

This simulated 3D dataset mimics how DNA-labeled nuclei appear in fluorescence imaging. In real experiments, such images are used to quantify nuclear morphology, chromatin organization, or total fluorescence intensity.

# PDMP Simulation for 100 Cells under MTX-like Exposure

knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
library(plotly)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# --- Global simulation parameters ---

n_nuclei <- 20           # number of nuclei
field_size <- 100        # X-Y size in micrometers (um)
avg_nucleus_radius <- 5  # average nucleus radius (um)
z_depth <- 50            # Z depth of the sample (um)

set.seed(123)  # for reproducibility

# --- Generate 3D nuclei data ---

nuclei_data <- data.frame(
  id = 1:n_nuclei,
  x = runif(n_nuclei, avg_nucleus_radius, field_size - avg_nucleus_radius),
  y = runif(n_nuclei, avg_nucleus_radius, field_size - avg_nucleus_radius),
  z = runif(n_nuclei, avg_nucleus_radius, z_depth - avg_nucleus_radius),
  radius = pmax(3, rnorm(n_nuclei, avg_nucleus_radius, 1.5)),
  dapi_intensity = runif(n_nuclei, 0.4, 1.0)
)

head(nuclei_data)
##   id         x        y        z   radius dapi_intensity
## 1  1 30.881977 85.05854 10.71200 5.639696      0.7599934
## 2  2 75.947462 67.35231 21.58185 4.557393      0.5996941
## 3  3 41.807923 62.64561 21.54897 6.342688      0.6931678
## 4  4 84.471566 94.48428 19.75382 6.317200      0.9726843
## 5  5 89.642056 64.01352 11.09779 6.232372      0.6897414
## 6  6  9.100085 68.76774 10.55224 6.032960      0.9342101
# Question 1 (answer in text):
# Describe what the columns x, y, z, radius, and dapi_intensity represent biologically.

# --- Function to create a sphere mesh around (center_x, center_y, center_z) ---

create_sphere_mesh <- function(center_x, center_y, center_z, radius, resolution = 20) {
  phi <- seq(0, pi, length.out = resolution)
  theta <- seq(0, 2 * pi, length.out = resolution)
  
  phi_theta <- expand.grid(phi = phi, theta = theta)
  
  x <- center_x + radius * sin(phi_theta$phi) * cos(phi_theta$theta)
  y <- center_y + radius * sin(phi_theta$phi) * sin(phi_theta$theta)
  z <- center_z + radius * cos(phi_theta$phi)
  
  x_mat <- matrix(x, nrow = resolution, ncol = resolution)
  y_mat <- matrix(y, nrow = resolution, ncol = resolution)
  z_mat <- matrix(z, nrow = resolution, ncol = resolution)
  
  return(list(x = x_mat, y = y_mat, z = z_mat))
}

# --- 3D figure with plotly ---

fig <- plot_ly()

# Add each nucleus as a 3D sphere
for (i in 1:nrow(nuclei_data)) {
  nucleus <- nuclei_data[i, ]
  sphere <- create_sphere_mesh(nucleus$x, nucleus$y, nucleus$z, nucleus$radius)
  
  fig <- fig %>%
    add_surface(
      x = sphere$x, y = sphere$y, z = sphere$z,
      surfacecolor = matrix(
        nucleus$dapi_intensity,
        nrow = nrow(sphere$x),
        ncol = ncol(sphere$x)
      ),
      colorscale = list(
        list(0, "rgb(0,0,139)"),
        list(1, "rgb(0,255,255)")
      ),
      cmin = 0, cmax = 1,
      showscale = FALSE,
      opacity = 0.7,
      name = paste("Nucleus", nucleus$id),
      hoverinfo = 'text',
      text = paste0(
        "Nucleus ID: ", nucleus$id, "<br>",
        "X: ", round(nucleus$x, 1), " um<br>",
        "Y: ", round(nucleus$y, 1), " um<br>",
        "Z: ", round(nucleus$z, 1), " um<br>",
        "Radius: ", round(nucleus$radius, 1), " um<br>",
        "DAPI Intensity: ", round(nucleus$dapi_intensity * 100), "%"
      )
    )
}

# Add a "culture surface" plane at Z = 0
fig <- fig %>%
  add_trace(
    x = c(0, field_size, field_size, 0),
    y = c(0, 0, field_size, field_size),
    z = c(0, 0, 0, 0),
    i = c(0, 0, 1, 1),
    j = c(1, 2, 2, 3),
    k = c(2, 3, 3, 0),
    type = 'mesh3d',
    color = 'gray',
    opacity = 0.3,
    name = "Culture Surface"
  )

# Layout and scene configuration
fig <- fig %>%
  layout(
    title = list(
      text = "3D Interactive Fluorescence Microscopy DNA Simulation (DAPI Staining)",
      font = list(size = 18, color = "black")
    ),
    scene = list(
      xaxis = list(
        title = "X (um)",
        backgroundcolor = "rgb(20, 20, 20)",
        gridcolor = "gray",
        zerolinecolor = "gray",
        showbackground = TRUE,
        range = c(0, field_size)
      ),
      yaxis = list(
        title = "Y (um)",
        backgroundcolor = "rgb(20, 20, 20)",
        gridcolor = "gray",
        zerolinecolor = "gray",
        showbackground = TRUE,
        range = c(0, field_size)
      ),
      zaxis = list(
        title = "Z (um)",
        backgroundcolor = "rgb(20, 20, 20)",
        gridcolor = "gray",
        zerolinecolor = "gray",
        showbackground = TRUE,
        range = c(0, z_depth)
      ),
      aspectratio = list(x = 1, y = 1, z = z_depth/field_size),
      camera = list(eye = list(x = 1.7, y = 1.7, z = 1.2)),
      bgcolor = "black"
    ),
    paper_bgcolor = "white",
    showlegend = FALSE
  )

fig
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
# Question 2:
# Rotate and zoom the 3D plot. How does changing the viewing angle help you 
# understand the distribution of nuclei in 3D?

# --- Image and slice parameters ---

img_size <- 128          # pixels
slice_thickness <- 10    # um
psf_sigma_xy <- 6        # blur in pixels

# Map um position to pixel coordinates
um_to_pixel <- function(x_um, field_size, img_size) {
  x_um / field_size * (img_size - 1) + 1
}

# Generate a 2D slice image at z0
generate_slice_image <- function(z0, nuclei_data,
                                 field_size,
                                 img_size,
                                 slice_thickness,
                                 psf_sigma_xy) {
  
  img <- matrix(0, nrow = img_size, ncol = img_size)
  
  in_slice <- abs(nuclei_data$z - z0) <= (slice_thickness / 2)
  nuclei_slice <- nuclei_data[in_slice, ]
  
  if (nrow(nuclei_slice) == 0) return(img)
  
  for (i in 1:nrow(nuclei_slice)) {
    nuc <- nuclei_slice[i, ]
    
    cx <- um_to_pixel(nuc$x, field_size, img_size)
    cy <- um_to_pixel(nuc$y, field_size, img_size)
    
    x_range <- round(cx - 3*psf_sigma_xy):round(cx + 3*psf_sigma_xy)
    y_range <- round(cy - 3*psf_sigma_xy):round(cy + 3*psf_sigma_xy)
    
    x_range <- x_range[x_range >= 1 & x_range <= img_size]
    y_range <- y_range[y_range >= 1 & y_range <= img_size]
    
    if (length(x_range) == 0 || length(y_range) == 0) next
    
    xx <- matrix(rep(x_range, each = length(y_range)), nrow = length(y_range))
    yy <- matrix(rep(y_range, times = length(x_range)), nrow = length(y_range))
    
    gauss <- exp(- ((xx - cx)^2 + (yy - cy)^2) / (2 * psf_sigma_xy^2))
    gauss <- gauss * nuc$dapi_intensity
    
    img[y_range, x_range] <- img[y_range, x_range] + gauss
  }
  
  return(img)
}

# Example: slice at mid-depth
z_mid <- z_depth / 2
slice_img <- generate_slice_image(
  z0 = z_mid,
  nuclei_data = nuclei_data,
  field_size = field_size,
  img_size = img_size,
  slice_thickness = slice_thickness,
  psf_sigma_xy = psf_sigma_xy
)


slice_df <- as.data.frame(as.table(slice_img))
colnames(slice_df) <- c("y", "x", "intensity")

slice_df$x <- as.numeric(slice_df$x)
slice_df$y <- as.numeric(slice_df$y)

ggplot(slice_df, aes(x = x, y = y, fill = intensity)) +
  geom_raster() +
  scale_fill_viridis_c() +
  coord_equal() +
  scale_y_reverse() +
  theme_minimal() +
  labs(
    title = paste("Simulated Confocal Slice at z =", round(z_mid, 1), "um"),
    x = "X (pixels)",
    y = "Y (pixels)"
  )

# Question 3:
# Change z_mid to values near the bottom and top of the volume (e.g., 5 um and 45 um). 
# How does the number of visible nuclei change with z?

# --- Add noise to the slice ---

add_noise_to_image <- function(img,
                               background_level = 0.05,
                               read_noise_sd = 0.02,
                               photon_scale = 50) {
  img_bg <- img + background_level
  
  photon_counts <- img_bg * photon_scale
  photon_counts_noisy <- matrix(
    rpois(length(photon_counts), lambda = as.vector(photon_counts)),
    nrow = nrow(img_bg)
  )
  
  img_photon <- photon_counts_noisy / photon_scale
  
  img_noisy <- img_photon + matrix(
    rnorm(length(img_photon), mean = 0, sd = read_noise_sd),
    nrow = nrow(img_photon)
  )
  
  img_noisy[img_noisy < 0] <- 0
  
  return(img_noisy)
}

slice_noisy <- add_noise_to_image(slice_img)

slice_noisy_df <- as.data.frame(as.table(slice_noisy))
colnames(slice_noisy_df) <- c("y", "x", "intensity")

slice_noisy_df$x <- as.numeric(slice_noisy_df$x)
slice_noisy_df$y <- as.numeric(slice_noisy_df$y)


ggplot(slice_noisy_df, aes(x = x, y = y, fill = intensity)) +
  geom_raster() +
  scale_fill_viridis_c() +
  coord_equal() +
  scale_y_reverse() +
  theme_minimal() +
  labs(
    title = paste("Noisy Confocal Slice at z =", round(z_mid, 1), "um"),
    x = "X (pixels)",
    y = "Y (pixels)"
  )

# Question 4:
# Experiment with different values of background_level, read_noise_sd, and 
# photon_scale. Which parameter has the biggest impact on how easy it is to 
# distinguish individual nuclei?

8 Biomedical Problems

DNA’s structure enables replication, transcription, and mutations, key to biomedicine.

8.1 Replication Challenge

During semiconservative replication, helicase unwinds the helix. If a mutation alters base pairing, how might it cause disease like sickle cell anemia? Simulate with code: Modify a sequence and check pairing.

The simulation below will illustrates:

  • The Parent DNA: We’ll start with a short segment of the mutated beta-globin (HBB) gene, which causes sickle cell anemia. A specific base pair, the site of the point mutation, will be highlighted.

  • The Replication Fork: We’ll show the parent DNA being “unzipped” by helicase, with the two strands separating to act as templates for new strands.

  • Semiconservative Replication: As new strands are synthesized, the mutated base on the template strand dictates that an incorrect complementary base is added.

  • The Result: The two new daughter DNA molecules. Both now carry the sickle cell mutation, demonstrating how the genetic error is passed on to new cells.

Annotations throughout the plot explains the biological significance, how this single base change alters an mRNA codon, leading to a different amino acid and ultimately the defective hemoglobin protein.

# Simple base pairing check
sequence <- "ATGC"
complement <- chartr("ATGC", "TACG", sequence)
cat("Original:", sequence, "\nComplement:", complement)
## Original: ATGC 
## Complement: TACG
library(plotly)

# --- 1. Simulation Setup ---
# Define the DNA sequence for the beta-globin (HBB) gene around the mutation
# The mutation for sickle cell is A -> T on the coding strand, which means T -> A on the template strand.
# Normal Template Strand: 3'-GGA CTC CTC-5'
# Mutated Template Strand: 3'-GGA CAC CTC-5' (The T at position 5 is now an A)

template_strand_seq <- c('G','G','A','C','A','C','C','T','C')
coding_strand_seq <- c('C','C','T','G','T','G','G','A','G') # Complementary strand
mutation_pos <- 5 # Position of the T->A mutation on the template strand

# Helper function to generate coordinates for a DNA helix segment
generate_dna_segment <- function(seq1, seq2, z_offset = 0, x_offset = 0, y_offset = 0, highlight_pos = NULL) {
  
  # Helix parameters
  r <- 1.0 # Radius
  h_per_bp <- 0.34 # Height per base pair or 0.6 #
  angle_per_bp <- 36 * (pi / 180) # Rotation per base pair (approx. 10 bp per turn)
  
  # Base pair colors
  color_map <- c('A' = '#2ca02c', 'T' = '#d62728', 'G' = '#ff7f0e', 'C' = '#1f77b4', 'MUTATION' = '#ffff00')
  
  # Generate coordinates
  bp_data <- data.frame(
    t = (0:(length(seq1)-1)) * angle_per_bp,
    z = (0:(length(seq1)-1)) * h_per_bp + z_offset
  )
  
  bp_data$x1 <- r * cos(bp_data$t) + x_offset
  bp_data$y1 <- r * sin(bp_data$t) + y_offset
  bp_data$x2 <- r * cos(bp_data$t + pi) + x_offset
  bp_data$y2 <- r * sin(bp_data$t + pi) + y_offset
  
  # Assign sequences and colors
  bp_data$base1 <- seq1
  bp_data$base2 <- seq2
  bp_data$color1 <- color_map[bp_data$base1]
  bp_data$color2 <- color_map[bp_data$base2]
  
  # Highlight the mutation
  if (!is.null(highlight_pos)) {
    bp_data$color1[highlight_pos] <- color_map['MUTATION']
    bp_data$color2[highlight_pos] <- color_map['MUTATION']
  }
  
  return(bp_data)
}


# --- 2. Generate Data for Each Stage of the Simulation ---
# Stage 1: Parent Mutated DNA
parent_dna <- generate_dna_segment(template_strand_seq, coding_strand_seq, z_offset = 8, highlight_pos = mutation_pos)

# Stage 2: Replication Fork (Strands Separated)
template_strand_unwound <- generate_dna_segment(template_strand_seq, rep(NA, length(template_strand_seq)), z_offset = 4, x_offset = -2, highlight_pos = mutation_pos)
coding_strand_unwound <- generate_dna_segment(rep(NA, length(coding_strand_seq)), coding_strand_seq, z_offset = 4, x_offset = 2, highlight_pos = mutation_pos)

# Stage 3: Two Daughter DNA Molecules
daughter1_dna <- generate_dna_segment(template_strand_seq, coding_strand_seq, z_offset = 0, y_offset = -3, highlight_pos = mutation_pos)
daughter2_dna <- generate_dna_segment(template_strand_seq, coding_strand_seq, z_offset = 0, y_offset = 3, highlight_pos = mutation_pos)


# --- 3. Build the Plotly Visualization ---
fig <- plot_ly()

# Function to add a DNA segment trace to the plot
add_dna_trace <- function(fig, df, name) {
  # Add backbones
  fig <- fig %>% add_trace(
    x = df$x1, y = df$y1, z = df$z, type = 'scatter3d', mode = 'lines',
    line = list(color = 'gray', width = 4), hoverinfo = 'none', name = name, showlegend = FALSE
  )
  fig <- fig %>% add_trace(
    x = df$x2, y = df$y2, z = df$z, type = 'scatter3d', mode = 'lines',
    line = list(color = 'gray', width = 4), hoverinfo = 'none', showlegend = FALSE
  )
  # Add base pair rungs
  for (i in 1:nrow(df)) {
    fig <- fig %>% add_trace(
      x = c(df$x1[i], df$x2[i]), y = c(df$y1[i], df$y2[i]), z = c(df$z[i], df$z[i]),
      type = 'scatter3d', mode = 'lines',
      line = list(color = df$color1[i], width = 8),
      hoverinfo = 'text', text = paste(df$base1[i], "-", df$base2[i]),
      showlegend = FALSE
    )
  }
  return(fig)
}

# Add all the stages to the plot
fig <- add_dna_trace(fig, parent_dna, "Parent DNA")
fig <- add_dna_trace(fig, template_strand_unwound, "Template Strand")
fig <- add_dna_trace(fig, coding_strand_unwound, "Coding Strand")
fig <- add_dna_trace(fig, daughter1_dna, "Daughter 1")
fig <- add_dna_trace(fig, daughter2_dna, "Daughter 2")

# --- 4. Add Annotations and Final Layout ---
fig <- fig %>% layout(
  title = "Simulation of DNA Replication with Sickle Cell Mutation",
  scene = list(
    xaxis = list(title = "", showticklabels=FALSE, visible=FALSE),
    yaxis = list(title = "", showticklabels=FALSE, visible=FALSE),
    zaxis = list(title = "Replication Progress", zeroline=FALSE, range=c(-1, 12)),
    camera = list(eye = list(x = 0, y = -2.5, z = 1.5)),
    annotations = list(
      # Parent DNA Annotation
      list(x = 0, y = 0, z = 11, text = "<b>1. Parent DNA with Sickle Cell Trait</b><br>(Hover over bases to inspect)", showarrow = FALSE),
      list(x = parent_dna$x1[mutation_pos], y = parent_dna$y1[mutation_pos], z = parent_dna$z[mutation_pos],
           text = "Mutation Site<br>T -> A", arrowhead = 4, ax = 40, ay = 0),
           
      # Replication Fork Annotation
      list(x = 0, y = 0, z = 7.5, text = "<b>2. Helicase Unwinds DNA</b><br>Strands separate to act as templates", showarrow = FALSE),
      
      # Daughter DNA Annotation
      list(x = 0, y = 0, z = 3.5, text = "<b>3. Semiconservative Replication</b><br>Two identical daughter molecules are formed", showarrow = FALSE),
      
      # Biological Consequence Annotation
      list(x = 0, y = 1.5, z = 0, text = "<b>Result: Mutation is Propagated</b><br>DNA: CTC -> CAC (Template)<br>mRNA: GAG -> GUG (Codon)<br>Protein: Glutamic Acid -> Valine<br>This causes Sickle Cell Anemia",
           showarrow = FALSE, bgcolor = "rgba(255, 255, 255, 0.7)")
    ),
    bgcolor = "rgb(240, 240, 240)"
  )
)

fig

To interact with the simulation, use rotate and zoom; click and drag the plot to rotate the 3D scene. Use your mouse wheel to zoom in and out. To inspect the mutation, hover your mouse over the individual base pairs (the colored “rungs”). A tooltip will show the base pair (e.g., “A - T”). Start at the top with the “Parent DNA”. The bright yellow rung indicates the location of the point mutation responsible for sickle cell disease. Follow the arrows down to see the strands separate and then form two new, identical, mutated copies at the bottom. The text labels explain each stage and connect the molecular error to the resulting disease.

8.2 DNA Mutation Impact

A point mutation changes \(A\) to \(G\). Model probability using binomial distribution: \(P(k) = \binom{n}{k} p^k (1-p)^{n-k}\), where \(p\ =\) mutation rate.

This experiment depicts DNA replication and mutation. It shows an interactive simulation that demonstrates DNA replication, the effect of mutations, and how they can lead to diseases like sickle cell anemia. It includes a 3D visualization of the DNA helix and a 2D representation of the base pairing.

# library(plotly)
# library(dplyr)
# 
# # Set seed for reproducibility
# set.seed(42)
# 
# # --- 1. DNA Sequence and Structure Simulation ---
# 
# # Wild-type beta-globin gene sequence (HBB) - excerpt
# wild_type_seq <- "ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCAGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCACTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACTGGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC"
# 
# # Mutated sequence (sickle cell mutation: GAG -> GTG at position 20-22)
# mutated_seq <- wild_type_seq
# substr(mutated_seq, 20, 22) <- "GTG"
# 
# # Function to generate a simplified DNA helix for a given sequence
# generate_dna_helix <- function(sequence, start_z = 0, helix_id = "wild") {
#   n_bases <- nchar(sequence)
#   bases <- strsplit(sequence, "")[[1]]
#   
#   # Parameters for the helix
#   radius <- 1.0
#   pitch <- 3.4
#   turns <- n_bases / 10  # Approximately 10 bases per turn
#   
#   # Generate the helix coordinates
#   t <- seq(0, 2 * pi * turns, length.out = n_bases)
#   x1 <- radius * cos(t)
#   y1 <- radius * sin(t)
#   z1 <- (pitch / (2 * pi)) * t + start_z
#   
#   # Complementary strand (offset by π radians)
#   x2 <- radius * cos(t + pi)
#   y2 <- radius * sin(t + pi)
#   z2 <- z1
#   
#   # Base pairing coordinates
#   bp_x <- c()
#   bp_y <- c()
#   bp_z <- c()
#   bp_colors <- c()
#   bp_types <- c()
#   
#   for (i in 1:n_bases) {
#     base <- bases[i]
#     comp_base <- switch(base,
#                        "A" = "T",
#                        "T" = "A",
#                        "G" = "C",
#                        "C" = "G",
#                        "N")  # For any other character
#     
#     # Color coding for bases
#     color <- switch(base,
#                    "A" = "green",
#                    "T" = "red",
#                    "G" = "blue",
#                    "C" = "orange",
#                    "gray")
#     
#     bp_x <- c(bp_x, x1[i], x2[i], NA)
#     bp_y <- c(bp_y, y1[i], y2[i], NA)
#     bp_z <- c(bp_z, z1[i], z2[i], NA)
#     bp_colors <- c(bp_colors, color, color, NA)
#     bp_types <- c(bp_types, paste0(base, "-", comp_base), paste0(base, "-", comp_base), NA)
#   }
#   
#   # Create a data frame for the helix
#   helix_df <- data.frame(
#     x = c(x1, x2),
#     y = c(y1, y2),
#     z = c(z1, z2),
#     strand = rep(c("Leading", "Lagging"), each = n_bases),
#     base = rep(bases, 2),
#     helix = helix_id,
#     stringsAsFactors = FALSE
#   )
#   
#   # Create a data frame for base pairs
#   bp_df <- data.frame(
#     x = bp_x,
#     y = bp_y,
#     z = bp_z,
#     color = bp_colors,
#     pair = bp_types,
#     helix = helix_id,
#     stringsAsFactors = FALSE
#   )
#   
#   list(helix = helix_df, base_pairs = bp_df)
# }
# 
# # Generate wild-type and mutated DNA helices
# wild_dna <- generate_dna_helix(substr(wild_type_seq, 1, 30), 0, "wild")
# mutated_dna <- generate_dna_helix(substr(mutated_seq, 1, 30), 15, "mutated")
# 
# # --- 2. Create 3D Visualization ---
# fig <- plot_ly()
# 
# # Add wild-type DNA
# fig <- fig %>%
#   add_trace(
#     data = wild_dna$helix,
#     x = ~x, y = ~y, z = ~z,
#     type = 'scatter3d',
#     mode = 'lines',
#     line = list(width = 6, color = 'lightblue'),
#     name = "Wild-type DNA",
#     hoverinfo = 'text',
#     text = ~paste("Strand:", strand, "<br>Base:", base)
#   ) %>%
#   add_trace(
#     data = wild_dna$base_pairs,
#     x = ~x, y = ~y, z = ~z,
#     type = 'scatter3d',
#     mode = 'lines',
#     line = list(width = 4, color = ~color),
#     name = "Base pairs",
#     hoverinfo = 'text',
#     text = ~paste("Base pair:", pair)
#   )
# 
# # Add mutated DNA
# fig <- fig %>%
#   add_trace(
#     data = mutated_dna$helix,
#     x = ~x, y = ~y, z = ~z,
#     type = 'scatter3d',
#     mode = 'lines',
#     line = list(width = 6, color = 'pink'),
#     name = "Mutated DNA",
#     hoverinfo = 'text',
#     text = ~paste("Strand:", strand, "<br>Base:", base)
#   ) %>%
#   add_trace(
#     data = mutated_dna$base_pairs,
#     x = ~x, y = ~y, z = ~z,
#     type = 'scatter3d',
#     mode = 'lines',
#     line = list(width = 4, color = ~color),
#     name = "Base pairs (mutated)",
#     hoverinfo = 'text',
#     text = ~paste("Base pair:", pair)
#   )
# 
# # Highlight the mutation site
# mutation_index <- 20
# mutation_z <- wild_dna$helix$z[mutation_index]
# 
# fig <- fig %>%
#   add_trace(
#     x = c(-1.2, 1.2), y = c(0, 0), z = c(mutation_z, mutation_z),
#     type = 'scatter3d',
#     mode = 'lines',
#     line = list(width = 8, color = 'yellow'),
#     name = "Mutation site",
#     hoverinfo = 'text',
#     text = paste("Sickle cell mutation: GAG -> GTG")
#   ) %>%
#   add_trace(
#     x = c(0, 0), y = c(-1.2, 1.2), z = c(mutation_z, mutation_z),
#     type = 'scatter3d',
#     mode = 'lines',
#     line = list(width = 8, color = 'yellow'),
#     name = "Mutation site",
#     showlegend = FALSE
#   )
# 
# # --- 3. Configure Layout ---
# fig <- fig %>%
#   layout(
#     title = list(
#       text = "DNA Replication and Mutation: Sickle Cell Anemia Simulation",
#       font = list(size = 16)
#     ),
#     scene = list(
#       aspectmode = 'cube',
#       camera = list(eye = list(x = 1.5, y = -1.5, z = 1.5)),
#       xaxis = list(title = "X"),
#       yaxis = list(title = "Y"),
#       zaxis = list(title = "Z"),
#       bgcolor = 'lightgray'
#     ),
#     annotations = list(
#       list(
#         x = 0.5,
#         y = 0.1,
#         xref = "paper",
#         yref = "paper",
#         text = paste0("Wild-type DNA (top) vs Mutated DNA (bottom)<br>Mutation: GAG -> ",
#             substr(mutated_seq, 20, 22), " in beta-globin gene<br>This causes glutamic acid -> valine substitution"),
#         showarrow = FALSE,
#         font = list(size = 12)
#       )
#     )
#   )

library(plotly)
library(dplyr)

# Set seed for reproducibility
set.seed(42)

# --- 1. DNA Sequence and Parameters ---
wild_type_seq <- "ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCAGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCACTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACTGGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC"

# Create mutated sequence (sickle cell mutation: GAG -> GTG at position 20-22)
mutated_seq <- wild_type_seq
substr(mutated_seq, 20, 22) <- "GTG"

# Parameters
radius <- 1.0
pitch <- 3.4 # default
n_bases <- 30  # Number of bases to visualize
mutation_pos <- 20  # Position of the mutation

# --- 2. Create DNA Data Frames ---
create_dna_df <- function(sequence, z_offset = 0, is_mutated = FALSE) {
  bases <- strsplit(substr(sequence, 1, n_bases), "")[[1]]
  
  # Generate helix coordinates
  t <- seq(0, 2 * pi * (n_bases/10), length.out = n_bases)
  x1 <- radius * cos(t)
  y1 <- radius * sin(t)
  z <- (pitch / (2 * pi)) * t + z_offset
  
  # Complementary strand (offset by π radians)
  x2 <- radius * cos(t + pi)
  y2 <- radius * sin(t + pi)
  
  # Create complementary bases
  base_pairs <- sapply(bases, function(b) {
    switch(b,
           "A" = "T",
           "T" = "A",
           "G" = "C",
           "C" = "G",
           "N")
  })
  
  # Color coding for bases
  base_colors <- c("A" = "green", "T" = "red", "G" = "blue", "C" = "orange")
  color1 <- base_colors[bases]
  
  # Highlight mutation if present
  if (is_mutated) {
    color1[mutation_pos] <- "yellow"
  }
  
  data.frame(
    x1 = x1, y1 = y1, 
    x2 = x2, y2 = y2, 
    z = z,
    base1 = bases,
    base2 = base_pairs,
    color1 = color1,
    stringsAsFactors = FALSE
  )
}

# Create DNA data frames
parent_dna <- create_dna_df(wild_type_seq, z_offset = 10)
mutated_dna <- create_dna_df(mutated_seq, z_offset = 0, is_mutated = TRUE)

# Create replication intermediates
template_strand <- data.frame(
  x1 = parent_dna$x1,
  y1 = parent_dna$y1,
  z = parent_dna$z - 3,
  base1 = parent_dna$base1,
  color1 = parent_dna$color1,
  stringsAsFactors = FALSE
)

coding_strand <- data.frame(
  x1 = parent_dna$x2,
  y1 = parent_dna$y2,
  z = parent_dna$z - 3,
  base1 = parent_dna$base2,
  color1 = parent_dna$color1,
  stringsAsFactors = FALSE
)

# --- 3. Build the Plotly Visualization ---
fig <- plot_ly()

# Function to add a DNA segment trace to the plot
add_dna_trace <- function(fig, df, name, show_bases = TRUE) {
  # Add backbones
  fig <- fig %>% add_trace(
    x = df$x1, y = df$y1, z = df$z, type = 'scatter3d', mode = 'lines',
    line = list(color = 'gray', width = 4), hoverinfo = 'none', name = name, showlegend = FALSE
  )
  
  # Add complementary backbone if available
  if ("x2" %in% colnames(df)) {
    fig <- fig %>% add_trace(
      x = df$x2, y = df$y2, z = df$z, type = 'scatter3d', mode = 'lines',
      line = list(color = 'gray', width = 4), hoverinfo = 'none', showlegend = FALSE
    )
    
    # Add base pair rungs
    for (i in 1:nrow(df)) {
      fig <- fig %>% add_trace(
        x = c(df$x1[i], df$x2[i]), y = c(df$y1[i], df$y2[i]), z = c(df$z[i], df$z[i]),
        type = 'scatter3d', mode = 'lines',
        line = list(color = df$color1[i], width = 8),
        hoverinfo = 'text', 
        text = paste("Base pair:", df$base1[i], "-", df$base2[i]),
        showlegend = FALSE
      )
    }
  } else if (show_bases) {
    # Add individual bases for single strands
    for (i in 1:nrow(df)) {
      fig <- fig %>% add_trace(
        x = df$x1[i], y = df$y1[i], z = df$z[i],
        type = 'scatter3d', mode = 'markers',
        marker = list(color = df$color1[i], size = 8),
        hoverinfo = 'text', 
        text = paste("Base:", df$base1[i]),
        showlegend = FALSE
      )
    }
  }
  
  return(fig)
}

# Add all DNA segments to the plot
fig <- add_dna_trace(fig, parent_dna, "Parent DNA")
fig <- add_dna_trace(fig, mutated_dna, "Mutated DNA")
fig <- add_dna_trace(fig, template_strand, "Template Strand", show_bases = TRUE)
fig <- add_dna_trace(fig, coding_strand, "Coding Strand", show_bases = TRUE)

# --- 4. Add Mutation Probability Visualization ---
# Function to calculate binomial probability
binomial_probability <- function(n, k, p) {
  choose(n, k) * p^k * (1-p)^(n-k)
}

# Simulate mutation probabilities
mutation_rate <- 0.001  # Typical mutation rate per base per generation
n_bases_sim <- 1000  # Number of bases to consider

# Calculate probability of k mutations
k_values <- 0:10
probabilities <- sapply(k_values, function(k) {
  binomial_probability(n_bases_sim, k, mutation_rate)
})

# Create data frame for plotting
mutation_df <- data.frame(
  k = k_values,
  probability = probabilities
)

# Create mutation probability plot
mutation_plot <- plot_ly(mutation_df, x = ~k, y = ~probability, type = 'bar',
                         marker = list(color = 'rgba(55, 128, 191, 0.7)',
                                       line = list(color = 'rgba(55, 128, 191, 1.0)', width = 1.5))) %>%
  layout(title = "Probability of k Mutations in 1000 Bases",
         xaxis = list(title = "Number of Mutations (k)"),
         yaxis = list(title = "Probability", type = "log"),
         annotations = list(
           x = 1,
           y = 0.1,
           xref = 'paper',
           yref = 'paper',
           text = paste("Mutation rate p =", mutation_rate),
           showarrow = FALSE
         ))

# --- 5. Add Annotations and Final Layout ---
fig <- fig %>% layout(
  title = "DNA Replication and Mutation: Sickle Cell Anemia Simulation",
  scene = list(
    xaxis = list(title = "", showticklabels=FALSE, visible=FALSE),
    yaxis = list(title = "", showticklabels=FALSE, visible=FALSE),
    zaxis = list(title = "Replication Progress", zeroline=FALSE, range=c(-4, 12)),
    camera = list(eye = list(x = 0, y = -2.5, z = 1.5)),
    annotations = list(
      # Parent DNA Annotation
      list(x = 0, y = 0, z = 11, 
           text = "<b>Wild-type DNA</b><br>(Normal beta-globin gene)", 
           showarrow = FALSE, font = list(size = 14)),
      
      # Mutated DNA Annotation
      list(x = 0, y = 0, z = 5, 
           text = "<b>Mutated DNA</b><br>(Sickle cell mutation: GAG -> GTG)", 
           showarrow = FALSE, font = list(size = 14)),
      
      # Mutation Site Annotation
      list(x = mutated_dna$x1[mutation_pos], y = mutated_dna$y1[mutation_pos], z = mutated_dna$z[mutation_pos],
           text = "Mutation Site", arrowhead = 4, ax = 40, ay = 0, font = list(size = 12)),
      
      # Biological Consequence Annotation
      list(x = 0, y = 1.5, z = -3, 
           text = "<b>Biological Effect:</b><br>DNA: GAG -> GTG (Codon)<br>Protein: Glutamic Acid -> Valine<br>This causes Sickle Cell Anemia",
           showarrow = FALSE, bgcolor = "rgba(255, 255, 255, 0.7)", font = list(size = 12))
    ),
    bgcolor = "rgb(240, 240, 240)"
  )
)

# --- 6. Display the Visualizations ---
# Display the 3D DNA visualization
fig
# Display the mutation probability plot
mutation_plot
# --- 7. Create 2D Base Pairing Visualization ---
create_base_pairing_plot <- function(sequence, title) {
  n_bases <- nchar(sequence)
  bases <- strsplit(sequence, "")[[1]]
  
  # Create coordinates for bases
  x1 <- rep(1, n_bases)
  x2 <- rep(2, n_bases)
  y <- seq(n_bases, 1, by = -1)
  
  # Create complementary sequence
  comp_bases <- sapply(bases, function(b) {
    switch(b,
           "A" = "T",
           "T" = "A",
           "G" = "C",
           "C" = "G",
           "N")
  })
  
  # Base colors
  base_colors <- c("A" = "green", "T" = "red", "G" = "blue", "C" = "orange")
  
  # Create the plot for the leading strand bases
  p <- plot_ly() %>%
    add_text(
      x = x1, y = y, text = bases, 
      textfont = list(size = 20),
      marker = list(color = base_colors[bases]),
      hoverinfo = 'text',
      hovertext = paste("Base:", bases, "<br>Strand: Leading"),
      showlegend = FALSE
    ) %>%
    add_text(
      x = x2, y = y, text = comp_bases, 
      textfont = list(size = 20),
      marker = list(color = base_colors[comp_bases]),
      hoverinfo = 'text',
      hovertext = paste("Base:", comp_bases, "<br>Strand: Lagging"),
      showlegend = FALSE
    )
  
  # Add connecting lines for base pairs
  for (i in 1:n_bases) {
    p <- p %>% add_trace(
      x = c(1, 2), y = c(y[i], y[i]),
      type = 'scatter', mode = 'lines',
      line = list(width = 2, color = 'gray'),
      showlegend = FALSE,
      hoverinfo = 'none'
    )
  }
  
  p <- p %>% layout(
    title = title,
    xaxis = list(showticklabels = FALSE, showgrid = FALSE, zeroline = FALSE, range = c(0.5, 2.5)),
    yaxis = list(showticklabels = FALSE, showgrid = FALSE, zeroline = FALSE)
  )
  
  p
}

# Create 2D base pairing plots
wild_2d <- create_base_pairing_plot(substr(wild_type_seq, 15, 25), "Wild-type Base Pairing (GAG)")
mutated_2d <- create_base_pairing_plot(substr(mutated_seq, 15, 25), "Mutated Base Pairing (GTG)")

# Display the 2D plots
wild_2d
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
mutated_2d
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
# --- 4. Create 2D Base Pairing Visualization ---

# Function to create a 2D base pairing diagram
create_base_pairing_plot <- function(sequence, title) {
  n_bases <- nchar(sequence)
  bases <- strsplit(sequence, "")[[1]]
  
  # Create coordinates for bases
  x1 <- rep(1, n_bases)
  x2 <- rep(2, n_bases)
  y <- seq(n_bases, 1, by = -1)
  
  # Create complementary sequence
  comp_bases <- sapply(bases, function(b) {
    switch(b,
           "A" = "T",
           "T" = "A",
           "G" = "C",
           "C" = "G",
           "N")
  })
  
  # Base colors
  base_colors <- c("A" = "green", "T" = "red", "G" = "blue", "C" = "orange")
  
  # Create the plot for the leading strand bases
  p <- plot_ly() %>%
    add_text(
      x = x1, y = y, text = bases, 
      textfont = list(size = 20),
      marker = list(color = base_colors[bases]),
      hoverinfo = 'text', mode="text",
      hovertext = paste("Base:", bases, "<br>Strand: Leading"),
      showlegend = FALSE
    ) %>%
    add_text(
      x = x2, y = y, text = comp_bases, 
      textfont = list(size = 20),
      marker = list(color = base_colors[comp_bases]),
      hoverinfo = 'text', mode="text",
      hovertext = paste("Base:", comp_bases, "<br>Strand: Lagging"),
      showlegend = FALSE
    )
  
  # Add connecting lines for base pairs
  for (i in 1:n_bases) {
    p <- p %>% add_trace(
      x = c(1, 2), y = c(y[i], y[i]),
      type = 'scatter', mode = 'lines',
      line = list(width = 2, color = 'gray'),
      showlegend = FALSE,
      hoverinfo = 'none'
    )
  }
  
  p <- p %>% layout(
    title = title,
    xaxis = list(showticklabels = FALSE, showgrid = FALSE, zeroline = FALSE, range = c(0.5, 2.5)),
    yaxis = list(showticklabels = FALSE, showgrid = FALSE, zeroline = FALSE)
  )
  
  p
}

# Create 2D base pairing plots
wild_2d <- create_base_pairing_plot(substr(wild_type_seq, 15, 25), "Wild-type Base Pairing (GAG)")
mutated_2d <- 
  create_base_pairing_plot(substr(mutated_seq, 15, 25),
                           paste0("Mutated Base Pairing (", substr(mutated_seq, 20, 22), ")"))

# Display the 3D plot
fig
# Display the 2D plots
wild_2d
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
mutated_2d
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
subplot(wild_2d, mutated_2d) %>%
  layout(title = "Wild-type (GAG) vs. Mutated (GGT) Base Pairing\n (Wild-type, Left)              (Mutated, Right)")
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
# --- 5. Mutation Probability Simulation ---

# Function to calculate binomial probability
binomial_probability <- function(n, k, p) {
  choose(n, k) * p^k * (1-p)^(n-k)
}

# Simulate mutation probabilities
mutation_rate <- 0.001  # Typical mutation rate per base per generation
n_bases <- 1000  # Number of bases to consider

# Calculate probability of k mutations
k_values <- 0:10
probabilities <- sapply(k_values, function(k) {
  binomial_probability(n_bases, k, mutation_rate)
})

# Create data frame for plotting
mutation_df <- data.frame(
  k = k_values,
  probability = probabilities
)

# Plot mutation probability distribution
mutation_plot <- plot_ly(mutation_df, x = ~k, y = ~probability, type = 'bar',
                         marker = list(color = 'rgba(55, 128, 191, 0.7)',
                                       line = list(color = 'rgba(55, 128, 191, 1.0)', width = 1.5))) %>%
  layout(title = "Probability of k Mutations in 1000 Bases",
         xaxis = list(title = "Number of Mutations (k)"),
         yaxis = list(title = "Probability", type = "log"),
         annotations = list(
           x = 1,
           y = 0.1,
           xref = 'paper',
           yref = 'paper',
           text = paste("Mutation rate p =", mutation_rate),
           showarrow = FALSE
         ))

mutation_plot
# --- 6. Amino Acid Translation and Impact Visualization ---

# Function to translate DNA to amino acids
dna_to_aa <- function(dna_seq) {
  codon_table <- list(
    "ATG" = "Met", "TGG" = "Trp", "TGC" = "Cys", "TGT" = "Cys",
    "TGA" = "STOP", "TAA" = "STOP", "TAG" = "STOP",
    "GCT" = "Ala", "GCC" = "Ala", "GCA" = "Ala", "GCG" = "Ala",
    "CGT" = "Arg", "CGC" = "Arg", "CGA" = "Arg", "CGG" = "Arg", "AGA" = "Arg", "AGG" = "Arg",
    "AAT" = "Asn", "AAC" = "Asn", "GAT" = "Asp", "GAC" = "Asp",
    "TGT" = "Cys", "TGC" = "Cys",
    "CAA" = "Gln", "CAG" = "Gln", "GAA" = "Glu", "GAG" = "Glu",
    "GGT" = "Gly", "GGC" = "Gly", "GGA" = "Gly", "GGG" = "Gly",
    "CAT" = "His", "CAC" = "His",
    "ATT" = "Ile", "ATC" = "Ile", "ATA" = "Ile",
    "CTT" = "Leu", "CTC" = "Leu", "CTA" = "Leu", "CTG" = "Leu", "TTA" = "Leu", "TTG" = "Leu",
    "AAA" = "Lys", "AAG" = "Lys",
    "TTT" = "Phe", "TTC" = "Phe",
    "CCT" = "Pro", "CCC" = "Pro", "CCA" = "Pro", "CCG" = "Pro",
    "TCT" = "Ser", "TCC" = "Ser", "TCA" = "Ser", "TCG" = "Ser", "AGT" = "Ser", "AGC" = "Ser",
    "ACT" = "Thr", "ACC" = "Thr", "ACA" = "Thr", "ACG" = "Thr",
    "TGG" = "Trp",
    "TAT" = "Tyr", "TAC" = "Tyr",
    "GTT" = "Val", "GTC" = "Val", "GTA" = "Val", "GTG" = "Val"
  )
  
  # Split into codons
  codons <- sapply(seq(1, nchar(dna_seq)-2, by=3), function(i) {
    substr(dna_seq, i, i+2)
  })
  
  # Translate to amino acids
  sapply(codons, function(codon) {
    ifelse(codon %in% names(codon_table), codon_table[[codon]], "X")
  })
}

# Translate sequences to amino acids
wild_aa <- dna_to_aa(substr(wild_type_seq, 1, 30))
mutated_aa <- dna_to_aa(substr(mutated_seq, 1, 30))

# Create amino acid visualization
aa_df <- data.frame(
  position = 1:length(wild_aa),
  wild_type = wild_aa,
  mutated = mutated_aa,
  changed = wild_aa != mutated_aa
)

aa_plot <- plot_ly(aa_df, x = ~position) %>%
  add_trace(y = ~wild_type, type = 'scatter', mode = 'markers+text',
            text = ~wild_type, textposition = 'top',
            marker = list(color = 'lightblue', size = 15),
            name = 'Wild-type') %>%
  add_trace(y = ~mutated, type = 'scatter', mode = 'markers+text',
            text = ~mutated, textposition = 'bottom',
            marker = list(color = 'pink', size = 15),
            name = 'Mutated') %>%
  add_trace(data = aa_df[aa_df$changed, ],
            y = ~wild_type, type = 'scatter', mode = 'markers',
            marker = list(color = 'red', size = 20, symbol = 'diamond'),
            name = 'Mutation') %>%
  layout(title = "Amino Acid Sequence Comparison",
         xaxis = list(title = "Position"),
         yaxis = list(title = "Amino Acid", tickmode = 'array',
                      tickvals = unique(c(wild_aa, mutated_aa))))

aa_plot

The DNA structure visualization shows 3D representation of both wild-type and mutated DNA helices, color-coded bases (A=green, T=red, G=blue, C=orange) with clear highlighting of the mutation site. The mutation Simulation demonstrates the specific \(GAG\to GTG\) mutation that causes sickle cell anemia and shows how this changes the amino acid from glutamic acid to valine. Protein structure effects include simplified representation of normal hemoglobin vs. sickled hemoglobin and visualization of how the mutation leads to protein aggregation. 2D base pairing diagrams show base pairing before and after mutation and highlight the specific codon change. Mouse hover information about each component provides explanatory text about the biological significance. This simulation depicts a single point mutation in DNA that leads to structural changes in proteins, ultimately causing disease.