First analysis- 30mer pLDDT #3

spencer2234 · 2025-06-16T21:23:02Z

Fixes #7
Fixes #8
Fixes #9
Fixes #15
Fixes #16

ljwoods2

Overall looks good. Can you add a plot to results/figures using the graph method you created for the analysis in this notebook? It would be cool to show John, and to look back on later to compare with other methods.

src/af3_linear_epitopes/statistics.py

notebooks/30mer_pLDDT.py


ljwoods2 · 2025-06-18T00:37:11Z

notebooks/30mer_pLDDT.py

+
+
+# %%
+def plot_epitope_non_epitope_stats_9mer(


Hmm, it's not intuitive from looking at the plot what the minimum represents here. I think the legend should be more descriptive: as I read it, for each 30mer this takes the 9mer with the lowest single-residue pLDDT, and the plotted value is the mean of those minimums. There's probably a better way to word this, but the plot's a bit misleading as is. Could make the mean legend more descriptive, too.

Also, is this identical to the 30mer mean min? I think the calculation works out the same

Yep, from looking at the values, I think these are equivalent- I'd recommend removing min min from this plot.
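A quick way to sanity-check the equivalence claimed above is sketched below. It assumes the 9mers are all contiguous sliding windows of the 30mer, which is an assumption about the notebook and not confirmed here; the pLDDT values and the helper `sliding_window_mins` are made up for illustration.

```python
import random


def sliding_window_mins(values, window):
    """Return the minimum of each contiguous window of length `window`."""
    return [min(values[i:i + window]) for i in range(len(values) - window + 1)]


random.seed(0)
# Hypothetical per-residue pLDDT values for a single 30mer (made-up numbers).
plddt_30mer = [random.uniform(30.0, 95.0) for _ in range(30)]

# One minimum per sliding 9mer window (22 windows for a 30mer).
nine_mer_mins = sliding_window_mins(plddt_30mer, 9)

# Every residue belongs to at least one 9mer window, so the lowest
# 9mer minimum is the same residue as the 30mer minimum.
min_of_9mer_mins = min(nine_mer_mins)
min_of_30mer = min(plddt_30mer)
assert min_of_9mer_mins == min_of_30mer
```

If the equality holds per 30mer, averaging either quantity across 30mers gives the same number, which is why the "min min" series would add nothing to the plot.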

notebooks/30mer_pLDDT.py

ljwoods2 · 2025-06-19T16:43:15Z

src/af3_linear_epitopes/statistics.py

+    print("max:" + str(max_pLDDT))
+    min_pLDDT = dataset.select(pl.col(colname)).min().item()
+    print("min:" + str(min_pLDDT))
+    mean_pLDDT = dataset.select(pl.col(colname)).to_series()


remove mean_pLDDT, looks like it isn't doing anything

ljwoods2 · 2025-06-19T23:35:56Z

workflows/bin/statistics_peptide.py

+    all_statistics,
+    "data/hv/peptide/inference",
+)
+


add your mass + helix / beta sheet feature extraction methods here

spencer2234 · 2025-06-24T00:40:48Z

@ljwoods2 just pushed the code. I wasn't able to do everything I wanted, but I got a good start.


ljwoods2 · 2025-06-27T16:31:53Z

@spencer2234 can you try using max bepipred score per 30mer as a feature instead of mean? I think that's potentially a more fair way to compare

spencer2234 · 2025-06-27T23:47:16Z

@ljwoods2 check out the hv_class and in_class folders in notebooks

ljwoods2 · 2025-06-30T17:42:27Z

workflows/bin/statistics_hv_in_class.py

@@ -0,0 +1,99 @@
+import polars as pl


Write a brief docstring at the top of each of these feature extraction scripts describing which features they're going to extract. It's not clear from the name alone what this is meant to do.

ljwoods2 · 2025-06-30T17:45:35Z

notebooks/in_class/rsa_sa_in_class_AUC.py

+y_hat_RSA_fp = st.normalized_pLDDT_30mer(all_statistics_in_class_fp, "mean_rsa_slice")
+y_true_RSA = all_statistics_in_class_fp.select(pl.col("epitope"))
+in_class_norm_rsa_mean_30mer_ROC = st.plot_auc_roc_curve(
+    y_true_RSA, y_hat_RSA_fp, "in_class Normalized mean RSA values for 30mer fp ROC"


For your poster, change the titles of these figures so that they don't say "in_class" as the dataset name. I would say "IN1 30mer classification set" as the name or something similar, and then you can define that in the text of the poster.

Same goes for other figures

ljwoods2 · 2025-06-30T18:09:06Z

notebooks/in_class/rsa_sa_in_class_AUC.py

+in_class_norm_rsa_mean_30mer_ROC = st.plot_auc_roc_curve(
+    y_true_RSA, y_hat_RSA_fp, "in_class Normalized mean RSA values for 30mer fp ROC"
+)
+in_class_norm_rsa_mean_30mer_ROC.savefig(


The ROC curve is flipped (AUC < 0.5). Fix this so AUC > 0.5.

Same goes for other flipped AUC curves
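For context on why a flipped curve is usually a sign-convention issue: if lower scores go with the positive class, negating the score mirrors the ROC curve and turns an AUC of a into 1 - a. The sketch below uses a hypothetical pure-Python `roc_auc` helper and made-up labels and scores; it is not the `st.plot_auc_roc_curve` implementation, and whether negating the feature is the right fix for RSA is a judgment call for the analysis.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as half a win (rank-based definition)."""
    pos = [s for y, s in zip(y_true, scores) if y]
    neg = [s for y, s in zip(y_true, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example where LOWER scores tend to go with the positive class.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.2, 0.3, 0.6, 0.5, 0.8, 0.9]

auc = roc_auc(y_true, scores)                        # below 0.5: curve is "flipped"
flipped_auc = roc_auc(y_true, [-s for s in scores])  # same ranking, reversed

# Reversing the score direction mirrors the AUC around 0.5.
assert abs(auc + flipped_auc - 1.0) < 1e-12
```

When reporting the corrected curve, it helps to state the direction in the figure title or legend (for example, that lower values of the feature predict the positive class).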

ljwoods2 · 2025-06-30T18:11:09Z

notebooks/in_class/rsa_sa_in_class_AUC.py

+
+
+# %%
+y_hat_RSA_fp = st.normalized_pLDDT_30mer(all_statistics_in_class_fp, "mean_rsa_slice")


There are quite a few steps leading up to this, so add a markdown cell above this cell describing what this plot shows. The scoring method isn't immediately obvious: it scores every instance of every 30mer across the focal proteins it appeared in (so duplicate 30mers are allowed), with each 30mer annotated with a true/false value extracted from PepSeq (the assay).

ljwoods2 · 2025-06-30T19:22:56Z

notebooks/hv_class/hv_class_scratch.py

+)
+
+# %%
+fp_aggrigate_30mer = all_statistics_hv_class_fp.group_by("peptide").agg(


This plot and the one above it both use the column "mean_pLDDT_slice", but this one refers to the metric as "geometric mean pLDDT". Is it using the geometric mean or not? Rename whichever is incorrect.
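The naming matters because the two means generally differ. A minimal illustration using Python's standard `statistics` module and made-up pLDDT values (not taken from the dataset):

```python
from statistics import geometric_mean, mean

# Made-up per-residue pLDDT values for one slice.
plddt_slice = [90.0, 85.0, 40.0, 95.0]

arithmetic = mean(plddt_slice)
geometric = geometric_mean(plddt_slice)

# For positive values that are not all equal, the geometric mean is
# strictly smaller than the arithmetic mean, so a single low-confidence
# residue pulls the geometric mean down harder.
assert geometric < arithmetic
```

Comparing the stored column against both quantities on a few rows should show which label is the correct one.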

ljwoods2 · 2025-06-30T19:53:58Z

notebooks/hv_class/hv_class_scratch.py

+)
+
+# %%
+mean_auc = fp_aggrigate_9mer.select("AUC").mean()


AUC is always None here, something is wrong

Same with the equivalent cell in in_data.py

Spencer Romero added 2 commits June 16, 2025 14:22

test command

2676033

30mer basic data anaylse

9ea8050

ljwoods2 reviewed Jun 16, 2025


ljwoods2 changed the title ~~test command~~ First analysis- 30mer pLDDT Jun 16, 2025

Spencer Romero and others added 2 commits June 17, 2025 16:18

all pLDDT statistics for epitopes and non-epitopes as well as pae values

d4c6f69

Merge branch 'main' into spencer-changes

d6d9a31

ljwoods2 requested changes Jun 18, 2025


Spencer Romero added 3 commits June 18, 2025 16:55

The ROC graphs and code

d2936e9

Merge branch 'main' into spencer-changes

2c7506a

New Data Graphs

a401670

ljwoods2 reviewed Jun 19, 2025


check the mdanalysis code to make sure it is right

c7d9b8c

ljwoods2 reviewed Jun 19, 2025


The focal protein statistical data and graphs

74d993c

Spencer Romero and others added 7 commits June 25, 2025 10:20

Atomic weight data and cleanup

44926a3

Shows the AUC scores for each focal protein based on the mean, max, a…

06b8f86

…nd min of the atomic weights of the 9-mers

RSA script

946e6a5

Merge branch 'main' into spencer-changes

e3dafdf

first stab at RSA calc NF script

163fa39

add mdakit_sasa as dep

87b250a

add shebang

00bddba

Spencer Romero added 5 commits June 27, 2025 09:46

RSA values and Bepi graphs

a3e2697

gitignore

1c3fb5b

Merge branch 'main' into spencer-changes

b6a33af

Merge branch 'main' into spencer-changes

b0c60ab

data on new HV1 set and IN1 set

aa31cb1

convert paths to relative

89220f7

ljwoods2 requested changes Jun 30, 2025


ljwoods2 and others added 6 commits June 30, 2025 12:56

convert in_data to py markdown

2252786

updated hv_class and in_class notebook

8f5225d

resolving merge conflicts

a056699

sh commit

517bfd9

merge

f8e3730

structure results

6a1d105


