Conversation

spencer2234
Collaborator

@spencer2234 spencer2234 commented Jun 16, 2025

Fixes #7
Fixes #8
Fixes #9
Fixes #15
Fixes #16

Collaborator

@ljwoods2 ljwoods2 left a comment

Overall looks good. Can you add a plot to results/figures using the graph method you created for the analysis in this notebook? It would be cool to show John, and to look back on later to compare with other methods.

@ljwoods2 changed the title from "test command" to "First analysis- 30mer pLDDT" on Jun 16, 2025


# %%
def plot_epitope_non_epitope_stats_9mer(
Collaborator

Hmm, it's not intuitive from the plot what the minimum represents here. I think the legend should be more descriptive: this is the mean, across 30mers, of the minimum per-amino-acid pLDDT in the 9mer with the lowest single-residue pLDDT. There's probably a better way to word this, but the plot is a bit misleading as it stands. The mean legend could be more descriptive, too.

Collaborator

Also, is this identical to the 30mer mean min? I think the calculation works out the same

Collaborator

Yep, from looking at the values, I think these are equivalent. I'd recommend removing min min from this plot.
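
The equivalence can be checked with a small sketch. It assumes (the thread does not confirm this) that the 9mer statistic is taken over every contiguous 9-residue window of a 30mer. The pLDDT values and the function name are made up for illustration and are not from the notebook.

```python
import random


def min_of_window_minima(values, window=9):
    """Minimum over all contiguous windows of each window's own minimum."""
    window_minima = [
        min(values[i : i + window]) for i in range(len(values) - window + 1)
    ]
    return min(window_minima)


# Hypothetical per-residue pLDDT scores for one 30mer.
random.seed(0)
plddt_30mer = [random.uniform(30.0, 95.0) for _ in range(30)]

# Every residue lies in at least one window, so the lowest window minimum
# is simply the lowest residue in the whole 30mer.
same = min_of_window_minima(plddt_30mer) == min(plddt_30mer)
print(same)
```

Because the two quantities agree for every 30mer, their means across 30mers agree as well, which is why the "min min" series would add nothing to the plot under this assumption.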

print("max:" + str(max_pLDDT))
min_pLDDT = dataset.select(pl.col(colname)).min().item()
print("min:" + str(min_pLDDT))
mean_pLDDT = dataset.select(pl.col(colname)).to_series()
Collaborator

Remove mean_pLDDT; it looks like it isn't doing anything.

all_statistics,
"data/hv/peptide/inference",
)

Collaborator

Add your mass and helix / beta sheet feature extraction methods here.

@spencer2234
Collaborator Author

@ljwoods2 I just pushed the code. I wasn't able to do everything I wanted, but I got a good start.

@ljwoods2
Collaborator

@spencer2234 can you try using the max bepipred score per 30mer as a feature instead of the mean? I think that's potentially a fairer way to compare.
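
A minimal sketch of why the choice of aggregation matters. It uses plain Python instead of the notebook's polars code, and the peptide names and per-residue scores are invented for illustration. It only shows that mean and max can rank the same two 30mers in opposite order.

```python
from statistics import mean

# Hypothetical per-residue prediction scores for two 30mers.
scores_by_30mer = {
    "peptide_a": [0.10] * 25 + [0.90] * 5,  # short, strongly predicted stretch
    "peptide_b": [0.25] * 30,               # uniformly weak signal
}

# One feature row per 30mer, with both aggregations side by side.
features = {
    name: {"mean_score": mean(scores), "max_score": max(scores)}
    for name, scores in scores_by_30mer.items()
}

for name, row in features.items():
    print(name, row)
```

Here peptide_a has the lower mean but the higher max, so a short high-scoring stretch inside a 30mer is diluted by the mean and preserved by the max.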

@spencer2234
Collaborator Author

@ljwoods2 check out the hv_class and in_class folders in notebooks

@@ -0,0 +1,99 @@
import polars as pl
Collaborator

Write a brief docstring at the top of each of these feature extraction scripts describing which features it extracts. It's not clear from the name alone what each script is meant to do.

y_hat_RSA_fp = st.normalized_pLDDT_30mer(all_statistics_in_class_fp, "mean_rsa_slice")
y_true_RSA = all_statistics_in_class_fp.select(pl.col("epitope"))
in_class_norm_rsa_mean_30mer_ROC = st.plot_auc_roc_curve(
y_true_RSA, y_hat_RSA_fp, "in_class Normalized mean RSA values for 30mer fp ROC"
Collaborator

For your poster, change the titles of these figures so that they don't say "in_class" as the dataset name. I would say "IN1 30mer classification set" as the name or something similar, and then you can define that in the text of the poster.

Same goes for other figures

in_class_norm_rsa_mean_30mer_ROC = st.plot_auc_roc_curve(
y_true_RSA, y_hat_RSA_fp, "in_class Normalized mean RSA values for 30mer fp ROC"
)
in_class_norm_rsa_mean_30mer_ROC.savefig(
Collaborator

The ROC curve is flipped; fix this so AUC > 0.5.

Collaborator

Same goes for the other flipped ROC curves.
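
For reference, a sketch of what "flipped" means numerically. The `roc_auc` helper below is a hand-rolled pairwise AUC written for this note; it is not `st.plot_auc_roc_curve`, whose implementation is not shown in this thread, and the labels and scores are invented. When a feature is lower for positives, its AUC falls below 0.5, and negating the score gives exactly 1 minus that AUC.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical data in which positives tend to have LOWER scores.
labels = [True, True, True, False, False, False]
scores = [0.2, 0.3, 0.6, 0.5, 0.8, 0.9]

auc_raw = roc_auc(labels, scores)                    # below 0.5: "flipped"
auc_negated = roc_auc(labels, [-s for s in scores])  # mirror image, above 0.5

print(auc_raw, auc_negated)
```

Whether the fix belongs in the score (negating it) or in which class is treated as positive depends on how the plotting helper is written. Either way, the figure should state the direction of the score.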



# %%
y_hat_RSA_fp = st.normalized_pLDDT_30mer(all_statistics_in_class_fp, "mean_rsa_slice")
Collaborator

There are quite a few steps leading up to this, so add a markdown cell above this cell describing what this plot shows. The scoring method isn't immediately obvious: it covers all instances of all 30mers across the focal proteins they appeared in (so duplicate 30mers are allowed), with each 30mer annotated with a true/false value extracted from the PepSeq assay.

)

# %%
fp_aggrigate_30mer = all_statistics_hv_class_fp.group_by("peptide").agg(
Collaborator

This plot and the one above it both use the column "mean_pLDDT_slice", but this one refers to the metric as "geometric mean pLDDT". Is it using the geometric mean or not? Rename whichever is incorrect.
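
The two means are not interchangeable, so the label matters. A small sketch with made-up pLDDT values, using only the Python standard library (not the notebook's own code):

```python
from statistics import fmean, geometric_mean

# Hypothetical per-residue pLDDT values for one slice.
plddt_slice = [40.0, 90.0, 90.0]

arithmetic = fmean(plddt_slice)          # (40 + 90 + 90) / 3
geometric = geometric_mean(plddt_slice)  # (40 * 90 * 90) ** (1 / 3)

print(arithmetic, geometric)
```

For positive values that are not all equal, the geometric mean is always below the arithmetic mean, and a single low-confidence residue pulls it down harder. Comparing a column against both is one quick way to tell which it actually holds.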

)

# %%
mean_auc = fp_aggrigate_9mer.select("AUC").mean()
Collaborator

AUC is always None here; something is wrong.

Collaborator

Same with the equivalent cell in in_data.py
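
One thing worth ruling out, offered as a guess and not a diagnosis: a ROC AUC is undefined for any group that contains only positives or only negatives, so if the AUC is computed per group, single-class groups cannot produce a value. The sketch below is plain Python with invented rows and hypothetical helper names; it only lists which groups lack one of the two classes.

```python
from collections import defaultdict


def class_counts_by_group(rows):
    """Count positive and negative labels per group key.

    `rows` is an iterable of (group_key, label) pairs.
    """
    counts = defaultdict(lambda: {"pos": 0, "neg": 0})
    for key, label in rows:
        counts[key]["pos" if label else "neg"] += 1
    return dict(counts)


def groups_without_both_classes(rows):
    """Group keys for which a ROC AUC is undefined (only one class present)."""
    return sorted(
        key
        for key, c in class_counts_by_group(rows).items()
        if c["pos"] == 0 or c["neg"] == 0
    )


# Hypothetical (peptide, epitope) rows.
rows = [
    ("PEP1", True),
    ("PEP1", True),
    ("PEP2", False),
    ("PEP3", True),
    ("PEP3", False),
]
undefined = groups_without_both_classes(rows)
print(undefined)
```

If every group in the real data shows up in this list, the all-None column would be expected behaviour of the grouping, not a bug in the AUC code itself.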
