content/public/content/annotations/bertozzi-stickies.md
4 additions & 4 deletions
@@ -27,13 +27,13 @@ description: The rubber hits the road on ColabFold! I hope that's rubber I'm sme
>%%
>```annotation-json
-
>{"text":"Some context — Dr. Bertozzi was one of the the [2022 Nobel Laureates in Chemistry]](https://www.nobelprize.org/prizes/chemistry/2022/bertozzi/facts/) for her contributions to developing [click](https://en.wikipedia.org/wiki/Click_chemistry) and [bioorthogonal](https://en.wikipedia.org/wiki/Bioorthogonal_chemistry) chemistries.\n\nMy understanding is that the basis for a lot of mucin research, especially in the synthesis space, is built on click chemistry — a general overview being that click chemistry is good for attaching complicated things to other complicated things. That explanation does the process zero justice, but it's the best I've got.","target":[{"source":"https://www.nature.com/articles/s41587-023-01840-6.pdf","selector":[{"type":"TextPositionSelector","start":622,"end":644},{"type":"TextQuoteSelector","exact":" Carolyn R. Bertozzi ","prefix":"Weaver 4,9, Heinz Läubli2,3 &","suffix":"1,10 Targeted protein degrada"}]}],"created":"2026-02-28T19:55:01.027Z","updated":"2026-02-28T19:55:01.027Z","document":{"title":"Design of a mucin-selective protease for targeted degradation of cancer-associated mucins","link":[{"href":"urn:x-pdf:05688d5cb251214f88ff40cb330bdcef"},{"href":"https://www.nature.com/articles/s41587-023-01840-6.pdf"}],"documentFingerprint":"05688d5cb251214f88ff40cb330bdcef"},"uri":"https://www.nature.com/articles/s41587-023-01840-6.pdf"}
+
>{"text":"Some context — Dr. Bertozzi was one of the the [2022 Nobel Laureates in Chemistry](https://www.nobelprize.org/prizes/chemistry/2022/bertozzi/facts/) for her contributions to developing [click](https://en.wikipedia.org/wiki/Click_chemistry) and [bioorthogonal](https://en.wikipedia.org/wiki/Bioorthogonal_chemistry) chemistries.\n\nMy understanding is that the basis for a lot of mucin research, especially in the synthesis space, is built on click chemistry — a general overview being that click chemistry is good for attaching complicated things to other complicated things. That explanation does the process zero justice, but it's the best I've got.","target":[{"source":"https://www.nature.com/articles/s41587-023-01840-6.pdf","selector":[{"type":"TextPositionSelector","start":622,"end":644},{"type":"TextQuoteSelector","exact":" Carolyn R. Bertozzi ","prefix":"Weaver 4,9, Heinz Läubli2,3 &","suffix":"1,10 Targeted protein degrada"}]}],"created":"2026-02-28T19:55:01.027Z","updated":"2026-02-28T19:55:01.027Z","document":{"title":"Design of a mucin-selective protease for targeted degradation of cancer-associated mucins","link":[{"href":"urn:x-pdf:05688d5cb251214f88ff40cb330bdcef"},{"href":"https://www.nature.com/articles/s41587-023-01840-6.pdf"}],"documentFingerprint":"05688d5cb251214f88ff40cb330bdcef"},"uri":"https://www.nature.com/articles/s41587-023-01840-6.pdf"}
>```
>%%
>*%%PREFIX%%Weaver 4,9, Heinz Läubli2,3 &%%HIGHLIGHT%% ==Carolyn R. Bertozzi== %%POSTFIX%%1,10 Targeted protein degrada*
>%%LINK%%[[#^e48qxcfcrom|show annotation]]
>%%COMMENT%%
-
>Some context — Dr. Bertozzi was one of the the [2022 Nobel Laureates in Chemistry]](https://www.nobelprize.org/prizes/chemistry/2022/bertozzi/facts/) for her contributions to developing [click](https://en.wikipedia.org/wiki/Click_chemistry) and [bioorthogonal](https://en.wikipedia.org/wiki/Bioorthogonal_chemistry) chemistries.
+
>Some context — Dr. Bertozzi was one of the the [2022 Nobel Laureates in Chemistry](https://www.nobelprize.org/prizes/chemistry/2022/bertozzi/facts/) for her contributions to developing [click](https://en.wikipedia.org/wiki/Click_chemistry) and [bioorthogonal](https://en.wikipedia.org/wiki/Bioorthogonal_chemistry) chemistries.
>
>My understanding is that the basis for a lot of mucin research, especially in the synthesis space, is built on click chemistry — a general overview being that click chemistry is good for attaching complicated things to other complicated things. That explanation does the process zero justice, but it's the best I've got.
>%%TAGS%%
@@ -43,7 +43,7 @@ description: The rubber hits the road on ColabFold! I hope that's rubber I'm sme
>%%
>```annotation-json
-
>{"text":"This is where my interest piques, since I'd need to bug my partner for ELI5 explanations of any of the research up to this point.\n\nI'm here to know how exactly they used ColabFold for this particular problem domain. From the core ColabFold paper, there's quite a few [hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)) that allow us to make sure the model best matches the environment it's seeking to emulate, as well as expose the relevant information for further study/replication/confirmation.\n\nIn a best-effort attempt to confirm this, I looked into the upstream paper referenced in the ColabFold method, where they are confirming the AlphaFold result with the [upstream Yu et al. paper](https://pubmed.ncbi.nlm.nih.gov/22483117/) investigating StcE specifically. The associated data for that paper references [3UJZ: Crystal Structure Of Enterohemorrhagic E. Coli Stce](https://www.ncbi.nlm.nih.gov/Structure/pdb/3UJZ), which — *I think* — is the experimentally-determined StcE structure. There's an [associated plaintext amino acid sequence](https://www.rcsb.org/fasta/entry/3UJZ/display) that we can pop into a `.fasta` file and feed to `localcolabfold` and... hopefully just get the same structure this paper got, but with the full ColabFold statistical report?\n\nComparing the outputs of our run versus this paper's run, then, we either **do**, or **don't** get the same structure:\n\n- If we **do** get the same structure, we can be fairly confident that this paper is also just using the default `localcolabfold` hyperparameters from their sample run, and have some comfort in continuing to use those hyperparameters in similar scenarios; or\n- If we **don't** get the same structure, we can assume they used different hyperparameters that aren't here, or in the supplementary materials, and we may need to reach out and ask what hyperparameters they used.\n\n---\n\nWell, it was audacious to expect a clear-cut answer here. 
After [using `localcolabfold` under sample hparams](https://github.com/chaoticgoodcomputing/chaoticgoodcomputing.github.io/blob/main/content/public/assets/3UJZ/README) to categorize the 3UJZ sequence, and coloring it to the same domain coloring map available at the [NIH 3UJZ source](https://www.ncbi.nlm.nih.gov/Structure/pdb/3UJZ), I'm getting... something vaguely similar. From the [Relaxed, Rank 1 PDB](content/public/assets/3UJZ/3UJZ_1_Chain_A_Metalloprotease_stcE_Escherichia_coli__83334__relaxed_rank_001_alphafold2_ptm_model_3_seed_000.pdb):\n\n![[/assets/Pasted image 20260228142647.png]]\n\nWe're about to get real fuzzy, here.\n\nThe Y shape demonstrated in the paper's results does seem to be present, although not quote as cleanly as the sample figure. Additionally, my assumption (with fingers crossed) was that the C and INS domains were in the 5 domains from the NIH source — although I'm not sure this ended up being the case.\n\nAs a more quantitative source that we do have a hub-and-spoke with three offshoots, though, we can take a look at the error graph:\n\n![[/assets/Pasted image 20260228143701.png]]\n\nFrom my understanding on how to read this chart from the upstream [[/annotations/protein-folding-for-fun|ColabFold paper]] — specifically, the extended figure from the [bioarXiv pre-print](https://www.biorxiv.org/content/10.1101/2021.08.15.456425v1.full.pdf), area that have low confidence, but high consensus across the multiple models, may correspond to generally flexible offshoots to the core rigid structure of the protein. 
If that's a correct understanding, those three uncertain regions would correspond to three offshoots, two of which are likely the C and INS domains mentioned.\n\nThe best conclusion I can take away, then, is that the ColabFold defaults are likely *good enough* for cursory glances, but would need to be better understood.\n\nMy secondary conclusion, though, is that AlphaFold is generally a precursory/investigatory garnish that can assist in an exploratory phase. We can see here that it was used for just a handful of figures, to visually highlight important information, but is (obviously) no substitute for experimental evidence. It's a pair of binoculars to look closer at where you're headed, not the thing that gets you there.","target":[{"source":"https://www.nature.com/articles/s41587-023-01840-6.pdf","selector":[{"type":"TextPositionSelector","start":25391,"end":25699},{"type":"TextQuoteSelector","exact":"Fig. 2 | Structure-guided engineering of StcE yields mutants of reduced activity, binding and size. a, Structure of StcE, as predicted by ColabFold (Methods)62, with the C domain (purple) and INS domain (blue) highlighted. The Zn2+ active site is depicted in orange, while mutated residues are shown in teal.","prefix":"cell death in both populations","suffix":"b, Digestion of IRDye 800CW-lab"}]}],"created":"2026-02-28T20:06:09.538Z","updated":"2026-02-28T20:06:09.538Z","document":{"title":"Design of a mucin-selective protease for targeted degradation of cancer-associated mucins","link":[{"href":"urn:x-pdf:05688d5cb251214f88ff40cb330bdcef"},{"href":"https://www.nature.com/articles/s41587-023-01840-6.pdf"}],"documentFingerprint":"05688d5cb251214f88ff40cb330bdcef"},"uri":"https://www.nature.com/articles/s41587-023-01840-6.pdf"}
+
>{"text":"This is where my interest piques, since I'd need to bug my partner for ELI5 explanations of any of the research up to this point.\n\nI'm here to know how exactly they used ColabFold for this particular problem domain. From the core ColabFold paper, there's quite a few [hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)) that allow us to make sure the model best matches the environment it's seeking to emulate, as well as expose the relevant information for further study/replication/confirmation.\n\nIn a best-effort attempt to confirm this, I looked into the upstream paper referenced in the ColabFold method, where they are confirming the AlphaFold result with the [upstream Yu et al. paper](https://pubmed.ncbi.nlm.nih.gov/22483117/) investigating StcE specifically. The associated data for that paper references [3UJZ: Crystal Structure Of Enterohemorrhagic E. Coli Stce](https://www.ncbi.nlm.nih.gov/Structure/pdb/3UJZ), which — *I think* — is the experimentally-determined StcE structure. There's an [associated plaintext amino acid sequence](https://www.rcsb.org/fasta/entry/3UJZ/display) that we can pop into a `.fasta` file and feed to `localcolabfold` and... hopefully just get the same structure this paper got, but with the full ColabFold statistical report?\n\nComparing the outputs of our run versus this paper's run, then, we either **do**, or **don't** get the same structure:\n\n- If we **do** get the same structure, we can be fairly confident that this paper is also just using the default `localcolabfold` hyperparameters from their sample run, and have some comfort in continuing to use those hyperparameters in similar scenarios; or\n- If we **don't** get the same structure, we can assume they used different hyperparameters that aren't here, or in the supplementary materials, and we may need to reach out and ask what hyperparameters they used.\n\n---\n\nWell, it was audacious to expect a clear-cut answer here. 
After [using `localcolabfold` under sample hparams](https://github.com/chaoticgoodcomputing/chaoticgoodcomputing.github.io/blob/main/content/public/assets/3UJZ/README) to categorize the 3UJZ sequence, and coloring it to the same domain coloring map available at the [NIH 3UJZ source](https://www.ncbi.nlm.nih.gov/Structure/pdb/3UJZ), I'm getting... something vaguely similar. From the [Relaxed, Rank 1 PDB](content/public/assets/3UJZ/3UJZ_1_Chain_A_Metalloprotease_stcE_Escherichia_coli__83334__relaxed_rank_001_alphafold2_ptm_model_3_seed_000.pdb):\n\n![[/assets/Pasted image 20260228142647.png]]\n\nWe're about to get real fuzzy, here.\n\nThe Y shape demonstrated in the paper's results does seem to be present, although not quote as cleanly as the sample figure. Additionally, my assumption (with fingers crossed) was that the C and INS domains were in the 5 domains from the NIH source. I'm not sure this ended up being the case.\n\nAs a more quantitative source that we do have a hub-and-spoke with three offshoots, though, we can take a look at the error graph:\n\n![[/assets/Pasted image 20260228143701.png]]\n\nFrom my understanding on how to read this chart from the upstream [[/annotations/protein-folding-for-fun|ColabFold paper]] — specifically, the extended figure from the [bioarXiv pre-print](https://www.biorxiv.org/content/10.1101/2021.08.15.456425v1.full.pdf), area that have low confidence, but high consensus across the multiple models, may correspond to generally flexible offshoots to the core rigid structure of the protein. 
If that's a correct understanding, those three uncertain regions would correspond to three offshoots, two of which are likely the C and INS domains mentioned.\n\nThe best conclusion I can take away, then, is that the ColabFold defaults are likely *good enough* for cursory glances, but would need to be better understood.\n\nMy secondary conclusion, though, is that AlphaFold is generally a precursory/investigatory garnish that can assist in an exploratory phase. We can see here that it was used for just a handful of figures, to visually highlight important information, but is (obviously) no substitute for experimental evidence. It's a pair of binoculars to look closer at where you're headed, not the thing that gets you there.","target":[{"source":"https://www.nature.com/articles/s41587-023-01840-6.pdf","selector":[{"type":"TextPositionSelector","start":25391,"end":25699},{"type":"TextQuoteSelector","exact":"Fig. 2 | Structure-guided engineering of StcE yields mutants of reduced activity, binding and size. a, Structure of StcE, as predicted by ColabFold (Methods)62, with the C domain (purple) and INS domain (blue) highlighted. The Zn2+ active site is depicted in orange, while mutated residues are shown in teal.","prefix":"cell death in both populations","suffix":"b, Digestion of IRDye 800CW-lab"}]}],"created":"2026-02-28T20:06:09.538Z","updated":"2026-02-28T20:06:09.538Z","document":{"title":"Design of a mucin-selective protease for targeted degradation of cancer-associated mucins","link":[{"href":"urn:x-pdf:05688d5cb251214f88ff40cb330bdcef"},{"href":"https://www.nature.com/articles/s41587-023-01840-6.pdf"}],"documentFingerprint":"05688d5cb251214f88ff40cb330bdcef"},"uri":"https://www.nature.com/articles/s41587-023-01840-6.pdf"}
>```
>%%
>*%%PREFIX%%cell death in both populations%%HIGHLIGHT%% ==Fig. 2 | Structure-guided engineering of StcE yields mutants of reduced activity, binding and size. a, Structure of StcE, as predicted by ColabFold (Methods)62, with the C domain (purple) and INS domain (blue) highlighted. The Zn2+ active site is depicted in orange, while mutated residues are shown in teal.== %%POSTFIX%%b, Digestion of IRDye 800CW-lab*
@@ -68,7 +68,7 @@ description: The rubber hits the road on ColabFold! I hope that's rubber I'm sme
>
>We're about to get real fuzzy, here.
>
-
>The Y shape demonstrated in the paper's results does seem to be present, although not quote as cleanly as the sample figure. Additionally, my assumption (with fingers crossed) was that the C and INS domains were in the 5 domains from the NIH source — although I'm not sure this ended up being the case.
+
>The Y shape demonstrated in the paper's results does seem to be present, although not quote as cleanly as the sample figure. Additionally, my assumption (with fingers crossed) was that the C and INS domains were in the 5 domains from the NIH source. I'm not sure this ended up being the case.
>
>As a more quantitative source that we do have a hub-and-spoke with three offshoots, though, we can take a look at the error graph: