Sometimes known as 128.95.2.92 when I forget to log in, I am a Ph.D. student in the Computer Science and Engineering department and the Computational Molecular Biology program at the University of Washington, in sunny Seattle.
I added a lot to the article on endosymbionts after working on a related course project, and then I wrote a tiny tsetse fly article. Then I got tired of recording what I wrote (didn't take long).
Home page: http://www.cs.washington.edu/homes/zasha
Things I'm thinking of doing (i.e. notes to myself). You're welcome to beat me to the punch.
- SNPs (Single Nucleotide Polymorphisms). They don't appear to be in Wikipedia and should be. The polymorphism material could probably also link to them better: polymorphism could link to RFLP, microsatellites and maybe SINEs/LINEs as examples of polymorphism. How are SNPs found? How can an individual genome be classified in terms of which SNPs it has? Applications of SNPs: identifying disease/phenotype genes or genes correlated with something (i.e. an extension of linkage analysis-based mapping); pharmacogenetics (or, sigh, pharmacogenomics), i.e. can we discover genetic correlations of SNPs to the effect of drugs, and thereby screen people to see what dose is appropriate and which drugs are more likely to be effective or have side effects; understanding human/other genetic variation in general, e.g. for the phylogeny folks. Paper with promising title: "Toward positional cloning with SNPs", Morton NE, Collins A. Also worth a PubMed search: "positional cloning of disease genes AND review".
- Related to SNPs, there doesn't seem to be anything on linkage, linkage analysis, linkage groups or linkage disequilibrium, or for that matter anything on how disease (or other) genes can be identified. That seems a major omission; I should make sure it's not there under another name. I just checked that Cystic fibrosis doesn't discuss this. Here's a possibly helpful review: "The molecular basis of genetic disease", Boehm CD, Kazazian HH Jr. PubMed search: "cystic fibrosis AND gene AND review". OMIM mapping sections: http://www.ncbi.nlm.nih.gov/htbin-post/Omim/dispmim?219700#MAPPING and http://www.ncbi.nlm.nih.gov/htbin-post/Omim/dispmim?143100#MAPPING
- EST/Expressed Sequence Tag, hopefully mRNA/cDNA and all that stuff is already there...
- Haplotype should exist somewhere.
- The RNA article has some weirdness regarding siRNA and RNAi. It suggests that siRNA is believed to have evolved as a defence against viruses. I thought this was just RNAi against dsRNA. What's the scoop on this?
- Get that article on DNA fingerprinting from Genetics 371 and maybe inject it into Genetic fingerprinting.
- Maybe I could do something with Genetic recombination
- Dominance Relationships looks fishy. I don't think that Sickle Cell Anemia is an example of incomplete dominance, but I'm pretty confused by these terms now (plus they're somewhat subjective and seem to be used differently by different people). Anyway, have a look at this; maybe the terms over-dominance/heterozygote advantage are appropriate, although I think these terms refer to how the phenotypes of homozygote/heterozygote/homozygote relate to selective pressure, rather than just the literal phenotype. While I'm at it, the notes at the end could be moved into the text, and the statement that the homozygous trait is more "serious" is vague.
- Article on cloning should reflect the usage of identifying a gene associated with a phenotype. Done.
- Yeast two-hybrid system. Article should describe the technique from Fields and Song, and then discuss variations (use of yeast mating types, using other organisms, large-scale screens and pooling-and-sequencing strains, 1-hybrid/3-hybrid, alternatives to GAL4, other reporter genes).
- The various forms of nucleotides, e.g. the Uracil article should mention its relationship with Uridylate.
- RNA gene: write stuff on how RNA genes can be identified (mutations, isolate RNA, computational). I think the flow of this article could use some work.
- miRNA: should be able to flesh this out more, and once I try out RNA folding programs, maybe I could give an example picture of a folded precursor miRNA (without violating copyright by stealing it from an miRNA paper...)
- RNA: article flows weirdly - need to think about this. Maybe write something on RNA folding?
- pages that link to rRNA or tRNA should now link to RNA gene. Have a look at this after a couple of weeks, to see if there are any major objections to splitting off the RNA gene page.
- dynamic programming is light on content and could use an example (the current content is pretty abstract). On the other hand, there are lots of books and web pages about dynamic programming, so maybe the payoff is not that big.
- gene article should get some perspective about non-coding RNA genes.
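For the dynamic programming article, one small example with a comp-bio flavor might be edit (Levenshtein) distance between two sequences, a unit-cost simplification of sequence alignment. Below is a minimal Python sketch under that assumption; the function name `edit_distance` is just illustrative. Each table entry `dist[i][j]` is computed from three smaller subproblems, which is the core idea of the technique.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed by dynamic programming.

    dist[i][j] is the minimum number of single-character insertions,
    deletions and substitutions needed to turn a[:i] into b[:j].
    """
    n, m = len(a), len(b)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all of a[:i]
    for j in range(m + 1):
        dist[0][j] = j  # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete a[i-1]
                dist[i][j - 1] + 1,         # insert b[j-1]
                dist[i - 1][j - 1] + cost,  # match or substitute
            )
    return dist[n][m]


print(edit_distance("kitten", "sitting"))  # 3
print(edit_distance("ACGT", "AGT"))        # 1 (delete the C)
```

Filling the table takes O(nm) time and space; keeping only the previous row would cut the space to O(m) if only the distance, not the alignment itself, is needed.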
Half-baked feature requests pending time for me to think seriously about them & post them:
- Organization of most-wanted pages. When the feature was enabled, I remember many articles coming up, many of which I don't know anything about and which are on related topics (e.g. Olympic Games of various years). This makes it hard to get to things I could work on. Seems technically difficult, but maybe something reasonable to do. Performance will likely be bad, but it'd be fine to just do the query every day or so (or, like Google, cache the link structure info; I don't really see the need for up-to-the-minute results). Two overlapping tactics:
- The personalization angle. Wikipedia tries to guess what I'm capable of editing based on <brainstorming>pages I've edited, how frequently I edited those pages, how much text I added, pages in my watchlist</brainstorming> and perhaps the link structure of Wikipedia. Perhaps it could see that the Olympics pages are very far, in number of links, from things I've touched, so I'm more likely to work on other things. This could be a complex Google-ish algorithm; it would have to deal with links that appear in every page, such as dates (i.e. discount frequent links that are probably not topic-specific). A perhaps simpler approach would be the latent semantic/information retrieval idea: take the articles that link to the most-wanted page and count words; take the articles I edit/watch and count words; compare the word vectors and rank pages that way. It should then return first the pages that are most similar to the pages I edit. The number of links to the page (the current sort order) could be factored in somehow, but I'd suggest it's less significant than what I might know something about (i.e. the site would rather bring to my attention a page that's not so important but that I can do something about, than a very important page that I know nothing about).
- The HCI angle. Try to organize the most-wanteds in a way that makes it quicker for me to skip over things that I'm not likely to work on. One possibility: sort by ancestors in the link structure. Display the list of ancestors for every article. When multiple articles share the same ancestor, just show the first with an "and other things..." link. E.g. I'm hoping that the Olympics entries would say "1940 Olympic Games, link:view others in this category". If I knew something about the Olympics, I'd expand the "view others"; since I don't, I'd skip over it. This idea would probably be pretty feasible if you just went 1 level up in the graph, and this would catch a lot, because presumably most of the Olympics articles are referenced from the same pages. Oh yeah, they have multiple parents; that's why they're most wanted. A related idea would be to use various clustering algorithms and present the clusters, each linking to the articles in that cluster.
- option to collapse adjacent edits in history where both edits are by me. often I do an edit and it seems fine in preview, and then after submitting I realize I did something dumb (i.e. I acknowledge this is my fault). It'd be nice if I could just collapse the edits & fix it. Since both edits are by me, this seems safe. auto-collapsing is bad because I might have done very different things in the edits & want them to be separate (e.g. someone can decide that one of my edits is okay, but the other is stupid). ability to collapse other users' edits is reasonable, but might be like the auto-collapse since some users would just like collapsing things without necessarily looking at the logic of the edits.
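A rough Python sketch of the word-count comparison from the personalization angle: build a bag-of-words vector from the articles that link to each most-wanted page, build one from the articles I edit or watch, and rank by cosine similarity. The input format (a dict from wanted title to the texts of its linking articles), the function names and the toy data are all hypothetical, and using raw counts is an assumption; a real version would presumably down-weight very common words (e.g. with tf-idf), in the same spirit as discounting date links.

```python
import math
import re
from collections import Counter


def word_vector(texts):
    """Bag-of-words counts over a list of article texts (lowercased words)."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_most_wanted(wanted_links, my_texts):
    """Rank wanted titles by similarity to the pages I edit/watch.

    wanted_links maps each most-wanted title to the texts of the
    articles that link to it (hypothetical input format).
    """
    me = word_vector(my_texts)
    scores = {title: cosine(word_vector(texts), me)
              for title, texts in wanted_links.items()}
    return sorted(scores, key=scores.get, reverse=True)


wanted = {
    "Single nucleotide polymorphism": ["Polymorphism in DNA sequence variation",
                                       "Genetic marker used in linkage mapping"],
    "1940 Olympic Games": ["Olympic Games held in 1936",
                           "Olympic medal table by year"],
}
mine = ["Endosymbiont DNA sequence evolution", "RNA gene sequence and DNA"]
print(rank_most_wanted(wanted, mine))
# ['Single nucleotide polymorphism', '1940 Olympic Games']
```

With these toy inputs the SNP page ranks first because it shares "dna" and "sequence" with my pages, while the Olympics page shares no words at all.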
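The one-level ancestor grouping from the HCI angle could look roughly like this Python sketch. Because most-wanted pages have multiple parents, something has to decide which parent a page is listed under; here that is a greedy assumption (repeatedly pick the linking page that covers the most still-ungrouped titles). The input format, function names and example data are hypothetical.

```python
from collections import defaultdict


def group_by_parent(parents):
    """Group most-wanted titles under a shared linking page (one level up).

    parents maps each wanted title to the set of existing pages linking
    to it. Greedy: repeatedly take the parent that links to the most
    still-ungrouped titles.
    """
    children = defaultdict(set)
    for title, linkers in parents.items():
        for parent in linkers:
            children[parent].add(title)

    groups = []
    ungrouped = set(parents)
    while ungrouped:
        # max() returns the first maximum it sees, so iterating over
        # sorted names breaks ties alphabetically.
        best = max(sorted(children),
                   key=lambda p: len(children[p] & ungrouped),
                   default=None)
        covered = children[best] & ungrouped if best is not None else set()
        if not covered:
            # Titles with no linking page are listed on their own.
            groups.extend((None, [t]) for t in sorted(ungrouped))
            break
        groups.append((best, sorted(covered)))
        ungrouped -= covered
    return groups


def render(groups):
    """One display line per group: first title plus an 'and N others' hint."""
    lines = []
    for parent, titles in groups:
        line = titles[0]
        if len(titles) > 1:
            line += f" (and {len(titles) - 1} others linked from {parent})"
        lines.append(line)
    return lines


parents = {
    "1940 Olympic Games": {"Olympic Games", "1940"},
    "1944 Olympic Games": {"Olympic Games", "1944"},
    "1948 Olympic Games": {"Olympic Games"},
    "Single nucleotide polymorphism": {"Polymorphism"},
}
for line in render(group_by_parent(parents)):
    print(line)
# 1940 Olympic Games (and 2 others linked from Olympic Games)
# Single nucleotide polymorphism
```

Going only one level up keeps this cheap; the "and N others" line is where a "view others in this category" link would go.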
Possible policy changes:
- Current policy is that a term should only be linked the first time it's used. Clearly, it's annoying if every use gets linked, but I think this policy can be over-applied. For example, in a medium or long article with many sections, a reader may skip the sections they're less interested in and read only a later section, so it's sometimes appropriate to repeat a link in different sections.
All Wikipedia text is available under the terms of the GNU Free Documentation License.