Encyclopedia > Wikipedia:Database queries

  Article Content

Wikipedia:Database queries

As an Administrator, you can directly query the MySQL database. Please be careful with this; our database is very large, and overly complex queries can seriously bog down the server.

Queries are entered on the special:Asksql page.

  • When copying a query from this page, some browsers copy a '*' in front of the SELECT statement. This has to be removed!
  • Most of these queries return article titles. These do not contain the namespace!
    • The namespaces have numeric codes in the separate cur_namespace field (0 is the regular article namespace):
      1. Talk
      2. User
      3. User talk
      4. Wikipedia
      5. Wikipedia talk
      6. Image
      7. Image talk

To avoid time-consuming or malformed queries, here are some useful suggestions (please add more - some requests might be found on this page's talk page):

Table of contents

Modifiers To limit results:

"SELECT ... LIMIT 20".
You can also add an offset;
"SELECT ... LIMIT 100, 20" will give you 20 records starting at the 101st.

Queries to find new users

Here's a couple of queries to find the 20 most newly created users - useful to find people to give welcome messages. It only counts users that have made edits (most users who create a login never make a single contribution and are just passing by, so it's not really worth welcoming them).

  • SELECT user_name, COUNT(*) FROM user, cur WHERE user_id=cur_user GROUP BY user_id ORDER BY user_id DESC LIMIT 20

The above query gives the user names of the 20 most recently created users and the number of edits they have made. The query is not perfect - only edits that have not been 'overwritten' by other edits are counted here. To do the same thing looking only at 'overwritten' edits use:

  • SELECT user_name, COUNT(*) FROM user, old WHERE user_id=old_user GROUP BY user_id ORDER BY user_id DESC LIMIT 20

Queries to find stub articles

For short pages containing "see" (takes about 2 seconds):

  • SELECT cur_title FROM cur WHERE cur_is_redirect=0 AND LENGTH(cur_text)<50 AND cur_text LIKE '%see%'
finds all pages with less than 50 chars.

Queries to find inappropriate links

For "double redirects", try this (takes about 4 seconds):

  • SELECT l_from,l_to,cb.cur_text AS rt,cb.cur_title AS ti FROM links,cur AS ca, cur AS cb WHERE ca.cur_is_redirect=1 AND cb.cur_is_redirect=1 AND l_to=cb.cur_id AND l_from=ca.cur_title LIMIT 250
where l_from is the first, ti is the second redirect; rt shows the content of ti.

List all articles that link to a file on http://meta.wikipedia.com (takes 2-3 seconds):

  • SELECT cur_title FROM cur WHERE cur_text LIKE "%http://meta.wikipedia.com/upload/%" AND cur_namespace=0

To find articles that link to jpegs, both external and internal, with the old non-[[Image:]] style (this isn't perfect, it also finds a few other pages):

  • SELECT cur_title FROM cur WHERE cur_text LIKE "%http://%.jpg%" AND cur_namespace=0

For articles containing a link to itself (!!24 seconds!!)

  • SELECT cur_title FROM cur,links WHERE cur_is_redirect=0 AND cur_namespace=0 AND l_from=cur_title AND l_to=cur_id LIMIT 20

For redirects to non-existing pages (note: when a page is found that is redirecting correctly, it usually means that there is text going with the redirect, 2-4 seconds):

  • SELECT cur_title FROM cur,brokenlinks WHERE bl_from=cur_id AND cur_is_redirect=1

Query to find articles without bold markings

Which means they don't have bold titles in the first sentence!
SELECT cur_title FROM cur WHERE cur_namespace=0 AND cur_is_redirect=0 AND cur_text NOT LIKE "%'''%" AND cur_text NOT LIKE "%<b>%" LIMIT 50

Query to find articles without bold markings, that are not disambiguation pages

Same as above only removes disambiguation pages which don't need bolds. SELECT cur_title FROM cur WHERE cur_namespace=0 AND cur_is_redirect=0 AND cur_text NOT LIKE "%'''%" AND cur_text NOT LIKE "%<b>%" AND cur_text NOT LIKE "%disambiguation%" LIMIT 50

Query to find sub-page articles

This finds all articles containing a "/" character; most of these are subpages.

  • SELECT cur_title FROM cur WHERE cur_namespace=0 and cur_title like "%/%" and cur_is_redirect=0

Queries to give statistics

For a count of how many articles are more than 1500 characters long, use:

  • SELECT COUNT(*) FROM cur WHERE LENGTH(cur_text)>1500 AND cur_namespace=0

Queries about blocked users

  • SELECT DISTINCT ipb_address, COUNT(*) AS times, MIN(ipb_reason) AS reason, user_name AS blocker, DATE_FORMAT(MIN(ipb_timestamp), '%b %e %Y %k:%i') AS date FROM ipblocks, user WHERE user_id=ipb_by GROUP BY ipb_address ORDER BY ipb_timestamp

returns the blocked IP addresses, and the first entry for the blocking (some IP addresses have been blocked by multiple people), ordered by date.

This is mostly unnecessary (but may be of interest as an example query), as the Special:Ipblocklist returns similar information.

it's...

Find all pages with "it's" for spellchecking purposes. WARNING: takes a long time. May want to apply a limit (see above)

  • SELECT cur_title from cur where cur_namespace=0 and cur_text regexp "[[:<:]]it's[[:>:]]" order by cur_title

Vandalism

When a certain IP-number or user has been found to be vandalising, it might be that some of their problems may already have been solved by someone else and others not. The following query shows those pages which the vandal (or otherwise problematic user) was the last to edit:

  • SELECT cur_title FROM cur WHERE cur_user_text="name" LIMIT 100

If the vandal is a registered user, use their Username for name, if not, use their IP-number.

Query to find sysops/developers

  • SELECT CONCAT("*[[User:",user_name,"|",user_name,"]]") from user where user_rights != "" ORDER BY user_name LIMIT 100


A detailed description of the database schema and the meaning of various fields is available here (http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/%2Acheckout%2A/wikipedia/phpwiki/newcodebase/docs/schema.doc?rev=HEAD&content-type=text/plain).



All Wikipedia text is available under the terms of the GNU Free Documentation License

 
  Search Encyclopedia

Search over one million articles, find something about almost anything!
 
 
  
  Featured Article
Bullying

... period of time without a legitimate basis of authority. The first to have the title of "Tyrant" was Pisistratus in 560 BC. In modern times Tyrant has come to mean a ...

 
 
 
This page was created in 38.3 ms