wikipediaapi
Wikipedia-API is an easy-to-use wrapper for extracting information from Wikipedia.
It supports extracting texts, sections, links, categories, translations, etc. from Wikipedia. The documentation provides code snippets for the most common use cases.
-
wikipediaapi.namespace2int(namespace: Union[wikipediaapi.Namespace, int]) → int
Converts a namespace into an integer.
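A minimal sketch of the conversion described above, assuming Namespace behaves like an integer-valued enum. The Namespace class below is a local three-member stand-in and namespace_to_int is a hypothetical reimplementation, not the library's source.

```python
from enum import IntEnum
from typing import Union


class Namespace(IntEnum):
    """Local stand-in for a few wikipediaapi.Namespace members."""
    MAIN = 0
    TALK = 1
    CATEGORY = 14


def namespace_to_int(namespace: Union[Namespace, int]) -> int:
    """Return the integer value for either an enum member or a plain int."""
    if isinstance(namespace, Namespace):
        return namespace.value
    return int(namespace)


print(namespace_to_int(Namespace.CATEGORY))
print(namespace_to_int(14))
```

Accepting both forms lets callers pass either Namespace.CATEGORY or the bare number 14 wherever an ns argument is expected.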
-
class wikipediaapi.Wikipedia(user_agent: str, language: str = 'en', extract_format: wikipediaapi.ExtractFormat = <ExtractFormat.WIKI: 1>, headers: Optional[Dict[str, Any]] = None, **kwargs)
Wikipedia is a wrapper for the Wikipedia API.
-
__del__() → None
Closes the session.
-
__init__(user_agent: str, language: str = 'en', extract_format: wikipediaapi.ExtractFormat = <ExtractFormat.WIKI: 1>, headers: Optional[Dict[str, Any]] = None, **kwargs) → None
Constructs a Wikipedia object for extracting information from Wikipedia.
Parameters:
- user_agent – HTTP User-Agent used in requests; see https://meta.wikimedia.org/wiki/User-Agent_policy
- language – language edition of Wikipedia; see http://meta.wikimedia.org/wiki/List_of_Wikipedias
- extract_format – format used for extractions, an ExtractFormat object
- headers – headers sent as part of the HTTP request
- kwargs – optional parameters passed on to the request; see http://docs.python-requests.org/en/master/api/#requests.request
Examples:
- Proxy:
    Wikipedia('foo (merlin@example.com)', proxies={'http': 'http://proxy:1234'})
-
article(title: str, ns: Union[wikipediaapi.Namespace, int] = <Namespace.MAIN: 0>, unquote: bool = False) → wikipediaapi.WikipediaPage
Constructs a Wikipedia page with the given title.
This function is an alias for page().
Parameters:
- title – page title as used in the Wikipedia URL
- ns – namespace of the page, a Namespace member or its integer value
- unquote – if True, the title is unquoted before use
Returns: object representing the page, a WikipediaPage
-
backlinks(page: wikipediaapi.WikipediaPage, **kwargs) → Dict[str, wikipediaapi.WikipediaPage]
Returns backlinks from other pages with respect to the given parameters.
API calls for parameters:
- https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bbacklinks
- https://www.mediawiki.org/wiki/API:Backlinks
Parameters:
- page – WikipediaPage
- kwargs – parameters used in the API call
Returns: backlinks from other pages
-
categories(page: wikipediaapi.WikipediaPage, **kwargs) → Dict[str, wikipediaapi.WikipediaPage]
Returns categories for the page with respect to the given parameters.
API calls for parameters:
- https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bcategories
- https://www.mediawiki.org/wiki/API:Categories
Parameters:
- page – WikipediaPage
- kwargs – parameters used in the API call
Returns: categories for the page
-
categorymembers(page: wikipediaapi.WikipediaPage, **kwargs) → Dict[str, wikipediaapi.WikipediaPage]
Returns pages in the given category with respect to the given parameters.
API calls for parameters:
- https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bcategorymembers
- https://www.mediawiki.org/wiki/API:Categorymembers
Parameters:
- page – WikipediaPage
- kwargs – parameters used in the API call
Returns: pages in the given category
-
extracts(page: wikipediaapi.WikipediaPage, **kwargs) → str
Returns a summary of the page with respect to the given parameters.
The parameter exsectionformat is taken from the Wikipedia constructor.
API calls for parameters:
- https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bextracts
- https://www.mediawiki.org/wiki/Extension:TextExtracts#API
Example:
    import wikipediaapi

    # The first positional argument is the user agent, the second the language.
    wiki = wikipediaapi.Wikipedia('foo (merlin@example.com)', 'en')
    page = wiki.page('Python_(programming_language)')
    print(wiki.extracts(page, exsentences=1))
    print(wiki.extracts(page, exsentences=2))
Parameters:
- page – WikipediaPage
- kwargs – parameters used in the API call
Returns: summary of the page
-
info(page: wikipediaapi.WikipediaPage) → wikipediaapi.WikipediaPage
API calls:
- https://www.mediawiki.org/w/api.php?action=help&modules=query%2Binfo
- https://www.mediawiki.org/wiki/API:Info
-
langlinks(page: wikipediaapi.WikipediaPage, **kwargs) → Dict[str, wikipediaapi.WikipediaPage]
Returns langlinks of the page with respect to the given parameters.
API calls for parameters:
- https://www.mediawiki.org/w/api.php?action=help&modules=query%2Blanglinks
- https://www.mediawiki.org/wiki/API:Langlinks
Parameters:
- page – WikipediaPage
- kwargs – parameters used in the API call
Returns: links to pages in other languages
-
links(page: wikipediaapi.WikipediaPage, **kwargs) → Dict[str, wikipediaapi.WikipediaPage]
Returns links to other pages with respect to the given parameters.
API calls for parameters:
- https://www.mediawiki.org/w/api.php?action=help&modules=query%2Blinks
- https://www.mediawiki.org/wiki/API:Links
Parameters:
- page – WikipediaPage
- kwargs – parameters used in the API call
Returns: links to linked pages
-
page(title: str, ns: Union[wikipediaapi.Namespace, int] = <Namespace.MAIN: 0>, unquote: bool = False) → wikipediaapi.WikipediaPage
Constructs a Wikipedia page with the given title.
Creating a WikipediaPage object is always the first step for extracting any information.
Example:
    import wikipediaapi

    # The first positional argument is the user agent, the second the language.
    wiki_wiki = wikipediaapi.Wikipedia('foo (merlin@example.com)', 'en')
    page_py = wiki_wiki.page('Python_(programming_language)')
    print(page_py.title)
    # Python (programming language)

    wiki_hi = wikipediaapi.Wikipedia('foo (merlin@example.com)', 'hi')
    page_hi_py = wiki_hi.article(
        title='%E0%A4%AA%E0%A4%BE%E0%A4%87%E0%A4%A5%E0%A4%A8',
        unquote=True,
    )
    print(page_hi_py.title)
    # पाइथन
Parameters:
- title – page title as used in the Wikipedia URL
- ns – namespace of the page, a Namespace member or its integer value
- unquote – if True, the title is unquoted before use
Returns: object representing the page, a WikipediaPage
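The effect of unquote=True on a title can be previewed without any network access. This sketch assumes the library decodes the title the same way as the standard library's urllib.parse.unquote; that is an assumption about the implementation, not something this page states.

```python
from urllib.parse import unquote

# Percent-encoded title taken from the page() example.
quoted_title = '%E0%A4%AA%E0%A4%BE%E0%A4%87%E0%A4%A5%E0%A4%A8'

# unquote() decodes %XX escapes as UTF-8 by default.
decoded_title = unquote(quoted_title)
print(decoded_title)

# A title without percent escapes passes through unchanged.
print(unquote('Python_(programming_language)'))
```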
-
-
class wikipediaapi.WikipediaPage(wiki: wikipediaapi.Wikipedia, title: str, ns: Union[wikipediaapi.Namespace, int] = <Namespace.MAIN: 0>, language: str = 'en', url: Optional[str] = None)
Represents a Wikipedia page.
Besides the properties documented here, the following properties are also available:
- fullurl - full URL of the page
- canonicalurl - canonical URL of the page
- pageid - id of the current page
- displaytitle - title of the page to display
- talkid - id of the page with discussion
-
__init__(wiki: wikipediaapi.Wikipedia, title: str, ns: Union[wikipediaapi.Namespace, int] = <Namespace.MAIN: 0>, language: str = 'en', url: Optional[str] = None) → None
Initialize self. See help(type(self)) for accurate signature.
-
__repr__()
Return repr(self).
-
backlinks
Returns all pages linking to the current page.
This is a wrapper for:
- https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bbacklinks
- https://www.mediawiki.org/wiki/API:Backlinks
Returns: PagesDict
-
categories
Returns categories associated with the current page.
This is a wrapper for:
- https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bcategories
- https://www.mediawiki.org/wiki/API:Categories
Returns: PagesDict
-
categorymembers
Returns all pages belonging to the current category.
This is a wrapper for:
- https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bcategorymembers
- https://www.mediawiki.org/wiki/API:Categorymembers
Returns: PagesDict
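Because subcategories are themselves pages in the CATEGORY namespace, category members can be walked recursively. The following is a sketch only: collect_member_titles is a hypothetical helper, and the SimpleNamespace objects are offline stand-ins that mimic just the title, namespace, and categorymembers members documented here.

```python
from types import SimpleNamespace
from typing import Any, List

CATEGORY_NS = 14  # value of Namespace.CATEGORY


def collect_member_titles(category_page: Any, max_level: int = 1, level: int = 0) -> List[str]:
    """Collect titles of category members, descending into subcategories."""
    titles: List[str] = []
    for member in category_page.categorymembers.values():
        titles.append(member.title)
        # Subcategories are pages in the CATEGORY namespace.
        if member.namespace == CATEGORY_NS and level < max_level:
            titles.extend(collect_member_titles(member, max_level, level + 1))
    return titles


# Offline stand-ins; real pages would come from Wikipedia.page().
atom = SimpleNamespace(title='Atom', namespace=0, categorymembers={})
energy = SimpleNamespace(title='Energy', namespace=0, categorymembers={})
particles = SimpleNamespace(title='Category:Particles', namespace=14,
                            categorymembers={'Atom': atom})
physics = SimpleNamespace(title='Category:Physics', namespace=14,
                          categorymembers={'Category:Particles': particles,
                                           'Energy': energy})

print(collect_member_titles(physics))
```

With a real category page, each level of recursion is likely to trigger additional API requests, so max_level is worth keeping small.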
-
exists() → bool
Returns True if the current page exists, otherwise False.
Returns: whether the current page exists
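A common pattern is to check exists() before reading other properties. A minimal sketch, with summary_or_none as a hypothetical helper and SimpleNamespace objects standing in for real pages so that it runs offline:

```python
from types import SimpleNamespace
from typing import Any, Optional


def summary_or_none(page: Any) -> Optional[str]:
    """Return the page summary, or None when the page does not exist."""
    if not page.exists():
        return None
    return page.summary


# Offline stand-ins that mimic only the members used above.
existing = SimpleNamespace(exists=lambda: True,
                           summary='Python is a programming language.')
missing = SimpleNamespace(exists=lambda: False, summary='')

print(summary_or_none(existing))
print(summary_or_none(missing))
```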
-
langlinks
Returns all language links to pages in other languages.
This is a wrapper for:
- https://www.mediawiki.org/w/api.php?action=help&modules=query%2Blanglinks
- https://www.mediawiki.org/wiki/API:Langlinks
Returns: PagesDict
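The sketch below looks up the counterpart of a page in another language. It assumes the returned dictionary is keyed by language code, which this page does not state explicitly; title_in_language is a hypothetical helper and the page object is an offline stand-in.

```python
from types import SimpleNamespace
from typing import Any, Optional


def title_in_language(page: Any, language: str) -> Optional[str]:
    """Return the title of the page's counterpart in `language`, if linked."""
    linked = page.langlinks.get(language)
    return None if linked is None else linked.title


# Offline stand-in; a real page would come from Wikipedia.page().
page_en = SimpleNamespace(langlinks={
    'de': SimpleNamespace(language='de', title='Python (Programmiersprache)'),
})

print(title_in_language(page_en, 'de'))
print(title_in_language(page_en, 'xx'))
```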
-
language
Returns the language of the current page.
Returns: language
-
links
Returns all pages linked from the current page.
This is a wrapper for:
- https://www.mediawiki.org/w/api.php?action=help&modules=query%2Blinks
- https://www.mediawiki.org/wiki/API:Links
Returns: PagesDict
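Since the result maps titles to page objects, it can be filtered like any dictionary. A sketch that keeps only links in one namespace; link_titles is a hypothetical helper and the page object is an offline stand-in exposing only links and namespace.

```python
from types import SimpleNamespace
from typing import Any, List


def link_titles(page: Any, ns: int = 0) -> List[str]:
    """Return sorted titles of linked pages that live in namespace `ns`."""
    return sorted(
        title for title, linked in page.links.items()
        if linked.namespace == ns
    )


# Offline stand-in; a real page would come from Wikipedia.page().
page = SimpleNamespace(links={
    'Guido van Rossum': SimpleNamespace(namespace=0),
    'Category:Python': SimpleNamespace(namespace=14),
    'ABC (programming language)': SimpleNamespace(namespace=0),
})

print(link_titles(page))
print(link_titles(page, ns=14))
```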
-
namespace
Returns the namespace of the current page.
Returns: namespace
-
section_by_title(title: str) → Optional[wikipediaapi.WikipediaPageSection]
Returns the last section of the current page with the given title.
Parameters: title – section title
Returns: WikipediaPageSection
-
sections
Returns all sections of the current page.
Returns: list of WikipediaPageSection
-
sections_by_title(title: str) → List[wikipediaapi.WikipediaPageSection]
Returns all sections of the current page with the given title.
Parameters: title – section title
Returns: list of WikipediaPageSection
-
summary
Returns the summary of the current page.
Returns: summary
-
text
Returns the text of the current page.
Returns: text of the current page
-
title
Returns the title of the current page.
Returns: title
-
class wikipediaapi.WikipediaPageSection(wiki: wikipediaapi.Wikipedia, title: str, level: int = 0, text: str = '')
WikipediaPageSection represents a section in the page.
-
__init__(wiki: wikipediaapi.Wikipedia, title: str, level: int = 0, text: str = '') → None
Constructs WikipediaPageSection.
-
__repr__()
Return repr(self).
-
full_text(level: int = 1) → str
Returns the text of the current section as well as all its subsections.
Parameters: level – indentation level
Returns: text of the current section as well as all its subsections
-
level
Returns the indentation level of the current section.
Returns: indentation level of the current section
-
section_by_title(title: str) → Optional[wikipediaapi.WikipediaPageSection]
Returns the subsection of the current section with the given title.
Parameters: title – title of the subsection
Returns: subsection if it exists
-
sections
Returns the subsections of the current section.
Returns: subsections of the current section
-
text
Returns the text of the current section.
Returns: text of the current section
-
title
Returns the title of the current section.
Returns: title of the current section
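Sections form a tree: each section exposes its own sections. The sketch below walks that tree depth-first to build an indented outline. The outline function is a hypothetical helper, and the SimpleNamespace objects are offline stand-ins exposing only title and sections.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def outline(sections: Iterable[Any], depth: int = 0) -> List[str]:
    """Return one indented line per section, visiting subsections depth-first."""
    lines: List[str] = []
    for section in sections:
        lines.append('  ' * depth + section.title)
        lines.extend(outline(section.sections, depth + 1))
    return lines


# Offline stand-ins; real sections would come from WikipediaPage.sections.
history = SimpleNamespace(title='History', sections=[])
typing_section = SimpleNamespace(title='Typing', sections=[])
features = SimpleNamespace(title='Features', sections=[typing_section])

print('\n'.join(outline([history, features])))
```

With a real page, outline(page.sections) would be expected to produce something close to the page's table of contents.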
-
-
class wikipediaapi.ExtractFormat
Represents the extraction format.
-
WIKI = 1
Allows recognizing subsections.
Example: https://goo.gl/PScNVV
-
HTML = 2
Allows retrieval of HTML tags.
Example: https://goo.gl/1Jwwpr
-
-
class wikipediaapi.Namespace
Represents a namespace in Wikipedia.
You can get the list of possible namespaces here:
- https://en.wikipedia.org/wiki/Wikipedia:Namespace
- https://en.wikipedia.org/wiki/Wikipedia:Namespace#Programming
Currently the following namespaces are supported:
- MAIN = 0
- TALK = 1
- USER = 2
- USER_TALK = 3
- WIKIPEDIA = 4
- WIKIPEDIA_TALK = 5
- FILE = 6
- FILE_TALK = 7
- MEDIAWIKI = 8
- MEDIAWIKI_TALK = 9
- TEMPLATE = 10
- TEMPLATE_TALK = 11
- HELP = 12
- HELP_TALK = 13
- CATEGORY = 14
- CATEGORY_TALK = 15
- PORTAL = 100
- PORTAL_TALK = 101
- PROJECT = 102
- PROJECT_TALK = 103
- REFERENCE = 104
- REFERENCE_TALK = 105
- BOOK = 108
- BOOK_TALK = 109
- DRAFT = 118
- DRAFT_TALK = 119
- EDUCATION_PROGRAM = 446
- EDUCATION_PROGRAM_TALK = 447
- TIMED_TEXT = 710
- TIMED_TEXT_TALK = 711
- MODULE = 828
- MODULE_TALK = 829
- GADGET = 2300
- GADGET_TALK = 2301
- GADGET_DEFINITION = 2302
- GADGET_DEFINITION_TALK = 2303
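In the list above, every *_TALK value is odd and equals its subject namespace plus one. A small sketch of that pairing; is_talk and talk_namespace are hypothetical helpers and are not part of wikipediaapi.

```python
def is_talk(ns: int) -> bool:
    """Return True when `ns` is a talk namespace (odd, non-negative)."""
    return ns >= 0 and ns % 2 == 1


def talk_namespace(ns: int) -> int:
    """Return the talk namespace paired with `ns` (itself if already talk)."""
    if ns < 0:
        raise ValueError('negative namespaces have no talk counterpart')
    return ns if is_talk(ns) else ns + 1


print(is_talk(15))         # CATEGORY_TALK
print(talk_namespace(14))  # CATEGORY -> CATEGORY_TALK
```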