wikipediaapi

Wikipedia-API is an easy-to-use wrapper for extracting information from Wikipedia.

It supports extracting texts, sections, links, categories, translations, etc. from Wikipedia. The documentation provides code snippets for the most common use cases.

wikipediaapi.namespace2int(namespace: Union[wikipediaapi.Namespace, int]) → int

Converts a namespace into an integer.
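
The conversion can be pictured with a small stand-in. The sketch below assumes that Namespace behaves like an enum.IntEnum; the Namespace subset and the helper to_int are hypothetical illustrations defined locally, not the library's own code.

```python
from enum import IntEnum
from typing import Union


class Namespace(IntEnum):
    """Local stand-in with a few values copied from the namespace list."""
    MAIN = 0
    TALK = 1
    CATEGORY = 14


def to_int(namespace: Union[Namespace, int]) -> int:
    """Return the plain integer for an enum member or for an int."""
    if isinstance(namespace, Namespace):
        return namespace.value
    return namespace


print(to_int(Namespace.CATEGORY))  # 14
print(to_int(14))                  # 14
```

Accepting both forms lets callers pass either a named namespace or a raw number to functions such as page().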

class wikipediaapi.Wikipedia(user_agent: str, language: str = 'en', extract_format: wikipediaapi.ExtractFormat = <ExtractFormat.WIKI: 1>, headers: Optional[Dict[str, Any]] = None, **kwargs)

Wikipedia is a wrapper for the Wikipedia API.

__del__() → None

Closes session.

__init__(user_agent: str, language: str = 'en', extract_format: wikipediaapi.ExtractFormat = <ExtractFormat.WIKI: 1>, headers: Optional[Dict[str, Any]] = None, **kwargs) → None

Constructs a Wikipedia object for extracting information from Wikipedia.

Parameters:

Examples:

  • Proxy: Wikipedia('foo (merlin@example.com)', proxies={'http': 'http://proxy:1234'})
article(title: str, ns: Union[wikipediaapi.Namespace, int] = <Namespace.MAIN: 0>, unquote: bool = False) → wikipediaapi.WikipediaPage

Constructs a Wikipedia page with the given title.

This function is an alias for page()

Parameters:
  • title – page title as used in Wikipedia URL
  • ns – WikiNamespace
  • unquote – if true it will unquote title
Returns:

object representing WikipediaPage

backlinks(page: wikipediaapi.WikipediaPage, **kwargs) → Dict[str, wikipediaapi.WikipediaPage]

Returns backlinks from other pages with respect to parameters

API Calls for parameters:

Parameters:
Returns:

backlinks from other pages

categories(page: wikipediaapi.WikipediaPage, **kwargs) → Dict[str, wikipediaapi.WikipediaPage]

Returns categories for page with respect to parameters

API Calls for parameters:

Parameters:
Returns:

categories for page

categorymembers(page: wikipediaapi.WikipediaPage, **kwargs) → Dict[str, wikipediaapi.WikipediaPage]

Returns pages in given category with respect to parameters

API Calls for parameters:

Parameters:
Returns:

pages in given category

extracts(page: wikipediaapi.WikipediaPage, **kwargs) → str

Returns summary of the page with respect to parameters

The exsectionformat parameter is taken from the Wikipedia constructor.

API Calls for parameters:

Example:

import wikipediaapi
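# The first positional argument is the user agent; the language comes second.
wiki = wikipediaapi.Wikipedia('foo (merlin@example.com)', 'en')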

page = wiki.page('Python_(programming_language)')
print(wiki.extracts(page, exsentences=1))
print(wiki.extracts(page, exsentences=2))
Parameters:
Returns:

summary of the page

info(page: wikipediaapi.WikipediaPage) → wikipediaapi.WikipediaPage

https://www.mediawiki.org/w/api.php?action=help&modules=query%2Binfo https://www.mediawiki.org/wiki/API:Info

langlinks(page: wikipediaapi.WikipediaPage, **kwargs) → Dict[str, wikipediaapi.WikipediaPage]

Returns langlinks of the page with respect to parameters

API Calls for parameters:

Parameters:
Returns:

links to pages in other languages

links(page: wikipediaapi.WikipediaPage, **kwargs) → Dict[str, wikipediaapi.WikipediaPage]

Returns links to other pages with respect to parameters

API Calls for parameters:

Parameters:
Returns:

links to linked pages

page(title: str, ns: Union[wikipediaapi.Namespace, int] = <Namespace.MAIN: 0>, unquote: bool = False) → wikipediaapi.WikipediaPage

Constructs a Wikipedia page with the given title.

Creating a WikipediaPage object is always the first step for extracting any information.

Example:

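import wikipediaapi

# The first positional argument is the user agent; the language comes second.
wiki_wiki = wikipediaapi.Wikipedia('foo (merlin@example.com)', 'en')
page_py = wiki_wiki.page('Python_(programming_language)')
print(page_py.title)
# Python (programming language)

wiki_hi = wikipediaapi.Wikipedia('foo (merlin@example.com)', 'hi')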

page_hi_py = wiki_hi.article(
    title='%E0%A4%AA%E0%A4%BE%E0%A4%87%E0%A4%A5%E0%A4%A8',
    unquote=True,
)
print(page_hi_py.title)
# पाइथन
Parameters:
  • title – page title as used in Wikipedia URL
  • ns – WikiNamespace
  • unquote – if true it will unquote title
Returns:

object representing WikipediaPage
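
The effect of unquote=True can be previewed with the standard library alone. This sketch assumes the option performs ordinary URL percent-decoding, as urllib.parse.unquote does; it does not call Wikipedia-API itself.

```python
from urllib.parse import unquote

# Percent-encoded title, as it would appear in a Wikipedia URL.
encoded_title = '%E0%A4%AA%E0%A4%BE%E0%A4%87%E0%A4%A5%E0%A4%A8'

# Decode the UTF-8 percent escapes back into readable text.
decoded_title = unquote(encoded_title)
print(decoded_title)
```

Titles copied from a browser address bar are usually percent-encoded, which is when this option is useful.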

class wikipediaapi.WikipediaPage(wiki: wikipediaapi.Wikipedia, title: str, ns: Union[wikipediaapi.Namespace, int] = <Namespace.MAIN: 0>, language: str = 'en', url: Optional[str] = None)

Represents a Wikipedia page.

In addition to the properties described in this documentation, the following properties are also available:

  • fullurl - full URL of the page
  • canonicalurl - canonical URL of the page
  • pageid - id of the current page
  • displaytitle - title of the page to display
  • talkid - id of the page with discussion
__init__(wiki: wikipediaapi.Wikipedia, title: str, ns: Union[wikipediaapi.Namespace, int] = <Namespace.MAIN: 0>, language: str = 'en', url: Optional[str] = None) → None

Initialize self. See help(type(self)) for accurate signature.

__repr__()

Return repr(self).

backlinks

Returns all pages linking to the current page.

This is a wrapper for:

Returns: PagesDict
categories

Returns categories associated with the current page.

This is a wrapper for:

Returns: PagesDict
categorymembers

Returns all pages belonging to the current category.

This is a wrapper for:

Returns: PagesDict
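
Because a member of a category can itself be a category, categorymembers lends itself to a recursive walk. The sketch below is one way to do that under stated assumptions: it relies only on the title, namespace and categorymembers properties documented here, uses the value 14 for the category namespace from the namespace list, and is exercised with a hypothetical FakePage stand-in instead of live Wikipedia data.

```python
from dataclasses import dataclass, field
from typing import Dict, List

CATEGORY_NS = 14  # Namespace.CATEGORY in the namespace list


@dataclass
class FakePage:
    """Hypothetical stand-in exposing only the properties used below."""
    title: str
    namespace: int = 0
    categorymembers: Dict[str, "FakePage"] = field(default_factory=dict)


def walk_category(page, max_depth: int = 1, depth: int = 0) -> List[str]:
    """Collect indented member titles, descending into subcategories."""
    lines = []
    for member in page.categorymembers.values():
        lines.append("  " * depth + member.title)
        # Only categories have members worth descending into.
        if member.namespace == CATEGORY_NS and depth < max_depth:
            lines.extend(walk_category(member, max_depth, depth + 1))
    return lines


sub = FakePage("Category:Quantum physics", CATEGORY_NS,
               {"Qubit": FakePage("Qubit")})
root = FakePage("Category:Physics", CATEGORY_NS,
                {"Energy": FakePage("Energy"), sub.title: sub})
print("\n".join(walk_category(root)))
```

The max_depth limit matters with real data, since category trees on Wikipedia can be very deep and each level may trigger further API requests.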
exists() → bool

Returns True if the current page exists, otherwise False.

Returns: whether the current page exists

langlinks

Returns all language links to pages in other languages.

This is a wrapper for:

Returns: PagesDict
language

Returns language of the current page.

Returns: language

links

Returns all pages linked from the current page.

This is a wrapper for:

Returns: PagesDict
namespace

Returns namespace of the current page.

Returns: namespace
section_by_title(title: str) → Optional[wikipediaapi.WikipediaPageSection]

Returns the last section of the current page with the given title.

Parameters: title – section title
Returns: WikipediaPageSection
sections

Returns all sections of the current page.

Returns: List of WikipediaPageSection
sections_by_title(title: str) → List[wikipediaapi.WikipediaPageSection]

Returns all sections of the current page with the given title.

Parameters: title – section title
Returns: List of WikipediaPageSection
summary

Returns summary of the current page.

Returns: summary
text

Returns text of the current page.

Returns: text of the current page
title

Returns title of the current page.

Returns: title
class wikipediaapi.WikipediaPageSection(wiki: wikipediaapi.Wikipedia, title: str, level: int = 0, text: str = '')

WikipediaPageSection represents a section of the page.

__init__(wiki: wikipediaapi.Wikipedia, title: str, level: int = 0, text: str = '') → None

Constructs WikipediaPageSection.

__repr__()

Return repr(self).

full_text(level: int = 1) → str

Returns text of the current section as well as all its subsections.

Parameters: level – indentation level
Returns: text of the current section as well as all its subsections
level

Returns indentation level of the current section.

Returns: indentation level of the current section
section_by_title(title: str) → Optional[wikipediaapi.WikipediaPageSection]

Returns the subsection of the current section with the given title.

Parameters: title – title of the subsection
Returns: subsection if it exists
sections

Returns subsections of the current section.

Returns: subsections of the current section
text

Returns text of the current section.

Returns: text of the current section
title

Returns title of the current section.

Returns: title of the current section
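
Sections form a tree: a page has top-level sections, and each section has its own sections. A minimal sketch of walking that tree follows. It uses only the title and sections properties documented above; FakeSection and outline are hypothetical names, and the stand-in lets the example run without network access.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeSection:
    """Hypothetical stand-in for a section with a title and subsections."""
    title: str
    sections: List["FakeSection"] = field(default_factory=list)


def outline(sections, depth: int = 0) -> List[str]:
    """Return one line per section, indented by nesting depth."""
    lines = []
    for section in sections:
        lines.append("  " * depth + section.title)
        lines.extend(outline(section.sections, depth + 1))
    return lines


page_sections = [
    FakeSection("History", [FakeSection("Early years")]),
    FakeSection("Syntax"),
]
print("\n".join(outline(page_sections)))
```

The same function should work on the sections property of a real page, since it only reads title and sections from each item.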
class wikipediaapi.ExtractFormat

Represents extraction format.

WIKI = 1

Allows recognizing subsections

Example: https://goo.gl/PScNVV

HTML = 2

Allows retrieval of HTML tags

Example: https://goo.gl/1Jwwpr

class wikipediaapi.Namespace

Represents a namespace in Wikipedia.

You can get the list of possible namespaces here:

Currently the following namespaces are supported:

MAIN = 0
TALK = 1
USER = 2
USER_TALK = 3
WIKIPEDIA = 4
WIKIPEDIA_TALK = 5
FILE = 6
FILE_TALK = 7
MEDIAWIKI = 8
MEDIAWIKI_TALK = 9
TEMPLATE = 10
TEMPLATE_TALK = 11
HELP = 12
HELP_TALK = 13
CATEGORY = 14
CATEGORY_TALK = 15
PORTAL = 100
PORTAL_TALK = 101
PROJECT = 102
PROJECT_TALK = 103
REFERENCE = 104
REFERENCE_TALK = 105
BOOK = 108
BOOK_TALK = 109
DRAFT = 118
DRAFT_TALK = 119
EDUCATION_PROGRAM = 446
EDUCATION_PROGRAM_TALK = 447
TIMED_TEXT = 710
TIMED_TEXT_TALK = 711
MODULE = 828
MODULE_TALK = 829
GADGET = 2300
GADGET_TALK = 2301
GADGET_DEFINITION = 2302
GADGET_DEFINITION_TALK = 2303
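
In the list above, every subject namespace has an even number and its talk namespace is the next odd number. The helper below, talk_namespace, is a hypothetical illustration of that pattern and is not part of Wikipedia-API.

```python
def talk_namespace(ns: int) -> int:
    """Return the talk namespace paired with ns (ns itself if it is already one)."""
    if ns < 0:
        raise ValueError("namespace number must be non-negative")
    # Odd numbers are talk namespaces; even numbers map to the next odd one.
    return ns if ns % 2 == 1 else ns + 1


print(talk_namespace(0))    # MAIN -> TALK
print(talk_namespace(14))   # CATEGORY -> CATEGORY_TALK
print(talk_namespace(829))  # MODULE_TALK stays MODULE_TALK
```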