.. highlight:: cython

.. _language_basics:

***************
Language Basics
***************

=================
Cython File Types
=================

There are three file types in Cython:

* Implementation files, which carry a ``.pyx`` suffix
* Definition files, which carry a ``.pxd`` suffix
* Include files, which carry a ``.pxi`` suffix

Implementation File
===================

What can it contain?
--------------------

* Basically anything Cythonic, but see below.

What can't it contain?
----------------------

* There are some restrictions when it comes to **extension types**, if the extension type is
  already defined elsewhere... **more on this later**

Definition File
===============

What can it contain?
--------------------

* Any kind of C type declaration.
* ``extern`` C function or variable declarations.
* Declarations for module implementations.
* The definition parts of **extension types**.
* All declarations of functions, etc., for an **external library**

What can't it contain?
----------------------

* Any non-extern C variable declaration.
* Implementations of C or Python functions.
* Python class definitions
* Python executable statements.
* Any declaration that is defined as **public** to make it accessible to other Cython modules.

  * This is not necessary, as it is automatic.
  * A **public** declaration is only needed to make it accessible to **external C code**.

What else?
----------

cimport
```````

* Use the **cimport** statement, as you would Python's import statement, to access these files
  from other definition or implementation files.
* **cimport** does not need to be called in a ``.pyx`` file for a ``.pxd`` file that has the
  same name, as they are already in the same namespace.
* For cimport to find the stated definition file, the directory containing the file must be
  passed with the ``-I`` option of the **Cython compile command**.

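
A minimal sketch of how these pieces fit together, assuming a hypothetical module ``dishes``
with a matching definition file and a second hypothetical module ``restaurant`` that uses it
(the module, type and function names are illustrative only and not part of Cython)::

    # dishes.pxd: definition file, declarations only
    cdef struct Dish:
        int tons
        float price

    cdef float total_price(Dish *d)

::

    # dishes.pyx: implementation file; dishes.pxd shares its namespace,
    # so no cimport is needed here
    cdef float total_price(Dish *d):
        return d.tons * d.price

::

    # restaurant.pyx: a different module has to cimport the declarations
    from dishes cimport Dish, total_price

    def bill(int tons, float price):
        cdef Dish d
        d.tons = tons
        d.price = price
        return total_price(&d)

If ``dishes.pxd`` lives outside the current directory, its directory would be given to the
compiler with something like ``cython -I path/to/pxd_dir restaurant.pyx``, where
``path/to/pxd_dir`` is a placeholder.
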
compilation order
`````````````````

* When a ``.pyx`` file is to be compiled, Cython first checks to see if a corresponding
  ``.pxd`` file exists and processes it first.

Include File
============

What can it contain?
--------------------

* Any Cythonic code really, because the entire file is textually embedded at the location
  you prescribe.

How do I use it?
----------------

* Include the ``.pxi`` file with an ``include`` statement like: ``include "spamstuff.pxi"``
* The ``include`` statement can appear anywhere in your Cython file and at any indentation level
* The code in the ``.pxi`` file needs to be rooted at the "zero" indentation level.
* The included code can itself contain other ``include`` statements.

====================
Declaring Data Types
====================

As a dynamic language, Python encourages a programming style of considering classes and objects
in terms of their methods and attributes, more than where they fit into the class hierarchy.

This can make Python a very relaxed and comfortable language for rapid development, but at a
price: the 'red tape' of managing data types is dumped onto the interpreter. At run time, the
interpreter does a lot of work searching namespaces, fetching attributes and parsing argument
and keyword tuples. This run-time ‘late binding’ is a major cause of Python’s relative slowness
compared to ‘early binding’ languages such as C++.

However, with Cython it is possible to gain significant speed-ups through the use of
‘early binding’ programming techniques.

.. note:: Typing is not a necessity

    Providing static typing to parameters and variables is a convenience to speed up your code,
    but it is not a necessity. Optimize where and when needed.

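
As a rough illustration of optimizing only where needed, the sketch below shows the same
hypothetical function twice: once untyped, and once with C types on its arguments and loop
variables. The function names are made up for this example, and any speed-up depends on the
code and compiler, so treat it as a starting point rather than a benchmark::

    # Untyped: every value is a Python object and every operation is
    # dispatched at run time
    def integrate_py(a, b, n):
        dx = (b - a) / n
        s = 0.0
        for i in range(n):
            x = a + i * dx
            s += x * x
        return s * dx

    # Typed: arguments are converted on entry, and the loop body works
    # on a C int and C doubles
    def integrate_typed(double a, double b, int n):
        cdef int i
        cdef double x
        cdef double dx = (b - a) / n
        cdef double s = 0.0
        for i in range(n):
            x = a + i * dx
            s += x * x
        return s * dx

Both versions are called the same way from Python; only the body of the second one is
early-bound.
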
The cdef Statement
==================

The ``cdef`` statement is used to make C level declarations for:

:Variables:

::

    cdef int i, j, k
    cdef float f, g[42], *h

:Structs:

::

    cdef struct Grail:
        int age
        float volume

:Unions:

::

    cdef union Food:
        char *spam
        float *eggs

:Enums:

::

    cdef enum CheeseType:
        cheddar, edam,
        camembert

    cdef enum CheeseState:
        hard = 1
        soft = 2
        runny = 3

:Functions:

::

    cdef int eggs(unsigned long l, float f):
        ...

:Extension Types:

::

    cdef class Spam:
        ...

.. note:: Constants

    Constants can be defined by using an anonymous enum::

        cdef enum:
            tons_of_spam = 3

Grouping cdef Declarations
==========================

A series of declarations can be grouped into a ``cdef`` block::

    cdef:
        struct Spam:
            int tons

        int i
        float g    # named g so it does not clash with the function f below
        Spam *p

        void f(Spam *s):
            print s.tons, "Tons of spam"

.. note:: ctypedef statement

    The ``ctypedef`` statement is provided for naming types::

        ctypedef unsigned long ULong

        ctypedef int *IntPtr

Parameters
==========

* Both C and Python **function** types can be declared to have parameters with C data types.
* Use normal C declaration syntax::

      def spam(int i, char *s):
          ...

      cdef int eggs(unsigned long l, float f):
          ...

* As these parameters are passed into a Python declared function, they are automatically
  **converted** to the specified C type value.

  * This holds true only for numeric and string types

* If no type is specified for a parameter or a return value, it is assumed to be a Python object

  * The following takes two Python objects as parameters and returns a Python object::

        cdef spamobjs(x, y):
            ...

    .. note::

        This is different from C language behavior, where it is an int by default.

* Python object types have reference counting performed according to the standard Python C-API rules:

  * Borrowed references are taken as parameters
  * New references are returned

.. todo::
    link or label here the one ref count caveat for numpy.

* The name ``object`` can be used to explicitly declare something as a Python Object.

  * For the sake of code clarity, it is recommended to always use ``object`` explicitly in your code.

  * This is also useful for cases where the name being declared would otherwise be taken for a type::

        cdef foo(object int):
            ...

  * As a return type::

        cdef object foo(object int):
            ...

.. todo::
    Do a see also here ..??

Optional Arguments
------------------

* Are supported for ``cdef`` and ``cpdef`` functions
* There are differences, though, depending on whether you declare them in a ``.pyx`` file or a ``.pxd`` file

  * When in a ``.pyx`` file, the signature is the same as it is in Python itself::

        cdef class A:
            cdef foo(self):
                print "A"
        cdef class B(A):
            cdef foo(self, x=None):
                print "B", x
        cdef class C(B):
            cpdef foo(self, x=True, int k=3):
                print "C", x, k

  * When in a ``.pxd`` file, the default values are replaced by ``*``, as in ``cdef foo(x=*)``::

        cdef class A:
            cdef foo(self)
        cdef class B(A):
            cdef foo(self, x=*)
        cdef class C(B):
            cpdef foo(self, x=*, int k=*)

  * The number of arguments may increase when subclassing, but the arg types and order must be the same.

* There may be a slight performance penalty when the optional arg is overridden with one that
  does not have default values.

Keyword-only Arguments
=======================

* As in Python 3, ``def`` functions can have keyword-only arguments listed after a ``"*"``
  parameter and before a ``"**"`` parameter if any::

      def f(a, b, *args, c, d = 42, e, **kwds):
          ...

  * Shown above, the ``c``, ``d`` and ``e`` arguments can not be passed as positional arguments
    and must be passed as keyword arguments.
  * Furthermore, ``c`` and ``e`` are required keyword arguments since they do not have a default value.

* If the parameter name after the ``"*"`` is omitted, the function will not accept any extra
  positional arguments::

      def g(a, b, *, c, d):
          ...

* Shown above, the signature takes exactly two positional parameters and has two required
  keyword parameters

Automatic Type Conversion
=========================

* For basic numeric and string types, conversion is performed automatically in most situations
  when a Python object is used in the context of a C value, and vice versa.

* The following table summarizes the conversion possibilities, assuming ``sizeof(int) == sizeof(long)``:

+----------------------------+--------------------+------------------+
| C types                    | From Python types  | To Python types  |
+============================+====================+==================+
| [unsigned] char            | int, long          | int              |
+----------------------------+                    |                  |
| [unsigned] short           |                    |                  |
+----------------------------+                    |                  |
| int, long                  |                    |                  |
+----------------------------+--------------------+------------------+
| unsigned int               | int, long          | long             |
+----------------------------+                    |                  |
| unsigned long              |                    |                  |
+----------------------------+                    |                  |
| [unsigned] long long       |                    |                  |
+----------------------------+--------------------+------------------+
| float, double, long double | int, long, float   | float            |
+----------------------------+--------------------+------------------+
| char *                     | str/bytes          | str/bytes [#]_   |
+----------------------------+--------------------+------------------+
| struct                     |                    | dict             |
+----------------------------+--------------------+------------------+

.. note:: **Python String in a C Context**

    * A Python string, passed to a C context expecting a ``char*``, is only valid as long as
      the Python string exists.
    * A reference to the Python string must be kept around for as long as the C string is needed.
    * If this can't be guaranteed, then make a copy of the C string.
    * Cython may produce an error message: ``Obtaining char* from a temporary Python value`` and
      will not resume compiling in situations like this::

          cdef char *s
          s = pystring1 + pystring2

    * The reason is that concatenating two strings in Python produces a temporary variable.

      * The variable is decrefed, and the Python string deallocated, as soon as the statement has finished.
      * Therefore the lvalue ``s`` is left dangling.

    * The solution is to assign the result of the concatenation to a Python variable, and then
      obtain the ``char*`` from that::

          cdef char *s
          p = pystring1 + pystring2
          s = p

    .. note:: **It is up to you to be aware of this, and not to depend on Cython's error message, as it is not guaranteed to be generated for every situation.**

Type Casting
=============

* Type casts are written with ``"<"`` and ``">"`` around the target type

  .. note::
      The syntax is different from C convention

  ::

      cdef char *p
      cdef float *q
      p = <char*>q

* If one of the types in a cast ``<type>x`` is a Python object, Cython will try to do a coercion.

  .. note:: Cython will not stop a casting where there is no conversion, but it will emit a warning.

* If the address is what is wanted, cast to a ``void*`` first.

Type Checking
-------------

* A cast like ``<MyExtensionType>x`` will cast x to type ``MyExtensionType`` without type checking at all.

* To have a cast type checked, use the syntax like: ``<MyExtensionType?>x``.

  * In this case, Cython will throw an error if ``x`` is not an instance of ``MyExtensionType``
    (or of a subclass)

* Automatic type checking for extension types can be obtained whenever an extension type is
  used as the second parameter of ``isinstance()``

Python Objects
==============

==========================
Statements and Expressions
==========================

* For the most part, control structures and expressions follow Python syntax.
* When applied to Python objects, the semantics are the same unless otherwise noted.
* Most Python operators can be applied to C values with the obvious semantics.
* An expression with mixed Python and C values will have **conversions** performed automatically.
* Python operations are automatically checked for errors, with the appropriate action taken.

Differences Between Cython and C
================================

* Most notable are C constructs which have no direct equivalent in Python.

* An integer literal is treated as a C constant

  * It will be truncated to whatever size your C compiler thinks appropriate.
  * Cast to a Python object like this::

        <object>10000000000000000000

  * The ``"L"``, ``"LL"`` and the ``"U"`` suffixes have the same meaning as in C

* There is no ``->`` operator in Cython. Instead of ``p->x``, use ``p.x``.
* There is no unary ``*`` (dereference) operator in Cython. Instead of ``*p``, use ``p[0]``.
* ``&`` is permissible and has the same semantics as in C.
* ``NULL`` is the null C pointer.

  * Do NOT use 0.
  * ``NULL`` is a reserved word in Cython

* The syntax for **type casts** is ``<type>value``.

Scope Rules
===========

* All scoping (local, module, built-in) in Cython is determined statically.
* As with Python, a variable assignment which is not declared explicitly is implicitly declared
  to be a Python variable residing in the scope where it was assigned.

.. note::
    * Module-level scope behaves the same way as a Python local scope if you refer to the
      variable before assigning to it.

      * Tricks like the following will NOT work in Cython::

            try:
                x = True
            except NameError:
                True = 1

      * The above example will not work because ``True`` will always be looked up in the
        module-level scope. Do the following instead::

            import __builtin__
            try:
                True = __builtin__.True
            except AttributeError:
                True = 1

Built-in Constants
==================

Predefined Python built-in constants:

* None
* True
* False

Operator Precedence
===================

* Cython uses Python precedence order, not C

For-loops
==========

* ``range()`` is C optimized when the index variable has been declared as a C integer by ``cdef``::

      cdef int i
      for i in range(n):
          ...

* Iteration over C arrays is also permitted, e.g. ::

      cdef double x
      cdef double* data
      for x in data[:10]:
          ...

* Iterating over many builtin types such as lists and tuples is optimized.

* There is also a more C-style for-from syntax

  * The target expression must be a variable name.
  * The name between the lower and upper bounds must be the same as the target name.


    For example::

        for i from 0 <= i < n:
            ...

  * Or when using a step size::

        for i from 0 <= i < n by s:
            ...

  * To reverse the direction, reverse the conditional operation::

        for i from n > i >= 0:
            ...

* The ``break`` and ``continue`` statements are permissible.
* Can contain an else clause.

=====================
Functions and Methods
=====================

* There are three types of function declarations in Cython, as the sub-sections below show.
* Only "Python" functions can be called outside a Cython module from *Python interpreted code*.

Callable from Python
=====================

* Are declared with the ``def`` statement
* Are called with Python objects
* Return Python objects
* See **Parameters** for special consideration

Callable from C
================

* Are declared with the ``cdef`` statement.
* Are called with either Python objects or C values.
* Can return either Python objects or C values.

Callable from both Python and C
================================

* Are declared with the ``cpdef`` statement.
* Can be called from anywhere, because Cython generates both a Python-callable and a C-callable entry point.
* Use the faster C calling conventions when being called from other Cython code.

Overriding
==========

``cpdef`` functions can override ``cdef`` functions::

    cdef class A:
        cdef foo(self):
            print "A"
    cdef class B(A):
        cdef foo(self, x=None):
            print "B", x
    cdef class C(B):
        cpdef foo(self, x=True, int k=3):
            print "C", x, k

Function Pointers
=================

* Functions declared in a ``struct`` are automatically converted to function pointers.
* see **using exceptions with function pointers**

Python Built-ins
================

The following are provided:

.. todo:: incomplete

+------------------------------+-------------+----------------------------+
| Function and arguments       | Return type | Python/C API Equivalent    |
+==============================+=============+============================+
| abs(obj)                     | object      | PyNumber_Absolute          |
+------------------------------+-------------+----------------------------+
| bool(obj)                    | object      | Py_True, Py_False          |
+------------------------------+-------------+----------------------------+
| chr(obj)                     | object      | char                       |
+------------------------------+-------------+----------------------------+
| delattr(obj, name)           | int         | PyObject_DelAttr           |
+------------------------------+-------------+----------------------------+
| dir(obj)                     | object      | PyObject_Dir               |
| getattr(obj, name) (Note 1)  |             |                            |
| getattr3(obj, name, default) |             |                            |
+------------------------------+-------------+----------------------------+
| hasattr(obj, name)           | int         | PyObject_HasAttr           |
+------------------------------+-------------+----------------------------+
| hash(obj)                    | int         | PyObject_Hash              |
+------------------------------+-------------+----------------------------+
| intern(obj)                  | object      | PyString_InternFromString  |
+------------------------------+-------------+----------------------------+
| isinstance(obj, type)        | int         | PyObject_IsInstance        |
+------------------------------+-------------+----------------------------+
| issubclass(obj, type)        | int         | PyObject_IsSubclass        |
+------------------------------+-------------+----------------------------+
| iter(obj)                    | object      | PyObject_GetIter           |
+------------------------------+-------------+----------------------------+
| len(obj)                     | Py_ssize_t  | PyObject_Length            |
+------------------------------+-------------+----------------------------+
| pow(x, y, z) (Note 2)        | object      | PyNumber_Power             |
+------------------------------+-------------+----------------------------+
| reload(obj)                  | object      | PyImport_ReloadModule      |
+------------------------------+-------------+----------------------------+
| repr(obj)                    | object      | PyObject_Repr              |
+------------------------------+-------------+----------------------------+
| setattr(obj, name)           | void        | PyObject_SetAttr           |
+------------------------------+-------------+----------------------------+

============================
Error and Exception Handling
============================

* A plain ``cdef`` declared function, that does not return a Python object...

  * Has no way of reporting a Python exception to its caller.
  * Will only print a warning message and the exception is ignored.

* In order to propagate exceptions like this to its caller, you need to declare an exception value for it.
* There are three forms of declaring an exception for a C compiled program.

  * First::

        cdef int spam() except -1:
            ...

    * In the example above, if an error occurs inside spam, it will immediately return with
      the value of ``-1``, causing an exception to be propagated to its caller.
    * Functions declared with an exception value should explicitly prevent a return of that value.

  * Second::

        cdef int spam() except? -1:
            ...

    * Used when a ``-1`` may possibly be returned and is not to be considered an error.
    * The ``"?"`` tells Cython that ``-1`` only indicates a *possible* error.
    * Now, each time ``-1`` is returned, Cython generates a call to ``PyErr_Occurred`` to
      verify it is an actual error.

  * Third::

        cdef int spam() except *:
            ...

    * A call to ``PyErr_Occurred`` happens *every* time the function gets called.

      .. note:: Returning ``void``

          A function returning ``void`` that needs to propagate errors must use this version.

* Exception values can only be declared for functions returning an..

  * integer
  * enum
  * float
  * pointer type
  * Must be a constant expression

.. note:: Function pointers

    * Require the same exception value specification as their user has declared.
    * Use cases here are when used as parameters and when assigned to a variable::

          int (*grail)(int, char *) except -1

.. note:: Python Objects

    * Declared exception values are **not** needed.
    * Remember that Cython assumes that a function without a declared return value
      returns a Python object.
    * Exceptions on such functions are implicitly propagated by returning ``NULL``

.. note:: C++

    * For exceptions from C++ compiled programs, see **Wrapping C++ Classes**

Checking return values for non-Cython functions
=================================================

* Do not try to raise exceptions by returning the specified value. Example::

      cdef extern FILE *fopen(char *filename, char *mode) except NULL # WRONG!

  * The except clause does not work that way.
  * Its only purpose is to propagate Python exceptions that have already been raised by either...

    * A Cython function
    * A C function that calls Python/C API routines.

* To propagate an exception for these circumstances you need to raise it yourself::

      cdef FILE *p
      p = fopen("spam.txt", "r")
      if p == NULL:
          raise SpamError("Couldn't open the spam file")

=======================
Conditional Compilation
=======================

* The expressions in the following sub-sections must be valid compile-time expressions.
* They can evaluate to any Python value.
* The *truth* of the result is determined in the usual Python way.

Compile-Time Definitions
=========================

* Defined using the ``DEF`` statement::

      DEF FavouriteFood = "spam"
      DEF ArraySize = 42
      DEF OtherArraySize = 2 * ArraySize + 17

* The right hand side must be a valid compile-time expression made up of either:

  * Literal values
  * Names defined by other ``DEF`` statements

* They can be combined using any of the Python expression syntax
* Cython provides the following predefined names

  * Corresponding to the values returned by ``os.uname()``

    * UNAME_SYSNAME
    * UNAME_NODENAME
    * UNAME_RELEASE
    * UNAME_VERSION
    * UNAME_MACHINE

* A name defined by ``DEF`` can appear anywhere an identifier can appear.
* Cython replaces the name with the literal value before compilation.
* The compile-time expression, in this case, must evaluate to a Python value of ``int``,
  ``long``, ``float``, or ``str``::

      cdef int a1[ArraySize]
      cdef int a2[OtherArraySize]
      print "I like", FavouriteFood

Conditional Statements
=======================

* The semantics are similar to those of the C pre-processor
* The following statements can be used to conditionally include or exclude sections of code to compile.

  * ``IF``
  * ``ELIF``
  * ``ELSE``

::

    IF UNAME_SYSNAME == "Windows":
        include "icky_definitions.pxi"
    ELIF UNAME_SYSNAME == "Darwin":
        include "nice_definitions.pxi"
    ELIF UNAME_SYSNAME == "Linux":
        include "penguin_definitions.pxi"
    ELSE:
        include "other_definitions.pxi"

* ``ELIF`` and ``ELSE`` are optional.
* ``IF`` can appear anywhere that a normal statement or declaration can appear
* It can contain any statements or declarations that would be valid in that context.

  * This includes other ``IF`` and ``DEF`` statements

.. [#] The conversion is to/from str for Python 2.x, and bytes for Python 3.x.