Last Updated on June 21, 2022
The Python language syntax is quite powerful and expressive, so an algorithm can often be expressed concisely in Python. This may be one reason why it is popular in machine learning, where we need to experiment a lot while developing a model.
If you’re new to Python but have experience in another programming language, you will sometimes find Python syntax understandable but weird. If you are used to writing in C++ or Java and are transitioning to Python, your programs are likely not yet Pythonic.
In this tutorial, we will cover several common language features that distinguish Python from other programming languages.
Let’s get started.
Tutorial Overview
This tutorial is divided into four parts; they are:
- Operators
- Built-in data structures
- Special variables
- Built-in functions
Operators
Most of the operators used in Python are the same as in other languages. The precedence table is as follows, listed from highest precedence to lowest and adapted from Chapter 6 of the Python Language Reference (https://docs.python.org/3/reference/expressions.html):
Operator | Description |
---|---|
(expressions…), [expressions…], {key: value…}, {expressions…} | Binding or parenthesized expression, list display, dictionary display, set display |
x[index], x[index:index], x(arguments…), x.attribute | Subscription, slicing, call, attribute reference |
await x | Await expression |
** | Exponentiation |
+x, -x, ~x | Positive, negative, bitwise NOT |
*, @, /, //, % | Multiplication, matrix multiplication, division, floor division, remainder |
+, - | Addition and subtraction |
<<, >> | Shifts |
& | Bitwise AND |
^ | Bitwise XOR |
\| | Bitwise OR |
in, not in, is, is not, <, <=, >, >=, !=, == | Comparisons, including membership tests and identity tests |
not x | Boolean NOT |
and | Boolean AND |
or | Boolean OR |
if – else | Conditional expression |
lambda | Lambda expression |
:= | Assignment expression |
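A few quick checks can make the table more concrete. The following is only an illustrative sketch; the variable names are made up for this example, and the comments show how Python groups each expression according to the table.

```python
# Exponentiation binds tighter than the unary minus on its left
neg_square = -2 ** 2          # parsed as -(2 ** 2)

# Multiplication binds tighter than addition
mixed = 2 + 3 * 4             # parsed as 2 + (3 * 4)

# Comparisons bind tighter than "not", which binds tighter than "or"
flag = not 1 == 2 or False    # parsed as (not (1 == 2)) or False

# Bitwise AND binds tighter than comparisons (the opposite of C)
masked = 6 & 3 == 2           # parsed as (6 & 3) == 2

print(neg_square, mixed, flag, masked)   # -4 14 True True
```

When in doubt, adding parentheses costs nothing and makes the intent obvious.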
Some key differences from other languages:
- Boolean operators are spelled out (`and`, `or`, `not`), while bitwise operators are the characters `&`, `^`, and `|`
- Exponentiation uses `**`, as in `2**3`
- Integer division uses `//`, and division `/` always gives you floating point values
- Ternary operator: If you are familiar with the expression `(x)?a:b` in C, we write it as `a if x else b` in Python
- Comparing whether two things are equal can use either `==` or `is`. The `==` operator is the same as in other languages for equality, but `is` is stricter, reserved for whether the two variables point to the same object
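The differences listed above can be tried out in a few lines. This is a minimal sketch; the names `x`, `label`, `a`, `b`, and `c` are chosen only for illustration.

```python
print(7 / 2)     # true division always returns a float: 3.5
print(6 / 2)     # still a float: 3.0
print(7 // 2)    # floor division: 3
print(-7 // 2)   # floors toward negative infinity: -4
print(2 ** 3)    # exponentiation: 8

x = 5
label = "positive" if x > 0 else "non-positive"   # conditional expression
print(label)     # positive

a = [1, 2, 3]
b = [1, 2, 3]    # a separate list with equal contents
c = a            # another name for the same list object
print(a == b, a is b)   # True False
print(a == c, a is c)   # True True
```

In practice, `is` is mostly used for comparisons against `None`, while `==` is the right choice for comparing values.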
In Python, comparison operators can be chained. For example, to test if a value is between -1 and +1, we can do:

```python
if value > -1 and value < 1:
    ...
```

but we can also do:

```python
if -1 < value < 1:
    ...
```
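A comparison such as `-1 < value < 1` behaves like the two separate tests joined with `and`, except that the middle expression is evaluated only once. The sketch below wraps the test in a hypothetical helper, `in_unit_range`, so it can be run on its own.

```python
def in_unit_range(value):
    # Equivalent to: value > -1 and value < 1
    return -1 < value < 1

print(in_unit_range(0.5))   # True
print(in_unit_range(3))     # False

# Chains can be longer and can mix different comparison operators
a, b, c = 1, 2, 2
print(a < b == c)           # True, same as: a < b and b == c
```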
Built-in Data Structures
As in many other languages, we have integer and floating point data types in Python. But there are also complex numbers (e.g., `3+1j`), Booleans as constants (`True` and `False`), strings, as well as `None`, a special value that stands for “no value.”
But the power of Python as a language lies in the fact that container types are built in: Python arrays are called “list,” and they expand automatically. Associative arrays (or hash tables) are called “dict.” We also have “tuple” as a read-only list and “set” as a container for unique items. In C++, for example, you would need the STL to give you these features.
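As a quick orientation, the four container types can be created with literal syntax. The sketch below uses made-up names and values purely for illustration.

```python
numbers = [1, 2, 3]            # list: ordered, grows as needed
numbers.append(4)

ages = {"alice": 30}           # dict: maps keys to values
ages["bob"] = 25

point = (3, 4)                 # tuple: fixed once created

tags = {"red", "blue", "red"}  # set: duplicates are dropped

print(numbers)      # [1, 2, 3, 4]
print(ages["bob"])  # 25
print(point[0])     # 3
print(len(tags))    # 2
```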
The “dict” data structure is probably the most powerful one in Python and gives us some convenience in writing code. For example, in the problem of image classification between dogs and cats, our machine learning model may give us only a value of 0 or 1. If we want to print the name instead, we can do:

```python
value = 0   # This is obtained from a model

value_to_name = {0: "cat", 1: "dog"}
print("Result is %s" % value_to_name[value])
```

```
Result is cat
```
In this case, we make use of the dict `value_to_name` as a lookup table. Similarly, we can also make use of the dict to build a counter:

```python
sentence = "Portez ce vieux whisky au juge blond qui fume"
counter = {}
for char in sentence:
    if char not in counter:
        counter[char] = 0
    counter[char] += 1

print(counter)
```

```
{'P': 1, 'o': 2, 'r': 1, 't': 1, 'e': 5, 'z': 1, ' ': 8, 'c': 1, 'v': 1, 'i': 3, 'u': 5, 'x': 1, 'w': 1, 'h': 1, 's': 1, 'k': 1, 'y': 1, 'a': 1, 'j': 1, 'g': 1, 'b': 1, 'l': 1, 'n': 1, 'd': 1, 'q': 1, 'f': 1, 'm': 1}
```

This will build a dict called `counter` that maps each character to the number of occurrences in the sentence.
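The membership test in the counting loop can be folded into a single line with the dict method `get()`, which returns a default when the key is missing. This is just one possible variation on the same idea, not a replacement for the version above.

```python
sentence = "Portez ce vieux whisky au juge blond qui fume"

counter = {}
for char in sentence:
    # get() returns the current count, or 0 if the key is not there yet
    counter[char] = counter.get(char, 0) + 1

print(counter["e"])   # 5
print(counter[" "])   # 8
```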
Python lists also come with powerful syntax. Unlike in some other languages, we can put anything into a list:

```python
A = [1, 2, "fizz", 4, "buzz", "fizz", 7]
A += [8, "fizz", "buzz", 11, "fizz", 13, 14, "fizzbuzz"]
print(A)
```

```
[1, 2, 'fizz', 4, 'buzz', 'fizz', 7, 8, 'fizz', 'buzz', 11, 'fizz', 13, 14, 'fizzbuzz']
```

and we can use `+` to concatenate lists. In the above, we use `+=` to extend the list `A`.
Python lists have slicing syntax. For example, with the list `A` above, `A[1:3]` means elements 1 and 2, i.e., `[2, "fizz"]`, and `A[1:1]` is an empty list. Indeed, we can assign something to a slice to insert or remove some elements. For example:

```python
...
A[2:2] = [2.1, 2.2]
print(A)
```

```
[1, 2, 2.1, 2.2, 'fizz', 4, 'buzz', 'fizz', 7, 8, 'fizz', 'buzz', 11, 'fizz', 13, 14, 'fizzbuzz']
```

and then,

```python
...
A[0:2] = []
print(A)
```

```
[2.1, 2.2, 'fizz', 4, 'buzz', 'fizz', 7, 8, 'fizz', 'buzz', 11, 'fizz', 13, 14, 'fizzbuzz']
```
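Slices also accept omitted bounds, negative indices, and a step. The sketch below starts again from a short list so that it can run on its own; the name `B` is introduced only for this example.

```python
A = [1, 2, "fizz", 4, "buzz", "fizz", 7]

print(A[:3])     # first three elements: [1, 2, 'fizz']
print(A[-2:])    # last two elements: ['fizz', 7]
print(A[::2])    # every second element: [1, 'fizz', 'buzz', 7]
print(A[::-1])   # reversed copy: [7, 'fizz', 'buzz', 4, 'fizz', 2, 1]

B = A[:]         # a shallow copy; changing B leaves A untouched
B[0] = "one"
print(A[0], B[0])   # 1 one
```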
A tuple has a syntax similar to a list, except it is defined using parentheses:

```python
A = ("foo", "bar")
```

A tuple is immutable, which means you cannot modify it once it is defined. In Python, if you put several things together separated by commas, they are assumed to form a tuple. The significance of this is that we can swap two variables with a very clean syntax:

```python
a = 42
b = "foo"
print("a is %s; b is %s" % (a,b))
a, b = b, a   # swap
print("After swap, a is %s; b is %s" % (a,b))
```

```
a is 42; b is foo
After swap, a is foo; b is 42
```
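The swap works because the right-hand side is packed into a tuple and then unpacked into the names on the left. The sketch below shows the same mechanism in a few other forms, along with what happens when we try to modify a tuple; all names are illustrative.

```python
point = 3, 4                  # the parentheses are optional; this is a tuple
print(type(point).__name__)   # tuple

x, y = point                  # unpack into separate variables
print(x, y)                   # 3 4

first, *rest = (1, 2, 3, 4)   # a starred name collects the remainder as a list
print(first, rest)            # 1 [2, 3, 4]

try:
    point[0] = 10             # tuples do not support item assignment
except TypeError as err:
    print("Cannot modify:", err)
```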
Finally, as you have seen in the examples above, Python strings support substitution on the fly. With a template syntax similar to the `printf()` function in C, we can use `%s` to substitute a string or `%d` to substitute an integer. We can also use `%.3f` to substitute a floating point number with 3 decimal places. Below is an example:

```python
template = "Square root of %d is %.3f"
n = 10
answer = template % (n, n**0.5)
print(answer)
```

```
Square root of 10 is 3.162
```

But this is just one of the many ways to do it. The above can also be achieved using an f-string or the `format()` method.
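For comparison, here is a sketch of the same square-root message written in all three styles. The variable names are chosen for this example only.

```python
n = 10
root = n ** 0.5

with_percent = "Square root of %d is %.3f" % (n, root)
with_format = "Square root of {} is {:.3f}".format(n, root)
with_fstring = f"Square root of {n} is {root:.3f}"

print(with_fstring)   # Square root of 10 is 3.162
print(with_percent == with_format == with_fstring)   # True
```

f-strings require Python 3.6 or later and tend to be the most readable when the values are already held in variables.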
Special variables
Python has several “special variables” predefined. `__name__` holds the name of the current module (it is `"__main__"` when the file is run as a script), and `__file__` holds the path of the file the module was loaded from. More will be found inside objects, but most of them are not supposed to be used directly. As a convention (i.e., just a habit, as no one is stopping you from doing otherwise), we name internal variables with an underscore or double underscore as a prefix (by the way, double underscores are pronounced “dunder” by some people). If you’re from C++ or Java, these are comparable to the private members of a class, although they are not technically private.
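A common use of `__name__` is to run some code only when a file is executed directly rather than imported. The sketch below also shows the leading-underscore naming convention; the function `main` and the class `Account` are hypothetical and exist only for this illustration.

```python
def main():
    # Hypothetical entry point for this illustration
    return "running as a script"

# "__main__" when the file is run directly, the module name when imported
print(__name__)

if __name__ == "__main__":
    print(main())


class Account:
    def __init__(self, balance):
        # Single leading underscore: internal by convention only
        self._balance = balance

acct = Account(100)
print(acct._balance)   # 100; still accessible, nothing enforces privacy
```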
One notable “special” variable that you may often see in Python code is `_`, just an underscore character. By convention, it means a variable that we do not care about. Why do you need a variable if you don’t care? That’s because sometimes you have to hold a return value from a function. For example, in pandas, we can scan each row of a dataframe:

```python
import pandas as pd
A = pd.DataFrame([[1,2,3],[2,3,4],[3,4,5],[5,6,7]], columns=["x","y","z"])
print(A)

for _, row in A.iterrows():
    print(row["z"])
```

```
   x  y  z
0  1  2  3
1  2  3  4
2  3  4  5
3  5  6  7
3
4
5
7
```

In the above, we can see that the dataframe has three columns, “x,” “y,” and “z,” and the rows are indexed by 0 to 3. If we call `A.iterrows()`, it will give us the index and the row one by one, but we don’t care about the index. We can just create a new variable to hold it but not use it. To clarify that we are not going to use it, we use `_` as the variable to hold the index while the row is stored in the variable `row`.
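The same convention works without pandas, anywhere a value has to be received but is not needed. A small sketch with made-up data:

```python
pairs = [("x", 1), ("y", 2), ("z", 3)]

# Keep only the second item of each pair; the first is deliberately ignored
values = [value for _, value in pairs]
print(values)      # [1, 2, 3]

# Repeat an action three times without using the loop counter
greetings = []
for _ in range(3):
    greetings.append("hello")
print(greetings)   # ['hello', 'hello', 'hello']
```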
Built-in functions
In Python, a small number of functions are defined as built-in, while other functionality is delivered in other packages. The list of all built-in functions is available in the Python Standard Library documentation (https://docs.python.org/3/library/functions.html). Below are those defined in Python 3.10:
```
abs()        aiter()       all()           any()         anext()
ascii()      bin()         bool()          breakpoint()  bytearray()
bytes()      callable()    chr()           classmethod() compile()
complex()    delattr()     dict()          dir()         divmod()
enumerate()  eval()        exec()          filter()      float()
format()     frozenset()   getattr()       globals()     hasattr()
hash()       help()        hex()           id()          input()
int()        isinstance()  issubclass()    iter()        len()
list()       locals()      map()           max()         memoryview()
min()        next()        object()        oct()         open()
ord()        pow()         print()         property()    range()
repr()       reversed()    round()         set()         setattr()
slice()      sorted()      staticmethod()  str()         sum()
super()      tuple()       type()          vars()        zip()
__import__()
```
Not all are used every day, but some are particularly notable:
`zip()` allows you to combine multiple lists together, stopping at the end of the shortest one. For example,

```python
a = ["x", "y", "z"]
b = [3, 5, 7, 9]
c = [2.1, 2.5, 2.9]
for x in zip(a, b, c):
    print(x)
```

```
('x', 3, 2.1)
('y', 5, 2.5)
('z', 7, 2.9)
```

And it is handy if you want to “pivot” (transpose) a list of lists, e.g.,

```python
a = [['x', 3, 2.1], ['y', 5, 2.5], ['z', 7, 2.9]]
p,q,r = zip(*a)
print(p)
print(q)
print(r)
```

```
('x', 'y', 'z')
(3, 5, 7)
(2.1, 2.5, 2.9)
```
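Two more things worth knowing about `zip()`: it returns a lazy iterator rather than a list, and its pairs can be fed straight into `dict()`. The sketch below uses illustrative names and values.

```python
names = ["x", "y", "z"]
values = [3, 5, 7, 9]        # one item longer than names

pairs = list(zip(names, values))
print(pairs)     # [('x', 3), ('y', 5), ('z', 7)]; the extra 9 is dropped

lookup = dict(zip(names, values))
print(lookup)    # {'x': 3, 'y': 5, 'z': 7}
```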
`enumerate()` is handy for numbering a list of items, for example:

```python
a = ["quick", "brown", "fox", "jumps", "over"]
for num, item in enumerate(a):
    print("item %d is %s" % (num, item))
```

```
item 0 is quick
item 1 is brown
item 2 is fox
item 3 is jumps
item 4 is over
```

This is equivalent to the following if you do not use `enumerate`:

```python
a = ["quick", "brown", "fox", "jumps", "over"]
for num in range(len(a)):
    print("item %d is %s" % (num, a[num]))
```
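If the numbering should begin somewhere other than zero, `enumerate()` takes an optional `start` argument. A short sketch, collecting the lines in a list named `lines` (a name used only here):

```python
a = ["quick", "brown", "fox", "jumps", "over"]

lines = []
for num, item in enumerate(a, start=1):
    lines.append("item %d is %s" % (num, item))

print(lines[0])    # item 1 is quick
print(lines[-1])   # item 5 is over
```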
Compared to other languages, the for loop in Python iterates over the items of a sequence or other iterable rather than initializing, testing, and updating a counter in each iteration. In other words, there is no direct equivalent to the following C for loop:

```c
for (i=0; i<100; ++i) {
    ...
}
```

and in Python, we have to use `range()` to do the same:

```python
for i in range(100):
    ...
```
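`range()` also accepts a start value and a step, which covers most counting patterns of a C-style loop. When the loop variable really has to be updated by hand, a `while` loop is the closest match. The sketch below is illustrative; `visited` is a name made up for this example.

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(2, 10, 3)))   # [2, 5, 8]
print(list(range(5, 0, -1)))   # [5, 4, 3, 2, 1]

# A while loop spells out the initialize / test / update steps of a C for loop
i = 0
visited = []
while i < 5:
    visited.append(i)
    i += 1
print(visited)                 # [0, 1, 2, 3, 4]
```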
In a similar sense, there are some functions that manipulate a list (or list-like data structures, which Python calls the “iterables”):
- `max(a)`: To find the maximum value in list `a`
- `min(a)`: To find the minimum value in list `a`
- `sum(a)`: To find the sum of values in list `a`
- `reversed(a)`: To iterate over list `a` from the back
- `sorted(a)`: To return a copy of list `a` with elements in sorted order
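A short run of these functions on a sample list (the values are arbitrary) shows what each returns. Note that `reversed()` gives back an iterator, so it is wrapped in `list()` here for printing.

```python
a = [3, 1, 4, 1, 5]

print(max(a))              # 5
print(min(a))              # 1
print(sum(a))              # 14
print(list(reversed(a)))   # [5, 1, 4, 1, 3]
print(sorted(a))           # [1, 1, 3, 4, 5]
print(a)                   # [3, 1, 4, 1, 5]; the original is unchanged

# These also accept other iterables, such as a range
print(sum(range(1, 101)))  # 5050
```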
We will cover more on these in the next post.
Further reading
The above only highlighted some key features in Python. Surely there is no more authoritative documentation than the official documentation from Python.org; all beginners should start with the Python tutorial and check the Language Reference for syntax details and the Standard Library for additional libraries that come with the Python installation:
- The Python Tutorial – https://docs.python.org/3/tutorial/index.html
- The Python Language Reference – https://docs.python.org/3/reference/index.html
- The Python Standard Library – https://docs.python.org/3/library/index.html
For books, Learning Python by Lutz is an old but good primer. After that, Fluent Python can help you better understand the internal structure of the language. However, if you want something quick, Al Sweigart’s book can help you pick up the language with examples. Once you get familiar with Python, you may want to learn some quick tips for a particular task from the Python Cookbook.
- Learning Python, 5th Edition by Mark Lutz, O’Reilly, 2013, https://www.amazon.com/dp/1449355730/
- Fluent Python by Luciano Ramalho, O’Reilly, 2015, https://www.amazon.com/dp/1491946008/
- Automate the Boring Stuff with Python, 2nd Edition by Al Sweigart, No Starch Press, 2019, https://www.amazon.com/dp/1593279922/
- Python Cookbook, 3rd Edition by David Beazley and Brian K. Jones, O’Reilly, 2013, https://www.amazon.com/dp/1449340377/
Summary
In this tutorial, you discovered some distinctive features of Python. Specifically, you learned:
- The operators provided by Python
- Some uses of the built-in data structures
- Some frequently used built-in functions and why they are useful
For the counter example, this is better:

```python
from collections import Counter
sentence = "Portez ce vieux whisky au juge blond qui fume"
counter = Counter(sentence)
print(counter)
```

Thanks, but the point here is to demonstrate how a dict can be used as a scoreboard.

> Compare to other languages, the for loop in Python is to iterate over a predefined range rather than computing the values in each iteration. In other words, there is not direct equivalence to the following C for loop

In what way are those not equivalent? In each, one has an integer `i` in scope throughout the loop that varies from 0 to 99 inclusive. Where’s the part where they are not equivalent?

Equivalence here is about the low-level implementation. You do not have an evaluation and assignment at each loop. The Python for loop is more like scanning a set than computing each step.

```python
i=0
while i<10:
    print(i)
    i+=1
    pass
```