The Basics

Welcome to this introductory lecture on Python for data science and geospatial data science. My primary goal is to provide an overview of the Python programming language. After working through this module you will be able to:

  1. declare variables.
  2. explain the difference between Python data types.
  3. perform mathematical and logical operations.
  4. work with lists, tuples, sets, and dictionaries.
  5. apply appropriate methods to different data types.

I assume that you have no prior experience with coding in general or Python specifically.

For a more detailed discussion of general Python, please consult w3schools.com, which is a great resource for coders, scientists, and web developers.

Variables

I think of variables as links to information or data. For example, a variable could reference a file path on your local machine, such as the path to a video file, or a set of numbers. Once you create a variable, you can use its name to reference the associated data or object in processes and analyses. Variable names must follow a few rules. They:

  • must start with a letter or an underscore, not a number or other special character (for example, x1 is valid while 1x is not).
  • can only contain letters, numbers, and underscores; no other special characters are allowed (for example, x, x1, _x, and x1_ are all valid, while x$ is not).
  • are case-sensitive (for example, x1 and X1 are two separate variables).
  • cannot be one of Python's reserved keywords, such as if, for, or True.
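One way to check a candidate name against these rules is the built-in string method isidentifier(), which tests the start-character and allowed-character rules. Reserved words such as for pass that check but still cannot be used as variable names, so the standard library keyword module's iskeyword() function is used to catch them. A short sketch:

```python
import keyword

# isidentifier() checks the start-character and allowed-character rules
print("x1".isidentifier())   # True
print("1x".isidentifier())   # False: starts with a number
print("x$".isidentifier())   # False: contains a special character

# Reserved words pass isidentifier() but still cannot be variable names
print("for".isidentifier())       # True
print(keyword.iskeyword("for"))   # True: "for" is reserved
```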

The print() function displays the data referenced by a variable.

x = 1
y = "GIS"
x1 = 2
y1 = "Remote Sensing"
_x = 3
_y = "Web GIS"
print(x)
print(y)
print(x1)
print(y1)
print(_x)
print(_y)
1
GIS
2
Remote Sensing
3
Web GIS

You can also assign data to multiple variables in a single line of code, as demonstrated below. Note that I am reusing variable names here. In Python, a variable can be reassigned at any time, so assigning new data to an existing name overwrites the data it previously referenced. This can be problematic if you overwrite a variable accidentally, so use unique names when you want to preserve prior variables.

x, x1, _x = 1, 2, 3
print(x)
print(x1)
print(_x)
1
2
3

Assignment operators are used to assign values or data to variables. The most commonly used operator is =. However, other assignment operators perform a mathematical operation on a variable's current value and assign the result back to the same variable. These additional operators can be useful, but we will use the = operator most of the time.

For example, += adds a value to the current value and assigns the result back to the variable. In the first example below, x references the value 2. The += operator is then used to add 3 to x and assign the result back to x. Work through the provided examples and make sure you understand the use of each operator.

x = 2
print(x)
x += 3
print(x)

x = 2
print(x)
x -= 3
print(x)

x = 2
print(x)
x *= 3
print(x)

x = 2
print(x)
x /= 3
print(x)

x = 2
print(x)
x **= 3
print(x)
2
5
2
-1
2
6
2
0.6666666666666666
2
8
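Each augmented assignment is shorthand for writing the operation out in full; for example, x += 3 gives the same result as x = x + 3. The same pattern also works with the floor division (//=) and modulus (%=) operators, which are described in the Numbers section. A brief sketch:

```python
# a += 3 is shorthand for a = a + 3
a = 2
a += 3
b = 2
b = b + 3
print(a == b)  # True

y = 7
y //= 2   # floor division, then assign: 7 // 2 gives 3
print(y)  # 3

z = 7
z %= 2    # modulus, then assign: 7 % 2 gives 1
print(z)  # 1
```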

I now want to step through a simple experiment that illustrates an important behavior of the Python language. In the code below, I define a variable a that holds a list of three values (we will discuss lists later in this section). I then create a new variable b and set it equal to a. Next, I edit a by appending a new value to the list (we will discuss how this is done later, so don't worry if you don't understand it yet). When I print a and b, you can see that both variables contain the same list of numbers, even though I added 8 to a after setting b equal to a. In other words, the change that I made to a was also applied to b.

In Python, certain types of objects, such as lists, are mutable. This means that the data stored in memory and referenced by a variable can be changed in place. When a mutable object is altered, all variables that point to it will reflect the change. Practically, this matters because assignment in Python never copies data: setting b equal to a makes a and b point to the same object in memory. So, changes made through a or b will be visible from both variables, since both reference the same updated data.

When I first started using Python, I struggled with this because I was used to the behavior of the R language, in which setting a variable equal to another variable effectively makes an independent copy that can be altered without changing the original variable.

To test whether two variables reference the same object in memory, you can use the is keyword. If True is returned, then they reference the same object. You can also print each object's ID using the id() function. The ID is an integer that is unique to each object that exists at the same time; in the standard CPython implementation, it is the object's memory address. Using both methods below, you can see that a and b reference the same object (the specific ID values will differ on your machine).

a = [5, 6, 7]
b = a
a.append(8)
print(a)
print(b)

print(a is b)
print(id(a))
print(id(b))
[5, 6, 7, 8]
[5, 6, 7, 8]
True
2499706086848
2499706086848

What if you wanted to make a copy of a variable referencing mutable data that does not reference the same object? For example, you may want to be able to make changes to a that do not impact b. This can be accomplished using the copy() or deepcopy() functions from the copy module.

In the experiment below, I define b as a deep copy of a. Now, changes made to a do not impact b, because deepcopy() copies the data to a new object in memory, so a and b no longer reference the same object. This is confirmed using is and id().

import copy
a = [5, 6, 7]
b = copy.deepcopy(a)
a.append(8)
print(a)
print(b)

print(a is b)
print(id(a))
print(id(b))
[5, 6, 7, 8]
[5, 6, 7]
False
2499706087360
2499706087808
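The difference between copy() and deepcopy() only appears when the copied object contains other mutable objects, such as a list of lists. A shallow copy made with copy.copy() creates a new outer list but still shares the inner lists with the original, while copy.deepcopy() copies the inner lists as well. A minimal sketch:

```python
import copy

a = [[1, 2], [3, 4]]       # a list containing two inner lists
shallow = copy.copy(a)     # new outer list, same inner lists
deep = copy.deepcopy(a)    # new outer list and new inner lists

a[0].append(99)            # change an inner list of the original

print(shallow)             # [[1, 2, 99], [3, 4]]: inner list is shared
print(deep)                # [[1, 2], [3, 4]]: fully independent
print(shallow is a)        # False: the outer lists are different objects
print(shallow[0] is a[0])  # True: the inner lists are the same object
```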

Comments

Comments are used to make your code more readable. They are ignored by the Python interpreter and are meant for humans. Different languages use different syntax to denote comments; Python uses the hash character (#), also called the pound sign or hashtag. You can add comments on their own lines or after code on the same line. Python does not have specific syntax for multi-line comments. However, the same effect can be achieved by adding a hash character before each line or by using a multi-line string (enclosed in triple quotes) that is not assigned to a variable. Examples are shown below.

It is generally a good idea to comment your code for later use and for use by others.

#Single-line comment
x = 1
y = 2 #Another single-line comment
#A
#multi-line
#comment
z = 3
"""
Another multi-line comment
"""
w = 4

Data Types

Python provides a variety of data types for storing and working with different kinds of input. Below are explanations of the data types that you will use most often; there are additional types that we will not discuss.

When creating a variable, it is generally not necessary to explicitly define the data type. However, this can be done using constructor functions if desired. Constructor functions can also be used to convert data to a different type, a process known as casting. Available constructor functions include str(), int(), float(), complex(), list(), tuple(), dict(), set(), and bool().

To determine the data type, you can use the type() function. See the examples below where I convert an integer to a float and then a float to a string.

  • Numeric
    • Int = whole numbers
    • Float = numbers with decimal values
    • Complex = numbers with a real and an imaginary part
  • Text
    • String = characters, or numbers treated as characters
  • Boolean
    • Boolean = logical True or False
  • Sequence
    • List = ordered, indexed collection of items that can be changed and re-ordered and allows duplicates
    • Tuple = ordered, indexed collection of items that cannot be changed or re-ordered and allows duplicates
  • Mapping
    • Dictionary = changeable collection of key-value pairs in which each key must be unique and values are accessed by key
  • Set
    • Set = unordered, unindexed collection of items that does not allow duplicates

#Create a variable and check the data type
x = 1
print(type(x))
#Change the data type
x = float(x)
print(type(x))
x = str(x)
print(type(x))
<class 'int'>
<class 'float'>
<class 'str'>
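Casting also works in the other direction, for example when converting numbers stored as text into numeric values. Two behaviors are worth knowing: int() drops the decimal part of a float (truncating toward zero) rather than rounding, and a string that does not represent a number raises a ValueError. The try/except block below only catches that error so the example keeps running.

```python
n1 = int("42") + 1     # the string "42" becomes the integer 42
n2 = float("2.5") * 2  # the string "2.5" becomes the float 2.5
n3 = int(3.9)          # decimal part dropped, not rounded
n4 = int(-3.9)         # truncation is toward zero
print(n1, n2, n3, n4)  # 43 5.0 3 -3

try:
    int("GIS")         # not a valid number
except ValueError as err:
    print("ValueError:", err)
```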

Numbers

Regardless of the type (integer, float, or complex), numbers are defined without using quotes. If a number is placed in quotes, it will be treated as a string, as demonstrated below. This is important, since the behavior of the data changes. In the example, x represents 1 as a number while y represents "1" as a string (note the quotes). Adding x to itself yields 2 (1 + 1). Adding y to itself yields "11"; that is, the two strings are combined, or concatenated.

#Create variables
x = 1
y = "1"
print(x + x)
print(y + y)
2
11

Numbers support mathematical operations, as demonstrated below. If you are not familiar with these concepts, modulus will return the remainder after division while floor division will round down to the nearest whole number after division.

If a number is displayed with a decimal point (1.0 or 1. as opposed to 1), it is in the float data type rather than the integer type. Note that standard division (/) always returns a float, even when the result is a whole number.

x = 4
y = 3
print(x + y) #Addition
print(x - y) #Subtraction
print(x * y) #Multiplication
print(x / y) #Division
print(x % y) #Modulus
print(x ** y) #Exponentiation
print(x // y) #Floor Division

7
1
12
1.3333333333333333
1
64
1
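Two details of these operators often surprise new users. Standard division always returns a float, and floor division rounds down toward negative infinity, so results involving negative numbers may not match your intuition. The modulus result is consistent with floor division, and the built-in divmod() function returns both at once. A short sketch:

```python
print(6 / 3)         # 2.0: division always returns a float
print(6 // 3)        # 2: floor division of two integers returns an integer
print(7.0 // 2)      # 3.0: returns a float if either input is a float
print(-7 // 2)       # -4: -3.5 rounded down (toward negative infinity)
print(-7 % 2)        # 1: consistent with floor division (-4 * 2 + 1 == -7)
print(divmod(7, 2))  # (3, 1): floor division result and remainder together
```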

Strings

Strings are defined using single or double quotes. If the text itself includes one type of quote, use the other type to enclose the string. Again, numbers placed in quotes will be treated as strings.

x = "GIS"
y = "That's great" #Double quotes used since the string contains a single quote (apostrophe)
z = "2" #Number treated as a string

Portions of a string can be sliced out using indexes. Note that Python starts indexing at 0 as opposed to 1. So, the first character is at index 0 as opposed to index 1. Negative indexes can be used to index relative to the end of the string. In this case, the last character has an index of -1.

Square brackets containing a range of indexes can be used to slice strings. Note that the character at the last index specified in the range will not be included and that spaces count as characters in the indexing.

x = "Geography 350: GIScience"
print(x[0:9])
print(x[10:13])
print(x[15:25])
print(x[-14:-11])
Geography
350
GIScience
350
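Square brackets with a single index return one character, and either side of the colon can be left empty to slice from the start or through the end of the string. A short sketch using the same string:

```python
x = "Geography 350: GIScience"
print(x[0])    # G: first character
print(x[-1])   # e: last character
print(x[:9])   # Geography: empty start means "from the beginning"
print(x[15:])  # GIScience: empty stop means "through the end"
```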

Strings can be combined, or concatenated, using the addition sign. If you want to include a number in the string output, you must first cast it to a string using str(). In the example below, note the use of blank spaces so that the strings do not run together.

The len() function can be used to return the length of the string, which will count blank spaces along with characters.

x = "Geography"
y = 350
z = ":"
w = "GIScience"
strng1 = x + " " + str(y) + z + " " + w
print(strng1)
print(len(strng1))
Geography 350: GIScience
24
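The str() cast is required rather than optional: Python does not automatically convert a number that is added to a string and instead raises a TypeError (caught below with try/except so the example keeps running). As an alternative not used elsewhere in this module, an f-string (a string prefixed with f) inserts values placed inside curly braces. A brief sketch:

```python
x = "Geography"
y = 350

try:
    strng2 = x + " " + y  # fails: cannot add a str and an int
except TypeError as err:
    print("TypeError:", err)

# f-string alternative: values inside {} are converted to text automatically
strng3 = f"{x} {y}: GIScience"
print(strng3)  # Geography 350: GIScience
```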

A method is a function that belongs to, or is associated with, an object. Methods allow you to work with or manipulate the object and its associated properties in some way. Each data type comes with built-in methods that can be applied to it.

Methods applicable to strings are demonstrated below. Specifically, methods are used to change the case of the string and to split the string at each space, saving each component to a list.

x = "Geography 657: Remote Sensing Principles"
print(x.upper())
print(x.lower())
lst1 = x.split(" ")
print(lst1)
GEOGRAPHY 657: REMOTE SENSING PRINCIPLES
geography 657: remote sensing principles
['Geography', '657:', 'Remote', 'Sensing', 'Principles']
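Strings are immutable, so methods such as upper() return a new string and leave the original unchanged; to keep the result, assign it to a variable. A few other commonly used string methods are shown as well:

```python
x = "Geography 657: Remote Sensing Principles"
y = x.upper()
print(x)  # unchanged: Geography 657: Remote Sensing Principles
print(y)  # GEOGRAPHY 657: REMOTE SENSING PRINCIPLES

print(x.replace("657", "457"))    # Geography 457: Remote Sensing Principles
print(x.startswith("Geography"))  # True
print("  Web GIS  ".strip())      # Web GIS: leading/trailing spaces removed
```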

When generating strings, issues arise when you use characters that have special uses or meaning in Python. These issues can be avoided by preceding the character with the escape character, a backslash, as demonstrated below. Escape sequences can also add special characters, such as \n for a new line.

s1 = "Issue with \"quotes\" in the string."
s2 = "C:\\data\\project_1" #Backslashes in file paths must also be escaped
s3 = "Add a new line \nto text string"
print(s1)
print(s2)
print(s3)
Issue with "quotes" in the string.
C:\data\project_1
Add a new line 
to text string
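For Windows file paths, which are common in GIS work, a raw string (prefixed with r) is an alternative to doubling every backslash: backslashes inside a raw string are kept as literal characters rather than treated as escapes. Another frequently used escape sequence is \t for a tab. A short sketch; the path is only an example:

```python
p1 = "C:\\data\\project_1"  # escaped backslashes
p2 = r"C:\data\project_1"   # raw string: backslashes kept as typed
print(p2)                   # C:\data\project_1
print(p1 == p2)             # True: both describe the same text

print("Name\tValue")        # \t inserts a tab character
```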

Booleans

Booleans can only be True or False and are returned when a comparison or logical expression is evaluated. A variety of comparison operators are available. Note the use of a double equals sign to test for equality; a single equals sign cannot be used since it is the assignment operator, and using it for comparison would be ambiguous.

  • Comparison Operators
    • Equal: ==
    • Not Equal: !=
    • Greater Than: >
    • Greater Than or Equal To: >=
    • Less Than: <
    • Less Than or Equal To: <=

Multiple expressions can be combined, or a single expression reversed, using logical operators.

  • Logical Operators:
    • A AND B: and
    • A OR B: or
    • NOT A: not

x = 3
y = 7
z = 2
print(x == 7)
print(x > y)
print(x < y)

print(x < y and x > z)
print(x < y and x < z)
print(x < y or x < z)
False
False
True
True
False
True
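The not operator, which the example above does not use, reverses a single Boolean value, and != tests whether two values differ. Parentheses group expressions so they are evaluated first. Using the same x, y, and z:

```python
x = 3
y = 7
z = 2
print(not x < y)              # False: x < y is True, and not reverses it
print(x != y)                 # True: 3 is not equal to 7
print(not (x < y and x < z))  # True: the grouped expression is False
print(x >= 3)                 # True: 3 is equal to 3
```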

You can also assign Booleans to a variable. Note that True and False must be capitalized and are not placed in quotes, as quotes would cause the text to be treated as a string instead of a Boolean.

x = "True"
y = True
print(type(x))
print(type(y))
<class 'str'>
<class 'bool'>
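The bool() constructor listed in the Data Types section converts other values to Booleans. Zero, empty strings, and empty lists convert to False, while most other values convert to True. This includes the string "False", because it is a non-empty string, which is another reason not to store Booleans in quotes.

```python
print(bool(0))        # False
print(bool(1))        # True
print(bool(""))       # False: empty string
print(bool([]))       # False: empty list
print(bool("False"))  # True: any non-empty string is True
```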

Lists

Lists allow you to store multiple values, such as numbers, strings, or Booleans, in a single variable. Square brackets are used to denote lists.

Items in a list are ordered, indexed, and allow for duplicate members. Indexing starts at 0. When counting from the end, the last item has an index of -1, the second-to-last item -2, and so on. A colon can be used to denote a range of indexes. Leaving the index before the colon empty selects all elements from the start of the list up to the index after the colon, while leaving the index after the colon empty selects the element at the index before the colon and all elements through the end of the list. As with strings, the element at the last index of a range is not included in the selection.

Python lists can contain elements of different data types.

lst1 = [6, 7, 8, 9, 11, 2, 0]
lst2 = ["A", "B", "C", "D", "E"]
lst3 = [True, False, True, True, True, False]
print(lst1[0])
print(lst1[0:3])
print(lst2[-4:-1])
print(lst2[:3])
print(lst2[3:])
lst4 = [1, 2, "A", "B", True]
print(type(lst4[0]))
print(type(lst4[2]))
print(type(lst4[4]))
6
[6, 7, 8]
['B', 'C', 'D']
['A', 'B', 'C']
['D', 'E']
<class 'int'>
<class 'str'>
<class 'bool'>

When the len() function is applied to a list, it will return the number of items or elements in the list as opposed to the number of characters. When applied to a string item in a list, this function will return the length of the string.

lst1 = ["A", "B", "C", "D", "E"]
print(len(lst1))
print(len(lst1[0]))
5
1

The code below shows some example methods for lists.

lst1 = ["A", "B", "C", "D", "E"]
lst1.append("F") #Add item to list
print(lst1)
lst1.remove("F") #Remove the first occurrence of an item from a list
print(lst1)
lst1.insert(2, "ADD") #Add item to list at defined position
print(lst1)
lst1.pop(2) #Remove (and return) item at specified index, or the last item if no index is provided
print(lst1)
['A', 'B', 'C', 'D', 'E', 'F']
['A', 'B', 'C', 'D', 'E']
['A', 'B', 'ADD', 'C', 'D', 'E']
['A', 'B', 'C', 'D', 'E']
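Because lists are mutable, an individual item can also be replaced by assigning a new value to its index. Note as well that pop() returns the item it removes, so the item can be saved to a variable. A short sketch:

```python
lst1 = ["A", "B", "C", "D", "E"]
lst1[0] = "Z"         # replace the item at index 0
print(lst1)           # ['Z', 'B', 'C', 'D', 'E']

removed = lst1.pop()  # no index: removes and returns the last item
print(removed)        # E
print(lst1)           # ['Z', 'B', 'C', 'D']
```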

As explained above, to copy a list rather than just reference the original data object, you must use the list's copy() method or the copy() or deepcopy() functions from the copy module. Simply setting a new variable equal to the original list will cause it to reference the same data object, so changes made to the original list will also appear in the new variable. This is demonstrated in the example below.

lst1 = ["A", "B", "C", "D", "E"]
lst2 = lst1
lst3 = lst1.copy()
print(lst2)
print(lst3)
lst1.append("F")
print(lst2)
print(lst3)
['A', 'B', 'C', 'D', 'E']
['A', 'B', 'C', 'D', 'E']
['A', 'B', 'C', 'D', 'E', 'F']
['A', 'B', 'C', 'D', 'E']

Lists can be concatenated using the + operator, which creates a new list, or the elements of one list can be added to the end of another list in place using the extend() method, as demonstrated below.

lst1 = ["A", "B", "C"]
lst2 = ["D", "E", "F"]
lst3 = lst1 + lst2
print(lst1)
print(lst2)
print(lst3)
lst1.extend(lst2)
print(lst1)
['A', 'B', 'C']
['D', 'E', 'F']
['A', 'B', 'C', 'D', 'E', 'F']
['A', 'B', 'C', 'D', 'E', 'F']
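Be careful not to confuse extend() with append(). Passing a list to append() adds the whole list as a single, nested element, while extend() adds each of its elements individually:

```python
lst1 = ["A", "B"]
lst1.append(["C", "D"])  # the list is added as one element
print(lst1)              # ['A', 'B', ['C', 'D']]
print(len(lst1))         # 3

lst2 = ["A", "B"]
lst2.extend(["C", "D"])  # each element is added separately
print(lst2)              # ['A', 'B', 'C', 'D']
print(len(lst2))         # 4
```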

Lastly, lists can contain other lists, tuples, or dictionaries, which will be discussed below. In the example, lst2 contains four elements, the last of which is a list with three elements.

lst1 = ["A", "B", "C"]
lst2 = ["D", "E", "F", lst1]
print(lst2)
['D', 'E', 'F', ['A', 'B', 'C']]
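Elements of a nested list are selected by chaining square brackets: the first index selects the inner list and the second selects an element within it. Note that lst2 holds a reference to lst1 rather than a copy, so, consistent with the earlier discussion of mutable objects, a change to lst1 also appears in lst2.

```python
lst1 = ["A", "B", "C"]
lst2 = ["D", "E", "F", lst1]
print(lst2[3])     # ['A', 'B', 'C']: the inner list
print(lst2[3][0])  # A: first element of the inner list

lst1.append("G")   # modify the original inner list
print(lst2)        # ['D', 'E', 'F', ['A', 'B', 'C', 'G']]
```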

Tuples and Sets

Tuples are similar to lists in that they are ordered and allow duplicate elements. However, they cannot be altered by adding, removing, changing, or re-ordering items. To differentiate them from lists, parentheses are used as opposed to square brackets. I generally think of tuples as lists that are protected from alteration, so I tend to use them when I want to make sure I don't accidentally make changes.

If you need to change a tuple, it can be converted to a list, manipulated, then converted back to a tuple.

t1 = (1, 3, 4, 7)
print(type(t1))
<class 'tuple'>
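Tuples are indexed and sliced in the same way as lists, but attempting to change an element raises a TypeError (caught below with try/except so the example keeps running). The list-and-back conversion described above is also shown, along with the trailing comma needed for a one-element tuple:

```python
t1 = (1, 3, 4, 7)
print(t1[0])      # 1
print(t1[1:3])    # (3, 4)

try:
    t1[0] = 10    # tuples cannot be changed
except TypeError as err:
    print("TypeError:", err)

lst = list(t1)    # convert to a list
lst.append(9)     # change the list
t1 = tuple(lst)   # convert back to a tuple
print(t1)         # (1, 3, 4, 7, 9)

t2 = (5,)         # a one-element tuple needs a trailing comma
print(type(t2))   # <class 'tuple'>
print(type((5)))  # <class 'int'>: parentheses alone do not make a tuple
```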

A set is similar to a tuple or list. However, its elements are unordered and not indexed, and no duplicate elements are allowed. Sets are defined using curly brackets. Since sets are not indexed, elements cannot be selected by index.

I find that I rarely use sets.
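A short sketch of basic set behavior: duplicates are dropped automatically, membership can be checked with the in keyword, and converting a list to a set is a common way to find its unique values. Because sets are unordered, sorted() is used here to display them in a predictable order. Note that empty curly brackets create an empty dictionary, so an empty set must be created with set().

```python
s1 = {3, 1, 3, 2}         # the duplicate 3 is dropped
print(len(s1))            # 3
print(2 in s1)            # True: membership test
print(sorted(s1))         # [1, 2, 3]

lst1 = ["A", "B", "A", "C", "B"]
print(sorted(set(lst1)))  # ['A', 'B', 'C']: unique values

empty = set()             # {} would create an empty dictionary instead
print(type(empty))        # <class 'set'>
print(type({}))           # <class 'dict'>
```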

Dictionaries

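Dictionaries are changeable collections that store each value with an associated key. Keys must be unique, although values may repeat, and since Python 3.7 dictionaries preserve the order in which items were added. In contrast to lists, tuples, and sets, elements are selected using their associated key rather than a numeric index.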

You can also use a key to change its associated value.

Similar to lists, you must use the dictionary's copy() method or the copy module's deepcopy() function to obtain a copy of the dictionary that does not reference the original data or memory object.

cls = {"prefix" : "Geography", "Number" : 661, "Name": "Web GIS"}
print(cls)
print(cls["Name"])
cls["Number"] = 461
print(cls)
{'prefix': 'Geography', 'Number': 661, 'Name': 'Web GIS'}
Web GIS
{'prefix': 'Geography', 'Number': 461, 'Name': 'Web GIS'}
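A few other dictionary operations are commonly useful: assigning to a new key adds a key-value pair, get() returns a default value instead of raising an error when a key is missing, keys() returns the dictionary's keys, and the copy() method creates an independent copy. The "Credits" and "Semester" keys below are hypothetical, used only for illustration.

```python
cls = {"prefix" : "Geography", "Number" : 661, "Name": "Web GIS"}

cls["Credits"] = 3                     # new key: a pair is added
print(list(cls.keys()))                # ['prefix', 'Number', 'Name', 'Credits']
print(cls.get("Semester", "Unknown"))  # Unknown: key not present
print("Name" in cls)                   # True: in checks keys, not values

cls2 = cls.copy()                      # independent (shallow) copy
cls2["Number"] = 461
print(cls["Number"])                   # 661: original unchanged
print(cls2["Number"])                  # 461
```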

Multiple dictionaries can be combined into a nested dictionary, as demonstrated below.

The keys can then be used to extract a sub-dictionary or an individual element from a sub-dictionary.

cls1 = {"prefix" : "Geography", "Number" : 150, "Name": "Digital Earth"}
cls2 = {"prefix" : "Geography", "Number" : 350, "Name": "GIScience"}
cls3 = {"prefix" : "Geography", "Number" : 455, "Name": "Introduction to Remote Sensing"}
cls4 = {"prefix" : "Geography", "Number" : 661, "Name": "Web GIS"}
clsT = {
    "class1" : cls1,
    "class2" : cls2,
    "class3" : cls3,
    "class4" : cls4
}
print(clsT)
print(clsT["class1"])
print(clsT["class1"]["Name"])
{'class1': {'prefix': 'Geography', 'Number': 150, 'Name': 'Digital Earth'}, 'class2': {'prefix': 'Geography', 'Number': 350, 'Name': 'GIScience'}, 'class3': {'prefix': 'Geography', 'Number': 455, 'Name': 'Introduction to Remote Sensing'}, 'class4': {'prefix': 'Geography', 'Number': 661, 'Name': 'Web GIS'}}
{'prefix': 'Geography', 'Number': 150, 'Name': 'Digital Earth'}
Digital Earth

Additional Types

Arrays

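Arrays are similar to lists; however, all of their elements must share a single data type, which is declared when the array is created. Because of this, they can store numeric data very compactly. Python's standard library includes an array module, but in this course we will primarily work with NumPy arrays, which also make it easy to apply mathematical operations to every element and will be discussed in more detail in a later module.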
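As a minimal sketch of what declaring an array means, Python's built-in array module requires a type code when the array is created; here "i" indicates signed integers. Adding a value of a different type, such as a float, raises a TypeError (caught below so the example keeps running). NumPy arrays work differently and are covered later.

```python
from array import array

arr = array("i", [1, 2, 3])  # "i" declares signed integers
arr.append(4)                # integers are accepted
print(arr.tolist())          # [1, 2, 3, 4]
print(arr.typecode)          # i

try:
    arr.append(2.5)          # a float does not match the declared type
except TypeError as err:
    print("TypeError:", err)
```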

Classes

Classes are used to define specific types of objects in Python and are often described as templates or blueprints. Once a class is defined, it can be extended to create a subclass, which inherits the properties and methods of the parent class but can be altered for specific uses. We will not explore this topic in detail in this module, but we will return to it in the next module. You will also see example uses of classes in later modules.

One use of classes is to define specialized data models along with their associated methods and properties. For example, Python libraries define classes for working with digital map data.
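Classes are covered properly in the next module, but as a preview, the sketch below defines a hypothetical Feature class and a PointFeature subclass. Neither comes from any library; they only illustrate how a subclass inherits from, and can override, its parent class.

```python
class Feature:
    """Hypothetical class representing a named map feature."""

    def __init__(self, name):
        self.name = name  # property stored on each object

    def describe(self):   # method available to every Feature
        return f"Feature: {self.name}"


class PointFeature(Feature):
    """Hypothetical subclass that adds x and y coordinates."""

    def __init__(self, name, x, y):
        super().__init__(name)  # reuse the parent's setup
        self.x = x
        self.y = y

    def describe(self):         # override the parent's method
        return f"Point {self.name} at ({self.x}, {self.y})"


f1 = Feature("Study Area")
p1 = PointFeature("Gauge 1", -79.96, 39.63)
print(f1.describe())            # Feature: Study Area
print(p1.describe())            # Point Gauge 1 at (-79.96, 39.63)
print(p1.name)                  # Gauge 1: inherited property
print(isinstance(p1, Feature))  # True: a PointFeature is also a Feature
```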

Before moving on, I want to note which data types are mutable and which are immutable. Again, data or objects that are mutable can be altered after they are defined (such as adding a new element to a list), while immutable data cannot.

Mutable types include lists, sets, and dictionaries.

Immutable types include Booleans, integers, floats, complex numbers, strings, and tuples.
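The experiment with lists near the start of this module can be repeated with immutable types to see the difference. Changing an integer variable with += creates a new object and rebinds only that variable, so a second variable that referenced the old value is unaffected. Strings cannot be changed in place at all (the try/except block simply catches the resulting error):

```python
x = 5
y = x          # x and y reference the same integer object
x += 1         # creates a new object (6) and rebinds x only
print(x)       # 6
print(y)       # 5: y still references the original object
print(x is y)  # False

s = "GIS"
try:
    s[0] = "g"  # strings cannot be changed in place
except TypeError as err:
    print("TypeError:", err)
s = s.lower()   # instead, create a new string and reassign
print(s)        # gis
```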

Concluding Remarks

My goal here was to provide a basic introduction to Python. Again, w3schools.com is a great resource for coders, scientists, and web developers if you want to explore additional examples and topics.

You likely do not feel comfortable with general Python yet. However, you will get practice while working through the remaining modules. I think you will find that a good grasp of the basics can go a long way.

In the next section, we will discuss more components of the Python language, including functions, control flow, loops, modules, and reading data from disk.