The Basics

Welcome to this introductory lecture on Python for data science and geospatial data science. My primary goal is to provide an overview of the Python object-based programming language. After working through this module you will be able to:

  1. declare variables.
  2. explain the difference between Python data types.
  3. perform mathematical and logical operations.
  4. work with lists, tuples, sets, and dictionaries.
  5. apply appropriate methods to different data types.
  6. use If...Else statements, While Loops, and For Loops.
  7. define and use functions.
  8. access and use modules and libraries.
  9. work with local files and directories.

I assume that you have no prior experience with coding in general or Python specifically. However, you should have an understanding of GIS and spatial analysis and have completed a prior GIScience course.

For a more detailed discussion of general Python, please consult, which is a great resource for coders, scientists, and web developers.

Variables and Comments

I think of variables as containers that store information or data. For example, a variable could reference a file on your local machine, such as a vector or raster geospatial layer, or a set of numbers. Once you create a variable, you can use it in processes and analyses. Note that there are some rules for variable names:

  • cannot start with a number or special character; they can only start with a letter or an underscore (for example, x1 is valid while 1x is not).
  • can only contain letters, numbers, or underscores. No other special characters are allowed (for example, x, x1, _x, and x1_ are all valid, while x$ is not).
  • are case-sensitive (for example, x1 and X1 are two separate variables).
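As a quick check of the rules above, the sketch below uses the built-in string method isidentifier(), which reports whether a piece of text follows Python's naming rules. The check_ variable names are only illustrations. Note that reserved keywords, such as if, also pass this check even though they cannot be used as variable names.

```python
#str.isidentifier() returns True when text follows the naming rules.
check_x1 = "x1".isidentifier()   #Starts with a letter
check_1x = "1x".isidentifier()   #Starts with a number
check__x = "_x".isidentifier()   #Starts with an underscore
check_xs = "x$".isidentifier()   #Contains a special character
print(check_x1, check_1x, check__x, check_xs)
```

This prints True False True False, matching the valid and invalid examples in the list.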

The code below provides examples for defining variables. Since Python is an object-based language, you will primarily interact with your data using variables.

x = 1
y = "GIS"
x1 = 2
y1 = "Remote Sensing"
_x = 3
_y = "Web GIS"
print(y1)
Remote Sensing

You can also assign data to multiple variables as a single line of code as demonstrated below. Note that I am reusing variable names; in Python variable names are dynamic, so you can overwrite them. This can, however, be problematic if you overwrite a variable name accidentally. So, use unique names if you do not want to overwrite prior variables.

x, x1, _x = 1, 2, 3

Assignment Operators are used to assign values to variables. The most commonly used operator is =. However, there are some other options that allow variables to be manipulated mathematically with the resulting value saved back to the original variable. These additional assignment operators can be useful, but we will use the = operator most of the time.

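x = 2
x += 3 #Add 3; x is now 5

x = 2
x -= 3 #Subtract 3; x is now -1

x = 2
x *= 3 #Multiply by 3; x is now 6

x = 2
x /= 3 #Divide by 3; x is now about 0.667 (a float)

x = 2
x **= 3 #Raise to the power of 3; x is now 8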

Comments are used to make your code more readable and are not interpreted by the computer. Instead, they are skipped and meant for humans. Different languages use different syntax to denote comments. Python uses the hashtag or pound sign. You can add comments as new lines or following code on the same line. Unfortunately, Python does not have specific syntax for multi-line comments. However, this can be accomplished by adding hashtags before each separate line or using a multi-line string that is not assigned to a variable. Examples are shown below.

It is generally a good idea to comment your code for later use and for use by others.

#Single-line comment
x = 1
y = 2 #Another single-line comment
#A multi-line comment written
#with a hashtag on each line
z = 3
"""
Another multi-line comment
"""
w = 4

Data Types

A variety of data types are available in Python to store and work with a variety of input. Below are explanations of the data types which you will use most often. There are additional types that we will not discuss.

When creating a variable, it is not generally necessary to explicitly define the data type. However, this can be accomplished using constructor functions if so desired. Constructor functions can also be used to change the data type of a variable, a process known as casting. Available constructor functions include str(), int(), float(), complex(), list(), tuple(), dict(), set(), and bool().

To determine the data type, you can use the type() function. See the examples below where I convert an integer to a float and then a float to a string.

  • Numeric
    • Int = whole numbers
    • Float = numbers with decimal values
    • Complex = can include imaginary numbers
  • Text
    • String = characters or numbers treated as characters
  • Boolean
    • Boolean = logical True or False
  • Sequence
    • List = list of features that can be re-ordered, allows for duplicates, and is indexed
    • Tuple = list of features that cannot be re-ordered, allows for duplicates, and is indexed
  • Mapping
    • Dictionary = list of features that can be re-ordered, does not allow duplicate keys, is indexed by key, and contains key and value pairs
  • Set
    • Set = list of features that are unordered, unindexed, and does not allow for duplicates
#Create a variable and check the data type.
x = 1
print(type(x))
#Change the data type
x = float(x)
print(type(x))
x = str(x)
print(type(x))
<class 'int'>
<class 'float'>
<class 'str'>


Regardless of the type (integer, float, or complex), numbers are defined without using quotes. If a number is placed in quotes it will be treated as a string as demonstrated below. This is important, since this will change the behavior of the data. In the example, x represents 1 as a number while y represents "1" as a string (note the quotes). Adding x to itself will yield 2 (1 + 1). Adding y to itself will yield "11" because the two strings are combined, or concatenated.

#Create variables
x = 1
y = "1"
print(x + x)
print(y + y)

Numbers support mathematical operations, as demonstrated below. If you are not familiar with these concepts, modulus will return the remainder after division while floor division will round down to the nearest whole number after division.

If a whole number is displayed with a decimal zero (1.0 as opposed to 1), the output is of the float data type as opposed to the integer type. For example, standard division (/) always returns a float.

x = 4
y = 3
print(x + y) #Addition
print(x - y) #Subtraction
print(x * y) #Multiplication
print(x / y) #Division
print(x % y) #Modulus
print(x ** y) #Exponentiation
print(x // y) #Floor Division



Strings are defined using single or double quotes. If quotes are included as part of the text or string, then you can use the other type to define the data as text. Again, numbers placed in quotes will be treated as strings.

x = "GIS"
y = "That's great" #Must use double quotes since a single quote is used in the string.
z = "2" #Number treated as a string

Portions of a string can be sliced out using indexes. Note that Python starts indexing at 0 as opposed to 1. So, the first character is at index 0 as opposed to index 1. Negative indexes can be used to index relative to the end of the string. In this case, the last character has an index of -1.

Indexes combined with square brackets can be used to slice strings. Note that the last index specified in the selection or range will not be included and that spaces are counted in the indexing.

x = "Geography 350: GIScience"
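To make the indexing rules concrete, here is a minimal sketch of slicing the string defined above. The variable names on the left are my own choices for illustration.

```python
x = "Geography 350: GIScience"
first_char = x[0]    #First character (index 0)
last_char = x[-1]    #Last character (index -1)
word = x[0:9]        #Indexes 0 through 8; index 9 is not included
number = x[10:13]    #The space at index 9 is counted, so "350" starts at index 10
ending = x[-9:]      #Last nine characters
print(first_char, last_char, word, number, ending)
```

This prints G e Geography 350 GIScience.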

Strings can be combined or concatenated using the addition sign. If you want to include a number in the string output, you can cast it to a string using str(). In the example below, note the use of blank spaces so that the strings are not run together.

The len() function can be used to return the length of the string, which will count blank spaces along with characters.

x = "Geography"
y = 350
z = ":"
w = "GIScience"
strng1 = x + " " + str(y) + z + " " + w
print(strng1)
Geography 350: GIScience

A method is a function that belongs to or is associated with an object. Methods allow you to work with or manipulate the object in some way. Data types have default methods that can be applied to them.

Methods applicable to strings are demonstrated below. Specifically, methods are being used to change the case and split the string at each space to save each component to a list.

x = "Geography 657: Remote Sensing Principles"
print(x.lower())
lst1 = x.split(" ")
print(lst1)
geography 657: remote sensing principles
['Geography', '657:', 'Remote', 'Sensing', 'Principles']

When generating strings, issues arise when you use characters that have special uses or meaning in Python. These issues can be alleviated by including an escape character or backslash as demonstrated below.

s1 = "Issue with \"quotes\" in the string."
s2 = "C:\\data\\project_1" #Issue with file paths. 
s3 = "Add a new line \nto text string"
print(s1)
print(s3)
Issue with "quotes" in the string.
Add a new line 
to text string


Booleans can only be True or False and are often returned when an expression is logically evaluated. A variety of comparison operators are available. Note the use of the double equals; a single equals cannot be used since it is already used for variable assignment, or is an assignment operator, and would thus be ambiguous.

  • Comparison Operators
    • Equal: ==
    • Not Equal: !=
    • Greater Than: >
    • Greater Than or Equal To: >=
    • Less Than: <
    • Less Than or Equal To: <=

Logical statements or multiple expressions can be combined using Logical Operators.

  • Logical Operators:
    • A AND B: and
    • A OR B: or
    • NOT A: not
x = 3
y = 7
z = 2
print(x == 7)
print(x > y)
print(x < y)

print(x < y and x > z)
print(x < y and x < z)
print(x < y or x < z)

You can also assign Booleans to a variable. Note that you do not use quotes, as that would cause the text to be treated as a string instead of a Boolean.

x = "True"
y = True
print(type(x))
print(type(y))
<class 'str'>
<class 'bool'>


Lists allow you to store multiple numbers, strings, or Booleans in a single variable. Square brackets are used to denote lists.

Items in a list are ordered, indexed, and allow for duplicate members. Indexing starts at 0. If counting from the end, you start at -1 and subtract as you move left. A colon can be used to denote a range of indexes. Leaving the position before the colon empty selects all elements from the start of the list up to, but not including, the index after the colon. Leaving the position after the colon empty selects the element at the index before the colon and all elements through the end of the list. When both indexes are given, the element at the last index is not included in the selection.

Python lists can contain elements of different data types.

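lst1 = [6, 7, 8, 9, 11, 2, 0]
lst2 = ["A", "B", "C", "D", "E"]
lst3 = [True, False, True, True, True, False]
lst4 = [1, 2, "A", "B", True]
print(lst1[0:3])
print(lst2[1:4])
print(lst2[:3])
print(lst2[3:])
print(type(lst4[0]))
print(type(lst4[2]))
print(type(lst4[4]))
[6, 7, 8]
['B', 'C', 'D']
['A', 'B', 'C']
['D', 'E']
<class 'int'>
<class 'str'>
<class 'bool'>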

When the len() function is applied to a list, it will return the number of items or elements in the list as opposed to the number of characters. When applied to a string item in a list, this function will return the length of the string.

lst1 = ["A", "B", "C", "D", "E"]
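As a minimal sketch of the two uses of len() described above, the code below applies it first to the whole list and then to a single string element. The names n_items and n_chars are only illustrative.

```python
lst1 = ["A", "B", "C", "D", "E"]
n_items = len(lst1)      #Number of elements in the list
n_chars = len(lst1[0])   #Number of characters in the first element
print(n_items)
print(n_chars)
```

The list holds five elements, and the first element is a one-character string, so this prints 5 and then 1.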

The code below shows some example methods for lists.

lst1 = ["A", "B", "C", "D", "E"]
lst1.append("F") # Add item to list
print(lst1)
lst1.remove("F") # Remove item from a list
print(lst1)
lst1.insert(2, "ADD") # Add item to list at defined position 
print(lst1)
lst1.pop(2) #Remove item at specified index or the last item if no index is provided
print(lst1)
['A', 'B', 'C', 'D', 'E', 'F']
['A', 'B', 'C', 'D', 'E']
['A', 'B', 'ADD', 'C', 'D', 'E']
['A', 'B', 'C', 'D', 'E']

In order to copy a list, you must use the copy() method. Simply setting a new variable equal to the original list will cause it to reference the original list, so changes made to the old list will update to the new list. This is demonstrated in the example below.

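lst1 = ["A", "B", "C", "D", "E"]
lst2 = lst1 #References the original list
lst3 = lst1.copy() #Independent copy
print(lst2)
print(lst3)
lst1.append("F") #Change the original list
print(lst2)
print(lst3)
['A', 'B', 'C', 'D', 'E']
['A', 'B', 'C', 'D', 'E']
['A', 'B', 'C', 'D', 'E', 'F']
['A', 'B', 'C', 'D', 'E']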

Lists can be concatenated together, or a list can be appended to another list, using the methods demonstrated below.

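lst1 = ["A", "B", "C"]
lst2 = ["D", "E", "F"]
lst3 = lst1 + lst2 #Concatenate into a new list
print(lst1)
print(lst2)
print(lst3)
lst1.extend(lst2) #Append lst2 to lst1 in place
print(lst1)
['A', 'B', 'C']
['D', 'E', 'F']
['A', 'B', 'C', 'D', 'E', 'F']
['A', 'B', 'C', 'D', 'E', 'F']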

Tuples and Sets

Tuples are similar to lists in that they are ordered and allow duplicate elements. However, they cannot be altered by adding items, removing items, or changing the order of items. To differentiate them from lists, parentheses are used as opposed to square brackets. I generally think of tuples as lists that are protected from alteration, so I tend to use them when I want to make sure I don't accidentally make changes.

If you need to change a tuple, it can be converted to a list, manipulated, then converted back to a tuple.
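The conversion just described can be sketched as follows. The name lst_t1 and the value 9 are hypothetical and chosen only for illustration.

```python
t1 = (1, 3, 4, 7)
lst_t1 = list(t1)    #Convert the tuple to a list
lst_t1.append(9)     #Lists can be changed
t1 = tuple(lst_t1)   #Convert the list back to a tuple
print(t1)
```

This prints (1, 3, 4, 7, 9). Note that the result is a new tuple assigned to the same name; the original tuple itself was never changed.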

t1 = (1, 3, 4, 7)
print(type(t1))
<class 'tuple'>

A set is similar to a tuple or list. However, elements are unordered, not indexed, and no duplicate elements are allowed. Sets are defined using curly brackets. Since no indexing is included, elements cannot be selected using an index.

I find that I rarely use sets.
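For completeness, here is a minimal sketch of defining a set. The values are arbitrary. I avoid printing the set itself since the display order of a set should not be relied upon.

```python
s1 = {1, 2, 2, 3, 3, 3}   #Duplicates are dropped when the set is created
s1.add(4)                 #Add a new element
has_two = 2 in s1         #Membership is tested with in, not with an index
print(len(s1))
print(has_two)
```

This prints 4 and then True, since only the unique values 1, 2, 3, and 4 are stored.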


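Dictionaries are changeable, indexed by key, and do not allow for duplicate keys. As of Python 3.7 they also preserve the order in which items were inserted; in earlier versions they were unordered. In contrast to lists, tuples, and sets, each value is also assigned a key. Elements are selected using the associated key.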

You can also use the key to define a value to change.

Similar to lists, you must use the copy() method to obtain a copy of the dictionary that will not reference the original dictionary.

cls = {"prefix" : "Geography", "Number" : 661, "Name": "Web GIS"}
print(cls)
cls["Number"] = 461
print(cls)
{'prefix': 'Geography', 'Number': 661, 'Name': 'Web GIS'}
{'prefix': 'Geography', 'Number': 461, 'Name': 'Web GIS'}
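The point about copy() can be sketched in the same way as for lists. The names cls_ref and cls_copy are hypothetical.

```python
cls = {"prefix" : "Geography", "Number" : 661, "Name": "Web GIS"}
cls_ref = cls           #Second name for the same dictionary
cls_copy = cls.copy()   #Separate copy of the dictionary
cls["Number"] = 461     #Change the original
print(cls_ref["Number"])
print(cls_copy["Number"])
```

This prints 461 and then 661: the reference follows the change while the copy keeps the original value.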

Multiple dictionaries can be combined into a nested dictionary, as demonstrated below.

The keys can then be used to extract a sub-dictionary or an individual element from a sub-dictionary.

cls1 = {"prefix" : "Geography", "Number" : 150, "Name": "Digital Earth"}
cls2 = {"prefix" : "Geography", "Number" : 350, "Name": "GIScience"}
cls3 = {"prefix" : "Geography", "Number" : 455, "Name": "Introduction to Remote Sensing"}
cls4 = {"prefix" : "Geography", "Number" : 661, "Name": "Web GIS"}
clsT = {
    "class1" : cls1,
    "class2" : cls2,
    "class3" : cls3,
    "class4" : cls4
}
print(clsT)
print(clsT["class1"]) #Extract a sub-dictionary
print(clsT["class1"]["Name"]) #Extract an element from a sub-dictionary
{'class1': {'prefix': 'Geography', 'Number': 150, 'Name': 'Digital Earth'}, 'class2': {'prefix': 'Geography', 'Number': 350, 'Name': 'GIScience'}, 'class3': {'prefix': 'Geography', 'Number': 455, 'Name': 'Introduction to Remote Sensing'}, 'class4': {'prefix': 'Geography', 'Number': 661, 'Name': 'Web GIS'}}
{'prefix': 'Geography', 'Number': 150, 'Name': 'Digital Earth'}
Digital Earth

Additional Types


Arrays are similar to lists; however, they must be declared. They are sometimes used in place of lists as they can be very compact and easy to apply mathematical operations to. In this course, we will primarily work with NumPy arrays, which will be discussed in more detail in the next module.


Classes are used to define specific types of objects in Python and are often described as templates. Once a class is defined, it can be copied and manipulated to create a subclass, which will inherit properties and methods from the parent class but can be altered for specific uses. We will not explore this topic in detail in this module, but you will see examples of classes in later modules.

One use of classes is to define specialized data models and their associated methods and properties. For example, classes have been defined to work with geospatial data types.
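To give a rough idea of what this looks like, here is a minimal sketch of a class and a subclass. The Point and LabeledPoint classes, their methods, and the values used are all hypothetical and are not taken from any geospatial library.

```python
class Point:
    """Hypothetical class that stores a coordinate pair."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def as_tuple(self):
        #Method: a function that belongs to the object
        return (self.x, self.y)

class LabeledPoint(Point):
    """Hypothetical subclass that inherits from Point and adds a label."""
    def __init__(self, x, y, label):
        super().__init__(x, y)   #Reuse the parent class setup
        self.label = label

pt = LabeledPoint(10.0, 20.0, "Site A")
print(pt.as_tuple())   #Method inherited from Point
print(pt.label)        #Property added by the subclass
```

This prints (10.0, 20.0) and then Site A. The subclass did not need to redefine as_tuple() since it was inherited from the parent class.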

If...Else and Loops


All coding languages allow for control flow in which different code is executed depending on a condition. If...Else statements are a key component of how this is implemented. Using logical conditions that evaluate to True or False, it is possible to program different outcomes.

The first example uses only if. So, if the condition evaluates to True, then the remaining code will be executed. If it evaluates to False then nothing is executed or returned. In this case, the condition evaluates to True, so the text is printed.

This is a good time to stop and discuss indentation. Python makes use of whitespace, indentations, or tabs to denote or interpret units of code. This is uncommon, as most other languages use brackets or punctuation of some kind. So, it is important to properly use indentations or your code will fail to execute correctly or at all.

x = 7
if x > 6:
    print(str(x) + " is greater than 6.")
7 is greater than 6.

It is common to have a default statement that is executed if the condition evaluates to False as opposed to simply doing nothing. This is the use of an else statement. No condition is required for the else statement since it will be executed for any case where the if condition evaluates to False.

x = 3
if x > 6:
    print(str(x) + " is greater than 6.")
else:
    print(str(x) + " is less than or equal to 6.")
3 is less than or equal to 6.

What if you want to evaluate against more than one condition? This can be accomplished by incorporating one or multiple elif statements.

x = 6
if x > 6:
    print(str(x) + " is greater than 6.")
elif x == 6:
    print(str(x) + " is equal to 6.")
else:
    print(str(x) + " is less than 6.")
6 is equal to 6.

While Loop

While loops are used to loop code as long as a condition evaluates to True. In the example below, a variable i is initially set to 14. The loop executes as long as i remains larger than 7. Note that at the end of each loop the -= assignment operator is used to subtract 1 from i. Also, i is simply a variable, so you do not need to use i specifically. For example, i could be replaced with x.

One issue with a while loop is the possibility of an infinite loop in which the loop never stops because the condition never evaluates to False. For example, if we change the assignment operator to +=, the condition would continue to evaluate to True indefinitely.

i = 14
while i > 7:
    i -= 1

For Loop

For Loops will execute code for all items in a sequence, such as a list, tuple, dictionary, or set. In the example below, a for loop is being used to print every value in a list.

lst1 = [3, 6, 8, 9, 11, 13]
for i in lst1:
    print("Value is: " + str(i))
Value is: 3
Value is: 6
Value is: 8
Value is: 9
Value is: 11
Value is: 13

Combining a for loop and If...Else statements allows for different code to be executed for each element in a sequence, such as a list, based on conditions as demonstrated in the code below. In later modules, you will see example use cases for working with and analyzing spatial data. Note the levels of indentation used, which, again, are very important and required when coding in Python.

lst1 = [3, 6, 8, 9, 11, 13]
for i in lst1:
    if i < 8:
        print(str(i) + " is less than 8.")
    elif i == 8:
        print(str(i) + " is equal to 8.")
    else:
        print(str(i) + " is greater than 8.")
3 is less than 8.
6 is less than 8.
8 is equal to 8.
9 is greater than 8.
11 is greater than 8.
13 is greater than 8.



Functions do something when called. They are similar to methods except not tied to a specific object.

I am generating a simple function in the example below that multiplies two numbers together. The def keyword is used to define a function. Within the parentheses, a list of parameters can be provided. In this case, the function accepts two parameters: a and b. On the next line, indented and after the colon, what the function does is defined. In this case, the function simply returns the product of the two values.

Once a function is created, it can be used. In the example, I provide two arguments, or values assigned to the parameters, and save the result to a variable x.

When using a function, it is also possible to provide the arguments as key and value pairs, as in the second example.

When defining parameters, default values can be provided. If arguments are not provided, then the default values will be used.

#Example 1
def multab(a,b):
    return a*b

x = multab(3,6)

#Example 2
x = multab(a=5, b=3)

#Example 3
def multab(a=1,b=1):
    return a*b
x = multab()


A lambda function is a special function case that is generally used for simple functions that are anonymous, or not named. They can accept multiple arguments but can only include one expression. Lambda functions are commonly used inside of other functions.

lam1 = lambda a, b, c: str(a) + " " + str(b) + " " + str(c)
print(lam1("Geospatial", "Data", "Science"))
Geospatial Data Science


Variables are said to have global scope if they can be accessed anywhere in the code. In contrast, local scope implies that variables are only accessible in portions of the code. For example, by default new variables defined within a function cannot be called or used outside of the function. If you need to specify a variable within a function as having global scope, the global keyword can be used.
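The difference between local and global scope can be sketched as follows. The function names make_local and make_global and the variables a and b are hypothetical.

```python
def make_local():
    a = 10      #Local: a exists only inside this function
    return a

def make_global():
    global b    #Declare b as global before assigning it
    b = 20

result = make_local()   #The value is returned, but the name a is not available out here
make_global()           #After this call, b can be used outside the function
print(result)
print(b)
```

This prints 10 and then 20. Attempting to print a outside of make_local() would raise a NameError, since a was never defined in the global scope.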

Math Module

The functionality of Python is expanded using modules. Modules represent sets of code and tools and are combined into libraries. In this class, we will explore several modules or libraries used for data science including NumPy, Pandas, Matplotlib, and scikit-learn. We will also explore libraries for geospatial data analysis including GeoPandas and Rasterio.

As an introduction to modules and libraries, we will now explore the math module. To use a module, it first needs to be imported. You can then use the methods provided by the module. When using a method from the math module, you must include the module name as demonstrated below.

import math

x = 4.318
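As a minimal sketch of calling functions with the module name included, the code below applies a few functions from the math module to x. The specific functions shown are my own picks for illustration.

```python
import math

x = 4.318
x_ceil = math.ceil(x)     #Round up to the nearest whole number
x_floor = math.floor(x)   #Round down to the nearest whole number
x_sqrt = math.sqrt(16)    #Square root
print(x_ceil, x_floor, x_sqrt)
```

This prints 5 4 4.0. Each call is written as the module name, a period, and then the function name.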

You can also provide an alias or shorthand name for the module when importing it by using the as keyword. This can be used to simplify code.

import math as m

x = 4.318
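With the alias in place, the same kinds of calls can be written with the shorter prefix. This is only a sketch, and the functions shown are again my own picks.

```python
import math as m

x = 4.318
x_ceil = m.ceil(x)     #Same function as math.ceil(), called through the alias
x_floor = m.floor(x)   #Same function as math.floor()
print(x_ceil)
print(x_floor)
print(m.pi)            #Constants are accessed the same way
```

The first two lines printed are 5 and 4, followed by the value of pi.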

Working with Files

Read Files

As a data scientist or geospatial data scientist, you will need to be able to use Python to work with and analyze files on your local machine. First, a file or folder path can be assigned to a variable. On a Windows machine, you will need to either (1) change the backslashes to forward slashes or (2) use the backslash escape character ahead of any backslash.

txt1 = "SET YOUR FILE PATH HERE" #Must change backslash to forward slash.
txt1 = "SET YOUR FILE PATH HERE" #Or, use the backslash escape character. 
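Here is a minimal sketch of the two options using a made-up Windows path; the path is hypothetical and only meant to show the syntax. I also include a third option not discussed above, a raw string (prefixed with r), which keeps backslashes exactly as typed.

```python
#Hypothetical path written three ways
p1 = "C:/data/project_1/file.txt"       #Backslashes changed to forward slashes
p2 = "C:\\data\\project_1\\file.txt"    #Backslash escape character before each backslash
p3 = r"C:\data\project_1\file.txt"      #Raw string: backslashes kept as typed
same_escaped_raw = p2 == p3
same_after_swap = p2.replace("\\", "/") == p1
print(same_escaped_raw)
print(same_after_swap)
```

Both lines print True: the escaped and raw versions hold identical text, and swapping the backslashes for forward slashes gives the first version.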

Specific modules or libraries will allow you to read in and work with specific types of data. For example, in the code below I am using Pandas to read in a comma-separated values (CSV) file as a Pandas DataFrame.

import numpy as np
import pandas as pd
movies_df = pd.read_csv("SET YOUR FILE PATH HERE", sep=",", header=0, encoding="ISO-8859-1")
movies_df.head(10)
   Movie Name                Director             Release Year  My Rating  Genre        Own
0  Almost Famous             Cameron Crowe        2000          9.99       Drama        Yes
1  The Shawshank Redemption  Frank Darabont       1994          9.98       Drama        Yes
2  Groundhog Day             Harold Ramis         1993          9.96       Comedy       Yes
3  Donnie Darko              Richard Kelly        2001          9.95       Sci-Fi       Yes
4  Children of Men           Alfonso Cuaron       2006          9.94       Sci-Fi       Yes
5  Annie Hall                Woody Allen          1977          9.93       Comedy       Yes
6  Rushmore                  Wes Anderson         1998          9.92       Independent  Yes
7  Memento                   Christopher Nolan    2000          9.91       Thriller     Yes
8  No Country for Old Men    Joel and Ethan Coen  2007          9.90       Thriller     Yes
9  Seven                     David Fincher        1995          9.88       Thriller     Yes

The example below demonstrates one of many methods for reading in an image and adding it to the display.

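from matplotlib import pyplot as plt
import cv2

img = cv2.imread("SET YOUR FILE PATH HERE")
#cv2 loads color channels in BGR order, so colors appear swapped unless converted to RGB.
plt.imshow(img)
<matplotlib.image.AxesImage at 0x1f798350288>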


Working with Directories

Instead of reading in individual files, you may want to access entire lists of files in a directory. The example below demonstrates one method for accomplishing this using the os module and list comprehension. Specifically, it will find all TXT files in a directory and write their names to a list.

Only the file name is included in the generated list, so I use additional list comprehension to add the full file path and generate a new list.

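import os

direct = "PATH/text_files" #Set to the folder that holds your TXT files

files = os.listdir(direct)
files_txt = [i for i in files if i.endswith('.txt')]

txtlst = [direct + "/" + s for s in files_txt] #Add a separator between folder and file name
print(files_txt)
print(txtlst)
['t1.txt', 't2.txt', 't3.txt', 't4.txt', 't5.txt']
['PATH/text_files/t1.txt', 'PATH/text_files/t2.txt', 'PATH/text_files/t3.txt', 'PATH/text_files/t4.txt', 'PATH/text_files/t5.txt']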

The code below demonstrates three other methods for reading in a list of TXT files from a directory. The first method uses the listdir() method from the os module, the second uses the walk() method from the os module (which allows for recursive searching within subdirectories), and the last method uses the glob module.

You will see many other examples in this course of how to read files and file lists.

from os import listdir

def list_files1(directory, extension):
    return (f for f in listdir(directory) if f.endswith('.' + extension))

from os import walk

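def list_files2(directory, extension):
    #walk() yields one entry per directory; returning inside the loop
    #keeps only the files from the top-level directory.
    for (dirpath, dirnames, filenames) in walk(directory):
        return (f for f in filenames if f.endswith('.' + extension))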

from glob import glob
from os import getcwd, chdir

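def list_files3(directory, extension):
    saved = getcwd()
    chdir(directory) #Search inside the target directory
    it = glob('*.' + extension)
    chdir(saved) #Return to the original working directory
    return it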

direct = "D:/Dropbox/Teaching_WVU/Python_Data_Science/data/text_files"
method1 = list(list_files1(direct, "txt"))
method2 = list(list_files2(direct, "txt"))
method3 = list_files3(direct, "txt")
['t1.txt', 't2.txt', 't3.txt', 't4.txt', 't5.txt']
['t1.txt', 't2.txt', 't3.txt', 't4.txt', 't5.txt']
['t1.txt', 't2.txt', 't3.txt', 't4.txt', 't5.txt']

Concluding Remarks

My goal here was to provide a basic introduction to Python. Again, is a great resource for coders, scientists, and web developers if you want to explore additional examples and topics.

You likely do not feel comfortable with general Python yet. However, you will get practice while working through the remaining modules. I think you will find that a good grasp of the basics can go a long way.

In the next section, we will discuss two libraries that are central to data science in Python: NumPy and Pandas.