Thursday, January 30, 2020

ANALYTICS AND MACHINE LEARNING

Analytics is a collection of techniques and tools used for creating value from data. These techniques include artificial intelligence (AI), machine learning (ML), and deep learning (DL) algorithms.

AI, ML, and DL are defined as follows:

1. Artificial Intelligence: Algorithms and systems that exhibit human-like intelligence.
2. Machine Learning: A subset of AI consisting of algorithms that learn to perform a task from extracted data and/or models.
3. Deep Learning: A subset of machine learning that imitates the functioning of the human brain (using artificial neural networks) to solve problems.

The relationship between AI, ML, and DL can be visualized as shown in the figure below:




[Figure: The relationship between AI, ML, and DL]

Another school of thought holds that AI and ML are different fields (ML is not a subset of AI) with some overlap. The important point is that all of them are algorithms, which are nothing but sets of instructions used for solving business and social problems.

Machine learning is a set of algorithms that can learn to perform tasks such as prediction and classification effectively using data. Learning is achieved using additional data and/or additional models. An algorithm can be called a learning algorithm when it improves on a performance metric while performing a task, for example, the accuracy of classifying transactions as fraudulent, customers as likely to churn, and so on. Machine learning algorithms are classified into four categories, defined below:

Supervised Learning Algorithms: These algorithms require knowledge of both the outcome variable (dependent variable) and the features (independent variables or input variables). The algorithm learns (i.e., estimates the values of the model parameters or feature weights) by defining a loss function, which is usually a function of the difference between the predicted and actual values of the outcome variable, and then finding the parameter values that minimize it. Linear regression, logistic regression, and discriminant analysis are examples of supervised learning algorithms. The feature weights are estimated with knowledge of the actual values of the outcome variable; that is, the known outcome values supervise the learning, hence the name supervised learning.
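To make the loss-function idea concrete, here is a minimal from-scratch sketch, assuming simple linear regression with a mean-squared-error loss fitted by gradient descent (one of several ways to estimate the weights; the closed-form least-squares solution is another). The data, learning rate, and number of epochs are illustrative assumptions. The known outcome values ys are what supervise the fit.

```python
def mean_squared_error(xs, ys, w, b):
    """Loss: average squared difference between predicted and actual outcomes."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Estimate weight w and bias b in y ~ w*x + b by gradient descent on the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Errors use the known outcome values: this is the "supervision".
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE loss with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient to reduce the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Illustrative data generated from y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
w, b = fit_line(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
print("loss before:", mean_squared_error(xs, ys, 0.0, 0.0))
print("loss after: ", mean_squared_error(xs, ys, w, b))
```

The drop in loss between the untrained and trained weights is the "improvement on a performance metric" that makes this a learning algorithm.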

Unsupervised Learning Algorithms: These algorithms do not have knowledge of the outcome variable in the dataset; they must discover its possible values (for example, which group each observation belongs to) from the features alone. Clustering and principal component analysis are examples of unsupervised learning algorithms. Since the values of the outcome variable are unknown in the training data, supervision using that knowledge is not possible.
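As an illustration of finding structure without an outcome variable, here is a minimal from-scratch sketch of k-means clustering (Lloyd's algorithm) on one-dimensional data. The points, k = 2, and the random seed are assumptions chosen for the example.

```python
import random


def kmeans_1d(points, k, max_iterations=100, seed=0):
    """Cluster 1D points into k groups; no outcome labels are used."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # assignments have stabilized
            break
        centers = new_centers
    return sorted(centers)


points = [1.0, 1.2, 0.8, 10.0, 10.4, 9.6]
centers = kmeans_1d(points, k=2)
print(centers)  # two centers, near 1.0 and 10.0
```

The algorithm is never told which points "belong together"; the two groups emerge from the distances alone.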

Reinforcement Learning Algorithms: In many datasets, there could be uncertainty around both the input and the output variables. For example, consider the spell checkers in text editors. If a person types “buutiful” in Microsoft Word, the spell checker immediately identifies it as a spelling mistake and suggests options such as “beautiful”, “bountiful”, and “dutiful”. Here the prediction is not one single value but a set of values. Another definition is: reinforcement learning algorithms take sequential actions (decisions) to maximize a cumulative reward. Markov chains and Markov decision processes are among the techniques used in reinforcement learning.
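The second definition (sequential decisions that maximize a cumulative reward) can be sketched with value iteration on a tiny Markov decision process. The four-state "walk to the goal" problem, its rewards, and the discount factor below are hypothetical choices for illustration, not a general reinforcement learning library.

```python
# Hypothetical toy MDP: states 0..3 on a line; state 3 is the terminal goal.
GAMMA = 0.9            # discount factor for future rewards
STATES = [0, 1, 2, 3]
TERMINAL = 3
ACTIONS = [-1, +1]     # move left or move right


def step(state, action):
    """Deterministic transition: return (next_state, reward)."""
    next_state = min(max(state + action, 0), TERMINAL)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def action_value(values, state, action):
    """Immediate reward plus the discounted value of the next state."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]


def value_iteration(tol=1e-9):
    """Repeat Bellman optimality updates until the values stop changing."""
    values = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s == TERMINAL:
                continue  # no further reward is collected after the goal
            best = max(action_value(values, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


values = value_iteration()
policy = {s: max(ACTIONS, key=lambda a: action_value(values, s, a))
          for s in STATES if s != TERMINAL}
print(values)  # cumulative discounted reward achievable from each state
print(policy)  # best action in each non-terminal state
```

The learned policy is a sequence of decisions (always move right), chosen because it maximizes the discounted sum of future rewards rather than any single immediate reward.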

Evolutionary Learning Algorithms: Evolutionary algorithms imitate natural evolution to solve a problem. Techniques such as genetic algorithms and ant colony optimization fall under the category of evolutionary learning algorithms.
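A minimal genetic-algorithm sketch, assuming the classic toy "OneMax" problem (evolve a bit string toward all ones). Tournament selection, one-point crossover, the mutation rate, and the population size are illustrative choices, not the only way to build a genetic algorithm. Because the best individual is copied into each new generation (elitism), the best fitness can never decrease.

```python
import random


def genetic_one_max(n_bits=20, pop_size=30, generations=100,
                    mutation_rate=0.02, seed=1):
    """Evolve bit strings toward all ones; fitness is the number of 1 bits."""
    rng = random.Random(seed)
    fitness = sum
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = [fitness(max(population, key=fitness))]

    def select():
        # Tournament selection: the fitter of two random individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        # Elitism: carry the current best individual over unchanged.
        children = [max(population, key=fitness)[:]]
        while len(children) < pop_size:
            parent1, parent2 = select(), select()
            cut = rng.randint(1, n_bits - 1)          # one-point crossover
            child = parent1[:cut] + parent2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g  # mutation
                     for g in child]
            children.append(child)
        population = children
        history.append(fitness(max(population, key=fitness)))
    return max(population, key=fitness), history


best, history = genetic_one_max()
print(sum(best), "ones out of", len(best))
print("best fitness never decreases:", history == sorted(history))
```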





Wednesday, January 29, 2020

SQLAlchemy

SQLAlchemy is a library used to interact with a wide variety of databases. It enables you to create data models and queries in a manner that feels like normal Python classes and statements. Created by Mike Bayer in 2005, SQLAlchemy is used by many companies and is considered by many to be the de facto way of working with relational databases in Python.

It can be used to connect to most common databases such as Postgres, MySQL, SQLite, Oracle, and many others. It also provides a way to add support for other relational databases as well. Amazon Redshift, which uses a custom dialect of PostgreSQL, is a great example of database support added by the community.

The top reason to use SQLAlchemy is to abstract your code away from the underlying database and its associated SQL peculiarities. SQLAlchemy leverages powerful common statements and types to ensure its SQL statements are crafted efficiently and properly for each database type and vendor without you having to think about it.

This makes it easy to migrate logic from Oracle to PostgreSQL or from an application database to a data warehouse. It also helps ensure that database input is sanitized and properly escaped before being submitted to the database. Because values are sent as bound parameters rather than pasted into the SQL text, this helps prevent common issues like SQL injection attacks.
SQLAlchemy also provides a lot of flexibility by supplying two major modes of usage: SQL Expression Language (commonly referred to as Core) and ORM. These modes can be used separately or together depending on your preference and the needs of your application.
The SQL Expression Language is a Pythonic way of representing common SQL statements and expressions, and is only a mild abstraction from the typical SQL language. It is focused on the actual database schema; however, it is standardized in such a way that it provides a consistent language across a large number of backend databases. The SQL Expression Language also acts as the foundation for the SQLAlchemy ORM.
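A minimal Core sketch, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database; the users table and its rows are hypothetical. It shows a schema described in Python, an insert, and a select whose comparison value is sent as a bound parameter, which is how Core keeps untrusted input out of the SQL text.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, select)

engine = create_engine("sqlite://")  # in-memory SQLite database
metadata = MetaData()

# The schema is described with Python objects rather than DDL strings.
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50), nullable=False),
)
metadata.create_all(engine)  # emits CREATE TABLE for this dialect

with engine.begin() as conn:  # the transaction commits when the block exits cleanly
    conn.execute(users.insert(), [{"name": "alice"}, {"name": "bob"}])

    # User input becomes a bound parameter; it is never spliced into the SQL.
    user_input = "bob'; DROP TABLE users; --"
    stmt = select(users.c.id, users.c.name).where(users.c.name == user_input)
    print(stmt)  # the WHERE clause shows a placeholder, not the raw string
    matches = conn.execute(stmt).fetchall()

    names = conn.execute(select(users.c.name).order_by(users.c.id)).scalars().all()
    print(matches, names)
```

The malicious-looking string simply matches no row, and the table is untouched.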

The SQLAlchemy ORM is similar to many other object relational mappers (ORMs) you may have encountered in other languages. It is focused on the domain model of the application and leverages the Unit of Work pattern to maintain object state. It also provides a high-level abstraction on top of the SQL Expression Language that enables the user to work in a more idiomatic way. You can mix and match the ORM and the SQL Expression Language to create very powerful applications. The ORM leverages a declarative system that is similar to the active-record systems used by many other ORMs, such as the one found in Ruby on Rails. While the ORM is extremely useful, keep in mind that the way classes can be related is not always the same as the way the underlying database relationships work.
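A matching ORM sketch, again assuming SQLAlchemy 1.4 or later and in-memory SQLite; the Author and Book classes are hypothetical. Adding only the author to the session is enough: the Unit of Work follows the relationship and inserts the book too, and the queries are written against the mapped classes rather than tables.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Author(Base):
    __tablename__ = "authors"
    id = Column(Integer, primary_key=True)
    name = Column(String(50), nullable=False)
    books = relationship("Book", back_populates="author")


class Book(Base):
    __tablename__ = "books"
    id = Column(Integer, primary_key=True)
    title = Column(String(100), nullable=False)
    author_id = Column(Integer, ForeignKey("authors.id"))
    author = relationship("Author", back_populates="books")


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    author = Author(name="Ada")
    author.books.append(Book(title="Notes"))  # relate objects in Python
    session.add(author)   # the pending Book is cascaded along with the Author
    session.commit()      # Unit of Work: INSERTs are emitted in dependency order

    titles = session.execute(select(Book.title)).scalars().all()
    found = session.execute(select(Author).where(Author.name == "Ada")).scalar_one()
    book_count = len(found.books)  # loaded through the relationship
    print(titles, book_count)
```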



Monday, January 27, 2020

Three-Dimensional Plots

While matplotlib is primarily a 2D plotting package, it does have basic 3D plotting capabilities. To create a 3D plot, set the keyword projection to '3d' in a subplot call, as shown below. Importing Axes3D from mpl_toolkits.mplot3d registers the '3d' projection; the import is required in matplotlib versions before 3.2 and harmless in later versions:

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

Different 2D and 3D subplots can be mixed within the same figure window by setting projection='3d' only in those subplots where 3D plotting is desired. Alternatively, all the subplots in a figure can be set to be 3D plots using the subplots function:

fig, ax = plt.subplots(subplot_kw={'projection': '3d'})

As you might expect, the third axis in a 3D plot is called the z-axis, and the same commands for labeling and setting the limits that work for the x and y axes also work for the z-axis.
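A small sketch of those z-axis counterparts (set_zlabel and set_zlim alongside set_xlabel and set_xlim) follows. The helix data and the non-interactive Agg backend are assumptions made so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render without opening a window
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (needed only on matplotlib < 3.2)

t = np.linspace(0, 4 * np.pi, 200)
fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.plot(np.cos(t), np.sin(t), t)  # a helix rising along the z-axis

# The familiar x/y methods have z-axis counterparts.
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
ax.set_xlim(-1.5, 1.5)
ax.set_ylim(-1.5, 1.5)
ax.set_zlim(0, 4 * np.pi)
fig.canvas.draw()  # render once to confirm the figure builds
```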


The following program shows a wireframe plot and a surface plot of the function

$$f(x, y) = 2\,e^{-[(x-1)^2 + (y-2)^2]/2} - 3\,e^{-2[(x-3)^2 + (y-1)^2]}$$

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D


def pmgauss(x, y):
    r1 = (x - 1) ** 2 + (y - 2) ** 2
    r2 = (x - 3) ** 2 + (y - 1) ** 2
    return 2 * np.exp(-0.5 * r1) - 3 * np.exp(-2 * r2)


a, b = 4, 3

x = np.linspace(0, a, 60)
y = np.linspace(0, b, 45)

X, Y = np.meshgrid(x, y)
Z = pmgauss(X, Y)

fig, ax = plt.subplots(1, 2, figsize=(9.2, 4),
                       subplot_kw={'projection': '3d'})
for i in range(2):
    ax[i].set_zlim(-3, 2)
    ax[i].xaxis.set_ticks(range(a + 1))  # manually set ticks
    ax[i].yaxis.set_ticks(range(b + 1))
    ax[i].set_xlabel(r'$x$')
    ax[i].set_ylabel(r'$y$')
    ax[i].set_zlabel(r'$f(x,y)$')
    ax[i].view_init(40, -30)

# Plot wireframe and surface plots.
fig.subplots_adjust(left=0.04, bottom=0.04, right=0.96,
                    top=0.96, wspace=0.05)
p0 = ax[0].plot_wireframe(X, Y, Z, rcount=40, ccount=40,
                          color='C1')
p1 = ax[1].plot_surface(X, Y, Z, rcount=50, ccount=50,
                        color='C1')
fig.subplots_adjust(left=0.0)
fig.savefig('./figures/wireframeSurfacePlots.pdf')
plt.show()

The resulting plot will be as follows:

The 3D wireframe and surface plots use the same meshgrid function to set up the x-y 2D arrays. The rcount and ccount keywords set the maximum number of rows and columns used to sample the input data to generate the graph.

These plotting examples are only a small sample of the many kinds of plots matplotlib can make; the matplotlib gallery is a good place to explore further possibilities.




Streamline plots

matplotlib can also make streamline plots, which are sometimes called field line plots. The matplotlib function that makes such plots is streamplot, and its use is illustrated in the following program, which plots the streamlines of the velocity field of a viscous liquid around a sphere falling through it at constant velocity u:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Circle


def v(u, a, x, z):
    """Return the velocity vector field v = (vx, vz)
    around a sphere of radius a centered at r = 0."""
    r = np.sqrt(x * x + z * z)
    R = a / r
    RR = R * R
    cs, sn = z / r, x / r
    vr = u * cs * (1.0 - 0.5 * R * (3.0 - RR))
    vtheta = -u * sn * (1.0 - 0.25 * R * (3.0 + RR))
    vx = vr * sn + vtheta * cs
    vz = vr * cs - vtheta * sn
    return vx, vz


# Grid of x, z points
xlim, zlim = 12, 12
nx, nz = 100, 100
x = np.linspace(-xlim, xlim, nx)
z = np.linspace(-zlim, zlim, nz)
X, Z = np.meshgrid(x, z)

# Set particle radius and velocity
a, u = 1.0, 1.0

# Velocity field vector, V=(Vx, Vz) as separate components
Vx, Vz = v(u, a, X, Z)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4.5))

# Plot the streamlines using colormap and arrow style
color = np.log(np.sqrt(Vx * Vx + Vz * Vz))
seedx = np.linspace(-xlim, xlim, 18)  # Seed streamlines evenly
seedz = -zlim * np.ones(len(seedx))  # far from particle
seed = np.array([seedx, seedz])
ax1.streamplot(x, z, Vx, Vz, color=color, linewidth=1,
               cmap='afmhot', density=5, arrowstyle='-|>',
               arrowsize=1.0, minlength=0.4, start_points=seed.T)
ax2.streamplot(x, z, Vx, Vz - u, color=color, linewidth=1,
               cmap='afmhot', density=5, arrowstyle='-|>',
               arrowsize=1.0, minlength=0.4, start_points=seed.T)
for ax in (ax1, ax2):
    # Add filled circle for sphere
    ax.add_patch(Circle((0, 0), a, color='C0', zorder=2))
    ax.set_xlabel('$x$')
    ax.set_ylabel('$z$')
    ax.set_aspect('equal')
    ax.set_xlim(-0.7 * xlim, 0.7 * xlim)
    ax.set_ylim(-0.7 * zlim, 0.7 * zlim)
fig.tight_layout()
fig.savefig('./figures/stokesFlowStream.pdf')
plt.show()

The resulting plot is shown below:

The left plot is in the reference frame of the falling sphere, and the right plot is in the laboratory frame, where the liquid very far from the sphere is at rest.

The program starts by defining a function that calculates the velocity field as a function of the lateral distance x and the vertical distance z. The function is a solution to the Stokes equation, which describes flow in viscous liquids at very low (zero) Reynolds number. The velocity field serves as the primary input into the matplotlib streamplot function.
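Two properties of this solution make a quick sanity check of v, sketched below with NumPy only: on the sphere's surface (r = a) both velocity components vanish, because in the sphere's frame the fluid sticks to the surface (no slip), and far from the sphere the velocity approaches the uniform flow (0, u). The function is copied from the program above so the snippet runs on its own; the sample points are arbitrary choices.

```python
import numpy as np


def v(u, a, x, z):
    """Velocity field (vx, vz) around a sphere of radius a at the origin."""
    r = np.sqrt(x * x + z * z)
    R = a / r
    RR = R * R
    cs, sn = z / r, x / r
    vr = u * cs * (1.0 - 0.5 * R * (3.0 - RR))
    vtheta = -u * sn * (1.0 - 0.25 * R * (3.0 + RR))
    vx = vr * sn + vtheta * cs
    vz = vr * cs - vtheta * sn
    return vx, vz


u, a = 1.0, 1.0
# Points on the sphere's surface, r = a.
theta = np.linspace(0.1, np.pi - 0.1, 7)
vx_s, vz_s = v(u, a, a * np.sin(theta), a * np.cos(theta))
print(np.abs(vx_s).max(), np.abs(vz_s).max())  # both essentially zero

# A point far from the sphere (r = 5e4 a).
vx_far, vz_far = v(u, a, 3.0e4, 4.0e4)
print(vx_far, vz_far)  # close to 0 and u
```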

The next step is to use NumPy’s meshgrid function to define the 2D grid of points at which the velocity field will be calculated, just as we did for the contour plots. After setting up the meshgrid arrays X and Z, we call the function v(u, a, X, Z) defined above to calculate the velocity components Vx and Vz on that grid.

The streamline colors and seed points are set up just before the two streamplot calls, one for each panel. Note that for the streamplot function the input x-z coordinate arrays are 1D arrays, but the velocity arrays Vx and Vz are 2D arrays.

The arrays seedx and seedz set up the starting points (seeds) for the streamlines. You can leave them out, and streamplot will make its own choices based on the values you set for the density and minlength keywords. Here we have chosen those keywords, along with the seed settings, so that all the streamlines are continuous across the plot. The other keywords set the arrow size and style, the width of the streamlines, and their coloring, in this case according to the logarithm of the speed at each point.
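streamplot expects start_points as an array with one (x, y) pair per row, that is, shape (N, 2). The program builds a 2-by-18 array and transposes it; this NumPy-only sketch, reusing the same values, shows the shapes involved.

```python
import numpy as np

xlim, zlim = 12, 12
seedx = np.linspace(-xlim, xlim, 18)   # 18 evenly spaced x positions
seedz = -zlim * np.ones(len(seedx))    # all on the bottom edge, z = -zlim
seed = np.array([seedx, seedz])
print(seed.shape)    # (2, 18): row 0 holds x values, row 1 holds z values
print(seed.T.shape)  # (18, 2): one (x, z) starting point per row
print(seed.T[0])     # first seed point, at the lower-left corner
```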


Sunday, January 26, 2020

Contour plots

The principal matplotlib routines for creating contour plots are contour and contourf. Sometimes you would like to make a contour plot of a function of two variables; other times you may wish to make a contour plot of some data you have. Of the two, making a contour plot of a function is simpler. The following program does so:


import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm  # color maps
import matplotlib


def pmgauss(x, y):
    r1 = (x - 1) ** 2 + (y - 2) ** 2
    r2 = (x - 3) ** 2 + (y - 1) ** 2
    return 2 * np.exp(-0.5 * r1) - 3 * np.exp(-2 * r2)


a, b = 4, 3

x = np.linspace(0, a, 60)
y = np.linspace(0, b, 45)

X, Y = np.meshgrid(x, y)
Z = pmgauss(X, Y)

fig, ax = plt.subplots(2, 2, figsize=(9.4, 6.5),
                       sharex=True, sharey=True,
                       gridspec_kw={'width_ratios': [4, 5]})

# Set before calling contour: the negative line style is applied when the contours are created
matplotlib.rcParams['contour.negative_linestyle'] = 'dashed'
CS0 = ax[0, 0].contour(X, Y, Z, 8, colors='k')
ax[0, 0].clabel(CS0, fontsize=9, fmt='%0.1f')
ax[0, 0].plot(X, Y, 'o', ms=1, color='lightgray', zorder=-1)

CS1 = ax[0, 1].contourf(X, Y, Z, 12, cmap=cm.gray, zorder=0)
cbar1 = fig.colorbar(CS1, shrink=0.8, ax=ax[0, 1])
cbar1.set_label(label='height', fontsize=10)
plt.setp(cbar1.ax.yaxis.get_ticklabels(), fontsize=8)

lev2 = np.arange(-3, 2, 0.3)
CS2 = ax[1, 0].contour(X, Y, Z, levels=lev2, colors='k',
                       linewidths=0.5)
ax[1, 0].clabel(CS2, lev2[1::2], fontsize=9, fmt='%0.1f')

CS3 = ax[1, 1].contour(X, Y, Z, 10, colors='gray')
ax[1, 1].clabel(CS3, fontsize=9, fmt='%0.1f')
im = ax[1, 1].imshow(Z, interpolation='bilinear',
                     origin='lower', cmap=cm.gray,
                     extent=(0, a, 0, b))
cbar2 = fig.colorbar(im, shrink=0.8, ax=ax[1, 1])
cbar2.set_label(label='height', fontsize=10)
plt.setp(cbar2.ax.yaxis.get_ticklabels(), fontsize=8)

for i in range(2):
    ax[1, i].set_xlabel(r'$x$', fontsize=14)
    ax[i, 0].set_ylabel(r'$y$', fontsize=14)
    for j in range(2):
        ax[i, j].set_aspect('equal')
        ax[i, j].set_xlim(0, a)
        ax[i, j].set_ylim(0, b)
fig.subplots_adjust(left=0.06, bottom=0.07, right=0.99,
                    top=0.99, wspace=0.06, hspace=0.09)
fig.savefig('./figures/contour4.pdf')
plt.show()

The output below shows four different contour plots. All were produced using contour except the upper-right plot, which was produced using contourf.

All four plots show the same function, which is the sum of a pair of Gaussians, one positive and the other negative:

$$f(x, y) = 2\,e^{-[(x-1)^2 + (y-2)^2]/2} - 3\,e^{-2[(x-3)^2 + (y-1)^2]}$$

After defining the function to be plotted, the next step is to use np.meshgrid to create the x-y array of points at which the function will be evaluated. We use np.linspace rather than np.arange to define the extent of the x-y mesh because we want the x range to go precisely from 0 to a=4 and the y range to go precisely from 0 to b=3, for two reasons. First, the array produced by np.arange does not include the upper bound, while the one produced by np.linspace does. This is important for producing the grayscale (or color) background that extends all the way to the upper limits of the x-y ranges in the upper-right plot, produced by contourf, in the figure above. Second, to produce smooth-looking contours, one generally needs about 40–200 points in each direction across the plot, irrespective of the absolute magnitude of the numbers being plotted. The number of points is specified directly by np.linspace but must be calculated for np.arange. We follow the convention, used by many programmers, of capitalizing the meshgrid variables; it’s certainly not necessary.
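A quick NumPy check of both reasons; the value a = 4 mirrors the example, while the 0.5 step is an arbitrary choice for illustration.

```python
import numpy as np

a = 4
stepped = np.arange(0, a, 0.5)   # the upper bound 4 is excluded
spaced = np.linspace(0, a, 9)    # the upper bound 4 is included
print(stepped[-1], spaced[-1])

# linspace takes the number of points directly...
x = np.linspace(0, a, 60)
print(len(x), x[0], x[-1])
# ...whereas with arange the step (and hence the count) must be worked out by hand.
```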

The upper-left contour plot takes the X-Y 2D arrays made using meshgrid as its first two arguments and Z as its third argument. The fourth argument, 8, asks contour for roughly 8 automatically chosen levels in Z (matplotlib picks “nice” level values, so the exact count can differ slightly). We give the contour object a name, CS0, because it is needed by the clabel call in the next line, which sets the font size and the number format of the contour labels. The line style of the negative contours is set globally to “dashed” through matplotlib’s rcParams. We also plot the locations of the X-Y grid points created by meshgrid, just to illustrate its function; normally these would not be plotted.

The upper-right contour plot is made using contourf with 12 different Z layers, indicated by different gray levels. The gray color scheme is set by the keyword argument cmap, which here is set to the matplotlib.cm colormap cm.gray. Other colormaps are described in the matplotlib documentation; an internet search for “matplotlib choosing colormaps” will find them. The color bar on the right is created by the colorbar method of fig. It is associated with the upper-right plot by passing CS1, the object returned by contourf, together with the keyword argument ax=ax[0, 1]. Its size relative to the plot is set by the shrink keyword. The label text and its font size are set by cbar1.set_label, while the font size of the color bar tick labels is set using the generic set-property function setp, with a somewhat arcane but compact syntax.

For the lower-left contour plot CS2, we manually specify the contour levels with the keyword argument levels=lev2. We label only every other contour numerically by passing lev2[1::2] as the second argument of the clabel call; lev2[0::2] would also label every other contour, but the even-indexed levels instead of the odd-indexed ones.
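The slicing can be checked on its own with NumPy; this sketch reuses the lev2 definition from the program.

```python
import numpy as np

lev2 = np.arange(-3, 2, 0.3)  # levels from -3 up to, but not including, 2
odd = lev2[1::2]    # indices 1, 3, 5, ...: the levels labeled in the example
even = lev2[0::2]   # indices 0, 2, 4, ...: the alternative choice

print(len(lev2), len(odd), len(even))  # 17 8 9
print(odd[:3])
```

Together the two slices cover every level exactly once, so either choice labels every other contour.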

The lower-right contour plot CS3 has 10 contour levels and a continuously varying grayscale background created using imshow. The imshow method uses only the Z array to determine the gray levels; the x-y extent of the grayscale background is set by the keyword argument extent. By default, imshow places the origin (the first row of Z) at the upper-left corner. We override this default with the keyword argument origin='lower' so that the grayscale background is consistent with the contours. The keyword argument interpolation tells imshow how to interpolate the grayscale between adjacent Z values.






Saturday, January 25, 2020

Contour and Vector Field Plots

matplotlib has extensive tools for creating and annotating two-dimensional contour plots and vector field plots. A contour plot is used to visualize two-dimensional scalar functions, such as the electric potential V(x, y) or the elevation h(x, y) over some physical terrain. Vector field plots come in different varieties. There are field line plots, which in some contexts are called streamline plots, that show the direction of a vector field over some 2D (x, y) range. There are also quiver plots, which consist essentially of a 2D grid of arrows that give the direction and magnitude of a vector field over some 2D (x, y) range.

Let’s see how to make a 2D grid of points. When plotting a function f(x) of a single variable, the first step is usually to create a one-dimensional x array of points, and then to evaluate and plot the function f(x) at those points, often drawing lines between the points to create a continuous curve. Similarly, when making a two-dimensional plot, we usually need to make a two-dimensional x-y array of points, and then to evaluate and plot the function f(x, y), be it a scalar or vector function, at those points, perhaps with continuous curves to indicate the value of the function over the 2D surface.
Thus, instead of a line of evenly spaced x points, we need a grid of evenly spaced x-y points. Fortunately, NumPy has a function, np.meshgrid, for doing just that. The procedure is first to make an x array at even intervals over the range of x to be covered, and then to do the same for y. These two one-dimensional arrays are used as input to the np.meshgrid function, which makes a two-dimensional mesh. The following program shows how it works:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib

x = np.linspace(-1, 1, 5)
y = np.linspace(2, 6, 5)

X, Y = np.meshgrid(x, y)

plt.plot(X, Y, 'o')
plt.show()

The output of plot(X, Y, 'o') is a 2D grid of points, as shown in the figure below.
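To see what np.meshgrid actually returns, this NumPy-only sketch uses deliberately different lengths for x and y (3 and 2 points, an arbitrary choice) so that the layout of the output arrays is visible.

```python
import numpy as np

x = np.linspace(-1, 1, 3)  # 3 x values
y = np.linspace(2, 6, 2)   # 2 y values
X, Y = np.meshgrid(x, y)

print(X.shape, Y.shape)  # (2, 3) (2, 3): one row per y value, one column per x value
print(X)                 # every row is a copy of x
print(Y)                 # every column is a copy of y

# indexing="ij" swaps the layout to (len(x), len(y)).
Xij, Yij = np.meshgrid(x, y, indexing="ij")
print(Xij.shape)         # (3, 2)
```

The default "xy" layout is the one matplotlib's contour and streamplot functions expect, with rows running along y.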


matplotlib’s functions for making contour plots and vector field plots generally use the output of meshgrid as the 2D input for the functions to be plotted. In the next post we'll see how to create contour plots.

Wednesday, January 22, 2020

Bokeh library



In the world of data visualization, three Python libraries are among the most widely used:
  1. Matplotlib
  2. Seaborn
  3. Bokeh
The first two, Matplotlib and Seaborn, let you plot static plots—plots that do not change and plots that cannot be interacted with. These plots are useful and add value when performing exploratory data analysis, as they are quick and easy to implement and very fast to execute.

The third plotting library, Bokeh, lets you create interactive plots: plots that change when the user interacts with them. These plots are useful when you want to give your audience a wide range of options and tools for exploring the data from various angles. Bokeh depends on a few other packages. conda and pip install them automatically, but if you manage packages by hand, ensure that the following are installed (this list comes from an older Bokeh release; the Bokeh installation documentation lists the current requirements):

NumPy
Jinja2
Six
Requests
Tornado >= 4.0
PyYAML
python-dateutil

If you're using Python 2.7, ensure that you also have the futures package installed. If you have your Python packages installed and managed using a distribution such as Anaconda, you can install Bokeh from a Bash terminal or a Windows command prompt using the following command:

conda install bokeh

You can also install Bokeh from PyPI for Python 2 via the following command:

pip install bokeh

You can install Bokeh from PyPI for Python 3 via the following command:

pip3 install bokeh

If you already have Bokeh installed and require an update, simply enter the following code in your terminal or shell:

sudo pip3 install bokeh --upgrade

Once you have installed Bokeh, you will want to verify that it is installed correctly. You can do this by generating a simple line plot, either in a Jupyter Notebook or in a plain Python script, with the following code:

from bokeh.plotting import figure, output_file, show

# HTML file to output your plot into
output_file("bokeh.html")

# Constructing a basic line plot
x = [1, 2, 3]
y = [4, 5, 6]
p = figure()
p.line(x, y)
show(p)


This should open a new tab in your browser with a plot like the following:


Tuesday, January 21, 2020

Vectors

Abstractly, vectors are objects that can be added together (to form new vectors) and that can be multiplied by scalars (i.e., numbers), also to form new vectors. Concretely (for us), vectors are points in some finite-dimensional space. Although you might not think of your data as vectors, they are a good way to represent numeric data.

For example, if you have the heights, weights, and ages of a large number of people, you can treat your data as three-dimensional vectors (height, weight, age). If you’re teaching a class with four exams, you can treat student grades as four-dimensional vectors (exam1, exam2, exam3, exam4). The simplest from-scratch approach is to represent vectors as lists of numbers. A list of three numbers corresponds to a vector in three-dimensional space, and vice versa:

height_weight_age = [70,   # inches
                     170,  # pounds
                     40]   # years

grades = [95,  # exam1
          80,  # exam2
          75,  # exam3
          62]  # exam4


One problem with this approach is that we will want to perform arithmetic on vectors. Because Python lists aren’t vectors (and hence provide no facilities for vector arithmetic), we’ll need to build these arithmetic tools ourselves. So let’s start with that. To begin with, we’ll frequently need to add two vectors. Vectors add componentwise. This means that if two vectors v and w are the same length, their sum is just the vector whose first element is v[0] + w[0], whose second element is v[1] + w[1], and so on. (If they’re not the same length, then we’re not allowed to add them.)

For example, adding the vectors [1, 2] and [2, 1] results in [1 + 2, 2 + 1], or [3, 3]. This addition is shown below:





We can easily implement this by zipping the vectors together and using a list comprehension to add the corresponding elements:

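def vector_add(v, w):
    """adds corresponding elements"""
    # zip would silently truncate mismatched vectors, so check lengths first
    assert len(v) == len(w), "vectors must be the same length"
    return [v_i + w_i
            for v_i, w_i in zip(v, w)]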


Similarly, to subtract two vectors we just subtract corresponding elements:

def vector_subtract(v, w):
    """subtracts corresponding elements"""
    assert len(v) == len(w), "vectors must be the same length"
    return [v_i - w_i
            for v_i, w_i in zip(v, w)]


We’ll also sometimes want to componentwise sum a list of vectors. That is, create a new vector whose first element is the sum of all the first elements, whose second element is the sum of all the second elements, and so on. The easiest way to do this is by adding one vector at a time:

def vector_sum(vectors):
    """sums all corresponding elements"""
    result = vectors[0]                      # start with the first vector
    for vector in vectors[1:]:               # then loop over the others
        result = vector_add(result, vector)  # and add them to the result
    return result


If you think about it, we are just reduce-ing the list of vectors using vector_add, which means we can rewrite this more briefly using higher-order functions:

from functools import partial, reduce  # reduce is not a builtin in Python 3

def vector_sum(vectors):
    return reduce(vector_add, vectors)

or even:

vector_sum = partial(reduce, vector_add)


although this last one is probably more clever than helpful.

We’ll also need to be able to multiply a vector by a scalar, which we do simply by multiplying each element of the vector by that number:

def scalar_multiply(c, v):
    """c is a number, v is a vector"""
    return [c * v_i for v_i in v]


This allows us to compute the componentwise means of a list of (same-sized) vectors:

def vector_mean(vectors):
    """compute the vector whose ith element is the mean of the
    ith elements of the input vectors"""
    n = len(vectors)
    return scalar_multiply(1 / n, vector_sum(vectors))


A less obvious tool is the dot product. The dot product of two vectors is the sum of their componentwise products:

def dot(v, w):
    """v_1 * w_1 + ... + v_n * w_n"""
    assert len(v) == len(w), "vectors must be the same length"
    return sum(v_i * w_i
               for v_i, w_i in zip(v, w))


The dot product measures how far the vector v extends in the w direction. For example, if w = [1, 0], then dot(v, w) is just the first component of v. Another way of saying this is that, when w has length 1, the dot product is the (signed) length of the vector you’d get if you projected v onto w, as shown in the figure below:



Using this, it’s easy to compute a vector’s sum of squares:

def sum_of_squares(v):
    """v_1 * v_1 + ... + v_n * v_n"""
    return dot(v, v)


We can use this to compute its magnitude (or length):

import math

def magnitude(v):
    return math.sqrt(sum_of_squares(v))  # math.sqrt is the square root function
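With dot and magnitude in hand, the projection described earlier can be computed for a w of any length: the scalar projection of v onto w is dot(v, w) / magnitude(w), which reduces to dot(v, w) when w has length 1. The names scalar_projection and vector_projection are illustrative helpers, not part of the original text; the sketch repeats dot and magnitude so it runs on its own.

```python
import math


def dot(v, w):
    return sum(v_i * w_i for v_i, w_i in zip(v, w))


def magnitude(v):
    return math.sqrt(dot(v, v))


def scalar_projection(v, w):
    """Signed length of the shadow of v along the direction of w."""
    return dot(v, w) / magnitude(w)


def vector_projection(v, w):
    """The component of v that points along w."""
    c = dot(v, w) / dot(w, w)
    return [c * w_i for w_i in w]


v = [3, 4]
print(scalar_projection(v, [1, 0]))  # 3.0: the first component of v
print(scalar_projection(v, [2, 0]))  # 3.0: only the direction of w matters
print(dot(v, [2, 0]))                # 6: the raw dot product scales with |w|
print(vector_projection(v, [1, 1]))  # [3.5, 3.5]
```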


We now have all the pieces we need to compute the distance between two vectors, defined as:

$$d(v, w) = \sqrt{(v_1 - w_1)^2 + \cdots + (v_n - w_n)^2}$$

def squared_distance(v, w):
    """(v_1 - w_1) ** 2 + ... + (v_n - w_n) ** 2"""
    return sum_of_squares(vector_subtract(v, w))

def distance(v, w):
    return math.sqrt(squared_distance(v, w))


This is possibly clearer if we write it in the equivalent form:

def distance(v, w):
    return magnitude(vector_subtract(v, w))
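As a quick check of the definition, the distance between [0, 0] and [3, 4] should be 5. The sketch below inlines a compact version of distance so it runs on its own and compares it with math.dist from the standard library (available in Python 3.8 and later), which computes the same Euclidean distance.

```python
import math


def distance(v, w):
    """Square root of the sum of squared componentwise differences."""
    assert len(v) == len(w), "vectors must be the same length"
    return math.sqrt(sum((v_i - w_i) ** 2 for v_i, w_i in zip(v, w)))


print(distance([0, 0], [3, 4]))   # 5.0
print(math.dist([0, 0], [3, 4]))  # 5.0
```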