The post A Gentle Introduction To Vector Valued Functions appeared first on Machine Learning Mastery.

In this tutorial, you will discover what vector valued functions are, how to define them and some examples.

After completing this tutorial, you will know:

- Definition of vector valued functions
- Derivatives of vector valued functions

Let’s get started.

This tutorial is divided into two parts; they are:

- Definition and examples of vector valued functions
- Differentiating vector valued functions

A vector valued function is also called a vector function. It is a function with the following two properties:

- The domain is a set of real numbers
- The range is a set of vectors

Vector functions are, therefore, a simple extension of scalar functions, for which both the domain and the range are sets of real numbers.

In this tutorial we’ll consider vector functions whose range is the set of two or three dimensional vectors. Hence, such functions can be used to define a set of points in space.

Given the unit vectors i, j, k parallel to the x, y and z axes respectively, we can write a three dimensional vector valued function as:

r(t) = x(t)i + y(t)j + z(t)k

It can also be written as:

r(t) = <x(t), y(t), z(t)>

Both the above notations are equivalent and often used in various textbooks.
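To make this concrete, here is a minimal sketch of a vector valued function in Python (assuming NumPy is installed; the component functions cos(t), sin(t) and t are chosen purely for illustration):

```python
import numpy as np

def r(t):
    """A sample vector valued function r(t) = cos(t)i + sin(t)j + tk,
    returned as the array <x(t), y(t), z(t)>."""
    return np.array([np.cos(t), np.sin(t), t])

# A single real input t produces a single 3D vector as output.
point = r(np.pi / 2)
print(point)  # approximately [0.0, 1.0, 1.5708]
```

A single real number goes in; a vector comes out, which is exactly the two defining properties listed above.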

We defined a vector function r(t) in the preceding section. For different values of t we get the corresponding (x,y,z) coordinates, defined by the functions x(t), y(t) and z(t). The set of generated points (x,y,z), therefore, defines a curve called the space curve C. The equations for x(t), y(t) and z(t) are also called the parametric equations of the curve C.

This section shows some examples of vector valued functions that define space curves. All the examples are also plotted in the figure shown after the examples.

Let’s start with a simple example of a vector function in 2D space:

r_1(t) = cos(t)i + sin(t)j

Here the parametric equations are:

x(t) = cos(t)

y(t) = sin(t)

The space curve defined by the parametric equations is a circle in 2D space as shown in the figure. If we vary t from -𝜋 to 𝜋, we’ll generate all the points that lie on the circle.

We can extend the r_1(t) function of example 1.1, to easily generate a helix in 3D space. We just need to add the value along the z axis that changes with t. Hence, we have the following function:

r_2(t) = cos(t)i + sin(t)j + tk

We can also define a curve called the twisted cubic with an interesting shape as:

r_3(t) = ti + t^2j + t^3k
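As a quick sketch (assuming NumPy is installed; plotting with a library such as Matplotlib is left out for brevity), we can sample points on all three example curves:

```python
import numpy as np

# Sample the parameter t over [-pi, pi].
t = np.linspace(-np.pi, np.pi, 200)

# Each curve is stored as an array with one row per component function.
r1 = np.stack([np.cos(t), np.sin(t)])        # Example 1.1: circle in 2D
r2 = np.stack([np.cos(t), np.sin(t), t])     # Example 1.2: helix in 3D
r3 = np.stack([t, t**2, t**3])               # Example 1.3: twisted cubic

# Sanity check: every point of r1 lies at distance 1 from the origin,
# confirming that the parametric equations trace a circle.
radii = np.sqrt(r1[0]**2 + r1[1]**2)
print(radii.min(), radii.max())  # both 1.0, up to floating point error
```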

We can easily extend the idea of the derivative of a scalar function to the derivative of a vector function. As the range of a vector function is a set of vectors, its derivative is also a vector.

If

r(t) = x(t)i + y(t)j + z(t)k

then the derivative of r(t) is given by r'(t) computed as:

r'(t) = x'(t)i + y'(t)j + z'(t)k

We can find the derivatives of the functions defined in the previous example as:

The parametric equation of a circle in 2D is given by:

r_1(t) = cos(t)i + sin(t)j

Its derivative is therefore found by computing the corresponding derivatives of x(t) and y(t) as shown below:

x'(t) = -sin(t)

y'(t) = cos(t)

This gives us:

r_1′(t) = x'(t)i + y'(t)j

r_1′(t) = -sin(t)i + cos(t)j


Similar to the previous example, we can compute the derivative of r_2(t) as:

r_2(t) = cos(t)i + sin(t)j + tk

r_2′(t) = -sin(t)i + cos(t)j + k

The derivative of r_3(t) is given by:

r_3(t) = ti + t^2j + t^3k

r_3′(t) = i + 2tj + 3t^2k
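Since the derivative is taken component-wise, we can verify the analytic result numerically. The following sketch (a check added for illustration, assuming NumPy) compares r_3′(t) with a central finite difference approximation:

```python
import numpy as np

def r3(t):
    """The twisted cubic r_3(t) = <t, t^2, t^3>."""
    return np.array([t, t**2, t**3])

def r3_prime(t):
    """Its analytic derivative r_3'(t) = <1, 2t, 3t^2>."""
    return np.array([1.0, 2*t, 3*t**2])

def numeric_derivative(f, t, h=1e-6):
    """Central finite difference, applied component-wise."""
    return (f(t + h) - f(t - h)) / (2*h)

t0 = 0.7
print(r3_prime(t0))                # [1.0, 1.4, 1.47]
print(numeric_derivative(r3, t0))  # numerically very close to the above
```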

All the above examples are shown in the figure, where the derivatives are plotted in red. Note the circle’s derivative also defines a circle in space.

Once you gain a basic understanding of these functions, you can have a lot of fun defining various shapes and curves in space. Other popular examples used by the mathematical community are defined below and illustrated in the figure.

**The toroidal spiral**:

r_4(t) = (4 + sin(20t))cos(t)i + (4 + sin(20t))sin(t)j + cos(20t)k

**The trefoil knot**:

r_5(t) = (2 + cos(1.5t))cos(t)i + (2 + cos(1.5t))sin(t)j + sin(1.5t)k

**The cardioid:**

r_6(t) = cos(t)(1-cos(t))i + sin(t)(1-cos(t))j
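These curves can be sampled in exactly the same way as the earlier examples. The sketch below (assuming NumPy) evaluates all three, with a sanity check on the toroidal spiral:

```python
import numpy as np

t = np.linspace(0, 2*np.pi, 400)

# The toroidal spiral.
r4 = np.stack([(4 + np.sin(20*t)) * np.cos(t),
               (4 + np.sin(20*t)) * np.sin(t),
               np.cos(20*t)])

# The trefoil knot.
r5 = np.stack([(2 + np.cos(1.5*t)) * np.cos(t),
               (2 + np.cos(1.5*t)) * np.sin(t),
               np.sin(1.5*t)])

# The cardioid (a 2D curve).
r6 = np.stack([np.cos(t) * (1 - np.cos(t)),
               np.sin(t) * (1 - np.cos(t))])

# Sanity check: the toroidal spiral stays between 3 and 5 units from
# the z-axis, since the factor 4 + sin(20t) ranges over [3, 5].
d = np.sqrt(r4[0]**2 + r4[1]**2)
print(d.min(), d.max())
```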

Vector valued functions play an important role in machine learning algorithms. Being an extension of scalar valued functions, you would encounter them in tasks such as multi-class classification and multi-label problems. Kernel methods, an important area of machine learning, can involve computing vector valued functions, which can be later used in multi-task learning or transfer learning.

This section lists some ideas for extending the tutorial that you may wish to explore.

- Integrating vector functions
- Projectile motion
- Arc length in space
- Kernel methods for vector output

If you explore any of these extensions, I’d love to know. Post your findings in the comments below.

This section provides more resources on the topic if you are looking to go deeper.

- Additional resources on Calculus Books for Machine Learning

- Thomas’ Calculus, 14th edition, 2017. (based on the original works of George B. Thomas, revised by Joel Hass, Christopher Heil, Maurice Weir)
- Calculus, 3rd Edition, 2017. (Gilbert Strang)
- Calculus, 8th edition, 2015. (James Stewart)

In this tutorial, you discovered what vector functions are and how to differentiate them.

Specifically, you learned:

- Definition of vector functions
- Parametric curves
- Differentiating vector functions

Ask your questions in the comments below and I will do my best to answer.


The post Differential and Integral Calculus – Differentiate with Respect to Anything appeared first on Machine Learning Mastery.

Integral calculus is the second half of the calculus journey that we will be exploring.

In this tutorial, you will discover the relationship between differential and integral calculus.

After completing this tutorial, you will know:

- The concepts of differential and integral calculus are linked together by the fundamental theorem of calculus.
- By applying the fundamental theorem of calculus, we can compute the integral to find the area under a curve.
- In machine learning, the application of integral calculus can provide us with a metric to assess the performance of a classifier.

Let’s get started.

This tutorial is divided into four parts; they are:

- Differential and Integral Calculus – What is the Link?
- The Fundamental Theorem of Calculus
  - The Sweeping Area Analogy
  - The Fundamental Theorem of Calculus – Part 1
  - The Fundamental Theorem of Calculus – Part 2
- Integration Example
- Application of Integration in Machine Learning

In our journey through calculus so far, we have learned that differential calculus is concerned with the measurement of the rate of change. We have also discovered differentiation, and applied it to different functions from first principles. We have even understood how to apply rules to arrive at the derivative faster.

But we are only half way through the journey.

From a twenty-first-century vantage point, calculus is often seen as the mathematics of change. It quantifies change using two big concepts: derivatives and integrals. Derivatives model rates of change … Integrals model the accumulation of change …

– Page 141, Infinite Powers, 2020.

Recall having said that calculus comprises two phases: cutting and rebuilding.

The cutting phase breaks down a curved shape into infinitesimally small and straight pieces that can be studied separately, such as by applying derivatives to model their rate of change, or *slope*.

This half of the calculus journey is called *differential* calculus, and we have already looked into it in some detail.

The rebuilding phase gathers the infinitesimally small and straight pieces, and sums them back together in an attempt to study the original whole. In this manner, we can determine the area or volume of regular and irregular shapes after having cut them into infinitely thin slices. This second half of the calculus journey is what we shall be exploring next. It is called *integral* calculus.

The important theorem that links the two concepts together is called the *fundamental theorem of calculus*.

In order to work our way towards understanding the fundamental theorem of calculus, let’s revisit the car’s position and velocity example:

In computing the derivative we had solved the *forward* problem, where we found the velocity from the slope of the position graph at any time, *t*. But what if we would like to solve the *backward* problem, where we are given the velocity graph, *v*(*t*), and wish to find the distance travelled? The solution to this problem is to calculate the *area under the curve* (the shaded region) up to time, *t*:

We do not have a specific formula to define the area of the shaded region directly. But we can apply the mathematics of calculus to cut the shaded region under the curve into many infinitely thin rectangles, for which we have a formula:

If we consider the *i*^{th} rectangle, chosen arbitrarily to span the time interval Δ*t*, we can define its area as its length times its width:

area_of_rectangle = *v*(*t*_{i}) Δ*t*_{i}

We can have as many rectangles as necessary in order to span the interval of interest, which in this case is the shaded region under the curve. For simplicity, let’s denote this closed interval by [*a*, *b*]. Finding the area of this shaded region (and, hence, the distance travelled), then reduces to finding the sum of the *n* number of rectangles:

total_area = *v*(*t*_{0}) Δ*t*_{0} + *v*(*t*_{1}) Δ*t*_{1} + … + *v*(*t*_{n}) Δ*t*_{n}

We can express this sum even more compactly by applying the Riemann sum with sigma notation:

If we cut (or divide) the region under the curve by a finite number of rectangles, then we find that the Riemann sum gives us an *approximation* of the area, since the rectangles will not fit the area under the curve exactly. If we had to position the rectangles so that their upper left or upper right corners touch the curve, the Riemann sum gives us either an underestimate or an overestimate of the true area, respectively. If the midpoint of each rectangle had to touch the curve, then the part of the rectangle protruding above the curve *roughly* compensates for the gap between the curve and neighbouring rectangles:

The solution to finding the *exact* area under the curve, is to reduce the rectangles’ width so much that they become *infinitely* thin (recall the Infinity Principle in calculus). In this manner, the rectangles would be covering the entire region, and in summing their areas we would be finding the *definite integral*.

The definite integral (“simple” definition): The exact area under a curve between t = a and t = b is given by the definite integral, which is defined as the limit of a Riemann sum …

– Page 227, Calculus for Dummies, 2016.

The definite integral can, then, be defined by the Riemann sum as the number of rectangles, *n*, approaches infinity. Let’s also denote the area under the curve by *A*(*t*). Then:

Note that the notation now changes into the integral symbol, ∫, replacing sigma, Σ. The reason behind this change is, merely, to indicate that we are summing over a huge number of thinly sliced rectangles. The expression on the left hand side reads as, the integral of *v*(*t*) from *a* to *b*, and the process of finding the integral is called *integration*.
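The limiting process can be watched in code. The sketch below (assuming NumPy; the stand-in velocity curve v(t) = t², with exact area 1/3 on [0, 1], is chosen purely for illustration) computes midpoint Riemann sums with more and more rectangles:

```python
import numpy as np

def riemann_sum(v, a, b, n):
    """Midpoint Riemann sum of v over [a, b] with n rectangles."""
    dt = (b - a) / n                        # width of each rectangle
    midpoints = a + dt * (np.arange(n) + 0.5)
    return np.sum(v(midpoints) * dt)        # sum of height x width

v = lambda t: t**2   # stand-in velocity curve; exact area on [0, 1] is 1/3

for n in (10, 100, 1000):
    print(n, riemann_sum(v, 0.0, 1.0, n))
# the approximation approaches 1/3 as the rectangles become thinner
```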

Perhaps a simpler analogy to help us relate integration to differentiation, is to imagine holding one of the thinly cut slices and dragging it rightwards under the curve in infinitesimally small steps. As it moves rightwards, the thinly cut slice will sweep a larger area under the curve, while its height will change according to the shape of the curve. The question that we would like to answer is, at which *rate* does the area accumulate as the thin slice sweeps rightwards?

Let *dt* denote each infinitesimal step traversed by the sweeping slice, and *v*(*t*) its height at any time, *t*. Then the infinitesimal area, *dA*(*t*), of this thin slice can be found by multiplying its height, *v*(*t*), by its infinitesimal width, *dt*:

*dA*(*t*) = *v*(*t*) *dt*

Dividing the equation by *dt* gives us the derivative of *A*(*t*), and tells us that the rate at which the area accumulates is equal to the height of the curve, *v*(*t*), at time, *t*:

*dA*(*t*) / *dt* = *v*(*t*)

We can finally define the fundamental theorem of calculus.

We found that an area, *A*(*t*), swept under a function, *v*(*t*), can be defined by:

We have also found that the rate at which the area is being swept is equal to the original function, *v*(*t*):

*dA*(*t*) / *dt* = *v*(*t*)

This brings us to the first part of the fundamental theorem of calculus, which tells us that if *v*(*t*) is continuous on an interval, [*a*, *b*], and if it is also the derivative of *A*(*t*), then *A*(*t*) is the *antiderivative* of *v*(*t*):

*A’*(*t*) = *v*(*t*)

Or in simpler terms, integration is the reverse operation of differentiation. Hence, if we first had to integrate *v*(*t*) and then differentiate the result, we would get back the original function, *v*(*t*):

The second part of the theorem gives us a shortcut for computing the integral, without having to take the longer route of computing the limit of a Riemann sum.

It states that if the function, *v*(*t*), is continuous on an interval, [*a*, *b*], then:

Here, *F*(*t*) is any antiderivative of *v*(*t*), and the integral is defined as the subtraction of the antiderivative evaluated at *a* and *b*.

Hence, the second part of the theorem computes the integral by subtracting the area under the curve between some starting point, *C*, and the lower limit, *a*, from the area between the same starting point, *C*, and the upper limit, *b*. This, effectively, calculates the area of interest between *a* and *b*.

Since the constant, *C*, defines the point on the *x*-axis at which the sweep starts, the simplest antiderivative to consider is the one with *C* = 0. Nonetheless, any antiderivative with any value of *C* can be used, which simply sets the starting point to a different position on the *x*-axis.

Consider the function, *v*(*t*) = *t*^{3}. By applying the power rule, we can easily find its derivative, *v’*(*t*) = 3*t*^{2}. The antiderivative of 3*t*^{2} is again *t*^{3} – we perform the reverse operation to obtain the original function.

Now suppose that we have a different function, *g*(*t*) = *t*^{3} + 2. Its derivative is also 3*t*^{2}, and so is the derivative of yet another function, *h*(*t*) = *t*^{3} – 5. Both of these functions (and other similar ones) are antiderivatives of 3*t*^{2}. Hence, we specify the family of all antiderivatives of 3*t*^{2} by the *indefinite* integral:

The indefinite integral does not define the limits between which the area under the curve is being calculated. The constant, *C*, is included to compensate for the lack of information about the limits, or the starting point of the sweep.

If we do have knowledge of the limits, then we can simply apply the second fundamental theorem of calculus to compute the *definite* integral:

We can simply set *C* to zero, because it will not change the result in this case.
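As a sketch of the second part of the theorem in action (assuming NumPy, with hypothetical limits a = 1 and b = 2 chosen for illustration), the definite integral of 3t² works out to F(b) − F(a) = b³ − a³ = 7, which a fine Riemann sum confirms:

```python
import numpy as np

# v(t) = 3t^2 with antiderivative F(t) = t^3 (taking C = 0).
v = lambda t: 3 * t**2
F = lambda t: t**3

a, b = 1.0, 2.0

# Second fundamental theorem: definite integral = F(b) - F(a).
by_theorem = F(b) - F(a)   # 8 - 1 = 7

# Cross-check with a fine midpoint Riemann sum.
n = 100_000
dt = (b - a) / n
mid = a + dt * (np.arange(n) + 0.5)
by_riemann = np.sum(v(mid) * dt)

print(by_theorem, by_riemann)  # both approximately 7
```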

We have considered the car’s velocity curve, *v*(*t*), as a familiar example to understand the relationship between integration and differentiation.

But you can use this adding-up-areas-of-rectangles scheme to add up tiny bits of anything — distance, volume, or energy, for example. In other words, the area under the curve doesn’t have to stand for an actual area.

– Page 214, Calculus for Dummies, 2016.

One of the important steps of successfully applying machine learning techniques includes the choice of appropriate performance metrics. In deep learning, for instance, it is common practice to measure *precision* and *recall*.

Precision is the fraction of detections reported by the model that were correct, while recall is the fraction of true events that were detected.

– Page 423, Deep Learning, 2017.

It is also common practice to, then, plot the precision and recall on a Precision-Recall (PR) curve, placing the recall on the *x*-axis and the precision on the *y*-axis. It would be desirable that a classifier is characterised by both high recall and high precision, meaning that the classifier can detect many of the true events correctly. Such a good classification performance would be characterised by a higher area under the PR curve.

You can probably already tell where this is going.

The area under the PR curve can, indeed, be calculated by applying integral calculus, permitting us to characterise the performance of the classifier.
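As a minimal sketch (assuming NumPy; the precision-recall pairs below are made up for illustration, not produced by any real classifier), the area can be approximated with the trapezoidal rule:

```python
import numpy as np

# Hypothetical precision-recall pairs for an imaginary classifier,
# ordered by increasing recall.
recall    = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
precision = np.array([1.0, 0.95, 0.9, 0.8, 0.6, 0.4])

# Trapezoidal rule: sum the areas of trapezoids between successive points.
auc_pr = np.sum((precision[:-1] + precision[1:]) / 2 * np.diff(recall))
print(auc_pr)  # 0.79 for these made-up values
```

In practice, libraries such as scikit-learn provide ready-made functions that compute this area directly from a model's predictions.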

This section provides more resources on the topic if you are looking to go deeper.

- Single and Multivariable Calculus, 2020.
- Calculus for Dummies, 2016.
- Infinite Powers, 2020.
- The Hitchhiker’s Guide to Calculus, 2019.
- Deep Learning, 2017.

In this tutorial, you discovered the relationship between differential and integral calculus.

Specifically, you learned:

- The concepts of differential and integral calculus are linked together by the fundamental theorem of calculus.
- By applying the fundamental theorem of calculus, we can compute the integral to find the area under a curve.
- In machine learning, the application of integral calculus can provide us with a metric to assess the performance of a classifier.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.


The post A Gentle Introduction to Multivariate Calculus appeared first on Machine Learning Mastery.

Multivariate calculus provides us with the tools to work with functions of several variables, by extending the concepts that we find in calculus, such as the computation of the rate of change, to multiple variables. It plays an essential role in the process of training a neural network, where the gradient is used extensively to update the model parameters.

In this tutorial, you will discover a gentle introduction to multivariate calculus.

After completing this tutorial, you will know:

- A multivariate function depends on several input variables to produce an output.
- The gradient of a multivariate function is computed by finding the derivative of the function in different directions.
- Multivariate calculus is used extensively in neural networks to update the model parameters.

Let’s get started.

This tutorial is divided into three parts; they are:

- Re-Visiting the Concept of a Function
- Derivatives of Multi-Variate Functions
- Application of Multivariate Calculus in Machine Learning

We have already familiarised ourselves with the concept of a function, as a rule that defines the relationship between a dependent variable and an independent variable. We have seen that a function is often represented by *y* = *f*(*x*), where both the input (or the independent variable), *x*, and the output (or the dependent variable), *y*, are single real numbers.

Such a function that takes a single, independent variable and defines a one-to-one mapping between the input and output, is called a *univariate* function.

For example, let’s say that we are attempting to forecast the weather based on the temperature alone. In this case, the weather is the dependent variable that we are trying to forecast, which is a function of the temperature as the input variable. Such a problem can, therefore, be easily framed into a univariate function.

However, let’s say that we now want to base our weather forecast on the humidity level and the wind speed too, in addition to the temperature. We cannot do so by means of a univariate function, where the output depends solely on a single input.

Hence, we turn our attention to *multivariate* functions, so called because these functions can take several variables as input.

Formally, we can express a multivariate function as a mapping from several real input variables, *n*, to a real output:

For example, consider the following parabolic surface:

*f*(*x*, *y*) = *x*^{2} + 2*y*^{2}

This is a multivariate function that takes two variables, *x* and *y*, as input, hence *n* = 2, to produce an output. We can visualise it by graphing its values for *x* and *y* between -1 and 1.
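As a quick sketch (assuming NumPy; actual 3D plotting with a library such as Matplotlib is omitted), we can evaluate the surface on a grid over this range:

```python
import numpy as np

def f(x, y):
    """The parabolic surface f(x, y) = x^2 + 2y^2."""
    return x**2 + 2*y**2

# Evaluate the surface on a grid covering [-1, 1] x [-1, 1].
x = np.linspace(-1, 1, 51)
y = np.linspace(-1, 1, 51)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

# The smallest value sits at the origin; the largest, 3, at the corners.
print(Z.min(), Z.max())
```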

Similarly, we can have multivariate functions that take more variables as input. Visualising them, however, may be difficult due to the number of dimensions involved.

We can even generalize the concept of a function further by considering functions that map multiple inputs, *n*, to multiple outputs, *m*:

These functions are more often referred to as *vector-valued* functions.

Recall that calculus is concerned with the study of the rate of change. For some univariate function, *g*(*x*), this can be achieved by computing its derivative:

The generalization of the derivative to functions of several variables is the gradient.

– Page 146, Mathematics for Machine Learning, 2020.

The technique for finding the gradient of a function of several variables involves varying each one of the variables at a time, while keeping the others constant. In this manner, we would be taking the *partial derivative* of our multivariate function with respect to each variable, each time.

The gradient is then the collection of these partial derivatives.

– Page 146, Mathematics for Machine Learning, 2020.

In order to visualize this technique better, let’s start off by considering a simple univariate quadratic function of the form:

*g*(*x*) = *x*^{2}

Finding the derivative of this function at some point, *x*, requires the application of the equation for *g*’(*x*) that we have defined earlier. We can, alternatively, take a shortcut by using the power rule to find that:

*g*’(*x*) = 2*x*

Furthermore, if we had to imagine slicing open the parabolic surface considered earlier, with a plane passing through *y* = 0, we realise that the resulting cross-section of *f*(*x*, *y*) is the quadratic curve, *g*(*x*) = *x*^{2}. Hence, we can calculate the derivative (or the steepness, or *slope*) of the parabolic surface in the direction of *x*, by taking the derivative of *f*(*x*, *y*) but keeping *y* constant. We refer to this as the *partial* derivative of *f*(*x*, *y*) with respect to *x*, and denote it by *∂* to signify that there are more variables in addition to *x* but these are not being considered for the time being. Therefore, the partial derivative with respect to *x* of *f*(*x*, *y*) is:

We can similarly hold *x* constant (or, in other words, find the cross-section of the parabolic surface by slicing it with a plane passing through a constant value of *x*) to find the partial derivative of *f*(*x*, *y*) with respect to *y*, as follows:

What we have essentially done is that we have found the univariate derivative of *f*(*x*, *y*) in each of the *x* and *y* directions. Combining the two univariate derivatives as the final step, gives us the multivariate derivative (or the gradient):

The same technique remains valid for functions of higher dimensions.
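The sketch below (assuming NumPy; the evaluation point (0.5, -0.3) is chosen arbitrarily) puts the technique into code: for f(x, y) = x² + 2y², the partial derivatives are 2x and 4y, and we can confirm each one by varying a single variable while holding the other fixed:

```python
import numpy as np

def f(x, y):
    return x**2 + 2*y**2

def gradient(x, y):
    """Gradient of f: the partial derivatives df/dx = 2x and df/dy = 4y."""
    return np.array([2*x, 4*y])

# Finite-difference check: vary one variable at a time, the other held fixed.
h = 1e-6
x0, y0 = 0.5, -0.3
df_dx = (f(x0 + h, y0) - f(x0 - h, y0)) / (2*h)
df_dy = (f(x0, y0 + h) - f(x0, y0 - h)) / (2*h)

print(gradient(x0, y0))  # the exact values 2(0.5) = 1.0 and 4(-0.3) = -1.2
print(df_dx, df_dy)      # numerically close to 1.0 and -1.2
```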

Partial derivatives are used extensively in neural networks to update the model parameters (or weights).

We had seen that, in minimizing some error function, an optimization algorithm will seek to follow its gradient downhill. If this error function was univariate, and hence a function of a single independent weight, then optimizing it would simply involve computing its univariate derivative.

However, a neural network comprises many weights (each attributed to a different neuron) of which the error is a function. Hence, updating the weight values requires that the gradient of the error curve is calculated with respect to all of these weights.

This is where the application of multivariate calculus comes into play.

The gradient of the error curve is calculated by finding the partial derivative of the error with respect to each weight; or in other terms, finding the derivative of the error function by keeping all weights constant except the one under consideration. This allows each weight to be updated independently of the others, to reach the goal of finding an optimal set of weights.
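As a toy sketch of this idea (assuming NumPy; the two-weight "error" surface and the learning rate are invented for illustration, standing in for a real network's loss), each weight is updated independently using its own partial derivative:

```python
import numpy as np

# A toy error surface over two weights, standing in for a network's loss.
def error(w):
    return w[0]**2 + 2*w[1]**2

def error_grad(w):
    # Partial derivative with respect to each weight, the others held fixed.
    return np.array([2*w[0], 4*w[1]])

w = np.array([1.0, 1.0])   # initial weights
lr = 0.1                   # learning rate

for _ in range(100):
    w = w - lr * error_grad(w)   # every weight updated independently

print(w, error(w))  # the weights approach (0, 0), the minimum of the error
```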

This section provides more resources on the topic if you are looking to go deeper.

- Single and Multivariable Calculus, 2020.
- Mathematics for Machine Learning, 2020.
- Algorithms for Optimization, 2019.
- Deep Learning, 2019.

In this tutorial, you discovered a gentle introduction to multivariate calculus.

Specifically, you learned:

- A multivariate function depends on several input variables to produce an output.
- The gradient of a multivariate function is computed by finding the derivative of the function in different directions.
- Multivariate calculus is used extensively in neural networks to update the model parameters.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.


The post Applications of Derivatives appeared first on Machine Learning Mastery.

The derivative is an important concept that comes in extremely useful in many applications: in everyday life, the derivative can tell you at which speed you are driving, or help you predict fluctuations on the stock market; in machine learning, derivatives are important for function optimization.

This tutorial will explore different applications of derivatives, starting with the more familiar ones before moving to machine learning. We will be taking a closer look at what the derivatives tell us about the different functions we are studying.

In this tutorial, you will discover different applications of derivatives.

After completing this tutorial, you will know:

- The use of derivatives can be applied to real-life problems that we find around us.
- The use of derivatives is essential in machine learning, for function optimization.

Let’s get started.

This tutorial is divided into two parts; they are:

- Applications of Derivatives in Real-Life
- Applications of Derivatives in Optimization Algorithms

We have seen that derivatives model rates of change.

Derivatives answer questions like “How fast?” “How steep?” and “How sensitive?” These are all questions about rates of change in one form or another.

– Page 141, Infinite Powers, 2019.

This rate of change is denoted by, 𝛿*y* / 𝛿*x*, hence defining a change in the dependent variable, 𝛿*y*, with respect to a change in the independent variable, 𝛿*x*.

Let’s start off with one of the most familiar applications of derivatives that we can find around us.

Every time you get in your car, you witness differentiation.

– Page 178, Calculus for Dummies, 2016.

When we say that a car is moving at 100 kilometers an hour, we would have just stated its rate of change. The common term that we often use is *speed* or *velocity*, although it would be best that we first distinguish between the two.

In everyday life, we often use *speed* and *velocity* interchangeably when describing the rate of change of a moving object. However, this is not mathematically correct because speed is always positive, whereas velocity introduces a notion of direction and, hence, can exhibit both positive and negative values. Hence, in the ensuing explanation, we shall consider velocity as the more technical concept, defined as:

velocity = 𝛿*y* / 𝛿*t*

This means that velocity gives the change in the car’s position, 𝛿*y*, within an interval of time, 𝛿*t*. In other words, velocity is the *first derivative* of position with respect to time.

The car’s velocity can remain constant, such as if the car keeps on travelling at 100 kilometers an hour consistently, or it can also change as a function of time. In case of the latter, this means that the velocity function itself is changing as a function of time, or in simpler terms, the car can be said to be *accelerating*. Acceleration is defined as the first derivative of velocity, *v*, and the second derivative of position, *y*, with respect to time:

acceleration = 𝛿*v* / 𝛿*t* = 𝛿^{2}*y* / 𝛿*t*^{2}

We can graph the position, velocity and acceleration curves to visualize them better. Suppose that the car’s position, as a function of time, is given by *y*(*t*) = *t*^{3} – 8*t*^{2} + 40*t*:

The graph indicates that the car’s position changes slowly at the beginning of the journey, slowing down slightly until around t = 2.7s, at which point its rate of change picks up and continues increasing until the end of the journey. This is depicted by the graph of the car’s velocity:

Notice that the car retains a positive velocity throughout the journey, and this is because it never changes direction. Hence, if we had to imagine ourselves sitting in this moving car, the speedometer would be showing us the values that we have just plotted on the velocity graph (since the velocity remains positive throughout, otherwise we would have to find the absolute value of the velocity to work out the speed). If we had to apply the power rule to *y*(*t*) to find its derivative, then we would find that the velocity is defined by the following function:

*v*(*t*) = *y*’(*t*) = 3*t*^{2} – 16*t* + 40

We can also plot the acceleration graph:

We find that the graph is now characterised by negative acceleration in the time interval, *t* = [0, 2.7) seconds. This is because acceleration is the derivative of velocity, and within this time interval the car’s velocity is decreasing. If we had to, again, apply the power rule to *v*(*t*) to find its derivative, then we would find that the acceleration is defined by the following function:

*a*(*t*) = *v*’(*t*) = 6*t* – 16

Putting all functions together, we have the following:

*y*(*t*) = *t*^{3} – 8*t*^{2} + 40*t*

*v*(*t*) = *y*’(*t*) = 3*t*^{2} – 16*t* + 40

*a*(*t*) = *v*’(*t*) = 6*t* – 16

If we substitute for *t* = 10s, we can use these three functions to find that by the end of the journey, the car has travelled 600m, its velocity is 180 m/s, and it is accelerating at 44 m/s^{2}. We can verify that all of these values tally with the graphs that we have just plotted.
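The substitution above can be checked with a few lines of Python, evaluating the three functions at t = 10:

```python
# Position, velocity and acceleration of the car as functions of time.
def y(t):
    return t**3 - 8*t**2 + 40*t

def v(t):
    return 3*t**2 - 16*t + 40

def a(t):
    return 6*t - 16

# At the end of the 10 second journey:
print(y(10), v(10), a(10))  # 600 180 44
```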

We have framed this particular example within the context of finding a car’s velocity and acceleration. But there is a plethora of real-life phenomena that change with time (or variables other than time), which can be studied by applying the concept of derivatives as we have just done for this particular example. To name a few:

- Growth rate of a population (be it a collection of humans, or a colony of bacteria) over time, which can be used to predict changes in population size in the near future.
- Changes in temperature as a function of location, which can be used for weather forecasting.
- Fluctuations of the stock market over time, which can be used to predict future stock market behaviour.

Derivatives also provide salient information in solving optimization problems, as we shall be seeing next.

We had already seen that an optimization algorithm, such as gradient descent, seeks to reach the global minimum of an error (or cost) function by applying the use of derivatives.

Let’s take a closer look at what the derivatives tell us about the error function, by going through the same exercise as we have done for the car example.

For this purpose, let’s consider the following one-dimensional test function for function optimization:

*f*(*x*) = –*x* sin(*x*)

We can apply the product rule to *f*(*x*) to find its first derivative, denoted by *f*’(*x*), and then again apply the product rule to *f*’(*x*) to find the second derivative, denoted by *f*’’(*x*):

*f*’(*x*) = -sin(*x*) – *x* cos(*x*)

*f*’’(*x*) = *x* sin(*x*) – 2 cos(*x*)

We can plot these three functions for different values of *x* to visualize them:

Similar to what we have observed earlier for the car example, the graph of the first derivative indicates how *f*(*x*) is changing and by how much. For example, a positive derivative indicates that *f*(*x*) is an increasing function, whereas a negative derivative tells us that *f*(*x*) is now decreasing. Hence, if in its search for a function minimum, the optimization algorithm performs small changes to the input based on its learning rate, ε:

*x_new = x* – ε *f*’(*x*)

Then the algorithm can reduce *f*(*x*) by moving in the opposite direction (by inverting the sign) of the derivative.
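As a rough sketch of this update rule, we can run a few iterations of it on the test function *f*(*x*) = –*x* sin(*x*); the starting point and learning rate below are illustrative choices, not values prescribed by the text:

```python
# A minimal gradient-descent sketch on the test function f(x) = -x*sin(x).
# The starting point x = 1.0 and learning rate eps = 0.1 are
# illustrative choices.
import math

def f(x):
    return -x * math.sin(x)

def f_prime(x):
    return -math.sin(x) - x * math.cos(x)

x = 1.0
eps = 0.1
for _ in range(50):
    x = x - eps * f_prime(x)  # step against the sign of the derivative

print(x, f(x))  # x settles near a local minimum of f
```

With these choices, the iterates move steadily towards the nearby local minimum, where the derivative vanishes.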

We might also be interested in finding the second derivative of a function.

We can think of the second derivative as measuring curvature.

– Page 86, Deep Learning, 2017.

For example, if the algorithm arrives at a critical point at which the first derivative is zero, it cannot distinguish between this point being a local maximum, a local minimum, a saddle point or a flat region based on *f*’(*x*) alone. However, when the second derivative intervenes, the algorithm can tell that the critical point in question is a local minimum if the second derivative is greater than zero. For a local maximum, the second derivative is smaller than zero. Hence, the second derivative can inform the optimization algorithm on which direction to move. Unfortunately, this test remains inconclusive for saddle points and flat regions, for which the second derivative is zero in both cases.
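We can see the second-derivative test in action with SymPy. The starting guess of 2 passed to *nsolve* below is an illustrative choice used to locate one critical point of *f*(*x*) = –*x* sin(*x*) numerically:

```python
# Second-derivative test on f(x) = -x*sin(x) with SymPy.
# The starting guess of 2 for nsolve is an illustrative choice.
from sympy import symbols, sin, diff, nsolve

x = symbols('x')
f = -x * sin(x)
f1 = diff(f, x)      # first derivative
f2 = diff(f, x, 2)   # second derivative

c = nsolve(f1, x, 2)       # critical point where f'(c) = 0
curvature = f2.subs(x, c)  # positive => local minimum
print(c, curvature > 0)
```

Since the second derivative is positive at this critical point, it is a local minimum of *f*(*x*).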

Optimization algorithms based on gradient descent do not make use of second order derivatives and are, therefore, known as *first-order optimization algorithms*. Optimization algorithms, such as Newton’s method, that exploit the use of second derivatives, are otherwise called *second-order optimization algorithms*.

**Further Reading**

This section provides more resources on the topic if you are looking to go deeper.

**Books**

- Calculus for Dummies, 2016.
- Infinite Powers, 2020.
- Deep Learning, 2017.
- Algorithms for Optimization, 2019.

**Summary**

In this tutorial, you discovered different applications of derivatives.

Specifically, you learned:

- The use of derivatives can be applied to real-life problems that we find around us.
- The use of derivatives is essential in machine learning, for function optimization.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post Applications of Derivatives appeared first on Machine Learning Mastery.

The post A Gentle Introduction to Continuous Functions appeared first on Machine Learning Mastery.

In this tutorial, you will discover what continuous functions are, their properties, and two important theorems in the study of optimization algorithms, i.e., intermediate value theorem and extreme value theorem.

After completing this tutorial, you will know:

- Definition of continuous functions
- Intermediate value theorem
- Extreme value theorem

Let’s get started.

This tutorial is divided into 2 parts; they are:

- Definition of continuous functions
- Informal definition
- Formal definition

- Theorems
- Intermediate value theorem
- Extreme value theorem

This tutorial requires an understanding of the concept of limits. To refresh your memory, you can take a look at limits and continuity, where continuous functions are also briefly defined. In this tutorial we’ll go into more details.

We’ll also make use of interval notation, where square brackets denote closed intervals (which include the boundary points) and parentheses denote open intervals (which do not include the boundary points). For example,

- [a,b] means a<=x<=b
- (a,b) means a<x<b
- [a,b) means a<=x<b

From the above, you can note that an interval can be open on one side and closed on the other.

As a last point, we’ll only be discussing real functions defined over real numbers. We won’t be discussing complex numbers or functions defined on the complex plane.

Suppose we have a function f(x). We can easily check if it is continuous between two points a and b, if we can plot the graph of f(x) without lifting our hand. As an example, consider a straight line defined as:

f(x)=2x+1

We can draw the straight line between [0,1] without lifting our hand. In fact, we can draw this line between any two values of x and we won’t have to lift our hand (see figure below). Hence, this function is continuous over the entire domain of real numbers. Now let’s see what happens when we plot the ceil function:

The ceil function has a value of 1 on the interval (0,1], for example, ceil(0.5) = 1, ceil(0.7) = 1, and so on. As a result, the function is continuous over the domain (0,1]. If we adjust the interval to (0,2], ceil(x) jumps to 2 as soon as x>1. To plot ceil(x) for the domain (0,2], we must now lift our hand and start plotting again just after x=1. As a result, the ceil function isn’t a continuous function.
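We can probe this jump numerically: evaluating ceil just below and just above x=1 gives different values, which is exactly why the graph cannot be drawn without lifting our hand:

```python
# Probe the jump of ceil(x) at x = 1: values just below and just
# above 1 have different ceilings, so the graph cannot be drawn
# there without lifting the pen.
import math

just_below = math.ceil(1 - 1e-9)
just_above = math.ceil(1 + 1e-9)
print(just_below, just_above)  # 1 2
```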

If the function is continuous over the entire domain of real numbers, then it is a continuous function as a whole; otherwise, it is not continuous as a whole. For the latter type of function, we can check over which intervals it is continuous.

A function f(x) is continuous at a point a, if the function’s value approaches f(a) when x approaches a. Hence to test the continuity of a function at a point x=a, check the following:

- f(a) should exist
- f(x) has a limit as x approaches a
- The limit of f(x) as x->a is equal to f(a)

If all of the above hold true, then the function is continuous at the point a.

Some examples are listed below and also shown in the figure:

- f(x) = 1/x is not continuous as it is not defined at x=0. However, the function is continuous for the domain x>0.
- All polynomial functions are continuous functions.
- The trigonometric functions sin(x) and cos(x) are continuous and oscillate between the values -1 and 1.
- The trigonometric function tan(x) is not continuous as it is undefined at x=𝜋/2, x=-𝜋/2, etc.
- sqrt(x) is not continuous as it is not defined for x<0.
- |x| is continuous everywhere.

From the definition of continuity in terms of limits, we have an alternative definition. f(x) is continuous at x, if:

f(x+h)-f(x)→ 0 when (h→0)

Let’s look at the definition of a derivative:

f'(x) = lim(h→0) (f(x+h)-f(x))/h

Hence, if f'(x) exists at a point a, then the function is continuous at a. The converse is not always true. A function may be continuous at a point a, but f'(a) may not exist. For example, in the above graph |x| is continuous everywhere. We can draw it without lifting our hand, however, at x=0 its derivative does not exist because of the sharp turn in the curve.
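We can confirm this with SymPy by evaluating the one-sided limits of the difference quotient of |x| at x=0:

```python
# One-sided limits of the difference quotient of |x| at x = 0:
# they disagree, so |x| is continuous but not differentiable there.
from sympy import symbols, Abs, limit

h = symbols('h')
quotient = (Abs(0 + h) - Abs(0)) / h
from_right = limit(quotient, h, 0, dir='+')
from_left = limit(quotient, h, 0, dir='-')
print(from_right, from_left)  # 1 -1
```

Since the two one-sided limits differ, the derivative of |x| does not exist at x=0.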

The intermediate value theorem states that:

If:

- function f(x) is continuous on [a,b]
- and f(a) <= K <= f(b)

then:

- There is a point c between a and b, i.e., a<=c<=b such that f(c) = K

In very easy words, this theorem says that if a function is continuous over [a,b], then all values of the function between f(a) and f(b) will exist within this interval as shown in the figure below.
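The intermediate value theorem also underpins the bisection method for root finding. The sketch below is a minimal illustration, assuming on entry that f is continuous on [a,b] and that f(a) <= K <= f(b):

```python
# The intermediate value theorem underpins the bisection method:
# if f is continuous on [a, b] and f(a) <= K <= f(b), halving the
# interval repeatedly homes in on a point c with f(c) = K.
def bisect(f, a, b, K, tol=1e-10):
    # assumes f is continuous and f(a) <= K <= f(b)
    while b - a > tol:
        c = (a + b) / 2
        if f(c) < K:
            a = c  # the crossing lies in [c, b]
        else:
            b = c  # the crossing lies in [a, c]
    return (a + b) / 2

# Example with the line f(x) = 2x + 1 from earlier: find c with f(c) = 2.
c = bisect(lambda x: 2 * x + 1, 0, 1, 2)
print(c)  # approximately 0.5
```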

This theorem states that:

If:

- function f(x) is continuous on [a,b]

then:

- There are points x_min and x_max inside the interval [a,b], i.e.,
- a<=x_min<=b
- a<=x_max<=b

- and the function f(x) has a minimum value f(x_min), and a maximum value f(x_max), i.e.,
- f(x_min)<=f(x)<=f(x_max) when a<=x<=b

In simple words, a continuous function always attains its minimum and maximum values on a closed interval, as shown in the above figure.

Continuous functions are very important in the study of optimization problems. We can see that the extreme value theorem guarantees that within an interval, there will always be a point where the function has a maximum value. The same can be said for a minimum value. Many optimization algorithms are derived from this fundamental property and can perform amazing tasks.

This section lists some ideas for extending the tutorial that you may wish to explore.

- Converging and diverging sequences
- Weierstrass and Jordan definitions of continuous functions based on infinitesimally small constants

If you explore any of these extensions, I’d love to know. Post your findings in the comments below.

This section provides more resources on the topic if you are looking to go deeper.

- Additional resources on Calculus Books for Machine Learning

- Thomas’ Calculus, 14th edition, 2017. (based on the original works of George B. Thomas, revised by Joel Hass, Christopher Heil, Maurice Weir)
- Calculus, 3rd Edition, 2017. (Gilbert Strang)
- Calculus, 8th edition, 2015. (James Stewart)

In this tutorial, you discovered the concept of continuous functions.

Specifically, you learned:

- What are continuous functions
- The formal and informal definitions of continuous functions
- Points of discontinuity
- Intermediate value theorem
- Extreme value theorem
- Why continuous functions are important

Ask your questions in the comments below and I will do my best to answer.

The post A Gentle Introduction to Continuous Functions appeared first on Machine Learning Mastery.

The post A Gentle Introduction to Indeterminate Forms and L’Hospital’s Rule appeared first on Machine Learning Mastery.

In this tutorial, you will discover how to evaluate the limits of indeterminate forms and the L’Hospital’s rule for solving them.

After completing this tutorial, you will know:

- How to evaluate the limits of functions having indeterminate types of the form 0/0 and ∞/∞
- L’Hospital’s rule for evaluating indeterminate types
- How to convert more complex indeterminate types and apply L’Hospital’s rule to them

Let’s get started.

This tutorial is divided into 2 parts; they are:

- The indeterminate forms of type 0/0 and ∞/∞
- How to apply L’Hospital’s rule to these types
- Solved examples of these two indeterminate types

- More complex indeterminate types
- How to convert the more complex indeterminate types to 0/0 and ∞/∞ forms
- Solved examples of such types

This tutorial requires a basic understanding of the following two topics:

If you are not familiar with these topics, you can review them by clicking the above links.

When evaluating limits, we come across situations where the basic rules for evaluating limits might fail. For example, we can apply the quotient rule in case of rational functions:

lim(x→a) f(x)/g(x) = (lim(x→a)f(x))/(lim(x→a)g(x)) if lim(x→a)g(x)≠0

The above rule can only be applied if the expression in the denominator does not approach zero as x approaches a. A more complicated situation arises if both the numerator and the denominator approach zero as x approaches a. This is called an indeterminate form of type 0/0. Similarly, there are indeterminate forms of the type ∞/∞, given by:

lim(x→a) f(x)/g(x) = (lim(x→a)f(x))/(lim(x→a)g(x)) when lim(x→a)f(x)=∞ and lim(x→a)g(x)=∞

The L’Hospital rule states the following:

An important point to note is that L’Hospital’s rule is only applicable when the conditions for f(x) and g(x) are met. For example:

- lim(𝑥→0) sin(x)/(x+1) Cannot apply L’Hospital’s rule as it’s not 0/0 form
- lim(𝑥→0) sin(x)/x Can apply the rule as it’s 0/0 form
- lim(𝑥→∞) (e^x)/(1/x+1) Cannot apply L’Hospital’s rule as it’s not ∞/∞ form
- lim(𝑥→∞) (e^x)/x Can apply L’Hospital’s rule as it is ∞/∞ form

Some examples of these two types, and how to solve them, are shown below. You can also consult the figure below, which plots these functions.

Evaluate lim(𝑥→2) ln(x-1)/(x-2) (See the left graph in the figure)

Evaluate lim(𝑥→∞) ln(x)/x (See the right graph in the figure)
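We can check both results with SymPy, which resolves these indeterminate limits for us:

```python
# Check the two worked limits with SymPy, which resolves these
# indeterminate forms internally.
from sympy import symbols, limit, ln, oo

x = symbols('x')
print(limit(ln(x - 1) / (x - 2), x, 2))  # 1  (0/0 form)
print(limit(ln(x) / x, x, oo))           # 0  (oo/oo form)
```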

The L’Hospital rule only tells us how to deal with 0/0 or ∞/∞ forms. However, there are more indeterminate forms that involve products, differences, and powers. So how do we deal with the rest? We can use some clever tricks in mathematics to convert products, differences and powers into quotients. This can enable us to easily apply L’Hospital rule to almost all indeterminate forms. The table below shows various indeterminate forms and how to deal with them.

The following examples show how you can convert one indeterminate form to either 0/0 or ∞/∞ form and apply L’Hospital’s rule to solve the limit. After the worked out examples you can also look at the graphs of all the functions whose limits are calculated.

Evaluate lim(𝑥→∞) x.sin(1/x) (See the first graph in the figure)

Evaluate lim(𝑥→0) 1/(1-cos(x)) – 1/x (See the second graph in the figure below)

Evaluate lim(𝑥→∞) (1+x)^(1/x) (See the third graph in the figure below)
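Again, SymPy can confirm these limits. Note that the ∞ – ∞ example diverges to ∞ as x approaches 0 from the right, while the other two limits evaluate to 1:

```python
# Check the converted indeterminate forms with SymPy.
from sympy import symbols, limit, sin, cos, oo

x = symbols('x')
r1 = limit(x * sin(1 / x), x, oo)                    # oo * 0 form
r2 = limit(1 / (1 - cos(x)) - 1 / x, x, 0, dir='+')  # oo - oo form
r3 = limit((1 + x) ** (1 / x), x, oo)                # oo ** 0 form
print(r1, r2, r3)  # 1 oo 1
```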

This section lists some ideas for extending the tutorial that you may wish to explore.

- Cauchy’s Mean Value Theorem
- Rolle’s theorem

If you explore any of these extensions, I’d love to know. Post your findings in the comments below.

This section provides more resources on the topic if you are looking to go deeper.

- Additional resources on Calculus Books for Machine Learning

- Thomas’ Calculus, 14th edition, 2017. (based on the original works of George B. Thomas, revised by Joel Hass, Christopher Heil, Maurice Weir)
- Calculus, 3rd Edition, 2017. (Gilbert Strang)
- Calculus, 8th edition, 2015. (James Stewart)

In this tutorial, you discovered the concept of indeterminate forms and how to evaluate them.

Specifically, you learned:

- Indeterminate forms of type 0/0 and ∞/∞
- L’Hospital rule for evaluating types 0/0 and ∞/∞
- Indeterminate forms of type 0.∞, ∞-∞, and power forms, and how to evaluate them.

Ask your questions in the comments below and I will do my best to answer.

The post A Gentle Introduction to Indeterminate Forms and L’Hospital’s Rule appeared first on Machine Learning Mastery.

The post The Power, Product and Quotient Rules appeared first on Machine Learning Mastery.

This tutorial will continue exploring the different techniques by which we can find the derivatives of functions. In particular, we will be exploring the power, product and quotient rules, which we can use to arrive at the derivatives of functions faster than if we had to find every derivative from first principles. Hence, for functions that are especially challenging, keeping such rules at hand to find their derivatives will become increasingly important.

In this tutorial, you will discover the power, product and quotient rules to find the derivative of functions.

After completing this tutorial, you will know:

- The power rule to follow when finding the derivative of a variable base, raised to a fixed power.
- How the product rule allows us to find the derivative of a function that is defined as the product of another two (or more) functions.
- How the quotient rule allows us to find the derivative of a function that is the ratio of two differentiable functions.

Let’s get started.

This tutorial is divided into three parts; they are:

- The Power Rule
- The Product Rule
- The Quotient Rule

If we have a variable base raised to a fixed power, the rule to follow in order to find its derivative is to bring down the power in front of the variable base, and then subtract the power by 1.

For example, if we have the function, *f*(*x*) = *x*^{2}, of which we would like to find the derivative, we first bring down 2 in front of *x*, and then subtract the power by 1:

*f*(*x*) = *x*^{2}

*f*’(*x*) = 2*x*

For the purpose of understanding better where this rule comes from, let’s take the longer route and find the derivative of *f*(*x*) by starting from the definition of a derivative:

Here, we substitute for *f*(*x*) = *x*^{2} and then proceed to simplify the expression:

As *h* approaches a value of 0, then this limit approaches 2*x*, which tallies with the result that we have obtained earlier using the power rule.

If applied to *f*(*x*) = *x*, the power rule gives us a value of 1. That is because, when we bring a value of 1 in front of *x*, and then subtract the power by 1, what we are left with is a value of 0 in the exponent. Since *x*^{0} = 1, then *f*’(*x*) = (1) (*x*^{0}) = 1.

The best way to understand this derivative is to realize that f(x) = x is a line that fits the form y = mx + b because f(x) = x is the same as f(x) = 1x + 0 (or y = 1x + 0). The slope (m) of this line is 1, so the derivative equals 1. Or you can just memorize that the derivative of x is 1. But if you forget both of these ideas, you can always use the power rule.

– Page 131, Calculus for Dummies, 2016.

The power rule can be applied to any power, be it positive, negative, or a fraction. We can also apply it to radical functions by first expressing their exponent (or power) as a fraction:

*f*(*x*) = √*x* = *x*^{1/2}

*f’*(*x*) = (1 / 2) *x*^{-1/2}
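The power rule is easy to check in SymPy for integer, fractional, and negative powers alike; declaring x as positive below simply keeps the radical simplification clean:

```python
# The power rule in SymPy for integer, fractional, and negative powers.
from sympy import symbols, diff, sqrt, simplify, Rational

x = symbols('x', positive=True)   # positive keeps radicals simple
d1 = diff(x**2, x)                # 2*x
d2 = diff(sqrt(x), x)             # (1/2) * x**(-1/2)
d3 = diff(x**Rational(-3, 2), x)  # (-3/2) * x**(-5/2)
print(d1, d2, d3)
```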

Suppose that we now have a function, *f*(*x*), of which we would like to find the derivative, which is the product of another two functions, *u*(*x*) = 2*x*^{2} and *v*(*x*) = *x*^{3}:

*f*(*x*) = *u*(*x*) *v*(*x*) = (2*x*^{2}) (*x*^{3})

In order to investigate how to go about finding the derivative of *f*(*x*), let’s first start with finding the derivative of the product of *u*(*x*) and *v*(*x*) directly:

(*u*(*x*) *v*(*x*))’ = ((2*x*^{2}) (*x*^{3}))’ = (2*x*^{5})’ = 10*x*^{4}

Now let’s investigate what happens if we, otherwise, had to compute the derivatives of the functions separately first and then multiply them afterwards:

*u’*(*x*) *v’*(*x*) = (2*x*^{2})’ (*x*^{3})’ = (4*x*) (3*x*^{2}) = 12*x*^{3}

It is clear that the second result does not tally with the first one, and that is because we have not applied the *product rule*.

The product rule tells us that the derivative of the product of two functions can be found as:

*f’*(*x*) = *u’*(*x*) *v*(*x*) + *u*(*x*) *v’*(*x*)

We can arrive at the product rule if we work our way through by applying the properties of limits, starting again with the definition of a derivative:

We know that *f*(*x*) = *u*(*x*) *v*(*x*) and, hence, we can substitute for *f*(*x*) and *f*(*x* + *h*):

At this stage, our aim is to factorise the numerator into several limits that can, then, be evaluated separately. For this purpose, we add and subtract the term *u*(*x*) *v*(*x + h*) in the numerator. Since these two copies cancel out, their introduction does not change the definition of *f*’(*x*) that we have just obtained, but it will help us factorise the numerator:

The resulting expression appears complicated, however, if we take a closer look we realize that we have common terms that can be factored out:

The expression can be simplified further by applying the limit laws that let us separate the sums and products into separate limits:

The solution to our problem has now become clearer. We can see that the first and last terms in the simplified expression correspond to the definition of the derivative of *u*(*x*) and *v*(*x*), which we can denote by *u*’(*x*) and *v*’(*x*), respectively. The second term approaches the continuous and differentiable function, *v*(*x*), as *h* approaches 0, whereas the third term is *u*(*x*).

Hence, we arrive again at the product rule:

*f’*(*x*) = *u’*(*x*) *v*(*x*) + *u*(*x*) *v’*(*x*)

With this new tool in hand, let’s reconsider finding *f*’(*x*) when *u*(*x*) = 2*x*^{2} and *v*(*x*) = *x*^{3}:

*f’*(*x*) = *u’*(*x*) *v*(*x*) + *u*(*x*) *v’*(*x*)

*f’*(*x*) = (4*x*) (*x*^{3}) + (2*x*^{2}) (3*x*^{2}) = 4*x*^{4} + 6*x*^{4} = 10*x*^{4}

The resulting derivative now correctly matches the derivative of the product, (*u*(*x*) *v*(*x*))’, that we have obtained earlier.

This was a fairly simple example that we could have computed directly in the first place. However, we might have more complex problems involving functions that cannot be multiplied directly, to which we can easily apply the product rule. For example:

*f*(*x*) = *x*^{2} sin *x*

*f’*(*x*) = (*x*^{2})’ (sin *x*) + (*x*^{2}) (sin *x*)’ = 2*x* sin *x* + *x*^{2} cos *x*

We can even extend the product rule to more than two functions. For example, say *f*(*x*) is now defined as the product of three functions, *u*(*x*), *v*(*x*) and *w*(*x*):

*f*(*x*) = *u*(*x*) *v*(*x*) *w*(*x*)

We can apply the product rule as follows:

*f*’(*x*) = *u*’(*x*) *v*(*x*) *w*(*x*) + *u*(*x*) *v*’(*x*) *w*(*x*) + *u*(*x*) *v*(*x*) *w*’(*x*)
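Both forms of the product rule can be verified in SymPy by comparing them against direct differentiation of the product:

```python
# Verify both forms of the product rule against direct differentiation.
from sympy import symbols, sin, diff, simplify

x = symbols('x')
u, v, w = 2 * x**2, x**3, sin(x)

# Two functions: (uv)' = u'v + uv'
lhs2 = diff(u * v, x)
rhs2 = diff(u, x) * v + u * diff(v, x)

# Three functions: (uvw)' = u'vw + uv'w + uvw'
lhs3 = diff(u * v * w, x)
rhs3 = diff(u, x) * v * w + u * diff(v, x) * w + u * v * diff(w, x)

print(simplify(lhs2 - rhs2), simplify(lhs3 - rhs3))  # 0 0
```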

Similarly, the quotient rule tells us how to find the derivative of a function, *f*(*x*), that is the ratio of two differentiable functions, *u*(*x*) and *v*(*x*):

We can derive the quotient rule from first principles as we have done for the product rule, that is by starting off with the definition of a derivative and applying the properties of limits. Or we can take a shortcut and derive the quotient rule using the product rule itself. Let’s take this route this time around:

Since *u*(*x*) = *f*(*x*) *v*(*x*), we can apply the product rule to obtain:

*u*’(*x*) = *f*’(*x*) *v*(*x*) + *f*(*x*) *v*’(*x*)

Solving back for *f*’(*x*) gives us:

One final step substitutes for *f*(*x*) to arrive at the quotient rule:

We had seen how to find the derivative of the sine and cosine functions. Using the quotient rule, we can now find the derivative of the tangent function too:

*f*(*x*) = tan *x* = sin *x* / cos *x*

Applying the quotient rule and simplifying the resulting expression:

From the Pythagorean identity in trigonometry, we know that cos^{2}*x* + sin^{2}*x* = 1, hence:

Therefore, using the quotient rule, we have easily found that the derivative of tangent is the squared secant function.
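SymPy confirms this result: its derivative of tan is returned in the equivalent form tan^{2}*x* + 1, which simplifies to the squared secant:

```python
# Confirm that the derivative of tan(x) is the squared secant.
from sympy import symbols, tan, sec, diff, simplify

x = symbols('x')
d = diff(tan(x), x)
print(d)                        # tan(x)**2 + 1
print(simplify(d - sec(x)**2))  # 0, i.e. d equals sec(x)**2
```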

This section provides more resources on the topic if you are looking to go deeper.

- Calculus for Dummies, 2016.

In this tutorial, you discovered how to apply the power, product and quotient rules to find the derivative of functions.

Specifically, you learned:

- The power rule to follow when finding the derivative of a variable base, raised to a fixed power.
- How the product rule allows us to find the derivative of a function that is defined as the product of another two (or more) functions.
- How the quotient rule allows us to find the derivative of a function that is the ratio of two differentiable functions.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post The Power, Product and Quotient Rules appeared first on Machine Learning Mastery.

The post Derivative of the Sine and Cosine appeared first on Machine Learning Mastery.

Optimization algorithms rely on the use of derivatives in order to understand how to alter (increase or decrease) the input values to the objective function, in order to minimize or maximize it. It is, therefore, important that the objective function under consideration is *differentiable*.

The two fundamental trigonometric functions, the sine and cosine, offer a good opportunity to understand the manoeuvres that might be required in finding the derivatives of differentiable functions. These two functions become especially important if we think of them as the fundamental building blocks of more complex functions.

In this tutorial, you will discover how to find the derivative of the sine and cosine functions.

After completing this tutorial, you will know:

- How to find the derivative of the sine and cosine functions by applying several rules from algebra, trigonometry and limits.
- How to find the derivative of the sine and cosine functions in Python.

Let’s get started.

This tutorial is divided into three parts; they are:

- The Derivative of the Sine Function
- The Derivative of the Cosine Function
- Finding Derivatives in Python

The derivative *f’*(*x*) of some function, *f*, at a particular point, *x*, may be specified as:

We shall start by considering the sine function. Hence, let’s first substitute for *f*(*x*) = sin *x*:

If we have a look at the trigonometric identities, we find that we may apply the *addition formula* to expand the sin(*x* + *h*) term:

sin(*x* + *y*) = sin *x* cos *y* + cos *x* sin *y*

Indeed, by substituting *y* with *h* we can define the derivative of sin *x* as:

We may simplify the expression further by applying one of the limit laws, which states that the limit of a sum of functions is equal to the sum of their limits:

We may simplify even further by bringing out any common factor that is a function of *x*. In this manner, we can factorise the expression to obtain the sum of two separate limits that do not depend on *x*:

Solving each of these two limits will give us the derivative of sin *x*.

Let’s start by tackling the first limit.

Recall that we may represent an angle, *h*, in radians, on the unit circle. The sine of *h* would then be given by the perpendicular to the x-axis (*BC*), at the point that meets the unit circle:

We will be comparing the area of different sectors and triangles, with sides subtending the angle *h*, in an attempt to infer how ((sin *h*) / *h*) behaves as the value of *h* approaches zero. For this purpose, consider first the area of sector *OAB*:

The area of a sector can be defined in terms of the circle radius, *r*, and the length of the arc *AB*, *h*. Since the circle under consideration is the *unit* circle, then *r* = 1:

area_of_sector_OAB = *r h* / 2 = *h* / 2

We can compare the area of the sector *OAB* that we have just found, to the area of the *triangle OAB* within the same sector.

The area of this triangle is defined in terms of its height, *BC* = sin *h*, and the length of its base, *OA* = 1:

area_of_triangle_OAB = (*BC*) (*OA*) / 2 = (sin *h*) / 2

Since we can clearly see that the area of the triangle, *OAB*, that we have just considered is smaller than the area of the sector that it is contained within, we may say that:

(sin *h)* / 2 < *h* / 2

(sin *h*) / *h* < 1

This is the first piece of information that we have obtained regarding the behaviour of ((sin *h*) / *h*), which tells us that its upper limit value will not exceed 1.

Let us now proceed to consider a second triangle, *OAB’*, that is characterised by a larger area than that of sector, *OAB*. We can use this triangle to provide us with the second piece of information about the behaviour of ((sin *h*) */* *h*), which is its lower limit value:

Applying the properties of similar triangles to relate *OAB’* to *OCB*, gives us information regarding the length, *B’A*, that we need to compute the area of the triangle:

*B’A* / *OA* = *BC* / *OC* = (sin *h*) / (cos *h*)

Hence, the area of triangle *OAB’* may be computed as:

area_of_triangle_OAB’ = (*B’A*) (*OA*) / 2 = (sin *h*) / (2 cos *h*)

Comparing the area of triangle *OAB’* to that of sector *OAB*, we can see that the former is now larger:

*h* / 2 < (sin *h*) / (2 cos *h*)

cos *h* < (sin *h*) / *h*

This is the second piece of information that we needed, which tells us that the lower limit value of ((sin *h*) */* *h*) does not drop below cos *h*. We also know that as *h* approaches 0, the value of cos *h* approaches 1.

Hence, putting the two pieces of information together, we find that as *h* becomes smaller and smaller, the value of ((sin *h*) */* *h*) itself is *squeezed* to 1 by its lower and upper limits. This is, indeed, referred to as the *squeeze* or *sandwich* theorem.
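A quick numerical check of the squeeze: for progressively smaller positive h, the inequality cos(h) < (sin h)/h < 1 holds, and the ratio is pinched towards 1:

```python
# Numerical check of the squeeze: cos(h) < sin(h)/h < 1 for small
# positive h, and the ratio is pinched towards 1 as h shrinks.
import math

for h in [0.5, 0.1, 0.01, 0.001]:
    ratio = math.sin(h) / h
    print(h, math.cos(h) < ratio < 1, ratio)
```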

Let’s now proceed to tackle the second limit.

By applying standard algebraic rules:

We can manipulate the second limit as follows:

We can then express this limit in terms of sine, by applying the Pythagorean identity from trigonometry, sin^{2}*h* = 1 – cos^{2}*h*:

Followed by the application of another limit law, which states that the limit of a product is equal to the product of the separate limits:

We have already tackled the first limit of this product, and we have found that this has a value of 1.

The second limit of this product is characterised by a cos *h* in the denominator, which approaches a value of 1 as *h* becomes smaller. Hence, the denominator of the second limit approaches a value of 2 as h approaches 0. The sine term in the numerator, on the other hand, attains a value of 0 as *h* approaches 0. This drives not only the second limit, but also the entire product limit to 0:

Putting everything together, we may finally arrive at the following conclusion:

sin’(*x*) = (1) (cos *x*) + (0) (sin *x*)

sin’(*x*) = cos *x*

This, finally, tells us that the derivative of sin *x* is simply cos *x*.

Similarly, we can calculate the derivative of the cosine function by re-using the knowledge that we have gained in finding the derivative of the sine function. Substituting for *f*(*x*) = cos *x*:

The *addition formula* is now applied to expand the cos(*x* + *h*) term as follows:

cos(*x* + *y*) = cos *x* cos *y* – sin *x* sin *y*

Which again leads to the summation of two limits:

We can quickly realise that we have already evaluated these two limits in the process of finding the derivative of sine; the first limit approaches 1, whereas the second limit approaches 0, as the value of *h* becomes smaller:

cos’(*x*) = (1) (-sin *x*) + (0) (cos *x*)

cos’(*x*) = -sin *x*

Which, ultimately, tells us that the derivative of cos *x* is –sin *x*.

The importance of the derivatives that we have just found lies in their definition of the *rate of change* of the function under consideration, at some particular angle, *x*. For instance, if we had to recall the graph of the periodic sine function, we can observe that its first positive peak coincides with an angle of π / 2 radians.

We can use the derivative of the sine function in order to compute directly the rate of change, or slope, of the tangent line at this peak on the graph:

sin’(π / 2) = cos(π / 2) = 0

We find that this result corresponds well with the fact that the peak of the sine function is, indeed, a stationary point with zero rate of change.

A similar exercise can be easily carried out to compute the rate of change of the tangent line at different angles, for both the sine and cosine functions.

In this section, we shall be finding the derivatives of the sine and cosine functions in Python.

For this purpose, we will be making use of the SymPy library, which will let us deal with the computation of mathematical objects symbolically. This means that the SymPy library will let us define and manipulate the sine and cosine functions, with unevaluated variables, in symbolic form. We will be able to define a variable as symbol by making use of *symbols* in Python, whereas to take the derivatives we shall be using the *diff* function.

Before proceeding further, let us first load the required libraries.

```python
from sympy import diff
from sympy import sin
from sympy import cos
from sympy import symbols
```

We can now proceed to define a variable *x* in symbolic form, which means that we can work with *x* without having to assign it a value.

```python
# define variable as symbol
x = symbols('x')
```

Next, we can find the derivative of the sine and cosine function with respect to *x*, using the *diff* function.

```python
# find the first derivative of sine and cosine with respect to x
print('The first derivative of sine is:', diff(sin(x), x))
print('The first derivative of cosine is:', diff(cos(x), x))
```

We find that the *diff* function correctly returns *cos*(*x*) as the derivative of sine, and –*sin*(*x*) as the derivative of cosine.

```
The first derivative of sine is: cos(x)
The first derivative of cosine is: -sin(x)
```

The *diff* function can take multiple derivatives too. For example, we can find the second derivative for both sine and cosine by passing *x* twice.

```python
# find the second derivative of sine and cosine with respect to x
print('The second derivative of sine is:', diff(sin(x), x, x))
print('The second derivative of cosine is:', diff(cos(x), x, x))
```

This means that, in finding the second derivative, we are taking the derivative of the derivative of each function. For example, to find the second derivative of the sine function, we take the derivative of *cos*(*x*), its first derivative. We can find the second derivative for the cosine function by similarly taking the derivative of –*sin*(*x*), its first derivative.

```
The second derivative of sine is: -sin(x)
The second derivative of cosine is: -cos(x)
```

We can, alternatively, pass the number 2 to the diff function to indicate that we are interested in finding the second derivative.

```python
# find the second derivative of sine and cosine with respect to x
print('The second derivative of sine is:', diff(sin(x), x, 2))
print('The second derivative of cosine is:', diff(cos(x), x, 2))
```

Tying all of this together, the complete example of finding the derivative of the sine and cosine functions is listed below.

# finding the derivative of the sine and cosine functions
from sympy import diff
from sympy import sin
from sympy import cos
from sympy import symbols
# define variable as symbol
x = symbols('x')
# find the first derivative of sine and cosine with respect to x
print('The first derivative of sine is:', diff(sin(x), x))
print('The first derivative of cosine is:', diff(cos(x), x))
# find the second derivative of sine and cosine with respect to x
print('\nThe second derivative of sine is:', diff(sin(x), x, x))
print('The second derivative of cosine is:', diff(cos(x), x, x))
# find the second derivative of sine and cosine with respect to x
print('\nThe second derivative of sine is:', diff(sin(x), x, 2))
print('The second derivative of cosine is:', diff(cos(x), x, 2))

This section provides more resources on the topic if you are looking to go deeper.

In this tutorial, you discovered how to find the derivative of the sine and cosine functions.

Specifically, you learned:

- How to find the derivative of the sine and cosine functions by applying several rules from algebra, trigonometry and limits.
- How to find the derivative of the sine and cosine functions in Python.

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post Derivative of the Sine and Cosine appeared first on Machine Learning Mastery.

The post A Gentle Introduction to Slopes and Tangents appeared first on Machine Learning Mastery.

In this tutorial, you will discover what the slope of a line is and what a tangent to a curve is.

After completing this tutorial, you will know:

- The slope of a line
- The average rate of change of f(x) on an interval w.r.t. x
- The slope of a curve
- The tangent line to a curve at a point

Let’s get started.

This tutorial is divided into two parts; they are:

- The slope of a line and a curve
- The tangent line to a curve

Let’s start by reviewing the slope of a line. In calculus the slope of a line defines its steepness as a number. This number is calculated by dividing the change in the vertical direction by the change in the horizontal direction when moving from one point on the line to another. The figure shows how the slope can be calculated from two distinct points A and B on a line.

A straight line can be uniquely defined by two points on the line. The slope of a line is the same everywhere on the line; hence, any line can also be uniquely defined by the slope and one point on the line. From the known point we can move to any other point on the line according to the ratio defined by the slope of the line.

We can extend the idea of the slope of a line to the slope of a curve. Consider the left graph of the figure below. If we want to measure the ‘steepness’ of this curve, it is going to vary at different points on the curve. The average rate of change when moving from point A to point B is negative as the value of the function is decreasing when x is increasing. It is the same when moving from point B to point A. Hence, we can define it over the interval [x0,x1] as:

(y1-y0)/(x1-x0)

We can see that the above is also an expression for the slope of the secant line that includes the points A and B. To refresh your memory, a secant line intersects the curve at two points.

Similarly, the average rate of change between point C and point D is positive and it’s given by the slope of the secant line that includes these two points.
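The average rate of change is simple enough to compute directly in Python. The helper below is a minimal sketch (the function name *avg_rate_of_change* is our own, not from the tutorial):

```python
# compute the average rate of change of f over the interval [x0, x1],
# i.e. the slope of the secant line through (x0, f(x0)) and (x1, f(x1))
def avg_rate_of_change(f, x0, x1):
    return (f(x1) - f(x0)) / (x1 - x0)

# for f(x) = x^2, the secant through x=1 and x=3 has slope (9 - 1)/(3 - 1) = 4
print(avg_rate_of_change(lambda t: t**2, 1, 3))  # 4.0
```

For a straight line such as f(x) = 2x+1, this helper returns the same value for any pair of points, which is exactly the property of a constant slope discussed above.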

Let’s now look at the right graph of the above figure. What happens when we move point B towards point A? Let’s call the new point B’. When the point B’ is infinitesimally close to A, the secant line would turn into a line that touches the curve only once. Here the x coordinate of B’ is (x0+h), with h an infinitesimally small value. The corresponding value of the y-coordinate of the point B’ is the value of this function at (x0+h), i.e., f(x0+h).

The average rate of change over the interval [x0,x0+h] represents the rate of change over a very small interval of length h, where h approaches zero. This is called the slope of the curve at the point x0. Hence, at any point A(x0,f(x0)), the slope of the curve is defined as:

lim(h→0) (f(x0+h) - f(x0))/h

The expression of the slope of the curve at a point A is equivalent to the derivative of f(x) at the point x0. Hence, we can use the derivative to find the slope of the curve. You can review the concept of derivatives in this tutorial.

Here are a few examples of the slope of the curve.

- The slope of f(x) = 1/x at any point k (k≠0) is given by (-1/k^2). As an example:
- Slope of f(x) = 1/x at (x=2) is -1/4
- Slope of f(x) = 1/x at (x=-1) is -1

- The slope of f(x) = x^2 at any point k is given by (2k). For example:
- Slope of f(x) = x^2 at (x=0) is 0
- Slope of f(x) = x^2 at (x=1) is 2

- The slope of f(x) = 2x+1, is a constant value equal to 2. We can see that f(x) defines a straight line.
- The slope of f(x) = k, (where k is a constant) is zero as the function does not change anywhere. Hence its average rate of change at any point is zero.
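The slopes listed above can be checked with SymPy, in the same spirit as the earlier sine and cosine example; a short sketch:

```python
from sympy import diff, symbols

x = symbols('x')

# slope of f(x) = 1/x is f'(x) = -1/x^2
slope_inv = diff(1/x, x)
# slope of f(x) = x^2 is f'(x) = 2x
slope_sq = diff(x**2, x)

print(slope_inv.subs(x, 2))   # -1/4
print(slope_inv.subs(x, -1))  # -1
print(slope_sq.subs(x, 0))    # 0
print(slope_sq.subs(x, 1))    # 2
```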

It was mentioned earlier that any straight line can be uniquely defined by its slope and a point that passes through it. We also just defined the slope of a curve at a point A. Using these two facts, we’ll define the tangent to a curve f(x) at a point A(x0,f(x0)) as a line that satisfies both of the following:

- The line passes through A
- The slope of the line is equal to the slope of the curve at the point A

Using the above two facts, we can easily determine the equation of the tangent line at a point (x0,f(x0)). A few examples are shown next.

The graph of f(x) = 1/x along with the tangent lines at x=1 and x=-1 is shown in the figure. Below are the steps to determine the tangent line at x=1.

- Equation of a line with slope m and y-intercept c is given by: y=mx+c
- Slope of the line at any point is given by the function f'(x) = -1/x^2
- Slope of the tangent line to the curve at x=1 is -1; hence, y=-x+c
- The tangent line passes through the point (1,1) and hence substituting in the above equation we get:
- 1 = -(1)+c ⟹ c = 2

- The final equation of the tangent line is y = -x+2
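The steps above can be sketched in SymPy; the point-slope construction below mirrors the manual calculation for f(x) = 1/x at x=1:

```python
from sympy import diff, symbols

x = symbols('x')
f = 1/x
x0 = 1

# slope of the tangent at x0 is f'(x0)
m = diff(f, x).subs(x, x0)
# point-slope form: y = f(x0) + m*(x - x0)
tangent = (f.subs(x, x0) + m*(x - x0)).expand()
print(tangent)  # 2 - x
```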

Shown below is the curve f(x) = x^2 and the tangent lines at the points x=2, x=-2 and x=0. At x=0, the tangent line is parallel to the x-axis as the slope of f(x) at x=0 is zero.

This is how we compute the equation of the tangent line at x=2:

- Equation of a line with slope m and y-intercept c is given by: y=mx+c
- Slope of the line at any point is given by the function f'(x) = 2x
- Slope of the tangent line to the curve at x=2 is 4; hence, y=4x+c
- The tangent line passes through the point (2,4) and hence substituting in the above equation we get:
- 4 = 4(2)+c ⟹ c = -4

- The final equation of the tangent line is y = 4x-4

This function, f(x) = x^3+2x+1, is shown below, along with its tangent lines at x=0, x=2 and x=-2. Below are the steps to derive an equation of the tangent line at x=0.

- Equation of a line with slope m and y-intercept c is given by: y=mx+c
- Slope of the line at any point is given by the function f'(x) = 3x^2+2
- Slope of the tangent line to the curve at x=0 is 2; hence, y=2x+c
- The tangent line passes through the point (0,1) and hence substituting in the above equation we get:
- 1 = 2(0)+c ⟹ c = 1

- The final equation of the tangent line is y = 2x+1

Note that the curve has the same slope at both x=2 and x=-2, and hence the two tangent lines at x=2 and x=-2 are parallel. The same is true for any pair x=k and x=-k, since f'(x) = 3x^2+2 is an even function and hence f'(k) = f'(-k).
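The parallel-tangents observation can be checked numerically; the cubic below is inferred from the stated derivative f'(x) = 3x^2+2 and the point (0,1), so treat it as an assumption consistent with the text:

```python
from sympy import diff, symbols

x = symbols('x')
# cubic inferred from f'(x) = 3x^2 + 2 and the point (0, 1)
f = x**3 + 2*x + 1
fprime = diff(f, x)

# f'(x) is an even function, so the slopes at x = 2 and x = -2 agree
print(fprime.subs(x, 2))   # 14
print(fprime.subs(x, -2))  # 14
```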

This section lists some ideas for extending the tutorial that you may wish to explore.

- Velocity and acceleration
- Integration of a function

If you explore any of these extensions, I’d love to know. Post your findings in the comments below.

This section provides more resources on the topic if you are looking to go deeper.

- Additional resources on Calculus Books for Machine Learning

- Calculus, 3rd Edition, 2017. (Gilbert Strang)
- Calculus, 8th edition, 2015. (James Stewart)

In this tutorial, you discovered the concept of the slope of a curve at a point and the tangent line to a curve at a point.

Specifically, you learned:

- What is the slope of a line
- What is the average rate of change of a curve over an interval w.r.t. x
- Slope of a curve at a point
- Tangent to a curve at a point

Ask your questions in the comments below and I will do my best to answer.


The post A Gentle Introduction to Derivatives of Powers and Polynomials appeared first on Machine Learning Mastery.

In this tutorial, you will discover how to compute the derivative of powers of x and polynomials.

After completing this tutorial, you will know:

- General rule for computing the derivative of polynomials
- General rule for finding the derivative of a function that involves any non-zero real powers of x

Let’s get started.

This tutorial is divided into two parts; they are:

- The derivative of a function that involves integer powers of x
- Differentiation of a function that has any real non-zero power of x

Let’s start by finding a simple rule that governs the sum of two functions. Suppose we have two functions f(x) and g(x), then the derivative of their sum can be found as follows. You can refer to the definition of the derivative, in case you need to review it.

Here we have a general rule that says that the derivative of the sum of two functions is the sum of the derivatives of the individual functions.

Before we talk about derivatives of integer powers of x, let’s review the Binomial theorem, which tells us how to expand the following expression (here C(n,k) is the choose function):

(a+b)^n = a^n + C(n,1)a^(n-1)b + C(n,2)a^(n-2)b^2 + … + C(n,n-1)ab^(n-1) + b^n
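SymPy can carry out such an expansion for any particular n; the choice of n=4 below is arbitrary, purely for illustration:

```python
from sympy import expand, symbols

a, b = symbols('a b')

# the coefficients of the expansion are the binomial coefficients C(4, k)
expansion = expand((a + b)**4)
print(expansion)  # a**4 + 4*a**3*b + 6*a**2*b**2 + 4*a*b**3 + b**4
```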

We’ll derive a simple rule for finding the derivative of a function that involves x^n, where n is an integer and n>0. Let’s go back to the definition of a derivative discussed in this tutorial and apply it to kx^n, where k is a constant. Working through the limit with the help of the Binomial theorem gives the rule:

f(x) = kx^n

f'(x) = knx^(n-1)

Following are some examples of applying this rule:

- Derivative of x^2 is 2x
- Derivative of 3x^5 is 15x^4
- Derivative of 4x^9 is 36x^8
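The three examples above can be checked directly with SymPy’s *diff* function; a quick sketch:

```python
from sympy import diff, symbols

x = symbols('x')

print(diff(x**2, x))    # 2*x
print(diff(3*x**5, x))  # 15*x**4
print(diff(4*x**9, x))  # 36*x**8
```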

The two rules, i.e., the rule for the derivative of the sum of two functions, and the rule for the derivative of an integer power of x, enable us to differentiate any polynomial. If we have a polynomial of degree n, we can consider it as a sum of individual functions that involve different powers of x. Suppose we have a polynomial P(x) of degree n:

P(x) = a_n x^n + a_(n-1) x^(n-1) + … + a_1 x + a_0

then its derivative P'(x) is given by:

P'(x) = n a_n x^(n-1) + (n-1) a_(n-1) x^(n-2) + … + a_1

This shows that the derivative of the polynomial of degree n, is in fact a polynomial of degree (n-1).

Some examples are shown below, where the polynomial function and its derivatives are all plotted together. The blue curve shows the function itself, while the red curve is the derivative of that function.
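Since SymPy applies the sum and power rules term by term, a whole polynomial can be differentiated in one call. The cubic below is our own example, not one of the plotted functions:

```python
from sympy import diff, symbols

x = symbols('x')

# an example cubic polynomial, chosen here for illustration
P = 2*x**3 - 5*x**2 + 7*x - 3
P_prime = diff(P, x)
print(P_prime)  # 6*x**2 - 10*x + 7
```

As expected, the derivative of this degree-3 polynomial is a polynomial of degree 2.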

The rules derived above extend to non-integer real powers of x, which can be fractions, negative numbers or irrational numbers. The general rule is given below, where a and k can be any real numbers not equal to zero.

f(x) = kx^a

f'(x) = kax^(a-1)

A few examples are:

- Derivative of x^(0.2) is (0.2)x^(-0.8)
- Derivative of x^(𝜋) is 𝜋x^(𝜋-1)
- Derivative of x^(-3/4) is (-3/4)x^(-7/4)

Here are a few examples, which are plotted along with their derivatives. Again, the blue curve denotes the function itself, and the red curve denotes the corresponding derivative:

This section lists some ideas for extending the tutorial that you may wish to explore.

- Rules for derivatives of the product of two functions
- Rules for derivatives of rational functions
- Integration

If you explore any of these extensions, I’d love to know. Post your findings in the comments below.

This section provides more resources on the topic if you are looking to go deeper.

- Additional resources on Calculus Books for Machine Learning

- Calculus, 3rd Edition, 2017. (Gilbert Strang)
- Calculus, 8th edition, 2015. (James Stewart)

In this tutorial, you discovered how to differentiate a polynomial function and functions involving a sum of non-integer powers of x.

Specifically, you learned:

- Derivative of the sum of two functions
- Derivative of a constant multiplied by an integer power of x
- Derivative of a polynomial function
- Derivative of a sum of expressions involving non-integers powers of x

Ask your questions in the comments below and I will do my best to answer.

