Get fitted coefficient of linear regression equation
I have a dataset with predicted and observed data. The equation that predicts the data is: y = Af * T * sqrt(g * h)
with Af = constant (currently 1.35), T = wave period, g = gravitational acceleration (9.81 m/s²), and h = wave height.
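For illustration, evaluating the equation in Python looks like this (the T and h values below are hypothetical placeholders, not my actual data):

import numpy as np

Af = 1.35                        # literature value
g = 9.81                         # gravitational acceleration
T = np.array([5.0, 6.2, 7.1])    # wave periods (placeholder values)
h = np.array([1.2, 1.5, 2.0])    # wave heights (placeholder values)
y_pred = Af * T * np.sqrt(g * h)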
I'd like to use linear regression to find the best-fitted coefficient (Af in the equation), so that the predicted values are closer to the observed data.
Using Af = 1.35 (a value suggested in the literature) results in r^2 = 0.5676. Ideally, I'd use Python to find the best-fitted coefficient for my data:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X = np.array([11.52, 11.559, 12.31, 16.46, 11.84, 7.38, 9.99, 16.72,
              11.617, 11.77, 6.48, 9.035, 12.87, 11.18, 6.75])
y = np.array([25.51658407, 24.61306145, 19.4007494, 24.85111923, 25.99397106,
              14.30284824, 17.69451713, 27.37460301, 22.23326366, 18.44905152,
              10.28001306, 10.68681843, 28.85399089, 14.02840557, 18.41941787]).reshape((-1, 1))

# note: this overwrites the arrays above with the iris demo data
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(random_state=0).fit(X, y)
print(clf.coef_, clf.intercept_)
Here X = the observed/measured values in the field and y = the values predicted by the equation. I'm having difficulty incorporating the actual equation and finding the best fit for Af.
Using the data you provided, but taking the log, you can fit an ordinary least squares model from scikit-learn. Since the equation has the form y = Af * X, taking logs gives log(y) = log(Af) + log(X), so the intercept of a linear fit of log(y) on log(X) estimates log(Af):
import numpy as np
import pandas as pd
import pymc3 as pm
import theano.tensor as tt
from sklearn.linear_model import LinearRegression
import statsmodels.formula.api as smf

X = np.log(np.array([11.52, 11.559, 12.31, 16.46, 11.84, 7.38, 9.99, 16.72,
                     11.617, 11.77, 6.48, 9.035, 12.87, 11.18, 6.75]))
y = np.log(np.array([25.51658407, 24.61306145, 19.4007494, 24.85111923, 25.99397106,
                     14.30284824, 17.69451713, 27.37460301, 22.23326366, 18.44905152,
                     10.28001306, 10.68681843, 28.85399089, 14.02840557, 18.41941787]))

reg = LinearRegression().fit(X.reshape(-1, 1), y.reshape(-1, 1))
print('Af estimate: ', np.exp(reg.intercept_))
This gives the following estimate of Af:
Af estimate: [2.4844087]
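To check how much the fitted coefficient improves on Af = 1.35, you can compare r^2 on the original (untransformed) scale. Here is a sketch reusing the arrays and reg from above; note the fitted log-log model also has a free slope, so its back-transformed prediction is Af * X**slope:

from sklearn.metrics import r2_score

X_orig, y_orig = np.exp(X), np.exp(y)   # back-transform the log arrays

slope = reg.coef_[0][0]
Af_hat = np.exp(reg.intercept_[0])
y_fit = Af_hat * X_orig ** slope        # prediction of the fitted log-log model

print('r2 with Af = 1.35 (slope fixed at 1): ', r2_score(y_orig, 1.35 * X_orig))
print('r2 with the fitted model: ', r2_score(y_orig, y_fit))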
Since you don't seem to be interested in predicting new data with the model, but rather in the best estimate of the linear model's parameters, you could use statsmodels:
results = smf.ols('y ~ X + 1', data=pd.DataFrame({'y': y, 'X': X})).fit()
# the Af estimate is exp of the intercept; results.params['X'] would be the slope, not log(Af)
print('Statsmodels Af estimate: ', np.exp(results.params['Intercept']))
which gives 2.4844, the same value as before, as expected, since both fit the same ordinary least squares model. The r^2 of this fit (results.rsquared) is similar to the one you quote.
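If you also want an uncertainty estimate from statsmodels, the confidence interval of the intercept translates directly into one for Af. This is a small add-on to the fit above, not part of the original suggestion:

# 95% confidence interval for Af, via the intercept of the log-log fit
ci = results.conf_int().loc['Intercept']   # columns 0 (lower) and 1 (upper)
print('Af 95% CI: ', np.exp(ci[0]), '-', np.exp(ci[1]))
print('r2 of the log-log fit: ', results.rsquared)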
Lastly, my suggestion would be to use pymc3 and get a full Bayesian fit, which lets you naturally estimate the uncertainty of the quantity you want to measure. There is undoubtedly a bit of a learning curve to pymc3, but it is a great package for probabilistic programming. It allows you to estimate the full posterior of your parameter space, which is what most people are really interested in when fitting a model. An implementation for your problem could be the following:
with pm.Model() as model:
    # Priors
    alpha = pm.Normal('alpha', mu=1.35, sd=5)     # centered around the literature value
    beta = pm.HalfNormal('beta', sd=10)           # only positive values, as it goes into the sqrt; is height always positive here?
    sigma = pm.HalfNormal('sigma', sd=1)
    beta2 = pm.Deterministic('beta2', tt.sqrt(beta * 9.81))   # g is very well known
    alpha_f = pm.Deterministic('alpha_f', tt.exp(alpha))      # directly track the output value we want

    # Likelihood
    likelihood = pm.Normal('y', mu=alpha + beta2 * X, sigma=sigma, observed=y)

    # Sampling
    trace = pm.sample(init='adapt_diag')

print(pm.summary(trace))
This gives the following output:
          mean     sd  hpd_3%  hpd_97%  ...  ess_sd  ess_bulk  ess_tail  r_hat
alpha    0.781  0.544  -0.232    1.864  ...   309.0     440.0     406.0   1.01
beta     0.091  0.044   0.013    0.167  ...   517.0     438.0     359.0   1.01
sigma    0.259  0.056   0.172    0.368  ...   530.0     479.0     147.0   1.00
beta2    0.917  0.229   0.439    1.316  ...   434.0     438.0     359.0   1.01
alpha_f  2.535  1.552   0.465    5.224  ...   317.0     440.0     406.0   1.01
and you can see that there is a lot of uncertainty in Af: the 3%-97% HPD interval for alpha_f spans roughly 0.47 to 5.2.
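If you want more than the summary table, the posterior samples of Af are available directly from the trace; for example, using numpy on the samples, with pm.plot_posterior as the built-in visualization:

af_samples = trace['alpha_f']
print('posterior median Af: ', np.median(af_samples))
print('3%-97% interval: ', np.percentile(af_samples, [3, 97]))
pm.plot_posterior(trace, var_names=['alpha_f'])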
However, it is important to consider the data that goes in, too, and not to overinterpret the results. At the moment you don't provide any uncertainty in either y or X, nor their covariance matrix. It is very unlikely that you know these quantities perfectly, so the rational thing to do is to include these uncertainties in your modelling. pymc3 allows this to be implemented naturally. My implementation above includes an estimate of uncertainty from the data itself, but you may well have your own uncertainty from the measurement device. See also this question and this paper.
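As a rough sketch of how a known measurement uncertainty on X could enter the model (x_err below is a hypothetical placeholder for your instrument's standard deviation; the priors mirror the model above):

with pm.Model() as model_err:
    # same priors as above
    alpha = pm.Normal('alpha', mu=1.35, sd=5)
    beta = pm.HalfNormal('beta', sd=10)
    sigma = pm.HalfNormal('sigma', sd=1)
    beta2 = pm.Deterministic('beta2', tt.sqrt(beta * 9.81))
    alpha_f = pm.Deterministic('alpha_f', tt.exp(alpha))

    # latent 'true' X, observed with known measurement error x_err (hypothetical value)
    x_err = 0.05
    x_true = pm.Normal('x_true', mu=X.mean(), sd=10, shape=len(X))
    x_obs = pm.Normal('x_obs', mu=x_true, sd=x_err, observed=X)

    # the regression now uses the latent x_true instead of the noisy observations
    likelihood = pm.Normal('y', mu=alpha + beta2 * x_true, sigma=sigma, observed=y)
    trace_err = pm.sample(init='adapt_diag')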